Deduplicating software bugs with Machine Learning at Google

I’ve been working at Google for about three and a half years. Back in 2021, we were facing a little problem. A team we were working with had too many duplicate bugs. How did they get into this situation?

Well, Google runs automated tests. Thousands of them. Millions of them. Some run automatically every time there are code changes in a developer workspace, some when engineers send a code review, some after they check in, some as code reaches a deployment stage, and some on a cadence, on demand, or triggered by an event. Because we run so many tests, we don’t always have the human cycles to analyze the results and manually log bugs, so we’ve created a number of bug auto-filers. An auto-filer analyzes the results of a test run and logs a bug automatically. Some of our test execution frameworks offer auto-filers as a feature.

Bug auto-filers are great in that they remove a lot of toil and ensure failures are not ignored. But eventually a human does need to be involved, and this is where the process can bottleneck. Somebody needs to look at the bugs and decide whether they are a real problem or not. In other words, we created bots to save human toil, and those bots generated human toil on the back end! Sometimes a single root cause makes many tests fail. If the auto-filer doesn’t know that the failures share the same cause, each test failure ends up in its own filed bug, and the result is a pile of duplicate bugs.

<a href="https://carloarg02.medium.com/deduplicating-software-bugs-with-machine-learning-at-google-857b5d3036ef">Visit Now</a>
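
To make the duplication problem concrete, here is a minimal sketch of a naive auto-filer that logs one bug per failing test. It is purely illustrative: `RunResult`, `Bug`, `NaiveAutoFiler`, the test names, and the error strings are all hypothetical, and none of this reflects Google's actual tooling. It only shows how one root cause can fan out into several filed bugs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class RunResult:
    """Outcome of a single test in a test run (hypothetical shape)."""
    test_name: str
    passed: bool
    error: str = ""


@dataclass
class Bug:
    """A filed bug (hypothetical shape)."""
    bug_id: int
    title: str
    description: str


class NaiveAutoFiler:
    """Files one bug per failing test, with no notion of a shared root cause."""

    def __init__(self):
        self._ids = count(1)  # simple incrementing bug ids
        self.filed = []

    def process(self, results):
        for result in results:
            if not result.passed:
                bug = Bug(next(self._ids), f"{result.test_name} failed", result.error)
                self.filed.append(bug)
        return self.filed


# Three tests fail for the same (made-up) reason; one test passes.
results = [
    RunResult("test_login", False, "ConnectionError: auth-db unreachable"),
    RunResult("test_checkout", False, "ConnectionError: auth-db unreachable"),
    RunResult("test_profile", False, "ConnectionError: auth-db unreachable"),
    RunResult("test_search", True),
]

bugs = NaiveAutoFiler().process(results)
distinct_errors = {bug.description for bug in bugs}
print(f"{len(bugs)} bugs filed for {len(distinct_errors)} distinct error")
```

In this toy case the error strings match exactly, so the duplicates are easy to spot. Real failures from one root cause often surface with different messages and stack traces in each test, which is why exact matching is not enough and a human ends up triaging the bugs.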