Deduplicating software bugs with Machine Learning at Google
<p>I’ve been working at Google for about three and a half years. Back in 2021, we were facing a little problem. A team we were working with had too many duplicate bugs.</p>
<p>How did they get in this situation?</p>
<p>Well, Google runs automated tests. Thousands of them. Millions of them. Some run automatically every time there are code changes in a developer workspace, some when engineers send a code review, some after they check in, some as code reaches a deployment stage, and some on a cadence, on demand, or when triggered by an event. Because we run so many tests, we don’t always have the human cycles to analyze the results and manually log bugs, so we’ve created a number of <strong>bug auto-filers</strong>. An auto-filer analyzes the results of a test run and logs a bug automatically. Some of our test execution frameworks offer auto-filers as a feature.</p>
<p>Bug auto-filers are great in that they reduce a lot of toil and ensure failures are not ignored. But eventually a human does need to be involved, and this is where the process can bottleneck. Somebody needs to look at the bugs and decide whether they are a real problem or not. In other words, we created bots to save human toil, and those bots generated human toil at the back end!</p>
<p>Sometimes, a <strong><em>single</em></strong> root cause can make <strong><em>many</em></strong> tests fail. If the auto-filer doesn’t know that these failures share the same underlying cause, each test failure ends up with its own filed bug, and the result is <em>duplicate bugs</em>.</p>
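<p>To make the problem concrete, here is a minimal sketch of a naive auto-filer that files one bug per failing test. It is only an illustration, not how Google’s auto-filers are implemented: the <code>Failure</code> and <code>Bug</code> types, the <code>naive_auto_file</code> function, the test names and the error message are all hypothetical.</p>

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """A single failing test (hypothetical shape, for illustration only)."""
    test_name: str
    error_message: str


@dataclass(frozen=True)
class Bug:
    """A filed bug (hypothetical shape, for illustration only)."""
    title: str
    description: str


def naive_auto_file(failures: list[Failure]) -> list[Bug]:
    """File one bug per failing test, with no knowledge of shared root causes."""
    return [
        Bug(title=f"{f.test_name} failed", description=f.error_message)
        for f in failures
    ]


# Three different tests fail because of one made-up root cause.
failures = [
    Failure("LoginTest", "connection refused: auth-backend:443"),
    Failure("CheckoutTest", "connection refused: auth-backend:443"),
    Failure("ProfileTest", "connection refused: auth-backend:443"),
]

bugs = naive_auto_file(failures)

# Counting distinct descriptions works here only because the toy error
# messages are identical; it is not a deduplication technique.
distinct_causes = {bug.description for bug in bugs}
print(f"{len(bugs)} bugs filed for {len(distinct_causes)} underlying cause(s)")
```

<p>In this toy example the three bugs carry identical error text, so they are easy to spot as duplicates. Real failures from different tests may not share the same message, which is why a human usually has to look at each bug to decide whether it duplicates another one.</p>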
<p><a href="https://carloarg02.medium.com/deduplicating-software-bugs-with-machine-learning-at-google-857b5d3036ef">Website</a></p>