How we use hermetic, ephemeral test environments at Google to reduce flakiness

A couple of weeks ago, I published “When sharks chew on network cables — The complexity behind eradicating test flakiness.” In that article, I explored the common sources of flakiness in integration testing. There’s a surprising amount of complexity in the domain: unreliable dependencies, unreliable network connections, and problems with system state during test setup and cleanup. This article is Part 2.

I’m in the org that owns the Developer Infrastructure for Integration Testing at Google, so this topic is very dear to my heart. We run millions of integration tests every single day, so flakiness shows up often simply by virtue of scale. Over the years, we have invested heavily in engineering productivity infrastructure to reduce, mitigate, and in some cases eradicate these problems. One of the more interesting pieces of that infra is ephemeral, hermetic test environments.

Googlers love Three Letter Acronyms, so rather than saying “test environment” all the time, we use the term “SUT” (which stands for “System Under Test”). By SUT I mean an instance of the server or pipeline that contains the changes you intend to test, so that you can run your integration tests against it. “Hermetic” and “ephemeral” are pretty fancy words, so let’s dissect them and see what properties each one brings to the table. <a href="https://carloarg02.medium.com/how-we-use-hermetic-ephemeral-test-environments-at-google-to-reduce-test-flakiness-a87be42b37aa">Read More</a>