Why Data Quality Is Harder than Code Quality

<p>As a data engineer, I always feel less confident about the quality of the data I handle than about the quality of the code I write. Code, at least, I can run interactively and test before deploying to production. With data, I usually have to wait for it to flow through the system and be used before quality issues surface. And it&rsquo;s not only the errors that get raised. It&rsquo;s also the feeling that there are more unknown data quality issues than code bugs waiting to be discovered. But is data quality really a harder problem to solve than code quality?</p>
<p>Code quality is the process of ensuring code meets expectations. Likewise, <strong>data quality is the process of ensuring data meets expectations</strong>. In this article, I want to abstract away from the expectations themselves (known as <a href="https://www.metaplane.dev/blog/data-quality-metrics-for-data-warehouses" rel="noopener ugc nofollow" target="_blank">data quality dimensions</a> when talking about data), as these will likely differ depending on your use case. Instead, I discuss the steps involved in handling quality issues: <strong>detecting, understanding, fixing, and reducing quality issues</strong>. For each step, I then argue why I find it harder to implement for data than for code.</p>