Optimize your Delta Tables & ETLs with Change Data Feed (CDF) in Databricks
<p>After explaining what <a href="https://medium.com/@matthewsalminen/real-time-data-processing-with-delta-live-tables-use-cases-and-best-practices-for-databricks-2009a9a6fc16" rel="noopener">Delta Live Tables</a> are and going in depth on how to record data source changes in those tables with <a href="https://medium.com/@matthewsalminen/handling-real-time-insights-of-delta-live-tables-with-change-data-capture-in-databricks-42f4394611a" rel="noopener">Change Data Capture</a> (CDC), I want to cover yet another useful feature for your Delta Tables: <strong>Change Data Feed, or CDF</strong>. This feature records changes to your data at the row level while also optimizing the performance of your ETL pipelines.</p>
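<p>To make this concrete up front, here is a minimal sketch of enabling CDF on a Delta Table and reading its row-level change feed in PySpark. The table name <code>sales_records</code> and its schema are illustrative, not from the original article:</p>
<pre><code># Enable CDF when creating the table (table name and columns are placeholders).
spark.sql("""
    CREATE TABLE sales_records (id INT, amount DOUBLE)
    USING DELTA
    TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the row-level change feed, starting from table version 0.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .table("sales_records")
)

# Each row carries _change_type, _commit_version, and _commit_timestamp columns.
changes.show()
</code></pre>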
<p>But before I explain Change Data Feed, for those of you reading my articles for the first time, let me provide a brief summary of what Delta Tables, Delta Live Tables, and Change Data Capture are:</p>
<blockquote>
<p><strong>What are Delta Tables in Databricks?</strong></p>
</blockquote>
<p>Remember that all things <em>Delta</em> in Databricks refer to the storage layer of the Delta Lake, which can handle both real-time and batch big data. A Delta Table is the default data table structure used within data lakes for data ingestion via streaming or batches. A general way of creating a Delta Table in Databricks is provided below. Please note that you do not have to import and initialize your Spark session, as Databricks already provides one, but I am including it for reference:</p>
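<p>A minimal sketch; the DataFrame contents and the table name <code>customer_orders</code> are illustrative:</p>
<pre><code>from pyspark.sql import SparkSession

# Databricks notebooks already expose a `spark` session; shown here only for reference.
spark = SparkSession.builder.appName("delta-table-example").getOrCreate()

# Build a small DataFrame and save it as a managed Delta Table.
df = spark.createDataFrame(
    [(1, "alice", 100.0), (2, "bob", 250.0)],
    ["id", "name", "amount"],
)
df.write.format("delta").mode("overwrite").saveAsTable("customer_orders")
</code></pre>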