Optimize your Delta Tables & ETLs with Change Data Feed (CDF) in Databricks

After explaining what [Delta Live Tables](https://medium.com/@matthewsalminen/real-time-data-processing-with-delta-live-tables-use-cases-and-best-practices-for-databricks-2009a9a6fc16) are and then going in depth on how we can record data source changes of those tables with [Change Data Capture](https://medium.com/@matthewsalminen/handling-real-time-insights-of-delta-live-tables-with-change-data-capture-in-databricks-42f4394611a) (CDC), there is yet another useful feature for your Delta Tables called **Change Data Feed, or CDF**. This feature records changes to your data at the row level while also improving the performance of your ETL pipelines.

But before I explain Change Data Feed, for those of you reading my articles for the first time, let me provide a brief summary of what Delta Tables, Delta Live Tables, and Change Data Capture are:

> **What are Delta Tables in Databricks?**

Remember that all things *Delta* in Databricks refer to the storage layer of the Delta Lake, which can handle both real-time and batch big data. A Delta Table is the default table structure used within data lakes for data ingestion via streaming or batches. A general way of creating a Delta Table in Databricks is provided below. Please note that you do not have to import and initialize your Spark session, since Databricks already does this for you, but I include it for reference:
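Here is a minimal sketch of that pattern; the `customers` table name and the sample rows are just placeholders:

```python
from pyspark.sql import SparkSession

# Databricks notebooks ship with a ready-made SparkSession named `spark`,
# so this import and initialization are shown only for reference.
spark = SparkSession.builder.appName("delta-table-example").getOrCreate()

# Build a small DataFrame to ingest (schema and rows are illustrative).
df = spark.createDataFrame(
    [(1, "alice"), (2, "bob")],
    ["id", "name"],
)

# Write it out as a managed Delta Table. Delta is the default table
# format on Databricks, but it is stated explicitly here for clarity.
df.write.format("delta").mode("overwrite").saveAsTable("customers")
```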
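And as a quick preview of where we are headed: CDF is enabled per table with a table property, after which you can query the row-level changes it records. A minimal sketch, reusing the placeholder `customers` table from above:

```python
# Enable Change Data Feed on the (placeholder) `customers` table.
spark.sql("""
    ALTER TABLE customers
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Once later writes have landed, read the row-level changes recorded
# from a given table version onward.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 1)
    .table("customers")
)
changes.show()
```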
Tags: Delta Tables