Liquid Clustering with Databricks Delta Lake

<p>Databricks unveiled&nbsp;<em>Liquid Clustering</em>&nbsp;at this year&rsquo;s Data + AI Summit. It is a new approach that aims to improve both read and write performance through a dynamic data layout.</p>
<h1><strong>Recap:</strong>&nbsp;<strong>Partitioning and Z-Ordering</strong></h1>
<p>Both partitioning and Z-ordering speed up data processing by controlling how data is laid out. They are complementary because they operate at different levels and apply to different types of columns. Partitioning splits a table into separate directories by column value. Z-ordering co-locates related rows within the files of each partition.</p>
<p><strong><em>Partition on the most queried, low-cardinality columns</em></strong>.</p>
<ul>
<li>Do not partition tables that contain less than 1 TB of data.</li>
<li>Rule of thumb: every partition should contain at least 1 GB of data.</li>
</ul>
<p><strong><em>Z-order on the most queried, high-cardinality columns</em></strong>.</p>
<ul>
<li>Use Z-order indexes alongside partitions to speed up queries on large datasets.</li>
<li>Z-order clustering only occurs within a partition, and it cannot be applied to fields used for partitioning.</li>
</ul>
<p><a href="https://medium.com/@tsiciliani/liquid-clustering-with-databricks-delta-lake-57dc251d7870"><strong>Read More</strong></a></p>
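<p>The partitioning rules of thumb and the constraint that Z-ordering happens only within a partition can be illustrated with a minimal pure-Python sketch. This is not Delta Lake's implementation. The helper names (<code>partitioning_advice</code>, <code>z_order_key</code>, <code>layout</code>), the example column names, and the use of plain bit interleaving (a Morton code) as the Z-order key are all hypothetical assumptions for illustration. Delta's own <code>OPTIMIZE ... ZORDER BY</code> differs in detail.</p>

```python
"""Hypothetical sketch of the partitioning / Z-ordering rules of thumb.

Assumptions: helper names, column names, and the plain bit-interleaving
(Morton code) Z-order key are illustrative only, not Databricks APIs.
"""
from collections import defaultdict
from typing import Dict, List, Sequence

TB = 1024 ** 4  # 1 TB in bytes (binary units assumed)
GB = 1024 ** 3  # 1 GB in bytes (binary units assumed)


def partitioning_advice(table_bytes: int, partition_sizes: List[int]) -> str:
    """Apply the rules of thumb: skip partitioning below 1 TB, and
    flag layouts where any partition holds less than 1 GB."""
    if table_bytes < TB:
        return "do not partition"
    if any(size < GB for size in partition_sizes):
        return "partitions too small"
    return "partitioning ok"


def z_order_key(values: Sequence[int], bits: int = 16) -> int:
    """Interleave the bits of non-negative column values (Morton code).

    With two columns, bit i of the first value lands at position 2*i and
    bit i of the second at 2*i + 1, so rows close in every column get
    close keys.
    """
    n = len(values)
    key = 0
    for i in range(bits):
        for j, v in enumerate(values):
            key |= ((v >> i) & 1) << (i * n + j)
    return key


def layout(rows: List[dict], partition_col: str,
           zorder_cols: Sequence[str]) -> Dict[str, List[dict]]:
    """Group rows by partition value, then Z-order each partition separately."""
    if partition_col in zorder_cols:
        # Mirrors the rule that partition fields cannot be Z-ordered.
        raise ValueError("cannot Z-order on a partition column")
    parts: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        parts[row[partition_col]].append(row)
    for part_rows in parts.values():
        # Sorting happens only inside each partition, never across them.
        part_rows.sort(key=lambda r: z_order_key([r[c] for c in zorder_cols]))
    return dict(parts)


if __name__ == "__main__":
    print(partitioning_advice(500 * GB, []))
    rows = [
        {"region": "eu", "x": 3, "y": 1},
        {"region": "eu", "x": 0, "y": 0},
        {"region": "us", "x": 1, "y": 1},
        {"region": "us", "x": 1, "y": 0},
    ]
    for region, part in layout(rows, "region", ("x", "y")).items():
        print(region, [(r["x"], r["y"]) for r in part])
```

<p>Sorting by the interleaved key is what makes Z-ordering useful on high-cardinality columns. Rows with similar values in every Z-order column end up in the same files, so per-file min/max statistics can skip files for filters on any of those columns, not just the first one.</p>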