Liquid Clustering with Databricks Delta Lake
<p>At this year’s Data + AI Summit, Databricks unveiled <em>Liquid Clustering</em>, a new approach aimed at improving both read and write performance through a dynamic data layout.</p>
<h1><strong>Recap:</strong> <strong>Partitioning and Z-Ordering</strong></h1>
<p>Both partitioning and z-ordering rely on data layout to perform data processing optimizations. They are complementary since they operate on different levels, and apply to different types of columns.</p>
<p><strong><em>Partition on most queried, low-cardinality columns</em></strong>.</p>
<ul>
<li>Do not partition tables that contain less than 1 TB of data.</li>
<li>Rule of thumb: each partition should contain at least 1 GB of data.</li>
</ul>
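<p>The two sizing rules above can be sketched as a quick check. This is a minimal illustration, not a Databricks API: the function name <code>should_partition</code> and its parameters are hypothetical, the thresholds are simply the 1 TB and 1 GB figures quoted above, and it assumes partitions are roughly evenly sized.</p>

```python
# Hypothetical helper; thresholds come from the rules of thumb above.
TB = 1024 ** 4
GB = 1024 ** 3


def should_partition(table_bytes, num_partitions,
                     min_table_bytes=TB, min_partition_bytes=GB):
    """Return True only if both rules of thumb hold.

    Assumes data is spread evenly across partitions, which real
    tables rarely satisfy exactly; treat the answer as a first guess.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Rule 1: tables under the size threshold should not be partitioned.
    if table_bytes < min_table_bytes:
        return False
    # Rule 2: the average partition should meet the minimum size.
    return table_bytes / num_partitions >= min_partition_bytes


# A 2 TB table split by day over one year averages several GB per partition.
daily = should_partition(2 * TB, 365)
# The same table split into 100,000 partitions produces tiny partitions.
too_fine = should_partition(2 * TB, 100_000)
```

<p>In practice partition sizes are often skewed, so the smallest partition, not the average, is the one worth checking against the 1 GB guideline.</p>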
<p><strong><em>Z-order on most queried, high-cardinality columns</em></strong>.</p>
<ul>
<li>Use Z-order indexes alongside partitions to speed up queries on large datasets.</li>
<li>Z-order clustering only occurs within a partition, and cannot be applied to fields used for partitioning.</li>
</ul>
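<p>To give some intuition for why Z-ordering helps with high-cardinality columns, the sketch below computes a textbook Z-order (Morton) key by interleaving the bits of two integer values. It illustrates the general space-filling curve idea only; it is not taken from Delta Lake, whose internal implementation may differ, and the name <code>z_value</code> is hypothetical.</p>

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order key.

    Bits of x land on even positions and bits of y on odd positions,
    so rows that are close in both columns tend to get nearby keys.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting rows by the interleaved key groups nearby (x, y) pairs together,
# which is what lets a reader skip files whose value ranges cannot match.
points = [(2, 2), (0, 1), (3, 1), (1, 0), (0, 0), (1, 1), (2, 0)]
ordered = sorted(points, key=lambda p: z_value(*p))
```

<p>Sorting by a single column would keep only that column's values together; interleaving trades a little locality in each column for reasonable locality in both.</p>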
<p><a href="https://medium.com/@tsiciliani/liquid-clustering-with-databricks-delta-lake-57dc251d7870"><strong>Read More</strong></a></p>