Liquid Clustering: First Impressions
<h1>Current challenge</h1>
<p>When designing your lakehouse tables, defining the partition strategy can be challenging.</p>
<p>The general rules for choosing <code><strong>partitioning</strong></code> and <code><strong>ZORDER</strong></code> columns are well known [1], but data requirements, growth, and usage often change over time.</p>
<p>Such changes can make a previously defined, fixed data layout a poor fit, leaving workloads inefficient.</p>
<h1>Presented solution</h1>
<p>Databricks has announced a new feature for Delta Lake 3.0 called <strong>Liquid Clustering</strong> [2].</p>
<p>This new data management technique can adapt the data layout based on changing patterns, making table design and management easier [2]. It also clusters new data incrementally [3].</p>
<p>In practice, you only need to choose the most frequently queried columns as clustering keys.</p>
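<p>As a minimal sketch of what choosing clustering keys can look like, the Python snippet below builds the SQL statements involved. It assumes the <code>CLUSTER BY</code> syntax documented by Databricks; the table name <code>events</code>, its columns, and the helper functions are hypothetical and only serve as illustration. The snippet just prints the statements. Running them requires a Spark session with a Delta Lake version that supports liquid clustering.</p>

```python
def cluster_by_clause(columns):
    """Render a CLUSTER BY clause for the given clustering keys."""
    if not columns:
        raise ValueError("at least one clustering key is required")
    return "CLUSTER BY (" + ", ".join(columns) + ")"


def create_table_sql(table, schema, clustering_keys):
    """Build a CREATE TABLE statement for a Delta table with clustering keys."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in schema.items())
    return (
        f"CREATE TABLE {table} ({cols}) USING DELTA "
        f"{cluster_by_clause(clustering_keys)}"
    )


def alter_clustering_sql(table, clustering_keys):
    """Build an ALTER TABLE statement that changes the clustering keys."""
    return f"ALTER TABLE {table} {cluster_by_clause(clustering_keys)}"


# Hypothetical table: names and types are assumptions for illustration.
schema = {"event_id": "BIGINT", "customer_id": "BIGINT", "event_date": "DATE"}

# Start by clustering on the column that is queried most often.
create_stmt = create_table_sql("events", schema, ["customer_id"])

# If query patterns change later, the keys can be redefined
# without redesigning the table.
alter_stmt = alter_clustering_sql("events", ["customer_id", "event_date"])

# OPTIMIZE triggers clustering of the data.
optimize_stmt = "OPTIMIZE events"

for stmt in (create_stmt, alter_stmt, optimize_stmt):
    print(stmt)
```

<p>With an active session, each statement could be passed to <code>spark.sql(...)</code>. Keeping the key list as a plain Python list reflects the point of the feature: the clustering keys are a setting you can revisit, not a fixed layout decision.</p>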
<p>Besides the configuration benefits (both initial and ongoing), <strong>2.5x faster ingestion</strong> is also reported for a <code><strong>1 TB</strong></code> table.</p>
<p><a href="https://medium.com/closer-consulting/liquid-clustering-first-impressions-113e2517b251"><strong>Read More</strong></a></p>