Liquid Clustering: First Impressions

<h1>Current challenge</h1>
<p>When designing lakehouse tables, choosing a partition strategy can be difficult.</p>
<p>The general rules for choosing&nbsp;<code><strong>partitioning</strong></code>&nbsp;and&nbsp;<code><strong>ZORDER</strong></code>&nbsp;columns are well known [1], but data requirements, growth, and usage patterns often change over time.</p>
<p>When they do, a data layout that was fixed at design time may no longer match the workload, and queries and writes become inefficient.</p>
<h1>Presented solution</h1>
<p>Databricks has announced a new feature for Delta Lake 3.0 called&nbsp;<strong>Liquid Clustering</strong>&nbsp;[2].</p>
<p>This data management technique can adapt the data layout as access patterns change, which makes table design and management easier [2]. It also clusters new data incrementally [3].</p>
<p>In practice, you only need to choose the columns that are queried most often as clustering keys.</p>
<p>Besides the configuration benefits (both initial and ongoing), ingestion is also reported to be&nbsp;<strong>2.5x faster</strong>&nbsp;on a&nbsp;<code><strong>1 TB</strong></code>&nbsp;table.</p>
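<p>To make the idea of clustering keys concrete, here is a minimal sketch in Python that only builds the SQL statements involved. It assumes the <code>CLUSTER BY</code> clause of Databricks SQL for Delta tables. The table name <code>sales</code>, its columns, and the helper functions are hypothetical and exist only for illustration. Nothing is executed against a cluster here.</p>

```python
# Sketch only: builds Liquid Clustering DDL strings, does not run them.
# Table, column, and helper names are hypothetical.

def _check_keys(columns, clustering_keys):
    """Fail early if a clustering key is not a column of the table."""
    known = {name for name, _ in columns}
    missing = [key for key in clustering_keys if key not in known]
    if missing:
        raise ValueError(f"unknown clustering keys: {missing}")


def create_table_sql(table, columns, clustering_keys):
    """CREATE TABLE statement with clustering keys instead of partitions."""
    _check_keys(columns, clustering_keys)
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    keys = ", ".join(clustering_keys)
    return f"CREATE TABLE {table} ({cols}) USING DELTA CLUSTER BY ({keys})"


def alter_clustering_sql(table, clustering_keys):
    """Statement that changes the clustering keys when query patterns shift."""
    keys = ", ".join(clustering_keys)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


def optimize_sql(table):
    """Statement that asks Delta to (re)cluster the table's data."""
    return f"OPTIMIZE {table}"


# Hypothetical schema: most queries filter on customer_id.
columns = [
    ("order_id", "BIGINT"),
    ("customer_id", "BIGINT"),
    ("order_date", "DATE"),
]

statements = [
    create_table_sql("sales", columns, ["customer_id"]),
    # Later, queries start filtering on order_date as well.
    alter_clustering_sql("sales", ["customer_id", "order_date"]),
    optimize_sql("sales"),
]

for stmt in statements:
    # On a runtime that supports Liquid Clustering, this would
    # typically be spark.sql(stmt); here the statement is only printed.
    print(stmt)
```

<p>The point of the sketch is that changing the clustering keys is a single statement against the existing table, whereas changing a partition column normally means rewriting the table. Whether the generated statements run as written depends on the runtime version in use.</p>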