Liquid Clustering: First Impressions

<h1>Current challenge</h1>
When designing your lakehouse tables, defining the partition strategy can be challenging. The general rules for choosing <code>partitioning</code> and <code>ZORDER</code> columns are well known [1], but data requirements, data volume, and usage patterns often change over time. When they do, a data layout that was defined up front and then fixed may no longer fit, and workloads become inefficient.
<h1>Presented solution</h1>
Databricks has announced a new feature for Delta Lake 3.0 called Liquid Clustering [2]. This data management technique can adapt the data layout as usage patterns change, which makes table design and management easier [2]. It also clusters new data incrementally [3]. In practice, you only need to choose the most frequently queried columns as clustering keys. Besides the configuration benefits (both at table creation and over the table's lifetime), ingestion is also reported to be 2.5x faster on a <code>1 TB</code> table. <a href="https://medium.com/closer-consulting/liquid-clustering-first-impressions-113e2517b251">Read More</a>
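In SQL, choosing clustering keys corresponds to a `CLUSTER BY` clause on `CREATE TABLE`, and clustering is triggered by running `OPTIMIZE`. SQL cannot run here, so this minimal sketch only builds the statements as Python strings. The helper names `create_clustered_table_sql` and `optimize_sql` and the `events` schema are hypothetical and chosen for illustration. The four-key limit mirrors current Databricks documentation and should be treated as an assumption that may change between versions. On a Spark session with Delta Lake 3.x, you would pass the resulting strings to `spark.sql(...)`.

```python
# Hypothetical helpers: they only build SQL text and never connect to Spark.
MAX_CLUSTERING_KEYS = 4  # assumption: limit stated in current Databricks docs


def create_clustered_table_sql(table: str, columns: dict, cluster_by: list) -> str:
    """Return a CREATE TABLE statement that clusters the table on `cluster_by`."""
    if not cluster_by:
        raise ValueError("at least one clustering key is required")
    if len(cluster_by) > MAX_CLUSTERING_KEYS:
        raise ValueError(f"at most {MAX_CLUSTERING_KEYS} clustering keys are supported")
    unknown = [c for c in cluster_by if c not in columns]
    if unknown:
        raise ValueError(f"clustering keys not in schema: {unknown}")
    # dicts keep insertion order, so columns appear in the order given.
    schema = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    keys = ", ".join(cluster_by)
    return f"CREATE TABLE {table} ({schema}) USING DELTA CLUSTER BY ({keys})"


def optimize_sql(table: str) -> str:
    """Return the statement that triggers clustering of not-yet-clustered data."""
    return f"OPTIMIZE {table}"


if __name__ == "__main__":
    # Cluster on the columns most often used in query filters.
    ddl = create_clustered_table_sql(
        "events",
        {"event_id": "BIGINT", "event_date": "DATE", "user_id": "STRING"},
        ["event_date", "user_id"],
    )
    print(ddl)
    print(optimize_sql("events"))
```

Running the script prints `CREATE TABLE events (event_id BIGINT, event_date DATE, user_id STRING) USING DELTA CLUSTER BY (event_date, user_id)` followed by `OPTIMIZE events`. The validation only catches obvious mistakes early. The engine itself remains the authority on which key choices it accepts.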
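Choosing frequently queried columns as keys matters because Delta Lake records per-file min/max column statistics. A query engine can use these statistics to skip files whose range cannot contain the filtered value. The toy simulation below compares how many files a point filter must scan when rows are written in arrival order versus grouped by the key. It is not Liquid Clustering's actual algorithm, which the article does not describe. Sorting on a single column is a simplified stand-in for "clustered data", and the column name `customer_id` and the file sizes are hypothetical.

```python
import random


def file_stats(rows, key, rows_per_file):
    """Split rows into files of `rows_per_file` rows and record (min, max) of `key` per file."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        values = [r[key] for r in rows[start:start + rows_per_file]]
        stats.append((min(values), max(values)))
    return stats


def files_to_scan(stats, value):
    """Count files whose [min, max] range could contain `value`; all others are skipped."""
    return sum(1 for lo, hi in stats if lo <= value <= hi)


random.seed(0)  # deterministic toy data
rows = [{"customer_id": random.randrange(1000)} for _ in range(100_000)]

# Arrival order: every file holds a wide spread of ids.
unclustered = file_stats(rows, "customer_id", 1000)
# Stand-in for clustering: rows with similar ids land in the same files.
clustered = file_stats(sorted(rows, key=lambda r: r["customer_id"]), "customer_id", 1000)

print("files scanned (unclustered):", files_to_scan(unclustered, 42))
print("files scanned (clustered):", files_to_scan(clustered, 42))
```

With arrival order, nearly all 100 files span almost the whole id range and must be read. After grouping by the key, the filter touches only the one or two files that actually contain the value. The benefit depends on filtering by the clustering key, which is why the choice of keys matters.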