How to read a Delta table’s .snappy.parquet file in Databricks

<p>In Databricks, learn how to read the .snappy.parquet files of your Delta tables.</p> <h1>TLDR</h1> <ol> <li>Copy the .snappy.parquet file you want to read from the table&rsquo;s location to a different directory in your storage.</li> <li>Verify that no &ldquo;_delta_log&rdquo; folder exists in the directory you copied the Parquet file to.</li> <li>Read the .snappy.parquet file with the&nbsp;<code>spark.read.parquet()</code>&nbsp;command.</li> </ol> <h1>Detail</h1> <p>Here is a concise overview of why you might read a Delta table&rsquo;s Snappy Parquet file directly, how to do so, and what to avoid when doing so.</p> <h2>Common reasons to directly read a .snappy.parquet file</h2> <ol> <li>To forcibly access a prior version of the Delta table that might no longer be accessible via the&nbsp;<code>SELECT * FROM table VERSION AS OF X</code>&nbsp;command.</li> <li>To restore data from older Parquet files in case of an unrecoverable data issue.</li> <li>To reverse engineer and recreate the source data by analyzing and reconstructing the operations from the Parquet files that were used to write to the Delta table.</li> </ol> <h2>To avoid</h2> <p>Using the&nbsp;<code>read_parquet</code>&nbsp;method in pandas. This was the most common solution I used to find while searching the internet.</p>
Tags: Databricks