How to read a Delta table’s .snappy.parquet file in Databricks
<p>Learn how to read the .snappy.parquet files that back your Delta tables in Databricks.</p>
<h1>TLDR</h1>
<ol>
<li>Copy the .snappy.parquet file you want to read from the table’s location to a different directory in your storage.</li>
<li>Verify that no “_delta_log” folder exists in the directory you copied the Parquet file to.</li>
<li>Read the copied file with the <code>spark.read.parquet()</code> command.</li>
</ol>
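<p>The steps above can be sketched as follows. This is a minimal local sketch using the Python standard library; the helper name and paths are hypothetical, and on DBFS you would typically copy with <code>dbutils.fs.cp</code> instead of <code>shutil</code>.</p>

```python
import os
import shutil

def stage_parquet_for_direct_read(src_file: str, staging_dir: str) -> str:
    """Copy one .snappy.parquet file out of the Delta table's directory
    into a staging directory so Spark sees it as plain Parquet.
    (Hypothetical helper; on DBFS, use dbutils.fs.cp for the copy.)"""
    # Step 1: copy the file to a directory outside the table's location.
    os.makedirs(staging_dir, exist_ok=True)
    dest = os.path.join(staging_dir, os.path.basename(src_file))
    shutil.copy(src_file, dest)
    # Step 2: the staging directory must NOT contain a _delta_log folder,
    # otherwise Spark treats it as a Delta table rather than plain Parquet.
    if os.path.exists(os.path.join(staging_dir, "_delta_log")):
        raise RuntimeError("staging_dir must not contain a _delta_log folder")
    return dest

# Step 3 (in a Databricks notebook, where `spark` is predefined):
# df = spark.read.parquet("/path/to/staging_dir")
```

<p>Keeping the staged copy in its own directory is the key point: Spark decides whether a path is a Delta table by the presence of the <code>_delta_log</code> folder, so reading the original table path as Parquet would not give you the single file you want.</p>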
<h1>Detail</h1>
<p>Here is a concise overview of why you might read a Delta table’s Snappy-compressed Parquet file directly, how to do so, and what to avoid along the way.</p>
<h2>Common reasons to directly read a .snappy.parquet file</h2>
<ol>
<li>To forcibly access a prior version of the Delta table that is no longer accessible via the <code>SELECT * FROM table VERSION AS OF X</code> command.</li>
<li>To restore data from older Parquet files when there is an otherwise irrecoverable data issue.</li>
<li>To reverse engineer and recreate the source data by analyzing the operations recorded in the Parquet files that were written to the Delta table.</li>
</ol>
<h2>To avoid</h2>
<p>Avoid pandas’ <code>read_parquet</code> method. It was the most common suggestion I found while searching the internet for a solution.</p>