<h1>Databricks Autoloader Cookbook — Part 1</h1>
<p>In this article, we are going to discuss the following topics:</p>
<ol>
<li>How Autoloader handles empty files and file names starting with an underscore</li>
<li>When to use a compression codec in Autoloader, and best practices for compressed files and various file formats</li>
<li>modifiedAfter and modifiedBefore in Autoloader</li>
<li>partitionColumns</li>
<li>allowOverwrites</li>
<li>ignoreMissingFiles</li>
<li>pathGlobFilter</li>
<li>Moving an Autoloader job from one workspace to another</li>
</ol>
<p><strong>1. How Autoloader handles empty files and file names starting with an underscore</strong></p>
<p>In Databricks, when data is streamed using Autoloader, make sure that file names do not begin with an underscore ('_'); otherwise, those files will be ignored by Autoloader.</p>
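<p>This naming rule can be illustrated with a minimal sketch in plain Python. The function below is only an illustration of the filter applied to file names (it is not Autoloader's actual implementation); it mimics the convention that files whose names start with an underscore are treated as hidden and skipped:</p>

```python
# Illustration only: mimics the hidden-file naming convention, under which
# files whose names begin with '_' (or '.') are ignored by the reader.
def visible_files(file_names):
    return [name for name in file_names if not name.startswith(("_", "."))]

files = ["sample1.csv", "empty_file.csv", "_sample2.csv"]
print(visible_files(files))  # ['sample1.csv', 'empty_file.csv']
```

<p>Note that _sample2.csv is dropped purely because of its name; the file's contents are never inspected.</p>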
<p>This can be demonstrated with an example. Initially, three CSV files are placed in a directory in Azure Data Lake Storage: a non-empty CSV file (sample1.csv), an empty CSV file (empty_file.csv), and a non-empty CSV file whose name starts with an underscore (_sample2.csv). The directory contents can be verified using dbutils.fs.ls.</p>
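<p>In a Databricks notebook, the setup and the stream could be sketched as follows. This is a configuration sketch, not a definitive implementation: the storage path and schema location are placeholders, and spark and dbutils are assumed to be provided by the Databricks runtime.</p>

```python
# Hypothetical input path and schema location; replace with your own.
input_path = "abfss://container@storageaccount.dfs.core.windows.net/input/"

# List the directory to confirm which files are present
# (all three files, including _sample2.csv, appear in the listing).
display(dbutils.fs.ls(input_path))

# A minimal Auto Loader stream over the same directory. Files whose
# names begin with an underscore will not be picked up by the stream.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/tmp/autoloader/_schema")
    .option("header", "true")
    .load(input_path)
)
```

<p>Comparing the dbutils.fs.ls listing with the rows the stream actually ingests makes the underscore behavior easy to observe.</p>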