Databricks Autoloader Cookbook — Part 1

<h1>In this article, we are going to discuss the following topics:</h1>
<ol>
<li>How Autoloader handles empty files and file names starting with an underscore</li>
<li>When to use a compression codec in Autoloader, and best practices for compressed files and various file formats</li>
<li>modifiedAfter and modifiedBefore in Autoloader</li>
<li>partitionColumns</li>
<li>overWrites</li>
<li>ignoreMissingFiles</li>
<li>pathGlobFilter</li>
<li>Moving an Autoloader job from one workspace to another</li>
</ol>
<p><strong>1. How Autoloader handles empty files and file names starting with an underscore</strong></p>
<p>In Databricks, when data is streamed with Autoloader, make sure that file names do not begin with an underscore (&lsquo;_&rsquo;); otherwise, those files will be ignored by Autoloader.</p>
<p>This can be illustrated with an example. Initially, three CSV files are placed in a directory in Azure Data Lake Storage: a non-empty CSV (sample1.csv), an empty CSV (empty_file.csv), and a non-empty CSV whose name starts with an underscore (_sample2.csv). The directory listing can be verified using dbutils.fs.ls.</p>
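<p>Autoloader inherits Spark's source-file convention of treating files whose names begin with an underscore (or a dot) as hidden. The naming rule can be sketched with a small, pure-Python filter using the example file names above &mdash; this is an illustration of the convention only, not Autoloader's actual implementation:</p>

```python
# Sketch of the hidden-file naming convention that Autoloader follows:
# files whose names start with "_" or "." are treated as hidden and skipped.
# Illustration only -- not Autoloader's actual implementation.

def is_visible_to_autoloader(file_name: str) -> bool:
    """Return True if a file with this name would be picked up by the stream."""
    return not file_name.startswith(("_", "."))

# The three files from the example above.
listing = ["sample1.csv", "empty_file.csv", "_sample2.csv"]

visible_files = [f for f in listing if is_visible_to_autoloader(f)]
print(visible_files)  # _sample2.csv is skipped; empty_file.csv still appears
```

<p>Note that empty_file.csv passes the name filter: an empty file is discovered like any other, it simply contributes no rows to the stream.</p>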