Databricks Autoloader Cookbook - Part 1

In this article, we are going to discuss the following topics:

  1. How Autoloader handles empty files and file names starting with an underscore
  2. When to use the compression codec in Autoloader, and best practices for compressed files and various file formats
  3. modifiedAfter and modifiedBefore in Autoloader
  4. partitionColumns
  5. overWrites
  6. ignoreMissingFiles
  7. pathGlobFilter
  8. Moving an Autoloader job from one workspace to another

1. How Autoloader handles empty files and file names starting with an underscore

In Databricks, when data is streamed using Autoloader, make sure that file names do not begin with an underscore '_'; otherwise, Autoloader ignores those files.

This can be illustrated with an example. Initially, three CSV files are placed in a directory in Azure Data Lake Storage: a non-empty CSV file (sample1.csv), an empty CSV file (empty_file.csv), and a non-empty CSV file whose name starts with an underscore '_' (_sample2.csv). The directory contents can be verified using dbutils.fs.ls.
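The setup can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the function names visible_to_autoloader and list_and_stream, and the source_dir and schema_dir parameters, are hypothetical and chosen only for this example. The first helper is plain Python and models only the underscore rule described above (it says nothing about how empty files are treated). The second function is meant to be called inside a Databricks notebook, where spark and dbutils are available.

```python
from typing import Iterable, List


def visible_to_autoloader(file_names: Iterable[str]) -> List[str]:
    """Keep only the names that do not start with an underscore.

    This models the underscore rule only; how empty files are
    handled is not modelled here.
    """
    return [name for name in file_names if not name.startswith("_")]


def list_and_stream(spark, dbutils, source_dir: str, schema_dir: str):
    """Databricks-only sketch: list the directory, then open an Autoloader stream.

    source_dir and schema_dir are placeholder paths supplied by the caller.
    """
    # Verify what is physically present in the directory.
    for info in dbutils.fs.ls(source_dir):
        print(info.name, info.size)

    # Open a streaming read over the same directory with Autoloader.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation", schema_dir)
        .option("header", "true")
        .load(source_dir)
    )


# Local check of the underscore rule with the three file names from the example.
names = ["sample1.csv", "empty_file.csv", "_sample2.csv"]
print(visible_to_autoloader(names))
```

Running the local check drops _sample2.csv from the list, which is the behaviour to expect from Autoloader for that file even though dbutils.fs.ls still shows it in the directory.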
