JSON in Databricks and PySpark
<p>In the simple case, JSON is easy to handle within Databricks. You can read a file of JSON objects directly into a DataFrame or table, and Databricks knows how to parse the JSON into individual fields. But, as with most things software-related, there are wrinkles and variations. This article shows how to handle the most common situations and includes detailed coding examples.</p>
<p>My use case was HL7 healthcare data that had been translated to JSON, but the methods here apply to any JSON data. The three formats considered are:</p>
<ul>
<li>A text file containing complete JSON objects, one per line. This is typical when you are loading JSON files into Databricks tables.</li>
<li>A text file containing various fields (columns) of data, one of which is a JSON object. This is often seen in computer logs, where some plain-text metadata is followed by more detail in a JSON string.</li>
<li>A variation of the above where the JSON field is an array of objects.</li>
</ul>
<p>Getting each of these input formats into Databricks requires a different technique.</p>
<p><a href="https://towardsdatascience.com/json-in-databricks-and-pyspark-26437352f0e9"><strong>Read More</strong></a></p>