Writing PySpark logs in Apache Spark and Databricks
<p>The closer your data product gets to production, the more important it becomes to properly collect and analyse logs. Logs help both when debugging in-depth issues and when analysing the behaviour of your application.</p>
<p>For general Python applications the classical choice is the built-in <a href="https://docs.python.org/3/library/logging.html" rel="noopener ugc nofollow" target="_blank">logging</a> library, which has all the necessary components and provides very convenient interfaces for both configuring and working with logs.</p>
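<p>For reference, a minimal setup with the standard library might look like this (the logger name <code>my_app</code> is just an example):</p>
<pre><code>
import logging

# Configure the root logger once, at application startup.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)

# Fetch a named logger anywhere in the application and use it.
logger = logging.getLogger("my_app")
logger.info("Application started")
</code></pre>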
<p>For PySpark applications, the logging configuration is a little more intricate, but still very controllable; it is simply done in a slightly different way than in classical Python logging.</p>
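<p>As a rough sketch of what "slightly different" means: one common pattern is to reach through the JVM gateway and obtain a Log4j logger, so that messages from PySpark code land in the same place as Spark&#8217;s own driver logs. The snippet below assumes an existing <code>SparkSession</code> and an arbitrary logger name; the exact Log4j setup depends on your Spark version and configuration.</p>
<pre><code>
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("logging-demo").getOrCreate()

# Access the JVM-side Log4j API through the Py4J gateway and
# create a named logger for this application.
log4j = spark.sparkContext._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("my_pyspark_app")

logger.info("Spark session created")
</code></pre>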
<p>In this blogpost I would like to describe an approach to effectively creating and managing the log setup in PySpark applications, both in a local environment and on Databricks clusters.</p>