Writing PySpark logs in Apache Spark and Databricks

The closer your data product gets to production, the more important it becomes to properly collect and analyse logs. Logs help both when debugging in-depth issues and when analysing the behaviour of your application.

For general Python applications, the classical choice is the built-in logging library (https://docs.python.org/3/library/logging.html), which has all the necessary components and provides very convenient interfaces for both configuring and working with logs.

For PySpark applications, the logging configuration is a little more intricate, but still very controllable; it is just done in a slightly different way than classical Python logging.

In this blog post I would like to describe an approach to effectively creating and managing a logging setup in PySpark applications, both in a local environment and on Databricks clusters.
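To set the baseline, here is a minimal sketch of what a typical setup with the built-in logging library looks like; the format string and the logger name are just illustrative choices:

```python
import logging

# Configure the root logger once, at application start-up.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s - %(message)s",
)

# Module-level logger, following the usual convention.
logger = logging.getLogger(__name__)
logger.info("Application started")
```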
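For contrast, a commonly used pattern in PySpark is to write through Spark's own log4j logger on the JVM side, so that your messages end up in the same driver and executor logs as Spark's. The sketch below is one way to do it, not the only one: it goes through the internal `_jvm` attribute of the SparkContext, the application name `logging-demo` and the logger name `my_pyspark_app` are made up for illustration, and the availability of the `org.apache.log4j` API depends on the Spark version (newer versions expose it through a compatibility layer):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("logging-demo").getOrCreate()

# Optionally reduce the verbosity of Spark's own log output.
spark.sparkContext.setLogLevel("WARN")

# Reach the JVM-side log4j LogManager through the py4j gateway.
# Note: `_jvm` is an internal attribute, but this is a widely used pattern.
log4j = spark.sparkContext._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("my_pyspark_app")

logger.info("This message goes to the driver's log4j output")
```

On Databricks, messages written this way from the driver typically show up in the cluster's driver logs alongside Spark's own output.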