How to Read and Write Streaming Data Using PySpark

Spark is now integrated into most modern cloud data platforms, and manipulating data with Spark has become a crucial skill for every data persona: data engineers, data scientists, and data analysts alike.

Last time, we covered an introductory big-data exercise on reading and writing **static** data with Spark; that post can be found [here](https://medium.com/@yoloshe302/pyspark-tutorial-read-and-write-data-with-pyspark-7826b95f29f9). In this article, we turn to a similar topic: using PySpark to read and write **streaming** data with [Spark Structured Streaming](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html) through `readStream` and `writeStream`.

We will learn:

- how to read streaming data using PySpark
- how to sink streaming data using PySpark
- examples of reading and writing streaming data using PySpark on Databricks

Basic Concepts of Streaming Data

**Streaming data** is data that is generated **continuously** by many different sources. Such data should be processed incrementally, using [stream processing](https://en.wikipedia.org/wiki/Stream_processing) techniques, without requiring access to the complete dataset.
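To make the `readStream`/`writeStream` pattern concrete before diving into the Databricks examples, here is a minimal sketch. It reads from Spark's built-in `rate` test source (which generates `timestamp`/`value` rows at a fixed rate, so no external system is needed) and writes each micro-batch to the console. The source, sink, rate, and trigger interval are all illustrative choices standing in for real sources and sinks such as Kafka or cloud files; on Databricks, a `SparkSession` named `spark` is already provided, so the session-creation lines are only needed when running elsewhere.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Not needed on Databricks, where `spark` is predefined.
spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Read: every streaming source is opened with spark.readStream.
events = (
    spark.readStream
    .format("rate")              # built-in test source
    .option("rowsPerSecond", 5)  # emit 5 rows per second
    .load()
)

# Transform: a streaming DataFrame supports (most of) the usual DataFrame API.
evens = events.filter(F.col("value") % 2 == 0)

# Write: every sink is opened with writeStream; nothing runs until start().
query = (
    evens.writeStream
    .format("console")                     # print each micro-batch to stdout
    .outputMode("append")                  # emit only newly arrived rows
    .trigger(processingTime="10 seconds")  # micro-batch every 10 seconds
    .start()
)

query.awaitTermination(30)  # for this demo, run for ~30 seconds
query.stop()
```

The key pattern is symmetric: `readStream` returns an unbounded DataFrame, and the query only begins processing data when `writeStream ... .start()` launches it.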
Tags: Data, PySpark