Tag: DataFrame

It’s Time to Say GoodBye to pd.read_csv() and pd.to_csv()

CSV input-output operations in Pandas run serially, which makes them inefficient and time-consuming. It's frustrating, because I see ample scope for parallelization here, but unfortunately, Pandas does not provide this functionality (yet). Although I am never in favor of creating CSVs...

3 Pandas Functions for DataFrame Merging

It's common in data work to have multiple datasets, whether from the data source or as the result of data analysis. Sometimes we want to merge two or more datasets for various reasons. For example: We want to integrate data from multiple data sources into one dataset for deeper an...
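
As a rough illustration of what merging datasets looks like in practice, here is a minimal sketch using `pd.merge` and `pd.concat`. The excerpt does not say which three functions the article covers, so the choice of these two is an assumption, and the tables and column names (`customers`, `orders`, `customer_id`) are hypothetical.

```python
import pandas as pd

# Hypothetical example data: two small tables sharing a "customer_id" key.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cho"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [20.0, 35.5, 12.0]})

# Inner join on the shared key: keeps only customers with at least one order.
merged = pd.merge(customers, orders, on="customer_id", how="inner")

# Stack rows from a second source that has the same columns.
more_orders = pd.DataFrame({"customer_id": [2], "amount": [8.0]})
all_orders = pd.concat([orders, more_orders], ignore_index=True)

print(merged)
print(all_orders)
```

Changing `how` to `"left"` would keep every customer and fill the missing order amounts with `NaN`.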

Creating a PySpark DataFrame with Timestamp Column for a Given Range of Dates: Two Methods

This article explains two ways to create a PySpark DataFrame with a timestamp column for a given range of dates. A) Plain way Here are the steps to create a PySpark DataFrame with a timestamp column using a range of dates: Import libraries: from pyspark.sql import SparkSession ...