Python’s Pandas library is a fundamental tool for data scientists, offering powerful data manipulation and analysis capabilities. In this article, we’ll explore 15 advanced Pandas code snippets that every data scientist should have in their toolkit. These snippets will help you streamline your data analysis tasks and extract valuable insights from your datasets.
1. Filtering Data
import pandas as pd
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
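Real filters often combine several conditions. The sketch below shows three common variants on the same data; the `people` name and the specific thresholds are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same data as the article's example, under a separate (illustrative) name
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Age': [25, 30, 35, 40]})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
between = people[(people['Age'] >= 30) & (people['Age'] < 40)]

# Keep rows whose value appears in a list
selected = people[people['Name'].isin(['Alice', 'David'])]

# The same kind of filter written as a query string
older = people.query('Age > 30')

print(between)
print(selected)
print(older)
```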
2. Grouping and Aggregating Data
# Grouping by a column and calculating the mean of a numeric column
grouped = df.groupby('Name')['Age'].mean()
print(grouped)
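Grouping is easier to follow when the grouping column has repeated values. Here is a minimal sketch, assuming a hypothetical `scores` DataFrame with `Team` and `Score` columns that do not appear in the article's data.

```python
import pandas as pd

# Hypothetical data: 'Team' and 'Score' are illustrative columns
scores = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y'],
    'Score': [10, 20, 30, 50],
})

# Select the numeric column before aggregating so text columns are not averaged
team_mean = scores.groupby('Team')['Score'].mean()

# Named aggregation computes several statistics in one pass
team_stats = scores.groupby('Team').agg(
    mean_score=('Score', 'mean'),
    max_score=('Score', 'max'),
)

print(team_mean)
print(team_stats)
```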
3. Handling Missing Data
# Check for missing values
missing_values = df.isnull().sum()
# Fill missing values with a specific value
df['Age'] = df['Age'].fillna(0)
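The article's DataFrame has no missing values, so here is a small self-contained sketch with one gap. The `people` data is hypothetical, and filling with the mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, np.nan, 35.0],
})

# Count missing values per column
missing_counts = people.isnull().sum()

# Fill by assigning the result back, rather than calling
# fillna(inplace=True) on a selected column
filled = people.copy()
filled['Age'] = filled['Age'].fillna(filled['Age'].mean())

# Alternatively, drop the rows where 'Age' is missing
dropped = people.dropna(subset=['Age'])

print(missing_counts)
print(filled)
print(dropped)
```

Assigning the result back avoids the chained-assignment warnings that recent pandas versions emit for `df['Age'].fillna(..., inplace=True)`.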
4. Applying Functions to Columns
# Applying a custom function to a column
df['Age'] = df['Age'].apply(lambda x: x * 2)
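For simple arithmetic, a vectorized expression gives the same result as `apply` and is usually faster. The sketch below compares the two and shows a case where `apply` is a more natural fit; the `ages` data and the `age_group` helper are hypothetical.

```python
import pandas as pd

ages = pd.DataFrame({'Age': [25, 30, 35, 40]})

# Element-wise function call
doubled_apply = ages['Age'].apply(lambda x: x * 2)

# Equivalent vectorized arithmetic on the whole column
doubled_vectorized = ages['Age'] * 2

# apply is convenient for branching logic; age_group is a made-up helper
def age_group(age):
    return 'under 35' if age < 35 else '35 and over'

ages['Group'] = ages['Age'].apply(age_group)
print(ages)
```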
5. Concatenating DataFrames
# Concatenate two DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result = pd.concat([df1, df2], ignore_index=True)
print(result)
6. Merging DataFrames
# Merge two DataFrames
left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
merged = pd.merge(left, right, on='key', how='inner')
print(merged)
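Both inputs above have a `value` column, so pandas renames them `value_x` and `value_y` in the result. As a sketch of how to control that, the example below sets explicit suffixes and also shows an outer join with `indicator=True`, which records where each row came from. The suffix names are arbitrary choices.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# Inner join: only keys present in both frames, with readable column suffixes
inner = pd.merge(left, right, on='key', how='inner',
                 suffixes=('_left', '_right'))

# Outer join: all keys; the '_merge' column marks left_only, right_only, or both
outer = pd.merge(left, right, on='key', how='outer',
                 suffixes=('_left', '_right'), indicator=True)

print(inner)
print(outer)
```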
7. Pivot Tables
# Creating a pivot table (assumes df also has a numeric 'Value' column)
pivot_table = df.pivot_table(index='Name', columns='Age', values='Value')
print(pivot_table)
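The pivot table snippet relies on a `Value` column that the earlier DataFrame does not contain. A self-contained sketch, assuming hypothetical `Region`, `Quarter`, and `Value` columns, could look like this:

```python
import pandas as pd

# Hypothetical long-format data
records = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Value': [100, 150, 200, 250],
})

# One row per region, one column per quarter.
# aggfunc defaults to the mean; 'sum' is chosen here for illustration.
pivot = records.pivot_table(index='Region', columns='Quarter',
                            values='Value', aggfunc='sum')
print(pivot)
```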
8. Handling DateTime Data
# Converting a column to DateTime (assumes df has a 'Date' column of date strings)
df['Date'] = pd.to_datetime(df['Date'])
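Once a column holds datetimes, the `.dt` accessor exposes its components. The sketch below uses a hypothetical `events` DataFrame and assumes the dates are ISO-formatted strings; `errors='coerce'` is one way to handle unparseable entries.

```python
import pandas as pd

# Hypothetical data with one entry that is not a valid date
events = pd.DataFrame({'Date': ['2023-01-15', '2023-02-20', 'not a date']})

# errors='coerce' turns values that do not match the format into NaT
events['Date'] = pd.to_datetime(events['Date'], format='%Y-%m-%d',
                                errors='coerce')

# Extract components with the .dt accessor
events['Year'] = events['Date'].dt.year
events['Month'] = events['Date'].dt.month
print(events)
```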
9. Reshaping Data
# Melting a DataFrame (assumes df has columns 'A' and 'B' alongside 'Name')
melted_df = pd.melt(df, id_vars=['Name'], value_vars=['A', 'B'])
print(melted_df)
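The melt call refers to columns `A` and `B`, which the earlier DataFrame does not have. Here is a runnable sketch with a hypothetical wide table; the `Metric` and `Score` output names are illustrative.

```python
import pandas as pd

# Hypothetical wide-format data: one column per measurement
wide = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'A': [1, 2],
    'B': [3, 4],
})

# Turn columns 'A' and 'B' into rows; var_name and value_name label the output
long_df = pd.melt(wide, id_vars=['Name'], value_vars=['A', 'B'],
                  var_name='Metric', value_name='Score')
print(long_df)
```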
10. Working with Categorical Data
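# Encoding categorical variables (assumes df has a 'Category' column)
df['Category'] = df['Category'].astype('category')
# Replace each label with its integer code; this overwrites the original labels
df['Category'] = df['Category'].cat.codes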
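Overwriting the column with its codes discards the original labels. A minimal sketch of one alternative, assuming a hypothetical `items` DataFrame, stores the codes in a separate column and keeps a mapping back to the labels:

```python
import pandas as pd

# Hypothetical data
items = pd.DataFrame({'Category': ['red', 'green', 'red', 'blue']})

# Categories are ordered alphabetically by default: blue=0, green=1, red=2
as_category = items['Category'].astype('category')
items['CategoryCode'] = as_category.cat.codes

# Keep a lookup so the codes can be decoded later
code_to_label = dict(enumerate(as_category.cat.categories))

print(items)
print(code_to_label)
```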