Pandas is a Python library for data manipulation and analysis. Its core objects, the DataFrame and the Series, provide methods for cleaning, transforming, and summarizing data, along with basic plotting built on Matplotlib.
import pandas as pd
# Creating a sample DataFrame
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
# Selecting a single column
df['A']
# Selecting multiple columns
df[['A', 'B']]
# Selecting rows by integer position
df.iloc[0] # First row
df.iloc[0:2] # First two rows (end position is excluded)
# Selecting rows by label
df.loc[0] # Row with index label 0
df.loc[0:2] # Rows labeled 0 through 2 (end label is included)
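With the default index, labels and positions coincide, so the difference between `iloc` and `loc` is easy to miss. The sketch below uses a hypothetical frame named `scores` with a non-default integer index (both the name and the values are assumptions for illustration) to show that position slices exclude the end while label slices include it.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
scores = pd.DataFrame({'A': [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# Position-based slice: positions 0 and 1, end position excluded
by_position = scores.iloc[0:2]

# Label-based slice: labels 10, 20 and 30, end label included
by_label = scores.loc[10:30]

print(list(by_position.index))
print(list(by_label.index))
```

Here `scores.loc[0]` would raise a `KeyError`, because no row carries the label 0.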
# Filtering rows based on a condition
df[df['A'] > 2]
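# Checking for missing values (returns a boolean DataFrame of the same shape)
df.isnull()
# Dropping rows with missing values (returns a new DataFrame; df is unchanged)
df.dropna()
# Filling missing values (returns a new DataFrame; df is unchanged)
df.fillna(value=0)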
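The sample DataFrame contains no missing values, so the three calls above leave it looking the same. As a minimal sketch, the hypothetical frame `gaps` below (its name and contents are assumptions for illustration) contains `NaN` entries so that the effect of each call is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
gaps = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

# Count missing values per column by summing the boolean mask
missing_per_column = gaps.isnull().sum()

# Keep only rows that have no missing value in any column
complete_rows = gaps.dropna()

# Replace every missing value with 0; gaps itself is left unchanged
filled = gaps.fillna(value=0)

print(missing_per_column)
print(complete_rows)
print(filled)
```

Both `dropna` and `fillna` return new objects, so the result has to be assigned to a name if it should be kept.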
# Renaming columns (returns a new DataFrame; df keeps its original column names)
df.rename(columns={'A': 'Alpha', 'B': 'Beta'})
# Adding a new column
df['D'] = df['A'] + df['B']
# Modifying an existing column
df['A'] = df['A'] * 2
# Applying a function to a column
df['A'] = df['A'].apply(lambda x: x * 2)
# Sorting by a single column
df.sort_values(by='A')
# Sorting by multiple columns
df.sort_values(by=['A', 'B'])
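# Merging DataFrames on a shared column (inner join by default)
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key') # Keeps keys 'A' and 'B'; the overlapping 'value' columns become 'value_x' and 'value_y'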
# Joining DataFrames on their index (left join by default)
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]}, index=['a', 'b'])
joined_df = df1.join(df2)
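The merge example relies on the default inner join, which silently drops keys that appear in only one frame. The sketch below reuses the same data under the hypothetical names `left` and `right`, and shows how the `how` and `suffixes` parameters of `pd.merge` change the result. The suffix strings are arbitrary choices for illustration.

```python
import pandas as pd

# Same data as the merge example, under hypothetical names
left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})

# Default inner join: only keys present in both frames survive
inner = pd.merge(left, right, on='key')

# Outer join: every key is kept, with NaN where one side has no match;
# suffixes renames the overlapping 'value' columns
outer = pd.merge(left, right, on='key', how='outer',
                 suffixes=('_left', '_right'))

print(inner)
print(outer)
```

An outer join is often a useful first check when it is unclear whether the two key columns fully overlap.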
Pandas provides a range of methods to calculate descriptive statistics:
# Calculating summary statistics
df.describe()
# Calculating individual statistics
df.mean()
df.median()
df.std()
df.var()
df.min()
df.max()
df.sum()
df.cumsum() # Cumulative sum
# Calculating pairwise correlation
df.corr()
# Calculating pairwise covariance
df.cov()
Pivot tables summarize data by grouping rows on one or more keys and aggregating the values in each group:
# Creating a pivot table
pivot_table = df.pivot_table(values='D', index='A', columns='B', aggfunc='mean')
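In the sample DataFrame every combination of `A` and `B` occurs only once, so the aggregation has nothing to average and most cells of the result are `NaN`. As a rough sketch, the hypothetical `sales` frame below (column names and values are assumptions for illustration) has repeated groups, which makes the effect of `aggfunc` easier to see.

```python
import pandas as pd

# Hypothetical data in which the ('north', 'x') group occurs twice
sales = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south'],
    'product': ['x', 'x', 'x', 'y'],
    'amount': [10, 20, 30, 40],
})

# One row per region, one column per product, mean amount in each cell
summary = sales.pivot_table(values='amount', index='region',
                            columns='product', aggfunc='mean')

print(summary)
```

The cell for region `north` and product `y` is `NaN` because no such rows exist; passing `fill_value=0` to `pivot_table` is one way to replace those gaps.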