Hey folks!
It’s been a while since my last post here — but I’m back!
Today I’m sharing a one-stop Pandas cheat sheet + explanations that I’ve been using in my own projects. Whether you’re importing, cleaning, analyzing, or exporting data, this guide will save you time and clicks.
1. Data Import — Get data into Pandas
pd.read_csv('file.csv') # Load CSV
pd.read_excel('file.xlsx', sheet_name='Sheet1') # Load Excel
pd.read_sql(query, connection) # Run SQL query
pd.read_json('file.json') # Load JSON
pd.read_parquet('file.parquet') # Load Parquet
Why: Bring data from almost anywhere into your DataFrame.
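For example, here's a minimal, runnable sketch where an in-memory buffer stands in for a real file (the CSV text is made up for illustration):

import io
import pandas as pd

csv_text = "name,score\nAda,91\nGrace,88\n"  # stand-in for file.csv
df = pd.read_csv(io.StringIO(csv_text))      # any path or file-like object works
print(df.head())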
2. Data Selection — Access what you need
df['column'] # Single column
df.loc['row', 'col'] # By label
df.iloc[0:5, 0:2] # By position
df.query('col > 5') # SQL-like filter
df[df['col'].isin(['A','B'])] # Multiple matches
Why: Quickly filter and grab specific rows or columns.
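To see these side by side, here's a quick sketch on a toy DataFrame (column names and labels are made up):

import pandas as pd

df = pd.DataFrame({'col': [3, 7, 9], 'grade': ['A', 'B', 'C']},
                  index=['r1', 'r2', 'r3'])

print(df.loc['r1', 'col'])               # single value, by labels
print(df.iloc[0:2, 0:1])                 # first two rows, first column, by position
print(df.query('col > 5'))               # SQL-like filter
print(df[df['grade'].isin(['A', 'B'])])  # rows matching either label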
3. Data Manipulation — Shape your data
df.groupby('col').agg({'col2': ['mean','sum']}) # Group stats
df.merge(df2, on='key', how='left') # Join data
df.pivot_table(values='val', index='idx') # Pivot table
df.sort_values(['col1','col2'], ascending=[True, False]) # Sort
df.melt(id_vars=['id'], value_vars=['A','B']) # Unpivot
df.apply(lambda x: x**2) # Apply func
Why: Transform your dataset to match your analysis goals.
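Here's how a couple of these combine in practice, using hypothetical sales data:

import pandas as pd

sales = pd.DataFrame({'region': ['N', 'N', 'S'], 'amount': [10, 20, 5]})
labels = pd.DataFrame({'region': ['N', 'S'], 'name': ['North', 'South']})

stats = sales.groupby('region').agg({'amount': ['mean', 'sum']})  # per-region stats
joined = sales.merge(labels, on='region', how='left')             # attach region names
print(joined)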
4. Statistics — Quick insights
df.describe() # Summary stats
df['col'].agg(['mean','median','std']) # Key metrics
df['col'].value_counts(normalize=True) # Value %s
df.corr(method='pearson', numeric_only=True) # Correlation (numeric cols only)
df.cov(numeric_only=True) # Covariance
df.quantile([0.25, 0.5, 0.75]) # Quartiles
Why: Get instant understanding of your data distribution.
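A tiny sketch with made-up numbers to show what comes back:

import pandas as pd

df = pd.DataFrame({'col': [1, 2, 2, 4, 10]})

print(df.describe())                           # count, mean, std, quartiles in one table
print(df['col'].value_counts(normalize=True))  # share of each distinct value
print(df.quantile([0.25, 0.5, 0.75]))          # quartiles only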
5. Data Cleaning — Make it usable
df.dropna(subset=['col'], how='any') # Remove NaNs
df.ffill() # Fill forward (replaces the deprecated fillna(method='ffill'))
df.drop_duplicates(subset=['col']) # Deduplicate
df['col'].replace({'old': 'new'}) # Replace values
df['col'].astype('category') # Convert type
df.interpolate(method='linear') # Fill by trend
Why: Reliable data means reliable analysis.
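For instance, here's a sketch with gaps planted on purpose:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1.0, np.nan, np.nan, 4.0]})

filled = df.ffill()                        # carry the last known value forward
trended = df.interpolate(method='linear')  # estimate gaps from neighbors: 2.0, 3.0
print(trended)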
6. Time Series — Work with dates
df.resample('ME').mean() # Monthly avg ('M' before pandas 2.2)
df.rolling(window=7).mean() # Rolling avg
df.shift(periods=1) # Shift data
pd.date_range('2024', periods=12, freq='ME') # Month-end dates ('M' before pandas 2.2)
df.asfreq('D', method='ffill') # Daily freq
df['date1'].dt.strftime('%Y-%m-%d') # Format date
Why: Time-based data often needs resampling or shifting.
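A self-contained sketch on a made-up daily series:

import pandas as pd

idx = pd.date_range('2024-01-01', periods=60, freq='D')
ts = pd.DataFrame({'val': range(60)}, index=idx)

monthly = ts.resample('ME').mean()     # monthly averages ('M' on older pandas)
rolling = ts.rolling(window=7).mean()  # 7-day rolling average
lagged = ts.shift(periods=1)           # yesterday's value on today's row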
7. String Operations — Text wrangling
df['col'].str.contains('pattern') # Match
df['col'].str.extract(r'(\d+)') # Extract nums (raw string avoids escape warnings)
df['col'].str.split('_', n=1).str[0] # Split text
df['col'].str.lower() # Lowercase
df['col'].str.strip() # Trim spaces
df['col'].str.replace(r'\s+', ' ', regex=True) # Normalize spaces (regex=True is required here)
Why: Clean messy text fields directly in Pandas.
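Here's how those chain together on some messy, made-up IDs:

import pandas as pd

s = pd.Series(['  ORDER_123  ', 'order_456'])

ids = (s.str.strip()             # drop stray whitespace
        .str.lower()             # normalize case
        .str.extract(r'(\d+)'))  # pull out the numeric part
print(ids)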
8. Advanced Features — Power moves
df.pipe(func) # Chain funcs
pd.eval('df1 + df2') # Fast eval
df.memory_usage(deep=True) # Memory check
df.select_dtypes(include=['number']) # Filter types
df.nlargest(5, 'col') # Top values
df.explode('col') # Expand lists
Why: Write cleaner, faster, and more efficient code.
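A quick sketch with toy data to show a few of these in action:

import pandas as pd

df = pd.DataFrame({'tags': [['a', 'b'], ['c']], 'score': [5, 9]})

print(df.explode('tags'))          # one row per list element
print(df.nlargest(1, 'score'))     # top row by score
print(df.memory_usage(deep=True))  # true per-column memory footprint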
9. Data Export — Save your work
df.to_csv('output.csv', index=False) # CSV
df.to_excel('output.xlsx', sheet_name='Sheet1') # Excel
df.to_parquet('output.parquet') # Parquet
df.to_json('output.json', orient='records') # JSON
Why: Share results in the right format instantly.
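As with import, any path or file-like object works. Here's a sketch writing CSV to an in-memory buffer instead of disk (note that to_parquet needs pyarrow or fastparquet installed):

import io
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

buf = io.StringIO()          # stands in for a real file path
df.to_csv(buf, index=False)
print(buf.getvalue())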
Pro Tips
Use .copy() to avoid accidental changes.
Chain methods for cleaner code.
Use dtype='category' to save memory.
Avoid inplace=True; reassign instead (sketch below).
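To make the .copy() and inplace tips concrete, a minimal sketch:

import pandas as pd

df = pd.DataFrame({'col': [1.0, None, 3.0]})

subset = df[df['col'].notna()].copy()  # explicit copy: edits to subset won't touch df
df = df.fillna(0)                      # reassign instead of inplace=True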
Question for you: Which Pandas trick here do you use the most? Or do you have a favorite that’s not listed? Let’s share and make this an even better cheat sheet.