Add Seaborn integration for enhanced statistical visualization techniques in scientific plots. Include quick start guides, common plot types, and best practices for publication-ready figures, emphasizing colorblind-safe palettes and automatic confidence intervals.

This commit is contained in:
Vinayak Agarwal
2025-10-21 00:16:38 -07:00
parent 02dae0f30c
commit 6e8d1c29b9

View File

@@ -80,6 +80,35 @@ fig, ax = plt.subplots()
# ... your plotting code ... # ... your plotting code ...
``` ```
### Quick Start with Seaborn
For statistical plots, use seaborn with publication styling:
```python
import seaborn as sns
import matplotlib.pyplot as plt
from style_presets import apply_publication_style
# Apply publication style
apply_publication_style('default')
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
sns.set_palette('colorblind')
# Create statistical comparison figure
fig, ax = plt.subplots(figsize=(3.5, 3))
sns.boxplot(data=df, x='treatment', y='response',
order=['Control', 'Low', 'High'], palette='Set2', ax=ax)
sns.stripplot(data=df, x='treatment', y='response',
order=['Control', 'Low', 'High'],
color='black', alpha=0.3, size=3, ax=ax)
ax.set_ylabel('Response (μM)')
sns.despine()
# Save figure
from figure_export import save_publication_figure
save_publication_figure(fig, 'treatment_comparison', formats=['pdf', 'png'], dpi=300)
```
## Core Principles and Best Practices ## Core Principles and Best Practices
### 1. Resolution and File Format ### 1. Resolution and File Format
@@ -204,6 +233,18 @@ See `references/matplotlib_examples.md` Example 1 for complete code.
6. Remove unnecessary spines 6. Remove unnecessary spines
7. Save in vector format 7. Save in vector format
**Using seaborn for automatic confidence intervals:**
```python
import seaborn as sns
fig, ax = plt.subplots(figsize=(5, 3))
sns.lineplot(data=timeseries, x='time', y='measurement',
hue='treatment', errorbar=('ci', 95),
markers=True, ax=ax)
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Measurement (AU)')
sns.despine()
```
### Task 2: Create a Multi-Panel Figure ### Task 2: Create a Multi-Panel Figure
See `references/matplotlib_examples.md` Example 2 for complete code. See `references/matplotlib_examples.md` Example 2 for complete code.
@@ -226,6 +267,17 @@ See `references/matplotlib_examples.md` Example 4 for complete code.
4. Set appropriate center value for diverging maps 4. Set appropriate center value for diverging maps
5. Test appearance in grayscale 5. Test appearance in grayscale
**Using seaborn for correlation matrices:**
```python
import seaborn as sns
fig, ax = plt.subplots(figsize=(5, 4))
corr = df.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
cmap='RdBu_r', center=0, square=True,
linewidths=1, cbar_kws={'shrink': 0.8}, ax=ax)
```
### Task 4: Prepare Figure for Specific Journal ### Task 4: Prepare Figure for Specific Journal
**Workflow:** **Workflow:**
@@ -306,17 +358,296 @@ ax.text(1.5, max_y * 1.1, '***', ha='center', fontsize=8)
- See `references/matplotlib_examples.md` for extensive examples - See `references/matplotlib_examples.md` for extensive examples
### Seaborn ### Seaborn
- Built on matplotlib, inherits all matplotlib customizations
- Good for statistical plots Seaborn provides a high-level, dataset-oriented interface for statistical graphics, built on matplotlib. It excels at creating publication-quality statistical visualizations with minimal code while maintaining full compatibility with matplotlib customization.
- Apply matplotlib styles first, then use seaborn
**Key advantages for scientific visualization:**
- Automatic statistical estimation and confidence intervals
- Built-in support for multi-panel figures (faceting)
- Colorblind-friendly palettes by default
- Dataset-oriented API using pandas DataFrames
- Semantic mapping of variables to visual properties
#### Quick Start with Publication Style
Always apply matplotlib publication styles first, then configure seaborn:
```python ```python
import seaborn as sns import seaborn as sns
import matplotlib.pyplot as plt
from style_presets import apply_publication_style from style_presets import apply_publication_style
# Apply publication style
apply_publication_style('default') apply_publication_style('default')
sns.set_palette(['#E69F00', '#56B4E9', '#009E73']) # Okabe-Ito colors
# Configure seaborn for publication
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
sns.set_palette('colorblind') # Use colorblind-safe palette
# Create figure
fig, ax = plt.subplots(figsize=(3.5, 2.5))
sns.scatterplot(data=df, x='time', y='response',
hue='treatment', style='condition', ax=ax)
sns.despine() # Remove top and right spines
``` ```
#### Common Plot Types for Publications
**Statistical comparisons:**
```python
# Box plot with individual points for transparency
fig, ax = plt.subplots(figsize=(3.5, 3))
sns.boxplot(data=df, x='treatment', y='response',
order=['Control', 'Low', 'High'], palette='Set2', ax=ax)
sns.stripplot(data=df, x='treatment', y='response',
order=['Control', 'Low', 'High'],
color='black', alpha=0.3, size=3, ax=ax)
ax.set_ylabel('Response (μM)')
sns.despine()
```
**Distribution analysis:**
```python
# Violin plot with split comparison
fig, ax = plt.subplots(figsize=(4, 3))
sns.violinplot(data=df, x='timepoint', y='expression',
hue='treatment', split=True, inner='quartile', ax=ax)
ax.set_ylabel('Gene Expression (AU)')
sns.despine()
```
**Correlation matrices:**
```python
# Heatmap with proper colormap and annotations
fig, ax = plt.subplots(figsize=(5, 4))
corr = df.corr()
mask = np.triu(np.ones_like(corr, dtype=bool)) # Show only lower triangle
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
cmap='RdBu_r', center=0, square=True,
linewidths=1, cbar_kws={'shrink': 0.8}, ax=ax)
plt.tight_layout()
```
**Time series with confidence bands:**
```python
# Line plot with automatic CI calculation
fig, ax = plt.subplots(figsize=(5, 3))
sns.lineplot(data=timeseries, x='time', y='measurement',
hue='treatment', style='replicate',
errorbar=('ci', 95), markers=True, dashes=False, ax=ax)
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Measurement (AU)')
sns.despine()
```
#### Multi-Panel Figures with Seaborn
**Using FacetGrid for automatic faceting:**
```python
# Create faceted plot
g = sns.relplot(data=df, x='dose', y='response',
hue='treatment', col='cell_line', row='timepoint',
kind='line', height=2.5, aspect=1.2,
errorbar=('ci', 95), markers=True)
g.set_axis_labels('Dose (μM)', 'Response (AU)')
g.set_titles('{row_name} | {col_name}')
sns.despine()
# Save with correct DPI
from figure_export import save_publication_figure
save_publication_figure(g.figure, 'figure_facets',
formats=['pdf', 'png'], dpi=300)
```
**Combining seaborn with matplotlib subplots:**
```python
# Create custom multi-panel layout
fig, axes = plt.subplots(2, 2, figsize=(7, 6))
# Panel A: Scatter with regression
sns.regplot(data=df, x='predictor', y='response', ax=axes[0, 0])
axes[0, 0].text(-0.15, 1.05, 'A', transform=axes[0, 0].transAxes,
fontsize=10, fontweight='bold')
# Panel B: Distribution comparison
sns.violinplot(data=df, x='group', y='value', ax=axes[0, 1])
axes[0, 1].text(-0.15, 1.05, 'B', transform=axes[0, 1].transAxes,
fontsize=10, fontweight='bold')
# Panel C: Heatmap
sns.heatmap(correlation_data, cmap='viridis', ax=axes[1, 0])
axes[1, 0].text(-0.15, 1.05, 'C', transform=axes[1, 0].transAxes,
fontsize=10, fontweight='bold')
# Panel D: Time series
sns.lineplot(data=timeseries, x='time', y='signal',
hue='condition', ax=axes[1, 1])
axes[1, 1].text(-0.15, 1.05, 'D', transform=axes[1, 1].transAxes,
fontsize=10, fontweight='bold')
plt.tight_layout()
sns.despine()
```
#### Color Palettes for Publications
Seaborn includes several colorblind-safe palettes:
```python
# Use built-in colorblind palette (recommended)
sns.set_palette('colorblind')
# Or specify custom colorblind-safe colors (Okabe-Ito)
okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
'#0072B2', '#D55E00', '#CC79A7', '#000000']
sns.set_palette(okabe_ito)
# For heatmaps and continuous data
sns.heatmap(data, cmap='viridis') # Perceptually uniform
sns.heatmap(corr, cmap='RdBu_r', center=0) # Diverging, centered
```
#### Choosing Between Axes-Level and Figure-Level Functions
**Axes-level functions** (e.g., `scatterplot`, `boxplot`, `heatmap`):
- Use when building custom multi-panel layouts
- Accept `ax=` parameter for precise placement
- Better integration with matplotlib subplots
- More control over figure composition
```python
fig, ax = plt.subplots(figsize=(3.5, 2.5))
sns.scatterplot(data=df, x='x', y='y', hue='group', ax=ax)
```
**Figure-level functions** (e.g., `relplot`, `catplot`, `displot`):
- Use for automatic faceting by categorical variables
- Create complete figures with consistent styling
- Great for exploratory analysis
- Use `height` and `aspect` for sizing
```python
g = sns.relplot(data=df, x='x', y='y', col='category', kind='scatter')
```
#### Statistical Rigor with Seaborn
Seaborn automatically computes and displays uncertainty:
```python
# Line plot: shows mean ± 95% CI by default
sns.lineplot(data=df, x='time', y='value', hue='treatment',
errorbar=('ci', 95)) # Can change to 'sd', 'se', etc.
# Bar plot: shows mean with bootstrapped CI
sns.barplot(data=df, x='treatment', y='response',
errorbar=('ci', 95), capsize=0.1)
# Always specify error type in figure caption:
# "Error bars represent 95% confidence intervals"
```
#### Best Practices for Publication-Ready Seaborn Figures
1. **Always set publication theme first:**
```python
sns.set_theme(style='ticks', context='paper', font_scale=1.1)
```
2. **Use colorblind-safe palettes:**
```python
sns.set_palette('colorblind')
```
3. **Remove unnecessary elements:**
```python
sns.despine() # Remove top and right spines
```
4. **Control figure size appropriately:**
```python
# Axes-level: use matplotlib figsize
fig, ax = plt.subplots(figsize=(3.5, 2.5))
# Figure-level: use height and aspect
g = sns.relplot(..., height=3, aspect=1.2)
```
5. **Show individual data points when possible:**
```python
sns.boxplot(...) # Summary statistics
sns.stripplot(..., alpha=0.3) # Individual points
```
6. **Include proper labels with units:**
```python
ax.set_xlabel('Time (hours)')
ax.set_ylabel('Expression (AU)')
```
7. **Export at correct resolution:**
```python
from figure_export import save_publication_figure
save_publication_figure(fig, 'figure_name',
formats=['pdf', 'png'], dpi=300)
```
#### Advanced Seaborn Techniques
**Pairwise relationships for exploratory analysis:**
```python
# Quick overview of all relationships
g = sns.pairplot(data=df, hue='condition',
vars=['gene1', 'gene2', 'gene3'],
corner=True, diag_kind='kde', height=2)
```
**Hierarchical clustering heatmap:**
```python
# Cluster samples and features
g = sns.clustermap(expression_data, method='ward',
metric='euclidean', z_score=0,
cmap='RdBu_r', center=0,
figsize=(10, 8),
row_colors=condition_colors,
cbar_kws={'label': 'Z-score'})
```
**Joint distributions with marginals:**
```python
# Bivariate distribution with context
g = sns.jointplot(data=df, x='gene1', y='gene2',
hue='treatment', kind='scatter',
height=6, ratio=4, marginal_kws={'kde': True})
```
#### Common Seaborn Issues and Solutions
**Issue: Legend outside plot area**
```python
g = sns.relplot(...)
g._legend.set_bbox_to_anchor((0.9, 0.5))
```
**Issue: Overlapping labels**
```python
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
```
**Issue: Text too small at final size**
```python
sns.set_context('paper', font_scale=1.2) # Increase if needed
```
#### Additional Resources
For more detailed seaborn information, see:
- `scientific-packages/seaborn/SKILL.md` - Comprehensive seaborn documentation
- `scientific-packages/seaborn/references/examples.md` - Practical use cases
- `scientific-packages/seaborn/references/function_reference.md` - Complete API reference
- `scientific-packages/seaborn/references/objects_interface.md` - Modern declarative API
### Plotly ### Plotly
- Interactive figures for exploration - Interactive figures for exploration
- Export static images for publication - Export static images for publication