diff --git a/scientific-thinking/exploratory-data-analysis/SKILL.md b/scientific-thinking/exploratory-data-analysis/SKILL.md
index c26ff15..950d851 100644
--- a/scientific-thinking/exploratory-data-analysis/SKILL.md
+++ b/scientific-thinking/exploratory-data-analysis/SKILL.md
@@ -1,275 +1,202 @@
 ---
 name: exploratory-data-analysis
-description: "EDA toolkit. Analyze CSV/Excel/JSON/Parquet files, statistical summaries, distributions, correlations, outliers, missing data, visualizations, markdown reports, for data profiling and insights."
+description: "Analyze datasets to discover patterns, anomalies, and relationships. Use when exploring data files, generating statistical summaries, checking data quality, or creating visualizations. Supports CSV, Excel, JSON, Parquet, and more."
 ---
 
 # Exploratory Data Analysis
 
-## Overview
+Discover patterns, anomalies, and relationships in tabular data through statistical analysis and visualization.
 
-EDA is a process for discovering patterns, anomalies, and relationships in data. Analyze CSV/Excel/JSON/Parquet files to generate statistical summaries, distributions, correlations, outliers, and visualizations. All outputs are markdown-formatted for integration into workflows.
+**Supported formats**: CSV, Excel (.xlsx, .xls), JSON, Parquet, TSV, Feather, HDF5, Pickle
 
-## When to Use This Skill
-
-This skill should be used when:
-- User provides a data file and requests analysis or exploration
-- User asks to "explore this dataset", "analyze this data", or "what's in this file?"
-- User needs statistical summaries, distributions, or correlations
-- User requests data visualizations or insights
-- User wants to understand data quality issues or patterns
-- User mentions EDA, exploratory analysis, or data profiling
-
-**Supported file formats**: CSV, Excel (.xlsx, .xls), JSON, Parquet, TSV, Feather, HDF5, Pickle
-
-## Quick Start Workflow
-
-1. **Receive data file** from user
-2. **Run comprehensive analysis** using `scripts/eda_analyzer.py`
-3. **Generate visualizations** using `scripts/visualizer.py`
-4. **Create markdown report** using insights and the `assets/report_template.md` template
-5. **Present findings** to user with key insights highlighted
-
-## Core Capabilities
-
-### 1. Comprehensive Data Analysis
-
-Execute full statistical analysis using the `eda_analyzer.py` script:
+## Standard Workflow
 
+1. Run statistical analysis:
 ```bash
-python scripts/eda_analyzer.py <data_file> -o <output_dir>
+python scripts/eda_analyzer.py <data_file> -o <output_dir>
 ```
 
-**What it provides**:
-- Auto-detection and loading of file formats
-- Basic dataset information (shape, types, memory usage)
-- Missing data analysis (patterns, percentages)
-- Summary statistics for numeric and categorical variables
-- Outlier detection using IQR and Z-score methods
-- Distribution analysis with normality tests (Shapiro-Wilk, Anderson-Darling)
-- Correlation analysis (Pearson and Spearman)
-- Data quality assessment (completeness, duplicates, issues)
-- Automated insight generation
-
-**Output**: JSON file containing all analysis results at `<output_dir>/eda_analysis.json`
-
-### 2. Comprehensive Visualizations
-
-Generate complete visualization suite using the `visualizer.py` script:
-
+2. Generate visualizations:
 ```bash
-python scripts/visualizer.py <data_file> -o <output_dir>
+python scripts/visualizer.py <data_file> -o <output_dir>
 ```
 
-**Generated visualizations**:
-- **Missing data patterns**: Heatmap and bar chart showing missing data
-- **Distribution plots**: Histograms with KDE overlays for all numeric variables
-- **Box plots with violin plots**: Outlier detection visualizations
-- **Correlation heatmap**: Both Pearson and Spearman correlation matrices
-- **Scatter matrix**: Pairwise relationships between numeric variables
-- **Categorical analysis**: Bar charts for top categories
-- **Time series plots**: Temporal trends with trend lines (if datetime columns exist)
+3. Read analysis results from `<output_dir>/eda_analysis.json`
 
-**Output**: High-quality PNG files saved to `<output_dir>/eda_visualizations/`
+4. Create report using `assets/report_template.md` structure
 
-All visualizations are production-ready with:
-- 300 DPI resolution
-- Clear titles and labels
-- Statistical annotations
-- Professional styling using seaborn
+5. Present findings with key insights and visualizations
 
-### 3. Automated Insight Generation
+## Analysis Capabilities
 
-The analyzer automatically generates actionable insights including:
+### Statistical Analysis
 
-- **Data scale insights**: Dataset size considerations for processing
-- **Missing data alerts**: Warnings when missing data exceeds thresholds
-- **Correlation discoveries**: Strong relationships identified for feature engineering
-- **Outlier warnings**: Variables with high outlier rates flagged
-- **Distribution assessments**: Skewness issues requiring transformations
-- **Duplicate alerts**: Duplicate row detection
-- **Imbalance warnings**: Categorical variable imbalance detection
+Run `scripts/eda_analyzer.py` to generate comprehensive analysis:
 
-Access insights from the analysis results JSON under the `"insights"` key.
+```bash
+python scripts/eda_analyzer.py sales_data.csv -o ./output
+```
 
-### 4. 
Statistical Interpretation +Produces `output/eda_analysis.json` containing: +- Dataset shape, types, memory usage +- Missing data patterns and percentages +- Summary statistics (numeric and categorical) +- Outlier detection (IQR and Z-score methods) +- Distribution analysis with normality tests +- Correlation matrices (Pearson and Spearman) +- Data quality metrics (completeness, duplicates) +- Automated insights -For detailed interpretation of statistical tests and measures, reference: +### Visualizations -**`references/statistical_tests_guide.md`** - Comprehensive guide covering: +Run `scripts/visualizer.py` to generate plots: + +```bash +python scripts/visualizer.py sales_data.csv -o ./output +``` + +Creates high-resolution (300 DPI) PNG files in `output/eda_visualizations/`: +- Missing data heatmaps and bar charts +- Distribution plots (histograms with KDE) +- Box plots and violin plots for outliers +- Correlation heatmaps +- Scatter matrices for numeric relationships +- Categorical bar charts +- Time series plots (if datetime columns detected) + +### Automated Insights + +Access generated insights from the `"insights"` key in the analysis JSON: +- Dataset size considerations +- Missing data warnings (when exceeding thresholds) +- Strong correlations for feature engineering +- High outlier rate flags +- Skewness requiring transformations +- Duplicate detection +- Categorical imbalance warnings + +## Reference Materials + +### Statistical Interpretation + +See `references/statistical_tests_guide.md` for detailed guidance on: - Normality tests (Shapiro-Wilk, Anderson-Darling, Kolmogorov-Smirnov) - Distribution characteristics (skewness, kurtosis) -- Correlation tests (Pearson, Spearman) -- Outlier detection methods (IQR, Z-score) -- Hypothesis testing guidelines -- Data transformation strategies +- Correlation methods (Pearson, Spearman) +- Outlier detection (IQR, Z-score) +- Hypothesis testing and data transformations -Load this reference when needing to interpret specific statistical tests or explain results to users. +Use when interpreting statistical results or explaining findings. -### 5. Best Practices Guidance +### Methodology -For methodological guidance, reference: +See `references/eda_best_practices.md` for comprehensive guidance on: +- 6-step EDA process framework +- Univariate, bivariate, multivariate analysis approaches +- Visualization and statistical analysis guidelines +- Common pitfalls and domain-specific considerations +- Communication strategies for different audiences -**`references/eda_best_practices.md`** - Detailed best practices including: -- EDA process framework (6-step methodology) -- Univariate, bivariate, and multivariate analysis approaches -- Visualization guidelines -- Statistical analysis guidelines -- Common pitfalls to avoid -- Domain-specific considerations -- Communication tips for technical and non-technical audiences +Use when planning analysis or handling specific scenarios. -Load this reference when planning analysis approach or needing guidance on specific EDA scenarios. +## Report Template -## Creating Analysis Reports - -Use the provided template to structure comprehensive EDA reports: - -**`assets/report_template.md`** - Professional report template with sections for: +Use `assets/report_template.md` to structure findings. 
Template includes: - Executive summary - Dataset overview - Data quality assessment - Univariate, bivariate, and multivariate analysis - Outlier analysis -- Key insights and findings -- Recommendations +- Key insights and recommendations - Limitations and appendices -**To use the template**: -1. Copy the template content -2. Fill in sections with analysis results from JSON output -3. Embed visualization images using markdown syntax -4. Populate insights and recommendations -5. Save as markdown for user consumption +Fill sections with analysis JSON results and embed visualizations using markdown image syntax. -## Typical Workflow Example +## Example: Complete Analysis -When user provides a data file: +User request: "Explore this sales_data.csv file" -``` -User: "Can you explore this sales_data.csv file and tell me what you find?" +```bash +# 1. Run analysis +python scripts/eda_analyzer.py sales_data.csv -o ./output -1. Run analysis: - python scripts/eda_analyzer.py sales_data.csv -o ./analysis_output - -2. Generate visualizations: - python scripts/visualizer.py sales_data.csv -o ./analysis_output - -3. Read analysis results: - Read ./analysis_output/eda_analysis.json - -4. Create markdown report using template: - - Copy assets/report_template.md structure - - Fill in sections with analysis results - - Reference visualizations from ./analysis_output/eda_visualizations/ - - Include automated insights from JSON - -5. Present to user: - - Show key insights prominently - - Highlight data quality issues - - Provide visualizations inline - - Make actionable recommendations - - Save complete report as .md file +# 2. Generate visualizations +python scripts/visualizer.py sales_data.csv -o ./output ``` -## Advanced Analysis Scenarios +```python +# 3. Read results +import json +with open('./output/eda_analysis.json') as f: + results = json.load(f) -### Large Datasets (>1M rows) -- Run analysis on sampled data first for quick exploration -- Note sample size in report -- Recommend distributed computing for full analysis +# 4. 
Build report from assets/report_template.md +# - Fill sections with results +# - Embed images: ![Missing Data](./output/eda_visualizations/missing_data.png) +# - Include insights from results['insights'] +# - Add recommendations +``` -### High-Dimensional Data (>50 columns) -- Focus on most important variables first -- Consider PCA or feature selection -- Generate correlation analysis to identify variable groups -- Reference `eda_best_practices.md` section on high-dimensional data +## Special Cases -### Time Series Data -- Ensure datetime columns are properly detected -- Time series visualizations will be automatically generated -- Consider temporal patterns, trends, and seasonality -- Reference `eda_best_practices.md` section on time series +### Dataset Size Strategy -### Imbalanced Data -- Categorical analysis will flag imbalances -- Report class distributions prominently -- Recommend stratified sampling if needed +**If < 100 rows**: Note sample size limitations, use non-parametric methods -### Small Sample Sizes (<100 rows) -- Non-parametric methods automatically used where appropriate -- Be conservative in statistical conclusions -- Note sample size limitations in report +**If 100-1M rows**: Standard workflow applies -## Output Best Practices +**If > 1M rows**: Sample first for quick exploration, note sample size in report, recommend distributed computing for full analysis -**Always output as markdown**: -- Structure findings using markdown headers, tables, and lists -- Embed visualizations using `![Description](path/to/image.png)` syntax -- Use tables for statistical summaries -- Include code blocks for any suggested transformations -- Highlight key insights with bold or bullet points +### Data Characteristics -**Ensure reports are actionable**: -- Provide clear recommendations based on findings -- Flag data quality issues that need attention -- Suggest next steps for modeling or further analysis -- Identify feature engineering opportunities +**High-dimensional (>50 columns)**: Focus on key variables first, use correlation analysis to identify groups, consider PCA or feature selection. See `references/eda_best_practices.md` for guidance. -**Make insights accessible**: -- Explain statistical concepts in plain language -- Use reference guides to provide detailed interpretations -- Include both technical details and executive summary +**Time series**: Datetime columns auto-detected, temporal visualizations generated automatically. Consider trends, seasonality, patterns. + +**Imbalanced**: Categorical analysis flags imbalances automatically. Report distributions prominently, recommend stratified sampling if needed. 
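+
+For the >1M-row case under Dataset Size Strategy above, a minimal sampling sketch (assuming pandas is installed; the file name and 100,000-row cap are illustrative, not part of the bundled scripts):
+
+```python
+import pandas as pd
+
+# Draw a reproducible sample for quick exploration of very large files.
+df = pd.read_csv("large_data.csv")  # illustrative file name
+sample = df.sample(n=100_000, random_state=42) if len(df) > 100_000 else df
+sample.to_csv("large_data_sample.csv", index=False)
+
+# Run the analyzer on the sample and note the sample size in the report:
+#   python scripts/eda_analyzer.py large_data_sample.csv -o ./output
+```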
+ +## Output Guidelines + +**Format findings as markdown**: +- Use headers, tables, and lists for structure +- Embed visualizations: `![Description](path/to/image.png)` +- Include code blocks for suggested transformations +- Highlight key insights + +**Make reports actionable**: +- Provide clear recommendations +- Flag data quality issues requiring attention +- Suggest next steps (modeling, feature engineering, further analysis) - Tailor communication to user's technical level -## Handling Edge Cases +## Error Handling -**Unsupported file formats**: -- Request user to convert to supported format -- Suggest using pandas-compatible formats +**Unsupported formats**: Request conversion to supported format (CSV, Excel, JSON, Parquet) -**Files too large to load**: -- Recommend sampling approach -- Suggest chunked processing -- Consider alternative tools for big data +**Files too large**: Recommend sampling or chunked processing -**Corrupted or malformed data**: -- Report specific errors encountered -- Suggest data cleaning steps -- Try to salvage partial analysis if possible +**Corrupted data**: Report specific errors, suggest cleaning steps, attempt partial analysis -**All missing data in columns**: -- Flag completely empty columns -- Recommend removal or investigation -- Document in data quality section +**Empty columns**: Flag in data quality section, recommend removal or investigation -## Resources Summary +## Resources -### scripts/ -- **`eda_analyzer.py`**: Main analysis engine - comprehensive statistical analysis -- **`visualizer.py`**: Visualization generator - creates all chart types +**Scripts** (handle all formats automatically): +- `scripts/eda_analyzer.py` - Statistical analysis engine +- `scripts/visualizer.py` - Visualization generator -Both scripts are fully executable and handle multiple file formats automatically. +**References** (load as needed): +- `references/statistical_tests_guide.md` - Test interpretation and methodology +- `references/eda_best_practices.md` - EDA process and best practices -### references/ -- **`statistical_tests_guide.md`**: Statistical test interpretation and methodology -- **`eda_best_practices.md`**: Comprehensive EDA methodology and best practices +**Template**: +- `assets/report_template.md` - Professional report structure -Load these references as needed to inform analysis approach and interpretation. +## Key Points -### assets/ -- **`report_template.md`**: Professional markdown report template - -Use this template structure for creating consistent, comprehensive EDA reports. - -## Key Reminders - -1. **Always generate markdown output** for textual results -2. **Run both scripts** (analyzer and visualizer) for complete analysis -3. **Use the template** to structure comprehensive reports -4. **Include visualizations** by referencing generated PNG files -5. **Provide actionable insights** - don't just present statistics -6. **Interpret findings** using reference guides -7. **Document limitations** and data quality issues -8. **Make recommendations** for next steps - -This skill transforms raw data into actionable insights through systematic exploration, advanced statistics, rich visualizations, and clear communication. 
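+
+For the "files too large" case under Error Handling, a minimal chunked-profiling sketch (assuming pandas; the path and chunk size are illustrative):
+
+```python
+import pandas as pd
+
+# Profile a file that is too large to load at once by streaming it in chunks.
+total_rows = 0
+missing = None
+for chunk in pd.read_csv("huge_data.csv", chunksize=500_000):  # illustrative path
+    total_rows += len(chunk)
+    counts = chunk.isna().sum()
+    missing = counts if missing is None else missing + counts
+
+print(f"rows: {total_rows}")
+print((missing / total_rows).sort_values(ascending=False).head(10))  # worst columns
+```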
+- Run both scripts for complete analysis +- Structure reports using the template +- Provide actionable insights, not just statistics +- Use reference guides for detailed interpretations +- Document data quality issues and limitations +- Make clear recommendations for next steps diff --git a/scientific-thinking/exploratory-data-analysis/assets/report_template.md b/scientific-thinking/exploratory-data-analysis/assets/report_template.md index 865316a..32fb0a9 100644 --- a/scientific-thinking/exploratory-data-analysis/assets/report_template.md +++ b/scientific-thinking/exploratory-data-analysis/assets/report_template.md @@ -1,14 +1,10 @@ -# Exploratory Data Analysis Report +# EDA Report: [Dataset Name] -**Dataset**: [Dataset Name] -**Analysis Date**: [Date] -**Analyst**: [Name] - ---- +**Date**: [Date] | **Analyst**: [Name] ## Executive Summary -[2-3 paragraph summary of key findings, major insights, and recommendations] +[Concise summary of key findings and recommendations] **Key Findings**: - [Finding 1] @@ -23,366 +19,197 @@ ## 1. Dataset Overview -### 1.1 Data Source -- **Source**: [Source name and location] -- **Collection Period**: [Date range] -- **Last Updated**: [Date] -- **Format**: [CSV, Excel, JSON, etc.] +**Source**: [Source name] | **Format**: [CSV/Excel/JSON/etc.] | **Period**: [Date range] -### 1.2 Data Structure -- **Observations (Rows)**: [Number] -- **Variables (Columns)**: [Number] -- **Memory Usage**: [Size in MB] +**Structure**: [Rows] observations × [Columns] variables | **Memory**: [Size] MB -### 1.3 Variable Types -- **Numeric Variables** ([Count]): [List column names] -- **Categorical Variables** ([Count]): [List column names] -- **Datetime Variables** ([Count]): [List column names] -- **Boolean Variables** ([Count]): [List column names] +**Variable Types**: +- Numeric ([Count]): [List names] +- Categorical ([Count]): [List names] +- Datetime ([Count]): [List names] +- Boolean ([Count]): [List names] --- -## 2. Data Quality Assessment +## 2. Data Quality -### 2.1 Completeness +**Completeness**: [Percentage]% | **Duplicates**: [Count] ([%]%) -**Overall Data Completeness**: [Percentage]% +**Missing Data**: +| Column | Missing % | Assessment | +|--------|-----------|------------| +| [Column 1] | [%] | [High/Medium/Low] | +| [Column 2] | [%] | [High/Medium/Low] | -**Missing Data Summary**: -| Column | Missing Count | Missing % | Assessment | -|--------|--------------|-----------|------------| -| [Column 1] | [Count] | [%] | [High/Medium/Low] | -| [Column 2] | [Count] | [%] | [High/Medium/Low] | +![Missing Data](path/to/missing_data.png) -**Missing Data Pattern**: [Description of patterns, if any] - -**Visualization**: ![Missing Data](path/to/missing_data.png) - -### 2.2 Duplicates - -- **Duplicate Rows**: [Count] ([Percentage]%) -- **Action Required**: [Yes/No - describe if needed] - -### 2.3 Data Quality Issues - -[List any identified issues] -- [ ] Issue 1: [Description] -- [ ] Issue 2: [Description] -- [ ] Issue 3: [Description] +**Quality Issues**: +- [Issue 1] +- [Issue 2] --- ## 3. 
Univariate Analysis -### 3.1 Numeric Variables +### Numeric: [Variable Name] -[For each key numeric variable:] +**Stats**: Mean: [Value] | Median: [Value] | Std: [Value] | Range: [[Min]-[Max]] -#### [Variable Name] +**Distribution**: Skewness: [Value] | Kurtosis: [Value] | Normality: [Yes/No] -**Summary Statistics**: -- **Mean**: [Value] -- **Median**: [Value] -- **Std Dev**: [Value] -- **Min**: [Value] -- **Max**: [Value] -- **Range**: [Value] -- **IQR**: [Value] +**Outliers**: IQR: [Count] ([%]%) | Z-score: [Count] ([%]%) -**Distribution Characteristics**: -- **Skewness**: [Value] - [Interpretation] -- **Kurtosis**: [Value] - [Interpretation] -- **Normality**: [Normal/Not Normal based on tests] +![Distribution](path/to/distribution.png) -**Outliers**: -- **IQR Method**: [Count] outliers ([Percentage]%) -- **Z-Score Method**: [Count] outliers ([Percentage]%) +**Insights**: [Key observations] -**Visualization**: ![Distribution of [Variable]](path/to/distribution.png) +### Categorical: [Variable Name] -**Insights**: -- [Key insight 1] -- [Key insight 2] +**Stats**: [Count] unique values | Most common: [Value] ([%]%) | Balance: [Balanced/Imbalanced] ---- - -### 3.2 Categorical Variables - -[For each key categorical variable:] - -#### [Variable Name] - -**Summary**: -- **Unique Values**: [Count] -- **Most Common**: [Value] ([Percentage]%) -- **Least Common**: [Value] ([Percentage]%) -- **Balance**: [Balanced/Imbalanced] - -**Top Categories**: -| Category | Count | Percentage | -|----------|-------|------------| +| Category | Count | % | +|----------|-------|---| | [Cat 1] | [Count] | [%] | | [Cat 2] | [Count] | [%] | -| [Cat 3] | [Count] | [%] | -**Visualization**: ![Distribution of [Variable]](path/to/categorical.png) +![Distribution](path/to/categorical.png) -**Insights**: -- [Key insight 1] -- [Key insight 2] +**Insights**: [Key observations] ---- +### Temporal: [Variable Name] -### 3.3 Temporal Variables +**Range**: [Start] to [End] ([Duration]) | **Trend**: [Increasing/Decreasing/Stable] | **Seasonality**: [Yes/No] -[If datetime columns exist:] +![Time Series](path/to/timeseries.png) -#### [Variable Name] - -**Time Range**: [Start Date] to [End Date] -**Duration**: [Time span] -**Temporal Coverage**: [Complete/Gaps identified] - -**Temporal Patterns**: -- **Trend**: [Increasing/Decreasing/Stable] -- **Seasonality**: [Yes/No - describe if present] -- **Gaps**: [List any gaps in timeline] - -**Visualization**: ![Time Series of [Variable]](path/to/timeseries.png) - -**Insights**: -- [Key insight 1] -- [Key insight 2] +**Insights**: [Key observations] --- ## 4. 
Bivariate Analysis -### 4.1 Correlation Analysis - -**Overall Correlation Structure**: -- **Strong Positive Correlations**: [Count] -- **Strong Negative Correlations**: [Count] -- **Weak/No Correlations**: [Count] - -**Correlation Matrix**: +**Correlation Summary**: [Count] strong positive | [Count] strong negative | [Count] weak/none ![Correlation Heatmap](path/to/correlation_heatmap.png) **Notable Correlations**: -| Variable 1 | Variable 2 | Pearson r | Spearman ρ | Strength | Interpretation | -|-----------|-----------|-----------|------------|----------|----------------| -| [Var 1] | [Var 2] | [Value] | [Value] | [Strong/Moderate/Weak] | [Interpretation] | -| [Var 1] | [Var 3] | [Value] | [Value] | [Strong/Moderate/Weak] | [Interpretation] | +| Var 1 | Var 2 | Pearson | Spearman | Strength | +|-------|-------|---------|----------|----------| +| [Var 1] | [Var 2] | [Value] | [Value] | [Strong/Moderate/Weak] | -**Insights**: -- [Key insight about correlations] -- [Potential multicollinearity issues] -- [Feature engineering opportunities] +**Insights**: [Multicollinearity issues, feature engineering opportunities] ---- +### Key Relationship: [Var 1] vs [Var 2] -### 4.2 Key Relationships +**Type**: [Linear/Non-linear/None] | **r**: [Value] | **p-value**: [Value] -[For important variable pairs:] +![Scatter Plot](path/to/scatter.png) -#### [Variable 1] vs [Variable 2] - -**Relationship Type**: [Linear/Non-linear/None] -**Correlation**: [Value] -**Statistical Test**: [Test name, p-value] - -**Visualization**: ![Scatter Plot](path/to/scatter.png) - -**Insights**: -- [Description of relationship] -- [Implications] +**Insights**: [Description and implications] --- ## 5. Multivariate Analysis -### 5.1 Scatter Matrix - ![Scatter Matrix](path/to/scatter_matrix.png) -**Observations**: -- [Pattern 1] -- [Pattern 2] -- [Pattern 3] +**Patterns**: [Key observations] -### 5.2 Clustering Patterns - -[If clustering analysis performed:] - -**Method**: [Method used] -**Number of Clusters**: [Count] - -**Cluster Characteristics**: -- **Cluster 1**: [Description] -- **Cluster 2**: [Description] - -**Visualization**: [Link to visualization] +**Clustering** (if performed): [Method] | [Count] clusters identified --- -## 6. Outlier Analysis +## 6. Outliers -### 6.1 Outlier Summary +**Overall Rate**: [%]% -**Overall Outlier Rate**: [Percentage]% +| Variable | Outlier % | Method | Action | +|----------|-----------|--------|--------| +| [Var 1] | [%] | [IQR/Z-score] | [Keep/Investigate/Remove] | +| [Var 2] | [%] | [IQR/Z-score] | [Keep/Investigate/Remove] | -**Variables with High Outlier Rates**: -| Variable | Outlier Count | Outlier % | Method | Action | -|----------|--------------|-----------|--------|--------| -| [Var 1] | [Count] | [%] | [IQR/Z-score] | [Keep/Investigate/Remove] | -| [Var 2] | [Count] | [%] | [IQR/Z-score] | [Keep/Investigate/Remove] | +![Box Plots](path/to/boxplots.png) -**Visualization**: ![Box Plots](path/to/boxplots.png) - -### 6.2 Outlier Investigation - -[For significant outliers:] - -#### [Variable Name] - -**Outlier Characteristics**: -- [Description of outliers] -- [Potential causes] -- [Validity assessment] - -**Recommendation**: [Keep/Remove/Transform/Investigate further] +**Investigation**: [Description of significant outliers, causes, validity] --- -## 7. Key Insights and Findings +## 7. Key Insights -### 7.1 Data Quality Insights +**Data Quality**: +- [Insight with implication] +- [Insight with implication] -1. **[Insight 1]**: [Description and implication] -2. 
**[Insight 2]**: [Description and implication] -3. **[Insight 3]**: [Description and implication] +**Statistical Patterns**: +- [Insight with implication] +- [Insight with implication] -### 7.2 Statistical Insights +**Domain/Research Insights**: +- [Insight with implication] +- [Insight with implication] -1. **[Insight 1]**: [Description and implication] -2. **[Insight 2]**: [Description and implication] -3. **[Insight 3]**: [Description and implication] - -### 7.3 Business/Research Insights - -1. **[Insight 1]**: [Description and implication] -2. **[Insight 2]**: [Description and implication] -3. **[Insight 3]**: [Description and implication] - -### 7.4 Unexpected Findings - -1. **[Finding 1]**: [Description and significance] -2. **[Finding 2]**: [Description and significance] +**Unexpected Findings**: +- [Finding and significance] --- ## 8. Recommendations -### 8.1 Data Quality Actions +**Data Quality Actions**: +- [ ] [Action - priority] +- [ ] [Action - priority] -- [ ] **[Action 1]**: [Description and priority] -- [ ] **[Action 2]**: [Description and priority] -- [ ] **[Action 3]**: [Description and priority] +**Next Steps**: +- [Step with rationale] +- [Step with rationale] -### 8.2 Analysis Next Steps +**Feature Engineering**: +- [Opportunity] +- [Opportunity] -1. **[Step 1]**: [Description and rationale] -2. **[Step 2]**: [Description and rationale] -3. **[Step 3]**: [Description and rationale] - -### 8.3 Feature Engineering Opportunities - -- **[Opportunity 1]**: [Description] -- **[Opportunity 2]**: [Description] -- **[Opportunity 3]**: [Description] - -### 8.4 Modeling Considerations - -- **[Consideration 1]**: [Description] -- **[Consideration 2]**: [Description] -- **[Consideration 3]**: [Description] +**Modeling Considerations**: +- [Consideration] +- [Consideration] --- -## 9. Limitations and Caveats +## 9. Limitations -### 9.1 Data Limitations +**Data**: [Key limitations] -- [Limitation 1] -- [Limitation 2] -- [Limitation 3] +**Analysis**: [Key limitations] -### 9.2 Analysis Limitations - -- [Limitation 1] -- [Limitation 2] -- [Limitation 3] - -### 9.3 Assumptions Made - -- [Assumption 1] -- [Assumption 2] -- [Assumption 3] +**Assumptions**: [Key assumptions made] --- -## 10. 
Appendices +## Appendices -### Appendix A: Technical Details +### A: Technical Details -**Software Environment**: -- Python: [Version] -- Key Libraries: pandas ([Version]), numpy ([Version]), scipy ([Version]), matplotlib ([Version]) +**Environment**: Python with pandas, numpy, scipy, matplotlib, seaborn -**Analysis Scripts**: [Link to repository or location] +**Scripts**: [Repository/location] -### Appendix B: Variable Dictionary +### B: Variable Dictionary -| Variable Name | Type | Description | Unit | Valid Range | Missing % | -|--------------|------|-------------|------|-------------|-----------| +| Variable | Type | Description | Unit | Range | Missing % | +|----------|------|-------------|------|-------|-----------| | [Var 1] | [Type] | [Description] | [Unit] | [Range] | [%] | -| [Var 2] | [Type] | [Description] | [Unit] | [Range] | [%] | -### Appendix C: Statistical Test Results +### C: Statistical Tests -[Detailed statistical test outputs] - -**Normality Tests**: +**Normality**: | Variable | Test | Statistic | p-value | Result | |----------|------|-----------|---------|--------| | [Var 1] | Shapiro-Wilk | [Value] | [Value] | [Normal/Non-normal] | -**Correlation Tests**: -| Var 1 | Var 2 | Coefficient | p-value | Significance | -|-------|-------|-------------|---------|--------------| +**Correlations**: +| Var 1 | Var 2 | r | p-value | Significant | +|-------|-------|---|---------|-------------| | [Var 1] | [Var 2] | [Value] | [Value] | [Yes/No] | -### Appendix D: Full Visualization Gallery +### D: Visualizations -[Links to all generated visualizations] - -1. [Visualization 1 description](path/to/viz1.png) -2. [Visualization 2 description](path/to/viz2.png) -3. [Visualization 3 description](path/to/viz3.png) - ---- - -## Contact Information - -**Analyst**: [Name] -**Email**: [Email] -**Date**: [Date] -**Version**: [Version number] - ---- - -**Document History**: -| Version | Date | Changes | Author | -|---------|------|---------|--------| -| 1.0 | [Date] | Initial analysis | [Name] | +1. [Description](path/to/viz1.png) +2. [Description](path/to/viz2.png) diff --git a/scientific-thinking/exploratory-data-analysis/references/eda_best_practices.md b/scientific-thinking/exploratory-data-analysis/references/eda_best_practices.md index 1699073..529e50d 100644 --- a/scientific-thinking/exploratory-data-analysis/references/eda_best_practices.md +++ b/scientific-thinking/exploratory-data-analysis/references/eda_best_practices.md @@ -1,379 +1,125 @@ -# Exploratory Data Analysis Best Practices +# EDA Best Practices -This guide provides best practices and methodologies for conducting thorough exploratory data analysis. +Methodologies for conducting thorough exploratory data analysis. -## EDA Process Framework +## 6-Step EDA Framework -### 1. Initial Data Understanding +### 1. Initial Understanding -**Objectives**: -- Understand data structure and format -- Identify data types and schema -- Get familiar with domain context - -**Key Questions**: +**Questions**: - What does each column represent? -- What is the unit of observation? -- What is the time period covered? +- What is the unit of observation and time period? - What is the data collection methodology? -- Are there any known data quality issues? +- Are there known quality issues? -**Actions**: -- Load and inspect first/last rows -- Check data dimensions (rows × columns) -- Review column names and types -- Document data source and context +**Actions**: Load data, inspect structure, review types, document context -### 2. 
Data Quality Assessment +### 2. Quality Assessment -**Objectives**: -- Identify data quality issues -- Assess data completeness and reliability -- Document data limitations - -**Key Checks**: -- **Missing data**: Patterns, extent, randomness -- **Duplicates**: Exact and near-duplicates -- **Outliers**: Valid extremes vs. data errors -- **Consistency**: Cross-field validation -- **Accuracy**: Domain knowledge validation +**Check**: Missing data patterns, duplicates, outliers, consistency, accuracy **Red Flags**: -- High missing data rate (>20%) +- Missing data >20% - Unexpected duplicates -- Constant or near-constant columns -- Impossible values (negative ages, dates in future) -- High cardinality in ID-like columns +- Constant columns +- Impossible values (negative ages, future dates) - Suspicious patterns (too many round numbers) ### 3. Univariate Analysis -**Objectives**: -- Understand individual variable distributions -- Identify anomalies and patterns -- Determine variable characteristics +**Numeric**: Central tendency, dispersion, shape (skewness, kurtosis), distribution plots, outliers -**For Numeric Variables**: -- Central tendency (mean, median, mode) -- Dispersion (range, variance, std, IQR) -- Shape (skewness, kurtosis) -- Distribution visualization (histogram, KDE, box plot) -- Outlier detection +**Categorical**: Frequency distributions, unique counts, balance, bar charts -**For Categorical Variables**: -- Frequency distributions -- Unique value counts -- Most/least common categories -- Category balance/imbalance -- Bar charts and count plots - -**For Temporal Variables**: -- Time range coverage -- Gaps in timeline -- Temporal patterns (trends, seasonality) -- Time series plots +**Temporal**: Time range, gaps, trends, seasonality, time series plots ### 4. Bivariate Analysis -**Objectives**: -- Understand relationships between variables -- Identify correlations and dependencies -- Find potential predictors +**Numeric vs Numeric**: Scatter plots, correlations (Pearson, Spearman), detect non-linearity -**Numeric vs Numeric**: -- Scatter plots -- Correlation coefficients (Pearson, Spearman) -- Line of best fit -- Detect non-linear relationships +**Numeric vs Categorical**: Group statistics, box plots by category, t-test/ANOVA -**Numeric vs Categorical**: -- Group statistics (mean, median by category) -- Box plots by category -- Distribution plots by category -- Statistical tests (t-test, ANOVA) - -**Categorical vs Categorical**: -- Cross-tabulation / contingency tables -- Stacked bar charts -- Chi-square tests -- Cramér's V for association strength +**Categorical vs Categorical**: Cross-tabs, stacked bars, chi-square, Cramér's V ### 5. Multivariate Analysis -**Objectives**: -- Understand complex interactions -- Identify patterns across multiple variables -- Explore dimensionality +**Techniques**: Correlation matrices, pair plots, parallel coordinates, PCA, clustering -**Techniques**: -- Correlation matrices and heatmaps -- Pair plots / scatter matrices -- Parallel coordinates plots -- Principal Component Analysis (PCA) -- Clustering analysis - -**Key Questions**: -- Are there groups of correlated features? -- Can we reduce dimensionality? -- Are there natural clusters? -- Do patterns change when conditioning on other variables? +**Questions**: Groups of correlated features? Reduce dimensionality? Natural clusters? Conditional patterns? ### 6. 
Insight Generation -**Objectives**: -- Synthesize findings into actionable insights -- Formulate hypotheses -- Identify next steps +**Look for**: Unexpected patterns, strong correlations, quality issues, feature engineering opportunities, domain implications -**What to Look For**: -- Unexpected patterns or anomalies -- Strong relationships or correlations -- Data quality issues requiring attention -- Feature engineering opportunities -- Business or research implications +## Visualization Guidelines -## Best Practices +**Chart Selection**: +- Distribution: Histogram, KDE, box/violin plots +- Relationships: Scatter, line, heatmap +- Composition: Stacked bar +- Comparison: Bar, grouped bar -### Visualization Guidelines +**Best Practices**: Label axes with units, descriptive titles, purposeful color, appropriate scales, avoid clutter -1. **Choose appropriate chart types**: - - Distribution: Histogram, KDE, box plot, violin plot - - Relationships: Scatter plot, line plot, heatmap - - Composition: Stacked bar, pie chart (use sparingly) - - Comparison: Bar chart, grouped bar chart +## Statistical Analysis Guidelines -2. **Make visualizations clear and informative**: - - Always label axes with units - - Add descriptive titles - - Use color purposefully - - Include legends when needed - - Choose appropriate scales - - Avoid chart junk +**Check Assumptions**: Normality, homoscedasticity, independence, linearity -3. **Use multiple views**: - - Show data from different angles - - Combine complementary visualizations - - Use small multiples for faceting +**Method Selection**: Parametric when assumptions met, non-parametric otherwise, report effect sizes -### Statistical Analysis Guidelines +**Context Matters**: Statistical ≠ practical significance, domain knowledge trumps statistics, correlation ≠ causation -1. **Check assumptions**: - - Test for normality before parametric tests - - Check for homoscedasticity - - Verify independence of observations - - Assess linearity for linear models +## Documentation Guidelines -2. **Use appropriate methods**: - - Parametric tests when assumptions met - - Non-parametric alternatives when violated - - Robust methods for outlier-prone data - - Effect sizes alongside p-values +**Notes**: Document assumptions, decisions, issues, findings -3. **Consider context**: - - Statistical significance ≠ practical significance - - Domain knowledge trumps statistical patterns - - Correlation ≠ causation - - Sample size affects what you can detect +**Reproducibility**: Use scripts, version control, document sources, set random seeds -### Documentation Guidelines +**Reporting**: Clear summaries, supporting visualizations, highlighted insights, actionable recommendations -1. **Keep detailed notes**: - - Document assumptions and decisions - - Record data issues discovered - - Note interesting findings - - Track questions that arise +## Common Pitfalls -2. **Create reproducible analysis**: - - Use scripts, not manual Excel operations - - Version control your code - - Document data sources and versions - - Include random seeds for reproducibility - -3. **Summarize findings**: - - Write clear summaries - - Use visualizations to support points - - Highlight key insights - - Provide recommendations - -## Common Pitfalls to Avoid - -### 1. Confirmation Bias -- **Problem**: Looking only for evidence supporting preconceptions -- **Solution**: Actively seek disconfirming evidence, use blind analysis - -### 2. 
Ignoring Data Quality -- **Problem**: Proceeding with analysis despite known data issues -- **Solution**: Address quality issues first, document limitations - -### 3. Over-reliance on Automation -- **Problem**: Running analyses without understanding or verifying results -- **Solution**: Manually inspect subsets, verify automated findings - -### 4. Neglecting Outliers -- **Problem**: Removing outliers without investigation -- **Solution**: Always investigate outliers - they may contain important information - -### 5. Multiple Testing Without Correction -- **Problem**: Running many tests increases false positive rate -- **Solution**: Use correction methods (Bonferroni, FDR) or be explicit about exploratory nature - -### 6. Mistaking Association for Causation -- **Problem**: Inferring causation from correlation -- **Solution**: Use careful language, acknowledge alternative explanations - -### 7. Cherry-picking Results -- **Problem**: Reporting only interesting/significant findings -- **Solution**: Report complete analysis, including negative results - -### 8. Ignoring Sample Size -- **Problem**: Not considering how sample size affects conclusions -- **Solution**: Report effect sizes, confidence intervals, and sample sizes +1. **Confirmation Bias**: Seek disconfirming evidence, use blind analysis +2. **Ignoring Quality**: Address issues first, document limitations +3. **Over-automation**: Manually inspect subsets, verify results +4. **Neglecting Outliers**: Investigate before removing - may be informative +5. **Multiple Testing**: Use correction (Bonferroni, FDR) or note exploratory nature +6. **Association ≠ Causation**: Use careful language, acknowledge alternatives +7. **Cherry-picking**: Report complete analysis, including negative results +8. **Ignoring Sample Size**: Report effect sizes, CIs, and sample sizes ## Domain-Specific Considerations -### Time Series Data -- Check for stationarity -- Identify trends and seasonality -- Look for autocorrelation -- Handle missing time points -- Consider temporal splits for validation +**Time Series**: Check stationarity, identify trends/seasonality, autocorrelation, temporal splits -### High-Dimensional Data -- Start with dimensionality reduction -- Focus on feature importance -- Be cautious of curse of dimensionality -- Use regularization in modeling -- Consider domain knowledge for feature selection +**High-Dimensional**: Dimensionality reduction, feature importance, regularization, domain-guided selection -### Imbalanced Data -- Report class distributions -- Use appropriate metrics (not just accuracy) -- Consider resampling techniques -- Stratify sampling and cross-validation -- Be aware of biases in learning +**Imbalanced**: Report distributions, appropriate metrics, resampling, stratified CV -### Small Sample Sizes -- Use non-parametric methods -- Be conservative with conclusions -- Report confidence intervals -- Consider Bayesian approaches -- Acknowledge limitations +**Small Samples**: Non-parametric methods, conservative conclusions, CIs, Bayesian approaches -### Big Data -- Sample intelligently for exploration -- Use efficient data structures -- Leverage parallel/distributed computing -- Be aware computational complexity -- Consider scalability in methods +**Big Data**: Intelligent sampling, efficient structures, parallel computing, scalability ## Iterative Process -EDA is not linear - iterate and refine: +EDA is iterative: Explore → Questions → Focused Analysis → Insights → New Questions → Deeper Investigation → Synthesis -1. 
**Initial exploration** → Identify questions -2. **Focused analysis** → Answer specific questions -3. **New insights** → Generate new questions -4. **Deeper investigation** → Refine understanding -5. **Synthesis** → Integrate findings +**Done When**: Understand structure/quality, characterized variables, identified relationships, documented limitations, answered questions, have actionable insights -### When to Stop +**Deliverables**: Data understanding, quality issue list, relationship insights, hypotheses, feature ideas, recommendations -You've done enough EDA when: -- ✅ You understand the data structure and quality -- ✅ You've characterized key variables -- ✅ You've identified important relationships -- ✅ You've documented limitations -- ✅ You can answer your research questions -- ✅ You have actionable insights +## Communication -### Moving Forward +**Technical Audiences**: Methodological details, statistical tests, assumptions, reproducible code -After EDA, you should have: -- Clear understanding of data -- List of quality issues and how to handle them -- Insights about relationships and patterns -- Hypotheses to test -- Ideas for feature engineering -- Recommendations for next steps +**Non-Technical Audiences**: Focus on insights, clear visualizations, avoid jargon, concrete recommendations -## Communication Tips +**Report Structure**: Executive summary → Data overview → Analysis → Insights → Recommendations → Appendix -### For Technical Audiences -- Include methodological details -- Show statistical test results -- Discuss assumptions and limitations -- Provide reproducible code -- Reference relevant literature +## Checklists -### For Non-Technical Audiences -- Focus on insights, not methods -- Use clear visualizations -- Avoid jargon -- Provide context and implications -- Make recommendations concrete +**Before**: Understand context, define objectives, identify audience, set up environment -### Report Structure -1. **Executive Summary**: Key findings and recommendations -2. **Data Overview**: Source, structure, limitations -3. **Analysis**: Findings organized by theme -4. **Insights**: Patterns, anomalies, implications -5. **Recommendations**: Next steps and actions -6. 
**Appendix**: Technical details, full statistics +**During**: Inspect structure, assess quality, analyze distributions, explore relationships, document continuously -## Useful Checklists - -### Before Starting -- [ ] Understand business/research context -- [ ] Define analysis objectives -- [ ] Identify stakeholders and audience -- [ ] Secure necessary permissions -- [ ] Set up reproducible environment - -### During Analysis -- [ ] Load and inspect data structure -- [ ] Assess data quality -- [ ] Analyze univariate distributions -- [ ] Explore bivariate relationships -- [ ] Investigate multivariate patterns -- [ ] Generate and validate insights -- [ ] Document findings continuously - -### Before Concluding -- [ ] Verify all findings -- [ ] Check for alternative explanations -- [ ] Document limitations -- [ ] Prepare clear visualizations -- [ ] Write actionable recommendations -- [ ] Review with domain experts -- [ ] Ensure reproducibility - -## Tools and Libraries - -### Python Ecosystem -- **pandas**: Data manipulation -- **numpy**: Numerical operations -- **matplotlib/seaborn**: Visualization -- **scipy**: Statistical tests -- **scikit-learn**: ML preprocessing -- **plotly**: Interactive visualizations - -### Best Tool Practices -- Use appropriate tool for task -- Leverage vectorization -- Chain operations efficiently -- Handle missing data properly -- Validate results independently -- Document custom functions - -## Further Resources - -- **Books**: - - "Exploratory Data Analysis" by John Tukey - - "The Art of Statistics" by David Spiegelhalter -- **Guidelines**: - - ASA Statistical Significance Statement - - FAIR data principles -- **Communities**: - - Cross Validated (Stack Exchange) - - /r/datascience - - Local data science meetups +**After**: Verify findings, check alternatives, document limitations, prepare visualizations, ensure reproducibility diff --git a/scientific-thinking/exploratory-data-analysis/references/statistical_tests_guide.md b/scientific-thinking/exploratory-data-analysis/references/statistical_tests_guide.md index 9aa0992..557c465 100644 --- a/scientific-thinking/exploratory-data-analysis/references/statistical_tests_guide.md +++ b/scientific-thinking/exploratory-data-analysis/references/statistical_tests_guide.md @@ -1,252 +1,126 @@ -# Statistical Tests Guide for EDA +# Statistical Tests Guide -This guide provides interpretation guidelines for statistical tests commonly used in exploratory data analysis. +Interpretation guidelines for common EDA statistical tests. 
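+
+The measures covered below can be computed directly with scipy; a minimal sketch (assuming numpy and scipy are installed, with `values` standing in for any numeric column):
+
+```python
+import numpy as np
+from scipy import stats
+
+rng = np.random.default_rng(42)
+values = rng.lognormal(size=500)  # stand-in for a numeric column
+
+w, p = stats.shapiro(values)  # Shapiro-Wilk normality test
+print(f"Shapiro-Wilk: W={w:.3f}, p={p:.4f}")
+print(f"skewness={stats.skew(values):.2f}, excess kurtosis={stats.kurtosis(values):.2f}")
+
+r_p, p_p = stats.pearsonr(values, np.log(values))   # linear association
+r_s, p_s = stats.spearmanr(values, np.log(values))  # monotonic association
+print(f"Pearson r={r_p:.2f} (p={p_p:.3g}), Spearman rho={r_s:.2f} (p={p_s:.3g})")
+```
+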
## Normality Tests -### Shapiro-Wilk Test +### Shapiro-Wilk -**Purpose**: Test if a sample comes from a normally distributed population +**Use**: Small to medium samples (n < 5000) -**When to use**: Best for small to medium sample sizes (n < 5000) +**H0**: Data is normal | **H1**: Data is not normal -**Interpretation**: -- **Null Hypothesis (H0)**: The data follows a normal distribution -- **Alternative Hypothesis (H1)**: The data does not follow a normal distribution -- **p-value > 0.05**: Fail to reject H0 → Data is likely normally distributed -- **p-value ≤ 0.05**: Reject H0 → Data is not normally distributed +**Interpretation**: p > 0.05 → likely normal | p ≤ 0.05 → not normal -**Notes**: -- Very sensitive to sample size -- Small deviations from normality may be detected as significant in large samples -- Consider practical significance alongside statistical significance +**Note**: Very sensitive to sample size; small deviations may be significant in large samples -### Anderson-Darling Test +### Anderson-Darling -**Purpose**: Test if a sample comes from a specific distribution (typically normal) +**Use**: More powerful than Shapiro-Wilk, emphasizes tails -**When to use**: More powerful than Shapiro-Wilk for detecting departures from normality +**Interpretation**: Test statistic > critical value → reject normality -**Interpretation**: -- Compares test statistic against critical values at different significance levels -- If test statistic > critical value at given significance level, reject normality -- More weight given to tails of distribution than other tests +### Kolmogorov-Smirnov -### Kolmogorov-Smirnov Test +**Use**: Large samples or testing against non-normal distributions -**Purpose**: Test if a sample comes from a reference distribution - -**When to use**: When you have a large sample or want to test against distributions other than normal - -**Interpretation**: -- **p-value > 0.05**: Sample distribution matches reference distribution -- **p-value ≤ 0.05**: Sample distribution differs from reference distribution +**Interpretation**: p > 0.05 → matches reference | p ≤ 0.05 → differs from reference ## Distribution Characteristics ### Skewness -**Purpose**: Measure asymmetry of the distribution +**Measures asymmetry**: +- ≈ 0: Symmetric +- \> 0: Right-skewed (tail right) +- < 0: Left-skewed (tail left) -**Interpretation**: -- **Skewness ≈ 0**: Symmetric distribution -- **Skewness > 0**: Right-skewed (tail extends to right, most values on left) -- **Skewness < 0**: Left-skewed (tail extends to left, most values on right) +**Magnitude**: |s| < 0.5 (symmetric) | 0.5-1 (moderate) | ≥ 1 (high) -**Magnitude interpretation**: -- **|Skewness| < 0.5**: Approximately symmetric -- **0.5 ≤ |Skewness| < 1**: Moderately skewed -- **|Skewness| ≥ 1**: Highly skewed - -**Implications**: -- Highly skewed data may require transformation (log, sqrt, Box-Cox) -- Mean is pulled toward tail; median more robust for skewed data -- Many statistical tests assume symmetry/normality +**Action**: High skew → consider transformation (log, sqrt, Box-Cox); use median over mean ### Kurtosis -**Purpose**: Measure tailedness and peak of distribution +**Measures tailedness** (excess kurtosis, normal = 0): +- ≈ 0: Normal tails +- \> 0: Heavy tails, more outliers +- < 0: Light tails, fewer outliers -**Interpretation** (Excess Kurtosis, where normal distribution = 0): -- **Kurtosis ≈ 0**: Normal tail behavior (mesokurtic) -- **Kurtosis > 0**: Heavy tails, sharp peak (leptokurtic) - - More outliers than normal 
distribution - - Higher probability of extreme values -- **Kurtosis < 0**: Light tails, flat peak (platykurtic) - - Fewer outliers than normal distribution - - More uniform distribution +**Magnitude**: |k| < 0.5 (normal) | 0.5-1 (moderate) | ≥ 1 (very different) -**Magnitude interpretation**: -- **|Kurtosis| < 0.5**: Normal-like tails -- **0.5 ≤ |Kurtosis| < 1**: Moderately different tails -- **|Kurtosis| ≥ 1**: Very different tail behavior from normal +**Action**: High kurtosis → investigate outliers carefully -**Implications**: -- High kurtosis → Be cautious with outliers -- Low kurtosis → Distribution lacks distinct peak +## Correlation -## Correlation Tests +### Pearson -### Pearson Correlation +**Measures**: Linear relationship (-1 to +1) -**Purpose**: Measure linear relationship between two continuous variables +**Strength**: |r| < 0.3 (weak) | 0.3-0.5 (moderate) | 0.5-0.7 (strong) | ≥ 0.7 (very strong) -**Range**: -1 to +1 +**Assumptions**: Linear, continuous, normal, no outliers, homoscedastic -**Interpretation**: -- **r = +1**: Perfect positive linear relationship -- **r = 0**: No linear relationship -- **r = -1**: Perfect negative linear relationship +**Use**: Expected linear relationship, assumptions met -**Strength guidelines**: -- **|r| < 0.3**: Weak correlation -- **0.3 ≤ |r| < 0.5**: Moderate correlation -- **0.5 ≤ |r| < 0.7**: Strong correlation -- **|r| ≥ 0.7**: Very strong correlation +### Spearman -**Assumptions**: -- Linear relationship between variables -- Both variables continuous and normally distributed -- No significant outliers -- Homoscedasticity (constant variance) +**Measures**: Monotonic relationship (-1 to +1), rank-based -**When to use**: When relationship is expected to be linear and data meets assumptions +**Advantages**: Robust to outliers, no linearity assumption, works with ordinal, no normality required -### Spearman Correlation +**Use**: Outliers present, non-linear monotonic relationship, ordinal data, non-normal -**Purpose**: Measure monotonic relationship between two variables (rank-based) +## Outlier Detection -**Range**: -1 to +1 +### IQR Method -**Interpretation**: Same as Pearson, but measures monotonic (not just linear) relationships +**Bounds**: Q1 - 1.5×IQR to Q3 + 1.5×IQR -**Advantages over Pearson**: -- Robust to outliers (uses ranks) -- Doesn't assume linear relationship -- Works with ordinal data -- Doesn't require normality assumption +**Characteristics**: Simple, robust, works with skewed data -**When to use**: -- Data has outliers -- Relationship is monotonic but not linear -- Data is ordinal -- Distribution is non-normal - -## Outlier Detection Methods - -### IQR Method (Interquartile Range) - -**Definition**: -- Lower bound: Q1 - 1.5 × IQR -- Upper bound: Q3 + 1.5 × IQR -- Values outside these bounds are outliers - -**Characteristics**: -- Simple and interpretable -- Robust to extreme values -- Works well for skewed distributions -- Conservative approach (Tukey's fences) - -**Interpretation**: -- **< 5% outliers**: Typical for most datasets -- **5-10% outliers**: Moderate, investigate causes -- **> 10% outliers**: High rate, may indicate data quality issues or interesting phenomena +**Typical Rates**: < 5% (normal) | 5-10% (moderate) | > 10% (high, investigate) ### Z-Score Method -**Definition**: Outliers are data points with |z-score| > 3 +**Definition**: |z| > 3 where z = (x - μ) / σ -**Formula**: z = (x - μ) / σ +**Use**: Normal data, n > 30 -**Characteristics**: -- Assumes normal distribution -- Sensitive to extreme values -- 
Standard threshold is |z| > 3 (99.7% of data within ±3σ) +**Avoid**: Small samples, skewed data, many outliers (contaminates mean/SD) -**When to use**: -- Data is approximately normally distributed -- Large sample sizes (n > 30) +## Hypothesis Testing -**When NOT to use**: -- Small samples -- Heavily skewed data -- Data with many outliers (contaminates mean and SD) +**Significance Levels**: α = 0.05 (standard) | 0.01 (conservative) | 0.10 (liberal) -## Hypothesis Testing Guidelines +**p-value Interpretation**: ≤ 0.001 (***) | ≤ 0.01 (**) | ≤ 0.05 (*) | ≤ 0.10 (weak) | > 0.10 (none) -### Significance Levels +**Key Considerations**: +- Statistical ≠ practical significance +- Multiple testing → use correction (Bonferroni, FDR) +- Large samples detect trivial effects +- Always report effect sizes with p-values -- **α = 0.05**: Standard significance level (5% chance of Type I error) -- **α = 0.01**: More conservative (1% chance of Type I error) -- **α = 0.10**: More liberal (10% chance of Type I error) +## Transformations -### p-value Interpretation +**Right-skewed**: Log, sqrt, Box-Cox -- **p ≤ 0.001**: Very strong evidence against H0 (***) -- **0.001 < p ≤ 0.01**: Strong evidence against H0 (**) -- **0.01 < p ≤ 0.05**: Moderate evidence against H0 (*) -- **0.05 < p ≤ 0.10**: Weak evidence against H0 -- **p > 0.10**: Little to no evidence against H0 +**Left-skewed**: Square, cube, exponential -### Important Considerations +**Heavy tails**: Robust scaling, winsorization, log -1. **Statistical vs Practical Significance**: A small p-value doesn't always mean the effect is important -2. **Multiple Testing**: When performing many tests, use correction methods (Bonferroni, FDR) -3. **Sample Size**: Large samples can detect trivial effects as significant -4. **Effect Size**: Always report and interpret effect sizes alongside p-values +**Non-constant variance**: Log, Box-Cox -## Data Transformation Strategies - -### When to Transform - -- **Right-skewed data**: Log, square root, or Box-Cox transformation -- **Left-skewed data**: Square, cube, or exponential transformation -- **Heavy tails/outliers**: Robust scaling, winsorization, or log transformation -- **Non-constant variance**: Log or Box-Cox transformation - -### Common Transformations - -1. **Log transformation**: log(x) or log(x + 1) - - Best for: Positive skewed data, multiplicative relationships - - Cannot use with zero or negative values - -2. **Square root transformation**: √x - - Best for: Count data, moderate positive skew - - Less aggressive than log - -3. **Box-Cox transformation**: (x^λ - 1) / λ - - Best for: Automatically finds optimal transformation - - Requires positive values - -4. **Standardization**: (x - μ) / σ - - Best for: Scaling features to same range - - Centers data at 0 with unit variance - -5. 
**Min-Max scaling**: (x - min) / (max - min) - - Best for: Scaling to [0, 1] range - - Preserves zero values +**Common Methods**: +- **Log**: log(x+1) for positive skew, multiplicative relationships +- **Sqrt**: Count data, moderate skew +- **Box-Cox**: Auto-finds optimal (requires positive values) +- **Standardization**: (x-μ)/σ for scaling to unit variance +- **Min-Max**: (x-min)/(max-min) for [0,1] scaling ## Practical Guidelines -### Sample Size Considerations +**Sample Size**: n < 30 (non-parametric, cautious) | 30-100 (parametric OK) | ≥ 100 (robust) | ≥ 1000 (may detect trivial effects) -- **n < 30**: Use non-parametric tests, be cautious with assumptions -- **30 ≤ n < 100**: Moderate sample, parametric tests usually acceptable -- **n ≥ 100**: Large sample, parametric tests robust to violations -- **n ≥ 1000**: Very large sample, may detect trivial effects as significant +**Missing Data**: < 5% (simple methods) | 5-10% (imputation) | > 10% (investigate patterns, advanced methods) -### Dealing with Missing Data - -- **< 5% missing**: Usually not a problem, simple methods OK -- **5-10% missing**: Use appropriate imputation methods -- **> 10% missing**: Investigate patterns, consider advanced imputation or modeling missingness - -### Reporting Results - -Always include: -1. Test statistic value -2. p-value -3. Confidence interval (when applicable) -4. Effect size -5. Sample size -6. Assumptions checked and violations noted +**Reporting**: Include test statistic, p-value, CI, effect size, n, assumption checks
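+
+The outlier rules and transformations above map directly onto a few lines of pandas/scipy; a minimal sketch (the series values are illustrative):
+
+```python
+import pandas as pd
+from scipy import stats
+
+s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 60], name="amount")  # illustrative data
+
+# IQR fences: Q1 - 1.5*IQR to Q3 + 1.5*IQR
+q1, q3 = s.quantile([0.25, 0.75])
+iqr = q3 - q1
+iqr_outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
+
+# Z-score rule: flag |z| > 3 (assumes roughly normal data, n > 30)
+z = (s - s.mean()) / s.std()
+z_outliers = s[z.abs() > 3]
+
+# Box-Cox transform for right-skewed, strictly positive data
+transformed, lmbda = stats.boxcox(s)
+print(len(iqr_outliers), len(z_outliers), round(lmbda, 2))
+```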