Exploratory Data Analysis Report
Dataset: [Dataset Name]
Analysis Date: [Date]
Analyst: [Name]
Executive Summary
[2-3 paragraph summary of key findings, major insights, and recommendations]
Key Findings:
- [Finding 1]
- [Finding 2]
- [Finding 3]
Recommendations:
- [Recommendation 1]
- [Recommendation 2]
1. Dataset Overview
1.1 Data Source
- Source: [Source name and location]
- Collection Period: [Date range]
- Last Updated: [Date]
- Format: [CSV, Excel, JSON, etc.]
1.2 Data Structure
- Observations (Rows): [Number]
- Variables (Columns): [Number]
- Memory Usage: [Size in MB]
1.3 Variable Types
- Numeric Variables ([Count]): [List column names]
- Categorical Variables ([Count]): [List column names]
- Datetime Variables ([Count]): [List column names]
- Boolean Variables ([Count]): [List column names]
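The structure and variable-type fields above can be filled in programmatically. The following is a minimal sketch, assuming the data is loaded into a pandas DataFrame; the toy columns (`age`, `segment`, `signup`, `active`) are hypothetical, and treating every column that is not numeric, datetime, timedelta, or boolean as categorical is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical toy data; replace with the dataset under analysis.
df = pd.DataFrame({
    "age": [34, 51, 27, 45],
    "segment": ["a", "b", "a", "c"],
    "signup": pd.to_datetime(["2024-01-05", "2024-02-11", "2024-02-20", "2024-03-02"]),
    "active": [True, False, True, True],
})

# Section 1.2: rows, columns, and memory usage in MB.
n_rows, n_cols = df.shape
memory_mb = df.memory_usage(deep=True).sum() / 1024**2

# Section 1.3: group column names by broad type.
# Assumption: anything that is not numeric, datetime, timedelta, or boolean
# is reported as categorical.
variable_types = {
    "numeric": df.select_dtypes(include="number").columns.tolist(),
    "categorical": df.select_dtypes(
        exclude=["number", "datetime", "datetimetz", "timedelta", "bool"]
    ).columns.tolist(),
    "datetime": df.select_dtypes(include=["datetime", "datetimetz"]).columns.tolist(),
    "boolean": df.select_dtypes(include="bool").columns.tolist(),
}

print(f"Observations: {n_rows}, Variables: {n_cols}, Memory: {memory_mb:.4f} MB")
for kind, columns in variable_types.items():
    print(f"{kind} ({len(columns)}): {columns}")
```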
2. Data Quality Assessment
2.1 Completeness
Overall Data Completeness: [Percentage]%
Missing Data Summary:
| Column | Missing Count | Missing % | Assessment |
|---|---|---|---|
| [Column 1] | [Count] | [%] | [High/Medium/Low] |
| [Column 2] | [Count] | [%] | [High/Medium/Low] |
Missing Data Pattern: [Description of patterns, if any]
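One way to produce the completeness figure and the missing-data table is sketched below, assuming a pandas DataFrame. The toy data is hypothetical, and the High/Medium/Low cut-offs (above 30% and above 5%) are assumptions chosen for illustration, since the template does not define them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing values.
df = pd.DataFrame({
    "income": [52000.0, np.nan, 61000.0, np.nan, 48000.0],
    "city": ["Oslo", "Bergen", None, "Oslo", "Oslo"],
    "age": [34, 51, 27, 45, 39],
})

missing_count = df.isna().sum()
missing_pct = (missing_count / len(df) * 100).round(1)


def assess(pct):
    """Map a missing percentage to a label (assumed thresholds)."""
    if pct > 30:
        return "High"
    if pct > 5:
        return "Medium"
    return "Low"


summary = pd.DataFrame({"Missing Count": missing_count, "Missing %": missing_pct})
summary["Assessment"] = summary["Missing %"].map(assess)

# Share of non-missing cells across the whole table.
overall_completeness = 100 * (1 - df.isna().sum().sum() / df.size)

print(summary)
print(f"Overall Data Completeness: {overall_completeness:.1f}%")
```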
2.2 Duplicates
- Duplicate Rows: [Count] ([Percentage]%)
- Action Required: [Yes/No - describe if needed]
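The duplicate count and percentage could be computed as follows. This is a sketch that assumes a duplicate means a row identical to an earlier row in every column; the toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data containing two exact repeats.
df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3, 3],
    "value": ["a", "b", "b", "c", "c", "d"],
})

# duplicated() flags every repeat after the first occurrence of a row.
duplicate_mask = df.duplicated()
duplicate_count = int(duplicate_mask.sum())
duplicate_pct = 100 * duplicate_count / len(df)

print(f"Duplicate Rows: {duplicate_count} ({duplicate_pct:.1f}%)")
```

If only some columns define identity (for example a record key), passing `subset=[...]` to `duplicated()` restricts the comparison to those columns.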
2.3 Data Quality Issues
[List any identified issues]
- Issue 1: [Description]
- Issue 2: [Description]
- Issue 3: [Description]
3. Univariate Analysis
3.1 Numeric Variables
[For each key numeric variable:]
[Variable Name]
Summary Statistics:
- Mean: [Value]
- Median: [Value]
- Std Dev: [Value]
- Min: [Value]
- Max: [Value]
- Range: [Value]
- IQR: [Value]
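The summary statistics listed above map directly onto pandas methods. The sketch below uses a hypothetical series; note that `std()` in pandas is the sample standard deviation (`ddof=1`) and `quantile()` uses linear interpolation by default, so other tools may report slightly different values.

```python
import pandas as pd

# Hypothetical values for one numeric variable.
values = pd.Series([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0], name="example")

q1, q3 = values.quantile([0.25, 0.75])

summary_stats = {
    "mean": values.mean(),
    "median": values.median(),
    "std": values.std(),  # sample standard deviation (ddof=1)
    "min": values.min(),
    "max": values.max(),
    "range": values.max() - values.min(),
    "iqr": q3 - q1,
}

for name, value in summary_stats.items():
    print(f"{name}: {value:.3f}")
```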
Distribution Characteristics:
- Skewness: [Value] - [Interpretation]
- Kurtosis: [Value] - [Interpretation]
- Normality: [Normal/Not Normal based on tests]
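The distribution characteristics can be obtained along these lines. This is a sketch under stated assumptions: the data is a synthetic right-skewed sample, the normality test is Shapiro-Wilk (the test named in Appendix C), and the 0.05 significance level is an assumed convention. `Series.kurt()` returns excess kurtosis, so a normal distribution scores about 0.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, deliberately right-skewed sample (assumption for illustration).
rng = np.random.default_rng(0)
sample = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=200))

skewness = sample.skew()
excess_kurtosis = sample.kurt()  # excess kurtosis: about 0 for normal data

# Shapiro-Wilk: a small p-value is evidence against normality.
statistic, p_value = stats.shapiro(sample)
alpha = 0.05  # assumed significance level
normality = "Not Normal" if p_value < alpha else "Normal"

print(f"Skewness: {skewness:.3f}")
print(f"Kurtosis (excess): {excess_kurtosis:.3f}")
print(f"Shapiro-Wilk W={statistic:.3f}, p={p_value:.3g} -> {normality}")
```

A "Normal" label here only means the test found no evidence against normality at the chosen level; with very large samples the test tends to reject even for minor deviations.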
Outliers:
- IQR Method: [Count] outliers ([Percentage]%)
- Z-Score Method: [Count] outliers ([Percentage]%)
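Both outlier counts can be derived as sketched below. The 1.5 × IQR fences and the |z| > 3 cut-off are the conventional defaults and are assumptions here, since the template does not fix them; the values are hypothetical.

```python
import pandas as pd

# Hypothetical values with one extreme observation.
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95], dtype=float)

# IQR method: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Z-score method: flag points more than 3 standard deviations from the mean.
z_scores = (values - values.mean()) / values.std(ddof=0)
z_mask = z_scores.abs() > 3

for label, mask in (("IQR Method", iqr_mask), ("Z-Score Method", z_mask)):
    count = int(mask.sum())
    print(f"{label}: {count} outliers ({100 * count / len(values):.1f}%)")
```

The two methods can disagree: the z-score uses the mean and standard deviation, which the outliers themselves inflate, so it tends to flag fewer points in small or heavily skewed samples.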
Insights:
- [Key insight 1]
- [Key insight 2]
3.2 Categorical Variables
[For each key categorical variable:]
[Variable Name]
Summary:
- Unique Values: [Count]
- Most Common: [Value] ([Percentage]%)
- Least Common: [Value] ([Percentage]%)
- Balance: [Balanced/Imbalanced]
Top Categories:
| Category | Count | Percentage |
|---|---|---|
| [Cat 1] | [Count] | [%] |
| [Cat 2] | [Count] | [%] |
| [Cat 3] | [Count] | [%] |
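
The categorical summary and the top-categories table can be built from `value_counts()`. In the sketch below the data is hypothetical, and the balance rule (imbalanced when the largest category is more than three times the smallest) is an assumption made for illustration only.

```python
import pandas as pd

# Hypothetical categorical variable.
colour = pd.Series(
    ["red", "blue", "red", "green", "red", "blue", "red", "red"], name="colour"
)

counts = colour.value_counts()  # sorted from most to least frequent
percentages = (counts / counts.sum() * 100).round(1)
top_categories = pd.DataFrame({"Count": counts, "Percentage": percentages})

unique_values = colour.nunique()
most_common = counts.index[0]
least_common = counts.index[-1]

# Assumed rule of thumb: largest category more than 3x the smallest.
balance = "Imbalanced" if counts.iloc[0] > 3 * counts.iloc[-1] else "Balanced"

print(top_categories)
print(f"Unique Values: {unique_values}")
print(f"Most Common: {most_common}, Least Common: {least_common}")
print(f"Balance: {balance}")
```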
Insights:
- [Key insight 1]
- [Key insight 2]
3.3 Temporal Variables
[If datetime columns exist:]
[Variable Name]
Time Range: [Start Date] to [End Date]
Duration: [Time span]
Temporal Coverage: [Complete/Gaps identified]
Temporal Patterns:
- Trend: [Increasing/Decreasing/Stable]
- Seasonality: [Yes/No - describe if present]
- Gaps: [List any gaps in timeline]
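Time range, duration, and gaps can be checked as sketched below. The example assumes one observation is expected per calendar day, which is an assumption about the data rather than part of the template; for other frequencies, change the `freq` argument. The timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical daily timestamps with two days missing.
timestamps = pd.Series(pd.to_datetime([
    "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-06", "2024-01-07",
]))

start, end = timestamps.min(), timestamps.max()
duration = end - start

# Compare the observed dates with a complete daily calendar (assumed frequency).
expected = pd.date_range(start, end, freq="D")
missing_dates = expected.difference(pd.DatetimeIndex(timestamps))
coverage = "Complete" if missing_dates.empty else "Gaps identified"

print(f"Time Range: {start.date()} to {end.date()}")
print(f"Duration: {duration.days} days")
print(f"Temporal Coverage: {coverage}")
print("Gaps:", [d.strftime("%Y-%m-%d") for d in missing_dates])
```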
Insights:
- [Key insight 1]
- [Key insight 2]
4. Bivariate Analysis
4.1 Correlation Analysis
Overall Correlation Structure:
- Strong Positive Correlations: [Count]
- Strong Negative Correlations: [Count]
- Weak/No Correlations: [Count]
Correlation Matrix: [Table or link to heatmap]
Notable Correlations:
| Variable 1 | Variable 2 | Pearson r | Spearman ρ | Strength | Interpretation |
|---|---|---|---|---|---|
| [Var 1] | [Var 2] | [Value] | [Value] | [Strong/Moderate/Weak] | [Interpretation] |
| [Var 1] | [Var 3] | [Value] | [Value] | [Strong/Moderate/Weak] | [Interpretation] |
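
The notable-correlations table can be assembled from the Pearson and Spearman matrices. In this sketch the data is hypothetical, and the strength cut-offs (0.7 for strong, 0.4 for moderate) are assumed conventions that vary between fields.

```python
from itertools import combinations

import pandas as pd

# Hypothetical numeric data; z is a monotonic but non-linear function of x.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 5, 4, 5, 7],
    "z": [1, 4, 9, 16, 25, 36],
})

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")


def strength(r):
    """Label a coefficient by absolute size (assumed cut-offs)."""
    r = abs(r)
    if r >= 0.7:
        return "Strong"
    if r >= 0.4:
        return "Moderate"
    return "Weak"


rows = []
for a, b in combinations(df.columns, 2):
    rows.append({
        "Variable 1": a,
        "Variable 2": b,
        "Pearson r": round(pearson.loc[a, b], 3),
        "Spearman rho": round(spearman.loc[a, b], 3),
        "Strength": strength(pearson.loc[a, b]),
    })

notable = pd.DataFrame(rows)
print(notable.to_string(index=False))
```

A Spearman coefficient noticeably higher than the Pearson coefficient for the same pair, as with `x` and `z` here, suggests a monotonic but non-linear relationship.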
Insights:
- [Key insight about correlations]
- [Potential multicollinearity issues]
- [Feature engineering opportunities]
4.2 Key Relationships
[For important variable pairs:]
[Variable 1] vs [Variable 2]
Relationship Type: [Linear/Non-linear/None]
Correlation: [Value]
Statistical Test: [Test name, p-value]
Insights:
- [Description of relationship]
- [Implications]
5. Multivariate Analysis
5.1 Scatter Matrix
Observations:
- [Pattern 1]
- [Pattern 2]
- [Pattern 3]
5.2 Clustering Patterns
[If clustering analysis performed:]
Method: [Method used]
Number of Clusters: [Count]
Cluster Characteristics:
- Cluster 1: [Description]
- Cluster 2: [Description]
Visualization: [Link to visualization]
6. Outlier Analysis
6.1 Outlier Summary
Overall Outlier Rate: [Percentage]%
Variables with High Outlier Rates:
| Variable | Outlier Count | Outlier % | Method | Action |
|---|---|---|---|---|
| [Var 1] | [Count] | [%] | [IQR/Z-score] | [Keep/Investigate/Remove] |
| [Var 2] | [Count] | [%] | [IQR/Z-score] | [Keep/Investigate/Remove] |
6.2 Outlier Investigation
[For significant outliers:]
[Variable Name]
Outlier Characteristics:
- [Description of outliers]
- [Potential causes]
- [Validity assessment]
Recommendation: [Keep/Remove/Transform/Investigate further]
7. Key Insights and Findings
7.1 Data Quality Insights
- [Insight 1]: [Description and implication]
- [Insight 2]: [Description and implication]
- [Insight 3]: [Description and implication]
7.2 Statistical Insights
- [Insight 1]: [Description and implication]
- [Insight 2]: [Description and implication]
- [Insight 3]: [Description and implication]
7.3 Business/Research Insights
- [Insight 1]: [Description and implication]
- [Insight 2]: [Description and implication]
- [Insight 3]: [Description and implication]
7.4 Unexpected Findings
- [Finding 1]: [Description and significance]
- [Finding 2]: [Description and significance]
8. Recommendations
8.1 Data Quality Actions
- [Action 1]: [Description and priority]
- [Action 2]: [Description and priority]
- [Action 3]: [Description and priority]
8.2 Analysis Next Steps
- [Step 1]: [Description and rationale]
- [Step 2]: [Description and rationale]
- [Step 3]: [Description and rationale]
8.3 Feature Engineering Opportunities
- [Opportunity 1]: [Description]
- [Opportunity 2]: [Description]
- [Opportunity 3]: [Description]
8.4 Modeling Considerations
- [Consideration 1]: [Description]
- [Consideration 2]: [Description]
- [Consideration 3]: [Description]
9. Limitations and Caveats
9.1 Data Limitations
- [Limitation 1]
- [Limitation 2]
- [Limitation 3]
9.2 Analysis Limitations
- [Limitation 1]
- [Limitation 2]
- [Limitation 3]
9.3 Assumptions Made
- [Assumption 1]
- [Assumption 2]
- [Assumption 3]
10. Appendices
Appendix A: Technical Details
Software Environment:
- Python: [Version]
- Key Libraries: pandas ([Version]), numpy ([Version]), scipy ([Version]), matplotlib ([Version])
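The version fields can be collected automatically rather than typed by hand. The sketch below uses only the standard library and reports "not installed" for any package that is missing; the helper name `installed_version` is hypothetical.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or a note if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


environment = {"python": sys.version.split()[0]}
for package in ("pandas", "numpy", "scipy", "matplotlib"):
    environment[package] = installed_version(package)

for name, version in environment.items():
    print(f"{name}: {version}")
```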
Analysis Scripts: [Link to repository or location]
Appendix B: Variable Dictionary
| Variable Name | Type | Description | Unit | Valid Range | Missing % |
|---|---|---|---|---|---|
| [Var 1] | [Type] | [Description] | [Unit] | [Range] | [%] |
| [Var 2] | [Type] | [Description] | [Unit] | [Range] | [%] |
Appendix C: Statistical Test Results
[Detailed statistical test outputs]
Normality Tests:
| Variable | Test | Statistic | p-value | Result |
|---|---|---|---|---|
| [Var 1] | Shapiro-Wilk | [Value] | [Value] | [Normal/Non-normal] |
Correlation Tests:
| Var 1 | Var 2 | Coefficient | p-value | Significance |
|---|---|---|---|---|
| [Var 1] | [Var 2] | [Value] | [Value] | [Yes/No] |
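
A row of the correlation-tests table could be produced as follows. This is a sketch assuming a Pearson test via `scipy.stats.pearsonr` and a 0.05 significance level; the data is hypothetical.

```python
from scipy import stats

# Hypothetical paired observations with a near-linear relationship.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

# pearsonr returns the coefficient and a two-sided p-value.
coefficient, p_value = stats.pearsonr(x, y)

alpha = 0.05  # assumed significance level
significance = "Yes" if p_value < alpha else "No"

print(f"Coefficient: {coefficient:.3f}, p-value: {p_value:.3g}, "
      f"Significance: {significance}")
```

When many variable pairs are tested, some will fall below the threshold by chance, so a multiple-comparison correction may be worth noting in the report.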
Appendix D: Full Visualization Gallery
[Links to all generated visualizations]
Contact Information
Analyst: [Name]
Email: [Email]
Date: [Date]
Version: [Version number]
Document History:
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0 | [Date] | Initial analysis | [Name] |

Visualization Placeholders:
- Distribution of [Variable]: path/to/distribution.png
- Distribution of [Variable]: path/to/categorical.png
- Time Series of [Variable]: path/to/timeseries.png



