Mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git (synced 2026-01-26 16:58:56 +08:00)

Commit: More thinking skills
```diff
@@ -69,7 +69,9 @@
     "source": "./",
     "strict": false,
     "skills": [
-      "./scientific-thinking/hypothesis-generation"
+      "./scientific-thinking/hypothesis-generation",
+      "./scientific-thinking/scientific-critical-thinking",
+      "./scientific-thinking/statistical-analysis"
     ]
   }
 ]
```

scientific-thinking/scientific-critical-thinking/SKILL.md (new file, 519 lines)
@@ -0,0 +1,519 @@
---
name: scientific-critical-thinking
description: Apply rigorous scientific critical thinking to evaluate research, methodology, claims, and evidence. Use this skill when analyzing scientific papers, reviewing experimental designs, evaluating statistical analyses, identifying biases or logical fallacies, assessing evidence quality, designing studies, or critically examining any scientific claims or arguments.
---

# Scientific Critical Thinking

## Overview

Apply systematic, rigorous critical thinking to scientific work using established methodological principles, evidence evaluation frameworks, and logical reasoning. Analyze research methodology, identify biases and fallacies, evaluate statistical claims, assess evidence quality, and provide constructive critique grounded in scientific principles.

## Core Capabilities

### 1. Methodology Critique

Evaluate research methodology for rigor, validity, and potential flaws.

**Apply when:**
- Reviewing research papers
- Assessing experimental designs
- Evaluating study protocols
- Planning new research

**Evaluation framework:**

1. **Study Design Assessment**
   - Is the design appropriate for the research question?
   - Can the design support the causal claims being made?
   - Are comparison groups appropriate and adequate?
   - Consider whether experimental, quasi-experimental, or observational design is justified

2. **Validity Analysis**
   - **Internal validity:** Can we trust the causal inference?
     - Check randomization quality
     - Evaluate confounding control
     - Assess selection bias
     - Review attrition/dropout patterns
   - **External validity:** Do results generalize?
     - Evaluate sample representativeness
     - Consider ecological validity of setting
     - Assess whether conditions match target application
   - **Construct validity:** Do measures capture intended constructs?
     - Review measurement validation
     - Check operational definitions
     - Assess whether measures are direct or proxy
   - **Statistical conclusion validity:** Are statistical inferences sound?
     - Verify adequate power/sample size
     - Check assumption compliance
     - Evaluate test appropriateness

3. **Control and Blinding**
   - Was randomization properly implemented (sequence generation, allocation concealment)?
   - Was blinding feasible and implemented (participants, providers, assessors)?
   - Are control conditions appropriate (placebo, active control, no treatment)?
   - Could performance or detection bias affect results?

4. **Measurement Quality**
   - Are instruments validated and reliable?
   - Are measures objective when possible, or subjective with acknowledged limitations?
   - Is outcome assessment standardized?
   - Are multiple measures used to triangulate findings?

**Reference:** See `references/scientific_method.md` for detailed principles and `references/experimental_design.md` for a comprehensive design checklist.

### 2. Bias Detection

Identify and evaluate potential sources of bias that could distort findings.

**Apply when:**
- Reviewing published research
- Designing new studies
- Interpreting conflicting evidence
- Assessing research quality

**Systematic bias review:**

1. **Cognitive Biases (Researcher)**
   - **Confirmation bias:** Are only supporting findings highlighted?
   - **HARKing:** Were hypotheses stated a priori or formed after seeing results?
   - **Publication bias:** Are negative results missing from the literature?
   - **Cherry-picking:** Is evidence selectively reported?
   - Check for preregistration and analysis plan transparency

2. **Selection Biases**
   - **Sampling bias:** Is the sample representative of the target population?
   - **Volunteer bias:** Do participants self-select in systematic ways?
   - **Attrition bias:** Is dropout differential between groups?
   - **Survivorship bias:** Are only "survivors" visible in the sample?
   - Examine participant flow diagrams and compare baseline characteristics

3. **Measurement Biases**
   - **Observer bias:** Could expectations influence observations?
   - **Recall bias:** Are retrospective reports systematically inaccurate?
   - **Social desirability:** Are responses biased toward acceptability?
   - **Instrument bias:** Do measurement tools systematically err?
   - Evaluate blinding, validation, and measurement objectivity

4. **Analysis Biases**
   - **P-hacking:** Were multiple analyses conducted until significance emerged?
   - **Outcome switching:** Were non-significant outcomes replaced with significant ones?
   - **Selective reporting:** Are all planned analyses reported?
   - **Subgroup fishing:** Were subgroup analyses conducted without correction?
   - Check for study registration and compare to published outcomes

5. **Confounding**
   - What variables could affect both exposure and outcome?
   - Were confounders measured and controlled (statistically or by design)?
   - Could unmeasured confounding explain findings?
   - Are there plausible alternative explanations?

**Reference:** See `references/common_biases.md` for a comprehensive bias taxonomy with detection and mitigation strategies.

### 3. Statistical Analysis Evaluation

Critically assess statistical methods, interpretation, and reporting.

**Apply when:**
- Reviewing quantitative research
- Evaluating data-driven claims
- Assessing clinical trial results
- Reviewing meta-analyses

**Statistical review checklist:**

1. **Sample Size and Power**
   - Was a priori power analysis conducted? (A worked sketch follows this checklist.)
   - Is the sample adequate for detecting meaningful effects?
   - Is the study underpowered (a common problem)?
   - Do significant results from small samples raise flags for inflated effect sizes?

2. **Statistical Tests**
   - Are tests appropriate for the data type and distribution?
   - Were test assumptions checked and met?
   - Are parametric tests justified, or should non-parametric alternatives be used?
   - Is the analysis matched to the study design (e.g., paired vs. independent)?

3. **Multiple Comparisons**
   - Were multiple hypotheses tested?
   - Was correction applied (Bonferroni, FDR, other)? (The sketch after this checklist shows both.)
   - Are primary outcomes distinguished from secondary/exploratory?
   - Could findings be false positives from multiple testing?

4. **P-Value Interpretation**
   - Are p-values interpreted correctly (the probability of data at least as extreme as observed, assuming the null hypothesis is true)?
   - Is non-significance incorrectly interpreted as "no effect"?
   - Is statistical significance conflated with practical importance?
   - Are exact p-values reported, or only "p < .05"?
   - Is there suspicious clustering just below .05?

5. **Effect Sizes and Confidence Intervals**
   - Are effect sizes reported alongside significance?
   - Are confidence intervals provided to show precision?
   - Is the effect size meaningful in practical terms?
   - Are standardized effect sizes interpreted with field-specific context?

6. **Missing Data**
   - How much data is missing?
   - Is the missing data mechanism considered (MCAR, MAR, MNAR)?
   - How is missing data handled (deletion, imputation, maximum likelihood)?
   - Could missing data bias results?

7. **Regression and Modeling**
   - Is the model overfitted (too many predictors, no cross-validation)?
   - Are predictions made outside the data range (extrapolation)?
   - Are multicollinearity issues addressed?
   - Are model assumptions checked?

8. **Common Pitfalls**
   - Correlation treated as causation
   - Ignoring regression to the mean
   - Base rate neglect
   - Texas sharpshooter fallacy (pattern finding in noise)
   - Simpson's paradox (confounding by subgroups; illustrated at the end of this section)

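The two most mechanical items above, a priori power analysis (item 1) and multiple-comparison correction (item 3), can be checked directly. A minimal sketch, assuming `statsmodels` is available; the effect size, alpha, power, and p-values are illustrative, not from any particular study:

```python
# Minimal sketch: a priori power analysis and multiple-comparison correction.
# All numbers below are illustrative assumptions, not real study values.
from statsmodels.stats.power import TTestIndPower
from statsmodels.stats.multitest import multipletests

# Item 1: how many participants per group does a two-sample t-test need
# to detect a standardized effect of d = 0.5 with 80% power at alpha = .05?
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64 per group

# Item 3: correct a family of raw p-values for multiple testing.
p_values = [0.001, 0.012, 0.034, 0.049, 0.21]  # example raw p-values
for method in ("bonferroni", "fdr_bh"):
    reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    print(method, [f"{p:.3f}" for p in p_adjusted], reject)
```

In this example Bonferroni retains only the smallest p-value while Benjamini-Hochberg also retains the second, which is the practical point: conclusions can depend on the correction chosen, so the correction (and the family of tests it covers) should be prespecified.
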
**Reference:** See `references/statistical_pitfalls.md` for detailed pitfalls and correct practices.

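Simpson's paradox, the last pitfall in the checklist above, is easiest to see with numbers. A minimal sketch with fabricated, purely illustrative counts in which one treatment is better within every subgroup yet looks worse in the pooled comparison, because it was given mostly to the harder cases:

```python
# Simpson's paradox with fabricated, purely illustrative counts.
import pandas as pd

df = pd.DataFrame({
    "severity":  ["mild", "mild", "severe", "severe"],
    "treatment": ["A", "B", "A", "B"],
    "recovered": [81, 234, 192, 55],
    "total":     [87, 270, 263, 80],
})
df["rate"] = df["recovered"] / df["total"]

# Within each severity stratum, treatment A has the higher recovery rate...
print(df.pivot(index="severity", columns="treatment", values="rate").round(2))

# ...but aggregated over strata, A looks worse, because A was mostly given to severe cases.
overall = df.groupby("treatment")[["recovered", "total"]].sum()
print((overall["recovered"] / overall["total"]).round(2))
```

Whether the pooled or the stratified comparison is the right one depends on the causal structure (here, severity confounds treatment choice), which is why subgroup structure and confounding must be examined before interpreting aggregate effects.
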
### 4. Evidence Quality Assessment

Evaluate the strength and quality of evidence systematically.

**Apply when:**
- Weighing evidence for decisions
- Conducting literature reviews
- Comparing conflicting findings
- Determining confidence in conclusions

**Evidence evaluation framework:**

1. **Study Design Hierarchy**
   - Systematic reviews/meta-analyses (highest for intervention effects)
   - Randomized controlled trials
   - Cohort studies
   - Case-control studies
   - Cross-sectional studies
   - Case series/reports
   - Expert opinion (lowest)

   **Important:** Higher-level designs aren't always better quality. A well-designed observational study can be stronger than a poorly conducted RCT.

2. **Quality Within Design Type**
   - Risk of bias assessment (use an appropriate tool: Cochrane ROB, Newcastle-Ottawa, etc.)
   - Methodological rigor
   - Transparency and reporting completeness
   - Conflicts of interest

3. **GRADE Considerations (if applicable)**
   - Start with design type (RCT = high, observational = low)
   - **Downgrade for:**
     - Risk of bias
     - Inconsistency across studies
     - Indirectness (wrong population/intervention/outcome)
     - Imprecision (wide confidence intervals, small samples)
     - Publication bias
   - **Upgrade for:**
     - Large effect sizes
     - Dose-response relationships
     - Confounders would reduce (not increase) the effect

4. **Convergence of Evidence**
   - **Stronger when:**
     - Multiple independent replications
     - Different research groups and settings
     - Different methodologies converge on the same conclusion
     - Mechanistic and empirical evidence align
   - **Weaker when:**
     - Single study or research group
     - Contradictory findings in the literature
     - Publication bias evident
     - No replication attempts

5. **Contextual Factors**
   - Biological/theoretical plausibility
   - Consistency with established knowledge
   - Temporality (cause precedes effect)
   - Specificity of relationship
   - Strength of association

**Reference:** See `references/evidence_hierarchy.md` for the detailed hierarchy, GRADE system, and quality assessment tools.

### 5. Logical Fallacy Identification

Detect and name logical errors in scientific arguments and claims.

**Apply when:**
- Evaluating scientific claims
- Reviewing discussion/conclusion sections
- Assessing popular science communication
- Identifying flawed reasoning

**Common fallacies in science:**

1. **Causation Fallacies**
   - **Post hoc ergo propter hoc:** "B followed A, so A caused B"
   - **Correlation = causation:** Confusing association with causality
   - **Reverse causation:** Mistaking cause for effect
   - **Single cause fallacy:** Attributing complex outcomes to one factor

2. **Generalization Fallacies**
   - **Hasty generalization:** Broad conclusions from small samples
   - **Anecdotal fallacy:** Personal stories as proof
   - **Cherry-picking:** Selecting only supporting evidence
   - **Ecological fallacy:** Group patterns applied to individuals

3. **Authority and Source Fallacies**
   - **Appeal to authority:** "Expert said it, so it's true" (without evidence)
   - **Ad hominem:** Attacking the person, not the argument
   - **Genetic fallacy:** Judging by origin, not merits
   - **Appeal to nature:** "Natural = good/safe"

4. **Statistical Fallacies**
   - **Base rate neglect:** Ignoring prior probability
   - **Texas sharpshooter:** Finding patterns in random data
   - **Multiple comparisons:** Not correcting for multiple tests
   - **Prosecutor's fallacy:** Confusing P(E|H) with P(H|E)

5. **Structural Fallacies**
   - **False dichotomy:** "Either A or B" when more options exist
   - **Moving goalposts:** Changing evidence standards after they're met
   - **Begging the question:** Circular reasoning
   - **Straw man:** Misrepresenting arguments to attack them

6. **Science-Specific Fallacies**
   - **Galileo gambit:** "They laughed at Galileo, so my fringe idea is correct"
   - **Argument from ignorance:** "Not proven false, so true"
   - **Nirvana fallacy:** Rejecting imperfect solutions
   - **Unfalsifiability:** Making untestable claims

**When identifying fallacies:**
- Name the specific fallacy
- Explain why the reasoning is flawed
- Identify what evidence would be needed for valid inference
- Note that fallacious reasoning doesn't prove the conclusion false, just that this argument doesn't support it

**Reference:** See `references/logical_fallacies.md` for a comprehensive fallacy catalog with examples and detection strategies.

### 6. Research Design Guidance

Provide constructive guidance for planning rigorous studies.

**Apply when:**
- Helping design new experiments
- Planning research projects
- Reviewing research proposals
- Improving study protocols

**Design process:**

1. **Research Question Refinement**
   - Ensure the question is specific, answerable, and falsifiable
   - Verify it addresses a gap or contradiction in the literature
   - Confirm feasibility (resources, ethics, time)
   - Define variables operationally

2. **Design Selection**
   - Match design to question (causal → experimental; associational → observational)
   - Consider feasibility and ethical constraints
   - Choose between-subjects, within-subjects, or mixed designs
   - Plan factorial designs if testing multiple factors

3. **Bias Minimization Strategy**
   - Implement randomization when possible
   - Plan blinding at all feasible levels (participants, providers, assessors)
   - Identify and plan to control confounds (randomization, matching, stratification, statistical adjustment)
   - Standardize all procedures
   - Plan to minimize attrition

4. **Sample Planning**
   - Conduct a priori power analysis (specify expected effect, desired power, alpha)
   - Account for attrition in the sample size
   - Define clear inclusion/exclusion criteria
   - Consider recruitment strategy and feasibility
   - Plan for sample representativeness

5. **Measurement Strategy**
   - Select validated, reliable instruments
   - Use objective measures when possible
   - Plan multiple measures of key constructs (triangulation)
   - Ensure measures are sensitive to expected changes
   - Establish inter-rater reliability procedures

6. **Analysis Planning**
   - Prespecify all hypotheses and analyses
   - Designate the primary outcome clearly
   - Plan statistical tests with assumption checks
   - Specify how missing data will be handled
   - Plan to report effect sizes and confidence intervals
   - Consider multiple comparison corrections

7. **Transparency and Rigor**
   - Preregister the study and analysis plan
   - Use reporting guidelines (CONSORT, STROBE, PRISMA)
   - Plan to report all outcomes, not just significant ones
   - Distinguish confirmatory from exploratory analyses
   - Commit to data/code sharing

**Reference:** See `references/experimental_design.md` for a comprehensive design checklist covering all stages from question to dissemination.

### 7. Claim Evaluation

Systematically evaluate scientific claims for validity and support.

**Apply when:**
- Assessing conclusions in papers
- Evaluating media reports of research
- Reviewing abstract or introduction claims
- Checking if data support conclusions

**Claim evaluation process:**

1. **Identify the Claim**
   - What exactly is being claimed?
   - Is it a causal claim, an associational claim, or a descriptive claim?
   - How strong is the claim (proven, likely, suggested, possible)?

2. **Assess the Evidence**
   - What evidence is provided?
   - Is the evidence direct or indirect?
   - Is the evidence sufficient for the strength of the claim?
   - Are alternative explanations ruled out?

3. **Check Logical Connection**
   - Do conclusions follow from the data?
   - Are there logical leaps?
   - Is correlational data used to support causal claims?
   - Are limitations acknowledged?

4. **Evaluate Proportionality**
   - Is confidence proportional to evidence strength?
   - Are hedging words used appropriately?
   - Are limitations downplayed?
   - Is speculation clearly labeled?

5. **Check for Overgeneralization**
   - Do claims extend beyond the sample studied?
   - Are population restrictions acknowledged?
   - Is context-dependence recognized?
   - Are caveats about generalization included?

6. **Red Flags**
   - Causal language from correlational studies
   - "Proves" or absolute certainty
   - Cherry-picked citations
   - Ignoring contradictory evidence
   - Dismissing limitations
   - Extrapolation beyond the data

**Provide specific feedback:**
- Quote the problematic claim
- Explain what evidence would be needed to support it
- Suggest appropriate hedging language if warranted
- Distinguish between data (what was found) and interpretation (what it means)

## Application Guidelines

### General Approach

1. **Be Constructive**
   - Identify strengths as well as weaknesses
   - Suggest improvements rather than just criticizing
   - Distinguish between fatal flaws and minor limitations
   - Recognize that all research has limitations

2. **Be Specific**
   - Point to specific instances (e.g., "Table 2 shows..." or "In the Methods section...")
   - Quote problematic statements
   - Provide concrete examples of issues
   - Reference specific principles or standards violated

3. **Be Proportionate**
   - Match criticism severity to issue importance
   - Distinguish between major threats to validity and minor concerns
   - Consider whether issues affect the primary conclusions
   - Acknowledge uncertainty in your own assessments

4. **Apply Consistent Standards**
   - Use the same criteria across all studies
   - Don't apply stricter standards to findings you dislike
   - Acknowledge your own potential biases
   - Base judgments on methodology, not results

5. **Consider Context**
   - Acknowledge practical and ethical constraints
   - Consider field-specific norms for effect sizes and methods
   - Recognize exploratory vs. confirmatory contexts
   - Account for resource limitations in evaluating studies

### When Providing Critique

**Structure feedback as:**

1. **Summary:** Brief overview of what was evaluated
2. **Strengths:** What was done well (important for credibility and learning)
3. **Concerns:** Issues organized by severity
   - Critical issues (threaten validity of main conclusions)
   - Important issues (affect interpretation but not fatally)
   - Minor issues (worth noting but don't change conclusions)
4. **Specific Recommendations:** Actionable suggestions for improvement
5. **Overall Assessment:** Balanced conclusion about evidence quality and what can be concluded

**Use precise terminology:**
- Name specific biases, fallacies, and methodological issues
- Reference established standards and guidelines
- Cite principles from scientific methodology
- Use technical terms accurately

### When Uncertain

- **Acknowledge uncertainty:** "This could be X or Y; the additional information needed is Z"
- **Ask clarifying questions:** "Was [methodological detail] done? This affects interpretation."
- **Provide conditional assessments:** "If X was done, then Y follows; if not, then Z is a concern"
- **Note what additional information would resolve the uncertainty**

## Reference Materials

This skill includes comprehensive reference materials that provide detailed frameworks for critical evaluation:

- **`references/scientific_method.md`** - Core principles of scientific methodology, the scientific process, critical evaluation criteria, red flags in scientific claims, causal inference standards, peer review, and open science principles

- **`references/common_biases.md`** - Comprehensive taxonomy of cognitive, experimental, methodological, statistical, and analysis biases with detection and mitigation strategies

- **`references/statistical_pitfalls.md`** - Common statistical errors and misinterpretations including p-value misunderstandings, multiple comparisons problems, sample size issues, effect size mistakes, correlation/causation confusion, regression pitfalls, and meta-analysis issues

- **`references/evidence_hierarchy.md`** - Traditional evidence hierarchy, GRADE system, study quality assessment criteria, domain-specific considerations, evidence synthesis principles, and practical decision frameworks

- **`references/logical_fallacies.md`** - Logical fallacies common in scientific discourse organized by type (causation, generalization, authority, relevance, structure, statistical) with examples and detection strategies

- **`references/experimental_design.md`** - Comprehensive experimental design checklist covering research questions, hypotheses, study design selection, variables, sampling, blinding, randomization, control groups, procedures, measurement, bias minimization, data management, statistical planning, ethical considerations, validity threats, and reporting standards

**When to consult references:**
- Load references into context when detailed frameworks are needed
- Use grep to search references for specific topics: `grep -r "pattern" references/`
- References provide depth; SKILL.md provides procedural guidance
- Consult references for comprehensive lists, detailed criteria, and specific examples

## Remember

**Scientific critical thinking is about:**
- Systematic evaluation using established principles
- Constructive critique that improves science
- Confidence proportional to evidence strength
- Transparency about uncertainty and limitations
- Consistent application of standards
- Recognition that all research has limitations
- Balance between skepticism and openness to evidence

**Always distinguish between:**
- Data (what was observed) and interpretation (what it means)
- Correlation and causation
- Statistical significance and practical importance
- Exploratory and confirmatory findings
- What is known and what is uncertain
- Evidence against a claim and evidence for the null

**Goals of critical thinking:**
1. Identify strengths and weaknesses accurately
2. Determine what conclusions are supported
3. Recognize limitations and uncertainties
4. Suggest improvements for future work
5. Advance scientific understanding

references/common_biases.md (new file, 364 lines)
@@ -0,0 +1,364 @@
# Common Biases in Scientific Research

## Cognitive Biases Affecting Researchers

### 1. Confirmation Bias
**Description:** Tendency to search for, interpret, and recall information that confirms preexisting beliefs.

**Manifestations:**
- Designing studies that can only support the hypothesis
- Interpreting ambiguous results as supportive
- Remembering hits and forgetting misses
- Selectively citing literature that agrees

**Mitigation:**
- Preregister hypotheses and analysis plans
- Actively seek disconfirming evidence
- Use blinded data analysis
- Consider alternative hypotheses

### 2. Hindsight Bias (I-Knew-It-All-Along Effect)
**Description:** After an event, people perceive it as having been more predictable than it actually was.

**Manifestations:**
- HARKing (Hypothesizing After Results are Known)
- Claiming predictions that weren't made
- Underestimating surprise at results

**Mitigation:**
- Document predictions before data collection
- Preregister studies
- Distinguish exploratory from confirmatory analyses

### 3. Publication Bias (File Drawer Problem)
**Description:** Positive/significant results are more likely to be published than negative/null results.

**Manifestations:**
- Literature appears to support effects that don't exist
- Overestimation of effect sizes
- Inability to estimate true effects from the published literature

**Mitigation:**
- Publish null results
- Use preregistration and registered reports
- Conduct systematic reviews with grey literature
- Check for funnel plot asymmetry in meta-analyses

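The funnel-plot check in the last mitigation point can be quantified with Egger's regression test: regress each study's standardized effect (effect / SE) on its precision (1 / SE); an intercept far from zero suggests small-study asymmetry of the kind publication bias produces. A minimal sketch, assuming `statsmodels` is available; the effect sizes and standard errors below are made up for illustration:

```python
# Minimal Egger-style asymmetry check; the effects and standard errors are illustrative.
import numpy as np
import statsmodels.api as sm

effects = np.array([0.55, 0.48, 0.42, 0.30, 0.25, 0.22, 0.18, 0.15])  # per-study estimates
ses     = np.array([0.30, 0.25, 0.22, 0.15, 0.12, 0.10, 0.08, 0.07])  # per-study standard errors

z         = effects / ses   # standardized effects
precision = 1.0 / ses       # inverse standard error

model = sm.OLS(z, sm.add_constant(precision)).fit()
intercept, intercept_p = model.params[0], model.pvalues[0]
print(f"Egger intercept = {intercept:.2f} (p = {intercept_p:.3f})")
# An intercept clearly different from zero hints at funnel asymmetry;
# it is a screen for possible bias, not proof that bias is present.
```
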
### 4. Anchoring Bias
**Description:** Over-reliance on the first piece of information encountered.

**Manifestations:**
- Initial hypotheses unduly influence interpretation
- First studies in a field set expectations
- Pilot data biases main study interpretation

**Mitigation:**
- Consider multiple initial hypotheses
- Evaluate evidence independently
- Use structured decision-making

### 5. Availability Heuristic
**Description:** Overestimating the likelihood of events based on how easily examples come to mind.

**Manifestations:**
- Overemphasizing recent or dramatic findings
- Neglecting base rates
- Anecdotal evidence overshadowing statistics

**Mitigation:**
- Consult systematic reviews, not memorable papers
- Consider base rates explicitly
- Use statistical thinking, not intuition

### 6. Bandwagon Effect
**Description:** Adopting beliefs because many others hold them.

**Manifestations:**
- Following research trends without critical evaluation
- Citing widely cited papers without reading them
- Accepting "textbook knowledge" uncritically

**Mitigation:**
- Evaluate evidence independently
- Read original sources
- Question assumptions

### 7. Belief Perseverance
**Description:** Maintaining beliefs even after the evidence for them has been discredited.

**Manifestations:**
- Defending theories despite contradictory evidence
- Finding ad hoc explanations for discrepant results
- Dismissing replication failures

**Mitigation:**
- Explicitly consider what evidence would change your mind
- Update beliefs based on evidence
- Distinguish between theories and ego

### 8. Outcome Bias
**Description:** Judging decisions based on outcomes rather than the quality of the decision at the time.

**Manifestations:**
- Valuing lucky guesses over sound methodology
- Dismissing good studies with null results
- Rewarding sensational findings over rigorous methods

**Mitigation:**
- Evaluate methodology independently of results
- Value rigor and transparency
- Recognize the role of chance

## Experimental and Methodological Biases

### 9. Selection Bias
**Description:** Systematic differences between those selected for study and those not selected.

**Types:**
- **Sampling bias:** Non-random sample
- **Attrition bias:** Systematic dropout
- **Volunteer bias:** Self-selected participants differ
- **Berkson's bias:** Hospital patients differ from the general population
- **Survivorship bias:** Only examining "survivors"

**Detection:**
- Compare characteristics of participants vs. target population
- Analyze dropout patterns
- Consider who is missing from the sample

**Mitigation:**
- Random sampling
- Track and analyze non-responders
- Use strategies to minimize dropout
- Report participant flow diagrams

### 10. Observer Bias (Detection Bias)
**Description:** Researchers' expectations influence observations or measurements.

**Manifestations:**
- Measuring outcomes differently across groups
- Interpreting ambiguous results based on group assignment
- Unconsciously cueing participants

**Mitigation:**
- Blinding of observers/assessors
- Objective, automated measurements
- Standardized protocols
- Inter-rater reliability checks

### 11. Performance Bias
**Description:** Systematic differences in care provided to comparison groups.

**Manifestations:**
- Treating the experimental group differently
- Providing additional attention to one group
- Differential adherence to protocols

**Mitigation:**
- Standardize all procedures
- Blind participants and providers
- Use placebo controls
- Monitor protocol adherence

### 12. Measurement Bias (Information Bias)
**Description:** Systematic errors in how variables are measured.

**Types:**
- **Recall bias:** Systematic differences in accuracy of recall
- **Social desirability bias:** Responding in socially acceptable ways
- **Interviewer bias:** Interviewer's characteristics affect responses
- **Instrument bias:** Measurement tools systematically err

**Mitigation:**
- Use validated, objective measures
- Standardize data collection
- Blind participants to hypotheses
- Verify self-reports with objective data

### 13. Confounding Bias
**Description:** The effect of an extraneous variable is mixed with the variable of interest.

**Examples:**
- Age confounding the relationship between exercise and health
- Socioeconomic status confounding education and outcomes
- Indication bias in treatment studies

**Mitigation:**
- Randomization
- Matching
- Statistical adjustment
- Stratification
- Restriction

### 14. Reporting Bias
**Description:** Selective reporting of results.

**Types:**
- **Outcome reporting bias:** Selectively reporting outcomes
- **Time-lag bias:** Delayed publication of negative results
- **Language bias:** Positive results more likely to be published in English-language journals
- **Citation bias:** Preferentially citing positive studies

**Mitigation:**
- Preregister all outcomes
- Report all planned analyses
- Distinguish primary from secondary outcomes
- Use study registries

### 15. Spectrum Bias
**Description:** Test performance varies depending on the spectrum of disease severity in the sample.

**Manifestations:**
- Diagnostic tests appearing more accurate in extreme cases
- Treatment effects differing by severity

**Mitigation:**
- Test in representative samples
- Report performance across the disease spectrum
- Avoid case-control designs for diagnostic studies

### 16. Lead-Time Bias
**Description:** Apparent survival benefit due to earlier detection, not improved outcomes.

**Example:**
- Screening that detects disease earlier makes survival from diagnosis seem longer, even if death occurs at the same age

**Mitigation:**
- Measure mortality, not just survival from diagnosis
- Use randomized screening trials
- Consider length-time and overdiagnosis bias

### 17. Length-Time Bias
**Description:** Screening disproportionately detects slower-growing, less aggressive cases.

**Example:**
- Slow-growing cancers are detected more often than fast-growing ones, making screening appear beneficial

**Mitigation:**
- Randomized trials with mortality endpoints
- Consider disease natural history

### 18. Response Bias
**Description:** Systematic pattern in how participants respond.

**Types:**
- **Acquiescence bias:** Tendency to agree
- **Extreme responding:** Always choosing extreme options
- **Neutral responding:** Avoiding extreme responses
- **Demand characteristics:** Responding based on perceived expectations

**Mitigation:**
- Mix positively and negatively worded items
- Use multiple response formats
- Blind participants to hypotheses
- Use behavioral measures

## Statistical and Analysis Biases

### 19. P-Hacking (Data Dredging)
**Description:** Manipulating data or analyses until significant results emerge.

**Manifestations:**
- Collecting data until significance is reached
- Testing multiple outcomes, reporting only significant ones
- Trying multiple analysis methods
- Excluding "outliers" to reach significance
- Subgroup analyses until finding significance

**Detection:**
- Suspiciously perfect p-values (just below .05)
- Many researcher degrees of freedom
- Undisclosed analyses
- Fishing expeditions

**Mitigation:**
- Preregister analysis plans
- Report all analyses conducted
- Correct for multiple comparisons
- Distinguish exploratory from confirmatory

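The core problem is easy to demonstrate by simulation: with no real effect anywhere, testing many outcomes and reporting "whatever came out significant" produces spurious findings at a high rate. A minimal sketch with illustrative parameters, assuming NumPy and SciPy:

```python
# Simulation: with no true effects, testing 10 outcomes per "study" and keeping any
# p < .05 yields a false-positive "discovery" in roughly 40% of studies.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_outcomes, n_per_group = 2000, 10, 30

false_positive_studies = 0
for _ in range(n_studies):
    # Both groups are drawn from the SAME distribution for every outcome: the null is true.
    a = rng.normal(size=(n_outcomes, n_per_group))
    b = rng.normal(size=(n_outcomes, n_per_group))
    p = stats.ttest_ind(a, b, axis=1).pvalue
    if (p < 0.05).any():            # "report whichever outcome worked"
        false_positive_studies += 1

print(false_positive_studies / n_studies)  # ~0.40, far above the nominal 0.05
```

This is why prespecified primary outcomes and multiple-comparison corrections matter: the nominal 5% error rate applies to a single prespecified test, not to the best of many.
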
### 20. HARKing (Hypothesizing After Results are Known)
**Description:** Presenting post hoc hypotheses as if they were predicted a priori.

**Why problematic:**
- Inflates apparent evidence
- Conflates exploration with confirmation
- Misrepresents the scientific process

**Mitigation:**
- Preregister hypotheses
- Clearly label exploratory analyses
- Require replication of unexpected findings

### 21. Base Rate Neglect
**Description:** Ignoring prior probability when evaluating evidence.

**Example:**
- A test with 95% sensitivity and 95% specificity for a rare disease (1% prevalence): a positive result indicates disease only about 16% of the time

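That 16% figure follows directly from Bayes' theorem. A quick worked calculation, assuming (as the example does) 95% sensitivity and 95% specificity:

```python
# Worked example: positive predictive value of a 95%-sensitive, 95%-specific test
# for a disease with 1% prevalence (the numbers from the example above).
prevalence  = 0.01
sensitivity = 0.95   # P(test+ | disease)
specificity = 0.95   # P(test- | no disease)

p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)  # P(test+)
ppv   = sensitivity * prevalence / p_pos                                 # P(disease | test+)
print(f"P(disease | positive test) = {ppv:.2%}")   # ~16%: the low base rate dominates
```
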
**Mitigation:**
- Always consider base rates/prior probability
- Use Bayesian reasoning
- Report positive and negative predictive values

### 22. Regression to the Mean
**Description:** Extreme measurements tend to be followed by less extreme ones.

**Manifestations:**
- Treatment effects in extreme groups may be regression artifacts
- "Sophomore slump" in high performers

**Mitigation:**
- Use control groups
- Consider natural variation
- Don't select based on extreme baseline values without controls

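A short simulation makes the artifact concrete: select the worst scorers at baseline, do nothing at all, and their average still "improves" at retest simply because the measurement noise does not repeat. A minimal sketch with illustrative numbers, assuming NumPy:

```python
# Regression to the mean: select extreme baseline scores, apply NO treatment, retest.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_score = rng.normal(50, 10, n)              # stable underlying trait
baseline   = true_score + rng.normal(0, 10, n)  # noisy measurement at time 1
retest     = true_score + rng.normal(0, 10, n)  # independent noise at time 2

worst = baseline < np.percentile(baseline, 10)  # "enroll" the bottom 10% at baseline
print(f"baseline mean of selected group: {baseline[worst].mean():.1f}")
print(f"retest mean of selected group:   {retest[worst].mean():.1f}")
# The selected group moves back toward 50 with no intervention; without a control
# group this shift would be indistinguishable from a treatment effect.
```
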
### 23. Texas Sharpshooter Fallacy
**Description:** Selecting data after seeing patterns, like shooting arrows and then drawing targets around the clusters.

**Manifestations:**
- Finding patterns in random data
- Subgroup analyses selected post hoc
- Geographic clustering studies without correction

**Mitigation:**
- Prespecify hypotheses
- Correct for multiple comparisons
- Replicate findings in independent data

## Reducing Bias: Best Practices

### Study Design
1. Randomization
2. Blinding (single, double, triple)
3. Control groups
4. Adequate sample size
5. Preregistration

### Data Collection
1. Standardized protocols
2. Validated instruments
3. Objective measures when possible
4. Multiple observers/raters
5. Complete data collection

### Analysis
1. Intention-to-treat analysis
2. Prespecified analyses
3. Appropriate statistical tests
4. Multiple comparison corrections
5. Sensitivity analyses

### Reporting
1. Complete transparency
2. CONSORT, PRISMA, or similar guidelines
3. Report all outcomes
4. Distinguish exploratory from confirmatory
5. Share data and code

### Meta-Level
1. Adversarial collaboration
2. Replication studies
3. Open science practices
4. Peer review
5. Systematic reviews

references/evidence_hierarchy.md (new file, 484 lines)
@@ -0,0 +1,484 @@
# Evidence Hierarchy and Quality Assessment

## Traditional Evidence Hierarchy (Medical/Clinical)

### Level 1: Systematic Reviews and Meta-Analyses
**Description:** Comprehensive synthesis of all available evidence on a question.

**Strengths:**
- Combines multiple studies for greater power
- Reduces the impact of single-study anomalies
- Can identify patterns across studies
- Quantifies overall effect size

**Weaknesses:**
- Quality depends on the included studies ("garbage in, garbage out")
- Publication bias can distort findings
- Heterogeneity may make pooling inappropriate
- Can mask important differences between studies

**Critical evaluation:**
- Was the search comprehensive (multiple databases, grey literature)?
- Were inclusion criteria appropriate and prespecified?
- Was study quality assessed?
- Was heterogeneity explored?
- Was publication bias assessed (funnel plots, fail-safe N)?
- Were appropriate statistical methods used?

### Level 2: Randomized Controlled Trials (RCTs)
**Description:** Experimental studies with random assignment to conditions.

**Strengths:**
- Gold standard for establishing causation
- Controls for known and unknown confounders
- Minimizes selection bias
- Enables causal inference

**Weaknesses:**
- May not be ethical or feasible
- Artificial settings may limit generalizability
- Often short-term with selected populations
- Expensive and time-consuming

**Critical evaluation:**
- Was randomization adequate (sequence generation, allocation concealment)?
- Was blinding implemented (participants, providers, assessors)?
- Was sample size adequate (power analysis)?
- Was intention-to-treat analysis used?
- Was the attrition rate acceptable and balanced?
- Are results generalizable?

### Level 3: Cohort Studies
**Description:** Observational studies following groups over time.

**Types:**
- **Prospective:** Follow forward from exposure to outcome
- **Retrospective:** Look backward at existing data

**Strengths:**
- Can study multiple outcomes
- Establishes temporal sequence
- Can calculate incidence and relative risk
- More feasible than RCTs for many questions

**Weaknesses:**
- Susceptible to confounding
- Selection bias possible
- Attrition can bias results
- Cannot prove causation definitively

**Critical evaluation:**
- Were cohorts comparable at baseline?
- Was exposure measured reliably?
- Was follow-up adequate and complete?
- Were potential confounders measured and controlled?
- Was outcome assessment blinded to exposure?

### Level 4: Case-Control Studies
**Description:** Compare people with the outcome (cases) to those without (controls), looking back at exposures.

**Strengths:**
- Efficient for rare outcomes
- Relatively quick and inexpensive
- Can study multiple exposures
- Useful for generating hypotheses

**Weaknesses:**
- Cannot calculate incidence
- Susceptible to recall bias
- Selection of controls is challenging
- Cannot prove causation

**Critical evaluation:**
- Were cases and controls defined clearly?
- Were controls appropriate (same source population)?
- Was matching appropriate?
- How was exposure ascertained (records vs. recall)?
- Were potential confounders controlled?
- Could recall bias explain the findings?

### Level 5: Cross-Sectional Studies
**Description:** Snapshot observation at a single point in time.

**Strengths:**
- Quick and inexpensive
- Can assess prevalence
- Useful for hypothesis generation
- Can study multiple outcomes and exposures

**Weaknesses:**
- Cannot establish temporal sequence
- Cannot determine causation
- Prevalence-incidence bias
- Survival bias

**Critical evaluation:**
- Was the sample representative?
- Were measures validated?
- Could reverse causation explain the findings?
- Are confounders acknowledged?

### Level 6: Case Series and Case Reports
**Description:** Descriptive reports of observations in clinical practice.

**Strengths:**
- Can identify new diseases or effects
- Hypothesis-generating
- Details rare phenomena
- Quick to report

**Weaknesses:**
- No control group
- No statistical inference possible
- Highly susceptible to bias
- Cannot establish causation or frequency

**Use:** Primarily for hypothesis generation and clinical description.

### Level 7: Expert Opinion
**Description:** Statements by recognized authorities.

**Strengths:**
- Synthesizes experience
- Useful when no research is available
- May integrate multiple sources

**Weaknesses:**
- Subjective and potentially biased
- May not reflect current evidence
- Appeal to authority fallacy risk
- Individual expertise varies

**Use:** Lowest level of evidence; should be supported by data when possible.

## Nuances and Limitations of Traditional Hierarchy

### When Lower-Level Evidence Can Be Strong
1. **Well-designed observational studies** with:
   - Large effects (hard to confound)
   - Dose-response relationships
   - Consistent findings across contexts
   - Biological plausibility
   - No plausible confounders

2. **Multiple converging lines of evidence** from different study types

3. **Natural experiments** approximating randomization

### When Higher-Level Evidence Can Be Weak
1. **Poor-quality RCTs** with:
   - Inadequate randomization
   - High attrition
   - No blinding when feasible
   - Conflicts of interest

2. **Biased meta-analyses**:
   - Publication bias
   - Selective inclusion
   - Inappropriate pooling
   - Poor search strategy

3. **Not addressing the right question**:
   - Wrong population
   - Wrong comparison
   - Wrong outcome
   - Too artificial to generalize

## Alternative: GRADE System

GRADE (Grading of Recommendations Assessment, Development and Evaluation) assesses evidence quality across four levels:

### High Quality
**Definition:** Very confident that the true effect is close to the estimated effect.

**Characteristics:**
- Well-conducted RCTs
- Overwhelming evidence from observational studies
- Large, consistent effects
- No serious limitations

### Moderate Quality
**Definition:** Moderately confident; the true effect is likely close to the estimate, but could be substantially different.

**Downgrades from high:**
- Some risk of bias
- Inconsistency across studies
- Indirectness (different populations/interventions)
- Imprecision (wide confidence intervals)
- Publication bias suspected

### Low Quality
**Definition:** Limited confidence; the true effect may be substantially different.

**Downgrades:**
- Serious limitations in the above factors
- Observational studies without special strengths

### Very Low Quality
**Definition:** Very limited confidence; the true effect is likely substantially different.

**Characteristics:**
- Very serious limitations
- Expert opinion
- Multiple serious flaws

## Study Quality Assessment Criteria

### Internal Validity (Bias Control)
**Questions:**
- Was randomization adequate?
- Was allocation concealed?
- Were groups similar at baseline?
- Was blinding implemented?
- Was attrition minimal and balanced?
- Was intention-to-treat used?
- Were all outcomes reported?

### External Validity (Generalizability)
**Questions:**
- Is the sample representative of the target population?
- Are inclusion/exclusion criteria too restrictive?
- Is the setting realistic?
- Are results applicable to other populations?
- Are effects consistent across subgroups?

### Statistical Conclusion Validity
**Questions:**
- Was sample size adequate (power)?
- Were statistical tests appropriate?
- Were assumptions checked?
- Were effect sizes and confidence intervals reported?
- Were multiple comparisons addressed?
- Was the analysis prespecified?

### Construct Validity (Measurement)
**Questions:**
- Were measures validated and reliable?
- Was the outcome defined clearly and appropriately?
- Were assessors blinded?
- Were exposures measured accurately?
- Was the timing of measurement appropriate?

## Critical Appraisal Tools

### For Different Study Types

**RCTs:**
- Cochrane Risk of Bias Tool
- Jadad Scale
- PEDro Scale (for trials in physical therapy)

**Observational Studies:**
- Newcastle-Ottawa Scale
- ROBINS-I (Risk of Bias in Non-randomized Studies)

**Diagnostic Studies:**
- QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)

**Systematic Reviews:**
- AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews)

**All Study Types:**
- CASP Checklists (Critical Appraisal Skills Programme)

## Domain-Specific Considerations

### Basic Science Research
**Hierarchy differs:**
1. Multiple convergent lines of evidence
2. Mechanistic understanding
3. Reproducible experiments
4. Established theoretical framework

**Key considerations:**
- Replication essential
- Mechanistic plausibility
- Consistency across model systems
- Convergence of methods

### Psychological Research
**Additional concerns:**
- Replication crisis
- Publication bias particularly problematic
- Small effect sizes often expected
- Cultural context matters
- Measures often indirect (self-report)

**Strong evidence includes:**
- Preregistered studies
- Large samples
- Multiple measures
- Behavioral (not just self-report) outcomes
- Cross-cultural replication

### Epidemiology
**Causal inference frameworks:**
- Bradford Hill criteria
- Rothman's causal pies
- Directed Acyclic Graphs (DAGs)

**Strong observational evidence:**
- Dose-response relationships
- Temporal consistency
- Biological plausibility
- Specificity
- Consistency across populations
- Large effects unlikely to be due to confounding

### Social Sciences
**Challenges:**
- Complex interventions
- Context-dependent effects
- Measurement challenges
- Ethical constraints on RCTs

**Strengthening evidence:**
- Mixed methods
- Natural experiments
- Instrumental variables
- Regression discontinuity designs
- Multiple operationalizations

## Synthesizing Evidence Across Studies

### Consistency
**Strong evidence:**
- Multiple studies, different investigators
- Different populations and settings
- Different research designs converge
- Different measurement methods

**Weak evidence:**
- Single study
- Only one research group
- Conflicting results
- Publication bias evident

### Biological/Theoretical Plausibility
**Strengthens evidence:**
- Known mechanism
- Consistent with other knowledge
- Dose-response relationship
- Coherent with animal/in vitro data

**Weakens evidence:**
- No plausible mechanism
- Contradicts established knowledge
- Biological implausibility

### Temporality
**Essential for causation:**
- Cause must precede effect
- Cross-sectional studies cannot establish it
- Reverse causation must be ruled out

### Specificity
**Moderate indicator:**
- Specific cause → specific effect strengthens causation
- But lack of specificity doesn't rule out causation
- Most causes have multiple effects

### Strength of Association
**Strong evidence:**
- Large effects unlikely to be due to confounding
- Dose-response relationships
- All-or-none effects

**Caution:**
- Small effects may still be real
- Large effects can still be confounded

## Red Flags in Evidence Quality

### Study Design Red Flags
- No control group
- Self-selected participants
- No randomization when feasible
- No blinding when feasible
- Very small sample
- Inappropriate statistical tests

### Reporting Red Flags
- Selective outcome reporting
- No study registration/protocol
- Missing methodological details
- No conflicts of interest statement
- Cherry-picked citations
- Results don't match methods

### Interpretation Red Flags
- Causal language from correlational data
- Claiming "proof"
- Ignoring limitations
- Overgeneralizing
- Spinning negative results
- Post hoc rationalization

### Context Red Flags
- Industry funding without independence
- Single study in isolation
- Contradicts preponderance of evidence
- No replication
- Published in predatory journal
- Press release before peer review
## Practical Decision Framework

### When Evaluating Evidence, Ask:

1. **What type of study is this?** (Design)
2. **How well was it conducted?** (Quality)
3. **What does it actually show?** (Results)
4. **How likely is bias?** (Internal validity)
5. **Does it apply to my question?** (External validity)
6. **How does it fit with other evidence?** (Context)
7. **Are the conclusions justified?** (Interpretation)
8. **What are the limitations?** (Uncertainty)

### Making Decisions with Imperfect Evidence

**High-quality evidence:**
- Strong confidence in acting on findings
- Reasonable to change practice/policy

**Moderate-quality evidence:**
- Provisional conclusions
- Consider in conjunction with other factors
- May warrant action depending on stakes

**Low-quality evidence:**
- Weak confidence
- Hypothesis-generating
- Insufficient for major decisions alone
- Consider cost/benefit of waiting for better evidence

**Very low-quality evidence:**
- Very uncertain
- Should not drive decisions alone
- Useful for identifying gaps and research needs

### When Evidence is Conflicting

**Strategies:**
1. Weight by study quality
2. Look for systematic differences (population, methods)
3. Consider publication bias
4. Update with most recent, rigorous evidence
5. Conduct/await systematic review
6. Consider if question is well-formed
## Communicating Evidence Strength

**Avoid:**
- Absolute certainty ("proves")
- False balance (equal weight to unequal evidence)
- Ignoring uncertainty
- Cherry-picking studies

**Better:**
- Quantify uncertainty
- Describe strength of evidence
- Acknowledge limitations
- Present range of evidence
- Distinguish established from emerging findings
- Be clear about what is/isn't known

@@ -0,0 +1,496 @@
# Experimental Design Checklist

## Research Question Formulation

### Is the Question Well-Formed?
- [ ] **Specific:** Clearly defined variables and relationships
- [ ] **Answerable:** Can be addressed with available methods
- [ ] **Relevant:** Addresses a gap in knowledge or practical need
- [ ] **Feasible:** Resources, time, and ethical considerations allow it
- [ ] **Falsifiable:** Can be proven wrong if incorrect

### Have You Reviewed the Literature?
- [ ] Identified what's already known
- [ ] Found gaps or contradictions to address
- [ ] Learned from methodological successes and failures
- [ ] Identified appropriate outcome measures
- [ ] Determined typical effect sizes in the field

## Hypothesis Development

### Is Your Hypothesis Testable?
- [ ] Makes specific, quantifiable predictions
- [ ] Variables are operationally defined
- [ ] Specifies direction/nature of expected relationships
- [ ] Can be falsified by potential observations

### Types of Hypotheses
- [ ] **Null hypothesis (H₀):** No effect/relationship exists
- [ ] **Alternative hypothesis (H₁):** Effect/relationship exists
- [ ] **Directional vs. non-directional:** One-tailed vs. two-tailed tests
## Study Design Selection

### What Type of Study is Appropriate?

**Experimental (Intervention) Studies:**
- [ ] **Randomized Controlled Trial (RCT):** Gold standard for causation
- [ ] **Quasi-experimental:** Non-random assignment but manipulation
- [ ] **Within-subjects:** Same participants in all conditions
- [ ] **Between-subjects:** Different participants per condition
- [ ] **Factorial:** Multiple independent variables
- [ ] **Crossover:** Participants receive multiple interventions sequentially

**Observational Studies:**
- [ ] **Cohort:** Follow groups over time
- [ ] **Case-control:** Compare those with/without outcome
- [ ] **Cross-sectional:** Snapshot at one time point
- [ ] **Ecological:** Population-level data

**Consider:**
- [ ] Can you randomly assign participants?
- [ ] Can you manipulate the independent variable?
- [ ] Is the outcome rare (favor case-control) or common?
- [ ] Do you need to establish temporal sequence?
- [ ] What's feasible given ethical, practical constraints?
## Variables

### Independent Variables (Manipulated/Predictor)
- [ ] Clearly defined and operationalized
- [ ] Appropriate levels/categories chosen
- [ ] Manipulation is sufficient to test hypothesis
- [ ] Manipulation check planned (if applicable)

### Dependent Variables (Outcome/Response)
- [ ] Directly measures the construct of interest
- [ ] Validated and reliable measurement
- [ ] Sensitive enough to detect expected effects
- [ ] Appropriate for statistical analysis planned
- [ ] Primary outcome clearly designated

### Control Variables
- [ ] **Confounding variables identified:**
  - Variables that affect both IV and DV
  - Alternative explanations for findings
- [ ] **Strategy for control:**
  - Randomization
  - Matching
  - Stratification
  - Statistical adjustment
  - Restriction (inclusion/exclusion criteria)
  - Blinding

### Extraneous Variables
- [ ] Potential sources of noise identified
- [ ] Standardized procedures to minimize
- [ ] Environmental factors controlled
- [ ] Time of day, setting, equipment standardized
## Sampling

### Population Definition
- [ ] **Target population:** Who you want to generalize to
- [ ] **Accessible population:** Who you can actually sample from
- [ ] **Sample:** Who actually participates
- [ ] Difference between these documented

### Sampling Method
- [ ] **Probability sampling (preferred for generalizability):**
  - Simple random sampling
  - Stratified sampling
  - Cluster sampling
  - Systematic sampling
- [ ] **Non-probability sampling (common but limits generalizability):**
  - Convenience sampling
  - Purposive sampling
  - Snowball sampling
  - Quota sampling

### Sample Size
- [ ] **A priori power analysis conducted** (see the sketch after this list)
  - Expected effect size (from literature or pilot)
  - Desired power (typically .80 or .90)
  - Significance level (typically .05)
  - Statistical test to be used
- [ ] Accounts for expected attrition/dropout
- [ ] Sufficient for planned subgroup analyses
- [ ] Practical constraints acknowledged
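
A minimal power-analysis sketch (Python with `statsmodels`). The effect size (d = 0.5), α (.05), power (.80), and 15% attrition are placeholder assumptions to be replaced with estimates from the literature or a pilot study.

```python
# A priori sample-size calculation for a two-group t-test (assumed inputs, see above).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                           alternative='two-sided')
print(f"Required n per group: {n_per_group:.0f}")          # ~64 per group for d = 0.5

attrition = 0.15                                            # assumed dropout rate
print(f"Recruit per group (15% attrition): {n_per_group / (1 - attrition):.0f}")
```
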
### Inclusion/Exclusion Criteria
- [ ] Clearly defined and justified
- [ ] Not overly restrictive (limits generalizability)
- [ ] Based on theoretical or practical considerations
- [ ] Ethical considerations addressed
- [ ] Documented and applied consistently
## Blinding and Randomization

### Randomization
- [ ] **What is randomized:**
  - Participant assignment to conditions
  - Order of conditions (within-subjects)
  - Stimuli/items presented
- [ ] **Method of randomization:** (see the sketch after this list)
  - Computer-generated random numbers
  - Random number tables
  - Coin flips (for very small studies)
- [ ] **Allocation concealment:**
  - Sequence generated before recruitment
  - Allocation hidden until after enrollment
  - Sequentially numbered, sealed envelopes (if needed)
- [ ] **Stratified randomization:**
  - Balance important variables across groups
  - Block randomization to ensure equal group sizes
- [ ] **Check randomization:**
  - Compare groups at baseline
  - Report any significant differences
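
A minimal sketch of computer-generated, blocked random allocation (Python). The two-arm design, block size of 4, and N = 40 are assumptions for illustration; in practice the full sequence is generated before recruitment and concealed from those enrolling participants.

```python
# Block randomization: equal arms within each block, order shuffled per block.
import numpy as np

rng = np.random.default_rng(seed=20250101)   # fixed seed keeps the sequence auditable
n_participants, block_size = 40, 4
arms = ["treatment", "control"]

allocation = []
for _ in range(n_participants // block_size):
    block = [arms[i % len(arms)] for i in range(block_size)]   # 2 treatment, 2 control
    rng.shuffle(block)
    allocation.extend(block)

print(allocation[:8])   # first two blocks of the concealed allocation sequence
```
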
### Blinding
- [ ] **Single-blind:** Participants don't know group assignment
- [ ] **Double-blind:** Participants and researchers don't know
- [ ] **Triple-blind:** Participants, researchers, and data analysts don't know
- [ ] **Blinding feasibility:**
  - Is true blinding possible?
  - Placebo/sham controls needed?
  - Identical appearance of interventions?
- [ ] **Blinding check:**
  - Assess whether blinding maintained
  - Ask participants/researchers to guess assignments
## Control Groups and Conditions

### What Type of Control?
- [ ] **No treatment control:** Natural course of condition
- [ ] **Placebo control:** Inert treatment for comparison
- [ ] **Active control:** Standard treatment comparison
- [ ] **Wait-list control:** Delayed treatment
- [ ] **Attention control:** Matches contact time without active ingredient

### Multiple Conditions
- [ ] Factorial designs for multiple factors
- [ ] Dose-response relationship assessment
- [ ] Mechanism testing with component analyses
## Procedures

### Protocol Development
- [ ] **Detailed, written protocol:**
  - Step-by-step procedures
  - Scripts for standardized instructions
  - Decision rules for handling issues
  - Data collection forms
- [ ] Pilot tested before main study
- [ ] Staff trained to criterion
- [ ] Compliance monitoring planned

### Standardization
- [ ] Same instructions for all participants
- [ ] Same equipment and materials
- [ ] Same environment/setting when possible
- [ ] Same assessment timing
- [ ] Deviations from protocol documented

### Data Collection
- [ ] **When collected:**
  - Baseline measurements
  - Post-intervention
  - Follow-up timepoints
- [ ] **Who collects:**
  - Trained researchers
  - Blinded when possible
  - Inter-rater reliability established
- [ ] **How collected:**
  - Valid, reliable instruments
  - Standardized administration
  - Multiple methods if possible (triangulation)
## Measurement

### Validity
- [ ] **Face validity:** Appears to measure construct
- [ ] **Content validity:** Covers all aspects of construct
- [ ] **Criterion validity:** Correlates with gold standard
  - Concurrent validity
  - Predictive validity
- [ ] **Construct validity:** Measures theoretical construct
  - Convergent validity (correlates with related measures)
  - Discriminant validity (doesn't correlate with unrelated measures)

### Reliability
- [ ] **Test-retest:** Consistent over time
- [ ] **Internal consistency:** Items measure same construct (Cronbach's α; see the sketch after this list)
- [ ] **Inter-rater reliability:** Agreement between raters (Cohen's κ, ICC)
- [ ] **Parallel forms:** Alternative versions consistent
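
A minimal sketch of two of these reliability checks (Python; toy data standing in for real scale responses and ratings). Cronbach's α is computed from its standard formula; Cohen's κ uses `scikit-learn`.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Internal consistency: Cronbach's alpha (rows = respondents, columns = scale items).
items = np.array([[3, 4, 3, 4],
                  [2, 2, 3, 2],
                  [4, 5, 4, 5],
                  [3, 3, 2, 3],
                  [5, 4, 5, 4]], dtype=float)
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))
print(f"Cronbach's alpha: {alpha:.2f}")

# Inter-rater reliability: Cohen's kappa for two raters coding the same cases.
rater_a = [1, 0, 1, 1, 0, 1, 0, 0]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1]
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
```
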
### Measurement Considerations
- [ ] Objective measures preferred when possible
- [ ] Validated instruments used when available
- [ ] Multiple measures of key constructs
- [ ] Sensitivity to change considered
- [ ] Floor/ceiling effects avoided
- [ ] Response formats appropriate
- [ ] Recall periods appropriate
- [ ] Cultural appropriateness considered
## Bias Minimization

### Selection Bias
- [ ] Random sampling when possible
- [ ] Clearly defined eligibility criteria
- [ ] Document who declines and why
- [ ] Minimize self-selection

### Performance Bias
- [ ] Standardized protocols
- [ ] Blinding of providers
- [ ] Monitor protocol adherence
- [ ] Document deviations

### Detection Bias
- [ ] Blinding of outcome assessors
- [ ] Objective measures when possible
- [ ] Standardized assessment procedures
- [ ] Multiple raters with reliability checks

### Attrition Bias
- [ ] Strategies to minimize dropout
- [ ] Track reasons for dropout
- [ ] Compare dropouts to completers
- [ ] Intention-to-treat analysis planned

### Reporting Bias
- [ ] Preregister study and analysis plan
- [ ] Designate primary vs. secondary outcomes
- [ ] Commit to reporting all outcomes
- [ ] Distinguish planned from exploratory analyses
## Data Management

### Data Collection
- [ ] Data collection forms designed and tested
- [ ] REDCap, Qualtrics, or similar platforms
- [ ] Range checks and validation rules
- [ ] Regular backups
- [ ] Secure storage (HIPAA/GDPR compliant if needed)

### Data Quality
- [ ] Real-time data validation
- [ ] Regular quality checks
- [ ] Missing data patterns monitored
- [ ] Outliers identified and investigated
- [ ] Protocol deviations documented

### Data Security
- [ ] De-identification procedures
- [ ] Access controls
- [ ] Audit trails
- [ ] Compliance with regulations (IRB, HIPAA, GDPR)
## Statistical Analysis Planning

### Analysis Plan (Prespecify Before Data Collection)
- [ ] **Primary analysis:**
  - Statistical test(s) specified
  - Hypothesis clearly stated
  - Significance level set (usually α = .05)
  - One-tailed or two-tailed
- [ ] **Secondary analyses:**
  - Clearly designated as secondary
  - Exploratory analyses labeled as such
- [ ] **Multiple comparisons:**
  - Adjustment method specified (if needed)
  - Primary outcome protects from inflation

### Assumptions
- [ ] Assumptions of statistical tests identified
- [ ] Plan to check assumptions (see the sketch after this list)
- [ ] Backup non-parametric alternatives
- [ ] Transformation options considered
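
A minimal sketch of prespecified assumption checks before a two-group comparison (Python with `scipy`; the simulated data and the specific fallback tests are illustrative assumptions).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(loc=50, scale=10, size=40)   # placeholder data
group_b = rng.normal(loc=55, scale=12, size=40)

# Normality within each group (Shapiro-Wilk).
for name, g in (("A", group_a), ("B", group_b)):
    w, p = stats.shapiro(g)
    print(f"Shapiro-Wilk group {name}: W = {w:.3f}, p = {p:.3f}")

# Homogeneity of variance (Levene's test).
stat, p = stats.levene(group_a, group_b)
print(f"Levene: stat = {stat:.3f}, p = {p:.3f}")

# Pre-planned fallbacks: Welch's t-test if variances differ, Mann-Whitney U if normality fails.
print(stats.ttest_ind(group_a, group_b, equal_var=False))
print(stats.mannwhitneyu(group_a, group_b))
```
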
### Missing Data
- [ ] Anticipated amount of missingness
- [ ] Missing data mechanism (MCAR, MAR, MNAR)
- [ ] Handling strategy:
  - Complete case analysis
  - Multiple imputation
  - Maximum likelihood
- [ ] Sensitivity analyses planned

### Effect Sizes
- [ ] Appropriate effect size measures identified (see the sketch after this list)
- [ ] Will be reported alongside p-values
- [ ] Confidence intervals planned
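
A minimal sketch of reporting an effect size with an interval estimate rather than a bare p-value (Python). The data are simulated placeholders; Cohen's d uses the pooled-SD definition, and the interval is a simple bootstrap percentile CI rather than an exact analytic one.

```python
import numpy as np

rng = np.random.default_rng(11)
treated = rng.normal(loc=54, scale=10, size=60)   # placeholder data
control = rng.normal(loc=50, scale=10, size=60)

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

boot = [cohens_d(rng.choice(treated, size=len(treated)), rng.choice(control, size=len(control)))
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {cohens_d(treated, control):.2f}, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```
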
### Statistical Software
- [ ] Software selected (R, SPSS, Stata, Python, etc.)
- [ ] Version documented
- [ ] Analysis scripts prepared in advance
- [ ] Will be made available (Open Science)
## Ethical Considerations

### Ethical Approval
- [ ] IRB/Ethics committee approval obtained
- [ ] Study registered (ClinicalTrials.gov, etc.) if applicable
- [ ] Protocol follows Declaration of Helsinki or equivalent

### Informed Consent
- [ ] Voluntary participation
- [ ] Comprehensible explanation
- [ ] Risks and benefits disclosed
- [ ] Right to withdraw without penalty
- [ ] Privacy protections explained
- [ ] Compensation disclosed

### Risk-Benefit Analysis
- [ ] Potential benefits outweigh risks
- [ ] Risks minimized
- [ ] Vulnerable populations protected
- [ ] Data safety monitoring (if high risk)

### Confidentiality
- [ ] Data de-identified
- [ ] Secure storage
- [ ] Limited access
- [ ] Reporting doesn't allow re-identification
## Validity Threats

### Internal Validity (Causation)
- [ ] **History:** External events between measurements
- [ ] **Maturation:** Changes in participants over time
- [ ] **Testing:** Effects of repeated measurement
- [ ] **Instrumentation:** Changes in measurement over time
- [ ] **Regression to mean:** Extreme scores becoming less extreme
- [ ] **Selection:** Groups differ at baseline
- [ ] **Attrition:** Differential dropout
- [ ] **Diffusion:** Control group receives treatment elements

### External Validity (Generalizability)
- [ ] Sample representative of population
- [ ] Setting realistic/natural
- [ ] Treatment typical of real-world implementation
- [ ] Outcome measures ecologically valid
- [ ] Time frame appropriate

### Construct Validity (Measurement)
- [ ] Measures actually tap intended constructs
- [ ] Operations match theoretical definitions
- [ ] No confounding of constructs
- [ ] Adequate coverage of construct

### Statistical Conclusion Validity
- [ ] Adequate statistical power
- [ ] Assumptions met
- [ ] Appropriate tests used
- [ ] Alpha level appropriate
- [ ] Multiple comparisons addressed
## Reporting and Transparency

### Preregistration
- [ ] Study preregistered (OSF, ClinicalTrials.gov, AsPredicted)
- [ ] Hypotheses stated a priori
- [ ] Analysis plan documented
- [ ] Distinguishes confirmatory from exploratory

### Reporting Guidelines
- [ ] **RCTs:** CONSORT checklist
- [ ] **Observational studies:** STROBE checklist
- [ ] **Systematic reviews:** PRISMA checklist
- [ ] **Diagnostic studies:** STARD checklist
- [ ] **Qualitative research:** COREQ checklist
- [ ] **Case reports:** CARE guidelines

### Transparency
- [ ] All measures reported
- [ ] All manipulations disclosed
- [ ] Sample size determination explained
- [ ] Exclusion criteria and numbers reported
- [ ] Attrition documented
- [ ] Deviations from protocol noted
- [ ] Conflicts of interest disclosed

### Open Science
- [ ] Data sharing planned (when ethical)
- [ ] Analysis code shared
- [ ] Materials available
- [ ] Preprint posted
- [ ] Open access publication when possible
## Post-Study Considerations

### Data Analysis
- [ ] Follow preregistered plan
- [ ] Clearly label deviations and exploratory analyses
- [ ] Check assumptions
- [ ] Report all outcomes
- [ ] Report effect sizes and CIs, not just p-values

### Interpretation
- [ ] Conclusions supported by data
- [ ] Limitations acknowledged
- [ ] Alternative explanations considered
- [ ] Generalizability discussed
- [ ] Clinical/practical significance addressed

### Dissemination
- [ ] Publish regardless of results (reduce publication bias)
- [ ] Present at conferences
- [ ] Share findings with participants (when appropriate)
- [ ] Communicate to relevant stakeholders
- [ ] Plain language summaries

### Next Steps
- [ ] Replication needed?
- [ ] Follow-up studies identified
- [ ] Mechanism studies planned
- [ ] Clinical applications considered
## Common Pitfalls to Avoid

- [ ] No power analysis → underpowered study
- [ ] Hypothesis formed after seeing data (HARKing)
- [ ] No blinding when feasible → bias
- [ ] P-hacking (data fishing, optional stopping)
- [ ] Multiple testing without correction → false positives
- [ ] Inadequate control group
- [ ] Confounding not addressed
- [ ] Instruments not validated
- [ ] High attrition not addressed
- [ ] Cherry-picking results to report
- [ ] Causal language from correlational data
- [ ] Ignoring assumptions of statistical tests
- [ ] No preregistration → contributes to publication bias in the literature
- [ ] Conflicts of interest not disclosed
## Final Checklist Before Starting

- [ ] Research question is clear and important
- [ ] Hypothesis is testable and specific
- [ ] Study design is appropriate
- [ ] Sample size is adequate (power analysis)
- [ ] Measures are valid and reliable
- [ ] Confounds are controlled
- [ ] Randomization and blinding implemented
- [ ] Data collection is standardized
- [ ] Analysis plan is prespecified
- [ ] Ethical approval obtained
- [ ] Study is preregistered
- [ ] Resources are sufficient
- [ ] Team is trained
- [ ] Protocol is documented
- [ ] Backup plans exist for problems
## Remember

**Good experimental design is about:**
- Asking clear questions
- Minimizing bias
- Maximizing validity
- Appropriate inference
- Transparency
- Reproducibility

**The best time to think about these issues is before collecting data, not after.**

@@ -0,0 +1,478 @@
# Logical Fallacies in Scientific Discourse

## Fallacies of Causation

### 1. Post Hoc Ergo Propter Hoc (After This, Therefore Because of This)
**Description:** Assuming that because B happened after A, A caused B.

**Examples:**
- "I took this supplement and my cold went away, so the supplement cured my cold."
- "Autism diagnoses increased after vaccine schedules changed, so vaccines cause autism."
- "I wore my lucky socks and won the game, so the socks caused the win."

**Why fallacious:** Temporal sequence is necessary but not sufficient for causation. Correlation ≠ causation.

**Related:** *Cum hoc ergo propter hoc* (with this, therefore because of this) - correlation mistaken for causation even without temporal order.
### 2. Confusing Correlation with Causation
**Description:** Assuming correlation implies direct causal relationship.

**Examples:**
- "Countries that eat more chocolate have more Nobel Prize winners, so chocolate makes you smarter."
- "Ice cream sales correlate with drowning deaths, so ice cream causes drowning."

**Reality:** Often due to confounding variables (hot weather causes both ice cream sales and swimming).

### 3. Reverse Causation
**Description:** Confusing cause and effect direction.

**Examples:**
- "Depression is associated with inflammation, so inflammation causes depression." (Could be: depression causes inflammation)
- "Wealthy people are healthier, so wealth causes health." (Could be: health enables wealth accumulation)

**Solution:** Longitudinal studies and experimental designs to establish temporal order.

### 4. Single Cause Fallacy
**Description:** Attributing complex phenomena to one cause when multiple factors contribute.

**Examples:**
- "Crime is caused by poverty." (Ignores many other contributing factors)
- "Heart disease is caused by fat intake." (Oversimplifies multifactorial disease)

**Reality:** Most outcomes have multiple contributing causes.
## Fallacies of Generalization

### 5. Hasty Generalization
**Description:** Drawing broad conclusions from insufficient evidence.

**Examples:**
- "My uncle smoked and lived to 90, so smoking isn't dangerous."
- "This drug worked in 5 patients, so it's effective for everyone."
- "I saw three black swans, so all swans are black."

**Why fallacious:** Small, unrepresentative samples don't support universal claims.

### 6. Anecdotal Fallacy
**Description:** Using personal experience or isolated examples as proof.

**Examples:**
- "I know someone who survived cancer using alternative medicine, so it works."
- "My grandmother never exercised and lived to 100, so exercise is unnecessary."

**Why fallacious:** Anecdotes are unreliable due to selection bias, memory bias, and confounding. Plural of anecdote ≠ data.

### 7. Cherry Picking (Suppressing Evidence)
**Description:** Selecting only evidence that supports your position while ignoring contradictory evidence.

**Examples:**
- Citing only studies showing supplement benefits while ignoring null findings
- Highlighting successful predictions while ignoring failed ones
- Showing graphs that start at convenient points

**Detection:** Look for systematic reviews, not individual studies.

### 8. Ecological Fallacy
**Description:** Inferring individual characteristics from group statistics.

**Examples:**
- "Average income in this neighborhood is high, so this person must be wealthy."
- "This country has low disease rates, so any individual from there is unlikely to have disease."

**Why fallacious:** Group-level patterns don't necessarily apply to individuals.
## Fallacies of Authority and Tradition

### 9. Appeal to Authority (Argumentum ad Verecundiam)
**Description:** Accepting claims because an authority figure said them, without evidence.

**Examples:**
- "Dr. X says this treatment works, so it must." (If Dr. X provides no data)
- "Einstein believed in God, so God exists." (Einstein's physics expertise doesn't transfer)
- "99% of doctors recommend..." (Appeal to majority + authority without evidence)

**Valid use of authority:** Experts providing evidence-based consensus in their domain.

**Invalid:** Authority opinions without evidence, or outside their expertise.

### 10. Appeal to Antiquity/Tradition
**Description:** Assuming something is true or good because it's old or traditional.

**Examples:**
- "Traditional medicine has been used for thousands of years, so it must work."
- "This theory has been accepted for decades, so it must be correct."

**Why fallacious:** Age doesn't determine validity. Many old beliefs have been disproven.

### 11. Appeal to Novelty
**Description:** Assuming something is better because it's new.

**Examples:**
- "This is the latest treatment, so it must be superior."
- "New research overturns everything we knew." (Often overstated)

**Why fallacious:** New ≠ better. Established treatments often outperform novel ones.
## Fallacies of Relevance

### 12. Ad Hominem (Attack the Person)
**Description:** Attacking the person making the argument rather than the argument itself.

**Types:**
- **Abusive:** "He's an idiot, so his theory is wrong."
- **Circumstantial:** "She's funded by industry, so her findings are false."
- **Tu Quoque:** "You smoke, so your anti-smoking argument is invalid."

**Why fallacious:** Personal characteristics don't determine argument validity.

**Note:** Conflicts of interest are worth noting but don't invalidate evidence.

### 13. Genetic Fallacy
**Description:** Judging something based on its origin rather than its merits.

**Examples:**
- "This idea came from a drug company, so it's wrong."
- "Ancient Greeks believed this, so it's outdated."

**Better approach:** Evaluate evidence regardless of source.

### 14. Appeal to Emotion
**Description:** Manipulating emotions instead of presenting evidence.

**Types:**
- **Appeal to fear:** "If you don't vaccinate, your child will die."
- **Appeal to pity:** "Think of the suffering patients who need this unproven treatment."
- **Appeal to flattery:** "Smart people like you know that..."

**Why fallacious:** Emotional reactions don't determine truth.

### 15. Appeal to Consequences (Argumentum ad Consequentiam)
**Description:** Arguing something is true/false based on whether consequences are desirable.

**Examples:**
- "Climate change can't be real because the solutions would hurt the economy."
- "Free will must exist because without it, morality is impossible."

**Why fallacious:** Reality is independent of what we wish were true.

### 16. Appeal to Nature (Naturalistic Fallacy)
**Description:** Assuming "natural" means good, safe, or effective.

**Examples:**
- "This treatment is natural, so it's safe."
- "Organic food is natural, so it's healthier."
- "Vaccines are unnatural, so they're harmful."

**Why fallacious:**
- Many natural things are deadly (arsenic, snake venom, hurricanes)
- Many synthetic things are beneficial (antibiotics, vaccines)
- "Natural" is often poorly defined

### 17. Moralistic Fallacy
**Description:** Assuming what ought to be true is true.

**Examples:**
- "There shouldn't be sex differences in ability, so they don't exist."
- "People should be rational, so they are."

**Why fallacious:** Desires about reality don't change reality.
## Fallacies of Structure

### 18. False Dichotomy (False Dilemma)
**Description:** Presenting only two options when more exist.

**Examples:**
- "Either you're with us or against us."
- "It's either genetic or environmental." (Usually both)
- "Either the treatment works or it doesn't." (Ignores partial effects)

**Reality:** Most issues have multiple options and shades of gray.

### 19. Begging the Question (Circular Reasoning)
**Description:** Assuming what you're trying to prove.

**Examples:**
- "This medicine works because it has healing properties." (What are healing properties? That it works!)
- "God exists because the Bible says so, and the Bible is true because it's God's word."

**Detection:** Check if the conclusion is hidden in the premises.

### 20. Moving the Goalposts
**Description:** Changing standards of evidence after initial standards are met.

**Example:**
- Skeptic: "Show me one study."
- [Shows study]
- Skeptic: "That's just one study; show me a meta-analysis."
- [Shows meta-analysis]
- Skeptic: "But meta-analyses have limitations..."

**Why problematic:** No amount of evidence will ever be sufficient.

### 21. Slippery Slope
**Description:** Arguing that one step will inevitably lead to extreme outcomes without justification.

**Example:**
- "If we allow gene editing for disease, we'll end up with designer babies and eugenics."

**When valid:** If intermediate steps are actually likely.

**When fallacious:** If chain of events is speculative without evidence.

### 22. Straw Man
**Description:** Misrepresenting an argument to make it easier to attack.

**Example:**
- Position: "We should teach evolution in schools."
- Straw man: "So you think we should tell kids they're just monkeys?"

**Detection:** Ask: Is this really what they're claiming?
## Fallacies of Statistical and Scientific Reasoning

### 23. Texas Sharpshooter Fallacy
**Description:** Cherry-picking data clusters to fit a pattern, like shooting arrows then drawing targets around them.

**Examples:**
- Finding cancer clusters and claiming environmental causes (without accounting for random clustering)
- Data mining until finding significant correlations

**Why fallacious:** Patterns in random data are inevitable; finding them doesn't prove causation.
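
A minimal simulation of this problem (Python with `scipy`): the outcome and all 20 "predictors" are pure noise, yet scanning them and reporting only the hits will often turn up at least one "significant" correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
outcome = rng.normal(size=100)
predictors = rng.normal(size=(100, 20))     # 20 variables unrelated to the outcome

pvals = [stats.pearsonr(predictors[:, j], outcome)[1] for j in range(predictors.shape[1])]
hits = [j for j, p in enumerate(pvals) if p < 0.05]
print(f"'Significant' correlations found in pure noise: {hits}")
```
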
### 24. Base Rate Fallacy
**Description:** Ignoring prior probability when evaluating evidence.

**Example:**
- Disease affects 0.1% of the population; the test has 99% sensitivity and 99% specificity
- Positive test ≠ 99% probability of disease
- Actually ~9% probability (due to false positives exceeding true positives)

**Solution:** Use Bayesian reasoning; consider base rates.
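
A minimal Bayes calculation for the example above (Python; reading "99% accurate" as 99% sensitivity and 99% specificity).

```python
prevalence = 0.001      # 0.1% of the population has the disease
sensitivity = 0.99      # P(positive test | disease)
specificity = 0.99      # P(negative test | no disease)

true_pos = sensitivity * prevalence
false_pos = (1 - specificity) * (1 - prevalence)
posterior = true_pos / (true_pos + false_pos)
print(f"P(disease | positive test) = {posterior:.1%}")   # ~9%, not 99%
```
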
### 25. Prosecutor's Fallacy
**Description:** Confusing P(Evidence|Innocent) with P(Innocent|Evidence).

**Example:**
- "The probability of this DNA match occurring by chance is 1 in 1 million, so there's only a 1 in 1 million chance the defendant is innocent."

**Why fallacious:** Ignores base rates and prior probability.

### 26. McNamara Fallacy (Quantitative Fallacy)
**Description:** Focusing only on what can be easily measured while ignoring important unmeasured factors.

**Examples:**
- Judging school quality only by test scores (ignoring creativity, social skills, ethics)
- Measuring healthcare only by quantifiable outcomes (ignoring quality of life)

**Quote:** "Not everything that counts can be counted, and not everything that can be counted counts."
### 27. Multiple Comparisons Fallacy
**Description:** Not accounting for increased false positive rate when testing many hypotheses.

**Examples:**
- Testing 20 hypotheses at p < .05 gives a ~64% chance of at least one false positive
- Claiming jellybean color X causes acne after testing 20 colors

**Solution:** Correct for multiple comparisons (Bonferroni, FDR).
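
A minimal calculation behind the jellybean example (Python; assumes 20 independent tests at α = .05).

```python
m, alpha = 20, 0.05
fwer = 1 - (1 - alpha) ** m                   # family-wise error rate
print(f"P(at least one false positive across {m} tests): {fwer:.0%}")   # ~64%

print(f"Bonferroni-corrected per-test threshold: {alpha / m:.4f}")      # 0.0025
```
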
### 28. Reification (Hypostatization)
**Description:** Treating abstract concepts as if they were concrete things.

**Examples:**
- "Evolution wants organisms to survive." (Evolution doesn't "want")
- "The gene for intelligence" (Intelligence isn't one gene)
- "Nature selects..." (Nature doesn't consciously select)

**Why problematic:** Can lead to confused thinking about mechanisms.
## Fallacies of Scope and Definition

### 29. No True Scotsman
**Description:** Retroactively excluding counterexamples by redefining criteria.

**Example:**
- "No natural remedy has side effects."
- "But poison ivy is natural and causes reactions."
- "Well, no *true* natural remedy has side effects."

**Why fallacious:** Moves goalposts to protect claim from falsification.

### 30. Equivocation
**Description:** Using a word with multiple meanings inconsistently.

**Example:**
- "Evolution is just a theory. Theories are guesses. So evolution is just a guess."
- (Conflates colloquial "theory" with scientific "theory")

**Detection:** Check if key terms are used consistently.

### 31. Ambiguity
**Description:** Using vague language that can be interpreted multiple ways.

**Examples:**
- "Quantum healing" (What does "quantum" mean here?)
- "Natural" (Animals? Not synthetic? Organic? Common?)

**Why problematic:** Claims become unfalsifiable when terms are undefined.

### 32. Mind Projection Fallacy
**Description:** Projecting mental constructs onto reality.

**Examples:**
- Assuming categories that exist in language exist in nature
- "Which chromosome is the gene for X on?" when X is polygenic and partially environmental

**Better:** Recognize human categories may not carve nature at the joints.
## Fallacies Specific to Science

### 33. Galileo Gambit
**Description:** "They laughed at Galileo, and he was right, so if they're laughing at me, I must be right too."

**Why fallacious:**
- They laughed at Galileo, and he was right
- They also laughed at countless crackpots who were wrong
- Being an outsider doesn't make you right

**Reality:** Revolutionary ideas are usually well-supported by evidence.

### 34. Argument from Ignorance (Ad Ignorantiam)
**Description:** Assuming something is true because it hasn't been proven false (or vice versa).

**Examples:**
- "No one has proven homeopathy doesn't work, so it works."
- "We haven't found evidence of harm, so it must be safe."

**Why fallacious:** Absence of evidence ≠ evidence of absence (though it can be, depending on how hard we've looked).

**Burden of proof:** Falls on the claimant, not the skeptic.

### 35. God of the Gaps
**Description:** Explaining gaps in knowledge by invoking supernatural or unfalsifiable causes.

**Examples:**
- "We don't fully understand consciousness, so it must be spiritual."
- "This complexity couldn't arise naturally, so it must be designed."

**Why problematic:**
- Fills gaps with non-explanations
- Discourages genuine investigation
- History shows gaps get filled by natural explanations

### 36. Nirvana Fallacy (Perfect Solution Fallacy)
**Description:** Rejecting solutions because they're imperfect.

**Examples:**
- "Vaccines aren't 100% effective, so they're worthless."
- "This diet doesn't work for everyone, so it doesn't work."

**Reality:** Most interventions are partial; perfection is rare.

**Better:** Compare to alternatives, not to perfection.

### 37. Special Pleading
**Description:** Applying standards to others but not to oneself.

**Examples:**
- "My anecdotes count as evidence, but yours don't."
- "Mainstream medicine needs RCTs, but my alternative doesn't."
- "Correlation doesn't imply causation—except when it supports my view."

**Why fallacious:** Evidence standards should apply consistently.

### 38. Unfalsifiability
**Description:** Formulating claims in ways that cannot be tested or disproven.

**Examples:**
- "This energy can't be detected by any instrument."
- "It works, but only if you truly believe."
- "Failures prove the conspiracy is even deeper."

**Why problematic:** Unfalsifiable claims aren't scientific; they can't be tested.

**Good science:** Makes specific, testable predictions.

### 39. Affirming the Consequent
**Description:** If A, then B. B is true. Therefore, A is true.

**Example:**
- "If the drug works, symptoms improve. Symptoms improved. Therefore, the drug worked."
- (Could be placebo, natural history, regression to mean)

**Why fallacious:** Other causes could produce the same outcome.

**Valid form:** Modus ponens: If A, then B. A is true. Therefore, B is true.

### 40. Denying the Antecedent
**Description:** If A, then B. A is false. Therefore, B is false.

**Example:**
- "If you have fever, you have infection. You don't have fever. Therefore, you don't have infection."

**Why fallacious:** B can be true even when A is false.
## Avoiding Logical Fallacies

### Practical Steps

1. **Identify the claim** - What exactly is being argued?
2. **Identify the evidence** - What supports the claim?
3. **Check the logic** - Does the evidence actually support the claim?
4. **Look for hidden assumptions** - What unstated beliefs does the argument rely on?
5. **Consider alternatives** - What other explanations fit the evidence?
6. **Check for emotional manipulation** - Is the argument relying on feelings rather than facts?
7. **Evaluate the source** - Are there conflicts of interest? Is this within their expertise?
8. **Look for balance** - Are counterarguments addressed fairly?
9. **Assess the evidence** - Is it anecdotal, observational, or experimental? How strong?
10. **Be charitable** - Interpret arguments in their strongest form (steel man, not straw man).

### Questions to Ask

- Is the conclusion supported by the premises?
- Are there unstated assumptions?
- Is the evidence relevant to the conclusion?
- Are counterarguments acknowledged?
- Could alternative explanations account for the evidence?
- Is the reasoning consistent?
- Are terms defined clearly?
- Is evidence being cherry-picked?
- Are emotions being manipulated?
- Would this reasoning apply consistently to other cases?

### Common Patterns

**Good Arguments:**
- Clearly defined terms
- Relevant, sufficient evidence
- Valid logical structure
- Acknowledges limitations and alternatives
- Proportional conclusions
- Transparent about uncertainty
- Applies consistent standards

**Poor Arguments:**
- Vague or shifting definitions
- Irrelevant or insufficient evidence
- Logical leaps
- Ignores counterevidence
- Overclaimed conclusions
- False certainty
- Double standards

## Remember

- **Fallacious reasoning doesn't mean the conclusion is false** - just that this argument doesn't support it.
- **Identifying fallacies isn't about winning** - it's about better understanding reality.
- **We all commit fallacies** - recognizing them in ourselves is as important as in others.
- **Charity principle** - Interpret arguments generously; don't assume bad faith.
- **Focus on claims, not people** - Ad hominem goes both ways.

@@ -0,0 +1,169 @@
# Scientific Method Core Principles

## Fundamental Principles

### 1. Empiricism
- Knowledge derives from observable, measurable evidence
- Claims must be testable through observation or experiment
- Subjective experience alone is insufficient for scientific conclusions

### 2. Falsifiability (Popper's Criterion)
- A hypothesis must be capable of being proven false
- Unfalsifiable claims are not scientific (e.g., "invisible, undetectable forces")
- Good hypotheses make specific, testable predictions

### 3. Reproducibility
- Results must be replicable by independent researchers
- Methods must be described with sufficient detail for replication
- Single studies are rarely definitive; replication strengthens confidence

### 4. Parsimony (Occam's Razor)
- Prefer simpler explanations over complex ones when both fit the data
- Don't multiply entities unnecessarily
- Extraordinary claims require extraordinary evidence

### 5. Systematic Observation
- Use standardized, rigorous methods
- Control for confounding variables
- Minimize observer bias through blinding and protocols
## The Scientific Process

### 1. Question Formation
- Identify a specific, answerable question
- Ensure the question is within the scope of scientific inquiry
- Consider whether current methods can address the question

### 2. Literature Review
- Survey existing knowledge
- Identify gaps and contradictions
- Build on previous work rather than reinventing

### 3. Hypothesis Development
- State a clear, testable prediction
- Define variables operationally
- Specify the expected relationship between variables

### 4. Experimental Design
- Choose appropriate methodology
- Identify independent and dependent variables
- Control confounding variables
- Select appropriate sample size and population
- Plan statistical analyses in advance

### 5. Data Collection
- Follow protocols consistently
- Record all observations, including unexpected results
- Maintain detailed lab notebooks or data logs
- Use validated measurement instruments

### 6. Analysis
- Apply appropriate statistical methods
- Test assumptions of statistical tests
- Consider effect size, not just significance
- Look for alternative explanations

### 7. Interpretation
- Distinguish between correlation and causation
- Acknowledge limitations
- Consider alternative interpretations
- Avoid overgeneralizing beyond the data

### 8. Communication
- Report methods transparently
- Include negative results
- Acknowledge conflicts of interest
- Make data and code available when possible
## Critical Evaluation Criteria

### When Reviewing Scientific Work, Ask:

**Validity Questions:**
- Does the study measure what it claims to measure?
- Are the methods appropriate for the research question?
- Were controls adequate?
- Could confounding variables explain the results?

**Reliability Questions:**
- Are measurements consistent?
- Would the study produce similar results if repeated?
- Are inter-rater reliability and measurement precision reported?

**Generalizability Questions:**
- Is the sample representative of the target population?
- Are the conditions realistic or artificial?
- Do the results apply beyond the specific context?

**Statistical Questions:**
- Is the sample size adequate for the analysis?
- Are the statistical tests appropriate?
- Are effect sizes reported alongside p-values?
- Were multiple comparisons corrected?

**Logical Questions:**
- Do the conclusions follow from the data?
- Are alternative explanations considered?
- Are causal claims supported by the study design?
- Are limitations acknowledged?
## Red Flags in Scientific Claims

1. **Cherry-picking data** - Highlighting only supporting evidence
2. **Moving goalposts** - Changing predictions after seeing results
3. **Ad hoc hypotheses** - Adding explanations to rescue a failed prediction
4. **Appeal to authority** - "Expert X says" without evidence
5. **Anecdotal evidence** - Relying on personal stories over systematic data
6. **Correlation implies causation** - Confusing association with causality
7. **Post hoc rationalization** - Explaining results after the fact without prediction
8. **Ignoring base rates** - Not considering prior probability
9. **Confirmation bias** - Seeking only evidence that supports beliefs
10. **Publication bias** - Only positive results get published
## Standards for Causal Inference

### Bradford Hill Criteria (adapted)
1. **Strength** - Strong associations are more likely causal
2. **Consistency** - Repeated observations by different researchers
3. **Specificity** - Specific outcomes from specific causes
4. **Temporality** - Cause precedes effect (essential)
5. **Biological gradient** - Dose-response relationship
6. **Plausibility** - Coherent with existing knowledge
7. **Coherence** - Consistent with other evidence
8. **Experiment** - Experimental evidence supports causation
9. **Analogy** - Similar cause-effect relationships exist

### Establishing Causation Requires:
- Temporal precedence (cause before effect)
- Covariation (cause and effect correlate)
- Elimination of alternative explanations
- Ideally: experimental manipulation showing cause produces effect
## Peer Review and Scientific Consensus

### Understanding Peer Review
- Filters obvious errors but isn't perfect
- Reviewers can miss problems or have biases
- Published ≠ proven; it means "passed initial scrutiny"
- Retraction mechanisms exist for flawed papers

### Scientific Consensus
- Emerges from convergence of multiple independent lines of evidence
- Consensus can change with new evidence
- Individual studies rarely overturn consensus
- Consider the weight of evidence, not individual papers
## Open Science Principles

### Transparency Practices
- Preregistration of hypotheses and methods
- Open data sharing
- Open-source code
- Preprints for rapid dissemination
- Registered reports (peer review before data collection)

### Why Transparency Matters
- Reduces publication bias
- Enables verification
- Prevents p-hacking and HARKing (Hypothesizing After Results are Known)
- Accelerates scientific progress

@@ -0,0 +1,506 @@
# Common Statistical Pitfalls
|
||||||
|
|
||||||
|
## P-Value Misinterpretations
|
||||||
|
|
||||||
|
### Pitfall 1: P-Value = Probability Hypothesis is True
|
||||||
|
**Misconception:** p = .05 means 5% chance the null hypothesis is true.
|
||||||
|
|
||||||
|
**Reality:** P-value is the probability of observing data this extreme (or more) *if* the null hypothesis is true. It says nothing about the probability the hypothesis is true.
|
||||||
|
|
||||||
|
**Correct interpretation:** "If there were truly no effect, we would observe data this extreme only 5% of the time."
|
||||||
|
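
A minimal simulation (illustrative, not from the source) makes the definition concrete: when the null is true, p < .05 still occurs about 5% of the time, so a single small p-value is not the probability that the hypothesis is true.

```python
import numpy as np
from scipy import stats

# Under a true null (both groups drawn from the same distribution),
# "significant" results at alpha = .05 still occur ~5% of the time.
rng = np.random.default_rng(0)
pvals = np.array([
    stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
    for _ in range(5000)
])
print(f"Fraction of p < .05 under the null: {np.mean(pvals < 0.05):.3f}")
```
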
### Pitfall 2: Non-Significant = No Effect

**Misconception:** p > .05 proves there's no effect.

**Reality:** Absence of evidence ≠ evidence of absence. Non-significant results may indicate:
- Insufficient statistical power
- True effect too small to detect
- High variability
- Small sample size

**Better approach:**
- Report confidence intervals
- Conduct power analysis
- Consider equivalence testing

### Pitfall 3: Significant = Important

**Misconception:** Statistical significance means practical importance.

**Reality:** With large samples, trivial effects become "significant." A statistically significant 0.1 IQ point difference is meaningless in practice.

**Better approach:**
- Report effect sizes
- Consider practical significance
- Use confidence intervals

### Pitfall 4: P = .049 vs. P = .051

**Misconception:** These are meaningfully different because one crosses the .05 threshold.

**Reality:** These represent nearly identical evidence. The .05 threshold is arbitrary.

**Better approach:**
- Treat p-values as continuous measures of evidence
- Report exact p-values
- Consider context and prior evidence

### Pitfall 5: One-Tailed Tests Without Justification

**Misconception:** One-tailed tests are free extra power.

**Reality:** One-tailed tests assume effects can only go in one direction, which is rarely true. They're often used to artificially boost significance.

**When appropriate:** Only when effects in one direction are theoretically impossible or equivalent to the null.

## Multiple Comparisons Problems

### Pitfall 6: Multiple Testing Without Correction

**Problem:** Testing 20 hypotheses at p < .05 gives a ~64% chance of at least one false positive (1 − 0.95²⁰ ≈ 0.64).

**Examples:**
- Testing many outcomes
- Testing many subgroups
- Conducting multiple interim analyses
- Testing at multiple time points

**Solutions (see the sketch below):**
- Bonferroni correction (divide α by the number of tests)
- False Discovery Rate (FDR) control
- Prespecify the primary outcome
- Treat exploratory analyses as hypothesis-generating
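
A hedged sketch of the first two corrections using statsmodels; the p-values are made up for illustration.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from 20 tests (illustrative numbers only).
pvals = np.array([0.001, 0.012, 0.030, 0.049, 0.051] + [0.20] * 15)

# Family-wise error control (conservative).
reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

# False Discovery Rate control (Benjamini-Hochberg), usually less conservative.
reject_fdr, p_fdr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni rejections:", reject_bonf.sum())
print("FDR (BH) rejections:  ", reject_fdr.sum())
```
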
### Pitfall 7: Subgroup Analysis Fishing

**Problem:** Testing many subgroups until finding significance.

**Why problematic:**
- Inflates the false positive rate
- Often reported without disclosure
- A finding like "the effect was significant in women" may be a chance result

**Solutions:**
- Prespecify subgroups
- Use interaction tests, not separate per-subgroup tests
- Require replication
- Correct for multiple comparisons

### Pitfall 8: Outcome Switching

**Problem:** Analyzing many outcomes, reporting only the significant ones.

**Detection signs:**
- Secondary outcomes emphasized
- Incomplete outcome reporting
- Discrepancy between registration and publication

**Solutions:**
- Preregister all outcomes
- Report all planned outcomes
- Distinguish primary from secondary outcomes

## Sample Size and Power Issues

### Pitfall 9: Underpowered Studies

**Problem:** Small samples have a low probability of detecting true effects.

**Consequences:**
- High false negative rate
- Significant results more likely to be false positives
- Overestimated effect sizes (when significant)

**Solutions (a power-analysis sketch follows):**
- Conduct an a priori power analysis
- Aim for 80-90% power
- Base the expected effect size on prior research
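
A minimal a priori power calculation with statsmodels, assuming a hypothetical target effect of d = 0.5.

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group needed to detect d = 0.5 with 80% power at alpha = .05.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, power=0.80, alpha=0.05)
print(f"Required n per group: {n_per_group:.0f}")  # roughly 64
```
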
### Pitfall 10: Post-Hoc Power Analysis

**Problem:** Calculating power after seeing the results is circular and uninformative.

**Why useless:**
- Non-significant results always have low "post-hoc power"
- It recapitulates the p-value without adding new information

**Better approach:**
- Calculate confidence intervals
- Plan a replication with an adequate sample
- Conduct prospective power analyses for future studies

### Pitfall 11: Small Sample Fallacy

**Problem:** Trusting results from very small samples.

**Issues:**
- High sampling variability
- Outliers have large influence
- Test assumptions are easily violated
- Confidence intervals are very wide

**Guidelines:**
- Be skeptical of n < 30
- Check assumptions carefully
- Consider non-parametric tests
- Replicate findings

## Effect Size Misunderstandings

### Pitfall 12: Ignoring Effect Size

**Problem:** Focusing only on significance, not magnitude.

**Why problematic:**
- Significance ≠ importance
- Results can't be compared across studies
- Doesn't inform practical decisions

**Solutions (see the sketch below):**
- Always report effect sizes
- Use standardized measures (Cohen's d, r, η²)
- Interpret them using field conventions
- Consider the minimum clinically important difference
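
A small sketch of computing Cohen's d from two samples; the data are synthetic and purely illustrative.

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Report d alongside the p-value, not instead of it.
rng = np.random.default_rng(1)
treatment = rng.normal(0.4, 1.0, size=50)
control = rng.normal(0.0, 1.0, size=50)
print(f"Cohen's d = {cohens_d(treatment, control):.2f}")
```
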
### Pitfall 13: Misinterpreting Standardized Effect Sizes

**Problem:** Treating Cohen's d = 0.5 as "medium" without context.

**Reality:**
- Field-specific norms vary
- Some fields have larger typical effects
- Real-world importance depends on context

**Better approach:**
- Compare to effects in the same domain
- Consider practical implications
- Look at raw effect sizes too

### Pitfall 14: Confusing Explained Variance with Importance

**Problem:** Treating "only explains 5% of variance" as meaning unimportant.

**Reality:**
- Height explains ~5% of variation in NBA player salary but is crucial
- Complex phenomena have many small contributors
- Predictive accuracy ≠ causal importance

**Consideration:** Context matters more than the percentage alone.

## Correlation and Causation

### Pitfall 15: Correlation Implies Causation

**Problem:** Inferring causation from correlation.

**Alternative explanations:**
- Reverse causation (B causes A, not A causes B)
- Confounding (C causes both A and B)
- Coincidence
- Selection bias

**Criteria for causation:**
- Temporal precedence
- Covariation
- No plausible alternatives
- Ideally: experimental manipulation

### Pitfall 16: Ecological Fallacy

**Problem:** Inferring individual-level relationships from group-level data.

**Example:** That countries with higher chocolate consumption have more Nobel laureates does not mean eating chocolate makes an individual win a Nobel.

**Why problematic:** Group-level correlations may not hold at the individual level.

### Pitfall 17: Simpson's Paradox

**Problem:** A trend appears within groups but reverses when the groups are combined (or vice versa).

**Example:** A treatment appears worse overall but better in every subgroup (see the numeric sketch below).

**Cause:** A confounding variable is distributed differently across groups.

**Solution:** Consider confounders and look at the appropriate level of analysis.
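
A numeric sketch with illustrative counts (patterned on the classic kidney-stone example) shows the reversal: treatment A wins in every subgroup, yet B looks better when the subgroups are pooled, because A was given mostly to the harder cases.

```python
# (successes, total) per stone size and treatment; counts are illustrative.
groups = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}

for name, g in groups.items():
    for t in ("A", "B"):
        s, n = g[t]
        print(f"{name:>5} stones, treatment {t}: {s/n:.0%}")

for t in ("A", "B"):
    s = sum(groups[g][t][0] for g in groups)
    n = sum(groups[g][t][1] for g in groups)
    print(f"overall, treatment {t}: {s/n:.0%}")  # B looks better pooled
```
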
## Regression and Modeling Pitfalls

### Pitfall 18: Overfitting

**Problem:** The model fits the sample data well but doesn't generalize.

**Causes:**
- Too many predictors relative to sample size
- Fitting noise rather than signal
- No cross-validation

**Solutions (see the sketch below):**
- Use cross-validation
- Penalized regression (LASSO, ridge)
- An independent test set
- Simpler models
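
A brief sketch of the gap between in-sample fit and cross-validated performance, using synthetic data and scikit-learn; the exact numbers will vary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Many noisy predictors, modest sample: in-sample R^2 flatters the model,
# cross-validated R^2 is a more honest estimate of generalization.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = X[:, 0] + rng.normal(scale=2.0, size=60)  # only one predictor matters

ols = LinearRegression().fit(X, y)
print(f"In-sample R^2:         {ols.score(X, y):.2f}")
print(f"5-fold CV R^2 (OLS):   {cross_val_score(LinearRegression(), X, y, cv=5).mean():.2f}")
print(f"5-fold CV R^2 (ridge): {cross_val_score(Ridge(alpha=10.0), X, y, cv=5).mean():.2f}")
```
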
### Pitfall 19: Extrapolation Beyond Data Range

**Problem:** Predicting outside the range of observed data.

**Why dangerous:**
- Relationships may not hold outside the observed range
- The increased uncertainty is not reflected in the predictions

**Solution:** Interpolate only; treat any extrapolation as speculative.

### Pitfall 20: Ignoring Model Assumptions

**Problem:** Using statistical tests without checking their assumptions.

**Common violations:**
- Non-normality (for parametric tests)
- Heteroscedasticity (unequal variances)
- Non-independence
- Non-linearity
- Multicollinearity

**Solutions:**
- Check assumptions with diagnostics
- Use robust methods
- Transform data
- Use appropriate non-parametric alternatives

### Pitfall 21: Treating Non-Significant Covariates as Eliminating Confounding

**Problem:** "We controlled for X and it wasn't significant, so it's not a confounder."

**Reality:** Non-significant covariates can still be important confounders. Significance ≠ confounding.

**Solution:** Include theoretically important covariates regardless of significance.

### Pitfall 22: Collinearity Masking Effects

**Problem:** When predictors are highly correlated, true effects may appear non-significant.

**Manifestations:**
- Large standard errors
- Unstable coefficients
- Sign changes when adding/removing variables

**Detection:**
- Variance Inflation Factors (VIF)
- Correlation matrices

**Solutions (a VIF sketch follows):**
- Remove redundant predictors
- Combine correlated variables
- Use regularization methods
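
A short VIF sketch with deliberately collinear synthetic predictors, using statsmodels.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Two nearly identical predictors (illustrative data): VIF flags them.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),  # nearly a copy of x1
    "x3": rng.normal(size=200),
})
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(f"VIF({col}) = {variance_inflation_factor(X_const.values, i):.1f}")
```
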
## Specific Test Misuses

### Pitfall 23: T-Test for Multiple Groups

**Problem:** Conducting multiple t-tests instead of an ANOVA.

**Why wrong:** Inflates the Type I error rate dramatically.

**Correct approach:**
- Use ANOVA first
- Follow with planned comparisons or post-hoc tests with correction

### Pitfall 24: Pearson Correlation for Non-Linear Relationships

**Problem:** Using Pearson's r for curved relationships.

**Why misleading:** r measures linear relationships only.

**Solutions:**
- Check scatterplots first
- Use Spearman's ρ for monotonic relationships
- Consider polynomial or non-linear models

### Pitfall 25: Chi-Square with Small Expected Frequencies

**Problem:** Running a chi-square test with expected cell counts < 5.

**Why wrong:** It violates the test's assumptions, so p-values are inaccurate.

**Solutions (see the sketch below):**
- Fisher's exact test
- Combine categories
- Increase the sample size
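
A minimal example with a small, illustrative 2×2 table, using SciPy's exact test.

```python
from scipy import stats

# A 2x2 table with small counts: the chi-square approximation is
# unreliable here, so use Fisher's exact test instead.
table = [[3, 9],
         [8, 2]]

odds_ratio, p_exact = stats.fisher_exact(table)
print(f"Fisher's exact test: p = {p_exact:.3f}, odds ratio = {odds_ratio:.2f}")
```
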
### Pitfall 26: Paired vs. Independent Tests

**Problem:** Using an independent-samples test for paired data (or vice versa).

**Why wrong:**
- Wastes power (paired data analyzed as independent)
- Violates the independence assumption (independent data analyzed as paired)

**Solution:** Match the test to the design.

## Confidence Interval Misinterpretations

### Pitfall 27: 95% CI = 95% Probability True Value Inside

**Misconception:** "There is a 95% chance the true value is in this interval."

**Reality:** The true value either is or isn't in this specific interval. If we repeated the study many times, 95% of the resulting intervals would contain the true value.

**Better interpretation:** "We're 95% confident this interval contains the true value."
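
A quick coverage simulation (synthetic data) illustrates what the 95% refers to: the long-run behavior of the interval-building procedure, not any single interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, covered = 10.0, 0
n_sims, n = 2000, 25
for _ in range(n_sims):
    x = rng.normal(true_mean, 3.0, size=n)
    half = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
    covered += (x.mean() - half <= true_mean <= x.mean() + half)
print(f"Fraction of intervals containing the true mean: {covered / n_sims:.3f}")
```
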
### Pitfall 28: Overlapping CIs = No Difference

**Problem:** Assuming that overlapping confidence intervals mean no significant difference.

**Reality:** Checking for CI overlap is a cruder, more conservative criterion than directly testing the difference. Two 95% CIs can overlap even when the difference between the groups is statistically significant.

**Guideline:** Whether one group's point estimate falls inside the other group's CI is a more useful rule of thumb than whether the intervals overlap.

### Pitfall 29: Ignoring CI Width

**Problem:** Focusing only on whether the CI includes zero, not on precision.

**Why important:** Wide CIs indicate high uncertainty. "Significant" effects with huge CIs are less convincing.

**Consider:** Both significance and precision.

## Bayesian vs. Frequentist Confusions

### Pitfall 30: Mixing Bayesian and Frequentist Interpretations

**Problem:** Making Bayesian statements from frequentist analyses.

**Examples:**
- "Probability the hypothesis is true" (Bayesian) from a p-value (frequentist)
- "Evidence for the null" from a non-significant result (a standard frequentist test cannot quantify support for the null)

**Solution:**
- Be clear about the framework
- Use Bayesian methods for Bayesian questions
- Use Bayes factors to compare hypotheses

### Pitfall 31: Ignoring Prior Probability

**Problem:** Treating all hypotheses as equally likely initially.

**Reality:** Extraordinary claims need extraordinary evidence. Prior plausibility matters.

**Consider:**
- Plausibility given existing knowledge
- Mechanism plausibility
- Base rates

## Data Transformation Issues

### Pitfall 32: Dichotomizing Continuous Variables

**Problem:** Splitting continuous variables at arbitrary cutoffs.

**Consequences:**
- Loss of information and power
- Arbitrary distinctions
- Discarding individual differences

**Exceptions:** Clinically meaningful cutoffs with strong justification.

**Better:** Keep variables continuous or use multiple categories.

### Pitfall 33: Trying Multiple Transformations

**Problem:** Testing many transformations until finding significance.

**Why problematic:** Inflates the Type I error rate; it is a form of p-hacking.

**Better approach:**
- Prespecify transformations
- Use theory-driven transformations
- Correct for multiple testing if exploring

## Missing Data Problems

### Pitfall 34: Listwise Deletion by Default

**Problem:** Automatically deleting all cases with any missing data.

**Consequences:**
- Reduced power
- Potential bias if data are not missing completely at random (MCAR)

**Better approaches:**
- Multiple imputation
- Maximum likelihood methods
- Analyze missingness patterns

### Pitfall 35: Ignoring Missing Data Mechanisms

**Problem:** Not considering why data are missing.

**Types:**
- MCAR (Missing Completely at Random): deletion is unbiased, though it loses power
- MAR (Missing at Random): can be handled by imputation
- MNAR (Missing Not at Random): may bias results

**Solution:** Analyze missingness patterns, use appropriate methods, and consider sensitivity analyses.

## Publication and Reporting Issues

### Pitfall 36: Selective Reporting

**Problem:** Only reporting significant results or favorable analyses.

**Consequences:**
- The literature appears more consistent than reality
- Meta-analyses are biased
- Research effort is wasted

**Solutions:**
- Preregistration
- Report all analyses
- Use reporting guidelines (CONSORT, PRISMA, etc.)

### Pitfall 37: Rounding to p < .05

**Problem:** Reporting p-values selectively or imprecisely, for example giving the exact value when p = .049 but writing "p = .05" or "marginally significant" when p = .051.

**Why problematic:** It obscures values near the threshold and makes p-hacking harder to detect.

**Better:** Always report exact p-values.

### Pitfall 38: No Data Sharing

**Problem:** Not making data available for verification or reanalysis.

**Consequences:**
- Results can't be verified
- The study can't be included in meta-analyses
- Scientific progress is hindered

**Best practice:** Share data unless privacy concerns prohibit it.

## Cross-Validation and Generalization

### Pitfall 39: No Cross-Validation

**Problem:** Testing a model on the same data used to build it.

**Consequence:** Overly optimistic performance estimates.

**Solutions:**
- Split the data (train/test)
- K-fold cross-validation
- An independent validation sample

### Pitfall 40: Data Leakage

**Problem:** Information from the test set leaking into training.

**Examples:**
- Normalizing before splitting
- Feature selection on the full dataset
- Using information from the future to predict the past (temporal leakage)

**Consequence:** Inflated performance metrics.

**Prevention:** Make all preprocessing decisions using only the training data (see the sketch below).
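
A sketch of the difference between a leaky and an honest pipeline on pure-noise data, using scikit-learn; any "signal" the leaky version finds is an artifact.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))
y = rng.integers(0, 2, size=100)  # labels unrelated to X

# Leaky: feature selection sees all rows before cross-validation.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Correct: scaling and selection happen inside each training fold only.
pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10),
                     LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"Leaky accuracy:  {leaky:.2f}")   # typically well above chance
print(f"Honest accuracy: {honest:.2f}")  # near 0.5, as it should be
```
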
## Meta-Analysis Pitfalls

### Pitfall 41: Apples and Oranges

**Problem:** Combining studies with different designs, populations, or measures.

**Balance:** Inclusion criteria must trade off homogeneity against comprehensiveness.

**Solutions:**
- Clear inclusion criteria
- Subgroup analyses
- Meta-regression for moderators

### Pitfall 42: Ignoring Publication Bias

**Problem:** Published studies overrepresent significant results.

**Consequences:** Overestimated effects in meta-analyses.

**Detection:**
- Funnel plots
- Trim-and-fill
- PET-PEESE
- P-curve analysis

**Solutions:**
- Include unpublished studies
- Register reviews
- Use bias-correction methods

## General Best Practices

1. **Preregister studies** - Distinguish confirmatory from exploratory analyses
2. **Report transparently** - All analyses, not just significant ones
3. **Check assumptions** - Don't blindly apply tests
4. **Use appropriate tests** - Match the test to the data and design
5. **Report effect sizes** - Not just p-values
6. **Consider practical significance** - Not just statistical significance
7. **Replicate findings** - One study is rarely definitive
8. **Share data and code** - Enable verification
9. **Use confidence intervals** - Show uncertainty
10. **Be careful with causal claims** - Most research is correlational
448
scientific-thinking/scientific-visualization/SKILL.md
Normal file
@@ -0,0 +1,448 @@
---
name: scientific-visualization
description: Create publication-ready scientific figures using best practices and guidelines for matplotlib, seaborn, and plotly. Use this skill when creating plots, charts, or visualizations for scientific papers, when figures need to meet journal requirements (Nature, Science, Cell, etc.), when ensuring colorblind accessibility, or when asked to make figures "publication-quality" or "publication-ready". Also use for multi-panel figures, data visualization with statistical rigor, and figures following specific style guidelines.
---

# Scientific Visualization

## Overview

This skill provides comprehensive guidance, tools, and best practices for creating publication-ready scientific figures. It covers proper figure composition, colorblind-friendly design, journal-specific requirements, and practical implementation using matplotlib, seaborn, and plotly.

Publication-ready figures must be:
- **Clear**: Immediately understandable with proper labeling
- **Accurate**: Truthful data representation without distortion
- **Accessible**: Interpretable by readers with color vision deficiencies
- **Professional**: Polished appearance meeting journal standards

## When to Use This Skill

Activate this skill when:
- Creating plots or visualizations for scientific manuscripts
- Preparing figures for journal submission (Nature, Science, Cell, PLOS, etc.)
- Ensuring figures are colorblind-friendly and accessible
- Making multi-panel figures with consistent styling
- Exporting figures at correct resolution and format
- Following specific publication guidelines
- Improving existing figures to meet publication standards
- Creating figures that need to work in both color and grayscale

## Quick Start Guide

### Basic Publication-Quality Figure

```python
import matplotlib.pyplot as plt
import numpy as np

# Apply publication style (from scripts/style_presets.py)
from style_presets import apply_publication_style
apply_publication_style('default')

# Create figure with appropriate size (single column = 3.5 inches)
fig, ax = plt.subplots(figsize=(3.5, 2.5))

# Plot data
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), label='cos(x)')

# Proper labeling with units
ax.set_xlabel('Time (seconds)')
ax.set_ylabel('Amplitude (mV)')
ax.legend(frameon=False)

# Remove unnecessary spines
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

# Save in publication formats (from scripts/figure_export.py)
from figure_export import save_publication_figure
save_publication_figure(fig, 'figure1', formats=['pdf', 'png'], dpi=300)
```

### Using Pre-configured Styles

Apply journal-specific styles using the matplotlib style files in `assets/`:

```python
import matplotlib.pyplot as plt

# Option 1: Use style file directly
plt.style.use('assets/nature.mplstyle')

# Option 2: Use style_presets.py helper
from style_presets import configure_for_journal
configure_for_journal('nature', figure_width='single')

# Now create figures - they'll automatically match Nature specifications
fig, ax = plt.subplots()
# ... your plotting code ...
```

## Core Principles and Best Practices

### 1. Resolution and File Format

**Critical requirements** (detailed in `references/publication_guidelines.md`):
- **Raster images** (photos, microscopy): 300-600 DPI
- **Line art** (graphs, plots): 600-1200 DPI or vector format
- **Vector formats** (preferred): PDF, EPS, SVG
- **Raster formats**: TIFF, PNG (never JPEG for scientific data)

**Implementation:**
```python
# Use the figure_export.py script for correct settings
from figure_export import save_publication_figure

# Saves in multiple formats with proper DPI
save_publication_figure(fig, 'myfigure', formats=['pdf', 'png'], dpi=300)

# Or save for specific journal requirements
from figure_export import save_for_journal
save_for_journal(fig, 'figure1', journal='nature', figure_type='combination')
```

### 2. Color Selection - Colorblind Accessibility

**Always use colorblind-friendly palettes** (detailed in `references/color_palettes.md`):

**Recommended: Okabe-Ito palette** (distinguishable by all types of color blindness):
```python
# Option 1: Use assets/color_palettes.py
from color_palettes import OKABE_ITO_LIST, apply_palette
apply_palette('okabe_ito')

# Option 2: Manual specification
okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
             '#0072B2', '#D55E00', '#CC79A7', '#000000']
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=okabe_ito)
```

**For heatmaps/continuous data:**
- Use perceptually uniform colormaps: `viridis`, `plasma`, `cividis`
- Avoid red-green diverging maps (use `PuOr`, `RdBu`, `BrBG` instead)
- Never use `jet` or `rainbow` colormaps

**Always test figures in grayscale** to ensure interpretability.
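
One low-tech way to run that grayscale check, assuming the figure has already been exported as `figure1.png` (a hypothetical filename), is to convert the exported file with Pillow and inspect the result:

```python
from PIL import Image

# Convert the exported figure to grayscale for a quick readability check.
Image.open("figure1.png").convert("L").save("figure1_grayscale.png")
```
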
### 3. Typography and Text

**Font guidelines** (detailed in `references/publication_guidelines.md`):
- Sans-serif fonts: Arial, Helvetica, Calibri
- Minimum sizes at **final print size**:
  - Axis labels: 7-9 pt
  - Tick labels: 6-8 pt
  - Panel labels: 8-12 pt (bold)
- Sentence case for labels: "Time (hours)" not "TIME (HOURS)"
- Always include units in parentheses

**Implementation:**
```python
# Set fonts globally
import matplotlib as mpl
mpl.rcParams['font.family'] = 'sans-serif'
mpl.rcParams['font.sans-serif'] = ['Arial', 'Helvetica']
mpl.rcParams['font.size'] = 8
mpl.rcParams['axes.labelsize'] = 9
mpl.rcParams['xtick.labelsize'] = 7
mpl.rcParams['ytick.labelsize'] = 7
```

### 4. Figure Dimensions

**Journal-specific widths** (detailed in `references/journal_requirements.md`):
- **Nature**: Single 89 mm, Double 183 mm
- **Science**: Single 55 mm, Double 175 mm
- **Cell**: Single 85 mm, Double 178 mm

**Check figure size compliance:**
```python
from figure_export import check_figure_size

fig = plt.figure(figsize=(3.5, 3))  # 89 mm for Nature
check_figure_size(fig, journal='nature')
```

### 5. Multi-Panel Figures

**Best practices:**
- Label panels with bold letters: **A**, **B**, **C** (uppercase for most journals, lowercase for Nature)
- Maintain consistent styling across all panels
- Align panels along edges where possible
- Use adequate white space between panels

**Example implementation** (see `references/matplotlib_examples.md` for complete code):
```python
from string import ascii_uppercase

fig = plt.figure(figsize=(7, 4))
gs = fig.add_gridspec(2, 2, hspace=0.4, wspace=0.4)

ax1 = fig.add_subplot(gs[0, 0])
ax2 = fig.add_subplot(gs[0, 1])
# ... create other panels ...

# Add panel labels (extend the list with the remaining panel axes)
for i, ax in enumerate([ax1, ax2]):
    ax.text(-0.15, 1.05, ascii_uppercase[i], transform=ax.transAxes,
            fontsize=10, fontweight='bold', va='top')
```

## Common Tasks

### Task 1: Create a Publication-Ready Line Plot

See `references/matplotlib_examples.md` Example 1 for complete code.

**Key steps:**
1. Apply publication style
2. Set appropriate figure size for target journal
3. Use colorblind-friendly colors
4. Add error bars with correct representation (SEM, SD, or CI)
5. Label axes with units
6. Remove unnecessary spines
7. Save in vector format

### Task 2: Create a Multi-Panel Figure

See `references/matplotlib_examples.md` Example 2 for complete code.

**Key steps:**
1. Use `GridSpec` for flexible layout
2. Ensure consistent styling across panels
3. Add bold panel labels (A, B, C, etc.)
4. Align related panels
5. Verify all text is readable at final size

### Task 3: Create a Heatmap with Proper Colormap

See `references/matplotlib_examples.md` Example 4 for complete code.

**Key steps:**
1. Use a perceptually uniform colormap (`viridis`, `plasma`, `cividis`)
2. Include a labeled colorbar
3. For diverging data, use a colorblind-safe diverging map (`RdBu_r`, `PuOr`)
4. Set an appropriate center value for diverging maps
5. Test appearance in grayscale

### Task 4: Prepare Figure for Specific Journal

**Workflow:**
1. Check journal requirements: `references/journal_requirements.md`
2. Configure matplotlib for the journal:
   ```python
   from style_presets import configure_for_journal
   configure_for_journal('nature', figure_width='single')
   ```
3. Create the figure (it will auto-size correctly)
4. Export with journal specifications:
   ```python
   from figure_export import save_for_journal
   save_for_journal(fig, 'figure1', journal='nature', figure_type='line_art')
   ```

### Task 5: Fix an Existing Figure to Meet Publication Standards

**Checklist approach** (full checklist in `references/publication_guidelines.md`):

1. **Check resolution**: Verify DPI meets journal requirements
2. **Check file format**: Use vector for plots, TIFF/PNG for images
3. **Check colors**: Ensure colorblind-friendly
4. **Check fonts**: Minimum 6-7 pt at final size, sans-serif
5. **Check labels**: All axes labeled with units
6. **Check size**: Matches journal column width
7. **Test grayscale**: Figure interpretable without color
8. **Remove chart junk**: No unnecessary grids, 3D effects, shadows

### Task 6: Create Colorblind-Friendly Visualizations

**Strategy:**
1. Use approved palettes from `assets/color_palettes.py`
2. Add redundant encoding (line styles, markers, patterns)
3. Test with a colorblind simulator
4. Ensure grayscale compatibility

**Example:**
```python
from color_palettes import apply_palette
import matplotlib.pyplot as plt

apply_palette('okabe_ito')

# Add redundant encoding beyond color
# (x and datasets, a list of (y_values, label) pairs, are defined elsewhere)
line_styles = ['-', '--', '-.', ':']
markers = ['o', 's', '^', 'v']

for i, (data, label) in enumerate(datasets):
    plt.plot(x, data, linestyle=line_styles[i % 4],
             marker=markers[i % 4], label=label)
```

## Statistical Rigor

**Always include:**
- Error bars (SD, SEM, or CI - specify which in the caption)
- Sample size (n) in the figure or caption
- Statistical significance markers (*, **, ***)
- Individual data points when possible (not just summary statistics)

**Example with statistics:**
```python
# Show individual points with summary statistics
ax.scatter(x_jittered, individual_points, alpha=0.4, s=8)
ax.errorbar(x, means, yerr=sems, fmt='o', capsize=3)

# Mark significance
ax.text(1.5, max_y * 1.1, '***', ha='center', fontsize=8)
```

## Working with Different Plotting Libraries

### Matplotlib
- Most control over publication details
- Best for complex multi-panel figures
- Use the provided style files for consistent formatting
- See `references/matplotlib_examples.md` for extensive examples

### Seaborn
- Built on matplotlib, inherits all matplotlib customizations
- Good for statistical plots
- Apply matplotlib styles first, then use seaborn
```python
import seaborn as sns
from style_presets import apply_publication_style

apply_publication_style('default')
sns.set_palette(['#E69F00', '#56B4E9', '#009E73'])  # Okabe-Ito colors
```

### Plotly
- Interactive figures for exploration
- Export static images for publication
- Configure for publication quality:
```python
fig.update_layout(
    font=dict(family='Arial, sans-serif', size=10),
    plot_bgcolor='white',
    # ... see matplotlib_examples.md Example 8
)
fig.write_image('figure.png', scale=3)  # scale=3 gives ~300 DPI
```

## Resources

### References Directory

**Load these as needed for detailed information:**

- **`publication_guidelines.md`**: Comprehensive best practices
  - Resolution and file format requirements
  - Typography guidelines
  - Layout and composition rules
  - Statistical rigor requirements
  - Complete publication checklist

- **`color_palettes.md`**: Color usage guide
  - Colorblind-friendly palette specifications with RGB values
  - Sequential and diverging colormap recommendations
  - Testing procedures for accessibility
  - Domain-specific palettes (genomics, microscopy)

- **`journal_requirements.md`**: Journal-specific specifications
  - Technical requirements by publisher
  - File format and DPI specifications
  - Figure dimension requirements
  - Quick reference table

- **`matplotlib_examples.md`**: Practical code examples
  - 10 complete working examples
  - Line plots, bar plots, heatmaps, multi-panel figures
  - Journal-specific figure examples
  - Tips for each library (matplotlib, seaborn, plotly)

### Scripts Directory

**Use these helper scripts for automation:**

- **`figure_export.py`**: Export utilities
  - `save_publication_figure()`: Save in multiple formats with correct DPI
  - `save_for_journal()`: Use journal-specific requirements automatically
  - `check_figure_size()`: Verify dimensions meet journal specs
  - Run directly: `python scripts/figure_export.py` for examples

- **`style_presets.py`**: Pre-configured styles
  - `apply_publication_style()`: Apply preset styles (default, nature, science, cell)
  - `set_color_palette()`: Quick palette switching
  - `configure_for_journal()`: One-command journal configuration
  - Run directly: `python scripts/style_presets.py` to see examples

### Assets Directory

**Use these files in figures:**

- **`color_palettes.py`**: Importable color definitions
  - All recommended palettes as Python constants
  - `apply_palette()` helper function
  - Can be imported directly into notebooks/scripts

- **Matplotlib style files**: Use with `plt.style.use()`
  - `publication.mplstyle`: General publication quality
  - `nature.mplstyle`: Nature journal specifications
  - `presentation.mplstyle`: Larger fonts for posters/slides

## Workflow Summary

**Recommended workflow for creating publication figures:**

1. **Plan**: Determine the target journal, figure type, and content
2. **Configure**: Apply the appropriate style for the journal
   ```python
   from style_presets import configure_for_journal
   configure_for_journal('nature', 'single')
   ```
3. **Create**: Build the figure with proper labels, colors, and statistics
4. **Verify**: Check size, fonts, colors, accessibility
   ```python
   from figure_export import check_figure_size
   check_figure_size(fig, journal='nature')
   ```
5. **Export**: Save in the required formats
   ```python
   from figure_export import save_for_journal
   save_for_journal(fig, 'figure1', 'nature', 'combination')
   ```
6. **Review**: View at final size in the manuscript context

## Common Pitfalls to Avoid

1. **Font too small**: Text unreadable when printed at final size
2. **JPEG format**: Never use JPEG for graphs/plots (compression creates artifacts)
3. **Red-green colors**: ~8% of males cannot distinguish them
4. **Low resolution**: Pixelated figures in publication
5. **Missing units**: Always label axes with units
6. **3D effects**: Distort perception; avoid completely
7. **Chart junk**: Remove unnecessary gridlines and decorations
8. **Truncated axes**: Start bar charts at zero unless scientifically justified
9. **Inconsistent styling**: Different fonts/colors across figures in the same manuscript
10. **No error bars**: Always show uncertainty

## Final Checklist

Before submitting figures, verify:

- [ ] Resolution meets journal requirements (300+ DPI)
- [ ] File format is correct (vector for plots, TIFF for images)
- [ ] Figure size matches journal specifications
- [ ] All text readable at final size (≥6 pt)
- [ ] Colors are colorblind-friendly
- [ ] Figure works in grayscale
- [ ] All axes labeled with units
- [ ] Error bars present with definition in caption
- [ ] Panel labels present and consistent
- [ ] No chart junk or 3D effects
- [ ] Fonts consistent across all figures
- [ ] Statistical significance clearly marked
- [ ] Legend is clear and complete

Use this skill to ensure scientific figures meet the highest publication standards while remaining accessible to all readers.
@@ -0,0 +1,197 @@
"""
Colorblind-Friendly Color Palettes for Scientific Visualization

This module provides carefully curated color palettes optimized for
scientific publications and accessibility.

Usage:
    from color_palettes import OKABE_ITO, apply_palette
    import matplotlib.pyplot as plt

    apply_palette('okabe_ito')
    plt.plot([1, 2, 3], [1, 4, 9])
"""

# Okabe-Ito Palette (2008)
# The most widely recommended colorblind-friendly palette
OKABE_ITO = {
    'orange': '#E69F00',
    'sky_blue': '#56B4E9',
    'bluish_green': '#009E73',
    'yellow': '#F0E442',
    'blue': '#0072B2',
    'vermillion': '#D55E00',
    'reddish_purple': '#CC79A7',
    'black': '#000000'
}

OKABE_ITO_LIST = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
                  '#0072B2', '#D55E00', '#CC79A7', '#000000']

# Wong Palette (Nature Methods)
WONG = ['#000000', '#E69F00', '#56B4E9', '#009E73',
        '#F0E442', '#0072B2', '#D55E00', '#CC79A7']

# Paul Tol Palettes (https://personal.sron.nl/~pault/)
TOL_BRIGHT = ['#4477AA', '#EE6677', '#228833', '#CCBB44',
              '#66CCEE', '#AA3377', '#BBBBBB']

TOL_MUTED = ['#332288', '#88CCEE', '#44AA99', '#117733',
             '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']

TOL_LIGHT = ['#77AADD', '#EE8866', '#EEDD88', '#FFAABB',
             '#99DDFF', '#44BB99', '#BBCC33', '#AAAA00', '#DDDDDD']

TOL_HIGH_CONTRAST = ['#004488', '#DDAA33', '#BB5566']

# Sequential colormaps (for continuous data)
SEQUENTIAL_COLORMAPS = [
    'viridis',   # Default, perceptually uniform
    'plasma',    # Perceptually uniform
    'inferno',   # Perceptually uniform
    'magma',     # Perceptually uniform
    'cividis',   # Optimized for colorblind viewers
    'YlOrRd',    # Yellow-Orange-Red
    'YlGnBu',    # Yellow-Green-Blue
    'Blues',     # Single hue
    'Greens',    # Single hue
    'Purples',   # Single hue
]

# Diverging colormaps (for data with meaningful center)
DIVERGING_COLORMAPS_SAFE = [
    'RdYlBu',    # Red-Yellow-Blue (reversed is common)
    'RdBu',      # Red-Blue
    'PuOr',      # Purple-Orange (excellent for colorblind)
    'BrBG',      # Brown-Blue-Green (good for colorblind)
    'PRGn',      # Purple-Green (use with caution)
    'PiYG',      # Pink-Yellow-Green (use with caution)
]

# Diverging colormaps to AVOID (red-green combinations)
DIVERGING_COLORMAPS_AVOID = [
    'RdGn',      # Red-Green (problematic!)
    'RdYlGn',    # Red-Yellow-Green (problematic!)
]

# Fluorophore colors (traditional - use with caution)
FLUOROPHORES_TRADITIONAL = {
    'DAPI': '#0000FF',   # Blue
    'GFP': '#00FF00',    # Green (problematic for colorblind)
    'RFP': '#FF0000',    # Red
    'Cy5': '#FF00FF',    # Magenta
    'YFP': '#FFFF00',    # Yellow
}

# Fluorophore colors (colorblind-friendly alternatives)
FLUOROPHORES_ACCESSIBLE = {
    'Channel1': '#0072B2',   # Blue
    'Channel2': '#E69F00',   # Orange (instead of green)
    'Channel3': '#D55E00',   # Vermillion (instead of red)
    'Channel4': '#CC79A7',   # Magenta
    'Channel5': '#F0E442',   # Yellow
}

# Genomics/Bioinformatics
DNA_BASES = {
    'A': '#00CC00',  # Green
    'C': '#0000CC',  # Blue
    'G': '#FFB300',  # Orange
    'T': '#CC0000',  # Red
}

DNA_BASES_ACCESSIBLE = {
    'A': '#009E73',  # Bluish Green
    'C': '#0072B2',  # Blue
    'G': '#E69F00',  # Orange
    'T': '#D55E00',  # Vermillion
}


def apply_palette(palette_name='okabe_ito'):
    """
    Apply a color palette to matplotlib's default color cycle.

    Parameters
    ----------
    palette_name : str
        Name of the palette to apply. Options:
        'okabe_ito', 'wong', 'tol_bright', 'tol_muted',
        'tol_light', 'tol_high_contrast'

    Returns
    -------
    list
        List of colors in the palette

    Examples
    --------
    >>> apply_palette('okabe_ito')
    >>> plt.plot([1, 2, 3], [1, 4, 9])  # Uses Okabe-Ito colors
    """
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        print("matplotlib not installed")
        return None

    palettes = {
        'okabe_ito': OKABE_ITO_LIST,
        'wong': WONG,
        'tol_bright': TOL_BRIGHT,
        'tol_muted': TOL_MUTED,
        'tol_light': TOL_LIGHT,
        'tol_high_contrast': TOL_HIGH_CONTRAST,
    }

    if palette_name not in palettes:
        available = ', '.join(palettes.keys())
        raise ValueError(f"Palette '{palette_name}' not found. Available: {available}")

    colors = palettes[palette_name]
    plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)
    return colors


def get_palette(palette_name='okabe_ito'):
    """
    Get a color palette as a list.

    Parameters
    ----------
    palette_name : str
        Name of the palette

    Returns
    -------
    list
        List of color hex codes
    """
    palettes = {
        'okabe_ito': OKABE_ITO_LIST,
        'wong': WONG,
        'tol_bright': TOL_BRIGHT,
        'tol_muted': TOL_MUTED,
        'tol_light': TOL_LIGHT,
        'tol_high_contrast': TOL_HIGH_CONTRAST,
    }

    if palette_name not in palettes:
        available = ', '.join(palettes.keys())
        raise ValueError(f"Palette '{palette_name}' not found. Available: {available}")

    return palettes[palette_name]


if __name__ == "__main__":
    print("Available colorblind-friendly palettes:")
    print(f"  - Okabe-Ito: {len(OKABE_ITO_LIST)} colors")
    print(f"  - Wong: {len(WONG)} colors")
    print(f"  - Tol Bright: {len(TOL_BRIGHT)} colors")
    print(f"  - Tol Muted: {len(TOL_MUTED)} colors")
    print(f"  - Tol Light: {len(TOL_LIGHT)} colors")
    print(f"  - Tol High Contrast: {len(TOL_HIGH_CONTRAST)} colors")

    print("\nOkabe-Ito palette (most recommended):")
    for name, color in OKABE_ITO.items():
        print(f"  {name:15s}: {color}")
@@ -0,0 +1,63 @@
# Nature journal style
# Usage: plt.style.use('nature.mplstyle')
#
# Optimized for Nature journal specifications:
# - Single column: 89 mm
# - Double column: 183 mm
# - High resolution requirements

# Figure properties
figure.dpi: 100
figure.facecolor: white
figure.constrained_layout.use: True
figure.figsize: 3.5, 2.625  # 89 mm single column, 3:4 aspect

# Font properties (Nature prefers smaller fonts)
font.size: 7
font.family: sans-serif
font.sans-serif: Arial, Helvetica

# Axes properties
axes.linewidth: 0.5
axes.labelsize: 8
axes.titlesize: 8
axes.labelweight: normal
axes.spines.top: False
axes.spines.right: False
axes.edgecolor: black
axes.axisbelow: True
axes.grid: False
axes.prop_cycle: cycler('color', ['E69F00', '56B4E9', '009E73', 'F0E442', '0072B2', 'D55E00', 'CC79A7'])

# Tick properties
xtick.major.size: 2.5
xtick.minor.size: 1.5
xtick.major.width: 0.5
xtick.minor.width: 0.4
xtick.labelsize: 6
xtick.direction: out
ytick.major.size: 2.5
ytick.minor.size: 1.5
ytick.major.width: 0.5
ytick.minor.width: 0.4
ytick.labelsize: 6
ytick.direction: out

# Line properties
lines.linewidth: 1.2
lines.markersize: 3
lines.markeredgewidth: 0.4

# Legend properties
legend.fontsize: 6
legend.frameon: False

# Save properties (Nature requirements)
savefig.dpi: 600  # 1000 for line art, 600 for combination
savefig.format: pdf
savefig.bbox: tight
savefig.pad_inches: 0.05
savefig.facecolor: white

# Image properties
image.cmap: viridis
@@ -0,0 +1,61 @@
# Presentation/Poster style
# Usage: plt.style.use('presentation.mplstyle')
#
# Larger fonts and thicker lines for presentations,
# posters, and projected displays

# Figure properties
figure.dpi: 100
figure.facecolor: white
figure.constrained_layout.use: True
figure.figsize: 8, 6

# Font properties (larger for visibility)
font.size: 14
font.family: sans-serif
font.sans-serif: Arial, Helvetica, Calibri

# Axes properties
axes.linewidth: 1.5
axes.labelsize: 16
axes.titlesize: 18
axes.labelweight: normal
axes.spines.top: False
axes.spines.right: False
axes.edgecolor: black
axes.axisbelow: True
axes.grid: False
axes.prop_cycle: cycler('color', ['E69F00', '56B4E9', '009E73', 'F0E442', '0072B2', 'D55E00', 'CC79A7'])

# Tick properties
xtick.major.size: 6
xtick.minor.size: 4
xtick.major.width: 1.5
xtick.minor.width: 1.0
xtick.labelsize: 12
xtick.direction: out
ytick.major.size: 6
ytick.minor.size: 4
ytick.major.width: 1.5
ytick.minor.width: 1.0
ytick.labelsize: 12
ytick.direction: out

# Line properties
lines.linewidth: 2.5
lines.markersize: 8
lines.markeredgewidth: 1.0

# Legend properties
legend.fontsize: 12
legend.frameon: False

# Save properties
savefig.dpi: 300
savefig.format: png
savefig.bbox: tight
savefig.pad_inches: 0.1
savefig.facecolor: white

# Image properties
image.cmap: viridis
@@ -0,0 +1,68 @@
# Publication-quality matplotlib style
# Usage: plt.style.use('publication.mplstyle')
#
# This style provides clean, professional formatting suitable
# for most scientific journals

# Figure properties
figure.dpi: 100
figure.facecolor: white
figure.autolayout: False
figure.constrained_layout.use: True
figure.figsize: 3.5, 2.5

# Font properties
font.size: 8
font.family: sans-serif
font.sans-serif: Arial, Helvetica, DejaVu Sans

# Axes properties
axes.linewidth: 0.5
axes.labelsize: 9
axes.titlesize: 9
axes.labelweight: normal
axes.spines.top: False
axes.spines.right: False
axes.spines.left: True
axes.spines.bottom: True
axes.edgecolor: black
axes.labelcolor: black
axes.axisbelow: True
axes.grid: False
axes.prop_cycle: cycler('color', ['E69F00', '56B4E9', '009E73', 'F0E442', '0072B2', 'D55E00', 'CC79A7', '000000'])

# Tick properties
xtick.major.size: 3
xtick.minor.size: 2
xtick.major.width: 0.5
xtick.minor.width: 0.5
xtick.labelsize: 7
xtick.direction: out
ytick.major.size: 3
ytick.minor.size: 2
ytick.major.width: 0.5
ytick.minor.width: 0.5
ytick.labelsize: 7
ytick.direction: out

# Line properties
lines.linewidth: 1.5
lines.markersize: 4
lines.markeredgewidth: 0.5

# Legend properties
legend.fontsize: 7
legend.frameon: False
legend.loc: best

# Save properties
savefig.dpi: 300
savefig.format: pdf
savefig.bbox: tight
savefig.pad_inches: 0.05
savefig.transparent: False
savefig.facecolor: white

# Image properties
image.cmap: viridis
image.aspect: auto
@@ -0,0 +1,348 @@
|
|||||||
|
# Scientific Color Palettes and Guidelines
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Color choice in scientific visualization is critical for accessibility, clarity, and accurate data representation. This reference provides colorblind-friendly palettes and best practices for color usage.
|
||||||
|
|
||||||
|
## Colorblind-Friendly Palettes
|
||||||
|
|
||||||
|
### Okabe-Ito Palette (Recommended for Categories)
|
||||||
|
|
||||||
|
The Okabe-Ito palette is specifically designed to be distinguishable by people with all forms of color blindness.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Okabe-Ito colors (RGB values)
|
||||||
|
okabe_ito = {
|
||||||
|
'orange': '#E69F00', # RGB: (230, 159, 0)
|
||||||
|
'sky_blue': '#56B4E9', # RGB: (86, 180, 233)
|
||||||
|
'bluish_green': '#009E73', # RGB: (0, 158, 115)
|
||||||
|
'yellow': '#F0E442', # RGB: (240, 228, 66)
|
||||||
|
'blue': '#0072B2', # RGB: (0, 114, 178)
|
||||||
|
'vermillion': '#D55E00', # RGB: (213, 94, 0)
|
||||||
|
'reddish_purple': '#CC79A7', # RGB: (204, 121, 167)
|
||||||
|
'black': '#000000' # RGB: (0, 0, 0)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Usage in Matplotlib:**
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
colors = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
|
||||||
|
'#0072B2', '#D55E00', '#CC79A7', '#000000']
|
||||||
|
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Usage in Seaborn:**
|
||||||
|
```python
|
||||||
|
import seaborn as sns
|
||||||
|
|
||||||
|
okabe_ito_palette = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
|
||||||
|
'#0072B2', '#D55E00', '#CC79A7']
|
||||||
|
sns.set_palette(okabe_ito_palette)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Usage in Plotly:**
|
||||||
|
```python
|
||||||
|
import plotly.graph_objects as go
|
||||||
|
|
||||||
|
okabe_ito_plotly = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
|
||||||
|
'#0072B2', '#D55E00', '#CC79A7']
|
||||||
|
fig = go.Figure()
|
||||||
|
# Apply to discrete color scale
|
||||||
|
```
|
||||||
|
|
||||||
|
### Wong Palette (Alternative for Categories)
|
||||||
|
|
||||||
|
Another excellent colorblind-friendly palette by Bang Wong (Nature Methods).
|
||||||
|
|
||||||
|
```python
|
||||||
|
wong_palette = {
|
||||||
|
'black': '#000000',
|
||||||
|
'orange': '#E69F00',
|
||||||
|
'sky_blue': '#56B4E9',
|
||||||
|
'green': '#009E73',
|
||||||
|
'yellow': '#F0E442',
|
||||||
|
'blue': '#0072B2',
|
||||||
|
'vermillion': '#D55E00',
|
||||||
|
'purple': '#CC79A7'
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Paul Tol Palettes
|
||||||
|
|
||||||
|
Paul Tol has designed multiple scientifically-optimized palettes for different use cases.
|
||||||
|
|
||||||
|
**Bright Palette (up to 7 categories):**
|
||||||
|
```python
|
||||||
|
tol_bright = ['#4477AA', '#EE6677', '#228833', '#CCBB44',
|
||||||
|
'#66CCEE', '#AA3377', '#BBBBBB']
|
||||||
|
```
|
||||||
|
|
||||||
|
**Muted Palette (up to 9 categories):**
|
||||||
|
```python
|
||||||
|
tol_muted = ['#332288', '#88CCEE', '#44AA99', '#117733',
|
||||||
|
'#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
|
||||||
|
```
|
||||||
|
|
||||||
|
**High Contrast (3 categories only):**
|
||||||
|
```python
|
||||||
|
tol_high_contrast = ['#004488', '#DDAA33', '#BB5566']
|
||||||
|
```
|
||||||
|
|
||||||
|
## Sequential Colormaps (Continuous Data)
|
||||||
|
|
||||||
|
Sequential colormaps represent data from low to high values with a single hue.
|
||||||
|
|
||||||
|
### Perceptually Uniform Colormaps
|
||||||
|
|
||||||
|
These colormaps have uniform perceptual change across the color scale.
|
||||||
|
|
||||||
|
**Viridis (default in Matplotlib):**
|
||||||
|
- Colorblind-friendly
|
||||||
|
- Prints well in grayscale
|
||||||
|
- Perceptually uniform
|
||||||
|
```python
|
||||||
|
plt.imshow(data, cmap='viridis')
|
||||||
|
```
|
||||||
|
|
||||||
|
**Cividis:**
|
||||||
|
- Optimized for colorblind viewers
|
||||||
|
- Designed specifically for deuteranopia/protanopia
|
||||||
|
```python
|
||||||
|
plt.imshow(data, cmap='cividis')
|
||||||
|
```
|
||||||
|
|
||||||
|
**Plasma, Inferno, Magma:**
|
||||||
|
- Perceptually uniform alternatives to viridis
|
||||||
|
- Good for different aesthetic preferences
|
||||||
|
```python
|
||||||
|
plt.imshow(data, cmap='plasma')
|
||||||
|
```
|
||||||
|
|
||||||
|
### When to Use Sequential Maps
|
||||||
|
- Heatmaps showing intensity
|
||||||
|
- Geographic elevation data
|
||||||
|
- Probability distributions
|
||||||
|
- Any single-variable continuous data (low → high)
|
||||||
|
|
||||||
|
## Diverging Colormaps (Negative to Positive)
|
||||||
|
|
||||||
|
Diverging colormaps have a neutral middle color with two contrasting colors at extremes.
|
||||||
|
|
||||||
|
### Colorblind-Safe Diverging Maps
|
||||||
|
|
||||||
|
**RdYlBu (Red-Yellow-Blue):**
|
||||||
|
```python
|
||||||
|
plt.imshow(data, cmap='RdYlBu_r') # _r reverses: blue (low) to red (high)
|
||||||
|
```
|
||||||
|
|
||||||
|
**PuOr (Purple-Orange):**
|
||||||
|
- Excellent for colorblind viewers
|
||||||
|
```python
|
||||||
|
plt.imshow(data, cmap='PuOr')
|
||||||
|
```
|
||||||
|
|
||||||
|
**BrBG (Brown-Blue-Green):**
|
||||||
|
- Good colorblind accessibility
|
||||||
|
```python
|
||||||
|
plt.imshow(data, cmap='BrBG')
|
||||||
|
```
|
||||||
|
|
||||||
|
### Avoid These Diverging Maps
|
||||||
|
- **Red-green diverging maps**: Problematic for red-green color blindness
- **RdYlGn (Red-Yellow-Green)**: Same issue
|
||||||
|
|
||||||
|
### When to Use Diverging Maps
|
||||||
|
- Correlation matrices
|
||||||
|
- Change/difference data (positive vs. negative)
|
||||||
|
- Deviation from a central value
|
||||||
|
- Temperature anomalies
|
||||||
|
|
||||||
|
## Special Purpose Palettes
|
||||||
|
|
||||||
|
### For Genomics/Bioinformatics
|
||||||
|
|
||||||
|
**Sequence type identification:**
|
||||||
|
```python
|
||||||
|
# DNA/RNA bases
|
||||||
|
nucleotide_colors = {
|
||||||
|
'A': '#00CC00', # Green
|
||||||
|
'C': '#0000CC', # Blue
|
||||||
|
'G': '#FFB300', # Orange
|
||||||
|
'T': '#CC0000', # Red
|
||||||
|
'U': '#CC0000' # Red (RNA)
|
||||||
|
}
|
||||||
|
```
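
A minimal usage sketch (with made-up base counts standing in for real sequence data) applying the dictionary above to a base-composition bar chart:

```python
import matplotlib.pyplot as plt

nucleotide_colors = {'A': '#00CC00', 'C': '#0000CC', 'G': '#FFB300', 'T': '#CC0000'}

# Hypothetical base counts for a sequence of interest
counts = {'A': 120, 'C': 95, 'G': 101, 'T': 110}

fig, ax = plt.subplots(figsize=(3, 2.5))
ax.bar(counts.keys(), counts.values(),
       color=[nucleotide_colors[base] for base in counts])
ax.set_xlabel('Base')
ax.set_ylabel('Count')
```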
|
||||||
|
|
||||||
|
**Gene expression:**
|
||||||
|
- Use sequential colormaps (viridis, YlOrRd) for expression levels
|
||||||
|
- Use diverging colormaps (RdBu) for log2 fold change, centered on zero (see the sketch below)
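
A minimal sketch (using randomly generated values in place of real expression data) of a log2 fold-change heatmap, with symmetric color limits so the neutral middle of the diverging map sits at zero:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical log2 fold-change matrix (genes x conditions)
rng = np.random.default_rng(0)
log2fc = rng.normal(0, 1.5, size=(20, 6))

# Symmetric limits keep the neutral middle color at zero fold change
limit = np.abs(log2fc).max()

fig, ax = plt.subplots(figsize=(3, 4))
im = ax.imshow(log2fc, cmap='RdBu_r', vmin=-limit, vmax=limit, aspect='auto')
cbar = fig.colorbar(im, ax=ax)
cbar.set_label('log2 fold change')
```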
|
||||||
|
|
||||||
|
### For Microscopy
|
||||||
|
|
||||||
|
**Fluorescence channels:**
|
||||||
|
```python
|
||||||
|
# Traditional fluorophore colors (use with caution)
|
||||||
|
fluorophore_colors = {
|
||||||
|
'DAPI': '#0000FF', # Blue - DNA
|
||||||
|
'GFP': '#00FF00', # Green (problematic for colorblind)
|
||||||
|
'RFP': '#FF0000', # Red
|
||||||
|
'Cy5': '#FF00FF' # Magenta
|
||||||
|
}
|
||||||
|
|
||||||
|
# Colorblind-friendly alternatives
|
||||||
|
fluorophore_alt = {
|
||||||
|
'Channel1': '#0072B2', # Blue
|
||||||
|
'Channel2': '#E69F00', # Orange (instead of green)
|
||||||
|
'Channel3': '#D55E00', # Vermillion
|
||||||
|
'Channel4': '#CC79A7' # Magenta
|
||||||
|
}
|
||||||
|
```
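
Single-channel microscopy images are often displayed on a black-to-color ramp rather than with a flat categorical color. A minimal sketch (channel name and data are placeholders) building such a ramp from one of the colorblind-friendly channel colors above:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

# Black-to-orange ramp for a hypothetical "Channel2" image
channel2_cmap = LinearSegmentedColormap.from_list('channel2', ['black', '#E69F00'])

image = np.random.rand(128, 128)  # placeholder for real channel data
fig, ax = plt.subplots(figsize=(3, 3))
ax.imshow(image, cmap=channel2_cmap)
ax.axis('off')
```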
|
||||||
|
|
||||||
|
## Color Usage Best Practices
|
||||||
|
|
||||||
|
### Categorical Data (Qualitative Color Schemes)
|
||||||
|
|
||||||
|
**Do:**
|
||||||
|
- Use distinct, saturated colors from Okabe-Ito or Wong palette
|
||||||
|
- Limit to 7-8 categories max in one plot
|
||||||
|
- Use consistent colors for same categories across figures
|
||||||
|
- Add patterns/markers when colors alone might be insufficient
|
||||||
|
|
||||||
|
**Don't:**
|
||||||
|
- Use red/green combinations
|
||||||
|
- Use rainbow (jet) colormap for categories
|
||||||
|
- Use similar hues that are hard to distinguish
|
||||||
|
|
||||||
|
### Continuous Data (Sequential/Diverging Schemes)
|
||||||
|
|
||||||
|
**Do:**
|
||||||
|
- Use perceptually uniform colormaps (viridis, plasma, cividis)
|
||||||
|
- Choose diverging maps when data has meaningful center point
|
||||||
|
- Include colorbar with labeled ticks
|
||||||
|
- Test appearance in grayscale
|
||||||
|
|
||||||
|
**Don't:**
|
||||||
|
- Use rainbow (jet) colormap - not perceptually uniform
|
||||||
|
- Use red-green diverging maps
|
||||||
|
- Omit colorbar on heatmaps
|
||||||
|
|
||||||
|
## Testing for Colorblind Accessibility
|
||||||
|
|
||||||
|
### Online Simulators
|
||||||
|
- **Coblis**: https://www.color-blindness.com/coblis-color-blindness-simulator/
|
||||||
|
- **Color Oracle**: Free downloadable tool for Windows/Mac/Linux
|
||||||
|
- **Sim Daltonism**: Mac application
|
||||||
|
|
||||||
|
### Types of Color Vision Deficiency
|
||||||
|
- **Deuteranomaly/deuteranopia** (~6% of males): Reduced or absent green perception
- **Protanomaly/protanopia** (~2% of males): Reduced or absent red perception
- **Tritanopia** (<1%): Impaired blue perception (rare)
|
||||||
|
|
||||||
|
### Python Tools
|
||||||
|
```python
# Using colorspacious to simulate colorblind vision
import numpy as np
from colorspacious import cspace_convert

def simulate_deuteranopia(image_rgb, severity=100):
    """Simulate deuteranopia for an sRGB image with channel values in [0, 1]."""
    cvd_space = {"name": "sRGB1+CVD", "cvd_type": "deuteranomaly", "severity": severity}
    simulated = cspace_convert(image_rgb, cvd_space, "sRGB1")
    return np.clip(simulated, 0, 1)
```
|
||||||
|
|
||||||
|
## Implementation Examples
|
||||||
|
|
||||||
|
### Setting Global Matplotlib Style
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import matplotlib as mpl
|
||||||
|
|
||||||
|
# Set Okabe-Ito as default color cycle
|
||||||
|
okabe_ito_colors = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
|
||||||
|
'#0072B2', '#D55E00', '#CC79A7']
|
||||||
|
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=okabe_ito_colors)
|
||||||
|
|
||||||
|
# Set default colormap to viridis
|
||||||
|
mpl.rcParams['image.cmap'] = 'viridis'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Seaborn with Custom Palette
|
||||||
|
```python
|
||||||
|
import seaborn as sns
|
||||||
|
|
||||||
|
# Set Paul Tol muted palette
|
||||||
|
tol_muted = ['#332288', '#88CCEE', '#44AA99', '#117733',
|
||||||
|
'#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
|
||||||
|
sns.set_palette(tol_muted)
|
||||||
|
|
||||||
|
# For heatmaps
|
||||||
|
sns.heatmap(data, cmap='viridis', annot=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Plotly with Discrete Colors
|
||||||
|
```python
|
||||||
|
import plotly.express as px
|
||||||
|
|
||||||
|
# Use Okabe-Ito for categorical data
|
||||||
|
okabe_ito_plotly = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
|
||||||
|
'#0072B2', '#D55E00', '#CC79A7']
|
||||||
|
|
||||||
|
fig = px.scatter(df, x='x', y='y', color='category',
|
||||||
|
color_discrete_sequence=okabe_ito_plotly)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Grayscale Compatibility
|
||||||
|
|
||||||
|
All figures should remain interpretable in grayscale. Test by converting to grayscale:
|
||||||
|
|
||||||
|
```python
# Matplotlib cannot save directly to grayscale; save normally, then convert a copy
from PIL import Image  # requires Pillow

fig.savefig('figure.png', dpi=300)
Image.open('figure.png').convert('L').save('figure_gray.png')
```
|
||||||
|
|
||||||
|
**Strategies for grayscale compatibility** (see the sketch after this list):
|
||||||
|
1. Use different line styles (solid, dashed, dotted)
|
||||||
|
2. Use different marker shapes (circles, squares, triangles)
|
||||||
|
3. Add hatching patterns to bars
|
||||||
|
4. Ensure sufficient luminance contrast between colors
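
A minimal sketch applying strategies 1 and 2, varying line styles and marker shapes so the series stay distinguishable when printed in grayscale:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 20)
styles = [('-', 'o'), ('--', 's'), (':', '^')]  # (line style, marker) pairs

fig, ax = plt.subplots(figsize=(3.5, 2.5))
for i, (ls, marker) in enumerate(styles):
    ax.plot(x, np.sin(x) + i, linestyle=ls, marker=marker,
            markersize=4, label=f'Series {i + 1}')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend(frameon=False)
```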
|
||||||
|
|
||||||
|
## Color Spaces
|
||||||
|
|
||||||
|
### RGB vs CMYK
|
||||||
|
- **RGB** (Red, Green, Blue): For digital/screen display
|
||||||
|
- **CMYK** (Cyan, Magenta, Yellow, Black): For print
|
||||||
|
|
||||||
|
**Important:** Colors appear different in print vs. screen. When preparing for print:
|
||||||
|
1. Convert to CMYK color space
|
||||||
|
2. Check color appearance in CMYK preview
|
||||||
|
3. Ensure sufficient contrast remains
|
||||||
|
|
||||||
|
### Matplotlib Color Spaces
|
||||||
|
```python
|
||||||
|
# Save for print (CMYK)
|
||||||
|
# Note: Direct CMYK support limited; use PDF and let publisher convert
|
||||||
|
fig.savefig('figure.pdf', dpi=300)
|
||||||
|
|
||||||
|
# For RGB (digital)
|
||||||
|
fig.savefig('figure.png', dpi=300)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Common Mistakes
|
||||||
|
|
||||||
|
1. **Using jet/rainbow colormap**: Not perceptually uniform; avoid
|
||||||
|
2. **Red-green combinations**: ~8% of males cannot distinguish
|
||||||
|
3. **Too many colors**: More than 7-8 becomes difficult to distinguish
|
||||||
|
4. **Inconsistent color meaning**: Same color should mean same thing across figures
|
||||||
|
5. **Missing colorbar**: Always include for continuous data
|
||||||
|
6. **Low contrast**: Ensure colors differ sufficiently
|
||||||
|
7. **Relying solely on color**: Add texture, patterns, or markers
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
- **ColorBrewer**: http://colorbrewer2.org/ - Choose palettes by colorblind-safe option
|
||||||
|
- **Paul Tol's palettes**: https://personal.sron.nl/~pault/
|
||||||
|
- **Okabe-Ito palette origin**: "Color Universal Design" (Okabe & Ito, 2008)
|
||||||
|
- **Matplotlib colormaps**: https://matplotlib.org/stable/tutorials/colors/colormaps.html
|
||||||
|
- **Seaborn palettes**: https://seaborn.pydata.org/tutorial/color_palettes.html
|
||||||
@@ -0,0 +1,320 @@
|
|||||||
|
# Journal-Specific Figure Requirements
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Different journals have specific technical requirements for figures. This reference compiles common requirements from major scientific publishers. **Always check the specific journal's author guidelines for the most current requirements.**
|
||||||
|
|
||||||
|
## Nature Portfolio (Nature, Nature Methods, etc.)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Vector: PDF, EPS, AI (preferred for graphs)
|
||||||
|
- Raster: TIFF, PNG (for images)
|
||||||
|
- Never: PowerPoint, Word, JPEG
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Line art: 1000-1200 DPI
|
||||||
|
- Combination (line art + images): 600 DPI
|
||||||
|
- Photographs/microscopy: 300 DPI minimum
|
||||||
|
|
||||||
|
- **Color space**: RGB (Nature is digital-first)
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Single column: 89 mm (3.5 inches)
|
||||||
|
- 1.5 column: 120 mm (4.7 inches)
|
||||||
|
- Double column: 183 mm (7.2 inches)
|
||||||
|
- Maximum height: 247 mm (9.7 inches)
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Arial or Helvetica (or similar sans-serif)
|
||||||
|
- Minimum 5-7 pt at final size
|
||||||
|
- Embed all fonts in PDF/EPS
|
||||||
|
|
||||||
|
### Nature Specific Guidelines
|
||||||
|
- Panel labels: a, b, c (lowercase, bold) in top-left corner
|
||||||
|
- Scale bars required for microscopy images
|
||||||
|
- Gel images: Include molecular weight markers
|
||||||
|
- Cropping: Indicate with line breaks
|
||||||
|
- Statistics: Mark significance; define symbols in legend
|
||||||
|
- Source data: Required for all graphs
|
||||||
|
|
||||||
|
### File Naming
|
||||||
|
Format: `FirstAuthorLastName_FigureNumber.ext`
|
||||||
|
Example: `Smith_Fig1.pdf`
|
||||||
|
|
||||||
|
## Science (AAAS)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Vector: EPS, PDF (preferred)
|
||||||
|
- Raster: TIFF
|
||||||
|
- Acceptable: AI, PSD (Photoshop)
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Line art: 1000 DPI minimum
|
||||||
|
- Photographs: 300 DPI minimum
|
||||||
|
- Combination: 600 DPI minimum
|
||||||
|
|
||||||
|
- **Color space**: RGB
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Single column: 5.5 cm (2.17 inches)
|
||||||
|
- 1.5 column: 12 cm (4.72 inches)
|
||||||
|
- Full width: 17.5 cm (6.89 inches)
|
||||||
|
- Maximum height: 23.3 cm (9.17 inches)
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Helvetica (or Arial)
|
||||||
|
- 6-8 pt minimum at final size
|
||||||
|
- Consistent across all figures
|
||||||
|
|
||||||
|
### Science Specific Guidelines
|
||||||
|
- Panel labels: (A), (B), (C) in parentheses
|
||||||
|
- Minimal text within figures (details in caption)
|
||||||
|
- High contrast for web and print
|
||||||
|
- Error bars required; define in caption
|
||||||
|
- Avoid excessive whitespace
|
||||||
|
|
||||||
|
### File Naming
|
||||||
|
Format: `Manuscript#_Fig#.ext`
|
||||||
|
Example: `abn1234_Fig1.eps`
|
||||||
|
|
||||||
|
## Cell Press (Cell, Neuron, Molecular Cell, etc.)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Vector: PDF, EPS (preferred for graphs/diagrams)
|
||||||
|
- Raster: TIFF (for photographs)
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Line art: 1000 DPI
|
||||||
|
- Photographs: 300 DPI
|
||||||
|
- Combination: 600 DPI
|
||||||
|
|
||||||
|
- **Color space**: RGB
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Single column: 85 mm (3.35 inches)
|
||||||
|
- Double column: 178 mm (7.01 inches)
|
||||||
|
- Maximum height: 230 mm (9.06 inches)
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Arial or Helvetica only
|
||||||
|
- 8-12 pt for axis labels
|
||||||
|
- 6-8 pt for tick labels
|
||||||
|
|
||||||
|
### Cell Press Specific Guidelines
|
||||||
|
- Panel labels: (A), (B), (C) or A, B, C in top-left
|
||||||
|
- Related panels should match in size
|
||||||
|
- Scale bars mandatory for microscopy
|
||||||
|
- Western blots: Include molecular weight markers
|
||||||
|
- Arrows/arrowheads: 2 pt minimum width
|
||||||
|
- Line widths: 1-2 pt for data
|
||||||
|
|
||||||
|
## PLOS (Public Library of Science)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Vector: EPS, PDF (preferred)
|
||||||
|
- Raster: TIFF, PNG
|
||||||
|
- TIFF with LZW compression acceptable
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Minimum 300 DPI at final size (all figure types)
|
||||||
|
- 600 DPI preferred for line art
|
||||||
|
|
||||||
|
- **Color space**: RGB
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Single column: 8.3 cm (3.27 inches)
|
||||||
|
- 1.5 column: 11.4 cm (4.49 inches)
|
||||||
|
- Double column: 17.3 cm (6.81 inches)
|
||||||
|
- Maximum height: 23.3 cm (9.17 inches)
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Sans-serif preferred (Arial, Helvetica)
|
||||||
|
- 8-12 pt for labels at final size
|
||||||
|
|
||||||
|
### PLOS Specific Guidelines
|
||||||
|
- Figures should be understandable without caption
|
||||||
|
- Color required only if adding information
|
||||||
|
- All figures convertible to grayscale
|
||||||
|
- Panel labels optional but recommended
|
||||||
|
- Open access: Figures must be CC-BY licensed
|
||||||
|
- Source data files encouraged
|
||||||
|
|
||||||
|
## ACS (American Chemical Society)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Preferred: TIFF, PDF, EPS
|
||||||
|
- Application files: AI, CDX (ChemDraw), CDL
|
||||||
|
- Acceptable: PNG (not for publication)
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Minimum 300 DPI at final size
|
||||||
|
- 600 DPI for line art and chemical structures
|
||||||
|
- 1200 DPI for detailed structures
|
||||||
|
|
||||||
|
- **Color space**: RGB or CMYK (check specific journal)
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Single column: 3.25 inches (8.25 cm)
|
||||||
|
- Double column: 7 inches (17.78 cm)
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Embedded fonts required
|
||||||
|
- Consistent sizing across figures
|
||||||
|
|
||||||
|
### ACS Specific Guidelines
|
||||||
|
- Chemical structures: Use ChemDraw or equivalent
|
||||||
|
- Atom labels: 10-12 pt
|
||||||
|
- Bond thickness: 2 pt
|
||||||
|
- Panel labels: Lowercase bold (a, b, c)
|
||||||
|
- High contrast required (many ACS journals grayscale print)
|
||||||
|
|
||||||
|
## Elsevier Journals (varies by journal)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Vector: EPS, PDF
|
||||||
|
- Raster: TIFF, JPEG (only for photographs)
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Line art: 1000 DPI minimum
|
||||||
|
- Photographs: 300 DPI minimum
|
||||||
|
- Combination: 600 DPI minimum
|
||||||
|
|
||||||
|
- **Color space**: RGB (for online); CMYK (for print journals)
|
||||||
|
|
||||||
|
- **Dimensions**: Vary by journal
|
||||||
|
- Common single column: 90 mm
|
||||||
|
- Common double column: 190 mm
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Preferred: Arial, Times, Symbol
|
||||||
|
- Minimum 6 pt at final size
|
||||||
|
|
||||||
|
### Elsevier Specific Guidelines
|
||||||
|
- Check individual journal guidelines (highly variable)
|
||||||
|
- Some journals charge for color in print
|
||||||
|
- Panel labels typically (A), (B), (C) or A, B, C
|
||||||
|
- Graphical abstract often required (separate from figures)
|
||||||
|
|
||||||
|
## IEEE (Engineering/Computer Science)
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Vector: PDF, EPS (preferred)
|
||||||
|
- Raster: TIFF, PNG
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Photographs/graphics: 300 DPI minimum at final size
|
||||||
|
- Line art: 600 DPI minimum
|
||||||
|
|
||||||
|
- **Color space**: RGB (online); CMYK (print)
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Single column: 3.5 inches (8.9 cm)
|
||||||
|
- Double column: 7.16 inches (18.2 cm)
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Sans-serif preferred
|
||||||
|
- Minimum 8-10 pt at final size
|
||||||
|
|
||||||
|
### IEEE Specific Guidelines
|
||||||
|
- Figures should be readable in black and white
|
||||||
|
- Color figures incur no charge (online publication)
|
||||||
|
- Panel labels: (a), (b), (c) in lowercase
|
||||||
|
- Captions below figures (not on separate page)
|
||||||
|
- Use IEEE graphics checker tool before submission
|
||||||
|
|
||||||
|
## BMC (BioMed Central) - Open Access
|
||||||
|
|
||||||
|
### Technical Specifications
|
||||||
|
- **File formats**:
|
||||||
|
- Any standard format accepted
|
||||||
|
- Preferred: TIFF, PDF, EPS, PNG
|
||||||
|
|
||||||
|
- **Resolution**:
|
||||||
|
- Minimum 600 DPI for line art
|
||||||
|
- Minimum 300 DPI for photographs
|
||||||
|
|
||||||
|
- **Color space**: RGB
|
||||||
|
|
||||||
|
- **Dimensions**:
|
||||||
|
- Flexible, but consider readability
|
||||||
|
- Maximum width typically 140 mm
|
||||||
|
|
||||||
|
- **Fonts**:
|
||||||
|
- Embedded and readable
|
||||||
|
|
||||||
|
### BMC Specific Guidelines
|
||||||
|
- Open access: CC-BY license required
|
||||||
|
- Figure files uploaded separately
|
||||||
|
- Panel labels as appropriate for field
|
||||||
|
- Source data encouraged
|
||||||
|
- Accessibility important (colorblind-friendly)
|
||||||
|
|
||||||
|
## Common Requirements Across Journals
|
||||||
|
|
||||||
|
### Universal Best Practices
|
||||||
|
1. **Never use JPEG for graphs/plots**: Compression artifacts
|
||||||
|
2. **Embed all fonts**: In PDF/EPS files
|
||||||
|
3. **Layer structure**: Flatten images (merge layers in Photoshop)
|
||||||
|
4. **RGB vs CMYK**: Most journals now RGB (digital-first)
|
||||||
|
5. **High resolution**: Always better to start high, reduce if needed
|
||||||
|
6. **Consistency**: Same style across all figures in manuscript
|
||||||
|
7. **File size**: Balance quality with reasonable file sizes (typically <10 MB per figure)
|
||||||
|
|
||||||
|
### Submitting Figures
|
||||||
|
- **Initial submission**: Lower resolution often acceptable (for review)
|
||||||
|
- **Revision/acceptance**: High-resolution required
|
||||||
|
- **Separate files**: Each figure as separate file
|
||||||
|
- **File naming**: Clear, systematic naming
|
||||||
|
- **Supporting information**: May have different requirements
|
||||||
|
|
||||||
|
## Quick Reference Table
|
||||||
|
|
||||||
|
| Publisher | Single Column | Double Column | Min DPI (photos) | Min DPI (line art) | Preferred Format |
|-----------|---------------|---------------|------------------|-------------------|------------------|
| Nature | 89 mm | 183 mm | 300 | 1000 | EPS, PDF |
| Science | 5.5 cm | 17.5 cm | 300 | 1000 | EPS, PDF |
| Cell Press | 85 mm | 178 mm | 300 | 1000 | EPS, PDF |
| PLOS | 8.3 cm | 17.3 cm | 300 | 600 | EPS, TIFF |
| ACS | 3.25 in | 7 in | 300 | 600 | TIFF, EPS |
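
A small helper sketch for turning these column widths into a matplotlib `figsize`; the widths are taken from the table above and should be re-checked against the journal's current author guidelines:

```python
MM_PER_INCH = 25.4

# Approximate column widths in mm, per the table above
column_widths_mm = {
    'nature':  {'single': 89,   'double': 183},
    'science': {'single': 55,   'double': 175},
    'cell':    {'single': 85,   'double': 178},
    'plos':    {'single': 83,   'double': 173},
    'acs':     {'single': 82.6, 'double': 177.8},
}

def journal_figsize(journal, columns='single', aspect=0.75):
    """Return (width, height) in inches for plt.subplots(figsize=...)."""
    width_in = column_widths_mm[journal][columns] / MM_PER_INCH
    return (width_in, width_in * aspect)

# Example: fig, ax = plt.subplots(figsize=journal_figsize('nature', 'single'))
```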
|
||||||
|
|
||||||
|
## Checking Requirements
|
||||||
|
|
||||||
|
### Before Submission Checklist
|
||||||
|
1. Read journal's author guidelines (figure section)
|
||||||
|
2. Check file format requirements
|
||||||
|
3. Verify resolution requirements
|
||||||
|
4. Confirm size specifications (width × height)
|
||||||
|
5. Check font requirements
|
||||||
|
6. Verify color space (RGB vs CMYK)
|
||||||
|
7. Check panel labeling style
|
||||||
|
8. Review supplementary materials requirements
|
||||||
|
9. Confirm file naming conventions
|
||||||
|
10. Check file size limits
|
||||||
|
|
||||||
|
### Useful Tools
|
||||||
|
- **ImageJ/Fiji**: Check/adjust DPI
|
||||||
|
- **Adobe Acrobat**: Verify embedded fonts, check PDF properties
|
||||||
|
- **GIMP**: Free alternative to Photoshop for raster editing
|
||||||
|
- **Inkscape**: Free vector graphics editor
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
- **Journal websites**: Always check "Author Guidelines" or "Instructions for Authors"
|
||||||
|
- **Publisher resources**: Many provide templates and tools
|
||||||
|
- **Format conversion**: Use reputable tools; check output quality
|
||||||
|
- **Help desks**: Contact journal staff if unclear
|
||||||
|
|
||||||
|
## Notes
|
||||||
|
|
||||||
|
- Requirements change periodically - always verify current guidelines
|
||||||
|
- Preprint servers (bioRxiv, arXiv) often have different requirements
|
||||||
|
- Conference proceedings may have separate requirements
|
||||||
|
- Some journals offer figure preparation services (often paid)
|
||||||
|
- Supplementary figures may have relaxed requirements compared to main text figures
|
||||||
@@ -0,0 +1,620 @@
|
|||||||
|
# Publication-Ready Matplotlib Examples
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This reference provides practical code examples for creating publication-ready scientific figures using Matplotlib, Seaborn, and Plotly. All examples follow best practices from `publication_guidelines.md` and use colorblind-friendly palettes from `color_palettes.md`.
|
||||||
|
|
||||||
|
## Setup and Configuration
|
||||||
|
|
||||||
|
### Publication-Quality Matplotlib Configuration
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import matplotlib as mpl
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Set publication quality parameters
|
||||||
|
mpl.rcParams['figure.dpi'] = 300
|
||||||
|
mpl.rcParams['savefig.dpi'] = 300
|
||||||
|
mpl.rcParams['font.size'] = 8
|
||||||
|
mpl.rcParams['font.family'] = 'sans-serif'
|
||||||
|
mpl.rcParams['font.sans-serif'] = ['Arial', 'Helvetica']
|
||||||
|
mpl.rcParams['axes.labelsize'] = 9
|
||||||
|
mpl.rcParams['axes.titlesize'] = 9
|
||||||
|
mpl.rcParams['xtick.labelsize'] = 7
|
||||||
|
mpl.rcParams['ytick.labelsize'] = 7
|
||||||
|
mpl.rcParams['legend.fontsize'] = 7
|
||||||
|
mpl.rcParams['axes.linewidth'] = 0.5
|
||||||
|
mpl.rcParams['xtick.major.width'] = 0.5
|
||||||
|
mpl.rcParams['ytick.major.width'] = 0.5
|
||||||
|
mpl.rcParams['lines.linewidth'] = 1.5
|
||||||
|
|
||||||
|
# Use colorblind-friendly colors (Okabe-Ito palette)
|
||||||
|
okabe_ito = ['#E69F00', '#56B4E9', '#009E73', '#F0E442',
|
||||||
|
'#0072B2', '#D55E00', '#CC79A7', '#000000']
|
||||||
|
mpl.rcParams['axes.prop_cycle'] = mpl.cycler(color=okabe_ito)
|
||||||
|
|
||||||
|
# Use perceptually uniform colormap
|
||||||
|
mpl.rcParams['image.cmap'] = 'viridis'
|
||||||
|
```
|
||||||
|
|
||||||
|
### Helper Function for Saving
|
||||||
|
|
||||||
|
```python
|
||||||
|
def save_publication_figure(fig, filename, formats=['pdf', 'png'], dpi=300):
|
||||||
|
"""
|
||||||
|
Save figure in multiple formats for publication.
|
||||||
|
|
||||||
|
Parameters:
|
||||||
|
-----------
|
||||||
|
fig : matplotlib.figure.Figure
|
||||||
|
Figure to save
|
||||||
|
filename : str
|
||||||
|
Base filename (without extension)
|
||||||
|
formats : list
|
||||||
|
List of file formats to save ['pdf', 'png', 'eps', 'svg']
|
||||||
|
dpi : int
|
||||||
|
Resolution for raster formats
|
||||||
|
"""
|
||||||
|
for fmt in formats:
|
||||||
|
output_file = f"{filename}.{fmt}"
|
||||||
|
fig.savefig(output_file, dpi=dpi, bbox_inches='tight',
|
||||||
|
facecolor='white', edgecolor='none',
|
||||||
|
transparent=False, format=fmt)
|
||||||
|
print(f"Saved: {output_file}")
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 1: Line Plot with Error Bars
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Generate sample data
|
||||||
|
x = np.linspace(0, 10, 50)
|
||||||
|
y1 = 2 * x + 1 + np.random.normal(0, 1, 50)
|
||||||
|
y2 = 1.5 * x + 2 + np.random.normal(0, 1.2, 50)
|
||||||
|
|
||||||
|
# Calculate means and standard errors for binned data
|
||||||
|
bins = np.linspace(0, 10, 11)
|
||||||
|
y1_mean = [y1[(x >= bins[i]) & (x < bins[i+1])].mean() for i in range(len(bins)-1)]
|
||||||
|
y1_sem = [y1[(x >= bins[i]) & (x < bins[i+1])].std() /
|
||||||
|
np.sqrt(len(y1[(x >= bins[i]) & (x < bins[i+1])]))
|
||||||
|
for i in range(len(bins)-1)]
|
||||||
|
x_binned = (bins[:-1] + bins[1:]) / 2
|
||||||
|
|
||||||
|
# Create figure with appropriate size (single column width = 3.5 inches)
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 2.5))
|
||||||
|
|
||||||
|
# Plot with error bars
|
||||||
|
ax.errorbar(x_binned, y1_mean, yerr=y1_sem,
|
||||||
|
marker='o', markersize=4, capsize=3, capthick=0.5,
|
||||||
|
label='Condition A', linewidth=1.5)
|
||||||
|
|
||||||
|
# Add labels with units
|
||||||
|
ax.set_xlabel('Time (hours)')
|
||||||
|
ax.set_ylabel('Fluorescence intensity (a.u.)')
|
||||||
|
|
||||||
|
# Add legend
|
||||||
|
ax.legend(frameon=False, loc='upper left')
|
||||||
|
|
||||||
|
# Remove top and right spines
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Tight layout
|
||||||
|
fig.tight_layout()
|
||||||
|
|
||||||
|
# Save
|
||||||
|
save_publication_figure(fig, 'line_plot_with_errors')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 2: Multi-Panel Figure
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
from string import ascii_uppercase
|
||||||
|
|
||||||
|
# Create figure with multiple panels (double column width = 7 inches)
|
||||||
|
fig = plt.figure(figsize=(7, 4))
|
||||||
|
|
||||||
|
# Define grid for panels
|
||||||
|
gs = fig.add_gridspec(2, 3, hspace=0.4, wspace=0.4,
|
||||||
|
left=0.08, right=0.98, top=0.95, bottom=0.08)
|
||||||
|
|
||||||
|
# Panel A: Line plot
|
||||||
|
ax_a = fig.add_subplot(gs[0, :2])
|
||||||
|
x = np.linspace(0, 10, 100)
|
||||||
|
for i, offset in enumerate([0, 0.5, 1.0]):
|
||||||
|
ax_a.plot(x, np.sin(x) + offset, label=f'Dataset {i+1}')
|
||||||
|
ax_a.set_xlabel('Time (s)')
|
||||||
|
ax_a.set_ylabel('Amplitude (V)')
|
||||||
|
ax_a.legend(frameon=False, fontsize=6)
|
||||||
|
ax_a.spines['top'].set_visible(False)
|
||||||
|
ax_a.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Panel B: Bar plot
|
||||||
|
ax_b = fig.add_subplot(gs[0, 2])
|
||||||
|
categories = ['Control', 'Treatment\nA', 'Treatment\nB']
|
||||||
|
values = [100, 125, 140]
|
||||||
|
errors = [5, 8, 6]
|
||||||
|
ax_b.bar(categories, values, yerr=errors, capsize=3,
|
||||||
|
color=['#0072B2', '#E69F00', '#009E73'], alpha=0.8)
|
||||||
|
ax_b.set_ylabel('Response (%)')
|
||||||
|
ax_b.spines['top'].set_visible(False)
|
||||||
|
ax_b.spines['right'].set_visible(False)
|
||||||
|
ax_b.set_ylim(0, 160)
|
||||||
|
|
||||||
|
# Panel C: Scatter plot
|
||||||
|
ax_c = fig.add_subplot(gs[1, 0])
|
||||||
|
x = np.random.randn(100)
|
||||||
|
y = 2*x + np.random.randn(100)
|
||||||
|
ax_c.scatter(x, y, s=10, alpha=0.6, color='#0072B2')
|
||||||
|
ax_c.set_xlabel('Variable X')
|
||||||
|
ax_c.set_ylabel('Variable Y')
|
||||||
|
ax_c.spines['top'].set_visible(False)
|
||||||
|
ax_c.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Panel D: Heatmap
|
||||||
|
ax_d = fig.add_subplot(gs[1, 1:])
|
||||||
|
data = np.random.randn(10, 20)
|
||||||
|
im = ax_d.imshow(data, cmap='viridis', aspect='auto')
|
||||||
|
ax_d.set_xlabel('Sample number')
|
||||||
|
ax_d.set_ylabel('Feature')
|
||||||
|
cbar = plt.colorbar(im, ax=ax_d, fraction=0.046, pad=0.04)
|
||||||
|
cbar.set_label('Intensity (a.u.)', rotation=270, labelpad=12)
|
||||||
|
|
||||||
|
# Add panel labels
|
||||||
|
panels = [ax_a, ax_b, ax_c, ax_d]
|
||||||
|
for i, ax in enumerate(panels):
|
||||||
|
ax.text(-0.15, 1.05, ascii_uppercase[i], transform=ax.transAxes,
|
||||||
|
fontsize=10, fontweight='bold', va='top')
|
||||||
|
|
||||||
|
save_publication_figure(fig, 'multi_panel_figure')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 3: Box Plot with Individual Points
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Generate sample data
|
||||||
|
np.random.seed(42)
|
||||||
|
data = [np.random.normal(100, 15, 30),
|
||||||
|
np.random.normal(120, 20, 30),
|
||||||
|
np.random.normal(140, 18, 30),
|
||||||
|
np.random.normal(110, 22, 30)]
|
||||||
|
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 3))
|
||||||
|
|
||||||
|
# Create box plot
|
||||||
|
bp = ax.boxplot(data, widths=0.5, patch_artist=True,
|
||||||
|
showfliers=False, # We'll add points manually
|
||||||
|
boxprops=dict(facecolor='lightgray', edgecolor='black', linewidth=0.8),
|
||||||
|
medianprops=dict(color='black', linewidth=1.5),
|
||||||
|
whiskerprops=dict(linewidth=0.8),
|
||||||
|
capprops=dict(linewidth=0.8))
|
||||||
|
|
||||||
|
# Overlay individual points
|
||||||
|
colors = ['#0072B2', '#E69F00', '#009E73', '#D55E00']
|
||||||
|
for i, (d, color) in enumerate(zip(data, colors)):
|
||||||
|
# Add jitter to x positions
|
||||||
|
x = np.random.normal(i+1, 0.04, size=len(d))
|
||||||
|
ax.scatter(x, d, alpha=0.4, s=8, color=color)
|
||||||
|
|
||||||
|
# Customize
|
||||||
|
ax.set_xticklabels(['Control', 'Treatment A', 'Treatment B', 'Treatment C'])
|
||||||
|
ax.set_ylabel('Cell count')
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
ax.set_ylim(50, 200)
|
||||||
|
|
||||||
|
fig.tight_layout()
|
||||||
|
save_publication_figure(fig, 'boxplot_with_points')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 4: Heatmap with Colorbar
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Generate correlation matrix
|
||||||
|
np.random.seed(42)
|
||||||
|
n = 10
|
||||||
|
A = np.random.randn(n, n)
|
||||||
|
corr_matrix = np.corrcoef(A)
|
||||||
|
|
||||||
|
# Create figure
|
||||||
|
fig, ax = plt.subplots(figsize=(4, 3.5))
|
||||||
|
|
||||||
|
# Plot heatmap
|
||||||
|
im = ax.imshow(corr_matrix, cmap='RdBu_r', vmin=-1, vmax=1, aspect='auto')
|
||||||
|
|
||||||
|
# Add colorbar
|
||||||
|
cbar = plt.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
|
||||||
|
cbar.set_label('Correlation coefficient', rotation=270, labelpad=15)
|
||||||
|
|
||||||
|
# Set ticks and labels
|
||||||
|
gene_names = [f'Gene{i+1}' for i in range(n)]
|
||||||
|
ax.set_xticks(np.arange(n))
|
||||||
|
ax.set_yticks(np.arange(n))
|
||||||
|
ax.set_xticklabels(gene_names, rotation=45, ha='right')
|
||||||
|
ax.set_yticklabels(gene_names)
|
||||||
|
|
||||||
|
# Add grid
|
||||||
|
ax.set_xticks(np.arange(n)-.5, minor=True)
|
||||||
|
ax.set_yticks(np.arange(n)-.5, minor=True)
|
||||||
|
ax.grid(which='minor', color='white', linestyle='-', linewidth=0.5)
|
||||||
|
|
||||||
|
fig.tight_layout()
|
||||||
|
save_publication_figure(fig, 'correlation_heatmap')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 5: Seaborn Violin Plot
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import seaborn as sns
|
||||||
|
import pandas as pd
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Generate sample data
|
||||||
|
np.random.seed(42)
|
||||||
|
data = pd.DataFrame({
|
||||||
|
'condition': np.repeat(['Control', 'Drug A', 'Drug B'], 50),
|
||||||
|
'value': np.concatenate([
|
||||||
|
np.random.normal(100, 15, 50),
|
||||||
|
np.random.normal(120, 20, 50),
|
||||||
|
np.random.normal(140, 18, 50)
|
||||||
|
])
|
||||||
|
})
|
||||||
|
|
||||||
|
# Set style
|
||||||
|
sns.set_style('ticks')
|
||||||
|
sns.set_palette(['#0072B2', '#E69F00', '#009E73'])
|
||||||
|
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 3))
|
||||||
|
|
||||||
|
# Create violin plot
|
||||||
|
sns.violinplot(data=data, x='condition', y='value', ax=ax,
|
||||||
|
inner='box', linewidth=0.8)
|
||||||
|
|
||||||
|
# Add strip plot
|
||||||
|
sns.stripplot(data=data, x='condition', y='value', ax=ax,
|
||||||
|
size=2, alpha=0.3, color='black')
|
||||||
|
|
||||||
|
# Customize
|
||||||
|
ax.set_xlabel('')
|
||||||
|
ax.set_ylabel('Expression level (AU)')
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
fig.tight_layout()
|
||||||
|
save_publication_figure(fig, 'violin_plot')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 6: Scientific Scatter with Regression
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
from scipy import stats
|
||||||
|
|
||||||
|
# Generate data with correlation
|
||||||
|
np.random.seed(42)
|
||||||
|
x = np.random.randn(100)
|
||||||
|
y = 2.5 * x + np.random.randn(100) * 0.8
|
||||||
|
|
||||||
|
# Calculate regression
|
||||||
|
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
|
||||||
|
|
||||||
|
# Create figure
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 3.5))
|
||||||
|
|
||||||
|
# Scatter plot
|
||||||
|
ax.scatter(x, y, s=15, alpha=0.6, color='#0072B2', edgecolors='none')
|
||||||
|
|
||||||
|
# Regression line
|
||||||
|
x_line = np.array([x.min(), x.max()])
|
||||||
|
y_line = slope * x_line + intercept
|
||||||
|
ax.plot(x_line, y_line, 'r-', linewidth=1.5, label=f'y = {slope:.2f}x + {intercept:.2f}')
|
||||||
|
|
||||||
|
# Add statistics text
|
||||||
|
stats_text = f'$R^2$ = {r_value**2:.3f}\n$p$ < 0.001' if p_value < 0.001 else f'$R^2$ = {r_value**2:.3f}\n$p$ = {p_value:.3f}'
|
||||||
|
ax.text(0.05, 0.95, stats_text, transform=ax.transAxes,
|
||||||
|
verticalalignment='top', fontsize=7,
|
||||||
|
bbox=dict(boxstyle='round', facecolor='white', alpha=0.8, edgecolor='gray', linewidth=0.5))
|
||||||
|
|
||||||
|
# Customize
|
||||||
|
ax.set_xlabel('Predictor variable')
|
||||||
|
ax.set_ylabel('Response variable')
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
fig.tight_layout()
|
||||||
|
save_publication_figure(fig, 'scatter_regression')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 7: Time Series with Shaded Error
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Generate time series data
|
||||||
|
np.random.seed(42)
|
||||||
|
time = np.linspace(0, 24, 100)
|
||||||
|
n_replicates = 5
|
||||||
|
|
||||||
|
# Simulate multiple replicates
|
||||||
|
data = np.array([10 * np.exp(-time/10) + np.random.normal(0, 0.5, 100)
|
||||||
|
for _ in range(n_replicates)])
|
||||||
|
|
||||||
|
# Calculate mean and SEM
|
||||||
|
mean = data.mean(axis=0)
|
||||||
|
sem = data.std(axis=0) / np.sqrt(n_replicates)
|
||||||
|
|
||||||
|
# Create figure
|
||||||
|
fig, ax = plt.subplots(figsize=(4, 2.5))
|
||||||
|
|
||||||
|
# Plot mean line
|
||||||
|
ax.plot(time, mean, linewidth=1.5, color='#0072B2', label='Mean ± SEM')
|
||||||
|
|
||||||
|
# Add shaded error region
|
||||||
|
ax.fill_between(time, mean - sem, mean + sem,
|
||||||
|
alpha=0.3, color='#0072B2', linewidth=0)
|
||||||
|
|
||||||
|
# Customize
|
||||||
|
ax.set_xlabel('Time (hours)')
|
||||||
|
ax.set_ylabel('Concentration (μM)')
|
||||||
|
ax.legend(frameon=False, loc='upper right')
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
ax.set_xlim(0, 24)
|
||||||
|
ax.set_ylim(0, 12)
|
||||||
|
|
||||||
|
fig.tight_layout()
|
||||||
|
save_publication_figure(fig, 'timeseries_shaded')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 8: Plotly Interactive Figure
|
||||||
|
|
||||||
|
```python
|
||||||
|
import plotly.graph_objects as go
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Generate data
|
||||||
|
np.random.seed(42)
|
||||||
|
x = np.random.randn(100)
|
||||||
|
y = 2*x + np.random.randn(100)
|
||||||
|
colors = np.random.choice(['Group A', 'Group B'], 100)
|
||||||
|
|
||||||
|
# Okabe-Ito colors for Plotly
|
||||||
|
okabe_ito_plotly = ['#E69F00', '#56B4E9']
|
||||||
|
|
||||||
|
# Create figure
|
||||||
|
fig = go.Figure()
|
||||||
|
|
||||||
|
for group, color in zip(['Group A', 'Group B'], okabe_ito_plotly):
|
||||||
|
mask = colors == group
|
||||||
|
fig.add_trace(go.Scatter(
|
||||||
|
x=x[mask], y=y[mask],
|
||||||
|
mode='markers',
|
||||||
|
name=group,
|
||||||
|
marker=dict(size=6, color=color, opacity=0.6)
|
||||||
|
))
|
||||||
|
|
||||||
|
# Update layout for publication quality
|
||||||
|
fig.update_layout(
|
||||||
|
width=500,
|
||||||
|
height=400,
|
||||||
|
font=dict(family='Arial, sans-serif', size=10),
|
||||||
|
plot_bgcolor='white',
|
||||||
|
xaxis=dict(
|
||||||
|
title='Variable X',
|
||||||
|
showgrid=False,
|
||||||
|
showline=True,
|
||||||
|
linewidth=1,
|
||||||
|
linecolor='black',
|
||||||
|
mirror=False
|
||||||
|
),
|
||||||
|
yaxis=dict(
|
||||||
|
title='Variable Y',
|
||||||
|
showgrid=False,
|
||||||
|
showline=True,
|
||||||
|
linewidth=1,
|
||||||
|
linecolor='black',
|
||||||
|
mirror=False
|
||||||
|
),
|
||||||
|
legend=dict(
|
||||||
|
x=0.02,
|
||||||
|
y=0.98,
|
||||||
|
bgcolor='rgba(255,255,255,0.8)',
|
||||||
|
bordercolor='gray',
|
||||||
|
borderwidth=0.5
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
# Save as static image (requires kaleido)
|
||||||
|
fig.write_image('plotly_scatter.png', width=500, height=400, scale=3) # scale=3 gives ~300 DPI
|
||||||
|
fig.write_html('plotly_scatter.html') # Interactive version
|
||||||
|
|
||||||
|
fig.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 9: Grouped Bar Plot with Significance
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Data
|
||||||
|
categories = ['WT', 'Mutant A', 'Mutant B']
|
||||||
|
control_means = [100, 85, 70]
|
||||||
|
control_sem = [5, 6, 5]
|
||||||
|
treatment_means = [100, 120, 140]
|
||||||
|
treatment_sem = [6, 8, 9]
|
||||||
|
|
||||||
|
x = np.arange(len(categories))
|
||||||
|
width = 0.35
|
||||||
|
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 3))
|
||||||
|
|
||||||
|
# Create bars
|
||||||
|
bars1 = ax.bar(x - width/2, control_means, width, yerr=control_sem,
|
||||||
|
capsize=3, label='Control', color='#0072B2', alpha=0.8)
|
||||||
|
bars2 = ax.bar(x + width/2, treatment_means, width, yerr=treatment_sem,
|
||||||
|
capsize=3, label='Treatment', color='#E69F00', alpha=0.8)
|
||||||
|
|
||||||
|
# Add significance markers
|
||||||
|
def add_significance_bar(ax, x1, x2, y, h, text):
|
||||||
|
"""Add significance bar between two bars"""
|
||||||
|
ax.plot([x1, x1, x2, x2], [y, y+h, y+h, y], linewidth=0.8, c='black')
|
||||||
|
ax.text((x1+x2)/2, y+h, text, ha='center', va='bottom', fontsize=7)
|
||||||
|
|
||||||
|
# Mark significant differences
|
||||||
|
add_significance_bar(ax, x[1]-width/2, x[1]+width/2, 135, 3, '***')
|
||||||
|
add_significance_bar(ax, x[2]-width/2, x[2]+width/2, 155, 3, '***')
|
||||||
|
|
||||||
|
# Customize
|
||||||
|
ax.set_ylabel('Activity (% of WT control)')
|
||||||
|
ax.set_xticks(x)
|
||||||
|
ax.set_xticklabels(categories)
|
||||||
|
ax.legend(frameon=False, loc='upper left')
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
ax.set_ylim(0, 180)
|
||||||
|
|
||||||
|
# Add note about significance
|
||||||
|
ax.text(0.98, 0.02, '*** p < 0.001', transform=ax.transAxes,
|
||||||
|
ha='right', va='bottom', fontsize=6)
|
||||||
|
|
||||||
|
fig.tight_layout()
|
||||||
|
save_publication_figure(fig, 'grouped_bar_significance')
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example 10: Publication-Ready Figure for Nature
|
||||||
|
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import numpy as np
|
||||||
|
from string import ascii_lowercase
|
||||||
|
|
||||||
|
# Nature specifications: 89mm single column
|
||||||
|
inch_per_mm = 0.0393701
|
||||||
|
width_mm = 89
|
||||||
|
height_mm = 110
|
||||||
|
figsize = (width_mm * inch_per_mm, height_mm * inch_per_mm)
|
||||||
|
|
||||||
|
fig = plt.figure(figsize=figsize)
|
||||||
|
gs = fig.add_gridspec(3, 2, hspace=0.5, wspace=0.4,
|
||||||
|
left=0.12, right=0.95, top=0.96, bottom=0.08)
|
||||||
|
|
||||||
|
# Panel a: Time course
|
||||||
|
ax_a = fig.add_subplot(gs[0, :])
|
||||||
|
time = np.linspace(0, 48, 100)
|
||||||
|
for i, label in enumerate(['Control', 'Treatment']):
|
||||||
|
y = (1 + i*0.5) * np.exp(-time/20) * (1 + 0.3*np.sin(time/5))
|
||||||
|
ax_a.plot(time, y, linewidth=1.2, label=label)
|
||||||
|
ax_a.set_xlabel('Time (h)', fontsize=7)
|
||||||
|
ax_a.set_ylabel('Growth (OD$_{600}$)', fontsize=7)
|
||||||
|
ax_a.legend(frameon=False, fontsize=6)
|
||||||
|
ax_a.tick_params(labelsize=6)
|
||||||
|
ax_a.spines['top'].set_visible(False)
|
||||||
|
ax_a.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Panel b: Bar plot
|
||||||
|
ax_b = fig.add_subplot(gs[1, 0])
|
||||||
|
categories = ['A', 'B', 'C']
|
||||||
|
values = [1.0, 1.5, 2.2]
|
||||||
|
errors = [0.1, 0.15, 0.2]
|
||||||
|
ax_b.bar(categories, values, yerr=errors, capsize=2, width=0.6,
|
||||||
|
color='#0072B2', alpha=0.8)
|
||||||
|
ax_b.set_ylabel('Fold change', fontsize=7)
|
||||||
|
ax_b.tick_params(labelsize=6)
|
||||||
|
ax_b.spines['top'].set_visible(False)
|
||||||
|
ax_b.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Panel c: Heatmap
|
||||||
|
ax_c = fig.add_subplot(gs[1, 1])
|
||||||
|
data = np.random.randn(8, 6)
|
||||||
|
im = ax_c.imshow(data, cmap='viridis', aspect='auto')
|
||||||
|
ax_c.set_xlabel('Sample', fontsize=7)
|
||||||
|
ax_c.set_ylabel('Gene', fontsize=7)
|
||||||
|
ax_c.tick_params(labelsize=6)
|
||||||
|
|
||||||
|
# Panel d: Scatter
|
||||||
|
ax_d = fig.add_subplot(gs[2, :])
|
||||||
|
x = np.random.randn(50)
|
||||||
|
y = 2*x + np.random.randn(50)*0.5
|
||||||
|
ax_d.scatter(x, y, s=8, alpha=0.6, color='#E69F00')
|
||||||
|
ax_d.set_xlabel('Expression gene X', fontsize=7)
|
||||||
|
ax_d.set_ylabel('Expression gene Y', fontsize=7)
|
||||||
|
ax_d.tick_params(labelsize=6)
|
||||||
|
ax_d.spines['top'].set_visible(False)
|
||||||
|
ax_d.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Add lowercase panel labels (Nature style)
|
||||||
|
for i, ax in enumerate([ax_a, ax_b, ax_c, ax_d]):
|
||||||
|
ax.text(-0.2, 1.1, f'{ascii_lowercase[i]}', transform=ax.transAxes,
|
||||||
|
fontsize=9, fontweight='bold', va='top')
|
||||||
|
|
||||||
|
# Save in Nature-preferred format
|
||||||
|
fig.savefig('nature_figure.pdf', dpi=1000, bbox_inches='tight',
|
||||||
|
facecolor='white', edgecolor='none')
|
||||||
|
fig.savefig('nature_figure.png', dpi=300, bbox_inches='tight',
|
||||||
|
facecolor='white', edgecolor='none')
|
||||||
|
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
## Tips for Each Library
|
||||||
|
|
||||||
|
### Matplotlib
|
||||||
|
- Use `fig.tight_layout()` or `constrained_layout=True` to prevent overlapping
|
||||||
|
- Set DPI to 300-600 for publication
|
||||||
|
- Use vector formats (PDF, EPS) for line plots
|
||||||
|
- Embed fonts in PDF/EPS files (see the snippet below)
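
One way to keep text as embedded, editable fonts in vector output is to switch matplotlib to Type 42 (TrueType) fonts before saving (a minimal sketch):

```python
import matplotlib as mpl

# Embed TrueType fonts in PDF/PS output instead of converting text to paths
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['ps.fonttype'] = 42
```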
|
||||||
|
|
||||||
|
### Seaborn
|
||||||
|
- Built on matplotlib, so all matplotlib customizations work
|
||||||
|
- Use `sns.set_style('ticks')` or `'whitegrid'` for clean looks
|
||||||
|
- `sns.despine()` removes top and right spines
|
||||||
|
- Set custom palette with `sns.set_palette()`
|
||||||
|
|
||||||
|
### Plotly
|
||||||
|
- Great for interactive exploratory analysis
|
||||||
|
- Export static images with `fig.write_image()` (requires kaleido package)
|
||||||
|
- Use `scale` parameter to control DPI (scale=3 ≈ 300 DPI)
|
||||||
|
- Update layout extensively for publication quality
|
||||||
|
|
||||||
|
## Common Workflow
|
||||||
|
|
||||||
|
1. **Explore with default settings**
|
||||||
|
2. **Apply publication configuration** (see Setup section)
|
||||||
|
3. **Create plot with appropriate size** (check journal requirements)
|
||||||
|
4. **Customize colors** (use colorblind-friendly palettes)
|
||||||
|
5. **Adjust fonts and line widths** (readable at final size)
|
||||||
|
6. **Remove chart junk** (top/right spines, excessive grid)
|
||||||
|
7. **Add clear labels with units**
|
||||||
|
8. **Test in grayscale**
|
||||||
|
9. **Save in multiple formats** (PDF for vector, PNG for raster)
|
||||||
|
10. **Verify in final context** (import into manuscript to check size)
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
- Matplotlib documentation: https://matplotlib.org/
|
||||||
|
- Seaborn gallery: https://seaborn.pydata.org/examples/index.html
|
||||||
|
- Plotly documentation: https://plotly.com/python/
|
||||||
|
- Nature Methods Points of View: Data visualization column archive
|
||||||
@@ -0,0 +1,205 @@
|
|||||||
|
# Publication-Ready Figure Guidelines
|
||||||
|
|
||||||
|
## Core Principles
|
||||||
|
|
||||||
|
Scientific figures must be clear, accurate, and accessible. Publication-ready figures follow these fundamental principles:
|
||||||
|
|
||||||
|
1. **Clarity**: Information should be immediately understandable
|
||||||
|
2. **Accuracy**: Data representation must be truthful and unmanipulated
|
||||||
|
3. **Accessibility**: Figures should be interpretable by all readers, including those with visual impairments
|
||||||
|
4. **Professional**: Clean, polished appearance suitable for peer-reviewed journals
|
||||||
|
|
||||||
|
## Resolution and File Format
|
||||||
|
|
||||||
|
### Resolution Requirements
|
||||||
|
- **Raster images (photos, microscopy)**: 300-600 DPI at final print size
|
||||||
|
- **Line art and graphs**: 600-1200 DPI (or vector format)
|
||||||
|
- **Combined figures**: 300-600 DPI
|
||||||
|
|
||||||
|
### File Formats
|
||||||
|
- **Vector formats (preferred for graphs/plots)**: PDF, EPS, SVG
|
||||||
|
- Infinitely scalable without quality loss
|
||||||
|
- Smaller file sizes for line art
|
||||||
|
- Best for: plots, diagrams, schematics
|
||||||
|
|
||||||
|
- **Raster formats**: TIFF, PNG (never JPEG for scientific data)
|
||||||
|
- Use for: photographs, microscopy, images with continuous tone
|
||||||
|
- TIFF: Lossless, widely accepted
|
||||||
|
- PNG: Lossless, good for web and supplementary materials
|
||||||
|
- **Never use JPEG**: Lossy compression introduces artifacts
|
||||||
|
|
||||||
|
### Size Specifications
|
||||||
|
- **Single column**: 85-90 mm (3.35-3.54 inches) width
|
||||||
|
- **1.5 column**: 114-120 mm (4.49-4.72 inches) width
|
||||||
|
- **Double column**: 174-180 mm (6.85-7.08 inches) width
|
||||||
|
- **Maximum height**: Usually 230-240 mm (9-9.5 inches)
|
||||||
|
|
||||||
|
## Typography
|
||||||
|
|
||||||
|
### Font Guidelines
|
||||||
|
- **Font family**: Sans-serif fonts (Arial, Helvetica, Calibri) for most journals
|
||||||
|
- Some journals prefer specific fonts (check guidelines)
|
||||||
|
- Consistency across all figures in manuscript
|
||||||
|
|
||||||
|
- **Font sizes at final print size**:
|
||||||
|
- Axis labels: 7-9 pt minimum
|
||||||
|
- Tick labels: 6-8 pt minimum
|
||||||
|
- Legends: 6-8 pt
|
||||||
|
- Panel labels (A, B, C): 8-12 pt, bold
|
||||||
|
- Title: Generally avoided in multi-panel figures
|
||||||
|
|
||||||
|
- **Font weight**: Regular weight for most text; bold for panel labels only
|
||||||
|
|
||||||
|
### Text Best Practices
|
||||||
|
- Use sentence case for axis labels ("Time (hours)" not "TIME (HOURS)")
|
||||||
|
- Include units in parentheses
|
||||||
|
- Avoid abbreviations unless space-constrained (define in caption)
|
||||||
|
- No text smaller than 5-6 pt at final size
|
||||||
|
|
||||||
|
## Color Usage
|
||||||
|
|
||||||
|
### Color Selection Principles
|
||||||
|
1. **Colorblind-friendly**: ~8% of males have color vision deficiency
|
||||||
|
- Avoid red/green combinations
|
||||||
|
- Use blue/orange, blue/yellow, or add texture/pattern
|
||||||
|
- Test with colorblindness simulators
|
||||||
|
|
||||||
|
2. **Purposeful color**: Color should convey meaning, not just aesthetics
|
||||||
|
- Use color to distinguish categories or highlight key data
|
||||||
|
- Maintain consistency across figures (same treatment = same color)
|
||||||
|
|
||||||
|
3. **Print considerations**:
|
||||||
|
- Colors may appear different in print vs. screen
|
||||||
|
- Use CMYK color space for print, RGB for digital
|
||||||
|
- Ensure sufficient contrast (especially for grayscale conversion)
|
||||||
|
|
||||||
|
### Recommended Color Palettes
|
||||||
|
- **Qualitative (categories)**: ColorBrewer, Okabe-Ito palette
|
||||||
|
- **Sequential (low to high)**: Viridis, Cividis, Blues, Oranges
|
||||||
|
- **Diverging (negative to positive)**: RdBu, PuOr, BrBG (ensure colorblind-safe)
|
||||||
|
|
||||||
|
### Grayscale Compatibility
|
||||||
|
- All figures should be interpretable in grayscale
|
||||||
|
- Use different line styles (solid, dashed, dotted) and markers
|
||||||
|
- Add patterns/hatching to bars and areas
|
||||||
|
|
||||||
|
## Layout and Composition
|
||||||
|
|
||||||
|
### Multi-Panel Figures
|
||||||
|
- **Panel labels**: Use bold uppercase letters (A, B, C) in top-left corner (a helper sketch follows this list)
|
||||||
|
- **Spacing**: Adequate white space between panels
|
||||||
|
- **Alignment**: Align panels along edges or axes where possible
|
||||||
|
- **Sizing**: Related panels should have consistent sizes
|
||||||
|
- **Arrangement**: Logical flow (left-to-right, top-to-bottom)
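
A minimal helper sketch for stamping consistent panel labels onto a set of axes (the offsets are starting points to adjust per figure):

```python
from string import ascii_uppercase

def add_panel_labels(axes, x=-0.15, y=1.05, fontsize=10):
    """Label axes 'A', 'B', 'C', ... in bold at the top-left corner."""
    for letter, ax in zip(ascii_uppercase, axes):
        ax.text(x, y, letter, transform=ax.transAxes,
                fontsize=fontsize, fontweight='bold', va='top')
```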
|
||||||
|
|
||||||
|
### Plot Elements
|
||||||
|
|
||||||
|
#### Axes
|
||||||
|
- **Axis lines**: 0.5-1 pt thickness
|
||||||
|
- **Tick marks**: Point inward or outward consistently
|
||||||
|
- **Tick frequency**: Enough to read values, not cluttered (typically 4-7 major ticks)
|
||||||
|
- **Axis labels**: Required on all plots; state units
|
||||||
|
- **Axis ranges**: Start from zero for bar charts (unless scientifically inappropriate)
|
||||||
|
|
||||||
|
#### Lines and Markers
|
||||||
|
- **Line width**: 1-2 pt for data lines; 0.5-1 pt for reference lines
|
||||||
|
- **Marker size**: 3-6 pt, larger than line width
|
||||||
|
- **Marker types**: Differentiate when multiple series (circles, squares, triangles)
|
||||||
|
- **Error bars**: 0.5-1 pt width; include caps if appropriate
|
||||||
|
|
||||||
|
#### Legends
|
||||||
|
- **Position**: Inside plot area if space permits, outside otherwise
|
||||||
|
- **Frame**: Optional; if used, thin line (0.5 pt)
|
||||||
|
- **Order**: Match order of data appearance (top to bottom or left to right)
|
||||||
|
- **Content**: Concise descriptions; full details in caption
|
||||||
|
|
||||||
|
### White Space and Margins
|
||||||
|
- Remove unnecessary white space around plots
|
||||||
|
- Maintain consistent margins
|
||||||
|
- `tight_layout()` or `constrained_layout=True` in matplotlib
|
||||||
|
|
||||||
|
## Data Representation Best Practices
|
||||||
|
|
||||||
|
### Statistical Rigor
|
||||||
|
- **Error bars**: Always show uncertainty (SD, SEM, CI) and state which in caption
|
||||||
|
- **Sample size**: Indicate n in figure or caption
|
||||||
|
- **Significance**: Mark statistical significance clearly (*, **, ***)
|
||||||
|
- **Replicates**: Show individual data points when possible, not just summary statistics
|
||||||
|
|
||||||
|
### Appropriate Chart Types
|
||||||
|
- **Bar plots**: Comparing discrete categories; always start y-axis at zero
|
||||||
|
- **Line plots**: Time series or continuous relationships
|
||||||
|
- **Scatter plots**: Correlation between variables; add regression line if appropriate
|
||||||
|
- **Box plots**: Distribution comparisons; show outliers
|
||||||
|
- **Heatmaps**: Matrix data, correlations, expression patterns
|
||||||
|
- **Violin plots**: Distribution shape comparison (better than box plots for bimodal data)
|
||||||
|
|
||||||
|
### Avoiding Distortion
|
||||||
|
- **No 3D effects**: Distorts perception of values
|
||||||
|
- **No unnecessary decorations**: No gradients, shadows, or chart junk
|
||||||
|
- **Consistent scales**: Use same scale for comparable panels
|
||||||
|
- **No truncated axes**: Unless clearly indicated and scientifically justified
|
||||||
|
- **Linear vs. log scales**: Choose appropriate scale; always label clearly
|
||||||
|
|
||||||
|
## Accessibility
|
||||||
|
|
||||||
|
### Colorblind Considerations
|
||||||
|
- Test with online simulators (e.g., Coblis, Color Oracle)
|
||||||
|
- Use patterns/textures in addition to color
|
||||||
|
- Provide alternative representations in supplementary materials if needed
|
||||||
|
|
||||||
|
### Visual Impairment
|
||||||
|
- High contrast between elements
|
||||||
|
- Thick enough lines (minimum 0.5 pt)
|
||||||
|
- Clear, uncluttered layouts
|
||||||
|
|
||||||
|
### Data Availability
|
||||||
|
- Include data tables in supplementary materials
|
||||||
|
- Provide source data files for graphs
|
||||||
|
- Consider interactive figures for online supplementary materials
|
||||||
|
|
||||||
|
## Common Mistakes to Avoid

1. **Font too small**: Text unreadable at final print size
2. **Low resolution**: Pixelated or blurry images
3. **Chart junk**: Unnecessary grid lines, 3D effects, decorations
4. **Poor color choices**: Red/green combinations, low contrast
5. **Missing elements**: No axis labels, no units, no error bars
6. **Inconsistent styling**: Different fonts/sizes within a figure or between figures
7. **Data distortion**: Truncated axes, inappropriate scales, 3D effects
8. **JPEG compression**: Artifacts around text and lines
9. **Too much information**: Cramming too many data series into one plot
10. **Inaccessible legends**: Legends outside the figure boundary after export
|
||||||
|
|
||||||
|
## Figure Checklist
|
||||||
|
|
||||||
|
Before submission, verify:
|
||||||
|
|
||||||
|
- [ ] Resolution meets journal requirements (300+ DPI for raster)
|
||||||
|
- [ ] File format is acceptable (vector for plots, TIFF/PNG for images)
|
||||||
|
- [ ] Figure dimensions match journal specifications
|
||||||
|
- [ ] All text is readable at final size (minimum 6-7 pt)
|
||||||
|
- [ ] Fonts are consistent and embedded (for PDF/EPS)
|
||||||
|
- [ ] Colors are colorblind-friendly
|
||||||
|
- [ ] Figure is interpretable in grayscale
|
||||||
|
- [ ] All axes are labeled with units
|
||||||
|
- [ ] Error bars or uncertainty indicators are present
|
||||||
|
- [ ] Statistical significance is marked if applicable
|
||||||
|
- [ ] Panel labels are present and consistent (A, B, C)
|
||||||
|
- [ ] Legend is clear and complete
|
||||||
|
- [ ] No chart junk or unnecessary elements
|
||||||
|
- [ ] File naming follows journal conventions
|
||||||
|
- [ ] Figure caption is comprehensive
|
||||||
|
- [ ] Source data is available
|
||||||
|
|
||||||
|
## Journal-Specific Considerations
|
||||||
|
|
||||||
|
Always consult the specific journal's author guidelines. Common variations include:
|
||||||
|
|
||||||
|
- **Nature journals**: RGB, 300 DPI minimum, specific size requirements
|
||||||
|
- **Science**: EPS or high-res TIFF, specific font requirements
|
||||||
|
- **Cell Press**: PDF or EPS preferred, Arial or Helvetica fonts
|
||||||
|
- **PLOS**: TIFF or EPS, specific color space requirements
|
||||||
|
- **ACS journals**: Application files (AI, EPS) or high-res TIFF
|
||||||
|
|
||||||
|
See `journal_requirements.md` for detailed specifications from major publishers.
|
||||||
@@ -0,0 +1,343 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Figure Export Utilities for Publication-Ready Scientific Figures
|
||||||
|
|
||||||
|
This module provides utilities to export matplotlib figures in publication-ready
|
||||||
|
formats with appropriate settings for various journals.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import List, Optional, Union
|
||||||
|
|
||||||
|
|
||||||
|
def save_publication_figure(
|
||||||
|
fig: plt.Figure,
|
||||||
|
filename: Union[str, Path],
|
||||||
|
formats: List[str] = ['pdf', 'png'],
|
||||||
|
dpi: int = 300,
|
||||||
|
transparent: bool = False,
|
||||||
|
bbox_inches: str = 'tight',
|
||||||
|
pad_inches: float = 0.1,
|
||||||
|
facecolor: str = 'white',
|
||||||
|
**kwargs
|
||||||
|
) -> List[Path]:
|
||||||
|
"""
|
||||||
|
Save a matplotlib figure in multiple formats with publication-quality settings.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
fig : matplotlib.figure.Figure
|
||||||
|
The figure to save
|
||||||
|
filename : str or Path
|
||||||
|
Base filename (without extension)
|
||||||
|
formats : list of str, default ['pdf', 'png']
|
||||||
|
List of file formats to save. Options: 'pdf', 'png', 'eps', 'svg', 'tiff'
|
||||||
|
dpi : int, default 300
|
||||||
|
Resolution for raster formats (png, tiff). 300 DPI is minimum for most journals
|
||||||
|
transparent : bool, default False
|
||||||
|
If True, save with transparent background
|
||||||
|
bbox_inches : str, default 'tight'
|
||||||
|
Bounding box specification. 'tight' removes excess whitespace
|
||||||
|
pad_inches : float, default 0.1
|
||||||
|
Padding around the figure when bbox_inches='tight'
|
||||||
|
facecolor : str, default 'white'
|
||||||
|
Background color (ignored if transparent=True)
|
||||||
|
**kwargs
|
||||||
|
Additional keyword arguments passed to fig.savefig()
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
list of Path
|
||||||
|
List of paths to saved files
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> fig, ax = plt.subplots()
|
||||||
|
>>> ax.plot([1, 2, 3], [1, 4, 9])
|
||||||
|
>>> save_publication_figure(fig, 'my_plot', formats=['pdf', 'png'], dpi=600)
|
||||||
|
['my_plot.pdf', 'my_plot.png']
|
||||||
|
"""
|
||||||
|
filename = Path(filename)
|
||||||
|
base_name = filename.stem
|
||||||
|
output_dir = filename.parent if filename.parent.exists() else Path.cwd()
|
||||||
|
|
||||||
|
saved_files = []
|
||||||
|
|
||||||
|
for fmt in formats:
|
||||||
|
output_file = output_dir / f"{base_name}.{fmt}"
|
||||||
|
|
||||||
|
# Set format-specific parameters
|
||||||
|
save_kwargs = {
|
||||||
|
'dpi': dpi,
|
||||||
|
'bbox_inches': bbox_inches,
|
||||||
|
'pad_inches': pad_inches,
|
||||||
|
'facecolor': facecolor if not transparent else 'none',
|
||||||
|
'edgecolor': 'none',
|
||||||
|
'transparent': transparent,
|
||||||
|
'format': fmt,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Update with user-provided kwargs
|
||||||
|
save_kwargs.update(kwargs)
|
||||||
|
|
||||||
|
# Adjust DPI for vector formats (DPI less relevant)
|
||||||
|
if fmt in ['pdf', 'eps', 'svg']:
|
||||||
|
save_kwargs['dpi'] = min(dpi, 300) # Lower DPI for embedded rasters in vector
|
||||||
|
|
||||||
|
try:
|
||||||
|
fig.savefig(output_file, **save_kwargs)
|
||||||
|
saved_files.append(output_file)
|
||||||
|
print(f"✓ Saved: {output_file}")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"✗ Failed to save {output_file}: {e}")
|
||||||
|
|
||||||
|
return saved_files
|
||||||
|
|
||||||
|
|
||||||
|
def save_for_journal(
|
||||||
|
fig: plt.Figure,
|
||||||
|
filename: Union[str, Path],
|
||||||
|
journal: str,
|
||||||
|
figure_type: str = 'combination'
|
||||||
|
) -> List[Path]:
|
||||||
|
"""
|
||||||
|
Save figure with journal-specific requirements.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
fig : matplotlib.figure.Figure
|
||||||
|
The figure to save
|
||||||
|
filename : str or Path
|
||||||
|
Base filename (without extension)
|
||||||
|
journal : str
|
||||||
|
Journal name. Options: 'nature', 'science', 'cell', 'plos', 'acs', 'ieee'
|
||||||
|
figure_type : str, default 'combination'
|
||||||
|
Type of figure. Options: 'line_art', 'photo', 'combination'
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
list of Path
|
||||||
|
List of paths to saved files
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> fig, ax = plt.subplots()
|
||||||
|
>>> ax.plot([1, 2, 3], [1, 4, 9])
|
||||||
|
>>> save_for_journal(fig, 'figure1', journal='nature', figure_type='line_art')
|
||||||
|
"""
|
||||||
|
journal = journal.lower()
|
||||||
|
|
||||||
|
# Define journal-specific requirements
|
||||||
|
journal_specs = {
|
||||||
|
'nature': {
|
||||||
|
'line_art': {'formats': ['pdf', 'eps'], 'dpi': 1000},
|
||||||
|
'photo': {'formats': ['tiff'], 'dpi': 300},
|
||||||
|
'combination': {'formats': ['pdf'], 'dpi': 600},
|
||||||
|
},
|
||||||
|
'science': {
|
||||||
|
'line_art': {'formats': ['eps', 'pdf'], 'dpi': 1000},
|
||||||
|
'photo': {'formats': ['tiff'], 'dpi': 300},
|
||||||
|
'combination': {'formats': ['eps'], 'dpi': 600},
|
||||||
|
},
|
||||||
|
'cell': {
|
||||||
|
'line_art': {'formats': ['pdf', 'eps'], 'dpi': 1000},
|
||||||
|
'photo': {'formats': ['tiff'], 'dpi': 300},
|
||||||
|
'combination': {'formats': ['pdf'], 'dpi': 600},
|
||||||
|
},
|
||||||
|
'plos': {
|
||||||
|
'line_art': {'formats': ['pdf', 'eps'], 'dpi': 600},
|
||||||
|
'photo': {'formats': ['tiff', 'png'], 'dpi': 300},
|
||||||
|
'combination': {'formats': ['tiff'], 'dpi': 300},
|
||||||
|
},
|
||||||
|
'acs': {
|
||||||
|
'line_art': {'formats': ['tiff', 'pdf'], 'dpi': 600},
|
||||||
|
'photo': {'formats': ['tiff'], 'dpi': 300},
|
||||||
|
'combination': {'formats': ['tiff'], 'dpi': 600},
|
||||||
|
},
|
||||||
|
'ieee': {
|
||||||
|
'line_art': {'formats': ['pdf', 'eps'], 'dpi': 600},
|
||||||
|
'photo': {'formats': ['tiff'], 'dpi': 300},
|
||||||
|
'combination': {'formats': ['pdf'], 'dpi': 300},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
if journal not in journal_specs:
|
||||||
|
available = ', '.join(journal_specs.keys())
|
||||||
|
raise ValueError(f"Journal '{journal}' not recognized. Available: {available}")
|
||||||
|
|
||||||
|
if figure_type not in journal_specs[journal]:
|
||||||
|
available = ', '.join(journal_specs[journal].keys())
|
||||||
|
raise ValueError(f"Figure type '{figure_type}' not valid. Available: {available}")
|
||||||
|
|
||||||
|
specs = journal_specs[journal][figure_type]
|
||||||
|
|
||||||
|
print(f"Saving for {journal.upper()} ({figure_type}):")
|
||||||
|
print(f" Formats: {', '.join(specs['formats'])}")
|
||||||
|
print(f" DPI: {specs['dpi']}")
|
||||||
|
|
||||||
|
return save_publication_figure(
|
||||||
|
fig=fig,
|
||||||
|
filename=filename,
|
||||||
|
formats=specs['formats'],
|
||||||
|
dpi=specs['dpi']
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def check_figure_size(fig: plt.Figure, journal: str = 'nature') -> dict:
|
||||||
|
"""
|
||||||
|
Check if figure dimensions are appropriate for journal requirements.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
fig : matplotlib.figure.Figure
|
||||||
|
The figure to check
|
||||||
|
journal : str, default 'nature'
|
||||||
|
Journal name
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
dict
|
||||||
|
Dictionary with figure dimensions and compliance status
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> fig = plt.figure(figsize=(3.5, 3))
|
||||||
|
>>> info = check_figure_size(fig, journal='nature')
|
||||||
|
>>> print(info)
|
||||||
|
"""
|
||||||
|
journal = journal.lower()
|
||||||
|
|
||||||
|
# Get figure dimensions in inches
|
||||||
|
width_inches, height_inches = fig.get_size_inches()
|
||||||
|
width_mm = width_inches * 25.4
|
||||||
|
height_mm = height_inches * 25.4
|
||||||
|
|
||||||
|
# Journal specifications (widths in mm)
|
||||||
|
specs = {
|
||||||
|
'nature': {'single': 89, 'double': 183, 'max_height': 247},
|
||||||
|
'science': {'single': 55, 'double': 175, 'max_height': 233},
|
||||||
|
'cell': {'single': 85, 'double': 178, 'max_height': 230},
|
||||||
|
'plos': {'single': 83, 'double': 173, 'max_height': 233},
|
||||||
|
'acs': {'single': 82.5, 'double': 178, 'max_height': 247},
|
||||||
|
}
|
||||||
|
|
||||||
|
if journal not in specs:
|
||||||
|
journal_spec = specs['nature']
|
||||||
|
print(f"Warning: Journal '{journal}' not found, using Nature specifications")
|
||||||
|
else:
|
||||||
|
journal_spec = specs[journal]
|
||||||
|
|
||||||
|
# Determine column type
|
||||||
|
column_type = None
|
||||||
|
width_ok = False
|
||||||
|
|
||||||
|
tolerance = 5 # mm tolerance
|
||||||
|
if abs(width_mm - journal_spec['single']) < tolerance:
|
||||||
|
column_type = 'single'
|
||||||
|
width_ok = True
|
||||||
|
elif abs(width_mm - journal_spec['double']) < tolerance:
|
||||||
|
column_type = 'double'
|
||||||
|
width_ok = True
|
||||||
|
|
||||||
|
height_ok = height_mm <= journal_spec['max_height']
|
||||||
|
|
||||||
|
result = {
|
||||||
|
'width_inches': width_inches,
|
||||||
|
'height_inches': height_inches,
|
||||||
|
'width_mm': width_mm,
|
||||||
|
'height_mm': height_mm,
|
||||||
|
'journal': journal,
|
||||||
|
'column_type': column_type,
|
||||||
|
'width_ok': width_ok,
|
||||||
|
'height_ok': height_ok,
|
||||||
|
'compliant': width_ok and height_ok,
|
||||||
|
'recommendations': {
|
||||||
|
'single_column_mm': journal_spec['single'],
|
||||||
|
'double_column_mm': journal_spec['double'],
|
||||||
|
'max_height_mm': journal_spec['max_height'],
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
# Print report
|
||||||
|
print(f"\n{'='*60}")
|
||||||
|
print(f"Figure Size Check for {journal.upper()}")
|
||||||
|
print(f"{'='*60}")
|
||||||
|
print(f"Current size: {width_mm:.1f} × {height_mm:.1f} mm")
|
||||||
|
print(f" ({width_inches:.2f} × {height_inches:.2f} inches)")
|
||||||
|
print(f"\n{journal.upper()} specifications:")
|
||||||
|
print(f" Single column: {journal_spec['single']} mm")
|
||||||
|
print(f" Double column: {journal_spec['double']} mm")
|
||||||
|
print(f" Max height: {journal_spec['max_height']} mm")
|
||||||
|
print(f"\nCompliance:")
|
||||||
|
print(f" Width: {'✓ OK' if width_ok else '✗ Non-standard'} ({column_type or 'custom'})")
|
||||||
|
print(f" Height: {'✓ OK' if height_ok else '✗ Too tall'}")
|
||||||
|
print(f" Overall: {'✓ COMPLIANT' if result['compliant'] else '✗ NEEDS ADJUSTMENT'}")
|
||||||
|
print(f"{'='*60}\n")
|
||||||
|
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
def verify_font_embedding(pdf_path: Union[str, Path]) -> Optional[bool]:
|
||||||
|
"""
|
||||||
|
Check if fonts are embedded in a PDF file.
|
||||||
|
|
||||||
|
Note: This requires PyPDF2 or a similar library to be installed.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
pdf_path : str or Path
|
||||||
|
Path to PDF file
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
    bool or None
        True if the PDF opened and a basic check passed, False on read errors,
        None if PyPDF2 is not installed
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
from PyPDF2 import PdfReader
|
||||||
|
except ImportError:
|
||||||
|
print("Warning: PyPDF2 not installed. Cannot verify font embedding.")
|
||||||
|
print("Install with: pip install PyPDF2")
|
||||||
|
return None
|
||||||
|
|
||||||
|
pdf_path = Path(pdf_path)
|
||||||
|
|
||||||
|
try:
|
||||||
|
reader = PdfReader(pdf_path)
|
||||||
|
# This is a simplified check; full verification is complex
|
||||||
|
print(f"PDF has {len(reader.pages)} page(s)")
|
||||||
|
print("Note: Full font embedding verification requires detailed PDF inspection.")
|
||||||
|
return True
|
||||||
|
except Exception as e:
|
||||||
|
print(f"Error reading PDF: {e}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
# Example usage
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Create example figure
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 2.5))
|
||||||
|
x = np.linspace(0, 10, 100)
|
||||||
|
ax.plot(x, np.sin(x), label='sin(x)')
|
||||||
|
ax.plot(x, np.cos(x), label='cos(x)')
|
||||||
|
ax.set_xlabel('x')
|
||||||
|
ax.set_ylabel('y')
|
||||||
|
ax.legend()
|
||||||
|
ax.spines['top'].set_visible(False)
|
||||||
|
ax.spines['right'].set_visible(False)
|
||||||
|
|
||||||
|
# Check size
|
||||||
|
check_figure_size(fig, journal='nature')
|
||||||
|
|
||||||
|
# Save in multiple formats
|
||||||
|
print("\nSaving figure...")
|
||||||
|
save_publication_figure(fig, 'example_figure', formats=['pdf', 'png'], dpi=300)
|
||||||
|
|
||||||
|
# Save with journal-specific requirements
|
||||||
|
print("\nSaving for Nature...")
|
||||||
|
save_for_journal(fig, 'example_figure_nature', journal='nature', figure_type='line_art')
|
||||||
|
|
||||||
|
plt.close(fig)
|
||||||
@@ -0,0 +1,416 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Matplotlib Style Presets for Publication-Ready Scientific Figures
|
||||||
|
|
||||||
|
This module provides pre-configured matplotlib styles optimized for
|
||||||
|
different journals and use cases.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import matplotlib as mpl
|
||||||
|
from typing import Optional, Dict, Any
|
||||||
|
|
||||||
|
|
||||||
|
# Okabe-Ito colorblind-friendly palette
|
||||||
|
OKABE_ITO_COLORS = [
|
||||||
|
'#E69F00', # Orange
|
||||||
|
'#56B4E9', # Sky Blue
|
||||||
|
'#009E73', # Bluish Green
|
||||||
|
'#F0E442', # Yellow
|
||||||
|
'#0072B2', # Blue
|
||||||
|
'#D55E00', # Vermillion
|
||||||
|
'#CC79A7', # Reddish Purple
|
||||||
|
'#000000' # Black
|
||||||
|
]
|
||||||
|
|
||||||
|
# Paul Tol palettes
|
||||||
|
TOL_BRIGHT = ['#4477AA', '#EE6677', '#228833', '#CCBB44', '#66CCEE', '#AA3377', '#BBBBBB']
|
||||||
|
TOL_MUTED = ['#332288', '#88CCEE', '#44AA99', '#117733', '#999933', '#DDCC77', '#CC6677', '#882255', '#AA4499']
|
||||||
|
TOL_HIGH_CONTRAST = ['#004488', '#DDAA33', '#BB5566']
|
||||||
|
|
||||||
|
# Wong palette
|
||||||
|
WONG_COLORS = ['#000000', '#E69F00', '#56B4E9', '#009E73', '#F0E442', '#0072B2', '#D55E00', '#CC79A7']
|
||||||
|
|
||||||
|
|
||||||
|
def get_base_style() -> Dict[str, Any]:
|
||||||
|
"""
|
||||||
|
Get base publication-quality style settings.
|
||||||
|
|
||||||
|
Returns
|
||||||
|
-------
|
||||||
|
dict
|
||||||
|
Dictionary of matplotlib rcParams
|
||||||
|
"""
|
||||||
|
return {
|
||||||
|
# Figure
|
||||||
|
'figure.dpi': 100, # Display DPI (changed on save)
|
||||||
|
'figure.facecolor': 'white',
|
||||||
|
'figure.autolayout': False,
|
||||||
|
'figure.constrained_layout.use': True,
|
||||||
|
|
||||||
|
# Font
|
||||||
|
'font.size': 8,
|
||||||
|
'font.family': 'sans-serif',
|
||||||
|
'font.sans-serif': ['Arial', 'Helvetica', 'DejaVu Sans'],
|
||||||
|
|
||||||
|
# Axes
|
||||||
|
'axes.linewidth': 0.5,
|
||||||
|
'axes.labelsize': 9,
|
||||||
|
'axes.titlesize': 9,
|
||||||
|
'axes.labelweight': 'normal',
|
||||||
|
'axes.spines.top': False,
|
||||||
|
'axes.spines.right': False,
|
||||||
|
'axes.spines.left': True,
|
||||||
|
'axes.spines.bottom': True,
|
||||||
|
'axes.edgecolor': 'black',
|
||||||
|
'axes.labelcolor': 'black',
|
||||||
|
'axes.axisbelow': True,
|
||||||
|
'axes.prop_cycle': mpl.cycler(color=OKABE_ITO_COLORS),
|
||||||
|
|
||||||
|
# Grid
|
||||||
|
'axes.grid': False,
|
||||||
|
|
||||||
|
# Ticks
|
||||||
|
'xtick.major.size': 3,
|
||||||
|
'xtick.minor.size': 2,
|
||||||
|
'xtick.major.width': 0.5,
|
||||||
|
'xtick.minor.width': 0.5,
|
||||||
|
'xtick.labelsize': 7,
|
||||||
|
'xtick.direction': 'out',
|
||||||
|
'ytick.major.size': 3,
|
||||||
|
'ytick.minor.size': 2,
|
||||||
|
'ytick.major.width': 0.5,
|
||||||
|
'ytick.minor.width': 0.5,
|
||||||
|
'ytick.labelsize': 7,
|
||||||
|
'ytick.direction': 'out',
|
||||||
|
|
||||||
|
# Lines
|
||||||
|
'lines.linewidth': 1.5,
|
||||||
|
'lines.markersize': 4,
|
||||||
|
'lines.markeredgewidth': 0.5,
|
||||||
|
|
||||||
|
# Legend
|
||||||
|
'legend.fontsize': 7,
|
||||||
|
'legend.frameon': False,
|
||||||
|
'legend.loc': 'best',
|
||||||
|
|
||||||
|
# Savefig
|
||||||
|
'savefig.dpi': 300,
|
||||||
|
'savefig.format': 'pdf',
|
||||||
|
'savefig.bbox': 'tight',
|
||||||
|
'savefig.pad_inches': 0.05,
|
||||||
|
'savefig.transparent': False,
|
||||||
|
'savefig.facecolor': 'white',
|
||||||
|
|
||||||
|
# Image
|
||||||
|
'image.cmap': 'viridis',
|
||||||
|
'image.aspect': 'auto',
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def apply_publication_style(style_name: str = 'default') -> None:
|
||||||
|
"""
|
||||||
|
Apply a pre-configured publication style.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
style_name : str, default 'default'
|
||||||
|
Name of the style to apply. Options:
|
||||||
|
- 'default': General publication style
|
||||||
|
- 'nature': Nature journal style
|
||||||
|
- 'science': Science journal style
|
||||||
|
- 'cell': Cell Press style
|
||||||
|
- 'minimal': Minimal clean style
|
||||||
|
- 'presentation': Larger fonts for presentations
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> apply_publication_style('nature')
|
||||||
|
>>> fig, ax = plt.subplots()
|
||||||
|
>>> ax.plot([1, 2, 3], [1, 4, 9])
|
||||||
|
"""
|
||||||
|
base_style = get_base_style()
|
||||||
|
|
||||||
|
# Style-specific modifications
|
||||||
|
if style_name == 'nature':
|
||||||
|
base_style.update({
|
||||||
|
'font.size': 7,
|
||||||
|
'axes.labelsize': 8,
|
||||||
|
'axes.titlesize': 8,
|
||||||
|
'xtick.labelsize': 6,
|
||||||
|
'ytick.labelsize': 6,
|
||||||
|
'legend.fontsize': 6,
|
||||||
|
'savefig.dpi': 600,
|
||||||
|
})
|
||||||
|
|
||||||
|
elif style_name == 'science':
|
||||||
|
base_style.update({
|
||||||
|
'font.size': 7,
|
||||||
|
'axes.labelsize': 8,
|
||||||
|
'xtick.labelsize': 6,
|
||||||
|
'ytick.labelsize': 6,
|
||||||
|
'legend.fontsize': 6,
|
||||||
|
'savefig.dpi': 600,
|
||||||
|
})
|
||||||
|
|
||||||
|
elif style_name == 'cell':
|
||||||
|
base_style.update({
|
||||||
|
'font.size': 8,
|
||||||
|
'axes.labelsize': 9,
|
||||||
|
'xtick.labelsize': 7,
|
||||||
|
'ytick.labelsize': 7,
|
||||||
|
'legend.fontsize': 7,
|
||||||
|
'savefig.dpi': 600,
|
||||||
|
})
|
||||||
|
|
||||||
|
elif style_name == 'minimal':
|
||||||
|
base_style.update({
|
||||||
|
'axes.linewidth': 0.8,
|
||||||
|
'xtick.major.width': 0.8,
|
||||||
|
'ytick.major.width': 0.8,
|
||||||
|
'lines.linewidth': 2,
|
||||||
|
})
|
||||||
|
|
||||||
|
elif style_name == 'presentation':
|
||||||
|
base_style.update({
|
||||||
|
'font.size': 14,
|
||||||
|
'axes.labelsize': 16,
|
||||||
|
'axes.titlesize': 18,
|
||||||
|
'xtick.labelsize': 12,
|
||||||
|
'ytick.labelsize': 12,
|
||||||
|
'legend.fontsize': 12,
|
||||||
|
'axes.linewidth': 1.5,
|
||||||
|
'lines.linewidth': 2.5,
|
||||||
|
'lines.markersize': 8,
|
||||||
|
})
|
||||||
|
|
||||||
|
elif style_name != 'default':
|
||||||
|
print(f"Warning: Style '{style_name}' not recognized. Using 'default'.")
|
||||||
|
|
||||||
|
# Apply the style
|
||||||
|
plt.rcParams.update(base_style)
|
||||||
|
print(f"✓ Applied '{style_name}' publication style")
|
||||||
|
|
||||||
|
|
||||||
|
def set_color_palette(palette_name: str = 'okabe_ito') -> None:
|
||||||
|
"""
|
||||||
|
Set a colorblind-friendly color palette.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
palette_name : str, default 'okabe_ito'
|
||||||
|
Name of the palette. Options:
|
||||||
|
- 'okabe_ito': Okabe-Ito palette (8 colors)
|
||||||
|
- 'wong': Wong palette (8 colors)
|
||||||
|
- 'tol_bright': Paul Tol bright palette (7 colors)
|
||||||
|
- 'tol_muted': Paul Tol muted palette (9 colors)
|
||||||
|
- 'tol_high_contrast': Paul Tol high contrast (3 colors)
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> set_color_palette('tol_muted')
|
||||||
|
>>> fig, ax = plt.subplots()
|
||||||
|
>>> for i in range(5):
|
||||||
|
... ax.plot([1, 2, 3], [i, i+1, i+2])
|
||||||
|
"""
|
||||||
|
palettes = {
|
||||||
|
'okabe_ito': OKABE_ITO_COLORS,
|
||||||
|
'wong': WONG_COLORS,
|
||||||
|
'tol_bright': TOL_BRIGHT,
|
||||||
|
'tol_muted': TOL_MUTED,
|
||||||
|
'tol_high_contrast': TOL_HIGH_CONTRAST,
|
||||||
|
}
|
||||||
|
|
||||||
|
if palette_name not in palettes:
|
||||||
|
available = ', '.join(palettes.keys())
|
||||||
|
print(f"Warning: Palette '{palette_name}' not found. Available: {available}")
|
||||||
|
palette_name = 'okabe_ito'
|
||||||
|
|
||||||
|
colors = palettes[palette_name]
|
||||||
|
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=colors)
|
||||||
|
print(f"✓ Applied '{palette_name}' color palette ({len(colors)} colors)")
|
||||||
|
|
||||||
|
|
||||||
|
def configure_for_journal(journal: str, figure_width: str = 'single') -> None:
|
||||||
|
"""
|
||||||
|
Configure matplotlib for a specific journal.
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
journal : str
|
||||||
|
Journal name: 'nature', 'science', 'cell', 'plos', 'acs', 'ieee'
|
||||||
|
figure_width : str, default 'single'
|
||||||
|
Figure width: 'single' or 'double' column
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> configure_for_journal('nature', figure_width='single')
|
||||||
|
>>> fig, ax = plt.subplots() # Will have correct size for Nature
|
||||||
|
"""
|
||||||
|
journal = journal.lower()
|
||||||
|
|
||||||
|
# Journal specifications
|
||||||
|
journal_configs = {
|
||||||
|
'nature': {
|
||||||
|
'single_width': 89, # mm
|
||||||
|
'double_width': 183,
|
||||||
|
'style': 'nature',
|
||||||
|
},
|
||||||
|
'science': {
|
||||||
|
'single_width': 55,
|
||||||
|
'double_width': 175,
|
||||||
|
'style': 'science',
|
||||||
|
},
|
||||||
|
'cell': {
|
||||||
|
'single_width': 85,
|
||||||
|
'double_width': 178,
|
||||||
|
'style': 'cell',
|
||||||
|
},
|
||||||
|
'plos': {
|
||||||
|
'single_width': 83,
|
||||||
|
'double_width': 173,
|
||||||
|
'style': 'default',
|
||||||
|
},
|
||||||
|
'acs': {
|
||||||
|
'single_width': 82.5,
|
||||||
|
'double_width': 178,
|
||||||
|
'style': 'default',
|
||||||
|
},
|
||||||
|
'ieee': {
|
||||||
|
'single_width': 89,
|
||||||
|
'double_width': 182,
|
||||||
|
'style': 'default',
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
if journal not in journal_configs:
|
||||||
|
available = ', '.join(journal_configs.keys())
|
||||||
|
raise ValueError(f"Journal '{journal}' not recognized. Available: {available}")
|
||||||
|
|
||||||
|
config = journal_configs[journal]
|
||||||
|
|
||||||
|
# Apply style
|
||||||
|
apply_publication_style(config['style'])
|
||||||
|
|
||||||
|
# Set default figure size
|
||||||
|
width_mm = config['single_width'] if figure_width == 'single' else config['double_width']
|
||||||
|
width_inches = width_mm / 25.4
|
||||||
|
plt.rcParams['figure.figsize'] = (width_inches, width_inches * 0.75) # 4:3 aspect ratio
|
||||||
|
|
||||||
|
print(f"✓ Configured for {journal.upper()} ({figure_width} column: {width_mm} mm)")
|
||||||
|
|
||||||
|
|
||||||
|
def create_style_template(output_file: str = 'publication.mplstyle') -> None:
|
||||||
|
"""
|
||||||
|
Create a matplotlib style file that can be used with plt.style.use().
|
||||||
|
|
||||||
|
Parameters
|
||||||
|
----------
|
||||||
|
output_file : str, default 'publication.mplstyle'
|
||||||
|
Output filename for the style file
|
||||||
|
|
||||||
|
Examples
|
||||||
|
--------
|
||||||
|
>>> create_style_template('my_style.mplstyle')
|
||||||
|
>>> plt.style.use('my_style.mplstyle')
|
||||||
|
"""
|
||||||
|
style = get_base_style()
|
||||||
|
|
||||||
|
with open(output_file, 'w') as f:
|
||||||
|
f.write("# Publication-quality matplotlib style\n")
|
||||||
|
f.write("# Usage: plt.style.use('publication.mplstyle')\n\n")
|
||||||
|
|
||||||
|
        for key, value in style.items():
            if key == 'axes.prop_cycle':
                # Write the colour cycle in mplstyle syntax; drop the leading '#'
                # because '#' starts a comment in .mplstyle files
                colors = [c['color'].lstrip('#') for c in value]
                f.write(f"axes.prop_cycle : cycler('color', {colors})\n")
            elif isinstance(value, (list, tuple)):
                # Lists (e.g. font.sans-serif) must be comma-separated, not a Python repr
                f.write(f"{key} : {', '.join(str(v) for v in value)}\n")
            else:
                f.write(f"{key} : {value}\n")
|
||||||
|
|
||||||
|
print(f"✓ Created style template: {output_file}")
|
||||||
|
print(f" Use with: plt.style.use('{output_file}')")
|
||||||
|
|
||||||
|
|
||||||
|
def show_color_palettes() -> None:
|
||||||
|
"""
|
||||||
|
Display available color palettes for visual inspection.
|
||||||
|
"""
|
||||||
|
palettes = {
|
||||||
|
'Okabe-Ito': OKABE_ITO_COLORS,
|
||||||
|
'Wong': WONG_COLORS,
|
||||||
|
'Tol Bright': TOL_BRIGHT,
|
||||||
|
'Tol Muted': TOL_MUTED,
|
||||||
|
'Tol High Contrast': TOL_HIGH_CONTRAST,
|
||||||
|
}
|
||||||
|
|
||||||
|
fig, axes = plt.subplots(len(palettes), 1, figsize=(8, len(palettes) * 0.5))
|
||||||
|
|
||||||
|
for ax, (name, colors) in zip(axes, palettes.items()):
|
||||||
|
ax.set_xlim(0, len(colors))
|
||||||
|
ax.set_ylim(0, 1)
|
||||||
|
ax.set_yticks([])
|
||||||
|
ax.set_xticks([])
|
||||||
|
ax.set_ylabel(name, fontsize=10)
|
||||||
|
|
||||||
|
for i, color in enumerate(colors):
|
||||||
|
ax.add_patch(plt.Rectangle((i, 0), 1, 1, facecolor=color, edgecolor='black', linewidth=0.5))
|
||||||
|
# Add hex code
|
||||||
|
ax.text(i + 0.5, 0.5, color, ha='center', va='center',
|
||||||
|
fontsize=7, color='white' if i >= len(colors) - 1 else 'black')
|
||||||
|
|
||||||
|
fig.suptitle('Colorblind-Friendly Palettes', fontsize=12, fontweight='bold')
|
||||||
|
plt.tight_layout()
|
||||||
|
plt.show()
|
||||||
|
|
||||||
|
|
||||||
|
def reset_to_default() -> None:
|
||||||
|
"""
|
||||||
|
Reset matplotlib to default settings.
|
||||||
|
"""
|
||||||
|
mpl.rcdefaults()
|
||||||
|
print("✓ Reset to matplotlib defaults")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
print("Matplotlib Style Presets for Scientific Figures")
|
||||||
|
print("=" * 50)
|
||||||
|
|
||||||
|
# Show available styles
|
||||||
|
print("\nAvailable publication styles:")
|
||||||
|
print(" - default")
|
||||||
|
print(" - nature")
|
||||||
|
print(" - science")
|
||||||
|
print(" - cell")
|
||||||
|
print(" - minimal")
|
||||||
|
print(" - presentation")
|
||||||
|
|
||||||
|
print("\nAvailable color palettes:")
|
||||||
|
print(" - okabe_ito (recommended)")
|
||||||
|
print(" - wong")
|
||||||
|
print(" - tol_bright")
|
||||||
|
print(" - tol_muted")
|
||||||
|
print(" - tol_high_contrast")
|
||||||
|
|
||||||
|
print("\nExample usage:")
|
||||||
|
print(" from style_presets import apply_publication_style, set_color_palette")
|
||||||
|
print(" apply_publication_style('nature')")
|
||||||
|
print(" set_color_palette('okabe_ito')")
|
||||||
|
|
||||||
|
# Create example figure
|
||||||
|
print("\nGenerating example figure with 'default' style...")
|
||||||
|
apply_publication_style('default')
|
||||||
|
|
||||||
|
fig, ax = plt.subplots(figsize=(3.5, 2.5))
|
||||||
|
for i in range(5):
|
||||||
|
ax.plot([1, 2, 3, 4], [i, i+1, i+0.5, i+2], marker='o', label=f'Series {i+1}')
|
||||||
|
ax.set_xlabel('Time (hours)')
|
||||||
|
ax.set_ylabel('Response (AU)')
|
||||||
|
ax.legend()
|
||||||
|
fig.suptitle('Example with Publication Style')
|
||||||
|
plt.tight_layout()
|
||||||
|
plt.show()
|
||||||
|
|
||||||
|
# Show color palettes
|
||||||
|
print("\nDisplaying color palettes...")
|
||||||
|
show_color_palettes()
|
||||||
scientific-thinking/statistical-analysis/SKILL.md (new file, 615 lines)
@@ -0,0 +1,615 @@
|
|||||||
|
---
|
||||||
|
name: statistical-analysis
|
||||||
|
description: Toolkit for rigorous academic-grade statistical analysis using Python. Perform hypothesis testing (t-tests, ANOVA, chi-square), regression analysis (linear, logistic), and Bayesian statistics with comprehensive assumption checking, effect sizes, power analysis, and publication-ready reporting. Use this skill when conducting statistical analyses for research, requiring proper diagnostics, effect size interpretation, or following APA reporting standards.
|
||||||
|
---
|
||||||
|
|
||||||
|
# Statistical Analysis
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
Conduct rigorous, publication-quality statistical analyses with comprehensive assumption checking, effect size calculations, and proper reporting. This skill provides systematic workflows for selecting appropriate statistical tests, validating assumptions, interpreting results, and reporting findings according to academic standards (APA style).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Core Capabilities
|
||||||
|
|
||||||
|
### 1. Test Selection and Planning
|
||||||
|
- Choose appropriate statistical tests based on research questions and data characteristics
|
||||||
|
- Conduct a priori power analyses to determine required sample sizes
|
||||||
|
- Plan analysis strategies including multiple comparison corrections
|
||||||
|
|
||||||
|
### 2. Assumption Checking
|
||||||
|
- Automatically verify all relevant assumptions before running tests
|
||||||
|
- Provide diagnostic visualizations (Q-Q plots, residual plots, box plots)
|
||||||
|
- Recommend remedial actions when assumptions are violated
|
||||||
|
|
||||||
|
### 3. Statistical Testing
|
||||||
|
- Hypothesis testing: t-tests, ANOVA, chi-square, non-parametric alternatives
|
||||||
|
- Regression: linear, multiple, logistic, with diagnostics
|
||||||
|
- Correlations: Pearson, Spearman, with confidence intervals
|
||||||
|
- Bayesian alternatives: Bayesian t-tests, ANOVA, regression with Bayes Factors
|
||||||
|
|
||||||
|
### 4. Effect Sizes and Interpretation
|
||||||
|
- Calculate and interpret appropriate effect sizes for all analyses
|
||||||
|
- Provide confidence intervals for effect estimates
|
||||||
|
- Distinguish statistical from practical significance
|
||||||
|
|
||||||
|
### 5. Professional Reporting
|
||||||
|
- Generate APA-style statistical reports
|
||||||
|
- Create publication-ready figures and tables
|
||||||
|
- Provide complete interpretation with all required statistics
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Workflow Decision Tree
|
||||||
|
|
||||||
|
Use this decision tree to determine your analysis path:
|
||||||
|
|
||||||
|
```
|
||||||
|
START
|
||||||
|
│
|
||||||
|
├─ Do you need to SELECT a statistical test?
|
||||||
|
│ └─ YES → See "Test Selection Guide"
|
||||||
|
│ └─ NO → Continue
|
||||||
|
│
|
||||||
|
├─ Are you ready to check ASSUMPTIONS?
|
||||||
|
│ └─ YES → See "Assumption Checking"
|
||||||
|
│ └─ NO → Continue
|
||||||
|
│
|
||||||
|
├─ Ready to run ANALYSIS?
|
||||||
|
│ └─ YES → See "Running Statistical Tests"
|
||||||
|
│ └─ NO → Continue
|
||||||
|
│
|
||||||
|
└─ Need to REPORT results?
|
||||||
|
└─ YES → See "Reporting Results"
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test Selection Guide
|
||||||
|
|
||||||
|
### Quick Reference: Choosing the Right Test
|
||||||
|
|
||||||
|
Use `references/test_selection_guide.md` for comprehensive guidance. Quick reference:
|
||||||
|
|
||||||
|
**Comparing Two Groups:**
|
||||||
|
- Independent, continuous, normal → Independent t-test
|
||||||
|
- Independent, continuous, non-normal → Mann-Whitney U test
|
||||||
|
- Paired, continuous, normal → Paired t-test
|
||||||
|
- Paired, continuous, non-normal → Wilcoxon signed-rank test
|
||||||
|
- Binary outcome → Chi-square or Fisher's exact test
|
||||||
|
|
||||||
|
**Comparing 3+ Groups:**
|
||||||
|
- Independent, continuous, normal → One-way ANOVA
|
||||||
|
- Independent, continuous, non-normal → Kruskal-Wallis test
|
||||||
|
- Paired, continuous, normal → Repeated measures ANOVA
|
||||||
|
- Paired, continuous, non-normal → Friedman test
|
||||||
|
|
||||||
|
**Relationships:**
|
||||||
|
- Two continuous variables → Pearson (normal) or Spearman correlation (non-normal)
|
||||||
|
- Continuous outcome with predictor(s) → Linear regression
|
||||||
|
- Binary outcome with predictor(s) → Logistic regression
|
||||||
|
|
||||||
|
**Bayesian Alternatives:**
|
||||||
|
All tests have Bayesian versions that provide:
|
||||||
|
- Direct probability statements about hypotheses
|
||||||
|
- Bayes Factors quantifying evidence
|
||||||
|
- Ability to support null hypothesis
|
||||||
|
- See `references/bayesian_statistics.md`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Assumption Checking
|
||||||
|
|
||||||
|
### Systematic Assumption Verification
|
||||||
|
|
||||||
|
**ALWAYS check assumptions before interpreting test results.**
|
||||||
|
|
||||||
|
Use the provided `scripts/assumption_checks.py` module for automated checking:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from scripts.assumption_checks import comprehensive_assumption_check
|
||||||
|
|
||||||
|
# Comprehensive check with visualizations
|
||||||
|
results = comprehensive_assumption_check(
|
||||||
|
data=df,
|
||||||
|
value_col='score',
|
||||||
|
group_col='group', # Optional: for group comparisons
|
||||||
|
alpha=0.05
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
This performs:
|
||||||
|
1. **Outlier detection** (IQR and z-score methods)
|
||||||
|
2. **Normality testing** (Shapiro-Wilk test + Q-Q plots)
|
||||||
|
3. **Homogeneity of variance** (Levene's test + box plots)
|
||||||
|
4. **Interpretation and recommendations**
|
||||||
|
|
||||||
|
### Individual Assumption Checks
|
||||||
|
|
||||||
|
For targeted checks, use individual functions:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from scripts.assumption_checks import (
|
||||||
|
check_normality,
|
||||||
|
check_normality_per_group,
|
||||||
|
check_homogeneity_of_variance,
|
||||||
|
check_linearity,
|
||||||
|
detect_outliers
|
||||||
|
)
|
||||||
|
|
||||||
|
# Example: Check normality with visualization
|
||||||
|
result = check_normality(
|
||||||
|
data=df['score'],
|
||||||
|
name='Test Score',
|
||||||
|
alpha=0.05,
|
||||||
|
plot=True
|
||||||
|
)
|
||||||
|
print(result['interpretation'])
|
||||||
|
print(result['recommendation'])
|
||||||
|
```
|
||||||
|
|
||||||
|
### What to Do When Assumptions Are Violated
|
||||||
|
|
||||||
|
**Normality violated:**
|
||||||
|
- Mild violation + n > 30 per group → Proceed with parametric test (robust)
|
||||||
|
- Moderate violation → Use non-parametric alternative
|
||||||
|
- Severe violation → Transform data or use non-parametric test
|
||||||
|
|
||||||
|
**Homogeneity of variance violated:**
|
||||||
|
- For t-test → Use Welch's t-test
|
||||||
|
- For ANOVA → Use Welch's ANOVA or Brown-Forsythe ANOVA
|
||||||
|
- For regression → Use robust standard errors or weighted least squares
|
||||||
|
|
||||||
|
**Linearity violated (regression):**
|
||||||
|
- Add polynomial terms
|
||||||
|
- Transform variables
|
||||||
|
- Use non-linear models or GAM
|
||||||
|
|
||||||
|
See `references/assumptions_and_diagnostics.md` for comprehensive guidance.
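A minimal sketch of the remedies above using pingouin (the names `group_a`, `group_b`, `df`, `score`, and `group` are the illustrative ones used elsewhere in this skill):

```python
import pingouin as pg

# Welch's t-test: unequal variances between two groups
welch_t = pg.ttest(group_a, group_b, correction=True)

# Welch's ANOVA: unequal variances across 3+ groups
welch_aov = pg.welch_anova(dv='score', between='group', data=df)

# Non-parametric alternative when normality is clearly violated
mwu = pg.mwu(group_a, group_b, alternative='two-sided')

print(welch_t[['T', 'dof', 'p-val', 'cohen-d']])
print(welch_aov)
print(mwu)
```
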
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Running Statistical Tests
|
||||||
|
|
||||||
|
### Python Libraries
|
||||||
|
|
||||||
|
Primary libraries for statistical analysis:
|
||||||
|
- **scipy.stats**: Core statistical tests
|
||||||
|
- **statsmodels**: Advanced regression and diagnostics
|
||||||
|
- **pingouin**: User-friendly statistical testing with effect sizes
|
||||||
|
- **pymc**: Bayesian statistical modeling
|
||||||
|
- **arviz**: Bayesian visualization and diagnostics
|
||||||
|
|
||||||
|
### Example Analyses
|
||||||
|
|
||||||
|
#### T-Test with Complete Reporting
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Run independent t-test
|
||||||
|
result = pg.ttest(group_a, group_b, correction='auto')
|
||||||
|
|
||||||
|
# Extract results
|
||||||
|
t_stat = result['T'].values[0]
|
||||||
|
df = result['dof'].values[0]
|
||||||
|
p_value = result['p-val'].values[0]
|
||||||
|
cohens_d = result['cohen-d'].values[0]
|
||||||
|
ci_lower = result['CI95%'].values[0][0]
|
||||||
|
ci_upper = result['CI95%'].values[0][1]
|
||||||
|
|
||||||
|
# Report
|
||||||
|
print(f"t({df:.0f}) = {t_stat:.2f}, p = {p_value:.3f}")
|
||||||
|
print(f"Cohen's d = {cohens_d:.2f}, 95% CI [{ci_lower:.2f}, {ci_upper:.2f}]")
|
||||||
|
```
|
||||||
|
|
||||||
|
#### ANOVA with Post-Hoc Tests
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# One-way ANOVA
|
||||||
|
aov = pg.anova(dv='score', between='group', data=df, detailed=True)
|
||||||
|
print(aov)
|
||||||
|
|
||||||
|
# If significant, conduct post-hoc tests
|
||||||
|
if aov['p-unc'].values[0] < 0.05:
|
||||||
|
posthoc = pg.pairwise_tukey(dv='score', between='group', data=df)
|
||||||
|
print(posthoc)
|
||||||
|
|
||||||
|
# Effect size
|
||||||
|
eta_squared = aov['np2'].values[0] # Partial eta-squared
|
||||||
|
print(f"Partial η² = {eta_squared:.3f}")
|
||||||
|
```
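
When pairwise comparisons are run outside Tukey's HSD (which controls family-wise error itself), correct the p-values explicitly. A brief sketch with statsmodels' `multipletests` on a hypothetical set of uncorrected p-values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical uncorrected p-values from a family of pairwise comparisons
raw_pvals = [0.012, 0.049, 0.210, 0.003]

reject, p_adj, _, _ = multipletests(raw_pvals, alpha=0.05, method='holm')
for p, p_c, r in zip(raw_pvals, p_adj, reject):
    print(f"p = {p:.3f} -> Holm-adjusted p = {p_c:.3f} ({'significant' if r else 'n.s.'})")
```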
|
||||||
|
|
||||||
|
#### Linear Regression with Diagnostics
|
||||||
|
|
||||||
|
```python
|
||||||
|
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
|
||||||
|
|
||||||
|
# Fit model
|
||||||
|
X = sm.add_constant(X_predictors) # Add intercept
|
||||||
|
model = sm.OLS(y, X).fit()
|
||||||
|
|
||||||
|
# Summary
|
||||||
|
print(model.summary())
|
||||||
|
|
||||||
|
# Check multicollinearity (VIF)
|
||||||
|
vif_data = pd.DataFrame()
|
||||||
|
vif_data["Variable"] = X.columns
|
||||||
|
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
|
||||||
|
print(vif_data)
|
||||||
|
|
||||||
|
# Check assumptions
|
||||||
|
residuals = model.resid
|
||||||
|
fitted = model.fittedvalues
|
||||||
|
|
||||||
|
# Residual plots
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
|
||||||
|
|
||||||
|
# Residuals vs fitted
|
||||||
|
axes[0, 0].scatter(fitted, residuals, alpha=0.6)
|
||||||
|
axes[0, 0].axhline(y=0, color='r', linestyle='--')
|
||||||
|
axes[0, 0].set_xlabel('Fitted values')
|
||||||
|
axes[0, 0].set_ylabel('Residuals')
|
||||||
|
axes[0, 0].set_title('Residuals vs Fitted')
|
||||||
|
|
||||||
|
# Q-Q plot
|
||||||
|
from scipy import stats
|
||||||
|
stats.probplot(residuals, dist="norm", plot=axes[0, 1])
|
||||||
|
axes[0, 1].set_title('Normal Q-Q')
|
||||||
|
|
||||||
|
# Scale-Location
|
||||||
|
axes[1, 0].scatter(fitted, np.sqrt(np.abs(residuals / residuals.std())), alpha=0.6)
|
||||||
|
axes[1, 0].set_xlabel('Fitted values')
|
||||||
|
axes[1, 0].set_ylabel('√|Standardized residuals|')
|
||||||
|
axes[1, 0].set_title('Scale-Location')
|
||||||
|
|
||||||
|
# Residuals histogram
|
||||||
|
axes[1, 1].hist(residuals, bins=20, edgecolor='black', alpha=0.7)
|
||||||
|
axes[1, 1].set_xlabel('Residuals')
|
||||||
|
axes[1, 1].set_ylabel('Frequency')
|
||||||
|
axes[1, 1].set_title('Histogram of Residuals')
|
||||||
|
|
||||||
|
plt.tight_layout()
|
||||||
|
plt.show()
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Bayesian T-Test
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pymc as pm
|
||||||
|
import arviz as az
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
with pm.Model() as model:
|
||||||
|
# Priors
|
||||||
|
mu1 = pm.Normal('mu_group1', mu=0, sigma=10)
|
||||||
|
mu2 = pm.Normal('mu_group2', mu=0, sigma=10)
|
||||||
|
sigma = pm.HalfNormal('sigma', sigma=10)
|
||||||
|
|
||||||
|
# Likelihood
|
||||||
|
y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group_a)
|
||||||
|
y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group_b)
|
||||||
|
|
||||||
|
# Derived quantity
|
||||||
|
diff = pm.Deterministic('difference', mu1 - mu2)
|
||||||
|
|
||||||
|
# Sample
|
||||||
|
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
|
||||||
|
|
||||||
|
# Summarize
|
||||||
|
print(az.summary(trace, var_names=['difference']))
|
||||||
|
|
||||||
|
# Probability that group1 > group2
|
||||||
|
prob_greater = np.mean(trace.posterior['difference'].values > 0)
|
||||||
|
print(f"P(μ₁ > μ₂ | data) = {prob_greater:.3f}")
|
||||||
|
|
||||||
|
# Plot posterior
|
||||||
|
az.plot_posterior(trace, var_names=['difference'], ref_val=0)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Effect Sizes
|
||||||
|
|
||||||
|
### Always Calculate Effect Sizes
|
||||||
|
|
||||||
|
**Effect sizes quantify how large an effect is; p-values only indicate whether an effect is statistically detectable, not whether it matters.**
|
||||||
|
|
||||||
|
See `references/effect_sizes_and_power.md` for comprehensive guidance.
|
||||||
|
|
||||||
|
### Quick Reference: Common Effect Sizes
|
||||||
|
|
||||||
|
| Test | Effect Size | Small | Medium | Large |
|
||||||
|
|------|-------------|-------|--------|-------|
|
||||||
|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
|
||||||
|
| ANOVA | η²_p | 0.01 | 0.06 | 0.14 |
|
||||||
|
| Correlation | r | 0.10 | 0.30 | 0.50 |
|
||||||
|
| Regression | R² | 0.02 | 0.13 | 0.26 |
|
||||||
|
| Chi-square | Cramér's V | 0.07 | 0.21 | 0.35 |
|
||||||
|
|
||||||
|
**Important**: Benchmarks are guidelines. Context matters!
|
||||||
|
|
||||||
|
### Calculating Effect Sizes
|
||||||
|
|
||||||
|
Most effect sizes are automatically calculated by pingouin:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# T-test returns Cohen's d
|
||||||
|
result = pg.ttest(x, y)
|
||||||
|
d = result['cohen-d'].values[0]
|
||||||
|
|
||||||
|
# ANOVA returns partial eta-squared
|
||||||
|
aov = pg.anova(dv='score', between='group', data=df)
|
||||||
|
eta_p2 = aov['np2'].values[0]
|
||||||
|
|
||||||
|
# Correlation: r is already an effect size
|
||||||
|
corr = pg.corr(x, y)
|
||||||
|
r = corr['r'].values[0]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Confidence Intervals for Effect Sizes
|
||||||
|
|
||||||
|
Always report CIs to show precision:
|
||||||
|
|
||||||
|
```python
|
||||||
|
import pingouin as pg

# For a t-test: effect size from the t statistic, then its confidence interval
d = pg.compute_effsize_from_t(t_statistic, nx=len(group1), ny=len(group2), eftype='cohen')
ci = pg.compute_esci(stat=d, nx=len(group1), ny=len(group2), eftype='cohen', confidence=0.95)
print(f"d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Power Analysis
|
||||||
|
|
||||||
|
### A Priori Power Analysis (Study Planning)
|
||||||
|
|
||||||
|
Determine required sample size before data collection:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from statsmodels.stats.power import (
|
||||||
|
tt_ind_solve_power,
|
||||||
|
FTestAnovaPower
|
||||||
|
)
|
||||||
|
|
||||||
|
# T-test: What n is needed to detect d = 0.5?
|
||||||
|
n_required = tt_ind_solve_power(
|
||||||
|
effect_size=0.5,
|
||||||
|
alpha=0.05,
|
||||||
|
power=0.80,
|
||||||
|
ratio=1.0,
|
||||||
|
alternative='two-sided'
|
||||||
|
)
|
||||||
|
print(f"Required n per group: {n_required:.0f}")
|
||||||
|
|
||||||
|
# ANOVA: What n is needed to detect f = 0.25?
|
||||||
|
# Note: FTestAnovaPower.solve_power returns the TOTAL sample size across k_groups
anova_power = FTestAnovaPower()
n_total = anova_power.solve_power(
    effect_size=0.25,
    k_groups=3,
    alpha=0.05,
    power=0.80
)
print(f"Required total N: {n_total:.0f} (about {n_total / 3:.0f} per group)")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Sensitivity Analysis (Post-Study)
|
||||||
|
|
||||||
|
Determine what effect size you could detect:
|
||||||
|
|
||||||
|
```python
|
||||||
|
# With n=50 per group, what effect could we detect?
|
||||||
|
detectable_d = tt_ind_solve_power(
|
||||||
|
effect_size=None, # Solve for this
|
||||||
|
nobs1=50,
|
||||||
|
alpha=0.05,
|
||||||
|
power=0.80,
|
||||||
|
ratio=1.0,
|
||||||
|
alternative='two-sided'
|
||||||
|
)
|
||||||
|
print(f"Study could detect d ≥ {detectable_d:.2f}")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Note**: Post-hoc power analysis (calculating power after study) is generally not recommended. Use sensitivity analysis instead.
|
||||||
|
|
||||||
|
See `references/effect_sizes_and_power.md` for detailed guidance.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reporting Results
|
||||||
|
|
||||||
|
### APA Style Statistical Reporting
|
||||||
|
|
||||||
|
Follow guidelines in `references/reporting_standards.md`.
|
||||||
|
|
||||||
|
### Essential Reporting Elements
|
||||||
|
|
||||||
|
1. **Descriptive statistics**: M, SD, n for all groups/variables
|
||||||
|
2. **Test statistics**: Test name, statistic, df, exact p-value
|
||||||
|
3. **Effect sizes**: With confidence intervals
|
||||||
|
4. **Assumption checks**: Which tests were done, results, actions taken
|
||||||
|
5. **All planned analyses**: Including non-significant findings
|
||||||
|
|
||||||
|
### Example Report Templates
|
||||||
|
|
||||||
|
#### Independent T-Test
|
||||||
|
|
||||||
|
```
|
||||||
|
Group A (n = 48, M = 75.2, SD = 8.5) scored significantly higher than
|
||||||
|
Group B (n = 52, M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77,
|
||||||
|
95% CI [0.36, 1.18], two-tailed. Assumptions of normality (Shapiro-Wilk:
|
||||||
|
Group A W = 0.97, p = .18; Group B W = 0.96, p = .12) and homogeneity
|
||||||
|
of variance (Levene's F(1, 98) = 1.23, p = .27) were satisfied.
|
||||||
|
```
|
||||||
|
|
||||||
|
#### One-Way ANOVA
|
||||||
|
|
||||||
|
```
|
||||||
|
A one-way ANOVA revealed a significant main effect of treatment condition
|
||||||
|
on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc
|
||||||
|
comparisons using Tukey's HSD indicated that Condition A (M = 78.2,
|
||||||
|
SD = 7.3) scored significantly higher than Condition B (M = 71.5,
|
||||||
|
SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9,
|
||||||
|
p < .001, d = 1.07). Conditions B and C did not differ significantly
|
||||||
|
(p = .52, d = 0.18).
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Multiple Regression
|
||||||
|
|
||||||
|
```
|
||||||
|
Multiple linear regression was conducted to predict exam scores from
|
||||||
|
study hours, prior GPA, and attendance. The overall model was significant,
|
||||||
|
F(3, 146) = 45.2, p < .001, R² = .48, adjusted R² = .47. Study hours
|
||||||
|
(B = 1.80, SE = 0.31, β = .35, t = 5.78, p < .001, 95% CI [1.18, 2.42])
|
||||||
|
and prior GPA (B = 8.52, SE = 1.95, β = .28, t = 4.37, p < .001,
|
||||||
|
95% CI [4.66, 12.38]) were significant predictors, while attendance was
|
||||||
|
not (B = 0.15, SE = 0.12, β = .08, t = 1.25, p = .21, 95% CI [-0.09, 0.39]).
|
||||||
|
Multicollinearity was not a concern (all VIF < 1.5).
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Bayesian Analysis
|
||||||
|
|
||||||
|
```
|
||||||
|
A Bayesian independent samples t-test was conducted using weakly
|
||||||
|
informative priors (Normal(0, 1) for mean difference). The posterior
|
||||||
|
distribution indicated that Group A scored higher than Group B
|
||||||
|
(M_diff = 6.8, 95% credible interval [3.2, 10.4]). The Bayes Factor
|
||||||
|
BF₁₀ = 45.3 provided very strong evidence for a difference between
|
||||||
|
groups, with a 99.8% posterior probability that Group A's mean exceeded
|
||||||
|
Group B's mean. Convergence diagnostics were satisfactory (all R̂ < 1.01,
|
||||||
|
ESS > 1000).
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Bayesian Statistics
|
||||||
|
|
||||||
|
### When to Use Bayesian Methods
|
||||||
|
|
||||||
|
Consider Bayesian approaches when:
|
||||||
|
- You have prior information to incorporate
|
||||||
|
- You want direct probability statements about hypotheses
|
||||||
|
- Sample size is small or planning sequential data collection
|
||||||
|
- You need to quantify evidence for the null hypothesis
|
||||||
|
- The model is complex (hierarchical, missing data)
|
||||||
|
|
||||||
|
See `references/bayesian_statistics.md` for comprehensive guidance on:
|
||||||
|
- Bayes' theorem and interpretation
|
||||||
|
- Prior specification (informative, weakly informative, non-informative)
|
||||||
|
- Bayesian hypothesis testing with Bayes Factors
|
||||||
|
- Credible intervals vs. confidence intervals
|
||||||
|
- Bayesian t-tests, ANOVA, regression, and hierarchical models
|
||||||
|
- Model convergence checking and posterior predictive checks
|
||||||
|
|
||||||
|
### Key Advantages

1. **Intuitive interpretation**: "Given the data, there is a 95% probability the parameter is in this interval"
2. **Evidence for null**: Can quantify support for no effect (see the Bayes Factor sketch below)
3. **Sequential flexibility**: Data can be analyzed as it accumulates without the stopping-rule corrections frequentist tests require
4. **Uncertainty quantification**: Full posterior distribution
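
As a quick sketch of points 1-2, pingouin already reports a Bayes Factor alongside its t-test output (here for the illustrative arrays `group_a` and `group_b` used earlier):

```python
import pingouin as pg

res = pg.ttest(group_a, group_b)
bf10 = float(res['BF10'].values[0])  # evidence for H1 over H0

if bf10 > 3:
    verdict = "moderate-or-stronger evidence for a difference"
elif bf10 < 1 / 3:
    verdict = "moderate-or-stronger evidence for the null"
else:
    verdict = "inconclusive evidence"
print(f"BF10 = {bf10:.2f}: {verdict}")
```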
|
||||||
|
---
|
||||||
|
|
||||||
|
## Resources
|
||||||
|
|
||||||
|
This skill includes comprehensive reference materials:
|
||||||
|
|
||||||
|
### References Directory
|
||||||
|
|
||||||
|
- **test_selection_guide.md**: Decision tree for choosing appropriate statistical tests
|
||||||
|
- **assumptions_and_diagnostics.md**: Detailed guidance on checking and handling assumption violations
|
||||||
|
- **effect_sizes_and_power.md**: Calculating, interpreting, and reporting effect sizes; conducting power analyses
|
||||||
|
- **bayesian_statistics.md**: Complete guide to Bayesian analysis methods
|
||||||
|
- **reporting_standards.md**: APA-style reporting guidelines with examples
|
||||||
|
|
||||||
|
### Scripts Directory
|
||||||
|
|
||||||
|
- **assumption_checks.py**: Automated assumption checking with visualizations
|
||||||
|
- `comprehensive_assumption_check()`: Complete workflow
|
||||||
|
- `check_normality()`: Normality testing with Q-Q plots
|
||||||
|
- `check_homogeneity_of_variance()`: Levene's test with box plots
|
||||||
|
- `check_linearity()`: Regression linearity checks
|
||||||
|
- `detect_outliers()`: IQR and z-score outlier detection
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
1. **Pre-register analyses** when possible to distinguish confirmatory from exploratory
|
||||||
|
2. **Always check assumptions** before interpreting results
|
||||||
|
3. **Report effect sizes** with confidence intervals
|
||||||
|
4. **Report all planned analyses** including non-significant results
|
||||||
|
5. **Distinguish statistical from practical significance**
|
||||||
|
6. **Visualize data** before and after analysis
|
||||||
|
7. **Check diagnostics** for regression/ANOVA (residual plots, VIF, etc.)
|
||||||
|
8. **Conduct sensitivity analyses** to assess robustness
|
||||||
|
9. **Share data and code** for reproducibility
|
||||||
|
10. **Be transparent** about violations, transformations, and decisions
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Common Pitfalls to Avoid
|
||||||
|
|
||||||
|
1. **P-hacking**: Don't test multiple ways until something is significant
|
||||||
|
2. **HARKing**: Don't present exploratory findings as confirmatory
|
||||||
|
3. **Ignoring assumptions**: Check them and report violations
|
||||||
|
4. **Confusing significance with importance**: p < .05 ≠ meaningful effect
|
||||||
|
5. **Not reporting effect sizes**: Essential for interpretation
|
||||||
|
6. **Cherry-picking results**: Report all planned analyses
|
||||||
|
7. **Misinterpreting p-values**: They're NOT probability that hypothesis is true
|
||||||
|
8. **Multiple comparisons**: Correct for family-wise error when appropriate
|
||||||
|
9. **Ignoring missing data**: Understand mechanism (MCAR, MAR, MNAR)
|
||||||
|
10. **Overinterpreting non-significant results**: Absence of evidence ≠ evidence of absence
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Getting Started Checklist
|
||||||
|
|
||||||
|
When beginning a statistical analysis:
|
||||||
|
|
||||||
|
- [ ] Define research question and hypotheses
|
||||||
|
- [ ] Determine appropriate statistical test (use test_selection_guide.md)
|
||||||
|
- [ ] Conduct power analysis to determine sample size
|
||||||
|
- [ ] Load and inspect data
|
||||||
|
- [ ] Check for missing data and outliers
|
||||||
|
- [ ] Verify assumptions using assumption_checks.py
|
||||||
|
- [ ] Run primary analysis
|
||||||
|
- [ ] Calculate effect sizes with confidence intervals
|
||||||
|
- [ ] Conduct post-hoc tests if needed (with corrections)
|
||||||
|
- [ ] Create visualizations
|
||||||
|
- [ ] Write results following reporting_standards.md
|
||||||
|
- [ ] Conduct sensitivity analyses
|
||||||
|
- [ ] Share data and code
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Support and Further Reading
|
||||||
|
|
||||||
|
For questions about:
|
||||||
|
- **Test selection**: See references/test_selection_guide.md
|
||||||
|
- **Assumptions**: See references/assumptions_and_diagnostics.md
|
||||||
|
- **Effect sizes**: See references/effect_sizes_and_power.md
|
||||||
|
- **Bayesian methods**: See references/bayesian_statistics.md
|
||||||
|
- **Reporting**: See references/reporting_standards.md
|
||||||
|
|
||||||
|
**Key textbooks**:
|
||||||
|
- Cohen, J. (1988). *Statistical Power Analysis for the Behavioral Sciences*
|
||||||
|
- Field, A. (2013). *Discovering Statistics Using IBM SPSS Statistics*
|
||||||
|
- Gelman, A., & Hill, J. (2006). *Data Analysis Using Regression and Multilevel/Hierarchical Models*
|
||||||
|
- Kruschke, J. K. (2014). *Doing Bayesian Data Analysis*
|
||||||
|
|
||||||
|
**Online resources**:
|
||||||
|
- APA Style Guide: https://apastyle.apa.org/
|
||||||
|
- Statistical Consulting: Cross Validated (stats.stackexchange.com)
|
||||||
@@ -0,0 +1,369 @@
|
|||||||
|
# Statistical Assumptions and Diagnostic Procedures
|
||||||
|
|
||||||
|
This document provides comprehensive guidance on checking and validating statistical assumptions for various analyses.
|
||||||
|
|
||||||
|
## General Principles
|
||||||
|
|
||||||
|
1. **Always check assumptions before interpreting test results**
|
||||||
|
2. **Use multiple diagnostic methods** (visual + formal tests)
|
||||||
|
3. **Consider robustness**: Some tests are robust to violations under certain conditions
|
||||||
|
4. **Document all assumption checks** in analysis reports
|
||||||
|
5. **Report violations and remedial actions taken**
|
||||||
|
|
||||||
|
## Common Assumptions Across Tests
|
||||||
|
|
||||||
|
### 1. Independence of Observations
|
||||||
|
|
||||||
|
**What it means**: Each observation is independent; measurements on one subject do not influence measurements on another.
|
||||||
|
|
||||||
|
**How to check**:
|
||||||
|
- Review study design and data collection procedures
|
||||||
|
- For time series: Check autocorrelation (ACF/PACF plots, Durbin-Watson test)
|
||||||
|
- For clustered data: Consider intraclass correlation (ICC)
|
||||||
|
|
||||||
|
**What to do if violated**:
|
||||||
|
- Use mixed-effects models for clustered/hierarchical data
|
||||||
|
- Use time series methods for temporally dependent data
|
||||||
|
- Use generalized estimating equations (GEE) for correlated data
|
||||||
|
|
||||||
|
**Critical severity**: HIGH - violations can severely inflate Type I error
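
A brief sketch of the checks and remedies above (the column names `score`, `treatment`, and `subject` are hypothetical): the Durbin-Watson statistic flags autocorrelated regression residuals, and a mixed-effects model absorbs within-cluster dependence.

```python
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson on OLS residuals: values near 2 suggest no first-order autocorrelation
ols = smf.ols("score ~ treatment", data=df).fit()
print(f"Durbin-Watson = {durbin_watson(ols.resid):.2f}")

# Clustered data (e.g. repeated measures per subject): random intercept per cluster
mixed = smf.mixedlm("score ~ treatment", data=df, groups=df["subject"]).fit()
print(mixed.summary())
```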
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. Normality
|
||||||
|
|
||||||
|
**What it means**: Data or residuals follow a normal (Gaussian) distribution.
|
||||||
|
|
||||||
|
**When required**:
|
||||||
|
- t-tests (for small samples; robust for n > 30 per group)
|
||||||
|
- ANOVA (for small samples; robust for n > 30 per group)
|
||||||
|
- Linear regression (for residuals)
|
||||||
|
- Some correlation tests (Pearson)
|
||||||
|
|
||||||
|
**How to check**:
|
||||||
|
|
||||||
|
**Visual methods** (primary):
|
||||||
|
- Q-Q (quantile-quantile) plot: Points should fall on diagonal line
|
||||||
|
- Histogram with normal curve overlay
|
||||||
|
- Kernel density plot
|
||||||
|
|
||||||
|
**Formal tests** (secondary):
|
||||||
|
- Shapiro-Wilk test (recommended for n < 50)
|
||||||
|
- Kolmogorov-Smirnov test
|
||||||
|
- Anderson-Darling test
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
from scipy import stats
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
|
||||||
|
# Shapiro-Wilk test
|
||||||
|
statistic, p_value = stats.shapiro(data)
|
||||||
|
|
||||||
|
# Q-Q plot
|
||||||
|
stats.probplot(data, dist="norm", plot=plt)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Interpretation guidance**:
|
||||||
|
- For n < 30: Both visual and formal tests important
|
||||||
|
- For 30 ≤ n < 100: Visual inspection primary, formal tests secondary
|
||||||
|
- For n ≥ 100: Formal tests overly sensitive; rely on visual inspection
|
||||||
|
- Look for severe skewness, outliers, or bimodality
|
||||||
|
|
||||||
|
**What to do if violated**:
|
||||||
|
- **Mild violations** (slight skewness): Proceed if n > 30 per group
|
||||||
|
- **Moderate violations**: Use non-parametric alternatives (Mann-Whitney, Kruskal-Wallis, Wilcoxon)
|
||||||
|
- **Severe violations**:
|
||||||
|
- Transform data (log, square root, Box-Cox)
|
||||||
|
- Use non-parametric methods
|
||||||
|
- Use robust regression methods
|
||||||
|
- Consider bootstrapping
|
||||||
|
|
||||||
|
**Critical severity**: MEDIUM - parametric tests are often robust to mild violations with adequate sample size
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. Homogeneity of Variance (Homoscedasticity)
|
||||||
|
|
||||||
|
**What it means**: Variances are equal across groups or across the range of predictors.
|
||||||
|
|
||||||
|
**When required**:
|
||||||
|
- Independent samples t-test
|
||||||
|
- ANOVA
|
||||||
|
- Linear regression (constant variance of residuals)
|
||||||
|
|
||||||
|
**How to check**:
|
||||||
|
|
||||||
|
**Visual methods** (primary):
|
||||||
|
- Box plots by group (for t-test/ANOVA)
|
||||||
|
- Residuals vs. fitted values plot (for regression) - should show random scatter
|
||||||
|
- Scale-location plot (square root of standardized residuals vs. fitted)
|
||||||
|
|
||||||
|
**Formal tests** (secondary):
|
||||||
|
- Levene's test (robust to non-normality)
|
||||||
|
- Bartlett's test (sensitive to non-normality, not recommended)
|
||||||
|
- Brown-Forsythe test (median-based version of Levene's)
|
||||||
|
- Breusch-Pagan test (for regression)
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
from scipy import stats
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# Levene's test
|
||||||
|
statistic, p_value = stats.levene(group1, group2, group3)
|
||||||
|
|
||||||
|
# For regression
|
||||||
|
# Breusch-Pagan test
|
||||||
|
from statsmodels.stats.diagnostic import het_breuschpagan
|
||||||
|
_, p_value, _, _ = het_breuschpagan(residuals, exog)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Interpretation guidance**:
|
||||||
|
- Variance ratio (max/min) < 2-3: Generally acceptable
|
||||||
|
- For ANOVA: Test is robust if groups have equal sizes
|
||||||
|
- For regression: Look for funnel patterns in residual plots
|
||||||
|
|
||||||
|
**What to do if violated**:
|
||||||
|
- **t-test**: Use Welch's t-test (does not assume equal variances)
|
||||||
|
- **ANOVA**: Use Welch's ANOVA or Brown-Forsythe ANOVA
|
||||||
|
- **Regression**:
|
||||||
|
- Transform dependent variable (log, square root)
|
||||||
|
- Use weighted least squares (WLS)
|
||||||
|
- Use robust standard errors (HC3)
|
||||||
|
- Use generalized linear models (GLM) with appropriate variance function
|
||||||
|
|
||||||
|
**Critical severity**: MEDIUM - tests can be robust with equal sample sizes
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Test-Specific Assumptions
|
||||||
|
|
||||||
|
### T-Tests
|
||||||
|
|
||||||
|
**Assumptions**:
|
||||||
|
1. Independence of observations
|
||||||
|
2. Normality (each group for independent t-test; differences for paired t-test)
|
||||||
|
3. Homogeneity of variance (independent t-test only)
|
||||||
|
|
||||||
|
**Diagnostic workflow**:
|
||||||
|
```python
|
||||||
|
import scipy.stats as stats
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# Check normality for each group
|
||||||
|
stats.shapiro(group1)
|
||||||
|
stats.shapiro(group2)
|
||||||
|
|
||||||
|
# Check homogeneity of variance
|
||||||
|
stats.levene(group1, group2)
|
||||||
|
|
||||||
|
# If assumptions violated:
|
||||||
|
# Option 1: Welch's t-test (unequal variances)
|
||||||
|
pg.ttest(group1, group2, correction=True)  # Welch's t-test (unequal variances)
|
||||||
|
|
||||||
|
# Option 2: Non-parametric alternative
|
||||||
|
pg.mwu(group1, group2) # Mann-Whitney U
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### ANOVA
|
||||||
|
|
||||||
|
**Assumptions**:
|
||||||
|
1. Independence of observations within and between groups
|
||||||
|
2. Normality in each group
|
||||||
|
3. Homogeneity of variance across groups
|
||||||
|
|
||||||
|
**Additional considerations**:
|
||||||
|
- For repeated measures ANOVA: Sphericity assumption (Mauchly's test)
|
||||||
|
|
||||||
|
**Diagnostic workflow**:
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# Check normality per group
|
||||||
|
for group in df['group'].unique():
|
||||||
|
data = df[df['group'] == group]['value']
|
||||||
|
stats.shapiro(data)
|
||||||
|
|
||||||
|
# Check homogeneity of variance
|
||||||
|
pg.homoscedasticity(df, dv='value', group='group')
|
||||||
|
|
||||||
|
# For repeated measures: Check sphericity
|
||||||
|
# Automatically tested in pingouin's rm_anova
|
||||||
|
```
|
||||||
|
|
||||||
|
**What to do if sphericity violated** (repeated measures):
|
||||||
|
- Greenhouse-Geisser correction (ε < 0.75)
|
||||||
|
- Huynh-Feldt correction (ε > 0.75)
|
||||||
|
- Use multivariate approach (MANOVA)
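
A minimal sketch of the repeated-measures workflow, assuming long-format data with columns `subject`, `time`, and `value` (hypothetical names); pingouin tests sphericity and reports a Greenhouse-Geisser corrected p-value when the assumption is violated:

```python
import pingouin as pg

# Repeated-measures ANOVA; sphericity is tested automatically and a
# corrected p-value is reported when the assumption is violated
aov = pg.rm_anova(data=df, dv='value', within='time', subject='subject',
                  correction=True, detailed=True)
print(aov)
```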
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Linear Regression
|
||||||
|
|
||||||
|
**Assumptions**:
|
||||||
|
1. **Linearity**: Relationship between X and Y is linear
|
||||||
|
2. **Independence**: Residuals are independent
|
||||||
|
3. **Homoscedasticity**: Constant variance of residuals
|
||||||
|
4. **Normality**: Residuals are normally distributed
|
||||||
|
5. **No multicollinearity**: Predictors are not highly correlated (multiple regression)
|
||||||
|
|
||||||
|
**Diagnostic workflow**:
|
||||||
|
|
||||||
|
**1. Linearity**:
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import seaborn as sns
|
||||||
|
|
||||||
|
# Scatter plots of Y vs each X
|
||||||
|
# Residuals vs. fitted values (should be randomly scattered)
|
||||||
|
plt.scatter(fitted_values, residuals)
|
||||||
|
plt.axhline(y=0, color='r', linestyle='--')
|
||||||
|
```
|
||||||
|
|
||||||
|
**2. Independence**:
|
||||||
|
```python
|
||||||
|
from statsmodels.stats.stattools import durbin_watson
|
||||||
|
|
||||||
|
# Durbin-Watson test (for time series)
|
||||||
|
dw_statistic = durbin_watson(residuals)
|
||||||
|
# Values between 1.5-2.5 suggest independence
|
||||||
|
```
|
||||||
|
|
||||||
|
**3. Homoscedasticity**:
|
||||||
|
```python
|
||||||
|
# Breusch-Pagan test
|
||||||
|
from statsmodels.stats.diagnostic import het_breuschpagan
|
||||||
|
_, p_value, _, _ = het_breuschpagan(residuals, exog)
|
||||||
|
|
||||||
|
# Visual: Scale-location plot
|
||||||
|
plt.scatter(fitted_values, np.sqrt(np.abs(std_residuals)))
|
||||||
|
```
|
||||||
|
|
||||||
|
**4. Normality of residuals**:
|
||||||
|
```python
|
||||||
|
# Q-Q plot of residuals
|
||||||
|
stats.probplot(residuals, dist="norm", plot=plt)
|
||||||
|
|
||||||
|
# Shapiro-Wilk test
|
||||||
|
stats.shapiro(residuals)
|
||||||
|
```
|
||||||
|
|
||||||
|
**5. Multicollinearity**:
|
||||||
|
```python
|
||||||
|
from statsmodels.stats.outliers_influence import variance_inflation_factor
|
||||||
|
|
||||||
|
# Calculate VIF for each predictor
|
||||||
|
vif_data = pd.DataFrame()
|
||||||
|
vif_data["feature"] = X.columns
|
||||||
|
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(len(X.columns))]
|
||||||
|
|
||||||
|
# VIF > 10 indicates severe multicollinearity
|
||||||
|
# VIF > 5 indicates moderate multicollinearity
|
||||||
|
```
|
||||||
|
|
||||||
|
**What to do if violated**:
|
||||||
|
- **Non-linearity**: Add polynomial terms, use GAM, or transform variables
|
||||||
|
- **Heteroscedasticity**: Transform Y, use WLS, use robust SE
|
||||||
|
- **Non-normal residuals**: Transform Y, use robust methods, check for outliers
|
||||||
|
- **Multicollinearity**: Remove correlated predictors, use PCA, ridge regression
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Logistic Regression
|
||||||
|
|
||||||
|
**Assumptions**:
|
||||||
|
1. **Independence**: Observations are independent
|
||||||
|
2. **Linearity**: Linear relationship between log-odds and continuous predictors
|
||||||
|
3. **No perfect multicollinearity**: Predictors not perfectly correlated
|
||||||
|
4. **Large sample size**: At least 10-20 events per predictor
|
||||||
|
|
||||||
|
**Diagnostic workflow**:
|
||||||
|
|
||||||
|
**1. Linearity of logit**:
|
||||||
|
```python
|
||||||
|
# Box-Tidwell test: Add interaction with log of continuous predictor
|
||||||
|
# If interaction is significant, linearity violated
|
||||||
|
```
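
A minimal sketch of the Box-Tidwell check under assumed names (`df` with binary outcome `y` and positive continuous predictor `x`); if the `x * log(x)` interaction term is significant, linearity of the logit is questionable:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

X = pd.DataFrame({'x': df['x']})
X['x_log_x'] = df['x'] * np.log(df['x'])  # Box-Tidwell term (requires x > 0)
X = sm.add_constant(X)

model = sm.Logit(df['y'], X).fit()
print(model.pvalues['x_log_x'])  # significant -> linearity of the logit is violated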
|
||||||
|
|
||||||
|
**2. Multicollinearity**:
|
||||||
|
```python
|
||||||
|
# Use VIF as in linear regression
|
||||||
|
```
|
||||||
|
|
||||||
|
**3. Influential observations**:
|
||||||
|
```python
|
||||||
|
# Cook's distance, DFBetas, leverage
|
||||||
|
from statsmodels.stats.outliers_influence import OLSInfluence
|
||||||
|
|
||||||
|
influence = OLSInfluence(model)
|
||||||
|
cooks_d = influence.cooks_distance
|
||||||
|
```
|
||||||
|
|
||||||
|
**4. Model fit**:
|
||||||
|
```python
|
||||||
|
# Hosmer-Lemeshow test
|
||||||
|
# Pseudo R-squared
|
||||||
|
# Classification metrics (accuracy, AUC-ROC)
|
||||||
|
```
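
A minimal sketch of two of these fit summaries, assuming `model` is a fitted statsmodels `Logit` result and `X`, `y` are the design matrix and binary outcome it was fitted on:

```python
from sklearn.metrics import roc_auc_score

# McFadden's pseudo R-squared (reported by statsmodels Logit results)
print(model.prsquared)

# AUC-ROC from predicted probabilities
pred_prob = model.predict(X)
print(roc_auc_score(y, pred_prob))
```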
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Outlier Detection
|
||||||
|
|
||||||
|
**Methods**:
|
||||||
|
1. **Visual**: Box plots, scatter plots
|
||||||
|
2. **Statistical**:
|
||||||
|
- Z-scores: |z| > 3 suggests outlier
|
||||||
|
- IQR method: Values < Q1 - 1.5×IQR or > Q3 + 1.5×IQR
|
||||||
|
- Modified Z-score using median absolute deviation (robust to outliers)
|
||||||
|
|
||||||
|
**For regression**:
|
||||||
|
- **Leverage**: High leverage points (hat values)
|
||||||
|
- **Influence**: Cook's distance > 4/n suggests influential point
|
||||||
|
- **Outliers**: Studentized residuals > ±3
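
A minimal sketch of the univariate rules above (IQR fence and MAD-based modified z-score), with the conventional thresholds as defaults:

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Boolean mask of points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def modified_zscore_outliers(x, threshold=3.5):
    """Boolean mask using the MAD-based modified z-score."""
    x = np.asarray(x)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    mod_z = 0.6745 * (x - med) / mad  # 0.6745 scales the MAD to the normal SD
    return np.abs(mod_z) > threshold
```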
|
||||||
|
|
||||||
|
**What to do**:
|
||||||
|
1. Investigate data entry errors
|
||||||
|
2. Consider if outliers are valid observations
|
||||||
|
3. Report sensitivity analysis (results with and without outliers)
|
||||||
|
4. Use robust methods if outliers are legitimate
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sample Size Considerations
|
||||||
|
|
||||||
|
### Minimum Sample Sizes (Rules of Thumb)
|
||||||
|
|
||||||
|
- **T-test**: n ≥ 30 per group for robustness to non-normality
|
||||||
|
- **ANOVA**: n ≥ 30 per group
|
||||||
|
- **Correlation**: n ≥ 30 for adequate power
|
||||||
|
- **Simple regression**: n ≥ 50
|
||||||
|
- **Multiple regression**: n ≥ 10-20 per predictor (minimum 10 + k predictors)
|
||||||
|
- **Logistic regression**: n ≥ 10-20 events per predictor
|
||||||
|
|
||||||
|
### Small Sample Considerations
|
||||||
|
|
||||||
|
For small samples:
|
||||||
|
- Assumptions become more critical
|
||||||
|
- Use exact tests when available (Fisher's exact, exact logistic regression)
|
||||||
|
- Consider non-parametric alternatives
|
||||||
|
- Use permutation tests or bootstrap methods
|
||||||
|
- Be conservative with interpretation
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reporting Assumption Checks
|
||||||
|
|
||||||
|
When reporting analyses, include:
|
||||||
|
|
||||||
|
1. **Statement of assumptions checked**: List all assumptions tested
|
||||||
|
2. **Methods used**: Describe visual and formal tests employed
|
||||||
|
3. **Results of diagnostic tests**: Report test statistics and p-values
|
||||||
|
4. **Assessment**: State whether assumptions were met or violated
|
||||||
|
5. **Actions taken**: If violated, describe remedial actions (transformations, alternative tests, robust methods)
|
||||||
|
|
||||||
|
**Example reporting statement**:
|
||||||
|
> "Normality was assessed using Shapiro-Wilk tests and Q-Q plots. Data for Group A (W = 0.97, p = .18) and Group B (W = 0.96, p = .12) showed no significant departure from normality. Homogeneity of variance was assessed using Levene's test, which was non-significant (F(1, 58) = 1.23, p = .27), indicating equal variances across groups. Therefore, assumptions for the independent samples t-test were satisfied."
|
||||||
# Bayesian Statistical Analysis
|
||||||
|
|
||||||
|
This document provides guidance on conducting and interpreting Bayesian statistical analyses, which offer an alternative framework to frequentist (classical) statistics.
|
||||||
|
|
||||||
|
## Bayesian vs. Frequentist Philosophy
|
||||||
|
|
||||||
|
### Fundamental Differences
|
||||||
|
|
||||||
|
| Aspect | Frequentist | Bayesian |
|
||||||
|
|--------|-------------|----------|
|
||||||
|
| **Probability interpretation** | Long-run frequency of events | Degree of belief/uncertainty |
|
||||||
|
| **Parameters** | Fixed but unknown | Random variables with distributions |
|
||||||
|
| **Inference** | Based on sampling distributions | Based on posterior distributions |
|
||||||
|
| **Primary output** | p-values, confidence intervals | Posterior probabilities, credible intervals |
|
||||||
|
| **Prior information** | Not formally incorporated | Explicitly incorporated via priors |
|
||||||
|
| **Hypothesis testing** | Reject/fail to reject null | Probability of hypotheses given data |
|
||||||
|
| **Sample size** | Often requires minimum | Can work with any sample size |
|
||||||
|
| **Interpretation** | Indirect (probability of data given H₀) | Direct (probability of hypothesis given data) |
|
||||||
|
|
||||||
|
### Key Question Difference
|
||||||
|
|
||||||
|
**Frequentist**: "If the null hypothesis is true, what is the probability of observing data this extreme or more extreme?"
|
||||||
|
|
||||||
|
**Bayesian**: "Given the observed data, what is the probability that the hypothesis is true?"
|
||||||
|
|
||||||
|
The Bayesian question is more intuitive and directly addresses what researchers want to know.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Bayes' Theorem
|
||||||
|
|
||||||
|
**Formula**:
|
||||||
|
```
|
||||||
|
P(θ|D) = P(D|θ) × P(θ) / P(D)
|
||||||
|
```
|
||||||
|
|
||||||
|
**In words**:
|
||||||
|
```
|
||||||
|
Posterior = Likelihood × Prior / Evidence
|
||||||
|
```
|
||||||
|
|
||||||
|
Where:
|
||||||
|
- **θ (theta)**: Parameter of interest (e.g., mean difference, correlation)
|
||||||
|
- **D**: Observed data
|
||||||
|
- **P(θ|D)**: Posterior distribution (belief about θ after seeing data)
|
||||||
|
- **P(D|θ)**: Likelihood (probability of data given θ)
|
||||||
|
- **P(θ)**: Prior distribution (belief about θ before seeing data)
|
||||||
|
- **P(D)**: Marginal likelihood/evidence (normalizing constant)
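
As a concrete illustration with hypothetical numbers, a conjugate Beta-Binomial update makes the posterior available in closed form:

```python
from scipy import stats

# Prior belief about a success probability theta: Beta(2, 2), centred on 0.5
a_prior, b_prior = 2, 2

# Hypothetical data: 14 successes in 20 trials
successes, trials = 14, 20

# Conjugate update: posterior is Beta(a_prior + successes, b_prior + failures)
posterior = stats.beta(a_prior + successes, b_prior + (trials - successes))

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```

With conjugate priors the update is purely algebraic; the MCMC machinery described later handles the general, non-conjugate case.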
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prior Distributions
|
||||||
|
|
||||||
|
### Types of Priors
|
||||||
|
|
||||||
|
#### 1. Informative Priors
|
||||||
|
|
||||||
|
**When to use**: When you have substantial prior knowledge from:
|
||||||
|
- Previous studies
|
||||||
|
- Expert knowledge
|
||||||
|
- Theory
|
||||||
|
- Pilot data
|
||||||
|
|
||||||
|
**Example**: Meta-analysis shows effect size d ≈ 0.40, SD = 0.15
|
||||||
|
- Prior: Normal(0.40, 0.15)
|
||||||
|
|
||||||
|
**Advantages**:
|
||||||
|
- Incorporates existing knowledge
|
||||||
|
- More efficient (smaller samples needed)
|
||||||
|
- Can stabilize estimates with small data
|
||||||
|
|
||||||
|
**Disadvantages**:
|
||||||
|
- Subjective (but subjectivity can be strength)
|
||||||
|
- Must be justified and transparent
|
||||||
|
- May be controversial if strong prior conflicts with data
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 2. Weakly Informative Priors
|
||||||
|
|
||||||
|
**When to use**: Default choice for most applications
|
||||||
|
|
||||||
|
**Characteristics**:
|
||||||
|
- Regularizes estimates (prevents extreme values)
|
||||||
|
- Has minimal influence on posterior with moderate data
|
||||||
|
- Prevents computational issues
|
||||||
|
|
||||||
|
**Example priors**:
|
||||||
|
- Effect size: Normal(0, 1) or Cauchy(0, 0.707)
|
||||||
|
- Variance: Half-Cauchy(0, 1)
|
||||||
|
- Correlation: Uniform(-1, 1) or Beta(2, 2)
|
||||||
|
|
||||||
|
**Advantages**:
|
||||||
|
- Balances objectivity and regularization
|
||||||
|
- Computationally stable
|
||||||
|
- Broadly acceptable
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### 3. Non-Informative (Flat/Uniform) Priors
|
||||||
|
|
||||||
|
**When to use**: When attempting to be "objective"
|
||||||
|
|
||||||
|
**Example**: Uniform(-∞, ∞) for any value
|
||||||
|
|
||||||
|
**⚠️ Caution**:
|
||||||
|
- Can lead to improper posteriors
|
||||||
|
- May produce non-sensible results
|
||||||
|
- Not truly "non-informative" (still makes assumptions)
|
||||||
|
- Often not recommended in modern Bayesian practice
|
||||||
|
|
||||||
|
**Better alternative**: Use weakly informative priors
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Prior Sensitivity Analysis
|
||||||
|
|
||||||
|
**Always conduct**: Test how results change with different priors
|
||||||
|
|
||||||
|
**Process**:
|
||||||
|
1. Fit model with default/planned prior
|
||||||
|
2. Fit model with more diffuse prior
|
||||||
|
3. Fit model with more concentrated prior
|
||||||
|
4. Compare posterior distributions
|
||||||
|
|
||||||
|
**Reporting**:
|
||||||
|
- If results are similar: Evidence is robust
|
||||||
|
- If results differ substantially: Data are not strong enough to overwhelm prior
|
||||||
|
|
||||||
|
**Python example**:
|
||||||
|
```python
import pymc as pm

# Fit the same model under several priors on the effect of interest
priors = [
    ('weakly_informative', (0.0, 1.0)),
    ('diffuse', (0.0, 10.0)),
    ('informative', (0.5, 0.3)),
]

results = {}
for name, (mu, sigma) in priors:
    with pm.Model():
        effect = pm.Normal('effect', mu=mu, sigma=sigma)
        # ... rest of model
        trace = pm.sample()
    results[name] = trace
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Bayesian Hypothesis Testing
|
||||||
|
|
||||||
|
### Bayes Factor (BF)
|
||||||
|
|
||||||
|
**What it is**: Ratio of evidence for two competing hypotheses
|
||||||
|
|
||||||
|
**Formula**:
|
||||||
|
```
|
||||||
|
BF₁₀ = P(D|H₁) / P(D|H₀)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
|
||||||
|
| BF₁₀ | Evidence |
|
||||||
|
|------|----------|
|
||||||
|
| >100 | Decisive for H₁ |
|
||||||
|
| 30-100 | Very strong for H₁ |
|
||||||
|
| 10-30 | Strong for H₁ |
|
||||||
|
| 3-10 | Moderate for H₁ |
|
||||||
|
| 1-3 | Anecdotal for H₁ |
|
||||||
|
| 1 | No evidence |
|
||||||
|
| 1/3-1 | Anecdotal for H₀ |
|
||||||
|
| 1/10-1/3 | Moderate for H₀ |
|
||||||
|
| 1/30-1/10 | Strong for H₀ |
|
||||||
|
| 1/100-1/30 | Very strong for H₀ |
|
||||||
|
| <1/100 | Decisive for H₀ |
|
||||||
|
|
||||||
|
**Advantages over p-values**:
|
||||||
|
1. Can provide evidence for null hypothesis
|
||||||
|
2. Not dependent on sampling intentions (no "peeking" problem)
|
||||||
|
3. Directly quantifies evidence
|
||||||
|
4. Can be updated with more data
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
import numpy as np
from scipy import integrate


def bf_from_t(t, n1, n2, r_scale=0.707):
    """
    Approximate JZS Bayes Factor (BF10) from an independent-samples
    t-statistic (Rouder et al., 2009).
    r_scale: Cauchy prior scale (default 0.707 for medium effect)
    """
    df = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size

    def integrand(g):
        return ((1 + n_eff * g * r_scale**2) ** -0.5
                * (1 + t**2 / ((1 + n_eff * g * r_scale**2) * df)) ** (-(df + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    numerator = integrate.quad(integrand, 0, np.inf)[0]
    denominator = (1 + t**2 / df) ** (-(df + 1) / 2)
    return numerator / denominator


# Dedicated tools give the same quantity with less effort:
# pingouin.bayesfactor_ttest, the R package BayesFactor, or JASP
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Region of Practical Equivalence (ROPE)
|
||||||
|
|
||||||
|
**Purpose**: Define range of negligible effect sizes
|
||||||
|
|
||||||
|
**Process**:
|
||||||
|
1. Define ROPE (e.g., d ∈ [-0.1, 0.1] for negligible effects)
|
||||||
|
2. Calculate % of posterior inside ROPE
|
||||||
|
3. Make decision:
|
||||||
|
- >95% in ROPE: Accept practical equivalence
|
||||||
|
- >95% outside ROPE: Reject equivalence
|
||||||
|
- Otherwise: Inconclusive
|
||||||
|
|
||||||
|
**Advantage**: Directly tests for practical significance
|
||||||
|
|
||||||
|
**Python example**:
|
||||||
|
```python
|
||||||
|
# Define ROPE
|
||||||
|
rope_lower, rope_upper = -0.1, 0.1
|
||||||
|
|
||||||
|
# Calculate % of posterior in ROPE
|
||||||
|
in_rope = np.mean((posterior_samples > rope_lower) &
|
||||||
|
(posterior_samples < rope_upper))
|
||||||
|
|
||||||
|
print(f"{in_rope*100:.1f}% of posterior in ROPE")
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Bayesian Estimation
|
||||||
|
|
||||||
|
### Credible Intervals
|
||||||
|
|
||||||
|
**What it is**: Interval containing parameter with X% probability
|
||||||
|
|
||||||
|
**95% Credible Interval interpretation**:
|
||||||
|
> "There is a 95% probability that the true parameter lies in this interval."
|
||||||
|
|
||||||
|
**This is how confidence intervals are commonly misread**: the direct probability statement above is only valid for Bayesian credible intervals, not for frequentist confidence intervals.
|
||||||
|
|
||||||
|
**Types**:
|
||||||
|
|
||||||
|
#### Equal-Tailed Interval (ETI)
|
||||||
|
- 2.5th to 97.5th percentile
|
||||||
|
- Simple to calculate
|
||||||
|
- May not include mode for skewed distributions
|
||||||
|
|
||||||
|
#### Highest Density Interval (HDI)
|
||||||
|
- Narrowest interval containing 95% of distribution
|
||||||
|
- Always includes mode
|
||||||
|
- Better for skewed distributions
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import arviz as az
|
||||||
|
|
||||||
|
# Equal-tailed interval
|
||||||
|
eti = np.percentile(posterior_samples, [2.5, 97.5])
|
||||||
|
|
||||||
|
# HDI
|
||||||
|
hdi = az.hdi(posterior_samples, hdi_prob=0.95)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Posterior Distributions
|
||||||
|
|
||||||
|
**Interpreting posterior distributions**:
|
||||||
|
|
||||||
|
1. **Central tendency**:
|
||||||
|
- Mean: Average posterior value
|
||||||
|
- Median: 50th percentile
|
||||||
|
- Mode: Most probable value (MAP - Maximum A Posteriori)
|
||||||
|
|
||||||
|
2. **Uncertainty**:
|
||||||
|
- SD: Spread of posterior
|
||||||
|
- Credible intervals: Quantify uncertainty
|
||||||
|
|
||||||
|
3. **Shape**:
|
||||||
|
- Symmetric: Similar to normal
|
||||||
|
- Skewed: Asymmetric uncertainty
|
||||||
|
- Multimodal: Multiple plausible values
|
||||||
|
|
||||||
|
**Visualization**:
|
||||||
|
```python
|
||||||
|
import matplotlib.pyplot as plt
|
||||||
|
import arviz as az
|
||||||
|
|
||||||
|
# Posterior plot with HDI
|
||||||
|
az.plot_posterior(trace, hdi_prob=0.95)
|
||||||
|
|
||||||
|
# Trace plot (check convergence)
|
||||||
|
az.plot_trace(trace)
|
||||||
|
|
||||||
|
# Forest plot (multiple parameters)
|
||||||
|
az.plot_forest(trace)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Common Bayesian Analyses
|
||||||
|
|
||||||
|
### Bayesian T-Test
|
||||||
|
|
||||||
|
**Purpose**: Compare two groups (Bayesian alternative to t-test)
|
||||||
|
|
||||||
|
**Outputs**:
|
||||||
|
1. Posterior distribution of mean difference
|
||||||
|
2. 95% credible interval
|
||||||
|
3. Bayes Factor (BF₁₀)
|
||||||
|
4. Probability of directional hypothesis (e.g., P(μ₁ > μ₂))
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
import pymc as pm
|
||||||
|
import arviz as az
|
||||||
|
|
||||||
|
# Bayesian independent samples t-test
|
||||||
|
with pm.Model() as model:
|
||||||
|
# Priors for group means
|
||||||
|
mu1 = pm.Normal('mu1', mu=0, sigma=10)
|
||||||
|
mu2 = pm.Normal('mu2', mu=0, sigma=10)
|
||||||
|
|
||||||
|
# Prior for pooled standard deviation
|
||||||
|
sigma = pm.HalfNormal('sigma', sigma=10)
|
||||||
|
|
||||||
|
# Likelihood
|
||||||
|
y1 = pm.Normal('y1', mu=mu1, sigma=sigma, observed=group1)
|
||||||
|
y2 = pm.Normal('y2', mu=mu2, sigma=sigma, observed=group2)
|
||||||
|
|
||||||
|
# Derived quantity: mean difference
|
||||||
|
diff = pm.Deterministic('diff', mu1 - mu2)
|
||||||
|
|
||||||
|
# Sample posterior
|
||||||
|
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
|
||||||
|
|
||||||
|
# Analyze results
|
||||||
|
print(az.summary(trace, var_names=['mu1', 'mu2', 'diff']))
|
||||||
|
|
||||||
|
# Probability that group1 > group2
|
||||||
|
prob_greater = np.mean(trace.posterior['diff'].values > 0)
|
||||||
|
print(f"P(μ₁ > μ₂) = {prob_greater:.3f}")
|
||||||
|
|
||||||
|
# Plot posterior
|
||||||
|
az.plot_posterior(trace, var_names=['diff'], ref_val=0)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Bayesian ANOVA
|
||||||
|
|
||||||
|
**Purpose**: Compare three or more groups
|
||||||
|
|
||||||
|
**Model**:
|
||||||
|
```python
|
||||||
|
import pymc as pm
|
||||||
|
|
||||||
|
with pm.Model() as anova_model:
|
||||||
|
# Hyperpriors
|
||||||
|
mu_global = pm.Normal('mu_global', mu=0, sigma=10)
|
||||||
|
sigma_between = pm.HalfNormal('sigma_between', sigma=5)
|
||||||
|
sigma_within = pm.HalfNormal('sigma_within', sigma=5)
|
||||||
|
|
||||||
|
# Group means (hierarchical)
|
||||||
|
group_means = pm.Normal('group_means',
|
||||||
|
mu=mu_global,
|
||||||
|
sigma=sigma_between,
|
||||||
|
shape=n_groups)
|
||||||
|
|
||||||
|
# Likelihood
|
||||||
|
y = pm.Normal('y',
|
||||||
|
mu=group_means[group_idx],
|
||||||
|
sigma=sigma_within,
|
||||||
|
observed=data)
|
||||||
|
|
||||||
|
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
|
||||||
|
|
||||||
|
# Posterior contrasts
|
||||||
|
contrast_1_2 = trace.posterior['group_means'][:,:,0] - trace.posterior['group_means'][:,:,1]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Bayesian Correlation
|
||||||
|
|
||||||
|
**Purpose**: Estimate correlation between two variables
|
||||||
|
|
||||||
|
**Advantage**: Provides distribution of correlation values
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
import pymc as pm
|
||||||
|
|
||||||
|
with pm.Model() as corr_model:
|
||||||
|
# Prior on correlation
|
||||||
|
rho = pm.Uniform('rho', lower=-1, upper=1)
|
||||||
|
|
||||||
|
# Convert to covariance matrix
|
||||||
|
cov_matrix = pm.math.stack([[1, rho],
|
||||||
|
[rho, 1]])
|
||||||
|
|
||||||
|
# Likelihood (bivariate normal)
|
||||||
|
obs = pm.MvNormal('obs',
|
||||||
|
mu=[0, 0],
|
||||||
|
cov=cov_matrix,
|
||||||
|
observed=np.column_stack([x, y]))
|
||||||
|
|
||||||
|
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
|
||||||
|
|
||||||
|
# Summarize correlation
|
||||||
|
print(az.summary(trace, var_names=['rho']))
|
||||||
|
|
||||||
|
# Probability that correlation is positive
|
||||||
|
prob_positive = np.mean(trace.posterior['rho'].values > 0)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Bayesian Linear Regression
|
||||||
|
|
||||||
|
**Purpose**: Model relationship between predictors and outcome
|
||||||
|
|
||||||
|
**Advantages**:
|
||||||
|
- Uncertainty in all parameters
|
||||||
|
- Natural regularization (via priors)
|
||||||
|
- Can incorporate prior knowledge
|
||||||
|
- Credible intervals for predictions
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
import pymc as pm
|
||||||
|
|
||||||
|
with pm.Model() as regression_model:
|
||||||
|
# Priors for coefficients
|
||||||
|
alpha = pm.Normal('alpha', mu=0, sigma=10) # Intercept
|
||||||
|
beta = pm.Normal('beta', mu=0, sigma=10, shape=n_predictors)
|
||||||
|
sigma = pm.HalfNormal('sigma', sigma=10)
|
||||||
|
|
||||||
|
# Expected value
|
||||||
|
mu = alpha + pm.math.dot(X, beta)
|
||||||
|
|
||||||
|
# Likelihood
|
||||||
|
y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)
|
||||||
|
|
||||||
|
trace = pm.sample(2000, tune=1000, return_inferencedata=True)
|
||||||
|
|
||||||
|
# Posterior predictive checks
|
||||||
|
with regression_model:
|
||||||
|
ppc = pm.sample_posterior_predictive(trace)
|
||||||
|
|
||||||
|
az.plot_ppc(ppc)
|
||||||
|
|
||||||
|
# Predictions with uncertainty
|
||||||
|
with regression_model:
|
||||||
|
pm.set_data({'X': X_new})
|
||||||
|
posterior_pred = pm.sample_posterior_predictive(trace)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Hierarchical (Multilevel) Models
|
||||||
|
|
||||||
|
**When to use**:
|
||||||
|
- Nested/clustered data (students within schools)
|
||||||
|
- Repeated measures
|
||||||
|
- Meta-analysis
|
||||||
|
- Varying effects across groups
|
||||||
|
|
||||||
|
**Key concept**: Partial pooling
|
||||||
|
- Complete pooling: Ignore groups (biased)
|
||||||
|
- No pooling: Analyze groups separately (high variance)
|
||||||
|
- Partial pooling: Borrow strength across groups (Bayesian)
|
||||||
|
|
||||||
|
**Example: Varying intercepts**:
|
||||||
|
```python
|
||||||
|
with pm.Model() as hierarchical_model:
|
||||||
|
# Hyperpriors
|
||||||
|
mu_global = pm.Normal('mu_global', mu=0, sigma=10)
|
||||||
|
sigma_between = pm.HalfNormal('sigma_between', sigma=5)
|
||||||
|
sigma_within = pm.HalfNormal('sigma_within', sigma=5)
|
||||||
|
|
||||||
|
# Group-level intercepts
|
||||||
|
alpha = pm.Normal('alpha',
|
||||||
|
mu=mu_global,
|
||||||
|
sigma=sigma_between,
|
||||||
|
shape=n_groups)
|
||||||
|
|
||||||
|
# Likelihood
|
||||||
|
y_obs = pm.Normal('y_obs',
|
||||||
|
mu=alpha[group_idx],
|
||||||
|
sigma=sigma_within,
|
||||||
|
observed=y)
|
||||||
|
|
||||||
|
trace = pm.sample()
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Model Comparison
|
||||||
|
|
||||||
|
### Methods
|
||||||
|
|
||||||
|
#### 1. Bayes Factor
|
||||||
|
- Directly compares model evidence
|
||||||
|
- Sensitive to prior specification
|
||||||
|
- Can be computationally intensive
|
||||||
|
|
||||||
|
#### 2. Information Criteria
|
||||||
|
|
||||||
|
**WAIC (Widely Applicable Information Criterion)**:
|
||||||
|
- Bayesian analog of AIC
|
||||||
|
- Lower is better
|
||||||
|
- Accounts for effective number of parameters
|
||||||
|
|
||||||
|
**LOO (Leave-One-Out Cross-Validation)**:
|
||||||
|
- Estimates out-of-sample prediction error
|
||||||
|
- Lower is better
|
||||||
|
- More robust than WAIC
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import arviz as az
|
||||||
|
|
||||||
|
# Calculate WAIC and LOO
|
||||||
|
waic = az.waic(trace)
|
||||||
|
loo = az.loo(trace)
|
||||||
|
|
||||||
|
print(f"WAIC: {waic.elpd_waic:.2f}")
|
||||||
|
print(f"LOO: {loo.elpd_loo:.2f}")
|
||||||
|
|
||||||
|
# Compare multiple models
|
||||||
|
comparison = az.compare({
|
||||||
|
'model1': trace1,
|
||||||
|
'model2': trace2,
|
||||||
|
'model3': trace3
|
||||||
|
})
|
||||||
|
print(comparison)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Checking Bayesian Models
|
||||||
|
|
||||||
|
### 1. Convergence Diagnostics
|
||||||
|
|
||||||
|
**R-hat (Gelman-Rubin statistic)**:
|
||||||
|
- Compares within-chain and between-chain variance
|
||||||
|
- Values close to 1.0 indicate convergence
|
||||||
|
- R-hat < 1.01: Good
|
||||||
|
- R-hat > 1.05: Poor convergence
|
||||||
|
|
||||||
|
**Effective Sample Size (ESS)**:
|
||||||
|
- Number of independent samples
|
||||||
|
- Higher is better
|
||||||
|
- ESS > 400 per chain recommended
|
||||||
|
|
||||||
|
**Trace plots**:
|
||||||
|
- Should look like "fuzzy caterpillar"
|
||||||
|
- No trends, no stuck chains
|
||||||
|
|
||||||
|
**Python checking**:
|
||||||
|
```python
|
||||||
|
# Automatic summary with diagnostics
|
||||||
|
print(az.summary(trace, var_names=['parameter']))
|
||||||
|
|
||||||
|
# Visual diagnostics
|
||||||
|
az.plot_trace(trace)
|
||||||
|
az.plot_rank(trace) # Rank plots
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. Posterior Predictive Checks
|
||||||
|
|
||||||
|
**Purpose**: Does model generate data similar to observed data?
|
||||||
|
|
||||||
|
**Process**:
|
||||||
|
1. Generate predictions from posterior
|
||||||
|
2. Compare to actual data
|
||||||
|
3. Look for systematic discrepancies
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
with model:
|
||||||
|
ppc = pm.sample_posterior_predictive(trace)
|
||||||
|
|
||||||
|
# Visual check
|
||||||
|
az.plot_ppc(ppc, num_pp_samples=100)
|
||||||
|
|
||||||
|
# Quantitative checks
|
||||||
|
obs_mean = np.mean(observed_data)
|
||||||
|
pred_means = [np.mean(sample) for sample in ppc.posterior_predictive['y_obs']]
|
||||||
|
p_value = np.mean(np.array(pred_means) >= obs_mean)  # posterior predictive (Bayesian) p-value
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Reporting Bayesian Results
|
||||||
|
|
||||||
|
### Example T-Test Report
|
||||||
|
|
||||||
|
> "A Bayesian independent samples t-test was conducted to compare groups A and B. Weakly informative priors were used: Normal(0, 1) for the mean difference and Half-Cauchy(0, 1) for the pooled standard deviation. The posterior distribution of the mean difference had a mean of 5.2 (95% CI [2.3, 8.1]), indicating that Group A scored higher than Group B. The Bayes Factor BF₁₀ = 23.5 provided strong evidence for a difference between groups, and there was a 99.7% probability that Group A's mean exceeded Group B's mean."
|
||||||
|
|
||||||
|
### Example Regression Report
|
||||||
|
|
||||||
|
> "A Bayesian linear regression was fitted with weakly informative priors (Normal(0, 10) for coefficients, Half-Cauchy(0, 5) for residual SD). The model explained substantial variance (R² = 0.47, 95% CI [0.38, 0.55]). Study hours (β = 0.52, 95% CI [0.38, 0.66]) and prior GPA (β = 0.31, 95% CI [0.17, 0.45]) were credible predictors (95% CIs excluded zero). Posterior predictive checks showed good model fit. Convergence diagnostics were satisfactory (all R-hat < 1.01, ESS > 1000)."
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Advantages and Limitations
|
||||||
|
|
||||||
|
### Advantages
|
||||||
|
|
||||||
|
1. **Intuitive interpretation**: Direct probability statements about parameters
|
||||||
|
2. **Incorporates prior knowledge**: Uses all available information
|
||||||
|
3. **Flexible**: Handles complex models easily
|
||||||
|
4. **No p-hacking**: Can look at data as it arrives
|
||||||
|
5. **Quantifies uncertainty**: Full posterior distribution
|
||||||
|
6. **Small samples**: Works with any sample size
|
||||||
|
|
||||||
|
### Limitations
|
||||||
|
|
||||||
|
1. **Computational**: Requires MCMC sampling (can be slow)
|
||||||
|
2. **Prior specification**: Requires thought and justification
|
||||||
|
3. **Complexity**: Steeper learning curve
|
||||||
|
4. **Software**: Fewer tools than frequentist methods
|
||||||
|
5. **Communication**: May need to educate reviewers/readers
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key Python Packages
|
||||||
|
|
||||||
|
- **PyMC**: Full Bayesian modeling framework
|
||||||
|
- **ArviZ**: Visualization and diagnostics
|
||||||
|
- **Bambi**: High-level interface for regression models
|
||||||
|
- **PyStan**: Python interface to Stan
|
||||||
|
- **TensorFlow Probability**: Bayesian inference with TensorFlow
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## When to Use Bayesian Methods
|
||||||
|
|
||||||
|
**Use Bayesian when**:
|
||||||
|
- You have prior information to incorporate
|
||||||
|
- You want direct probability statements
|
||||||
|
- Sample size is small
|
||||||
|
- Model is complex (hierarchical, missing data, etc.)
|
||||||
|
- You want to update analysis as data arrives
|
||||||
|
|
||||||
|
**Frequentist may be sufficient when**:
|
||||||
|
- Standard analysis with large sample
|
||||||
|
- No prior information
|
||||||
|
- Computational resources limited
|
||||||
|
- Reviewers unfamiliar with Bayesian methods
|
||||||
# Effect Sizes and Power Analysis
|
||||||
|
|
||||||
|
This document provides guidance on calculating, interpreting, and reporting effect sizes, as well as conducting power analyses for study planning.
|
||||||
|
|
||||||
|
## Why Effect Sizes Matter
|
||||||
|
|
||||||
|
1. **Statistical significance ≠ practical significance**: p-values only tell if an effect exists, not how large it is
|
||||||
|
2. **Sample size dependent**: With large samples, trivial effects become "significant"
|
||||||
|
3. **Interpretation**: Effect sizes provide magnitude and practical importance
|
||||||
|
4. **Meta-analysis**: Effect sizes enable combining results across studies
|
||||||
|
5. **Power analysis**: Required for sample size determination
|
||||||
|
|
||||||
|
**Golden rule**: ALWAYS report effect sizes alongside p-values.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Effect Sizes by Analysis Type
|
||||||
|
|
||||||
|
### T-Tests and Mean Differences
|
||||||
|
|
||||||
|
#### Cohen's d (Standardized Mean Difference)
|
||||||
|
|
||||||
|
**Formula**:
|
||||||
|
- Independent groups: d = (M₁ - M₂) / SD_pooled
|
||||||
|
- Paired groups: d = M_diff / SD_diff
|
||||||
|
|
||||||
|
**Interpretation** (Cohen, 1988):
|
||||||
|
- Small: |d| = 0.20
|
||||||
|
- Medium: |d| = 0.50
|
||||||
|
- Large: |d| = 0.80
|
||||||
|
|
||||||
|
**Context-dependent interpretation**:
|
||||||
|
- In education: d = 0.40 is typical for successful interventions
|
||||||
|
- In psychology: d = 0.40 is considered meaningful
|
||||||
|
- In medicine: Small effect sizes can be clinically important
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Independent t-test with effect size
|
||||||
|
result = pg.ttest(group1, group2, correction=False)
|
||||||
|
cohens_d = result['cohen-d'].values[0]
|
||||||
|
|
||||||
|
# Manual calculation
|
||||||
|
mean_diff = np.mean(group1) - np.mean(group2)
|
||||||
|
pooled_std = np.sqrt((np.var(group1, ddof=1) + np.var(group2, ddof=1)) / 2)
|
||||||
|
cohens_d = mean_diff / pooled_std
|
||||||
|
|
||||||
|
# Paired t-test
|
||||||
|
result = pg.ttest(pre, post, paired=True)
|
||||||
|
cohens_d = result['cohen-d'].values[0]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Confidence intervals for d**:
|
||||||
|
```python
|
||||||
|
from pingouin import compute_effsize_from_t, compute_esci

d = compute_effsize_from_t(t_statistic, nx=n1, ny=n2, eftype='cohen')
ci = compute_esci(stat=d, nx=n1, ny=n2, eftype='cohen')  # 95% CI for d by default
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Hedges' g (Bias-Corrected d)
|
||||||
|
|
||||||
|
**Why use it**: Cohen's d has slight upward bias with small samples (n < 20)
|
||||||
|
|
||||||
|
**Formula**: g = d × correction_factor, where correction_factor = 1 - 3/(4df - 1)
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
result = pg.ttest(group1, group2, correction=False)
|
||||||
|
hedges_g = result['hedges'].values[0]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Use Hedges' g when**:
|
||||||
|
- Sample sizes are small (n < 20 per group)
|
||||||
|
- Conducting meta-analyses (standard in meta-analysis)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Glass's Δ (Delta)
|
||||||
|
|
||||||
|
**When to use**: When one group is a control with known variability
|
||||||
|
|
||||||
|
**Formula**: Δ = (M₁ - M₂) / SD_control
|
||||||
|
|
||||||
|
**Use cases**:
|
||||||
|
- Clinical trials (use control group SD)
|
||||||
|
- When treatment affects variability
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### ANOVA
|
||||||
|
|
||||||
|
#### Eta-squared (η²)
|
||||||
|
|
||||||
|
**What it measures**: Proportion of total variance explained by factor
|
||||||
|
|
||||||
|
**Formula**: η² = SS_effect / SS_total
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- Small: η² = 0.01 (1% of variance)
|
||||||
|
- Medium: η² = 0.06 (6% of variance)
|
||||||
|
- Large: η² = 0.14 (14% of variance)
|
||||||
|
|
||||||
|
**Limitation**: Biased with multiple factors (sums to > 1.0)
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# One-way ANOVA
|
||||||
|
aov = pg.anova(dv='value', between='group', data=df, detailed=True)  # detailed=True returns SS
|
||||||
|
eta_squared = aov['SS'][0] / aov['SS'].sum()
|
||||||
|
|
||||||
|
# Or use pingouin directly
|
||||||
|
aov = pg.anova(dv='value', between='group', data=df, detailed=True)
|
||||||
|
eta_squared = aov['np2'][0]  # pingouin reports partial eta-squared (equals eta-squared for one-way designs)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Partial Eta-squared (η²_p)
|
||||||
|
|
||||||
|
**What it measures**: Proportion of variance explained by factor, excluding other factors
|
||||||
|
|
||||||
|
**Formula**: η²_p = SS_effect / (SS_effect + SS_error)
|
||||||
|
|
||||||
|
**Interpretation**: Same benchmarks as η²
|
||||||
|
|
||||||
|
**When to use**: Multi-factor ANOVA (standard in factorial designs)
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
aov = pg.anova(dv='value', between=['factor1', 'factor2'], data=df)
|
||||||
|
# pingouin reports partial eta-squared by default
|
||||||
|
partial_eta_sq = aov['np2']
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Omega-squared (ω²)
|
||||||
|
|
||||||
|
**What it measures**: Less biased estimate of population variance explained
|
||||||
|
|
||||||
|
**Why use it**: η² overestimates effect size; ω² provides better population estimate
|
||||||
|
|
||||||
|
**Formula**: ω² = (SS_effect - df_effect × MS_error) / (SS_total + MS_error)
|
||||||
|
|
||||||
|
**Interpretation**: Same benchmarks as η², but typically smaller values
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
def omega_squared(aov_table):
|
||||||
|
ss_effect = aov_table.loc[0, 'SS']
|
||||||
|
ss_total = aov_table['SS'].sum()
|
||||||
|
ms_error = aov_table.loc[aov_table.index[-1], 'MS'] # Residual MS
|
||||||
|
df_effect = aov_table.loc[0, 'DF']
|
||||||
|
|
||||||
|
omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)
|
||||||
|
return omega_sq
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Cohen's f
|
||||||
|
|
||||||
|
**What it measures**: Effect size for ANOVA (analogous to Cohen's d)
|
||||||
|
|
||||||
|
**Formula**: f = √(η² / (1 - η²))
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- Small: f = 0.10
|
||||||
|
- Medium: f = 0.25
|
||||||
|
- Large: f = 0.40
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
eta_squared = 0.06 # From ANOVA
|
||||||
|
cohens_f = np.sqrt(eta_squared / (1 - eta_squared))
|
||||||
|
```
|
||||||
|
|
||||||
|
**Use in power analysis**: Required for ANOVA power calculations
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Correlation
|
||||||
|
|
||||||
|
#### Pearson's r / Spearman's ρ
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- Small: |r| = 0.10
|
||||||
|
- Medium: |r| = 0.30
|
||||||
|
- Large: |r| = 0.50
|
||||||
|
|
||||||
|
**Important notes**:
|
||||||
|
- r² = coefficient of determination (proportion of variance explained)
|
||||||
|
- r = 0.30 means 9% shared variance (0.30² = 0.09)
|
||||||
|
- Consider direction (positive/negative) and context
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# Pearson correlation with CI
|
||||||
|
result = pg.corr(x, y, method='pearson')
|
||||||
|
r = result['r'].values[0]
|
||||||
|
ci = result['CI95%'].values[0]  # array([lower, upper])
|
||||||
|
|
||||||
|
# Spearman correlation
|
||||||
|
result = pg.corr(x, y, method='spearman')
|
||||||
|
rho = result['r'].values[0]
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Regression
|
||||||
|
|
||||||
|
#### R² (Coefficient of Determination)
|
||||||
|
|
||||||
|
**What it measures**: Proportion of variance in Y explained by model
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- Small: R² = 0.02
|
||||||
|
- Medium: R² = 0.13
|
||||||
|
- Large: R² = 0.26
|
||||||
|
|
||||||
|
**Context-dependent**:
|
||||||
|
- Physical sciences: R² > 0.90 expected
|
||||||
|
- Social sciences: R² > 0.30 considered good
|
||||||
|
- Behavior prediction: R² > 0.10 may be meaningful
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
from sklearn.metrics import r2_score
|
||||||
|
from statsmodels.api import OLS
|
||||||
|
|
||||||
|
# Using statsmodels
|
||||||
|
model = OLS(y, X).fit()
|
||||||
|
r_squared = model.rsquared
|
||||||
|
adjusted_r_squared = model.rsquared_adj
|
||||||
|
|
||||||
|
# Manual
|
||||||
|
r_squared = 1 - (SS_residual / SS_total)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Adjusted R²
|
||||||
|
|
||||||
|
**Why use it**: R² artificially increases when adding predictors; adjusted R² penalizes model complexity
|
||||||
|
|
||||||
|
**Formula**: R²_adj = 1 - (1 - R²) × (n - 1) / (n - k - 1)
|
||||||
|
|
||||||
|
**When to use**: Always report alongside R² for multiple regression
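
A minimal sketch of the adjustment, assuming `r_squared`, the sample size `n`, and the number of predictors `k` are already in hand (statsmodels reports it directly as `model.rsquared_adj`):

```python
# Penalize R-squared for model complexity: k predictors, n observations
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
```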
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Standardized Regression Coefficients (β)
|
||||||
|
|
||||||
|
**What it measures**: Effect of one-SD change in predictor on outcome (in SD units)
|
||||||
|
|
||||||
|
**Interpretation**: Similar to Cohen's d
|
||||||
|
- Small: |β| = 0.10
|
||||||
|
- Medium: |β| = 0.30
|
||||||
|
- Large: |β| = 0.50
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
from scipy import stats
|
||||||
|
|
||||||
|
# Standardize variables first
|
||||||
|
X_std = (X - X.mean()) / X.std()
|
||||||
|
y_std = (y - y.mean()) / y.std()
|
||||||
|
|
||||||
|
model = OLS(y_std, X_std).fit()
|
||||||
|
beta = model.params
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### f² (Cohen's f-squared for Regression)
|
||||||
|
|
||||||
|
**What it measures**: Effect size for individual predictors or model comparison
|
||||||
|
|
||||||
|
**Formula**: f² = (R²_AB - R²_A) / (1 - R²_AB)
|
||||||
|
|
||||||
|
Where:
|
||||||
|
- R²_AB = R² for full model with predictor
|
||||||
|
- R²_A = R² for reduced model without predictor
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- Small: f² = 0.02
|
||||||
|
- Medium: f² = 0.15
|
||||||
|
- Large: f² = 0.35
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
# Compare two nested models
|
||||||
|
model_full = OLS(y, X_full).fit()
|
||||||
|
model_reduced = OLS(y, X_reduced).fit()
|
||||||
|
|
||||||
|
r2_full = model_full.rsquared
|
||||||
|
r2_reduced = model_reduced.rsquared
|
||||||
|
|
||||||
|
f_squared = (r2_full - r2_reduced) / (1 - r2_full)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Categorical Data Analysis
|
||||||
|
|
||||||
|
#### Cramér's V
|
||||||
|
|
||||||
|
**What it measures**: Association strength for χ² test (works for any table size)
|
||||||
|
|
||||||
|
**Formula**: V = √(χ² / (n × (k - 1)))
|
||||||
|
|
||||||
|
Where k = min(rows, columns)
|
||||||
|
|
||||||
|
**Interpretation** (for k > 2):
|
||||||
|
- Small: V = 0.07
|
||||||
|
- Medium: V = 0.21
|
||||||
|
- Large: V = 0.35
|
||||||
|
|
||||||
|
**For 2×2 tables**: Use phi coefficient (φ)
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
from scipy.stats.contingency import association
|
||||||
|
|
||||||
|
# Cramér's V
|
||||||
|
cramers_v = association(contingency_table, method='cramer')
|
||||||
|
|
||||||
|
# Phi coefficient (for 2x2)
|
||||||
|
phi = association(contingency_table, method='pearson')
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
#### Odds Ratio (OR) and Risk Ratio (RR)
|
||||||
|
|
||||||
|
**For 2×2 contingency tables**:
|
||||||
|
|
||||||
|
| | Outcome + | Outcome - |
|
||||||
|
|-----------|-----------|-----------|
|
||||||
|
| Exposed | a | b |
|
||||||
|
| Unexposed | c | d |
|
||||||
|
|
||||||
|
**Odds Ratio**: OR = (a/b) / (c/d) = ad / bc
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- OR = 1: No association
|
||||||
|
- OR > 1: Positive association (increased odds)
|
||||||
|
- OR < 1: Negative association (decreased odds)
|
||||||
|
- OR = 2: Twice the odds
|
||||||
|
- OR = 0.5: Half the odds
|
||||||
|
|
||||||
|
**Risk Ratio**: RR = (a/(a+b)) / (c/(c+d))
|
||||||
|
|
||||||
|
**When to use**:
|
||||||
|
- Cohort studies: Use RR (more interpretable)
|
||||||
|
- Case-control studies: Use OR (RR not available)
|
||||||
|
- Logistic regression: OR is natural output
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import numpy as np
from scipy import stats
import statsmodels.api as sm
|
||||||
|
|
||||||
|
# From contingency table
|
||||||
|
odds_ratio = (a * d) / (b * c)
|
||||||
|
|
||||||
|
# Confidence interval
|
||||||
|
table = np.array([[a, b], [c, d]])
|
||||||
|
oddsratio, pvalue = stats.fisher_exact(table)
|
||||||
|
|
||||||
|
# From logistic regression
|
||||||
|
model = sm.Logit(y, X).fit()
|
||||||
|
odds_ratios = np.exp(model.params) # Exponentiate coefficients
|
||||||
|
ci = np.exp(model.conf_int()) # Exponentiate CIs
|
||||||
|
```
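
For completeness, a sketch of the risk ratio from the same a, b, c, d cell counts:

```python
# Risk (proportion with the outcome) in each exposure group
risk_exposed = a / (a + b)
risk_unexposed = c / (c + d)
risk_ratio = risk_exposed / risk_unexposed
```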
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Bayesian Effect Sizes
|
||||||
|
|
||||||
|
#### Bayes Factor (BF)
|
||||||
|
|
||||||
|
**What it measures**: Ratio of evidence for alternative vs. null hypothesis
|
||||||
|
|
||||||
|
**Interpretation**:
|
||||||
|
- BF₁₀ = 1: Equal evidence for H₁ and H₀
|
||||||
|
- BF₁₀ = 3: H₁ is 3× more likely than H₀ (moderate evidence)
|
||||||
|
- BF₁₀ = 10: H₁ is 10× more likely than H₀ (strong evidence)
|
||||||
|
- BF₁₀ = 100: H₁ is 100× more likely than H₀ (decisive evidence)
|
||||||
|
- BF₁₀ = 0.33: H₀ is 3× more likely than H₁
|
||||||
|
- BF₁₀ = 0.10: H₀ is 10× more likely than H₁
|
||||||
|
|
||||||
|
**Classification** (Jeffreys, 1961):
|
||||||
|
- 1-3: Anecdotal evidence
|
||||||
|
- 3-10: Moderate evidence
|
||||||
|
- 10-30: Strong evidence
|
||||||
|
- 30-100: Very strong evidence
|
||||||
|
- >100: Decisive evidence
|
||||||
|
|
||||||
|
**Python calculation**:
|
||||||
|
```python
|
||||||
|
import pingouin as pg
|
||||||
|
|
||||||
|
# Bayesian t-test: pg.ttest() reports a Bayes Factor ('BF10') column
result = pg.ttest(group1, group2)
bf10 = result['BF10'].values[0]

# BF can also be computed directly from a t-statistic:
# pg.bayesfactor_ttest(t, nx=n1, ny=n2, r=0.707)

# JASP or the R package BayesFactor (via rpy2) are alternatives
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Power Analysis
|
||||||
|
|
||||||
|
### Concepts
|
||||||
|
|
||||||
|
**Statistical power**: Probability of detecting an effect if it exists (1 - β)
|
||||||
|
|
||||||
|
**Conventional standards**:
|
||||||
|
- Power = 0.80 (80% chance of detecting effect)
|
||||||
|
- α = 0.05 (5% Type I error rate)
|
||||||
|
|
||||||
|
**Four interconnected parameters** (given 3, can solve for 4th):
|
||||||
|
1. Sample size (n)
|
||||||
|
2. Effect size (d, f, etc.)
|
||||||
|
3. Significance level (α)
|
||||||
|
4. Power (1 - β)
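
For example, the same relationship can be solved for power at a fixed sample size (a sketch using statsmodels; the numbers are illustrative):

```python
from statsmodels.stats.power import tt_ind_solve_power

# With n = 40 per group and an expected d = 0.5, what power do we have?
achieved_power = tt_ind_solve_power(effect_size=0.5, nobs1=40, alpha=0.05,
                                    ratio=1.0, alternative='two-sided')
print(f"Power = {achieved_power:.2f}")
```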
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### A Priori Power Analysis (Planning)
|
||||||
|
|
||||||
|
**Purpose**: Determine required sample size before study
|
||||||
|
|
||||||
|
**Steps**:
|
||||||
|
1. Specify expected effect size (from literature, pilot data, or minimum meaningful effect)
|
||||||
|
2. Set α level (typically 0.05)
|
||||||
|
3. Set desired power (typically 0.80)
|
||||||
|
4. Calculate required n
|
||||||
|
|
||||||
|
**Python implementation**:
|
||||||
|
```python
|
||||||
|
from statsmodels.stats.power import (
|
||||||
|
tt_ind_solve_power,
|
||||||
|
zt_ind_solve_power,
|
||||||
|
FTestAnovaPower,
|
||||||
|
NormalIndPower
|
||||||
|
)
|
||||||
|
|
||||||
|
# T-test power analysis
|
||||||
|
n_required = tt_ind_solve_power(
|
||||||
|
effect_size=0.5, # Cohen's d
|
||||||
|
alpha=0.05,
|
||||||
|
power=0.80,
|
||||||
|
ratio=1.0, # Equal group sizes
|
||||||
|
alternative='two-sided'
|
||||||
|
)
|
||||||
|
|
||||||
|
# ANOVA power analysis
|
||||||
|
anova_power = FTestAnovaPower()
|
||||||
|
n_per_group = anova_power.solve_power(
|
||||||
|
effect_size=0.25, # Cohen's f
|
||||||
|
ngroups=3,
|
||||||
|
alpha=0.05,
|
||||||
|
power=0.80
|
||||||
|
)
|
||||||
|
|
||||||
|
# Correlation power analysis
|
||||||
|
from pingouin import power_corr
|
||||||
n_required = power_corr(r=0.30, power=0.80, alpha=0.05)
```

---

### Post Hoc Power Analysis (After Study)

**⚠️ CAUTION**: Post hoc power is controversial and often not recommended

**Why it's problematic**:
- Observed power is a direct function of the p-value
- If p > 0.05, observed power is always low
- Provides no additional information beyond the p-value
- Can be misleading

**When it might be acceptable**:
- Study planning for future research
- Using an effect size from multiple studies (not just your own)
- The explicit goal is sample size for replication

**Better alternatives**:
- Report confidence intervals for effect sizes
- Conduct a sensitivity analysis
- Report the minimum detectable effect size

---

### Sensitivity Analysis

**Purpose**: Determine the minimum detectable effect size given study parameters

**When to use**: After the study is complete, to understand the study's capability

**Python implementation**:

```python
# What effect size could we detect with n=50 per group?
detectable_effect = tt_ind_solve_power(
    effect_size=None,  # Solve for this
    nobs1=50,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative='two-sided'
)

print(f"With n=50 per group, we could detect d ≥ {detectable_effect:.2f}")
```

---

## Reporting Effect Sizes

### APA Style Guidelines

**T-test example**:
> "Group A (M = 75.2, SD = 8.5) scored significantly higher than Group B (M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77, 95% CI [0.36, 1.18]."

**ANOVA example**:
> "There was a significant main effect of treatment condition on test scores, F(2, 87) = 8.45, p < .001, η²_p = .16. Post hoc comparisons using Tukey's HSD revealed..."

**Correlation example**:
> "There was a moderate positive correlation between study time and exam scores, r(148) = .42, p < .001, 95% CI [.27, .55]."

**Regression example**:
> "The regression model significantly predicted exam scores, F(3, 146) = 45.2, p < .001, R² = .48. Study hours (β = .52, p < .001) and prior GPA (β = .31, p < .001) were significant predictors."

**Bayesian example**:
> "A Bayesian independent samples t-test provided strong evidence for a difference between groups, BF₁₀ = 23.5, indicating the data are 23.5 times more likely under H₁ than H₀."

---

## Effect Size Pitfalls

1. **Don't rely only on benchmarks**: Context matters; small effects can be meaningful
2. **Report confidence intervals**: CIs show the precision of the effect size estimate
3. **Distinguish statistical vs. practical significance**: Large n can make trivial effects "significant"
4. **Consider cost-benefit**: Even small effects may be valuable if the intervention is low-cost
5. **Multiple outcomes**: Effect sizes vary across outcomes; report all of them
6. **Don't cherry-pick**: Report effects for all planned analyses
7. **Publication bias**: Published effects are often overestimated

---

## Quick Reference Table

| Analysis | Effect Size | Small | Medium | Large |
|----------|-------------|-------|--------|-------|
| T-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| ANOVA | η², ω² | 0.01 | 0.06 | 0.14 |
| ANOVA | Cohen's f | 0.10 | 0.25 | 0.40 |
| Correlation | r, ρ | 0.10 | 0.30 | 0.50 |
| Regression | R² | 0.02 | 0.13 | 0.26 |
| Regression | f² | 0.02 | 0.15 | 0.35 |
| Chi-square | Cramér's V | 0.07 | 0.21 | 0.35 |
| Chi-square (2×2) | φ | 0.10 | 0.30 | 0.50 |
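
As a quick illustration of how these benchmarks are typically applied, the following sketch computes Cohen's d from two hypothetical samples (pooled-SD formula) and maps the result onto the table above. The data and the `label_d` helper are illustrative assumptions, not part of the guide itself.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    x, y = np.asarray(x), np.asarray(y)
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

def label_d(d):
    """Map |d| onto the conventional small/medium/large benchmarks above."""
    d = abs(d)
    if d < 0.20:
        return "negligible"
    elif d < 0.50:
        return "small"
    elif d < 0.80:
        return "medium"
    return "large"

rng = np.random.default_rng(0)
a = rng.normal(75, 8, 50)   # hypothetical group A scores
b = rng.normal(68, 9, 50)   # hypothetical group B scores
d = cohens_d(a, b)
print(f"d = {d:.2f} ({label_d(d)})")
```

Remember that the label is only a convention; interpret d against what is meaningful in your field.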

---

## Resources

- Cohen, J. (1988). *Statistical Power Analysis for the Behavioral Sciences* (2nd ed.)
- Lakens, D. (2013). Calculating and reporting effect sizes
- Ellis, P. D. (2010). *The Essential Guide to Effect Sizes*

@@ -0,0 +1,469 @@

# Statistical Reporting Standards

This document provides guidelines for reporting statistical analyses according to APA (American Psychological Association) style and general best practices for academic publications.

## General Principles

1. **Transparency**: Report enough detail for replication
2. **Completeness**: Include all planned analyses and outcomes
3. **Honesty**: Report non-significant findings and assumption violations
4. **Clarity**: Write for your audience and define technical terms
5. **Reproducibility**: Provide code, data, or supplements when possible

---

## Pre-Registration and Planning

### What to Report (Ideally Before Data Collection)

1. **Hypotheses**: Clearly stated, directional when appropriate
2. **Sample size justification**: Power analysis or other rationale
3. **Data collection stopping rule**: When will you stop collecting data?
4. **Variables**: All variables collected (not just those analyzed)
5. **Exclusion criteria**: Rules for excluding participants/data points
6. **Statistical analyses**: Planned tests, including:
   - Primary analysis
   - Secondary analyses
   - Exploratory analyses (labeled as such)
   - Handling of missing data
   - Multiple comparison corrections
   - Assumption checks

**Why pre-register?**
- Prevents HARKing (Hypothesizing After Results are Known)
- Distinguishes confirmatory from exploratory analyses
- Increases credibility and reproducibility

**Platforms**: OSF, AsPredicted, ClinicalTrials.gov

---

## Methods Section

### Participants

**What to report**:
- Total N, including excluded participants
- Relevant demographics (age, gender, etc.)
- Recruitment method
- Inclusion/exclusion criteria
- Attrition/dropout with reasons

**Example**:
> "Participants were 150 undergraduate students (98 female, 52 male; M_age = 19.4 years, SD = 1.2, range 18-24) recruited from psychology courses in exchange for course credit. Five participants were excluded due to incomplete data (n = 3) or failing attention checks (n = 2), resulting in a final sample of 145."

### Design

**What to report**:
- Study design (between-subjects, within-subjects, mixed)
- Independent variables and levels
- Dependent variables
- Control variables/covariates
- Randomization procedure
- Blinding (single-blind, double-blind)

**Example**:
> "A 2 (feedback: positive vs. negative) × 2 (timing: immediate vs. delayed) between-subjects factorial design was used. Participants were randomly assigned to conditions using a computer-generated randomization sequence. The primary outcome was task performance measured as number of correct responses (0-20 scale)."

### Measures

**What to report**:
- Full name of measure/instrument
- Number of items
- Scale/response format
- Scoring method
- Reliability (Cronbach's α, ICC, etc.)
- Validity evidence (if applicable)

**Example**:
> "Depression was assessed using the Beck Depression Inventory-II (BDI-II; Beck et al., 1996), a 21-item self-report measure rated on a 4-point scale (0-3). Total scores range from 0 to 63, with higher scores indicating greater depression severity. The BDI-II demonstrated excellent internal consistency in this sample (α = .91)."

### Procedure

**What to report**:
- Step-by-step description of what participants did
- Timing and duration
- Instructions given
- Any manipulations or interventions

**Example**:
> "Participants completed the study online via Qualtrics. After providing informed consent, they completed demographic questions, were randomly assigned to one of four conditions, completed the experimental task (approximately 15 minutes), and finished with the outcome measures and debriefing. The entire session lasted approximately 30 minutes."

### Data Analysis

**What to report**:
- Software used (with version)
- Significance level (α)
- Tail(s) of tests (one-tailed or two-tailed)
- Assumption checks conducted
- Missing data handling
- Outlier treatment
- Multiple comparison corrections
- Effect size measures used

**Example**:
> "All analyses were conducted using Python 3.10 with scipy 1.11 and statsmodels 0.14. An alpha level of .05 was used for all significance tests. Assumptions of normality and homogeneity of variance were assessed using Shapiro-Wilk and Levene's tests, respectively. Missing data (< 2% for all variables) were handled using listwise deletion. Outliers beyond 3 SD from the mean were winsorized. For the primary ANOVA, partial eta-squared (η²_p) is reported as the effect size measure. Post hoc comparisons used Tukey's HSD to control the family-wise error rate."

---

## Results Section

### Descriptive Statistics

**What to report**:
- Sample size (for each group if applicable)
- Measures of central tendency (M, Mdn)
- Measures of variability (SD, IQR, range)
- Confidence intervals (when appropriate)

**Example (continuous outcome)**:
> "Group A (n = 48) had a mean score of 75.2 (SD = 8.5, 95% CI [72.7, 77.7]), while Group B (n = 52) scored 68.3 (SD = 9.2, 95% CI [65.7, 70.9])."

**Example (categorical outcome)**:
> "Of the 145 participants, 89 (61.4%) chose Option A, 42 (29.0%) chose Option B, and 14 (9.7%) chose Option C."

**Tables for descriptive statistics**:
- Use tables for multiple variables or groups
- Include M, SD, and n (minimum)
- Can include range, skewness, kurtosis if relevant

---

### Assumption Checks

**What to report**:
- Which assumptions were tested
- Results of diagnostic tests
- Whether assumptions were met
- Actions taken if violated

**Example**:
> "Normality was assessed using Shapiro-Wilk tests. Data for Group A (W = 0.97, p = .18) and Group B (W = 0.96, p = .12) did not significantly deviate from normality. Levene's test indicated homogeneity of variance, F(1, 98) = 1.23, p = .27. Therefore, assumptions for the independent samples t-test were satisfied."

**Example (violated)**:
> "Shapiro-Wilk tests indicated significant departure from normality for Group C (W = 0.89, p = .003). Therefore, the non-parametric Mann-Whitney U test was used instead of the independent samples t-test."

---

### Inferential Statistics

#### T-Tests

**What to report**:
- Test statistic (t)
- Degrees of freedom
- p-value (exact if p > .001, otherwise p < .001)
- Effect size (Cohen's d or Hedges' g) with CI
- Direction of effect
- Whether the test was one- or two-tailed

**Format**: t(df) = value, p = value, d = value, 95% CI [lower, upper]

**Example (independent t-test)**:
> "Group A (M = 75.2, SD = 8.5) scored significantly higher than Group B (M = 68.3, SD = 9.2), t(98) = 3.82, p < .001, d = 0.77, 95% CI [0.36, 1.18], two-tailed."

**Example (paired t-test)**:
> "Scores increased significantly from pretest (M = 65.4, SD = 10.2) to posttest (M = 71.8, SD = 9.7), t(49) = 4.21, p < .001, d = 0.64, 95% CI [0.33, 0.95]."

**Example (Welch's t-test)**:
> "Due to unequal variances, Welch's t-test was used. Group A scored significantly higher than Group B, t(94.3) = 3.65, p < .001, d = 0.74."

**Example (non-significant)**:
> "There was no significant difference between Group A (M = 72.1, SD = 8.3) and Group B (M = 70.5, SD = 8.9), t(98) = 0.91, p = .36, d = 0.18, 95% CI [-0.21, 0.57]."

---

#### ANOVA

**What to report**:
- F statistic
- Degrees of freedom (effect, error)
- p-value
- Effect size (η², η²_p, or ω²)
- Means and SDs for all groups
- Post hoc test results (if significant)

**Format**: F(df_effect, df_error) = value, p = value, η²_p = value

**Example (one-way ANOVA)**:
> "There was a significant main effect of treatment condition on test scores, F(2, 147) = 8.45, p < .001, η²_p = .10. Post hoc comparisons using Tukey's HSD revealed that Condition A (M = 78.2, SD = 7.3) scored significantly higher than Condition B (M = 71.5, SD = 8.1, p = .002, d = 0.87) and Condition C (M = 70.1, SD = 7.9, p < .001, d = 1.07). Conditions B and C did not differ significantly (p = .52, d = 0.18)."

**Example (factorial ANOVA)**:
> "A 2 (feedback: positive vs. negative) × 2 (timing: immediate vs. delayed) between-subjects ANOVA revealed a significant main effect of feedback, F(1, 146) = 12.34, p < .001, η²_p = .08, but no significant main effect of timing, F(1, 146) = 2.10, p = .15, η²_p = .01. Critically, the interaction was significant, F(1, 146) = 6.78, p = .01, η²_p = .04. Simple effects analysis showed that positive feedback improved performance for immediate timing (M_diff = 8.2, p < .001) but not for delayed timing (M_diff = 1.3, p = .42)."

**Example (repeated measures ANOVA)**:
> "A one-way repeated measures ANOVA revealed a significant effect of time point on anxiety scores, F(2, 98) = 15.67, p < .001, η²_p = .24. Mauchly's test indicated that the assumption of sphericity was violated, χ²(2) = 8.45, p = .01, therefore Greenhouse-Geisser corrected values are reported (ε = 0.87). Pairwise comparisons with Bonferroni correction showed..."

---

#### Correlation

**What to report**:
- Correlation coefficient (r or ρ)
- Sample size
- p-value
- Direction and strength
- Confidence interval
- Coefficient of determination (r²) if relevant

**Format**: r(df) = value, p = value, 95% CI [lower, upper]

**Example (Pearson)**:
> "There was a moderate positive correlation between study time and exam score, r(148) = .42, p < .001, 95% CI [.27, .55], indicating that 18% of the variance in exam scores was shared with study time (r² = .18)."

**Example (Spearman)**:
> "A Spearman rank-order correlation revealed a significant positive association between class rank and motivation, ρ(118) = .38, p < .001, 95% CI [.21, .52]."

**Example (non-significant)**:
> "There was no significant correlation between age and reaction time, r(98) = -.12, p = .23, 95% CI [-.31, .08]."

---

#### Regression

**What to report**:
- Overall model fit (R², adjusted R², F-test)
- Coefficients (B, SE, β, t, p) for each predictor
- Effect sizes
- Confidence intervals for coefficients
- Variance inflation factors (if multicollinearity assessed)

**Format**: B = value, SE = value, β = value, t = value, p = value, 95% CI [lower, upper]

**Example (simple regression)**:
> "Simple linear regression showed that study hours significantly predicted exam scores, F(1, 148) = 42.5, p < .001, R² = .22. Specifically, each additional hour of study was associated with a 2.4-point increase in exam score (B = 2.40, SE = 0.37, β = .47, t = 6.52, p < .001, 95% CI [1.67, 3.13])."

**Example (multiple regression)**:
> "Multiple linear regression was conducted to predict exam scores from study hours, prior GPA, and attendance. The overall model was significant, F(3, 146) = 45.2, p < .001, R² = .48, adjusted R² = .47. Study hours (B = 1.80, SE = 0.31, β = .35, t = 5.78, p < .001, 95% CI [1.18, 2.42]) and prior GPA (B = 8.52, SE = 1.95, β = .28, t = 4.37, p < .001, 95% CI [4.66, 12.38]) were significant predictors, but attendance was not (B = 0.15, SE = 0.12, β = .08, t = 1.25, p = .21, 95% CI [-0.09, 0.39]). Multicollinearity was not a concern, as all VIF values were below 1.5."

**Example (logistic regression)**:
> "Logistic regression was conducted to predict pass/fail status from study hours. The overall model was significant, χ²(1) = 28.7, p < .001, Nagelkerke R² = .31. Each additional study hour increased the odds of passing by 1.35 times (OR = 1.35, 95% CI [1.18, 1.54], p < .001). The model correctly classified 76% of cases (sensitivity = 81%, specificity = 68%)."

---

#### Chi-Square Tests

**What to report**:
- χ² statistic
- Degrees of freedom
- p-value
- Effect size (Cramér's V or φ)
- Observed and expected frequencies (or percentages)

**Format**: χ²(df, N = total) = value, p = value, Cramér's V = value

**Example (2×2)**:
> "A chi-square test of independence revealed a significant association between treatment group and outcome, χ²(1, N = 150) = 8.45, p = .004, φ = .24. Specifically, 72% of participants in the treatment group improved compared to 48% in the control group."

**Example (larger table)**:
> "A chi-square test examined the relationship between education level (high school, bachelor's, graduate) and political affiliation (liberal, moderate, conservative). The association was significant, χ²(4, N = 300) = 18.7, p = .001, Cramér's V = .18, indicating a small to moderate association."

**Example (Fisher's exact)**:
> "Due to expected cell counts below 5, Fisher's exact test was used. The association between treatment and outcome was significant, p = .018 (two-tailed), OR = 3.42, 95% CI [1.21, 9.64]."

---

#### Non-Parametric Tests

**Mann-Whitney U**:
> "A Mann-Whitney U test indicated that Group A (Mdn = 75, IQR = 10) had significantly higher scores than Group B (Mdn = 68, IQR = 12), U = 845, z = 3.21, p = .001, r = .32."

**Wilcoxon signed-rank**:
> "A Wilcoxon signed-rank test showed that scores increased significantly from pretest (Mdn = 65, IQR = 15) to posttest (Mdn = 72, IQR = 14), z = 3.89, p < .001, r = .39."

**Kruskal-Wallis**:
> "A Kruskal-Wallis test revealed significant differences among the three conditions, H(2) = 15.7, p < .001, η² = .09. Follow-up pairwise comparisons with Bonferroni correction showed..."

---

#### Bayesian Statistics

**What to report**:
- Prior distributions used
- Posterior estimates (mean/median, credible intervals)
- Bayes Factor (if hypothesis testing)
- Convergence diagnostics (R-hat, ESS)
- Posterior predictive checks

**Example (Bayesian t-test)**:
> "A Bayesian independent samples t-test was conducted using weakly informative priors (Normal(0, 1) for the mean difference). The posterior distribution of the mean difference had a mean of 6.8 (95% credible interval [3.2, 10.4]), indicating that Group A scored higher than Group B. The Bayes Factor BF₁₀ = 45.3 provided very strong evidence for a difference between groups. There was a 99.8% posterior probability that Group A's mean exceeded Group B's mean."

**Example (Bayesian regression)**:
> "A Bayesian linear regression was fitted with weakly informative priors (Normal(0, 10) for coefficients, Half-Cauchy(0, 5) for the residual SD). The model showed that study hours credibly predicted exam scores (β = 0.52, 95% CI [0.38, 0.66]; 0 not included in the interval). All convergence diagnostics were satisfactory (R-hat < 1.01, ESS > 1000 for all parameters). Posterior predictive checks indicated adequate model fit."

---

## Effect Sizes

### Always Report

**Why**:
- p-values don't indicate magnitude
- Required by APA and most journals
- Essential for meta-analysis
- Informs practical significance

**Which effect size?**
- T-tests: Cohen's d or Hedges' g
- ANOVA: η², η²_p, or ω²
- Correlation: r (already is an effect size)
- Regression: β (standardized), R², f²
- Chi-square: Cramér's V or φ

**With confidence intervals**:
- Always report CIs for effect sizes when possible
- Shows the precision of the estimate
- More informative than a point estimate alone

---

## Figures and Tables

### When to Use Tables vs. Figures

**Tables**:
- Exact values needed
- Many variables/conditions
- Descriptive statistics
- Regression coefficients
- Correlation matrices

**Figures**:
- Patterns and trends
- Distributions
- Interactions
- Comparisons across groups
- Time series

### Figure Guidelines

**General**:
- Clear, readable labels
- Sufficient font size (≥ 10pt)
- High resolution (≥ 300 dpi for publications)
- Monochrome-friendly and colorblind-accessible palettes
- Error bars (SE or 95% CI; specify which!)
- Legend when needed

**Common figure types**:
- Bar charts: Group comparisons (include error bars)
- Box plots: Distributions, outliers
- Scatter plots: Correlations, relationships
- Line graphs: Change over time, interactions
- Violin plots: Distributions (show more of the distribution's shape than box plots)

**Example figure caption**:
> "Figure 1. Mean exam scores by study condition. Error bars represent 95% confidence intervals. * p < .05, ** p < .01, *** p < .001."

### Table Guidelines

**General**:
- Clear column and row labels
- Consistent decimal places (usually 2)
- Horizontal lines only (not vertical)
- Notes below the table for clarifications
- Statistical symbols in italics (p, M, SD, F, t, r)

**Example table**:

**Table 1**
*Descriptive Statistics and Intercorrelations*

| Variable | M | SD | 1 | 2 | 3 |
|----------|---|----|----|----|----|
| 1. Study hours | 5.2 | 2.1 | — | | |
| 2. Prior GPA | 3.1 | 0.5 | .42** | — | |
| 3. Exam score | 75.3 | 10.2 | .47*** | .52*** | — |

*Note*. N = 150. ** p < .01. *** p < .001.

---

## Common Mistakes to Avoid

1. **Reporting p = .000**: Report p < .001 instead
2. **Omitting effect sizes**: Always include them
3. **Not reporting assumption checks**: Describe the tests and their outcomes
4. **Confusing statistical and practical significance**: Discuss both
5. **Only reporting significant results**: Report all planned analyses
6. **Using "prove" or "confirm"**: Use "support" or "consistent with"
7. **Saying "marginally significant" for .05 < p < .10**: A result is either significant at the stated alpha or it is not
8. **Reporting only one decimal for p-values**: Use two or three (p = .03, not p = .0)
9. **Not specifying one- vs. two-tailed**: Always clarify
10. **Inconsistent rounding**: Be consistent throughout

---

## Null Results

### How to Report Non-Significant Findings

**Don't say**:
- "There was no effect"
- "X and Y are unrelated"
- "Groups are equivalent"

**Do say**:
- "There was no significant difference"
- "The effect was not statistically significant"
- "We did not find evidence for a relationship"

**Include**:
- Exact p-value (not just "ns" or "p > .05")
- Effect size (shows magnitude even if not significant)
- Confidence interval (may include meaningful values)
- Power analysis (was the study adequately powered?)

**Example**:
> "Contrary to our hypothesis, there was no significant difference in creativity scores between the music (M = 72.1, SD = 8.3) and silence (M = 70.5, SD = 8.9) conditions, t(98) = 0.91, p = .36, d = 0.18, 95% CI [-0.21, 0.57]. A post hoc sensitivity analysis revealed that the study had 80% power to detect an effect of d = 0.57 or larger, suggesting the null finding may reflect insufficient power to detect small effects."

---

## Reproducibility

### Materials to Share

1. **Data**: De-identified raw data (or aggregate if sensitive)
2. **Code**: Analysis scripts
3. **Materials**: Stimuli, measures, protocols
4. **Supplements**: Additional analyses, tables

**Where to share**:
- Open Science Framework (OSF)
- GitHub (for code)
- Journal supplements
- Institutional repository

**In paper**:
> "Data, analysis code, and materials are available at https://osf.io/xxxxx/"

---

## Checklist for Statistical Reporting

- [ ] Sample size and demographics
- [ ] Study design clearly described
- [ ] All measures described with reliability
- [ ] Procedure detailed
- [ ] Software and versions specified
- [ ] Alpha level stated
- [ ] Assumption checks reported
- [ ] Descriptive statistics (M, SD, n)
- [ ] Test statistics with df and p-values
- [ ] Effect sizes with confidence intervals
- [ ] All planned analyses reported (including non-significant)
- [ ] Figures/tables properly formatted and labeled
- [ ] Multiple comparisons corrections described
- [ ] Missing data handling explained
- [ ] Limitations discussed
- [ ] Data/code availability statement

---

## Additional Resources

- APA Publication Manual (7th edition)
- CONSORT guidelines (for RCTs)
- STROBE guidelines (for observational studies)
- PRISMA guidelines (for systematic reviews/meta-analyses)
- Wilkinson & Task Force on Statistical Inference (1999). Statistical methods in psychology journals.

@@ -0,0 +1,129 @@

# Statistical Test Selection Guide

This guide provides a decision tree for selecting appropriate statistical tests based on research questions, data types, and study designs.

## Decision Tree for Test Selection

### 1. Comparing Groups

#### Two Independent Groups
- **Continuous outcome, normally distributed**: Independent samples t-test
- **Continuous outcome, non-normal**: Mann-Whitney U test (Wilcoxon rank-sum test)
- **Binary outcome**: Chi-square test or Fisher's exact test (if expected counts < 5)
- **Ordinal outcome**: Mann-Whitney U test

#### Two Paired/Dependent Groups
- **Continuous outcome, normally distributed**: Paired t-test
- **Continuous outcome, non-normal**: Wilcoxon signed-rank test
- **Binary outcome**: McNemar's test
- **Ordinal outcome**: Wilcoxon signed-rank test

#### Three or More Independent Groups
- **Continuous outcome, normally distributed, equal variances**: One-way ANOVA
- **Continuous outcome, normally distributed, unequal variances**: Welch's ANOVA
- **Continuous outcome, non-normal**: Kruskal-Wallis H test
- **Binary/categorical outcome**: Chi-square test
- **Ordinal outcome**: Kruskal-Wallis H test

#### Three or More Paired/Dependent Groups
- **Continuous outcome, normally distributed**: Repeated measures ANOVA
- **Continuous outcome, non-normal**: Friedman test
- **Binary outcome**: Cochran's Q test

#### Multiple Factors (Factorial Designs)
- **Continuous outcome**: Two-way ANOVA (or higher-way ANOVA)
- **With covariates**: ANCOVA
- **Mixed within and between factors**: Mixed ANOVA

### 2. Relationships Between Variables

#### Two Continuous Variables
- **Linear relationship, bivariate normal**: Pearson correlation
- **Monotonic relationship or non-normal**: Spearman rank correlation
- **Rank-based data**: Spearman or Kendall's tau

#### One Continuous Outcome, One or More Predictors
- **Single continuous predictor**: Simple linear regression
- **Multiple continuous/categorical predictors**: Multiple linear regression
- **Categorical predictors**: ANOVA/ANCOVA framework
- **Non-linear relationships**: Polynomial regression or generalized additive models (GAM)

#### Binary Outcome
- **Single predictor**: Logistic regression
- **Multiple predictors**: Multiple logistic regression
- **Rare events**: Exact logistic regression or Firth's method

#### Count Outcome
- **Poisson-distributed**: Poisson regression
- **Overdispersed counts**: Negative binomial regression
- **Zero-inflated**: Zero-inflated Poisson/negative binomial

#### Time-to-Event Outcome
- **Comparing survival curves**: Log-rank test
- **Modeling with covariates**: Cox proportional hazards regression
- **Parametric survival models**: Weibull, exponential, log-normal

### 3. Agreement and Reliability

#### Inter-Rater Reliability
- **Categorical ratings, 2 raters**: Cohen's kappa
- **Categorical ratings, >2 raters**: Fleiss' kappa or Krippendorff's alpha
- **Continuous ratings**: Intraclass correlation coefficient (ICC)

#### Test-Retest Reliability
- **Continuous measurements**: ICC or Pearson correlation
- **Internal consistency**: Cronbach's alpha

#### Agreement Between Methods
- **Continuous measurements**: Bland-Altman analysis
- **Categorical classifications**: Cohen's kappa

### 4. Categorical Data Analysis

#### Contingency Tables
- **2×2 table**: Chi-square test or Fisher's exact test
- **Larger than 2×2**: Chi-square test
- **Ordered categories**: Cochran-Armitage trend test
- **Paired categories**: McNemar's test (2×2) or McNemar-Bowker test (larger)

### 5. Bayesian Alternatives

Any of the above tests can be performed using Bayesian methods:
- **Group comparisons**: Bayesian t-test, Bayesian ANOVA
- **Correlations**: Bayesian correlation
- **Regression**: Bayesian linear/logistic regression

**Advantages of Bayesian approaches:**
- Provides probability of hypotheses given data
- Naturally incorporates prior information
- Provides credible intervals instead of confidence intervals
- No p-value interpretation issues

## Key Considerations

### Sample Size
- Small samples (n < 30): Consider non-parametric tests or exact methods
- Very large samples: Even small effects may be statistically significant; focus on effect sizes

### Multiple Comparisons
- When conducting multiple tests, adjust for multiple comparisons using one of the following (a short sketch follows this list):
  - Bonferroni correction (conservative)
  - Holm-Bonferroni (less conservative)
  - False Discovery Rate (FDR) control (Benjamini-Hochberg)
  - Tukey HSD for post-hoc ANOVA comparisons
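
The sketch referenced above applies the first three corrections to a set of hypothetical p-values with statsmodels:

```python
from statsmodels.stats.multitest import multipletests

pvals = [0.003, 0.012, 0.031, 0.048, 0.210]   # hypothetical raw p-values

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, [f"{p:.3f}" for p in p_adj], reject.tolist())
```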

### Missing Data
- Complete case analysis (listwise deletion)
- Multiple imputation
- Maximum likelihood methods
- Ensure the missing data mechanism is understood (MCAR, MAR, MNAR)

### Effect Sizes
- Always report effect sizes alongside p-values
- See `effect_sizes_and_power.md` for guidance

### Study Design Considerations
- Randomized controlled trials: Standard parametric/non-parametric tests
- Observational studies: Consider confounding and use regression/matching
- Clustered/nested data: Use mixed-effects models or GEE
- Time series: Use time series methods (ARIMA, etc.)

@@ -0,0 +1,539 @@

"""
Comprehensive statistical assumption checking utilities.

This module provides functions to check common statistical assumptions:
- Normality
- Homogeneity of variance
- Independence
- Linearity
- Outliers
"""

import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from typing import Dict, List, Optional, Union


def check_normality(
    data: Union[np.ndarray, pd.Series, List],
    name: str = "data",
    alpha: float = 0.05,
    plot: bool = True
) -> Dict:
    """
    Check normality assumption using Shapiro-Wilk test and visualizations.

    Parameters
    ----------
    data : array-like
        Data to check for normality
    name : str
        Name of the variable (for labeling)
    alpha : float
        Significance level for Shapiro-Wilk test
    plot : bool
        Whether to create Q-Q plot and histogram

    Returns
    -------
    dict
        Results including test statistic, p-value, and interpretation
    """
    data = np.asarray(data)
    data_clean = data[~np.isnan(data)]

    # Shapiro-Wilk test
    statistic, p_value = stats.shapiro(data_clean)

    # Interpretation
    is_normal = p_value > alpha
    interpretation = (
        f"Data {'appear' if is_normal else 'do not appear'} normally distributed "
        f"(W = {statistic:.3f}, p = {p_value:.3f})"
    )

    # Visual checks
    if plot:
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

        # Q-Q plot
        stats.probplot(data_clean, dist="norm", plot=ax1)
        ax1.set_title(f"Q-Q Plot: {name}")
        ax1.grid(alpha=0.3)

        # Histogram with normal curve
        ax2.hist(data_clean, bins='auto', density=True, alpha=0.7, color='steelblue', edgecolor='black')
        mu, sigma = data_clean.mean(), data_clean.std()
        x = np.linspace(data_clean.min(), data_clean.max(), 100)
        ax2.plot(x, stats.norm.pdf(x, mu, sigma), 'r-', linewidth=2, label='Normal curve')
        ax2.set_xlabel('Value')
        ax2.set_ylabel('Density')
        ax2.set_title(f'Histogram: {name}')
        ax2.legend()
        ax2.grid(alpha=0.3)

        plt.tight_layout()
        plt.show()

    return {
        'test': 'Shapiro-Wilk',
        'statistic': statistic,
        'p_value': p_value,
        'is_normal': is_normal,
        'interpretation': interpretation,
        'n': len(data_clean),
        'recommendation': (
            "Proceed with parametric test" if is_normal
            else "Consider non-parametric alternative or transformation"
        )
    }


def check_normality_per_group(
    data: pd.DataFrame,
    value_col: str,
    group_col: str,
    alpha: float = 0.05,
    plot: bool = True
) -> pd.DataFrame:
    """
    Check normality assumption for each group separately.

    Parameters
    ----------
    data : pd.DataFrame
        Data containing values and group labels
    value_col : str
        Column name for values to check
    group_col : str
        Column name for group labels
    alpha : float
        Significance level
    plot : bool
        Whether to create Q-Q plots for each group

    Returns
    -------
    pd.DataFrame
        Results for each group
    """
    groups = data[group_col].unique()
    results = []

    if plot:
        n_groups = len(groups)
        fig, axes = plt.subplots(1, n_groups, figsize=(5 * n_groups, 4))
        if n_groups == 1:
            axes = [axes]

    for idx, group in enumerate(groups):
        group_data = data[data[group_col] == group][value_col].dropna()
        stat, p = stats.shapiro(group_data)

        results.append({
            'Group': group,
            'N': len(group_data),
            'W': stat,
            'p-value': p,
            'Normal': 'Yes' if p > alpha else 'No'
        })

        if plot:
            stats.probplot(group_data, dist="norm", plot=axes[idx])
            axes[idx].set_title(f"Q-Q Plot: {group}")
            axes[idx].grid(alpha=0.3)

    if plot:
        plt.tight_layout()
        plt.show()

    return pd.DataFrame(results)


def check_homogeneity_of_variance(
    data: pd.DataFrame,
    value_col: str,
    group_col: str,
    alpha: float = 0.05,
    plot: bool = True
) -> Dict:
    """
    Check homogeneity of variance using Levene's test.

    Parameters
    ----------
    data : pd.DataFrame
        Data containing values and group labels
    value_col : str
        Column name for values
    group_col : str
        Column name for group labels
    alpha : float
        Significance level
    plot : bool
        Whether to create box plots

    Returns
    -------
    dict
        Results including test statistic, p-value, and interpretation
    """
    # Keep group names and values in the same (groupby) order so labels match variances below
    grouped = list(data.groupby(group_col))
    group_names = [name for name, _ in grouped]
    groups = [group[value_col].dropna().values for _, group in grouped]

    # Levene's test (robust to non-normality)
    statistic, p_value = stats.levene(*groups)

    # Variance ratio (max/min)
    variances = [np.var(g, ddof=1) for g in groups]
    var_ratio = max(variances) / min(variances)

    is_homogeneous = p_value > alpha
    interpretation = (
        f"Variances {'appear' if is_homogeneous else 'do not appear'} homogeneous "
        f"(F = {statistic:.3f}, p = {p_value:.3f}, variance ratio = {var_ratio:.2f})"
    )

    if plot:
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

        # Box plot
        data.boxplot(column=value_col, by=group_col, ax=ax1)
        ax1.set_title('Box Plots by Group')
        ax1.set_xlabel(group_col)
        ax1.set_ylabel(value_col)
        plt.sca(ax1)
        plt.xticks(rotation=45)

        # Variance plot
        ax2.bar(range(len(variances)), variances, color='steelblue', edgecolor='black')
        ax2.set_xticks(range(len(variances)))
        ax2.set_xticklabels(group_names, rotation=45)
        ax2.set_ylabel('Variance')
        ax2.set_title('Variance by Group')
        ax2.grid(alpha=0.3, axis='y')

        plt.tight_layout()
        plt.show()

    return {
        'test': 'Levene',
        'statistic': statistic,
        'p_value': p_value,
        'is_homogeneous': is_homogeneous,
        'variance_ratio': var_ratio,
        'interpretation': interpretation,
        'recommendation': (
            "Proceed with standard test" if is_homogeneous
            else "Consider Welch's correction or transformation"
        )
    }


def check_linearity(
    x: Union[np.ndarray, pd.Series],
    y: Union[np.ndarray, pd.Series],
    x_name: str = "X",
    y_name: str = "Y"
) -> Dict:
    """
    Check linearity assumption for regression.

    Parameters
    ----------
    x : array-like
        Predictor variable
    y : array-like
        Outcome variable
    x_name : str
        Name of predictor
    y_name : str
        Name of outcome

    Returns
    -------
    dict
        Visualization and recommendations
    """
    x = np.asarray(x)
    y = np.asarray(y)

    # Fit linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    y_pred = intercept + slope * x

    # Calculate residuals
    residuals = y - y_pred

    # Visualization
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Scatter plot with regression line
    ax1.scatter(x, y, alpha=0.6, s=50, edgecolors='black', linewidths=0.5)
    ax1.plot(x, y_pred, 'r-', linewidth=2, label=f'y = {intercept:.2f} + {slope:.2f}x')
    ax1.set_xlabel(x_name)
    ax1.set_ylabel(y_name)
    ax1.set_title('Scatter Plot with Regression Line')
    ax1.legend()
    ax1.grid(alpha=0.3)

    # Residuals vs fitted
    ax2.scatter(y_pred, residuals, alpha=0.6, s=50, edgecolors='black', linewidths=0.5)
    ax2.axhline(y=0, color='r', linestyle='--', linewidth=2)
    ax2.set_xlabel('Fitted values')
    ax2.set_ylabel('Residuals')
    ax2.set_title('Residuals vs Fitted Values')
    ax2.grid(alpha=0.3)

    plt.tight_layout()
    plt.show()

    return {
        'r': r_value,
        'r_squared': r_value ** 2,
        'interpretation': (
            "Examine residual plot. Points should be randomly scattered around zero. "
            "Patterns (curves, funnels) suggest non-linearity or heteroscedasticity."
        ),
        'recommendation': (
            "If non-linear pattern detected: Consider polynomial terms, "
            "transformations, or non-linear models"
        )
    }


def detect_outliers(
    data: Union[np.ndarray, pd.Series, List],
    name: str = "data",
    method: str = "iqr",
    threshold: float = 1.5,
    plot: bool = True
) -> Dict:
    """
    Detect outliers using IQR method or z-score method.

    Parameters
    ----------
    data : array-like
        Data to check for outliers
    name : str
        Name of variable
    method : str
        Method to use: 'iqr' or 'zscore'
    threshold : float
        Threshold for outlier detection
        For IQR: typically 1.5 (mild) or 3 (extreme)
        For z-score: typically 3
    plot : bool
        Whether to create visualizations

    Returns
    -------
    dict
        Outlier indices, values, and visualizations
    """
    data = np.asarray(data)
    data_clean = data[~np.isnan(data)]

    if method == "iqr":
        q1 = np.percentile(data_clean, 25)
        q3 = np.percentile(data_clean, 75)
        iqr = q3 - q1
        lower_bound = q1 - threshold * iqr
        upper_bound = q3 + threshold * iqr
        outlier_mask = (data_clean < lower_bound) | (data_clean > upper_bound)

    elif method == "zscore":
        z_scores = np.abs(stats.zscore(data_clean))
        outlier_mask = z_scores > threshold
        lower_bound = data_clean.mean() - threshold * data_clean.std()
        upper_bound = data_clean.mean() + threshold * data_clean.std()

    else:
        raise ValueError("method must be 'iqr' or 'zscore'")

    outlier_indices = np.where(outlier_mask)[0]
    outlier_values = data_clean[outlier_mask]
    n_outliers = len(outlier_indices)
    pct_outliers = (n_outliers / len(data_clean)) * 100

    if plot:
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

        # Box plot
        bp = ax1.boxplot(data_clean, vert=True, patch_artist=True)
        bp['boxes'][0].set_facecolor('steelblue')
        ax1.set_ylabel('Value')
        ax1.set_title(f'Box Plot: {name}')
        ax1.grid(alpha=0.3, axis='y')

        # Scatter plot highlighting outliers
        x_coords = np.arange(len(data_clean))
        ax2.scatter(x_coords[~outlier_mask], data_clean[~outlier_mask],
                    alpha=0.6, s=50, color='steelblue', label='Normal', edgecolors='black', linewidths=0.5)
        if n_outliers > 0:
            ax2.scatter(x_coords[outlier_mask], data_clean[outlier_mask],
                        alpha=0.8, s=100, color='red', label='Outliers', marker='D', edgecolors='black', linewidths=0.5)
        ax2.axhline(y=lower_bound, color='orange', linestyle='--', linewidth=1.5, label='Bounds')
        ax2.axhline(y=upper_bound, color='orange', linestyle='--', linewidth=1.5)
        ax2.set_xlabel('Index')
        ax2.set_ylabel('Value')
        ax2.set_title(f'Outlier Detection: {name}')
        ax2.legend()
        ax2.grid(alpha=0.3)

        plt.tight_layout()
        plt.show()

    return {
        'method': method,
        'threshold': threshold,
        'n_outliers': n_outliers,
        'pct_outliers': pct_outliers,
        'outlier_indices': outlier_indices,
        'outlier_values': outlier_values,
        'lower_bound': lower_bound,
        'upper_bound': upper_bound,
        'interpretation': f"Found {n_outliers} outliers ({pct_outliers:.1f}% of data)",
        'recommendation': (
            "Investigate outliers for data entry errors. "
            "Consider: (1) removing if errors, (2) winsorizing, "
            "(3) keeping if legitimate, (4) using robust methods"
        )
    }


def comprehensive_assumption_check(
    data: pd.DataFrame,
    value_col: str,
    group_col: Optional[str] = None,
    alpha: float = 0.05
) -> Dict:
    """
    Perform comprehensive assumption checking for common statistical tests.

    Parameters
    ----------
    data : pd.DataFrame
        Data to check
    value_col : str
        Column name for dependent variable
    group_col : str, optional
        Column name for grouping variable (if applicable)
    alpha : float
        Significance level

    Returns
    -------
    dict
        Summary of all assumption checks
    """
    print("=" * 70)
    print("COMPREHENSIVE ASSUMPTION CHECK")
    print("=" * 70)

    results = {}

    # Outlier detection
    print("\n1. OUTLIER DETECTION")
    print("-" * 70)
    outlier_results = detect_outliers(
        data[value_col].dropna(),
        name=value_col,
        method='iqr',
        plot=True
    )
    results['outliers'] = outlier_results
    print(f" {outlier_results['interpretation']}")
    print(f" {outlier_results['recommendation']}")

    # Check if grouped data
    if group_col is not None:
        # Normality per group
        print(f"\n2. NORMALITY CHECK (by {group_col})")
        print("-" * 70)
        normality_results = check_normality_per_group(
            data, value_col, group_col, alpha=alpha, plot=True
        )
        results['normality_per_group'] = normality_results
        print(normality_results.to_string(index=False))

        all_normal = normality_results['Normal'].eq('Yes').all()
        print(f"\n All groups normal: {'Yes' if all_normal else 'No'}")
        if not all_normal:
            print(" → Consider non-parametric alternative (Mann-Whitney, Kruskal-Wallis)")

        # Homogeneity of variance
        print("\n3. HOMOGENEITY OF VARIANCE")
        print("-" * 70)
        homogeneity_results = check_homogeneity_of_variance(
            data, value_col, group_col, alpha=alpha, plot=True
        )
        results['homogeneity'] = homogeneity_results
        print(f" {homogeneity_results['interpretation']}")
        print(f" {homogeneity_results['recommendation']}")

    else:
        # Overall normality
        print("\n2. NORMALITY CHECK")
        print("-" * 70)
        normality_results = check_normality(
            data[value_col].dropna(),
            name=value_col,
            alpha=alpha,
            plot=True
        )
        results['normality'] = normality_results
        print(f" {normality_results['interpretation']}")
        print(f" {normality_results['recommendation']}")

    # Summary
    print("\n" + "=" * 70)
    print("SUMMARY")
    print("=" * 70)

    if group_col is not None:
        all_normal = results.get('normality_per_group', pd.DataFrame()).get('Normal', pd.Series()).eq('Yes').all()
        is_homogeneous = results.get('homogeneity', {}).get('is_homogeneous', False)

        if all_normal and is_homogeneous:
            print("✓ All assumptions met. Proceed with parametric test (t-test, ANOVA).")
        elif not all_normal:
            print("✗ Normality violated. Use non-parametric alternative.")
        elif not is_homogeneous:
            print("✗ Homogeneity violated. Use Welch's correction or transformation.")
    else:
        is_normal = results.get('normality', {}).get('is_normal', False)
        if is_normal:
            print("✓ Normality assumption met.")
        else:
            print("✗ Normality violated. Consider transformation or non-parametric method.")

    print("=" * 70)

    return results


if __name__ == "__main__":
    # Example usage
    np.random.seed(42)

    # Simulate data
    group_a = np.random.normal(75, 8, 50)
    group_b = np.random.normal(68, 10, 50)

    df = pd.DataFrame({
        'score': np.concatenate([group_a, group_b]),
        'group': ['A'] * 50 + ['B'] * 50
    })

    # Run comprehensive check
    results = comprehensive_assumption_check(
        df,
        value_col='score',
        group_col='group',
        alpha=0.05
    )