Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation

- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
2026-03-27 07:09:27 +08:00 · 2026-03-03 07:15:36 -05:00
parent b271271df4
commit 7f94783fab
27 changed files with 6961 additions and 0 deletions
--- a/scientific-skills/depmap/references/dependency_analysis.md
+++ b/scientific-skills/depmap/references/dependency_analysis.md
@@ -0,0 +1,178 @@
+# DepMap Dependency Analysis Guide
+
+## Understanding Chronos Scores
+
+Chronos is the current (v5+) algorithm for computing gene dependency scores from CRISPR screen data. It addresses systematic biases including:
+- Copy number effects (amplified genes can appear essential because cutting at many loci triggers a DNA-damage response)
+- Guide RNA efficiency variation
+- Cell line growth rates
+
+### Score Interpretation
+
+| Score Range | Interpretation |
+|------------|----------------|
+| > 0 | Knockout may increase proliferation (small positive values are often noise) |
+| 0 to −0.3 | Non-essential: minimal fitness effect |
+| −0.3 to −0.5 | Mild dependency |
+| −0.5 to −1.0 | Significant dependency |
+| < −1.0 | Strong dependency (common essential range) |
+| ≈ −1.0 | Median of pan-essential genes (e.g., proteasome subunits) |
+
+### Common Essential Genes (Controls)
+
+Genes that are essential in nearly all cell lines (score ~−1 to −2):
+- Ribosomal proteins: RPL..., RPS...
+- Proteasome: PSMA..., PSMB...
+- Spliceosome: SNRPD1, SNRNP70
+- DNA replication: MCM2, PCNA
+- Transcription: POLR2A, TAF...
+
+These can be used as positive controls for screen quality.
+
+### Non-Essential Controls
+
+Genes with negligible fitness effect (score ~ 0):
+- Non-expressed genes (tissue-specific)
+- Safe harbor loci
+
+## Selectivity Assessment
+
+To determine if a dependency is cancer-selective:
+
+```python
+import pandas as pd
+import numpy as np
+
+def compute_selectivity(gene_effect_df, target_gene, cancer_lineage):
+    """Compute selectivity score for a cancer lineage."""
+    scores = gene_effect_df[target_gene].dropna()
+
+    # Get cell line metadata
+    from depmap_utils import load_cell_line_info
+    cell_info = load_cell_line_info()
+    scores_df = scores.reset_index()
+    scores_df.columns = ["DepMap_ID", "score"]
+    scores_df = scores_df.merge(cell_info[["DepMap_ID", "lineage"]], on="DepMap_ID")
+
+    cancer_scores = scores_df[scores_df["lineage"] == cancer_lineage]["score"]
+    other_scores = scores_df[scores_df["lineage"] != cancer_lineage]["score"]
+
+    # Selectivity > 0 means the lineage has a lower (more dependent) mean score than other lineages
+    selectivity = other_scores.mean() - cancer_scores.mean()
+    return {
+        "target_gene": target_gene,
+        "cancer_lineage": cancer_lineage,
+        "cancer_mean": cancer_scores.mean(),
+        "other_mean": other_scores.mean(),
+        "selectivity_score": selectivity,
+        "n_cancer": len(cancer_scores),
+        "fraction_dependent": (cancer_scores < -0.5).mean()
+    }
+```
+
+## CRISPR Dataset Versions
+
+| Dataset | Description | Recommended |
+|---------|-------------|-------------|
+| `CRISPRGeneEffect` | Chronos-corrected gene effect | Yes (current) |
+| `Achilles_gene_effect` | Older CERES algorithm | Legacy only |
+| `RNAi_merged` | DEMETER2 RNAi | For cross-validation |
+
+## Quality Metrics
+
+DepMap reports quality control metrics per screen:
+- **Skewness**: Pan-essential genes should show negative skew
+- **AUC**: Area under ROC for pan-essential vs non-essential controls
+
+Good screens: skewness < −1, AUC > 0.85
+
+## Cancer Lineage Codes
+
+Common values for the `lineage` field in `sample_info.csv`:
+
+| Lineage | Description |
+|---------|-------------|
+| `lung` | Lung cancer |
+| `breast` | Breast cancer |
+| `colorectal` | Colorectal cancer |
+| `brain_cancer` | Brain cancer (GBM, etc.) |
+| `leukemia` | Leukemia |
+| `lymphoma` | Lymphoma |
+| `prostate` | Prostate cancer |
+| `ovarian` | Ovarian cancer |
+| `pancreatic` | Pancreatic cancer |
+| `skin` | Melanoma and other skin |
+| `liver` | Liver cancer |
+| `kidney` | Kidney cancer |
+
+## Synthetic Lethality Analysis
+
+```python
+import pandas as pd
+import numpy as np
+from scipy import stats
+
+def find_synthetic_lethal(gene_effect_df, mutation_df, biomarker_gene,
+                           fdr_threshold=0.1):
+    """
+    Find synthetic lethal partners for a loss-of-function mutation.
+
+    For each gene, tests if cell lines mutant in biomarker_gene
+    are more dependent on that gene vs. WT lines.
+    """
+    if biomarker_gene not in mutation_df.columns:
+        return pd.DataFrame()
+
+    # Get mutant vs WT cell lines
+    common = gene_effect_df.index.intersection(mutation_df.index)
+    is_mutant = mutation_df.loc[common, biomarker_gene] == 1
+
+    mutant_lines = common[is_mutant]
+    wt_lines = common[~is_mutant]
+
+    results = []
+    for gene in gene_effect_df.columns:
+        mut_scores = gene_effect_df.loc[mutant_lines, gene].dropna()
+        wt_scores = gene_effect_df.loc[wt_lines, gene].dropna()
+
+        if len(mut_scores) < 5 or len(wt_scores) < 10:
+            continue
+
+        stat, pval = stats.mannwhitneyu(mut_scores, wt_scores, alternative='less')
+        results.append({
+            "gene": gene,
+            "mean_mutant": mut_scores.mean(),
+            "mean_wt": wt_scores.mean(),
+            "effect_size": wt_scores.mean() - mut_scores.mean(),
+            "pval": pval,
+            "n_mutant": len(mut_scores),
+            "n_wt": len(wt_scores)
+        })
+
+    df = pd.DataFrame(results)
+    if df.empty:  # no gene passed the sample-size filter, so there is nothing to correct
+        return df
+    # Benjamini-Hochberg FDR correction (false_discovery_control needs SciPy >= 1.11)
+    df["qval"] = stats.false_discovery_control(df["pval"], method="bh")
+    return df[df["qval"] < fdr_threshold].sort_values("effect_size", ascending=False)
+```
+
+## Drug Sensitivity (PRISM)
+
+DepMap also contains compound sensitivity data from the PRISM assay:
+
+```python
+import pandas as pd
+
+def load_prism_data(filepath="primary-screen-replicate-collapsed-logfold-change.csv"):
+    """
+    Load PRISM drug sensitivity data.
+    Rows = cell lines, Columns = compounds (broad_id::name::dose)
+    Values = log2 fold change (more negative = more sensitive)
+    """
+    return pd.read_csv(filepath, index_col=0)
+
+# Available datasets:
+# primary-screen: 4,518 compounds at single dose
+# secondary-screen: 1,448 compounds at 8 doses (AUC available)
+```
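
The ranges in the Score Interpretation table of the guide above can also be applied programmatically. The following is a minimal sketch and is not part of the committed file: `classify_chronos`, `BIN_EDGES`, and `BIN_LABELS` are hypothetical names, the label wording is shortened from the table, and the handling of scores that fall exactly on a boundary is an assumption, since the table does not specify it.

```python
import numpy as np
import pandas as pd

# Bin edges taken from the Score Interpretation table. Which bin a score
# lying exactly on an edge belongs to is an assumption (see right=False below).
BIN_EDGES = [-np.inf, -1.0, -0.5, -0.3, 0.0, np.inf]
BIN_LABELS = [
    "strong dependency",
    "significant dependency",
    "mild dependency",
    "non-essential",
    "positive (possible growth advantage or noise)",
]


def classify_chronos(scores: pd.Series) -> pd.Series:
    """Map Chronos scores to interpretation categories; NaN stays NaN."""
    # right=False makes every bin half-open, [low, high), so a score of
    # exactly -0.5 lands in "mild dependency" rather than "significant".
    return pd.cut(scores, bins=BIN_EDGES, labels=BIN_LABELS, right=False)


# Illustrative values only; these are not real DepMap scores.
example = pd.Series(
    [-1.4, -0.7, -0.4, -0.1, 0.2],
    index=["line_a", "line_b", "line_c", "line_d", "line_e"],
)
categories = classify_chronos(example)
# Number of cell lines per category, in the order of BIN_LABELS
print(categories.value_counts(sort=False))
```

Because each bin is half-open on the right, a score of exactly -0.5 is labeled a mild dependency, which agrees with the strict `< -0.5` cutoff used for `fraction_dependent` in `compute_selectivity`.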