Files
huangkuanlin 7f94783fab Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
2026-03-03 07:15:36 -05:00

179 lines
5.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# DepMap Dependency Analysis Guide
## Understanding Chronos Scores
Chronos is the current (v5+) algorithm for computing gene dependency scores from CRISPR screen data. It addresses systematic biases including:
- Copy number effects (high-copy genes appear essential due to DNA cutting)
- Guide RNA efficiency variation
- Cell line growth rates
### Score Interpretation
| Score Range | Interpretation |
|------------|----------------|
| > 0 | Likely growth-promoting when knocked out (some noise) |
| 0 to 0.3 | Non-essential: minimal fitness effect |
| 0.3 to 0.5 | Mild dependency |
| 0.5 to 1.0 | Significant dependency |
| < 1.0 | Strong dependency (common essential range) |
| ≈ 1.0 | Median of pan-essential genes (e.g., proteasome subunits) |
### Common Essential Genes (Controls)
Genes that are essential in nearly all cell lines (score ~1 to 2):
- Ribosomal proteins: RPL..., RPS...
- Proteasome: PSMA..., PSMB...
- Spliceosome: SNRPD1, SNRNP70
- DNA replication: MCM2, PCNA
- Transcription: POLR2A, TAF...
These can be used as positive controls for screen quality.
### Non-Essential Controls
Genes with negligible fitness effect (score ~ 0):
- Non-expressed genes (tissue-specific)
- Safe harbor loci
## Selectivity Assessment
To determine if a dependency is cancer-selective:
```python
import pandas as pd
import numpy as np
def compute_selectivity(gene_effect_df, target_gene, cancer_lineage):
"""Compute selectivity score for a cancer lineage."""
scores = gene_effect_df[target_gene].dropna()
# Get cell line metadata
from depmap_utils import load_cell_line_info
cell_info = load_cell_line_info()
scores_df = scores.reset_index()
scores_df.columns = ["DepMap_ID", "score"]
scores_df = scores_df.merge(cell_info[["DepMap_ID", "lineage"]])
cancer_scores = scores_df[scores_df["lineage"] == cancer_lineage]["score"]
other_scores = scores_df[scores_df["lineage"] != cancer_lineage]["score"]
# Selectivity: lower mean in cancer lineage vs others
selectivity = other_scores.mean() - cancer_scores.mean()
return {
"target_gene": target_gene,
"cancer_lineage": cancer_lineage,
"cancer_mean": cancer_scores.mean(),
"other_mean": other_scores.mean(),
"selectivity_score": selectivity,
"n_cancer": len(cancer_scores),
"fraction_dependent": (cancer_scores < -0.5).mean()
}
```
## CRISPR Dataset Versions
| Dataset | Description | Recommended |
|---------|-------------|-------------|
| `CRISPRGeneEffect` | Chronos-corrected gene effect | Yes (current) |
| `Achilles_gene_effect` | Older CERES algorithm | Legacy only |
| `RNAi_merged` | DEMETER2 RNAi | For cross-validation |
## Quality Metrics
DepMap reports quality control metrics per screen:
- **Skewness**: Pan-essential genes should show negative skew
- **AUC**: Area under ROC for pan-essential vs non-essential controls
Good screens: skewness < 1, AUC > 0.85
## Cancer Lineage Codes
Common values for `lineage` field in `sample_info.csv`:
| Lineage | Description |
|---------|-------------|
| `lung` | Lung cancer |
| `breast` | Breast cancer |
| `colorectal` | Colorectal cancer |
| `brain_cancer` | Brain cancer (GBM, etc.) |
| `leukemia` | Leukemia |
| `lymphoma` | Lymphoma |
| `prostate` | Prostate cancer |
| `ovarian` | Ovarian cancer |
| `pancreatic` | Pancreatic cancer |
| `skin` | Melanoma and other skin |
| `liver` | Liver cancer |
| `kidney` | Kidney cancer |
## Synthetic Lethality Analysis
```python
import pandas as pd
import numpy as np
from scipy import stats
def find_synthetic_lethal(gene_effect_df, mutation_df, biomarker_gene,
fdr_threshold=0.1):
"""
Find synthetic lethal partners for a loss-of-function mutation.
For each gene, tests if cell lines mutant in biomarker_gene
are more dependent on that gene vs. WT lines.
"""
if biomarker_gene not in mutation_df.columns:
return pd.DataFrame()
# Get mutant vs WT cell lines
common = gene_effect_df.index.intersection(mutation_df.index)
is_mutant = mutation_df.loc[common, biomarker_gene] == 1
mutant_lines = common[is_mutant]
wt_lines = common[~is_mutant]
results = []
for gene in gene_effect_df.columns:
mut_scores = gene_effect_df.loc[mutant_lines, gene].dropna()
wt_scores = gene_effect_df.loc[wt_lines, gene].dropna()
if len(mut_scores) < 5 or len(wt_scores) < 10:
continue
stat, pval = stats.mannwhitneyu(mut_scores, wt_scores, alternative='less')
results.append({
"gene": gene,
"mean_mutant": mut_scores.mean(),
"mean_wt": wt_scores.mean(),
"effect_size": wt_scores.mean() - mut_scores.mean(),
"pval": pval,
"n_mutant": len(mut_scores),
"n_wt": len(wt_scores)
})
df = pd.DataFrame(results)
# FDR correction
from scipy.stats import false_discovery_control
df["qval"] = false_discovery_control(df["pval"], method="bh")
df = df[df["qval"] < fdr_threshold].sort_values("effect_size", ascending=False)
return df
```
## Drug Sensitivity (PRISM)
DepMap also contains compound sensitivity data from the PRISM assay:
```python
import pandas as pd
def load_prism_data(filepath="primary-screen-replicate-collapsed-logfold-change.csv"):
"""
Load PRISM drug sensitivity data.
Rows = cell lines, Columns = compounds (broad_id::name::dose)
Values = log2 fold change (more negative = more sensitive)
"""
return pd.read_csv(filepath, index_col=0)
# Available datasets:
# primary-screen: 4,518 compounds at single dose
# secondary-screen: ~8,000 compounds at multiple doses (AUC available)
```