mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
178
scientific-skills/depmap/references/dependency_analysis.md
Normal file
@@ -0,0 +1,178 @@
# DepMap Dependency Analysis Guide

## Understanding Chronos Scores

Chronos is the current (v5+) algorithm for computing gene dependency scores from CRISPR screen data. It addresses systematic biases including:
- Copy number effects (genes in amplified regions can appear essential because Cas9 cuts the DNA at many sites)
- Guide RNA efficiency variation
- Cell line growth rates

### Score Interpretation

| Score Range | Interpretation |
|------------|----------------|
| > 0 | Knockout may increase proliferation; small positive values are usually noise |
| 0 to −0.3 | Non-essential: minimal fitness effect |
| −0.3 to −0.5 | Mild dependency |
| −0.5 to −1.0 | Significant dependency |
| < −1.0 | Strong dependency (common essential range) |
| ≈ −1.0 | Median of pan-essential genes (e.g., proteasome subunits) |

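
The bands in the table above can be turned into a small lookup, which is handy when annotating many scores. This is a minimal sketch: the function name is hypothetical, and how exact boundary values (0, −0.3, −0.5, −1.0) are assigned is an assumption, since the table does not say which band owns them.

```python
def classify_chronos_score(score: float) -> str:
    """Map a Chronos gene effect score to an interpretation band.

    Boundary handling is an assumption: each boundary value is
    assigned to the more negative (more dependent) band, except
    -1.0, which stays in "significant dependency".
    """
    if score > 0:
        return "positive (possible growth advantage or noise)"
    if score > -0.3:
        return "non-essential"
    if score > -0.5:
        return "mild dependency"
    if score >= -1.0:
        return "significant dependency"
    return "strong dependency"


# Label a handful of illustrative scores
labels = {s: classify_chronos_score(s) for s in (0.1, -0.1, -0.4, -0.7, -1.5)}
```

The thresholds are rules of thumb, so a label near a boundary should be read as approximate.
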
### Common Essential Genes (Controls)

Genes that are essential in nearly all cell lines (score ~−1 to −2):
- Ribosomal proteins: RPL..., RPS...
- Proteasome: PSMA..., PSMB...
- Spliceosome: SNRPD1, SNRNP70
- DNA replication: MCM2, PCNA
- Transcription: POLR2A, TAF...

These can be used as positive controls for screen quality.

### Non-Essential Controls

Genes with negligible fitness effect (score ~ 0):
- Non-expressed genes (tissue-specific)
- Safe harbor loci

## Selectivity Assessment

To determine whether a dependency is selective for one cancer lineage, compare scores in that lineage against all other cell lines:

```python
import pandas as pd

from depmap_utils import load_cell_line_info


def compute_selectivity(gene_effect_df, target_gene, cancer_lineage):
    """Compute selectivity score for a cancer lineage."""
    scores = gene_effect_df[target_gene].dropna()

    # Get cell line metadata and attach the lineage to each score
    cell_info = load_cell_line_info()
    scores_df = scores.reset_index()
    scores_df.columns = ["DepMap_ID", "score"]
    scores_df = scores_df.merge(cell_info[["DepMap_ID", "lineage"]], on="DepMap_ID")

    in_lineage = scores_df["lineage"] == cancer_lineage
    cancer_scores = scores_df.loc[in_lineage, "score"]
    other_scores = scores_df.loc[~in_lineage, "score"]

    # Selectivity: positive when the lineage mean is lower (more dependent) than the rest
    selectivity = other_scores.mean() - cancer_scores.mean()
    return {
        "target_gene": target_gene,
        "cancer_lineage": cancer_lineage,
        "cancer_mean": cancer_scores.mean(),
        "other_mean": other_scores.mean(),
        "selectivity_score": selectivity,
        "n_cancer": len(cancer_scores),
        # Share of lineage cell lines below the -0.5 dependency threshold
        "fraction_dependent": (cancer_scores < -0.5).mean()
    }
```

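
To make the arithmetic behind the selectivity score concrete, here is a small worked example on made-up numbers. The scores are invented for illustration and do not come from DepMap. It uses only the standard library, so it runs without the gene effect matrix.

```python
from statistics import mean

# Invented Chronos scores for one gene (illustration only)
cancer_scores = [-1.2, -0.9, -0.7, -0.4]   # cell lines in the lineage of interest
other_scores = [-0.2, -0.1, 0.0, -0.1]     # all remaining cell lines

# Same definition as above: mean of the other lines minus mean of the lineage
selectivity = mean(other_scores) - mean(cancer_scores)   # about 0.7

# Fraction of lineage cell lines below the -0.5 dependency threshold
fraction_dependent = sum(s < -0.5 for s in cancer_scores) / len(cancer_scores)  # 0.75
```

A positive selectivity score means the lineage is, on average, more dependent on the gene than other lines. Because it is a difference of means, a few outlier cell lines can dominate it, so it is worth reading alongside `fraction_dependent`.
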

## CRISPR Dataset Versions

| Dataset | Description | Recommended |
|---------|-------------|-------------|
| `CRISPRGeneEffect` | Chronos-corrected gene effect | Yes (current) |
| `Achilles_gene_effect` | Older CERES algorithm | Legacy only |
| `RNAi_merged` | DEMETER2 RNAi | For cross-validation |

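
Gene columns in DepMap matrix files are typically labeled `SYMBOL (EntrezID)`, for example `KRAS (3845)`, while the functions in this guide index columns by bare gene symbol. The sketch below strips that suffix. The helper name is hypothetical, and the label format should be checked against the release you download.

```python
import re

# Matches a trailing " (12345)" Entrez ID suffix
_ENTREZ_SUFFIX = re.compile(r"\s*\(\d+\)$")


def strip_entrez_suffix(column_name: str) -> str:
    """Turn 'KRAS (3845)' into 'KRAS'; names without a suffix are unchanged."""
    return _ENTREZ_SUFFIX.sub("", column_name)


cleaned = [strip_entrez_suffix(c) for c in ["KRAS (3845)", "TP53 (7157)", "PCNA"]]
```

With pandas, the same function can be passed directly to `gene_effect_df.rename(columns=strip_entrez_suffix)`.
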
## Quality Metrics

DepMap reports quality control metrics per screen:
- **Skewness**: The gene effect distribution of a good screen is negatively skewed, because pan-essential genes form a long left tail
- **AUC**: Area under the ROC curve separating pan-essential from non-essential controls

Good screens: skewness < −1, AUC > 0.85

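
Both metrics are simple to compute by hand, which can help when checking a single screen. The sketch below is a plain-Python illustration, not DepMap's own QC code. The function names are hypothetical, and the AUC uses the pairwise-comparison formulation, which is equivalent to the ROC AUC.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of cubed z-scores.

    Raises ZeroDivisionError if all values are identical.
    """
    mu, sigma = mean(values), pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


def control_auc(essential_scores, nonessential_scores):
    """Probability that a random essential control scores below a random
    non-essential control (ties count one half). Equals the ROC AUC."""
    wins = 0.0
    for e in essential_scores:
        for n in nonessential_scores:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(essential_scores) * len(nonessential_scores))


# Invented scores for illustration only
screen_skew = skewness([0.0, 0.0, 0.0, 0.0, -4.0])
screen_auc = control_auc([-1.2, -0.9, -1.5], [0.1, -0.05, 0.0])
```

The pairwise loop is quadratic in the number of controls. That is fine for control sets of a few hundred genes, but a rank-based implementation would be preferable for larger inputs.
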
## Cancer Lineage Codes

Common values of the `lineage` field in `sample_info.csv`:

| Lineage | Description |
|---------|-------------|
| `lung` | Lung cancer |
| `breast` | Breast cancer |
| `colorectal` | Colorectal cancer |
| `brain_cancer` | Brain cancer (GBM, etc.) |
| `leukemia` | Leukemia |
| `lymphoma` | Lymphoma |
| `prostate` | Prostate cancer |
| `ovarian` | Ovarian cancer |
| `pancreatic` | Pancreatic cancer |
| `skin` | Melanoma and other skin |
| `liver` | Liver cancer |
| `kidney` | Kidney cancer |

## Synthetic Lethality Analysis

```python
import pandas as pd
from scipy import stats
from scipy.stats import false_discovery_control  # available in SciPy >= 1.11


def find_synthetic_lethal(gene_effect_df, mutation_df, biomarker_gene,
                          fdr_threshold=0.1):
    """
    Find synthetic lethal partners for a loss-of-function mutation.

    For each gene, tests if cell lines mutant in biomarker_gene
    are more dependent on that gene vs. WT lines.
    """
    if biomarker_gene not in mutation_df.columns:
        return pd.DataFrame()

    # Get mutant vs WT cell lines (only lines present in both tables)
    common = gene_effect_df.index.intersection(mutation_df.index)
    is_mutant = (mutation_df.loc[common, biomarker_gene] == 1).to_numpy()

    mutant_lines = common[is_mutant]
    wt_lines = common[~is_mutant]

    results = []
    for gene in gene_effect_df.columns:
        mut_scores = gene_effect_df.loc[mutant_lines, gene].dropna()
        wt_scores = gene_effect_df.loc[wt_lines, gene].dropna()

        # Skip genes with too few observations in either group
        if len(mut_scores) < 5 or len(wt_scores) < 10:
            continue

        # One-sided test: are mutant scores lower (more dependent) than WT?
        stat, pval = stats.mannwhitneyu(mut_scores, wt_scores, alternative='less')
        results.append({
            "gene": gene,
            "mean_mutant": mut_scores.mean(),
            "mean_wt": wt_scores.mean(),
            "effect_size": wt_scores.mean() - mut_scores.mean(),
            "pval": pval,
            "n_mutant": len(mut_scores),
            "n_wt": len(wt_scores)
        })

    # No gene passed the sample-size filter, so there is nothing to correct
    if not results:
        return pd.DataFrame()

    df = pd.DataFrame(results)
    # FDR correction (Benjamini-Hochberg)
    df["qval"] = false_discovery_control(df["pval"], method="bh")
    df = df[df["qval"] < fdr_threshold].sort_values("effect_size", ascending=False)
    return df
```

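
The FDR step above relies on SciPy's Benjamini-Hochberg implementation. To show what that correction does to the p-values, here is a dependency-free sketch of the same procedure. The function name is hypothetical, and the code is for understanding rather than a replacement for the SciPy call.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvals)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped
    # by the q-values of all larger p-values (monotonicity)
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals


# Four illustrative p-values; the adjusted values come out near
# 0.02, 0.04, 0.04, 0.02 in the same order
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by `n / rank`, so testing many genes raises the bar for significance. This is why the correction is applied after the loop over all genes rather than per test.
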

## Drug Sensitivity (PRISM)

DepMap also hosts compound sensitivity data from the PRISM assay:

```python
import pandas as pd


def load_prism_data(filepath="primary-screen-replicate-collapsed-logfold-change.csv"):
    """
    Load PRISM drug sensitivity data.
    Rows = cell lines, Columns = compounds (broad_id::name::dose)
    Values = log2 fold change (more negative = more sensitive)
    """
    return pd.read_csv(filepath, index_col=0)


# Available datasets:
# primary-screen: 4,518 compounds at single dose
# secondary-screen: 1,448 compounds at multiple doses (AUC available)
```
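
The column labels pack several fields into one string. The sketch below splits a label into its parts, assuming the `broad_id::name::dose` layout given in the docstring above. The class and function names are hypothetical, the example label is invented, and the field order should be verified against the header of the file you download.

```python
from typing import NamedTuple


class PrismColumn(NamedTuple):
    broad_id: str
    name: str
    dose: str


def parse_prism_column(label: str) -> PrismColumn:
    """Split a 'broad_id::name::dose' column label into its three fields.

    The field order is an assumption taken from the loader's docstring.
    """
    parts = label.split("::")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '::'-separated fields, got {len(parts)}: {label!r}")
    return PrismColumn(*parts)


# Invented label, for illustration only
col = parse_prism_column("BRD-K00000000::examplecompound::2.5")
```

Raising on an unexpected field count makes a layout mismatch visible immediately instead of silently mislabeling compounds.
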