mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git synced 2026-03-27 07:09:27 +08:00

Files

huangkuanlin 7f94783fab Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation

- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.

2026-03-03 07:15:36 -05:00

5.6 KiB

Raw Blame History

DepMap Dependency Analysis Guide

Understanding Chronos Scores

Chronos is the current (v5+) algorithm for computing gene dependency scores from CRISPR screen data. It addresses systematic biases including:

Copy number effects (high-copy genes appear essential due to DNA cutting)
Guide RNA efficiency variation
Cell line growth rates

Score Interpretation

Score Range	Interpretation
> 0	Likely growth-promoting when knocked out (some noise)
0 to −0.3	Non-essential: minimal fitness effect
−0.3 to −0.5	Mild dependency
−0.5 to −1.0	Significant dependency
< −1.0	Strong dependency (common essential range)
≈ −1.0	Median of pan-essential genes (e.g., proteasome subunits)

Common Essential Genes (Controls)

Genes that are essential in nearly all cell lines (score ~−1 to −2):

Ribosomal proteins: RPL..., RPS...
Proteasome: PSMA..., PSMB...
Spliceosome: SNRPD1, SNRNP70
DNA replication: MCM2, PCNA
Transcription: POLR2A, TAF...

These can be used as positive controls for screen quality.

Non-Essential Controls

Genes with negligible fitness effect (score ~ 0):

Non-expressed genes (tissue-specific)
Safe harbor loci

Selectivity Assessment

To determine if a dependency is cancer-selective:

import pandas as pd
import numpy as np

def compute_selectivity(gene_effect_df, target_gene, cancer_lineage):
    """Compute selectivity score for a cancer lineage."""
    scores = gene_effect_df[target_gene].dropna()

    # Get cell line metadata
    from depmap_utils import load_cell_line_info
    cell_info = load_cell_line_info()
    scores_df = scores.reset_index()
    scores_df.columns = ["DepMap_ID", "score"]
    scores_df = scores_df.merge(cell_info[["DepMap_ID", "lineage"]])

    cancer_scores = scores_df[scores_df["lineage"] == cancer_lineage]["score"]
    other_scores = scores_df[scores_df["lineage"] != cancer_lineage]["score"]

    # Selectivity: lower mean in cancer lineage vs others
    selectivity = other_scores.mean() - cancer_scores.mean()
    return {
        "target_gene": target_gene,
        "cancer_lineage": cancer_lineage,
        "cancer_mean": cancer_scores.mean(),
        "other_mean": other_scores.mean(),
        "selectivity_score": selectivity,
        "n_cancer": len(cancer_scores),
        "fraction_dependent": (cancer_scores < -0.5).mean()
    }

CRISPR Dataset Versions

Dataset	Description	Recommended
`CRISPRGeneEffect`	Chronos-corrected gene effect	Yes (current)
`Achilles_gene_effect`	Older CERES algorithm	Legacy only
`RNAi_merged`	DEMETER2 RNAi	For cross-validation

Quality Metrics

DepMap reports quality control metrics per screen:

Skewness: Pan-essential genes should show negative skew
AUC: Area under ROC for pan-essential vs non-essential controls

Good screens: skewness < −1, AUC > 0.85

Cancer Lineage Codes

Common values for lineage field in sample_info.csv:

Lineage	Description
`lung`	Lung cancer
`breast`	Breast cancer
`colorectal`	Colorectal cancer
`brain_cancer`	Brain cancer (GBM, etc.)
`leukemia`	Leukemia
`lymphoma`	Lymphoma
`prostate`	Prostate cancer
`ovarian`	Ovarian cancer
`pancreatic`	Pancreatic cancer
`skin`	Melanoma and other skin
`liver`	Liver cancer
`kidney`	Kidney cancer

Synthetic Lethality Analysis

import pandas as pd
import numpy as np
from scipy import stats

def find_synthetic_lethal(gene_effect_df, mutation_df, biomarker_gene,
                           fdr_threshold=0.1):
    """
    Find synthetic lethal partners for a loss-of-function mutation.

    For each gene, tests if cell lines mutant in biomarker_gene
    are more dependent on that gene vs. WT lines.
    """
    if biomarker_gene not in mutation_df.columns:
        return pd.DataFrame()

    # Get mutant vs WT cell lines
    common = gene_effect_df.index.intersection(mutation_df.index)
    is_mutant = mutation_df.loc[common, biomarker_gene] == 1

    mutant_lines = common[is_mutant]
    wt_lines = common[~is_mutant]

    results = []
    for gene in gene_effect_df.columns:
        mut_scores = gene_effect_df.loc[mutant_lines, gene].dropna()
        wt_scores = gene_effect_df.loc[wt_lines, gene].dropna()

        if len(mut_scores) < 5 or len(wt_scores) < 10:
            continue

        stat, pval = stats.mannwhitneyu(mut_scores, wt_scores, alternative='less')
        results.append({
            "gene": gene,
            "mean_mutant": mut_scores.mean(),
            "mean_wt": wt_scores.mean(),
            "effect_size": wt_scores.mean() - mut_scores.mean(),
            "pval": pval,
            "n_mutant": len(mut_scores),
            "n_wt": len(wt_scores)
        })

    df = pd.DataFrame(results)
    # FDR correction
    from scipy.stats import false_discovery_control
    df["qval"] = false_discovery_control(df["pval"], method="bh")
    df = df[df["qval"] < fdr_threshold].sort_values("effect_size", ascending=False)
    return df

Drug Sensitivity (PRISM)

DepMap also contains compound sensitivity data from the PRISM assay:

import pandas as pd

def load_prism_data(filepath="primary-screen-replicate-collapsed-logfold-change.csv"):
    """
    Load PRISM drug sensitivity data.
    Rows = cell lines, Columns = compounds (broad_id::name::dose)
    Values = log2 fold change (more negative = more sensitive)
    """
    return pd.read_csv(filepath, index_col=0)

# Available datasets:
# primary-screen: 4,518 compounds at single dose
# secondary-screen: ~8,000 compounds at multiple doses (AUC available)

5.6 KiB Raw Blame History Unescape Escape