mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-29 07:43:46 +08:00
Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
395
scientific-skills/gnomad-database/SKILL.md
Normal file
@@ -0,0 +1,395 @@
|
||||
---
name: gnomad-database
description: Query gnomAD (Genome Aggregation Database) for population allele frequencies, variant constraint scores (pLI, LOEUF), and loss-of-function intolerance. Essential for variant pathogenicity interpretation, rare disease genetics, and identifying loss-of-function intolerant genes.
license: CC0-1.0
metadata:
  skill-author: Kuan-lin Huang
---
|
||||
|
||||
# gnomAD Database
|
||||
|
||||
## Overview
|
||||
|
||||
The Genome Aggregation Database (gnomAD) is the largest publicly available collection of human genetic variation, aggregated from large-scale sequencing projects. gnomAD v4 contains exome sequences from 730,947 individuals and genome sequences from 76,215 individuals across diverse ancestries. It provides population allele frequencies, variant consequence annotations, and gene-level constraint metrics that are essential for interpreting the clinical significance of genetic variants.
|
||||
|
||||
**Key resources:**
|
||||
- gnomAD browser: https://gnomad.broadinstitute.org/
|
||||
- GraphQL API: https://gnomad.broadinstitute.org/api
|
||||
- Data downloads: https://gnomad.broadinstitute.org/downloads
|
||||
- Documentation: https://gnomad.broadinstitute.org/help
|
||||
|
||||
## When to Use This Skill
|
||||
|
||||
Use gnomAD when:
|
||||
|
||||
- **Variant frequency lookup**: Checking if a variant is rare, common, or absent in the general population
|
||||
- **Pathogenicity assessment**: Rare variants (MAF < 1%) are candidates for disease causation; gnomAD helps filter benign common variants
|
||||
- **Loss-of-function intolerance**: Using pLI and LOEUF scores to assess whether a gene tolerates protein-truncating variants
|
||||
- **Population-stratified frequencies**: Comparing allele frequencies across ancestries (African/African American, Admixed American, Ashkenazi Jewish, East Asian, Finnish, Middle Eastern, Non-Finnish European, South Asian)
|
||||
- **ClinVar/ACMG variant classification**: gnomAD frequency data feeds into BA1/BS1 evidence codes for variant classification
|
||||
- **Constraint analysis**: Identifying genes depleted of missense or loss-of-function variation (z-scores, pLI, LOEUF)
|
||||
|
||||
## Core Capabilities
|
||||
|
||||
### 1. gnomAD GraphQL API
|
||||
|
||||
gnomAD uses a GraphQL API accessible at `https://gnomad.broadinstitute.org/api`. Most queries fetch variants by gene or specific genomic position.
|
||||
|
||||
**Datasets available:**
|
||||
- `gnomad_r4` — gnomAD v4 exomes (recommended default, GRCh38)
|
||||
- `gnomad_r4_genomes` — gnomAD v4 genomes (GRCh38)
|
||||
- `gnomad_r3` — gnomAD v3 genomes (GRCh38)
|
||||
- `gnomad_r2_1` — gnomAD v2 exomes (GRCh37)
|
||||
|
||||
**Reference genomes:**
|
||||
- `GRCh38` — default for v3/v4
|
||||
- `GRCh37` — for v2
|
||||
|
||||
### 2. Querying Variants by Gene
|
||||
|
||||
```python
import requests

def query_gnomad_gene(gene_symbol, dataset="gnomad_r4", reference_genome="GRCh38"):
    """Fetch variants in a gene from gnomAD."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query GeneVariants($gene_symbol: String!, $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gene_id
        gene_symbol
        variants(dataset: $dataset) {
          variant_id
          pos
          ref
          alt
          consequence
          genome {
            af
            ac
            an
            ac_hom
            populations {
              id
              ac
              an
              af
            }
          }
          exome {
            af
            ac
            an
            ac_hom
          }
          lof
          lof_flags
          lof_filter
        }
      }
    }
    """

    variables = {
        "gene_symbol": gene_symbol,
        "dataset": dataset,
        "reference_genome": reference_genome
    }

    response = requests.post(url, json={"query": query, "variables": variables}, timeout=60)
    response.raise_for_status()
    return response.json()

# Example
result = query_gnomad_gene("BRCA1")
gene_data = result["data"]["gene"]
variants = gene_data["variants"]

def max_af(v):
    """Highest AF across exome and genome data; None if neither reports one."""
    # "genome"/"exome" are null when the variant is absent from that data type
    afs = [(v.get(k) or {}).get("af") for k in ("genome", "exome")]
    afs = [af for af in afs if af is not None]
    return max(afs) if afs else None

# Filter to rare PTVs: high-confidence LoF (or a truncating consequence) AND rare.
# The parentheses matter: without them `and` binds tighter than `or`.
rare_ptvs = [
    v for v in variants
    if (v.get("lof") == "HC" or v.get("consequence") in ["stop_gained", "frameshift_variant"])
    and max_af(v) is not None
    and max_af(v) < 0.001
]
print(f"Found {len(rare_ptvs)} rare PTVs in {gene_data['gene_symbol']}")
```
|
||||
|
||||
### 3. Querying a Specific Variant
|
||||
|
||||
```python
import requests

def query_gnomad_variant(variant_id, dataset="gnomad_r4"):
    """Fetch details for a specific variant (e.g., '1-55516888-G-GA')."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query VariantDetails($variantId: String!, $dataset: DatasetId!) {
      variant(variantId: $variantId, dataset: $dataset) {
        variant_id
        chrom
        pos
        ref
        alt
        genome {
          af
          ac
          an
          ac_hom
          populations {
            id
            ac
            an
            af
          }
        }
        exome {
          af
          ac
          an
          ac_hom
          populations {
            id
            ac
            an
            af
          }
        }
        consequence
        lof
        rsids
        in_silico_predictors {
          id
          value
          flags
        }
        clinvar_variation_id
      }
    }
    """

    response = requests.post(
        url,
        json={"query": query, "variables": {"variantId": variant_id, "dataset": dataset}},
        timeout=60
    )
    response.raise_for_status()
    return response.json()

# Example: query a specific variant
result = query_gnomad_variant("17-43094692-G-A")  # BRCA1 missense
variant = (result.get("data") or {}).get("variant")

if variant:
    # "genome"/"exome" are null (not missing) when the variant is absent from
    # that data type, so fall back to an empty dict before reading "af"
    genome_af = (variant.get("genome") or {}).get("af", "N/A")
    exome_af = (variant.get("exome") or {}).get("af", "N/A")
    print(f"Variant: {variant['variant_id']}")
    print(f"  Consequence: {variant['consequence']}")
    print(f"  Genome AF: {genome_af}")
    print(f"  Exome AF: {exome_af}")
    print(f"  LoF: {variant.get('lof')}")
```
|
||||
|
||||
### 4. Gene Constraint Scores
|
||||
|
||||
gnomAD constraint scores assess how tolerant a gene is to variation relative to expectation:
|
||||
|
||||
```python
import requests

def query_gnomad_constraint(gene_symbol, reference_genome="GRCh38"):
    """Fetch constraint scores for a gene."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
      gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
        gene_id
        gene_symbol
        gnomad_constraint {
          exp_lof
          exp_mis
          exp_syn
          obs_lof
          obs_mis
          obs_syn
          oe_lof
          oe_mis
          oe_syn
          oe_lof_lower
          oe_lof_upper
          lof_z
          mis_z
          syn_z
          pLI
        }
      }
    }
    """

    response = requests.post(
        url,
        json={"query": query, "variables": {"gene_symbol": gene_symbol, "reference_genome": reference_genome}}
    )
    return response.json()

# Example
result = query_gnomad_constraint("KCNQ2")
gene = result["data"]["gene"]
constraint = gene["gnomad_constraint"]

print(f"Gene: {gene['gene_symbol']}")
print(f"  pLI: {constraint['pLI']:.3f} (>0.9 = LoF intolerant)")
print(f"  LOEUF: {constraint['oe_lof_upper']:.3f} (<0.35 = highly constrained)")
print(f"  Obs/Exp LoF: {constraint['oe_lof']:.3f}")
print(f"  Missense Z: {constraint['mis_z']:.3f}")
```
|
||||
|
||||
**Constraint score interpretation:**
|
||||
| Score | Range | Meaning |
|-------|-------|---------|
| `pLI` | 0–1 | Probability of LoF intolerance; >0.9 = highly intolerant |
| `LOEUF` | 0–∞ | LoF observed/expected upper bound; <0.35 = constrained |
| `oe_lof` | 0–∞ | Observed/expected ratio for LoF variants |
| `mis_z` | −∞ to ∞ | Missense constraint z-score; >3.09 = constrained |
| `syn_z` | −∞ to ∞ | Synonymous z-score (control; should be near 0) |
|
||||
|
||||
### 5. Population Frequency Analysis
|
||||
|
||||
```python
import requests
import pandas as pd

def get_population_frequencies(variant_id, dataset="gnomad_r4"):
    """Extract per-population allele frequencies for a variant."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query PopFreqs($variantId: String!, $dataset: DatasetId!) {
      variant(variantId: $variantId, dataset: $dataset) {
        variant_id
        genome {
          populations {
            id
            ac
            an
            af
            ac_hom
          }
        }
      }
    }
    """

    response = requests.post(
        url,
        json={"query": query, "variables": {"variantId": variant_id, "dataset": dataset}},
        timeout=60
    )
    response.raise_for_status()
    data = response.json()

    # The variant, or its genome block, is null when gnomAD has no genome data
    # for it; return an empty frame instead of raising in that case
    variant = (data.get("data") or {}).get("variant") or {}
    populations = (variant.get("genome") or {}).get("populations") or []

    df = pd.DataFrame(populations, columns=["id", "ac", "an", "af", "ac_hom"])
    df = df[df["an"] > 0].copy()
    df["af"] = df["ac"] / df["an"]
    df = df.sort_values("af", ascending=False)
    return df

# Population IDs in gnomAD v4:
# afr = African/African American
# ami = Amish
# amr = Admixed American
# asj = Ashkenazi Jewish
# eas = East Asian
# fin = Finnish
# mid = Middle Eastern
# nfe = Non-Finnish European
# sas = South Asian
# remaining = Other
```
|
||||
|
||||
### 6. Structural Variants (gnomAD-SV)
|
||||
|
||||
gnomAD also contains a structural variant dataset:
|
||||
|
||||
```python
import requests

def query_gnomad_sv(gene_symbol):
    """Query structural variants overlapping a gene."""
    url = "https://gnomad.broadinstitute.org/api"

    query = """
    query SVsByGene($gene_symbol: String!) {
      gene(gene_symbol: $gene_symbol, reference_genome: GRCh38) {
        structural_variants {
          variant_id
          type
          chrom
          pos
          end
          af
          ac
          an
        }
      }
    }
    """

    response = requests.post(url, json={"query": query, "variables": {"gene_symbol": gene_symbol}})
    return response.json()
```
|
||||
|
||||
## Query Workflows
|
||||
|
||||
### Workflow 1: Variant Pathogenicity Assessment
|
||||
|
||||
1. **Check population frequency** — Is the variant rare enough to be pathogenic?
|
||||
- Use gnomAD AF < 1% for recessive, < 0.1% for dominant conditions
|
||||
- Check ancestry-specific frequencies (a variant rare overall may be common in one population)
|
||||
|
||||
2. **Assess functional impact** — LoF variants have highest prior probability
|
||||
- Check `lof` field: `HC` = high-confidence LoF, `LC` = low-confidence
|
||||
- Check `lof_flags` for issues like "NAGNAG_SITE", "PHYLOCSF_WEAK"
|
||||
|
||||
3. **Apply ACMG criteria:**
|
||||
- BA1: AF > 5% → Benign Stand-Alone
|
||||
- BS1: AF > disease prevalence threshold → Benign Strong
|
||||
- PM2: Absent or very rare in gnomAD → Pathogenic Moderate
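
The frequency criteria in step 3 can be sketched as a small function. This is a minimal illustration, not a validated classifier: the function name, the `pm2_threshold` default of 0.0001, and the convention that `None` means "absent from gnomAD" are all assumptions made for the example, and BS1 has no universal cutoff, so it must be supplied per disease.

```python
def acmg_frequency_evidence(af, bs1_threshold=None, ba1_threshold=0.05, pm2_threshold=0.0001):
    """Suggest frequency-based ACMG evidence codes for one allele frequency.

    af            -- allele frequency (overall or ancestry-specific), or None if
                     the variant is absent from gnomAD (assumed convention)
    bs1_threshold -- disease-specific cutoff; no default exists, so BS1 is
                     skipped when this is None
    pm2_threshold -- hypothetical "very rare" cutoff chosen for illustration
    """
    if af is None or af == 0:
        return ["PM2"]
    codes = []
    if af > ba1_threshold:
        codes.append("BA1")          # stand-alone benign
    elif bs1_threshold is not None and af > bs1_threshold:
        codes.append("BS1")          # more common than the disease allows
    elif af < pm2_threshold:
        codes.append("PM2")          # very rare
    return codes


print(acmg_frequency_evidence(0.12))                        # common variant
print(acmg_frequency_evidence(0.002, bs1_threshold=0.001))  # above a disease cutoff
print(acmg_frequency_evidence(None))                        # absent from gnomAD
```

Passing the highest ancestry-specific AF rather than the overall AF follows the note in step 1 that a variant rare overall may be common in one population.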
|
||||
|
||||
### Workflow 2: Gene Prioritization in Rare Disease
|
||||
|
||||
1. Query constraint scores for candidate genes
|
||||
2. Filter for pLI > 0.9 (haploinsufficient) or LOEUF < 0.35
|
||||
3. Cross-reference with observed LoF variants in the gene
|
||||
4. Integrate with ClinVar and disease databases
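
Steps 1 and 2 of this workflow might look like the sketch below once constraint records have been fetched. The function name and the input shape (a dict mapping gene symbol to a `gnomad_constraint` record with `pLI` and `oe_lof_upper` keys) are assumptions for illustration, and the gene names and scores in the example are placeholders, not real gnomAD values.

```python
def prioritize_genes(constraint_by_gene, pli_min=0.9, loeuf_max=0.35):
    """Return (symbol, pLI, LOEUF) for genes passing either constraint filter.

    constraint_by_gene -- dict of gene symbol -> constraint record, or None
                          when gnomAD reports no constraint for the gene
    """
    hits = []
    for symbol, record in constraint_by_gene.items():
        if not record:
            continue  # no constraint data for this gene
        pli = record.get("pLI")
        loeuf = record.get("oe_lof_upper")  # LOEUF is the upper bound of oe_lof
        pli_hit = pli is not None and pli > pli_min
        loeuf_hit = loeuf is not None and loeuf < loeuf_max
        if pli_hit or loeuf_hit:
            hits.append((symbol, pli, loeuf))
    # Most constrained first (lowest LOEUF); genes without LOEUF go last
    hits.sort(key=lambda h: (h[2] is None, h[2] if h[2] is not None else 0.0))
    return hits


# Placeholder records, not real gnomAD scores
example = {
    "GENE_A": {"pLI": 0.99, "oe_lof_upper": 0.20},
    "GENE_B": {"pLI": 0.00, "oe_lof_upper": 1.10},
    "GENE_C": {"pLI": 0.95, "oe_lof_upper": None},
    "GENE_D": None,
}
print(prioritize_genes(example))
```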
|
||||
|
||||
### Workflow 3: Population Genetics Research
|
||||
|
||||
1. Identify variant of interest from GWAS or clinical data
|
||||
2. Query per-population frequencies
|
||||
3. Compare frequency differences across ancestries
|
||||
4. Test for enrichment in specific founder populations
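
Steps 2 and 3 can be illustrated with a descriptive fold-enrichment calculation. This sketch assumes the input list contains only top-level ancestry groups (sex-stratified entries such as `afr_XX` would double count alleles and should be removed first); the function name and the toy counts are made up for the example. It reports a ratio only and is not a statistical test for step 4.

```python
def population_enrichment(populations):
    """Rank populations by AF and report each AF relative to the pooled AF.

    populations -- list of dicts with 'id', 'ac', 'an' (one per ancestry group)
    """
    total_ac = sum(p["ac"] for p in populations)
    total_an = sum(p["an"] for p in populations)
    overall_af = total_ac / total_an if total_an else 0.0

    rows = []
    for p in populations:
        if p["an"] == 0:
            continue  # no called alleles in this group
        af = p["ac"] / p["an"]
        fold = af / overall_af if overall_af > 0 else None
        rows.append({"id": p["id"], "af": af, "fold_vs_overall": fold})
    return sorted(rows, key=lambda r: r["af"], reverse=True)


# Toy counts for illustration only
toy = [
    {"id": "fin", "ac": 50, "an": 10000},
    {"id": "nfe", "ac": 10, "an": 90000},
]
for row in population_enrichment(toy):
    print(row)
```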
|
||||
|
||||
## Best Practices
|
||||
|
||||
- **Use gnomAD v4 (gnomad_r4)** for the most current data; use v2 (gnomad_r2_1) only for GRCh37 compatibility
|
||||
- **Handle null responses**: A variant not observed in gnomAD comes back as a null or missing record. Absence is informative evidence of rarity, but it does not by itself make a variant pathogenic
|
||||
- **Distinguish exome vs. genome data**: Genome data has more uniform coverage; exome data is larger but may have coverage gaps
|
||||
- **Rate limit GraphQL queries**: Add delays between requests; batch queries when possible
|
||||
- **Homozygous counts** (`ac_hom`) are relevant for recessive disease analysis
|
||||
- **LOEUF is preferred over pLI** for gene constraint (less sensitive to sample size)
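
One way to add delays between requests is a small throttle object that is called before every query. This is a sketch under assumptions: the `Throttle` class is hypothetical, and the one-second interval is a placeholder rather than a documented gnomAD limit. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; placeholder value
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage: call throttle.wait() immediately before each requests.post(...)
throttle = Throttle(min_interval=1.0)
```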
|
||||
|
||||
## Data Access
|
||||
|
||||
- **Browser**: https://gnomad.broadinstitute.org/ — interactive variant and gene browsing
|
||||
- **GraphQL API**: https://gnomad.broadinstitute.org/api — programmatic access
|
||||
- **Downloads**: https://gnomad.broadinstitute.org/downloads — VCF, Hail tables, constraint tables
|
||||
- **Google Cloud**: gs://gcp-public-data--gnomad/
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **gnomAD website**: https://gnomad.broadinstitute.org/
|
||||
- **gnomAD blog**: https://gnomad.broadinstitute.org/news
|
||||
- **Downloads**: https://gnomad.broadinstitute.org/downloads
|
||||
- **API explorer**: https://gnomad.broadinstitute.org/api (interactive GraphiQL)
|
||||
- **Constraint documentation**: https://gnomad.broadinstitute.org/help/constraint
|
||||
- **Citation**: Karczewski KJ et al. (2020) Nature. PMID: 32461654; Chen S et al. (2024) Nature.
|
||||
- **GitHub**: https://github.com/broadinstitute/gnomad-browser
|
||||
219
scientific-skills/gnomad-database/references/graphql_queries.md
Normal file
@@ -0,0 +1,219 @@
|
||||
# gnomAD GraphQL Query Reference
|
||||
|
||||
## API Endpoint
|
||||
|
||||
```
|
||||
POST https://gnomad.broadinstitute.org/api
|
||||
Content-Type: application/json
|
||||
|
||||
Body: { "query": "<graphql_query>", "variables": { ... } }
|
||||
```
|
||||
|
||||
## Dataset Identifiers
|
||||
|
||||
| ID | Description | Reference Genome |
|----|-------------|-----------------|
| `gnomad_r4` | gnomAD v4 exomes (730K individuals) | GRCh38 |
| `gnomad_r4_genomes` | gnomAD v4 genomes (76K individuals) | GRCh38 |
| `gnomad_r3` | gnomAD v3 genomes (76K individuals) | GRCh38 |
| `gnomad_r2_1` | gnomAD v2 exomes (125K individuals) | GRCh37 |
| `gnomad_r2_1_non_cancer` | v2 non-cancer subset | GRCh37 |
| `gnomad_cnv_r4` | Copy number variants | GRCh38 |
|
||||
|
||||
## Core Query Templates
|
||||
|
||||
### 1. Variants in a Gene
|
||||
|
||||
```graphql
|
||||
query GeneVariants($gene_symbol: String!, $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
|
||||
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
|
||||
gene_id
|
||||
gene_symbol
|
||||
chrom
|
||||
start
|
||||
stop
|
||||
variants(dataset: $dataset) {
|
||||
variant_id
|
||||
pos
|
||||
ref
|
||||
alt
|
||||
consequence
|
||||
lof
|
||||
lof_flags
|
||||
lof_filter
|
||||
genome {
|
||||
af
|
||||
ac
|
||||
an
|
||||
ac_hom
|
||||
populations { id ac an af ac_hom }
|
||||
}
|
||||
exome {
|
||||
af
|
||||
ac
|
||||
an
|
||||
ac_hom
|
||||
populations { id ac an af ac_hom }
|
||||
}
|
||||
rsids
|
||||
clinvar_variation_id
|
||||
in_silico_predictors { id value flags }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 2. Single Variant Lookup
|
||||
|
||||
```graphql
|
||||
query VariantDetails($variantId: String!, $dataset: DatasetId!) {
|
||||
variant(variantId: $variantId, dataset: $dataset) {
|
||||
variant_id
|
||||
chrom
|
||||
pos
|
||||
ref
|
||||
alt
|
||||
consequence
|
||||
lof
|
||||
lof_flags
|
||||
rsids
|
||||
genome { af ac an ac_hom populations { id ac an af } }
|
||||
exome { af ac an ac_hom populations { id ac an af } }
|
||||
in_silico_predictors { id value flags }
|
||||
clinvar_variation_id
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Variant ID format:** `{chrom}-{pos}-{ref}-{alt}` (e.g., `17-43094692-G-A`)
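
Validating an ID against this format before sending a query avoids wasted requests. The helpers below are a minimal sketch: the function names are hypothetical, and accepting an optional `chr` prefix and lowercase bases is a convenience assumption, not something the gnomAD API is known to do.

```python
import re

# chrom-pos-ref-alt, with chrom limited to 1-22, X, Y, M/MT by pattern shape
_VARIANT_ID = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|M|MT)-([0-9]+)-([ACGT]+)-([ACGT]+)$",
    re.IGNORECASE,
)


def parse_variant_id(variant_id):
    """Split a '{chrom}-{pos}-{ref}-{alt}' string into its four parts."""
    match = _VARIANT_ID.match(variant_id.strip())
    if not match:
        raise ValueError(f"Not a chrom-pos-ref-alt variant ID: {variant_id!r}")
    chrom, pos, ref, alt = match.groups()
    return {"chrom": chrom.upper(), "pos": int(pos), "ref": ref.upper(), "alt": alt.upper()}


def format_variant_id(chrom, pos, ref, alt):
    """Build the ID string from its parts."""
    return f"{chrom}-{pos}-{ref}-{alt}"


print(parse_variant_id("17-43094692-G-A"))
```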
|
||||
|
||||
### 3. Gene Constraint
|
||||
|
||||
```graphql
|
||||
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
|
||||
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
|
||||
gene_id
|
||||
gene_symbol
|
||||
gnomad_constraint {
|
||||
exp_lof exp_mis exp_syn
|
||||
obs_lof obs_mis obs_syn
|
||||
oe_lof oe_mis oe_syn
|
||||
oe_lof_lower oe_lof_upper
|
||||
oe_mis_lower oe_mis_upper
|
||||
lof_z mis_z syn_z
|
||||
pLI
|
||||
flags
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 4. Region Query (by genomic position)
|
||||
|
||||
```graphql
|
||||
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!, $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
|
||||
region(chrom: $chrom, start: $start, stop: $stop, reference_genome: $reference_genome) {
|
||||
variants(dataset: $dataset) {
|
||||
variant_id
|
||||
pos
|
||||
ref
|
||||
alt
|
||||
consequence
|
||||
genome { af ac an }
|
||||
exome { af ac an }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 5. ClinVar Variants in Gene
|
||||
|
||||
```graphql
|
||||
query ClinVarVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
|
||||
gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
|
||||
clinvar_variants {
|
||||
variant_id
|
||||
pos
|
||||
ref
|
||||
alt
|
||||
clinical_significance
|
||||
clinvar_variation_id
|
||||
gold_stars
|
||||
major_consequence
|
||||
in_gnomad
|
||||
gnomad_exomes { ac an af }
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Population IDs
|
||||
|
||||
| ID | Population |
|----|-----------|
| `afr` | African/African American |
| `ami` | Amish |
| `amr` | Admixed American |
| `asj` | Ashkenazi Jewish |
| `eas` | East Asian |
| `fin` | Finnish |
| `mid` | Middle Eastern |
| `nfe` | Non-Finnish European |
| `sas` | South Asian |
| `remaining` | Other/Unassigned |
| `XX` | Female (appended to above, e.g., `afr_XX`) |
| `XY` | Male |
|
||||
|
||||
## LoF Annotation Fields
|
||||
|
||||
| Field | Values | Meaning |
|-------|--------|---------|
| `lof` | `HC`, `LC`, `null` | High/low-confidence LoF, or not annotated as LoF |
| `lof_flags` | comma-separated strings | Quality flags (e.g., `NAGNAG_SITE`, `NON_CANONICAL_SPLICE_SITE`) |
| `lof_filter` | string or null | Reason for LC classification |
|
||||
|
||||
## In Silico Predictor IDs
|
||||
|
||||
Common values for `in_silico_predictors[].id`:
|
||||
- `cadd` — CADD PHRED score
|
||||
- `revel` — REVEL score
|
||||
- `spliceai_ds_max` — SpliceAI max delta score
|
||||
- `pangolin_largest_ds` — Pangolin splicing score
|
||||
- `polyphen` — PolyPhen-2 prediction
|
||||
- `sift` — SIFT prediction
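
Because `in_silico_predictors` is a list of `{id, value, flags}` records, looking up one score is easier after flattening it into a dict. The helper below is a sketch; its name is hypothetical, and the numeric conversion is defensive because values may arrive as strings (an assumption about the response, not a documented guarantee).

```python
def predictors_to_dict(in_silico_predictors):
    """Map predictor id -> score, converting numeric-looking values to float."""
    scores = {}
    for predictor in in_silico_predictors or []:
        value = predictor.get("value")
        try:
            scores[predictor["id"]] = float(value)
        except (TypeError, ValueError):
            # Keep non-numeric values (None, categorical labels) unchanged
            scores[predictor["id"]] = value
    return scores


# Hand-written records shaped like the API field, not real scores
records = [
    {"id": "cadd", "value": "25.3", "flags": []},
    {"id": "revel", "value": 0.81, "flags": []},
]
print(predictors_to_dict(records).get("cadd"))
```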
|
||||
|
||||
## Python Helper
|
||||
|
||||
```python
import requests
import time

def gnomad_query(query: str, variables: dict, retries: int = 3) -> dict:
    """Execute a gnomAD GraphQL query with retry logic."""
    url = "https://gnomad.broadinstitute.org/api"
    headers = {"Content-Type": "application/json"}

    for attempt in range(retries):
        try:
            response = requests.post(
                url,
                json={"query": query, "variables": variables},
                headers=headers,
                timeout=60
            )
            response.raise_for_status()
            result = response.json()

            # GraphQL reports query-level problems in an "errors" list; the
            # result is still returned so the caller can inspect them
            if "errors" in result:
                print(f"GraphQL errors: {result['errors']}")

            return result
        except requests.exceptions.RequestException:
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff
            else:
                raise

    return {}  # only reached when retries <= 0
```
|
||||
@@ -0,0 +1,85 @@
|
||||
# gnomAD Variant Interpretation Guide
|
||||
|
||||
## Allele Frequency Thresholds for Disease Interpretation
|
||||
|
||||
### ACMG/AMP Criteria
|
||||
|
||||
| Criterion | AF threshold | Classification |
|-----------|-------------|----------------|
| BA1 | > 0.05 (5%) | Benign Stand-Alone |
| BS1 | > disease prevalence | Benign Strong |
| PM2_Supporting | Absent for dominant; extremely low (< 0.0001, i.e. 0.01%) for recessive | Pathogenic Moderate, applied at Supporting strength |
|
||||
|
||||
**Notes:**
|
||||
- BA1 applies to most conditions; exceptions include autosomal dominant with high penetrance (e.g., LDLR for FH: BA1 threshold is ~0.1%)
|
||||
- BS1 requires knowing disease prevalence; for rare diseases (1:10,000), BS1 if AF > 0.01%
|
||||
- Homozygous counts (`ac_hom`) matter for recessive diseases
|
||||
|
||||
### Practical Thresholds
|
||||
|
||||
| Inheritance | Suggested max AF |
|-------------|-----------------|
| Autosomal Dominant (high penetrance) | < 0.001 (0.1%) |
| Autosomal Dominant (reduced penetrance) | < 0.01 (1%) |
| Autosomal Recessive | < 0.01 (1%) |
| X-linked recessive | < 0.001 in females |
|
||||
|
||||
## Absence in gnomAD
|
||||
|
||||
A variant **absent in gnomAD** (ac = 0) is evidence of rarity, but interpret carefully:
|
||||
- gnomAD does not capture all rare variants (sequencing depth, coverage, calling thresholds)
|
||||
- A variant absent in 730K exomes is very strong evidence of rarity for PM2
|
||||
- Check coverage at the position: if < 10x, absence is less informative
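
How much an absence tells you depends on how many alleles were actually called at the site. If a variant with true frequency p is sampled in AN alleles, the chance of seeing zero copies is (1 - p)^AN, which gives an upper confidence bound on p. The sketch below computes that bound; the function name is hypothetical, and the calculation assumes independent alleles and that AN reflects real coverage at the position.

```python
import math


def af_upper_bound(an, confidence=0.95):
    """Upper confidence bound on AF when 0 of `an` alleles carry the variant.

    Solves (1 - p) ** an = 1 - confidence for p.
    """
    if an <= 0:
        raise ValueError("an must be a positive allele number")
    # expm1 keeps precision when the bound is very close to zero
    return -math.expm1(math.log(1.0 - confidence) / an)


# Fully covered site in 730,947 exomes (2 alleles each): bound is about 2e-6
print(af_upper_bound(2 * 730947))
# Poorly covered site with only 20 called alleles: bound is about 0.14
print(af_upper_bound(20))
```

The contrast between the two calls is the reason for the coverage check: with few called alleles, absence is compatible with a fairly common variant.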
|
||||
|
||||
## Loss-of-Function Variant Assessment
|
||||
|
||||
### LOFTEE Classification (lof field)
|
||||
|
||||
- **HC (High Confidence):** Predicted to truncate functional protein
|
||||
- Stop-gained, splice site (±1,2), frameshift variants
|
||||
- Passes all LOFTEE quality filters
|
||||
|
||||
- **LC (Low Confidence):** LoF annotation with quality concerns
|
||||
- Check `lof_flags` for specific reason
|
||||
- May still be pathogenic — requires manual review
|
||||
|
||||
### Common lof_flags
|
||||
|
||||
| Flag | Meaning |
|------|---------|
| `NAGNAG_SITE` | Splice site may be rescued by nearby alternative site |
| `NON_CANONICAL_SPLICE_SITE` | Not a canonical splice donor/acceptor |
| `PHYLOCSF_WEAK` | Weak phylogenetic conservation signal |
| `SMALL_INTRON` | Intron too small to affect splicing |
| `SINGLE_EXON` | Single-exon gene (no splicing) |
| `LAST_EXON` | In last exon (NMD may not apply) |
|
||||
|
||||
## Homozygous Observations
|
||||
|
||||
The `ac_hom` field counts homozygous observations. On chrX, hemizygous observations in males are reported separately (the `ac_hemi` field in the gnomAD API).
|
||||
|
||||
**For recessive diseases:**
|
||||
- If a variant is observed homozygous in healthy individuals in gnomAD, it is strong evidence against pathogenicity (BS2 criterion)
|
||||
- Even a single homozygous observation can be informative
|
||||
|
||||
## Coverage at Position
|
||||
|
||||
Always check that gnomAD has adequate coverage at the variant position before concluding absence is meaningful. The gnomAD browser shows coverage tracks, and coverage data can be downloaded from:
|
||||
- https://gnomad.broadinstitute.org/downloads#v4-coverage
|
||||
|
||||
## In Silico Predictor Scores
|
||||
|
||||
| Predictor | Score Range | Pathogenic Threshold |
|-----------|-------------|---------------------|
| CADD PHRED | 0–99 | > 20 deleterious; > 30 highly deleterious |
| REVEL | 0–1 | > 0.75 likely pathogenic (for missense) |
| SpliceAI max_ds | 0–1 | > 0.5 likely splice-altering |
| SIFT | 0–1 | < 0.05 deleterious |
| PolyPhen-2 | 0–1 | > 0.909 probably damaging |
|
||||
|
||||
## Ancestry-Specific Considerations
|
||||
|
||||
- A variant rare overall may be a common founder variant in a specific population
|
||||
- Always check all ancestry-specific AFs, not just the total
|
||||
- Finnish and Ashkenazi Jewish populations have high rates of founder variants
|
||||
- Report ancestry-specific frequencies when relevant to patient ancestry
|
||||