mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-29 07:43:46 +08:00
Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
219
scientific-skills/gnomad-database/references/graphql_queries.md
Normal file
@@ -0,0 +1,219 @@
# gnomAD GraphQL Query Reference

## API Endpoint

```
POST https://gnomad.broadinstitute.org/api
Content-Type: application/json

Body: { "query": "<graphql_query>", "variables": { ... } }
```
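
The request body above can be assembled without sending anything. The following is a minimal sketch: `build_payload` is a hypothetical helper name, and the query text and variable values are placeholders chosen for illustration.

```python
import json
from typing import Optional


def build_payload(query: str, variables: Optional[dict] = None) -> str:
    """Serialize a GraphQL query and its variables into a JSON request body."""
    body = {"query": query, "variables": variables or {}}
    return json.dumps(body)


# Placeholder query for illustration only.
query = """
query GeneId($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
  }
}
"""

payload = build_payload(query, {"gene_symbol": "BRCA1", "reference_genome": "GRCh38"})
```

Passing variables separately, rather than formatting them into the query string, keeps the query text constant and leaves escaping to the JSON encoder.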

## Dataset Identifiers

| ID | Description | Reference Genome |
|----|-------------|-----------------|
| `gnomad_r4` | gnomAD v4 exomes (730K individuals) | GRCh38 |
| `gnomad_r4_genomes` | gnomAD v4 genomes (76K individuals) | GRCh38 |
| `gnomad_r3` | gnomAD v3 genomes (76K individuals) | GRCh38 |
| `gnomad_r2_1` | gnomAD v2 exomes (125K individuals) | GRCh37 |
| `gnomad_r2_1_non_cancer` | v2 non-cancer subset | GRCh37 |
| `gnomad_cnv_r4` | Copy number variants | GRCh38 |

## Core Query Templates

### 1. Variants in a Gene

```graphql
query GeneVariants($gene_symbol: String!, $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_symbol
    chrom
    start
    stop
    variants(dataset: $dataset) {
      variant_id
      pos
      ref
      alt
      consequence
      lof
      lof_flags
      lof_filter
      genome {
        af
        ac
        an
        ac_hom
        populations { id ac an af ac_hom }
      }
      exome {
        af
        ac
        an
        ac_hom
        populations { id ac an af ac_hom }
      }
      rsids
      clinvar_variation_id
      in_silico_predictors { id value flags }
    }
  }
}
```

### 2. Single Variant Lookup

```graphql
query VariantDetails($variantId: String!, $dataset: DatasetId!) {
  variant(variantId: $variantId, dataset: $dataset) {
    variant_id
    chrom
    pos
    ref
    alt
    consequence
    lof
    lof_flags
    rsids
    genome { af ac an ac_hom populations { id ac an af } }
    exome { af ac an ac_hom populations { id ac an af } }
    in_silico_predictors { id value flags }
    clinvar_variation_id
  }
}
```

**Variant ID format:** `{chrom}-{pos}-{ref}-{alt}` (e.g., `17-43094692-G-A`)
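
A small sketch of building and parsing IDs in this format. The helper names are hypothetical, and stripping a leading `chr` prefix and upper-casing alleles are assumptions about how inputs might need normalizing, not documented API behavior.

```python
from typing import NamedTuple


class VariantId(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def _normalize_chrom(chrom: str) -> str:
    # Assumption: IDs use bare chromosome names ("17"), so drop any "chr" prefix.
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def format_variant_id(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a {chrom}-{pos}-{ref}-{alt} identifier."""
    return f"{_normalize_chrom(chrom)}-{int(pos)}-{ref.upper()}-{alt.upper()}"


def parse_variant_id(variant_id: str) -> VariantId:
    """Split a {chrom}-{pos}-{ref}-{alt} identifier into typed fields."""
    parts = variant_id.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected chrom-pos-ref-alt, got {variant_id!r}")
    chrom, pos, ref, alt = parts
    return VariantId(_normalize_chrom(chrom), int(pos), ref.upper(), alt.upper())
```

For example, `parse_variant_id("17-43094692-G-A")` yields the chromosome as a string and the position as an integer, so the result can be compared numerically against a region's `start` and `stop`.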

### 3. Gene Constraint

```graphql
query GeneConstraint($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    gene_id
    gene_symbol
    gnomad_constraint {
      exp_lof exp_mis exp_syn
      obs_lof obs_mis obs_syn
      oe_lof oe_mis oe_syn
      oe_lof_lower oe_lof_upper
      oe_mis_lower oe_mis_upper
      lof_z mis_z syn_z
      pLI
      flags
    }
  }
}
```

### 4. Region Query (by genomic position)

```graphql
query RegionVariants($chrom: String!, $start: Int!, $stop: Int!, $dataset: DatasetId!, $reference_genome: ReferenceGenomeId!) {
  region(chrom: $chrom, start: $start, stop: $stop, reference_genome: $reference_genome) {
    variants(dataset: $dataset) {
      variant_id
      pos
      ref
      alt
      consequence
      genome { af ac an }
      exome { af ac an }
    }
  }
}
```

### 5. ClinVar Variants in Gene

```graphql
query ClinVarVariants($gene_symbol: String!, $reference_genome: ReferenceGenomeId!) {
  gene(gene_symbol: $gene_symbol, reference_genome: $reference_genome) {
    clinvar_variants {
      variant_id
      pos
      ref
      alt
      clinical_significance
      clinvar_variation_id
      gold_stars
      major_consequence
      in_gnomad
      gnomad_exomes { ac an af }
    }
  }
}
```

## Population IDs

| ID | Population |
|----|-----------|
| `afr` | African/African American |
| `ami` | Amish |
| `amr` | Admixed American |
| `asj` | Ashkenazi Jewish |
| `eas` | East Asian |
| `fin` | Finnish |
| `mid` | Middle Eastern |
| `nfe` | Non-Finnish European |
| `sas` | South Asian |
| `remaining` | Other/Unassigned |
| `XX` | Female (appended to above, e.g., `afr_XX`) |
| `XY` | Male |
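
Because sex suffixes are appended to ancestry IDs, code that iterates over `populations` usually needs to separate the two parts. A minimal sketch follows; `split_population_id` is a hypothetical helper, and it assumes the suffix is always joined with an underscore as in the `afr_XX` example.

```python
from typing import Optional, Tuple

SEX_IDS = {"XX", "XY"}


def split_population_id(pop_id: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (ancestry, sex) for a population ID; either part may be None."""
    if pop_id in SEX_IDS:
        # Sex-only entry covering all ancestries.
        return None, pop_id
    prefix, sep, suffix = pop_id.rpartition("_")
    if sep and suffix in SEX_IDS:
        # Sex-stratified ancestry entry such as "afr_XX".
        return prefix, suffix
    # Plain ancestry entry such as "nfe" or "remaining".
    return pop_id, None
```

Filtering on `sex is None` keeps only the ancestry-level rows, which avoids counting the same alleles twice when summing across populations.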

## LoF Annotation Fields

| Field | Values | Meaning |
|-------|--------|---------|
| `lof` | `HC`, `LC`, `null` | High/low-confidence LoF, or not annotated as LoF |
| `lof_flags` | comma-separated strings | Quality flags (e.g., `NAGNAG_SITE`, `NON_CANONICAL_SPLICE_SITE`) |
| `lof_filter` | string or null | Reason for LC classification |

## In Silico Predictor IDs

Common values for `in_silico_predictors[].id`:
- `cadd`: CADD PHRED score
- `revel`: REVEL score
- `spliceai_ds_max`: SpliceAI max delta score
- `pangolin_largest_ds`: Pangolin splicing score
- `polyphen`: PolyPhen-2 prediction
- `sift`: SIFT prediction

## Python Helper

```python
import requests
import time


def gnomad_query(query: str, variables: dict, retries: int = 3) -> dict:
    """Execute a gnomAD GraphQL query with retry logic."""
    url = "https://gnomad.broadinstitute.org/api"
    headers = {"Content-Type": "application/json"}

    for attempt in range(retries):
        try:
            response = requests.post(
                url,
                json={"query": query, "variables": variables},
                headers=headers,
                timeout=60,
            )
            response.raise_for_status()
            result = response.json()

            # GraphQL reports query-level errors in the response body,
            # so log them and hand the full result back to the caller.
            if "errors" in result:
                print(f"GraphQL errors: {result['errors']}")

            return result
        except requests.exceptions.RequestException:
            if attempt < retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff: 1 s, 2 s, ...
            else:
                raise

    # Only reached when retries <= 0, i.e. no request was attempted.
    return {}
```
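
Once a query returns, the nested response usually needs flattening before analysis. The sketch below assumes a response shaped like the "Variants in a Gene" template and treats a missing `exome` or `genome` block as null. `flatten_gene_variants` is a hypothetical helper, and the sample response is invented for illustration, not real gnomAD data.

```python
def flatten_gene_variants(result: dict) -> list:
    """Turn a GeneVariants response into one flat dict per variant."""
    gene = (result.get("data") or {}).get("gene") or {}
    rows = []
    for variant in gene.get("variants") or []:
        # Either block may be null when the variant was seen in only one data type.
        exome = variant.get("exome") or {}
        genome = variant.get("genome") or {}
        rows.append({
            "variant_id": variant["variant_id"],
            "consequence": variant.get("consequence"),
            "exome_af": exome.get("af"),
            "genome_af": genome.get("af"),
        })
    return rows


# Invented sample response (not real gnomAD data).
sample = {
    "data": {
        "gene": {
            "gene_symbol": "EXAMPLE",
            "variants": [
                {"variant_id": "1-100-A-G", "consequence": "missense_variant",
                 "exome": {"af": 0.001}, "genome": None},
                {"variant_id": "1-200-C-T", "consequence": "stop_gained",
                 "exome": None, "genome": {"af": 0.00002}},
            ],
        }
    }
}

rows = flatten_gene_variants(sample)
```

With a live connection, the same function would be applied to the return value of `gnomad_query(...)`; the resulting list of dicts can be passed directly to a CSV writer or a DataFrame constructor.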
@@ -0,0 +1,85 @@
# gnomAD Variant Interpretation Guide

## Allele Frequency Thresholds for Disease Interpretation

### ACMG/AMP Criteria

| Criterion | AF threshold | Classification |
|-----------|-------------|----------------|
| BA1 | > 0.05 (5%) | Benign Stand-Alone |
| BS1 | > disease prevalence (greater than expected for the disorder) | Benign Strong |
| PM2_Supporting | Absent or extremely rare for dominant; < 0.0001 (0.01%) for recessive | Pathogenic Moderate → Supporting |

**Notes:**
- BA1 at 5% applies to most conditions; gene-specific expert panel specifications set a lower BA1 threshold for some highly penetrant autosomal dominant conditions (e.g., LDLR for FH, where the threshold is well below 5%)
- BS1 requires knowing disease prevalence; for rare diseases (1:10,000), BS1 if AF > 0.01%
- Homozygous counts (`ac_hom`) matter for recessive diseases
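
The BA1 and BS1 rules above can be sketched as a simple threshold check. This is a deliberately simplified illustration: `frequency_criterion` is a hypothetical helper, the BS1 cutoff is taken to be the disease prevalence exactly as in the note above, and real curation typically refines it with penetrance and heterogeneity assumptions.

```python
from typing import Optional


def frequency_criterion(
    af: float,
    disease_prevalence: float,
    ba1_threshold: float = 0.05,
) -> Optional[str]:
    """Return "BA1", "BS1", or None for an allele frequency.

    ba1_threshold defaults to 5%; pass a lower value where a
    gene-specific specification calls for one.
    """
    if af > ba1_threshold:
        return "BA1"
    if af > disease_prevalence:
        return "BS1"
    return None
```

For a disorder with prevalence 1:10,000 (`disease_prevalence=0.0001`), an AF of 0.0005 returns `"BS1"`, while an AF of 0.00005 returns `None` and leaves the frequency evidence neutral.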

### Practical Thresholds

| Inheritance | Suggested max AF |
|-------------|-----------------|
| Autosomal Dominant (high penetrance) | < 0.001 (0.1%) |
| Autosomal Dominant (reduced penetrance) | < 0.01 (1%) |
| Autosomal Recessive | < 0.01 (1%) |
| X-linked recessive | < 0.001 in females |

## Absence in gnomAD

A variant **absent in gnomAD** (ac = 0) is evidence of rarity, but interpret carefully:
- gnomAD does not capture all rare variants (sequencing depth, coverage, calling thresholds)
- A variant absent in 730K exomes is very strong evidence of rarity for PM2
- Check coverage at the position: if < 10x, absence is less informative
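
The coverage caveat can be made explicit in code. The sketch below is an assumption-laden illustration: `absence_is_informative` is a hypothetical helper, and the coverage value is expected to come from a separate lookup (for example the downloadable coverage files), since none of the query templates in this reference return it.

```python
from typing import Optional


def absence_is_informative(
    ac: int,
    mean_coverage: Optional[float],
    min_coverage: float = 10.0,
) -> bool:
    """True only when the variant is absent AND the site is adequately covered."""
    if ac > 0:
        # The variant is present, so "absence" does not apply.
        return False
    if mean_coverage is None:
        # Unknown coverage: do not treat absence as evidence.
        return False
    return mean_coverage >= min_coverage
```

Treating unknown coverage as uninformative is the conservative choice here; it prevents a missing coverage lookup from silently supporting PM2.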

## Loss-of-Function Variant Assessment

### LOFTEE Classification (lof field)

- **HC (High Confidence):** Predicted to truncate functional protein
  - Stop-gained, splice site (±1,2), frameshift variants
  - Passes all LOFTEE quality filters

- **LC (Low Confidence):** LoF annotation with quality concerns
  - Check `lof_filter` for the specific reason, and `lof_flags` for additional warnings
  - May still be pathogenic; requires manual review

### Common lof_flags

| Flag | Meaning |
|------|---------|
| `NAGNAG_SITE` | Splice site may be rescued by nearby alternative site |
| `NON_CANONICAL_SPLICE_SITE` | Not a canonical splice donor/acceptor |
| `PHYLOCSF_WEAK` | Weak phylogenetic conservation signal |
| `SMALL_INTRON` | Intron too small to affect splicing |
| `SINGLE_EXON` | Single-exon gene (no splicing) |
| `LAST_EXON` | In last exon (NMD may not apply) |
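
A possible triage of the three LoF fields is sketched below. The category names and the `triage_lof` helper are hypothetical, and the sketch assumes `lof_flags` arrives as a single comma-separated string, as described in the query reference.

```python
from typing import Optional


def triage_lof(
    lof: Optional[str],
    lof_flags: Optional[str] = None,
    lof_filter: Optional[str] = None,
) -> dict:
    """Summarize LOFTEE fields into a category plus a manual-review hint."""
    flags = [f.strip() for f in (lof_flags or "").split(",") if f.strip()]
    if lof is None:
        return {"category": "not_lof", "flags": flags, "needs_review": False}
    if lof == "HC":
        # High confidence, but any flag is worth a second look.
        category = "high_confidence_flagged" if flags else "high_confidence"
        return {"category": category, "flags": flags, "needs_review": bool(flags)}
    # LC (or any unexpected value): always route to manual review.
    return {
        "category": "low_confidence",
        "flags": flags,
        "reason": lof_filter,
        "needs_review": True,
    }
```

Routing every non-HC value to review means an unexpected annotation is surfaced to a curator instead of being discarded.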

## Homozygous Observations

The `ac_hom` field counts homozygous individuals. Hemizygous observations (males on chrX) are reported separately, in the `ac_hemi` field.

**For recessive diseases:**
- If a variant is observed homozygous in healthy individuals in gnomAD, it is strong evidence against pathogenicity (BS2 criterion)
- Even a single homozygous observation can be informative
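
Homozygote counts are reported separately for exomes and genomes, so a combined count needs both blocks. A minimal sketch, assuming a variant dict shaped like the single-variant query response; `total_homozygotes` is a hypothetical helper and the sample values are invented.

```python
def total_homozygotes(variant: dict) -> int:
    """Sum ac_hom across the exome and genome blocks, treating null as zero."""
    total = 0
    for source in ("exome", "genome"):
        block = variant.get(source) or {}
        total += block.get("ac_hom") or 0
    return total


# Invented sample values for illustration.
sample_variant = {
    "variant_id": "1-100-A-G",
    "exome": {"ac": 12, "an": 100000, "ac_hom": 1},
    "genome": None,
}

hom_count = total_homozygotes(sample_variant)
```

A nonzero result is a prompt to look at the individual observations, not an automatic BS2 call, since the criterion depends on the disease's penetrance and age of onset.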

## Coverage at Position

Always check that gnomAD has adequate coverage at the variant position before concluding absence is meaningful. The gnomAD browser shows coverage tracks, and coverage data can be downloaded from:
- https://gnomad.broadinstitute.org/downloads#v4-coverage

## In Silico Predictor Scores

| Predictor | Score Range | Pathogenic Threshold |
|-----------|-------------|---------------------|
| CADD PHRED | 0–99 | > 20 deleterious; > 30 highly deleterious |
| REVEL | 0–1 | > 0.75 likely pathogenic (for missense) |
| SpliceAI max_ds | 0–1 | > 0.5 likely splice-altering |
| SIFT | 0–1 | < 0.05 deleterious |
| PolyPhen-2 | 0–1 | > 0.909 probably damaging |
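
The thresholds in this table can be applied to an `in_silico_predictors` list as sketched below. The mapping from table rows to predictor IDs (`cadd`, `revel`, `spliceai_ds_max`, `sift`, `polyphen`) follows the ID list in the query reference; `damaging_predictors` is a hypothetical helper, and values are passed through `float()` on the assumption that they may arrive as strings or be non-numeric.

```python
# (direction, cutoff) per predictor ID, taken from the table above.
THRESHOLDS = {
    "cadd": (">", 20.0),
    "revel": (">", 0.75),
    "spliceai_ds_max": (">", 0.5),
    "sift": ("<", 0.05),
    "polyphen": (">", 0.909),
}


def damaging_predictors(predictors: list) -> list:
    """Return the IDs of predictors whose score crosses its threshold."""
    hits = []
    for entry in predictors:
        rule = THRESHOLDS.get(entry.get("id"))
        if rule is None:
            continue  # predictor not covered by the table
        try:
            value = float(entry.get("value"))
        except (TypeError, ValueError):
            continue  # missing or non-numeric score
        direction, cutoff = rule
        if (direction == ">" and value > cutoff) or (direction == "<" and value < cutoff):
            hits.append(entry["id"])
    return hits
```

Note that SIFT is the only predictor here where lower scores are more damaging, which is why the direction is stored alongside each cutoff.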

## Ancestry-Specific Considerations

- A variant rare overall may be a common founder variant in a specific population
- Always check all ancestry-specific AFs, not just the total
- Finnish and Ashkenazi Jewish populations have high rates of founder variants
- Report ancestry-specific frequencies when relevant to patient ancestry
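
Checking every ancestry group can be automated with a scan over the `populations` list. The sketch below recomputes AF from `ac` and `an` and skips sex-stratified entries; `highest_ancestry_af` is a hypothetical helper, the sample counts are invented, and the result is a plain maximum, not a confidence-adjusted filtering allele frequency.

```python
from typing import Optional, Tuple


def highest_ancestry_af(populations: list) -> Optional[Tuple[str, float]]:
    """Return (population_id, af) for the ancestry group with the highest AF."""
    best = None
    for pop in populations:
        pop_id = pop["id"]
        # Skip sex-only ("XX") and sex-stratified ("afr_XX") entries.
        if pop_id in ("XX", "XY") or pop_id.endswith(("_XX", "_XY")):
            continue
        an = pop.get("an") or 0
        if an == 0:
            continue  # no called alleles, AF undefined
        af = (pop.get("ac") or 0) / an
        if best is None or af > best[1]:
            best = (pop_id, af)
    return best


# Invented counts for illustration.
sample_populations = [
    {"id": "afr", "ac": 1, "an": 10000},
    {"id": "asj", "ac": 30, "an": 5000},
    {"id": "fin", "ac": 0, "an": 0},
    {"id": "XX", "ac": 20, "an": 100},
]

top = highest_ancestry_af(sample_populations)
```

Small groups can produce a high AF from very few alleles, so it is worth reporting `ac` and `an` alongside the frequency when a founder effect is suspected.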