mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
181
scientific-skills/phylogenetics/references/iqtree_inference.md
Normal file
# IQ-TREE 2 Phylogenetic Inference Reference

## Basic Command Syntax

```bash
iqtree2 -s alignment.fasta --prefix output -m TEST -B 1000 -T AUTO --redo
```

## Key Parameters

| Flag | Description | Default |
|------|-------------|---------|
| `-s` | Input alignment file | Required |
| `--prefix` | Output file prefix | alignment name |
| `-m` | Substitution model, or TEST/MFP for automatic selection | MFP (ModelFinder Plus) |
| `-B` | Ultrafast bootstrap replicates (minimum 1000) | Off |
| `-b` | Standard bootstrap replicates (slow) | Off |
| `-T` | Number of threads (or AUTO) | 1 |
| `-o` | Outgroup taxa name(s) | None (unrooted) |
| `--redo` | Overwrite existing results | Off |
| `-alrt` | SH-aLRT test replicates | Off |
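
The flags in the table can also be assembled from a script. The following is a minimal sketch rather than anything shipped with IQ-TREE: `build_iqtree_command` and its parameter names are hypothetical, and it only emits flags that appear in the table. It returns an argument list and does not run anything.

```python
from typing import List, Optional


def build_iqtree_command(
    alignment: str,
    prefix: Optional[str] = None,
    model: Optional[str] = None,
    ufboot: Optional[int] = None,
    alrt: Optional[int] = None,
    threads: str = "AUTO",
    outgroup: Optional[str] = None,
    redo: bool = False,
) -> List[str]:
    """Assemble an iqtree2 argument list (hypothetical helper; nothing is executed)."""
    if ufboot is not None and ufboot < 1000:
        # IQ-TREE rejects fewer than 1000 ultrafast bootstrap replicates.
        raise ValueError("UFBoot (-B) needs at least 1000 replicates")

    cmd = ["iqtree2", "-s", alignment]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    if model is not None:
        cmd += ["-m", model]
    if ufboot is not None:
        cmd += ["-B", str(ufboot)]
    if alrt is not None:
        cmd += ["-alrt", str(alrt)]
    cmd += ["-T", str(threads)]
    if outgroup is not None:
        cmd += ["-o", outgroup]
    if redo:
        cmd.append("--redo")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with file names.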

## Model Selection

```bash
# Model testing (selects the best-fit model, then infers the tree)
iqtree2 -s alignment.fasta -m TEST --prefix test_run -B 1000 -T 4

# Specify model explicitly
iqtree2 -s alignment.fasta -m GTR+G4 --prefix gtr_run -B 1000

# Protein sequences
iqtree2 -s protein.fasta -m TEST --prefix prot_tree -B 1000

# Codon-based analysis (-st CODON makes IQ-TREE read the alignment as codons)
iqtree2 -s codon.fasta -st CODON -m GY --prefix codon_tree -B 1000
```
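
Model testing ranks candidate models by information criteria computed from the log-likelihood, the number of free parameters, and the alignment length. The sketch below uses the standard textbook definitions of AIC, AICc, and BIC; the function names are hypothetical, and any candidate values you feed it are your own, not IQ-TREE output.

```python
import math
from typing import Dict, Tuple


def information_criteria(log_likelihood: float, n_params: int, n_sites: int) -> Dict[str, float]:
    """Return AIC, AICc, and BIC for one fitted model (standard definitions)."""
    denominator = n_sites - n_params - 1
    if denominator <= 0:
        raise ValueError("AICc is undefined when n_sites <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    aicc = aic + (2 * n_params * (n_params + 1)) / denominator
    bic = n_params * math.log(n_sites) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def best_model(candidates: Dict[str, Tuple[float, int]], n_sites: int, criterion: str = "BIC") -> str:
    """Pick the candidate with the lowest score.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    return min(
        candidates,
        key=lambda name: information_criteria(*candidates[name], n_sites)[criterion],
    )
```

Lower scores are better for all three criteria. BIC penalizes extra parameters more heavily than AIC once the alignment has more than about eight sites, so it tends to favor simpler models.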

## Bootstrapping Methods

### Ultrafast Bootstrap (UFBoot, recommended)
```bash
iqtree2 -s alignment.fasta -B 1000  # 1000 replicates
# Values ≥95 are reliable
# ~10× faster than standard bootstrap
```

### Standard Bootstrap
```bash
iqtree2 -s alignment.fasta -b 100  # 100 replicates (very slow)
```

### SH-aLRT Test (fast alternative)
```bash
iqtree2 -s alignment.fasta -alrt 1000 -B 1000  # Both SH-aLRT and UFBoot
# SH-aLRT ≥80 AND UFBoot ≥95 = well-supported branch
```
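
When both tests are run, each internal branch carries two support values, and the combined rule above can be checked in a script. The sketch below assumes the node labels in the Newick tree are written as `SH-aLRT/UFBoot` (for example `85.3/97`); the function name and the regular expression are illustrative, so inspect your own tree file before relying on them.

```python
import re
from typing import List, Tuple

# Matches a label such as ")85.3/97" that follows a closing parenthesis.
SUPPORT_LABEL = re.compile(r"\)(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)")


def branch_supports(
    newick: str, alrt_min: float = 80.0, ufboot_min: float = 95.0
) -> List[Tuple[float, float, bool]]:
    """Return (SH-aLRT, UFBoot, well_supported) for every labeled internal branch."""
    supports = []
    for alrt_text, ufboot_text in SUPPORT_LABEL.findall(newick):
        alrt, ufboot = float(alrt_text), float(ufboot_text)
        # A branch counts as well supported only if it passes both thresholds.
        supports.append((alrt, ufboot, alrt >= alrt_min and ufboot >= ufboot_min))
    return supports
```

A regular expression is enough for counting supported branches; for anything that needs the tree topology, a Newick parser is the safer choice.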

## Branch Support Interpretation

| Bootstrap Value | Interpretation |
|----------------|----------------|
| ≥ 95 | Well-supported (strongly supported) |
| 70–94 | Moderately supported |
| 50–69 | Weakly supported |
| < 50 | Unreliable (not supported) |
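
The thresholds in the table translate directly into a small lookup. This is only a sketch of the table above; the function name is hypothetical, and fractional values between the listed integer ranges fall into the lower category.

```python
def interpret_bootstrap(value: float) -> str:
    """Map a bootstrap percentage to the categories in the table above."""
    if not 0 <= value <= 100:
        raise ValueError("bootstrap support must be between 0 and 100")
    if value >= 95:
        return "Well-supported"
    if value >= 70:
        return "Moderately supported"
    if value >= 50:
        return "Weakly supported"
    return "Unreliable"
```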

## Output Files

| File | Description |
|------|-------------|
| `{prefix}.treefile` | Best ML tree in Newick format |
| `{prefix}.iqtree` | Full analysis report |
| `{prefix}.log` | Computation log |
| `{prefix}.contree` | Consensus tree from bootstrap |
| `{prefix}.splits.nex` | Network splits |
| `{prefix}.bionj` | BioNJ starting tree |
| `{prefix}.model.gz` | Saved model parameters |
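
After a run it can be useful to check which of these files were actually written, since some only appear with certain options. The sketch below takes its suffix list from the table above; `EXPECTED_SUFFIXES` and `existing_outputs` are hypothetical names.

```python
from pathlib import Path
from typing import Dict

# Suffixes copied from the output table above.
EXPECTED_SUFFIXES = (
    ".treefile",
    ".iqtree",
    ".log",
    ".contree",
    ".splits.nex",
    ".bionj",
    ".model.gz",
)


def existing_outputs(prefix: str) -> Dict[str, Path]:
    """Return the output files that exist for a given --prefix, keyed by suffix."""
    found = {}
    for suffix in EXPECTED_SUFFIXES:
        # Concatenate instead of Path.with_suffix, which would mangle prefixes containing dots.
        path = Path(prefix + suffix)
        if path.exists():
            found[suffix] = path
    return found
```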

## Advanced Analyses

### Molecular Clock (Dating)

```bash
# Temporal analysis with sampling dates
# dates.tsv is tab-separated: taxon_name YYYY-MM-DD
# --clock-test is intended to test for clock-like evolution; confirm with
#   `iqtree2 -h` that your build accepts it before relying on it
# --date-ci takes the number of replicates used to build confidence
#   intervals on node dates (not a percentage)
iqtree2 -s alignment.fasta -m GTR+G \
    --date dates.tsv \
    --clock-test \
    --date-ci 100 \
    --prefix dated_tree
```
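
The file passed to `--date` pairs each taxon name with its sampling date. As a minimal sketch, the helper below writes such a tab-separated file from a Python mapping; `write_date_file` is a hypothetical name, and the taxon names must match the sequence names in the alignment.

```python
import csv
from datetime import date
from typing import Dict


def write_date_file(sampling_dates: Dict[str, date], path: str) -> None:
    """Write one 'taxon<TAB>YYYY-MM-DD' line per taxon."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for taxon, sampled_on in sampling_dates.items():
            # date.isoformat() yields YYYY-MM-DD.
            writer.writerow([taxon, sampled_on.isoformat()])
```

Using `datetime.date` objects instead of raw strings means malformed dates fail early in Python rather than inside the dating run.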

### Concordance Factors

```bash
# Gene concordance factor (gCF): requires a set of gene trees, one per locus
iqtree2 -t main_tree.treefile \
    --gcf gene_trees.nwk \
    --cf-verbose \
    --prefix cf_analysis
```

### Ancestral Sequence Reconstruction

```bash
# -asr requests marginal ancestral state reconstruction
iqtree2 -s alignment.fasta -m LG+G4 \
    -asr \
    --prefix anc_tree
# Output: {prefix}.state (ancestral sequences per node)
```
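
The per-site states in the `.state` file can be joined into one sequence per ancestral node. The sketch below assumes the file is a tab-separated table with `Node`, `Site`, and `State` columns, preceded by comment lines that start with `#`; check the header of your own file first, since this layout is an assumption here. `ancestral_sequences` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def ancestral_sequences(lines: Iterable[str]) -> Dict[str, str]:
    """Join the most likely state at each site into one sequence per node."""
    # Skip comment lines so the first remaining line is the column header.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")

    states: Dict[str, Dict[int, str]] = defaultdict(dict)
    for row in reader:
        states[row["Node"]][int(row["Site"])] = row["State"]

    # Order by site number in case rows are not sorted.
    return {
        node: "".join(sites[index] for index in sorted(sites))
        for node, sites in states.items()
    }
```

Pass an open file handle, for example `ancestral_sequences(open("anc_tree.state"))`. This keeps only the single most likely state per site and discards the posterior probabilities, so low-confidence sites are not flagged.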

### Partition Model (Multi-Gene)

```bash
# Create partition file (partitions.txt):
# DNA, gene1 = 1-500
# DNA, gene2 = 501-1000

iqtree2 -s concat_alignment.fasta \
    -p partitions.txt \
    -m TEST \
    -B 1000 \
    --prefix partition_tree
```
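
When genes are concatenated in a known order, the partition file can be generated from their lengths instead of being typed by hand. This is a minimal sketch that reproduces the `DNA, gene1 = 1-500` line format shown in the comments above; `partition_lines` and the gene names are hypothetical.

```python
from typing import List, Tuple


def partition_lines(gene_lengths: List[Tuple[str, int]], data_type: str = "DNA") -> List[str]:
    """Build partition lines from (gene_name, alignment_length) pairs in concatenation order."""
    lines = []
    start = 1  # partition coordinates are 1-based and inclusive
    for name, length in gene_lengths:
        if length <= 0:
            raise ValueError(f"{name}: length must be positive")
        end = start + length - 1
        lines.append(f"{data_type}, {name} = {start}-{end}")
        start = end + 1
    return lines
```

Writing the result is then a matter of `open("partitions.txt", "w").write("\n".join(lines) + "\n")`.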

## IQ-TREE Log Parsing

```python
def parse_iqtree_log(log_file: str) -> dict:
    """Extract key results from an IQ-TREE report (.iqtree) or log (.log) file."""
    # Numeric fields: phrase to look for -> (result key, type). The corrected
    # AIC comes first because its phrase also contains the plain AIC phrase.
    numeric_fields = {
        "Log-likelihood of the tree:": ("log_likelihood", float),
        "Number of free parameters": ("free_params", int),
        "Corrected Akaike information criterion": ("AICc", float),
        "Akaike information criterion": ("AIC", float),
        "Bayesian information criterion": ("BIC", float),
    }
    results = {}
    with open(log_file) as f:
        for line in f:
            if ":" not in line:
                continue
            # Split on the first colon only; run times such as 0h:0m:3s contain colons.
            value = line.split(":", 1)[1].strip()
            if not value:
                continue
            if "Best-fit model" in line:
                # Keep the model name and drop any text that follows it.
                results["best_model"] = value.split()[0]
            elif "Total CPU time used" in line:
                results["cpu_time"] = value
            else:
                for phrase, (key, cast) in numeric_fields.items():
                    if phrase in line:
                        try:
                            # First token only, so trailing notes after the number are ignored.
                            results[key] = cast(value.split()[0])
                        except ValueError:
                            pass  # descriptive line that mentions the phrase without a number
                        break
    return results


# Example:
# results = parse_iqtree_log("output.log")
# print(f"Best model: {results['best_model']}")
# print(f"Log-likelihood: {results['log_likelihood']:.2f}")
```

## Common Issues and Solutions

| Issue | Likely Cause | Solution |
|-------|-------------|---------|
| All bootstrap values = 0 | Too few taxa | Need ≥4 taxa for bootstrap |
| Very long branches | Alignment artifacts | Re-trim alignment; check for outliers |
| Memory error | Too many sequences | Use FastTree; or reduce `-T` to 1 |
| Poor model fit | Wrong alphabet | Check nucleotide vs. protein specification |
| Identical sequences | Duplicate sequences | Remove duplicates before alignment |
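
Removing duplicates before alignment, as the last row suggests, can be scripted. The sketch below is one simple way to do it, assuming unaligned FASTA input and treating sequences as identical when they match after upper-casing; `read_fasta` and `drop_duplicate_sequences` are hypothetical helpers, not part of IQ-TREE or MAFFT.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (name, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Use the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def drop_duplicate_sequences(records: List[Record]) -> Tuple[List[Record], Dict[str, List[str]]]:
    """Keep the first copy of each sequence; report which names were dropped."""
    first_seen: Dict[str, str] = {}
    kept: List[Record] = []
    dropped: Dict[str, List[str]] = {}
    for name, sequence in records:
        key = sequence.upper()
        if key in first_seen:
            dropped.setdefault(first_seen[key], []).append(name)
        else:
            first_seen[key] = name
            kept.append((name, sequence))
    return kept, dropped
```

The `dropped` mapping records which names were collapsed into each retained sequence, so the removed taxa can be added back to the tree description afterwards.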

## MAFFT Alignment Guide

```bash
# Accurate (< 200 sequences)
mafft --localpair --maxiterate 1000 input.fasta > aligned.fasta

# Medium (200-1000 sequences)
mafft --auto input.fasta > aligned.fasta

# Fast (> 1000 sequences)
mafft --fftns input.fasta > aligned.fasta

# Very large (> 10000 sequences)
mafft --retree 1 input.fasta > aligned.fasta

# Using multiple threads
mafft --thread 8 --auto input.fasta > aligned.fasta
```
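
The size thresholds in the guide can be turned into a small selector. This is a sketch under stated assumptions: the cut-offs come from the comments above, the handling of the exact boundary values (200, 1000, 10000) is a choice made here, and `mafft_command` is a hypothetical name. Only the options shown in the guide are used.

```python
from typing import List, Optional


def mafft_command(n_sequences: int, input_fasta: str, threads: Optional[int] = None) -> List[str]:
    """Choose MAFFT options by input size, following the guide above."""
    if n_sequences < 1:
        raise ValueError("n_sequences must be positive")
    if n_sequences < 200:
        options = ["--localpair", "--maxiterate", "1000"]  # accurate
    elif n_sequences <= 1000:
        options = ["--auto"]  # medium
    elif n_sequences <= 10000:
        options = ["--fftns"]  # fast
    else:
        options = ["--retree", "1"]  # very large
    cmd = ["mafft"]
    if threads is not None:
        cmd += ["--thread", str(threads)]
    return cmd + options + [input_fasta]
```

MAFFT writes the alignment to standard output, so a caller would capture it, for example with `subprocess.run(cmd, stdout=open("aligned.fasta", "w"), check=True)`.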