Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation

- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization. - Added a script for running RNA velocity analysis with customizable parameters and output options. - Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation. - Included references for velocity models and their mathematical framework, along with a comparison of different models. - Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
2026-03-27 07:09:27 +08:00 · 2026-03-03 07:15:36 -05:00
parent b271271df4
commit 7f94783fab
27 changed files with 6961 additions and 0 deletions
--- a/scientific-skills/phylogenetics/references/iqtree_inference.md
+++ b/scientific-skills/phylogenetics/references/iqtree_inference.md
@@ -0,0 +1,181 @@
+# IQ-TREE 2 Phylogenetic Inference Reference
+
+## Basic Command Syntax
+
+```bash
+iqtree2 -s alignment.fasta --prefix output -m TEST -B 1000 -T AUTO --redo
+```
+
+## Key Parameters
+
+| Flag | Description | Default |
+|------|-------------|---------|
+| `-s` | Input alignment file | Required |
+| `--prefix` | Output file prefix | alignment name |
+| `-m` | Substitution model (or TEST) | GTR+G |
+| `-B` | Ultrafast bootstrap replicates | Off |
+| `-b` | Standard bootstrap replicates (slow) | Off |
+| `-T` | Number of threads (or AUTO) | 1 |
+| `-o` | Outgroup taxa name(s) | None (unrooted) |
+| `--redo` | Overwrite existing results | Off |
+| `-alrt` | SH-aLRT test replicates | Off |
+
+## Model Selection
+
+```bash
+# Full model testing (automatically selects best model)
+iqtree2 -s alignment.fasta -m TEST --prefix test_run -B 1000 -T 4
+
+# Specify model explicitly
+iqtree2 -s alignment.fasta -m GTR+G4 --prefix gtr_run -B 1000
+
+# Protein sequences
+iqtree2 -s protein.fasta -m TEST --prefix prot_tree -B 1000
+
+# Codon-based analysis
+iqtree2 -s codon.fasta -m GY --prefix codon_tree -B 1000
+```
+
+## Bootstrapping Methods
+
+### Ultrafast Bootstrap (UFBoot, recommended)
+```bash
+iqtree2 -s alignment.fasta -B 1000  # 1000 replicates
+# Values ≥95 are reliable
+# ~10× faster than standard bootstrap
+```
+
+### Standard Bootstrap
+```bash
+iqtree2 -s alignment.fasta -b 100  # 100 replicates (very slow)
+```
+
+### SH-aLRT Test (fast alternative)
+```bash
+iqtree2 -s alignment.fasta -alrt 1000 -B 1000  # Both SH-aLRT and UFBoot
+# SH-aLRT ≥80 AND UFBoot ≥95 = well-supported branch
+```
+
+## Branch Support Interpretation
+
+| Bootstrap Value | Interpretation |
+|----------------|----------------|
+| ≥ 95 | Well-supported (strongly supported) |
+| 70–94 | Moderately supported |
+| 50–69 | Weakly supported |
+| < 50 | Unreliable (not supported) |
+
+## Output Files
+
+| File | Description |
+|------|-------------|
+| `{prefix}.treefile` | Best ML tree in Newick format |
+| `{prefix}.iqtree` | Full analysis report |
+| `{prefix}.log` | Computation log |
+| `{prefix}.contree` | Consensus tree from bootstrap |
+| `{prefix}.splits.nex` | Network splits |
+| `{prefix}.bionj` | BioNJ starting tree |
+| `{prefix}.model.gz` | Saved model parameters |
+
+## Advanced Analyses
+
+### Molecular Clock (Dating)
+
+```bash
+# Temporal analysis with sampling dates
+iqtree2 -s alignment.fasta -m GTR+G \
+        --date dates.tsv \           # Tab-separated: taxon_name  YYYY-MM-DD
+        --clock-test \               # Test for clock-like evolution
+        --date-CI 95 \              # 95% CI for node dates
+        --prefix dated_tree
+```
+
+### Concordance Factors
+
+```bash
+# Gene concordance factor (gCF) - requires multiple gene alignments
+iqtree2 --gcf gene_trees.nwk \
+        --tree main_tree.treefile \
+        --cf-verbose \
+        --prefix cf_analysis
+```
+
+### Ancestral Sequence Reconstruction
+
+```bash
+iqtree2 -s alignment.fasta -m LG+G4 \
+        -asr \                      # Marginal ancestral state reconstruction
+        --prefix anc_tree
+# Output: {prefix}.state (ancestral sequences per node)
+```
+
+### Partition Model (Multi-Gene)
+
+```bash
+# Create partition file (partitions.txt):
+# DNA, gene1 = 1-500
+# DNA, gene2 = 501-1000
+
+iqtree2 -s concat_alignment.fasta \
+        -p partitions.txt \
+        -m TEST \
+        -B 1000 \
+        --prefix partition_tree
+```
+
+## IQ-TREE Log Parsing
+
+```python
+def parse_iqtree_log(log_file: str) -> dict:
+    """Extract key results from IQ-TREE log file."""
+    results = {}
+    with open(log_file) as f:
+        for line in f:
+            if "Best-fit model" in line:
+                results["best_model"] = line.split(":")[1].strip()
+            elif "Log-likelihood of the tree:" in line:
+                results["log_likelihood"] = float(line.split(":")[1].strip())
+            elif "Number of free parameters" in line:
+                results["free_params"] = int(line.split(":")[1].strip())
+            elif "Akaike information criterion" in line:
+                results["AIC"] = float(line.split(":")[1].strip())
+            elif "Bayesian information criterion" in line:
+                results["BIC"] = float(line.split(":")[1].strip())
+            elif "Total CPU time used" in line:
+                results["cpu_time"] = line.split(":")[1].strip()
+    return results
+
+# Example:
+# results = parse_iqtree_log("output.log")
+# print(f"Best model: {results['best_model']}")
+# print(f"Log-likelihood: {results['log_likelihood']:.2f}")
+```
+
+## Common Issues and Solutions
+
+| Issue | Likely Cause | Solution |
+|-------|-------------|---------|
+| All bootstrap values = 0 | Too few taxa | Need ≥4 taxa for bootstrap |
+| Very long branches | Alignment artifacts | Re-trim alignment; check for outliers |
+| Memory error | Too many sequences | Use FastTree; or reduce `-T` to 1 |
+| Poor model fit | Wrong alphabet | Check nucleotide vs. protein specification |
+| Identical sequences | Duplicate sequences | Remove duplicates before alignment |
+
+## MAFFT Alignment Guide
+
+```bash
+# Accurate (< 200 sequences)
+mafft --localpair --maxiterate 1000 input.fasta > aligned.fasta
+
+# Medium (200-1000 sequences)
+mafft --auto input.fasta > aligned.fasta
+
+# Fast (> 1000 sequences)
+mafft --fftns input.fasta > aligned.fasta
+
+# Very large (> 10000 sequences)
+mafft --retree 1 input.fasta > aligned.fasta
+
+# Using multiple threads
+mafft --thread 8 --auto input.fasta > aligned.fasta
+```