mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git synced 2026-03-27 07:09:27 +08:00


huangkuanlin 7f94783fab Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation

- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.

2026-03-03 07:15:36 -05:00

5.3 KiB


IQ-TREE 2 Phylogenetic Inference Reference

Basic Command Syntax

iqtree2 -s alignment.fasta --prefix output -m TEST -B 1000 -T AUTO --redo

Key Parameters

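| Flag | Description | Default |
|---|---|---|
| `-s` | Input alignment file | Required |
| `--prefix` | Output file prefix | Alignment file name |
| `-m` | Substitution model, or `TEST`/`MFP` for automatic model selection | `MFP` (ModelFinder) |
| `-B` | Ultrafast bootstrap replicates (at least 1000) | Off |
| `-b` | Standard nonparametric bootstrap replicates (slow) | Off |
| `-alrt` | SH-aLRT test replicates | Off |
| `-T` | Number of threads (or `AUTO`) | 1 |
| `-o` | Outgroup taxon name(s), comma-separated; used to root the output tree for display | First taxon in the alignment |
| `--redo` | Ignore checkpoints and overwrite existing results | Off |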
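When IQ-TREE is driven from a Python pipeline, the flags in the table map directly onto an argument list. The sketch below is a minimal illustration, not part of IQ-TREE: `build_iqtree_cmd` and `as_shell_line` are hypothetical helpers that only assemble the command. Options left as `None` are omitted so IQ-TREE's own defaults apply; the helper enforces the 1000-replicate minimum for `-B` and joins several outgroup taxa with commas.

```python
import shlex


def build_iqtree_cmd(alignment, prefix=None, model=None, ufboot=None,
                     std_boot=None, alrt=None, threads=None, outgroup=None,
                     redo=False):
    """Assemble an iqtree2 argument list (hypothetical helper; runs nothing)."""
    cmd = ["iqtree2", "-s", alignment]
    if prefix is not None:
        cmd += ["--prefix", prefix]
    if model is not None:
        cmd += ["-m", model]
    if ufboot is not None:
        if ufboot < 1000:
            # UFBoot needs at least 1000 replicates
            raise ValueError("-B requires >= 1000 replicates")
        cmd += ["-B", str(ufboot)]
    if std_boot is not None:
        cmd += ["-b", str(std_boot)]
    if alrt is not None:
        cmd += ["-alrt", str(alrt)]
    if threads is not None:
        cmd += ["-T", str(threads)]  # an int, or the string "AUTO"
    if outgroup is not None:
        # Several outgroup taxa are passed as one comma-separated argument
        if isinstance(outgroup, (list, tuple)):
            outgroup = ",".join(outgroup)
        cmd += ["-o", outgroup]
    if redo:
        cmd.append("--redo")
    return cmd


def as_shell_line(cmd):
    """Quote-safe single string, e.g. for writing to a run log."""
    return shlex.join(cmd)
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell-quoting problems with unusual taxon names; the shell string is only for logging.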

Model Selection

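# Automatic model selection, then tree search (-m TEST: standard jModelTest/ProtTest-style test;
# -m MFP: extended ModelFinder search that also considers FreeRate (+R) models)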
iqtree2 -s alignment.fasta -m TEST --prefix test_run -B 1000 -T 4

# Specify model explicitly
iqtree2 -s alignment.fasta -m GTR+G4 --prefix gtr_run -B 1000

# Protein sequences
iqtree2 -s protein.fasta -m TEST --prefix prot_tree -B 1000

# Codon-based analysis (-st CODON makes IQ-TREE read the alignment as codons)
iqtree2 -s codon.fasta -st CODON -m GY --prefix codon_tree -B 1000
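Automatic model selection ranks candidate models by an information criterion computed from each model's maximized log-likelihood ln L, its number of free parameters k and a sample size n (ModelFinder uses BIC by default): AIC = -2 ln L + 2k, AICc = AIC + 2k(k + 1)/(n - k - 1), BIC = -2 ln L + k ln n. The sketch below only evaluates those formulas, taking n as the number of alignment sites; `information_criteria` and `best_model` are hypothetical names, and any numbers you feed them should come from your own run.

```python
import math


def information_criteria(log_likelihood, k, n_sites):
    """Return AIC, AICc and BIC for a model with k free parameters."""
    if n_sites <= k + 1:
        raise ValueError("AICc needs n_sites > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    aicc = aic + (2.0 * k * (k + 1)) / (n_sites - k - 1)
    bic = -2.0 * log_likelihood + k * math.log(n_sites)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def best_model(fits, n_sites, criterion="BIC"):
    """fits maps model name -> (lnL, k); the lowest score wins."""
    scores = {name: information_criteria(lnl, k, n_sites)[criterion]
              for name, (lnl, k) in fits.items()}
    return min(scores, key=scores.get), scores
```

The penalty term is what lets a simpler model win: an extra parameter is only worth it if it raises ln L by more than 1 (AIC) or by ln(n)/2 (BIC).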

Bootstrapping Methods

Ultrafast Bootstrap (UFBoot, recommended)

iqtree2 -s alignment.fasta -B 1000  # 1000 replicates
# Trust a clade only if UFBoot ≥ 95; UFBoot values are not on the standard bootstrap scale
# Typically much faster than standard bootstrap

Standard Bootstrap

iqtree2 -s alignment.fasta -b 100  # 100 replicates (very slow)
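Standard bootstrap is slow because every replicate is a full ML tree search on a pseudo-alignment built by resampling alignment columns with replacement. The resampling step on its own can be sketched as follows; `bootstrap_alignment` is a hypothetical illustration of the idea, not IQ-TREE's implementation, and it assumes the alignment is held as a dict of equal-length sequences.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw one nonparametric bootstrap pseudo-alignment.

    alignment maps taxon name -> aligned sequence. Column indices are sampled
    with replacement, and every taxon receives the same sampled columns, so
    each resampled site pattern stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_sites = lengths.pop()
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}
```

Passing an explicit `random.Random(seed)` keeps replicates reproducible; 100 replicates means 100 such alignments, each followed by a complete tree search.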

SH-aLRT Test (fast alternative)

iqtree2 -s alignment.fasta -alrt 1000 -B 1000  # Both SH-aLRT and UFBoot
# SH-aLRT ≥80 AND UFBoot ≥95 = well-supported branch
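When both `-alrt` and `-B` are given, the internal branches of `{prefix}.treefile` carry labels of the form `SH-aLRT/UFBoot` (for example `87.5/96`). A minimal sketch, assuming exactly that label format directly after a closing parenthesis, that pulls out the pairs with a regular expression and applies the joint 80/95 rule; `support_pairs` and `well_supported` are hypothetical helpers, and a regex is no substitute for a real Newick parser on unusual trees.

```python
import re

# ")<SH-aLRT>/<UFBoot>" right after a closing parenthesis, e.g. ")87.5/96:0.05"
_SUPPORT = re.compile(r"\)(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)")


def support_pairs(newick):
    """Return (SH-aLRT, UFBoot) pairs for every labelled internal branch."""
    return [(float(a), float(b)) for a, b in _SUPPORT.findall(newick)]


def well_supported(alrt, ufboot):
    """Joint criterion: SH-aLRT >= 80 and UFBoot >= 95."""
    return alrt >= 80 and ufboot >= 95
```

Counting `sum(well_supported(a, u) for a, u in support_pairs(tree))` gives a quick summary of how many clades pass both tests.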

Branch Support Interpretation

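These bands are rough conventions rather than hard cutoffs. UFBoot values are less biased than standard bootstrap values and are read differently: IQ-TREE's documentation advises trusting a clade only when UFBoot ≥ 95.

| Bootstrap value | Interpretation |
|---|---|
| ≥ 95 | Well supported (strong) |
| 70–94 | Moderately supported |
| 50–69 | Weakly supported |
| < 50 | Unreliable (not supported) |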
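The table translates into a simple threshold function. This hypothetical `classify_support` treats each band as a half-open interval (for example 70 <= x < 95) so that non-integer values such as 94.5 still fall into a category; the cutoffs are the table's conventions, not statistical guarantees.

```python
def classify_support(value):
    """Map a support percentage (0-100) to the categories in the table."""
    if not 0 <= value <= 100:
        raise ValueError("support values are percentages in [0, 100]")
    if value >= 95:
        return "well supported"
    if value >= 70:
        return "moderately supported"
    if value >= 50:
        return "weakly supported"
    return "unreliable"
```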

Output Files

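| File | Description |
|---|---|
| `{prefix}.treefile` | Best ML tree in Newick format |
| `{prefix}.iqtree` | Full analysis report |
| `{prefix}.log` | Computation log |
| `{prefix}.contree` | Consensus tree of the bootstrap trees, with support values |
| `{prefix}.splits.nex` | Support values for all bootstrap splits in NEXUS format (viewable as a network, e.g. in SplitsTree) |
| `{prefix}.bionj` | BioNJ starting tree |
| `{prefix}.model.gz` | ModelFinder results for the tested models, reused when a run is resumed |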
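Which of these files exist depends on the options used; `.contree`, for example, is only written when bootstrapping. A small sketch that reports which outputs are present for a prefix before a downstream step tries to read them; `find_outputs` and `OUTPUT_SUFFIXES` are hypothetical names, and the suffix list simply mirrors the table.

```python
from pathlib import Path

# Suffixes from the output-file table; not every run produces all of them
OUTPUT_SUFFIXES = [".treefile", ".iqtree", ".log", ".contree",
                   ".splits.nex", ".bionj", ".model.gz"]


def find_outputs(prefix):
    """Return {suffix: Path} for the IQ-TREE outputs that exist for prefix."""
    found = {}
    for suffix in OUTPUT_SUFFIXES:
        path = Path(str(prefix) + suffix)
        if path.exists():
            found[suffix] = path
    return found
```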

Advanced Analyses

Molecular Clock (Dating)

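# Temporal analysis with sampling dates
# dates.tsv is tab-separated: taxon_name<TAB>YYYY-MM-DD
# --clock-test: test for clock-like evolution
# --date-ci 100: confidence intervals for node dates from 100 resampling replicates
# (comments sit above the command: a "#" after a trailing "\" breaks the line continuation)
iqtree2 -s alignment.fasta -m GTR+G \
        --date dates.tsv \
        --clock-test \
        --date-ci 100 \
        --prefix dated_tree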
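The dates file is easy to get subtly wrong (spaces instead of tabs, malformed dates). A minimal sketch that writes it from a Python dict in the tab-separated `taxon_name<TAB>YYYY-MM-DD` layout described in the comment above, validating each date first; `write_dates_file` is a hypothetical helper, and it deliberately accepts only full dates even though other date formats may be possible.

```python
import datetime


def write_dates_file(dates, path):
    """Write 'taxon<TAB>YYYY-MM-DD' lines; dates maps taxon -> ISO date string."""
    lines = []
    for taxon, date in dates.items():
        datetime.date.fromisoformat(date)  # raises ValueError on a bad date
        if any(ch.isspace() for ch in taxon):
            raise ValueError(f"taxon name contains whitespace: {taxon!r}")
        lines.append(f"{taxon}\t{date}")
    # Validation finishes before the file is opened, so bad input writes nothing
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Taxon names must match the alignment headers exactly, so generating both from the same metadata table avoids mismatches.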

Concordance Factors

# Gene concordance factor (gCF): requires a file of gene trees, one per locus,
# each inferred from its own gene alignment
iqtree2 -t main_tree.treefile \
        --gcf gene_trees.nwk \
        --cf-verbose \
        --prefix cf_analysis
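If the gene trees were inferred in separate runs, they first have to be gathered into one file with one Newick tree per line. A minimal sketch, assuming each input file holds exactly one Newick tree (as a single-tree `.treefile` does); `combine_gene_trees` is a hypothetical helper.

```python
from pathlib import Path


def combine_gene_trees(tree_files, out_path):
    """Write one Newick tree per line into out_path; return the tree count."""
    trees = []
    for tf in tree_files:
        text = Path(tf).read_text().strip()
        if not text.endswith(";"):
            raise ValueError(f"{tf} does not look like a Newick tree")
        trees.append(text)
    Path(out_path).write_text("\n".join(trees) + "\n")
    return len(trees)
```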

Ancestral Sequence Reconstruction

# -asr: marginal ancestral state reconstruction
iqtree2 -s alignment.fasta -m LG+G4 \
        -asr \
        --prefix anc_tree
# Output: {prefix}.state (ancestral states per node and site)
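To turn the per-site table into one sequence per internal node, the `State` column can be concatenated in site order. A minimal sketch, assuming the layout IQ-TREE 2 writes: `#` comment lines, then a tab-separated table whose header contains `Node`, `Site` and `State` columns (followed by per-state probabilities). Check the header of your own file before relying on it; `read_ancestral_sequences` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def read_ancestral_sequences(state_text):
    """Rebuild {node: sequence} from the contents of a .state file."""
    body = "\n".join(line for line in state_text.splitlines()
                     if line and not line.startswith("#"))
    sites = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(body), delimiter="\t"):
        sites[row["Node"]][int(row["Site"])] = row["State"]
    # Sort by site index so row order in the file does not matter
    return {node: "".join(states[s] for s in sorted(states))
            for node, states in sites.items()}
```

Read the file with `Path("anc_tree.state").read_text()`; the probability columns are ignored here but can be used to flag low-confidence sites.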

Partition Model (Multi-Gene)

# Create partition file (partitions.txt):
# DNA, gene1 = 1-500
# DNA, gene2 = 501-1000

iqtree2 -s concat_alignment.fasta \
        -p partitions.txt \
        -m TEST \
        -B 1000 \
        --prefix partition_tree
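For many genes, writing the coordinates by hand invites off-by-one errors. A minimal sketch that builds the partition lines from gene lengths, assuming 1-based inclusive coordinates and genes concatenated in the order given; `make_partition_file` is a hypothetical helper that reproduces the `DNA, gene1 = 1-500` style shown above.

```python
def make_partition_file(genes, datatype="DNA"):
    """Build partition lines from (name, length) pairs in alignment order."""
    lines, start = [], 1
    for name, length in genes:
        if length <= 0:
            raise ValueError(f"{name}: length must be positive")
        end = start + length - 1  # inclusive end coordinate
        lines.append(f"{datatype}, {name} = {start}-{end}")
        start = end + 1
    return "\n".join(lines) + "\n"
```

The last end coordinate should equal the concatenated alignment length; checking that catches a missing or misordered gene.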

IQ-TREE Log Parsing

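def parse_iqtree_log(log_file: str) -> dict:
    """Extract key results from an IQ-TREE report ({prefix}.iqtree) or log file.

    The likelihood and AIC/BIC statistics are written to the .iqtree report;
    keys whose lines are absent from the given file are simply left out.
    """
    results = {}
    with open(log_file) as f:
        for line in f:
            if ":" not in line:
                continue
            # Split on the first colon only: "0h:0m:12s" contains further colons
            key, value = (part.strip() for part in line.split(":", 1))
            if not value:
                continue
            first = value.split()[0]
            if key.startswith("Best-fit model"):
                # "Best-fit model: GTR+F+G4 chosen according to BIC" (.log) or
                # "Best-fit model according to BIC: GTR+F+G4" (.iqtree)
                results["best_model"] = first
            elif key.startswith("Log-likelihood of the tree"):
                # Value looks like "-12345.678 (s.e. 123.456)"; keep the number
                results["log_likelihood"] = float(first)
            elif key.startswith("Number of free parameters"):
                results["free_params"] = int(first)
            elif key.startswith("Akaike information criterion"):
                # startswith() keeps the "Corrected Akaike ..." (AICc) line out
                results["AIC"] = float(first)
            elif key.startswith("Bayesian information criterion"):
                results["BIC"] = float(first)
            elif key.startswith("Total CPU time used"):
                results["cpu_time"] = value
    return results

# Example:
# results = parse_iqtree_log("output.iqtree")
# print(f"Best model: {results['best_model']}")
# print(f"Log-likelihood: {results['log_likelihood']:.2f}")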

Common Issues and Solutions

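| Issue | Likely cause | Solution |
|---|---|---|
| All bootstrap values = 0 | Too few taxa | Need ≥4 taxa for bootstrap |
| Very long branches | Alignment artifacts | Re-trim the alignment; check for outlier sequences |
| Memory error | Too many sequences | Use FastTree, or reduce `-T` to 1 |
| Poor model fit | Wrong alphabet | Check nucleotide vs. protein specification |
| Identical sequences reported | Duplicate sequences in the input | IQ-TREE sets identical sequences aside during the search and adds them back to the final tree (`--keep-ident` keeps them in the analysis); remove duplicates before alignment if you do not need them |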
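Several of these issues can be caught before IQ-TREE is launched. A minimal pre-flight sketch using only the standard library: `read_fasta` and `check_alignment` are hypothetical helpers that flag fewer than 4 taxa, unequal sequence lengths and case-insensitive duplicate sequences. The FASTA reader is deliberately simple and takes the first word of each header as the name.

```python
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into {name: sequence}."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def check_alignment(seqs):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len(seqs) < 4:
        problems.append(f"only {len(seqs)} taxa; need >= 4 for branch support")
    if len({len(s) for s in seqs.values()}) > 1:
        problems.append("sequences differ in length (not aligned?)")
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq.upper()].append(name)
    for names in groups.values():
        if len(names) > 1:
            problems.append("identical sequences: " + ", ".join(names))
    return problems
```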

MAFFT Alignment Guide

# Accurate (< 200 sequences)
mafft --localpair --maxiterate 1000 input.fasta > aligned.fasta

# Medium (200-1000 sequences)
mafft --auto input.fasta > aligned.fasta

# Fast (> 1000 sequences): FFT-NS-2
mafft --retree 2 --maxiterate 0 input.fasta > aligned.fasta

# Very large (> 10000 sequences)
mafft --retree 1 input.fasta > aligned.fasta

# Using multiple threads
mafft --thread 8 --auto input.fasta > aligned.fasta
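The size bands above can be encoded so a pipeline picks the strategy automatically. A minimal sketch: `mafft_args` is a hypothetical helper that follows the guide's cutoffs (under 200, 200 to 1000, over 1000, over 10000 sequences) and uses `--retree 2 --maxiterate 0`, MAFFT's documented FFT-NS-2 setting, for the fast band. Treat the cutoffs as starting points rather than rules.

```python
def mafft_args(n_sequences, threads=None):
    """Pick MAFFT options by number of sequences, per the guide's bands."""
    if n_sequences < 200:
        args = ["--localpair", "--maxiterate", "1000"]  # L-INS-i, most accurate
    elif n_sequences <= 1000:
        args = ["--auto"]  # let MAFFT choose
    elif n_sequences <= 10000:
        args = ["--retree", "2", "--maxiterate", "0"]  # FFT-NS-2
    else:
        args = ["--retree", "1"]  # FFT-NS-1, fastest
    if threads is not None:
        args = ["--thread", str(threads)] + args
    return ["mafft"] + args
```

MAFFT writes the alignment to standard output, so when calling it via `subprocess.run(mafft_args(n) + ["input.fasta"], stdout=out_file, check=True)` pass an open file as `stdout`.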
