Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation

- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
2026-03-27 07:09:27 +08:00 · 2026-03-03 07:15:36 -05:00
parent b271271df4
commit 7f94783fab
27 changed files with 6961 additions and 0 deletions
--- a/scientific-skills/scvelo/SKILL.md
+++ b/scientific-skills/scvelo/SKILL.md
@@ -0,0 +1,321 @@
+---
+name: scvelo
+description: RNA velocity analysis with scVelo. Estimate cell state transitions from unspliced/spliced mRNA dynamics, infer trajectory directions, compute latent time, and identify driver genes in single-cell RNA-seq data. Complements Scanpy/scVI-tools for trajectory inference.
+license: BSD-3-Clause
+metadata:
+    skill-author: Kuan-lin Huang
+---
+
+# scVelo — RNA Velocity Analysis
+
+## Overview
+
+scVelo is a widely used Python package for RNA velocity analysis of single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing: from the relative abundance of unspliced (pre-mRNA) and spliced (mature mRNA) transcripts, it estimates whether each gene is being upregulated or downregulated in each cell. This allows reconstruction of developmental trajectories and identification of cell fate decisions without requiring time-course data.
+
+**Installation:** `pip install scvelo`
+
+**Key resources:**
+- Documentation: https://scvelo.readthedocs.io/
+- GitHub: https://github.com/theislab/scvelo
+- Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759
+
+## When to Use This Skill
+
+Use scVelo when:
+
+- **Trajectory inference from snapshot data**: Determine which direction cells are differentiating
+- **Cell fate prediction**: Identify progenitor cells and their downstream fates
+- **Driver gene identification**: Find genes whose dynamics best explain observed trajectories
+- **Developmental biology**: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
+- **Latent time estimation**: Order cells along a pseudotime derived from splicing dynamics
+- **Complement to Scanpy**: Add directional information to UMAP embeddings
+
+## Prerequisites
+
+scVelo requires count matrices for both **unspliced** and **spliced** RNA. These are generated by:
+1. **STARsolo** (`--soloFeatures Gene Velocyto`) or **kallisto|bustools** (`kb count` with the `lamanno` workflow)
+2. **velocyto** CLI: `velocyto run10x` / `velocyto run`
+3. **alevin-fry** / **simpleaf** with spliced/unspliced output
+
+Data is stored in an `AnnData` object with `layers["spliced"]` and `layers["unspliced"]`.
+
+## Standard RNA Velocity Workflow
+
+### 1. Setup and Data Loading
+
+```python
+import scvelo as scv
+import scanpy as sc
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Configure settings
+scv.settings.verbosity = 3       # Show computation steps
+scv.settings.presenter_view = True
+scv.settings.set_figure_params('scvelo')
+
+# Load data (AnnData with spliced/unspliced layers); use ONE of the two options
+# Option A: Load a velocyto loom file directly
+adata = scv.read("cellranger_output.loom", cache=True)
+
+# Option B: Merge a velocyto loom into a Scanpy-processed AnnData
+adata_processed = sc.read_h5ad("processed.h5ad")  # Has UMAP, clusters
+adata_velocity = scv.read("velocyto.loom", cache=True)
+adata = scv.utils.merge(adata_processed, adata_velocity)
+
+# Verify layers
+print(adata)
+# obs × var: N × G
+# layers: 'spliced', 'unspliced' (required)
+# obsm['X_umap'] (required for visualization)
+```
+
+### 2. Preprocessing
+
+```python
+# Filter and normalize (follows Scanpy conventions)
+scv.pp.filter_and_normalize(
+    adata,
+    min_shared_counts=20,   # Minimum counts in spliced+unspliced
+    n_top_genes=2000        # Top highly variable genes
+)
+
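+# Smooth counts over the kNN graph: scv.pp.moments stores neighborhood means
+# in layers 'Ms' and 'Mu'. The graph is built explicitly here with Scanpy;
+# scv.pp.moments can also compute one itself if none exists.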
+sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
+scv.pp.moments(
+    adata,
+    n_pcs=30,
+    n_neighbors=30
+)
+```
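+
+The neighborhood averaging behind `scv.pp.moments` can be illustrated in plain NumPy. This is a minimal sketch under stated assumptions, not scVelo's implementation: it uses dense arrays, a hypothetical `knn_indices` list of neighbor indices per cell, and equal (binarized) weights with each cell counted in its own neighborhood, whereas scVelo operates on the sparse kNN graph stored in the AnnData object.
+
+```python
+import numpy as np
+
+def knn_moments(spliced, unspliced, knn_indices):
+    """Return (Ms, Mu): spliced/unspliced counts averaged over each cell and its neighbors.
+
+    spliced, unspliced: dense (n_cells, n_genes) arrays of normalized counts.
+    knn_indices: hypothetical input, one list of neighbor indices per cell.
+    """
+    n_cells = spliced.shape[0]
+    C = np.zeros((n_cells, n_cells))
+    for i, nbrs in enumerate(knn_indices):
+        C[i, nbrs] = 1.0  # binarized kNN connectivities
+        C[i, i] = 1.0     # each cell is part of its own neighborhood
+    C /= C.sum(axis=1, keepdims=True)  # row-normalize: each row is a plain average
+    return C @ spliced, C @ unspliced
+
+# Toy data: 4 cells x 2 genes; cells 0/1 and 2/3 are mutual neighbors
+S = np.array([[1.0, 0.0], [3.0, 2.0], [10.0, 4.0], [12.0, 6.0]])
+U = S / 2
+Ms, Mu = knn_moments(S, U, [[1], [0], [3], [2]])
+print(Ms)  # rows 0-1 -> [2, 1]; rows 2-3 -> [11, 5]
+```
+
+Increasing the number of neighbors averages over more cells per row, which is the noise-versus-oversmoothing trade-off behind the `n_neighbors` setting.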
+
+### 3. Velocity Estimation — Stochastic Model
+
+The stochastic model is fast and suitable for exploratory analysis:
+
+```python
+# Stochastic velocity (faster, less accurate)
+scv.tl.velocity(adata, mode='stochastic')
+scv.tl.velocity_graph(adata)
+
+# Visualize
+scv.pl.velocity_embedding_stream(
+    adata,
+    basis='umap',
+    color='leiden',
+    title="RNA Velocity (Stochastic)"
+)
+```
+
+### 4. Velocity Estimation — Dynamical Model (Recommended)
+
+The dynamical model fits the full splicing kinetics and is more accurate:
+
+```python
+# Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
+scv.tl.recover_dynamics(adata, n_jobs=4)
+
+# Compute velocity from dynamical model
+scv.tl.velocity(adata, mode='dynamical')
+scv.tl.velocity_graph(adata)
+```
+
+### 5. Latent Time
+
+The dynamical model enables computation of a shared latent time (pseudotime):
+
+```python
+# Compute latent time
+scv.tl.latent_time(adata)
+
+# Visualize latent time on UMAP
+scv.pl.scatter(
+    adata,
+    color='latent_time',
+    color_map='gnuplot',
+    size=80,
+    title='Latent time'
+)
+
+# Take the 300 genes with the best dynamical-model fit and plot them ordered by latent time
+top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
+scv.pl.heatmap(
+    adata,
+    var_names=top_genes,
+    sortby='latent_time',
+    col_color='leiden',
+    n_convolve=100
+)
+```
+
+### 6. Driver Gene Analysis
+
+```python
+# Rank genes whose velocity differs most between clusters (differential velocity test)
+scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
+df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
+print(df.head(10))
+
+# Speed (velocity_length) and coherence (velocity_confidence) per cell
+scv.tl.velocity_confidence(adata)
+scv.pl.scatter(
+    adata,
+    c=['velocity_length', 'velocity_confidence'],
+    cmap='coolwarm',
+    perc=[5, 95]
+)
+
+# Phase portraits for specific genes (names taken from the pancreas example dataset)
+scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'],
+               ncols=3, figsize=(16, 4))
+```
+
+### 7. Velocity Arrows and Pseudotime
+
+```python
+# Arrow plot on UMAP
+scv.pl.velocity_embedding(
+    adata,
+    arrow_length=3,
+    arrow_size=2,
+    color='leiden',
+    basis='umap'
+)
+
+# Stream plot (cleaner visualization)
+scv.pl.velocity_embedding_stream(
+    adata,
+    basis='umap',
+    color='leiden',
+    smooth=0.8,
+    min_mass=4
+)
+
+# Velocity pseudotime (alternative to latent time)
+scv.tl.velocity_pseudotime(adata)
+scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
+```
+
+### 8. PAGA Trajectory Graph
+
+```python
+# PAGA graph with velocity-informed transitions
+scv.tl.paga(adata, groups='leiden')
+df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
+df.style.background_gradient(cmap='Blues').format('{:.2g}')  # Renders in Jupyter; use print(df) in scripts
+
+# Plot PAGA with velocity
+scv.pl.paga(
+    adata,
+    basis='umap',
+    size=50,
+    alpha=0.1,
+    min_edge_width=2,
+    node_size_scale=1.5
+)
+```
+
+## Complete Workflow Script
+
+```python
+import scvelo as scv
+import scanpy as sc
+
+def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4):
+    """
+    Complete RNA velocity workflow.
+
+    Args:
+        adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm
+        n_top_genes: Number of top HVGs for velocity
+        mode: 'stochastic' (fast) or 'dynamical' (accurate)
+        n_jobs: Parallel jobs for dynamical model
+
+    Returns:
+        Processed AnnData with velocity information
+    """
+    scv.settings.verbosity = 2
+
+    # 1. Preprocessing
+    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)
+
+    if 'neighbors' not in adata.uns:
+        sc.pp.neighbors(adata, n_neighbors=30)
+
+    scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
+
+    # 2. Velocity estimation
+    if mode == 'dynamical':
+        scv.tl.recover_dynamics(adata, n_jobs=n_jobs)
+
+    scv.tl.velocity(adata, mode=mode)
+    scv.tl.velocity_graph(adata)
+
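+    # 3. Downstream analyses
+    if mode == 'dynamical':
+        scv.tl.latent_time(adata)
+
+    # Differential velocity ranking works in any mode but needs cluster labels
+    if 'leiden' in adata.obs.columns:
+        scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)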
+
+    scv.tl.velocity_confidence(adata)
+    scv.tl.velocity_pseudotime(adata)
+
+    return adata
+```
+
+## Key Output Fields in AnnData
+
+After running the workflow, the following fields are added:
+
+| Location | Key | Description |
+|----------|-----|-------------|
+| `adata.layers` | `velocity` | RNA velocity per gene per cell |
+| `adata.layers` | `fit_t` | Fitted gene-specific latent time per cell (dynamical model) |
+| `adata.obsm` | `velocity_umap` | 2D velocity vectors on UMAP (from `scv.tl.velocity_embedding` or the embedding plots) |
+| `adata.obs` | `velocity_pseudotime` | Pseudotime from velocity |
+| `adata.obs` | `latent_time` | Latent time from dynamical model |
+| `adata.obs` | `velocity_length` | Speed of each cell |
+| `adata.obs` | `velocity_confidence` | Confidence score per cell |
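+| `adata.var` | `fit_likelihood` | Gene-level model fit quality (dynamical model) |
+| `adata.var` | `fit_alpha` | Transcription rate (dynamical model) |
+| `adata.var` | `fit_beta` | Splicing rate (dynamical model) |
+| `adata.var` | `fit_gamma` | Degradation rate (dynamical model) |
+| `adata.uns` | `velocity_graph` | Sparse cell × cell graph of cosine correlations between velocities and cell-state displacements (negative part in `velocity_graph_neg`) |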
+
+## Velocity Models Comparison
+
+| Model | Speed | Accuracy | When to Use |
+|-------|-------|----------|-------------|
+| `stochastic` | Fast | Moderate | Default mode; exploratory analysis; large datasets |
+| `deterministic` | Fastest | Moderate | Steady-state model of the original velocyto approach |
+| `dynamical` | Slow | High | Publication-quality; latent time; identifies driver genes |
+
+## Best Practices
+
+- **Start with stochastic mode** for exploration; switch to dynamical for final analysis
+- **Need good coverage of unspliced reads**: Short reads (< 100 bp) may miss intron coverage
+- **Enough cells**: RNA velocity relies on kNN smoothing and is noisy with few cells; treat ~2,000 cells as a rough guideline rather than a hard minimum
+- **Velocity should be coherent**: Arrows should follow known biology; randomness indicates issues
+- **k-NN bandwidth matters**: Too few neighbors → noisy velocity; too many → oversmoothed
+- **Sanity check**: For genes induced along the expected trajectory, cells near the root (progenitors) should sit above the steady-state line in phase portraits (excess unspliced RNA, positive velocity)
+- **Dynamical model requires distinct kinetic states**: Works best for clear differentiation processes
+
+## Troubleshooting
+
+| Problem | Solution |
+|---------|---------|
+| Missing unspliced layer | Re-run velocyto or use STARsolo with `--soloFeatures Gene Velocyto` |
+| Very few velocity genes | Lower `min_shared_counts`; check sequencing depth |
+| Random-looking arrows | Try different `n_neighbors` or velocity model |
+| Memory error with dynamical | Set `n_jobs=1`; reduce `n_top_genes` |
+| Negative velocity everywhere | Check that spliced/unspliced layers are not swapped |
+
+## Additional Resources
+
+- **scVelo documentation**: https://scvelo.readthedocs.io/
+- **Tutorial notebooks**: https://scvelo.readthedocs.io/tutorials/
+- **GitHub**: https://github.com/theislab/scvelo
+- **Paper**: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
+- **velocyto** (preprocessing): http://velocyto.org/
+- **CellRank** (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
+- **dynamo** (metabolic labeling alternative): https://dynamo-release.readthedocs.io/
--- a/scientific-skills/scvelo/references/velocity_models.md
+++ b/scientific-skills/scvelo/references/velocity_models.md
@@ -0,0 +1,168 @@
+# scVelo Velocity Models Reference
+
+## Mathematical Framework
+
+RNA velocity is based on the kinetic model of transcription:
+
+```
+dx_s/dt = β·x_u - γ·x_s   (spliced dynamics)
+dx_u/dt = α(t) - β·x_u    (unspliced dynamics)
+```
+
+Where:
+- `x_s`: spliced mRNA abundance
+- `x_u`: unspliced (pre-mRNA) abundance
+- `α(t)`: transcription rate (varies over time)
+- `β`: splicing rate
+- `γ`: degradation rate
+
+**Velocity** is defined as: `v = dx_s/dt = β·x_u - γ·x_s`
+
+- **v > 0**: Gene is being upregulated (more unspliced than expected at steady state)
+- **v < 0**: Gene is being downregulated (less unspliced than expected)
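+
+For constant rates both equations have closed-form solutions, which makes the sign rule easy to check numerically. The sketch below is an illustration with hypothetical rates (α = 5, β = 0.5, γ = 0.3; the formula for x_s(t) assumes β ≠ γ): it simulates an induction phase starting from zero RNA and a repression phase (α = 0) starting from the induction steady state (α/β, α/γ), and evaluates v = β·x_u - γ·x_s along each.
+
+```python
+import numpy as np
+
+def unspliced(t, u0, alpha, beta):
+    """Closed-form x_u(t) for dx_u/dt = alpha - beta*x_u with constant alpha."""
+    e_b = np.exp(-beta * t)
+    return u0 * e_b + alpha / beta * (1.0 - e_b)
+
+def spliced(t, s0, u0, alpha, beta, gamma):
+    """Closed-form x_s(t) for dx_s/dt = beta*x_u - gamma*x_s (requires beta != gamma)."""
+    e_b, e_g = np.exp(-beta * t), np.exp(-gamma * t)
+    return (s0 * e_g + alpha / gamma * (1.0 - e_g)
+            + (alpha - beta * u0) / (gamma - beta) * (e_g - e_b))
+
+def velocity(u, s, beta, gamma):
+    """v = beta*x_u - gamma*x_s, the rate of change of spliced RNA."""
+    return beta * u - gamma * s
+
+alpha, beta, gamma = 5.0, 0.5, 0.3     # hypothetical rates
+t = np.linspace(0.0, 20.0, 401)[1:]    # drop t = 0, where v = 0 exactly
+
+# Induction: transcription switched on, starting from zero RNA
+u_ind = unspliced(t, 0.0, alpha, beta)
+s_ind = spliced(t, 0.0, 0.0, alpha, beta, gamma)
+v_ind = velocity(u_ind, s_ind, beta, gamma)
+
+# Repression: alpha = 0, starting from the induction steady state
+u_ss, s_ss = alpha / beta, alpha / gamma
+u_rep = unspliced(t, u_ss, 0.0, beta)
+s_rep = spliced(t, s_ss, u_ss, 0.0, beta, gamma)
+v_rep = velocity(u_rep, s_rep, beta, gamma)
+
+print(v_ind.min() > 0, v_rep.max() < 0)  # True True
+```
+
+Velocity stays positive throughout induction and negative throughout repression, matching the two cases above.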
+
+## Model Comparison
+
+### Steady-State (Velocyto, original)
+
+- Assumes constant α (transcription rate)
+- Fits γ by linear regression on cells at the extreme quantiles of the unspliced/spliced phase portrait, which are assumed to be at steady state
+- **Limitation**: Requires identifiable steady states; assumes constant transcription
+
+```python
+# Steady-state (velocyto-style) model; scVelo names this mode 'deterministic'
+scv.tl.velocity(adata, mode='deterministic')
+```
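+
+The steady-state fit can be sketched as a regression through the origin restricted to cells at the extremes of the phase portrait. This is a simplified illustration, not scVelo's code: it selects cells by spliced quantiles only, uses an illustrative 5% tail, and fixes β to 1 so the fitted slope is γ/β.
+
+```python
+import numpy as np
+
+def fit_gamma_extreme_quantiles(u, s, perc=5.0):
+    """Slope gamma' (= gamma/beta) of u ~ gamma' * s through the origin, fitted only on
+    cells in the lowest and highest `perc` percent of spliced abundance."""
+    lo, hi = np.percentile(s, [perc, 100.0 - perc])
+    mask = (s <= lo) | (s >= hi)  # cells assumed to be near the off/on steady states
+    return float(u[mask] @ s[mask] / (s[mask] @ s[mask]))
+
+# Toy gene: cells on the steady-state line u = 0.4 * s, plus a block of
+# inducing cells (mid-range s) that carry extra unspliced RNA
+s = np.linspace(0.0, 10.0, 201)
+u = 0.4 * s
+u[80:120] += 1.0
+
+gamma = fit_gamma_extreme_quantiles(u, s)
+v = u - gamma * s  # steady-state velocity with beta fixed to 1
+print(round(gamma, 3), v[100] > 0)
+```
+
+Cells above the fitted line (extra unspliced RNA) receive positive velocity, as in the sign rule of the mathematical framework.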
+
+### Stochastic Model (scVelo default)
+
+- Extends the steady-state model with second-order moments (variances and covariances of unspliced and spliced counts)
+- Models cell-to-cell variability in mRNA counts
+- More robust to noise than steady-state
+
+```python
+scv.tl.velocity(adata, mode='stochastic')
+```
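+
+Under the kinetic model, the first moments at steady state satisfy β⟨x_u⟩ = γ⟨x_s⟩, and the second moments satisfy β(2⟨x_u·x_s⟩ + ⟨x_u⟩) = γ(2⟨x_s²⟩ - ⟨x_s⟩). The stochastic model fits γ/β to both relations at once. The sketch below stacks them and solves an ordinary least-squares problem through the origin; this is a simplifying assumption, since scVelo uses a generalized (residual-weighted) least-squares fit, and the synthetic moments are constructed to satisfy both relations exactly.
+
+```python
+import numpy as np
+
+def stochastic_gamma(Ms, Mu, Mss, Mus):
+    """Fit gamma' = gamma/beta jointly on first- and second-order moments (one gene).
+
+    Ms, Mu: kNN means <s>, <u> per cell; Mss, Mus: kNN means <s^2>, <u*s> per cell.
+    Plain least squares through the origin (illustrative simplification).
+    """
+    x = np.concatenate([Ms, 2 * Mss - Ms])  # <s>  and  2<s^2> - <s>
+    y = np.concatenate([Mu, 2 * Mus + Mu])  # <u>  and  2<us>  + <u>
+    return float(x @ y / (x @ x))
+
+# Synthetic moments satisfying both steady-state relations with gamma' = 0.3
+Ms = np.array([1.0, 2.0, 4.0, 8.0])
+Mss = Ms**2 + Ms                          # illustrative second moment
+Mu = 0.3 * Ms
+Mus = (0.3 * (2 * Mss - Ms) - Mu) / 2
+print(round(stochastic_gamma(Ms, Mu, Mss, Mus), 6))
+```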
+
+### Dynamical Model (recommended)
+
+- Jointly estimates all kinetic rates (α, β, γ) and cell-specific latent time
+- Does not assume steady state
+- Identifies induction vs. repression phases
+- Computes fit_likelihood per gene (quality measure)
+
+```python
+scv.tl.recover_dynamics(adata, n_jobs=4)
+scv.tl.velocity(adata, mode='dynamical')
+```
+
+**Kinetic states identified by dynamical model:**
+
+| State | Description |
+|-------|-------------|
+| Induction | α > 0, x_u increasing |
+| Steady-state on | α > 0, constant high expression |
+| Repression | α = 0, x_u decreasing |
+| Steady-state off | α = 0, constant low expression |
+
+## Velocity Graph
+
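+The velocity graph scores how well each cell's velocity vector points toward each of its neighbors: entry (i, j) is the cosine correlation between the velocity of cell i and the displacement from cell i's state to cell j's state.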
+
+```python
+scv.tl.velocity_graph(adata)
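+# Stored in adata.uns['velocity_graph'] (negative correlations in adata.uns['velocity_graph_neg'])
+# Entry [i, j] = cosine correlation, not a probability; transition probabilities
+# are derived from it, e.g. via scv.utils.get_transition_matrix(adata)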
+```
+
+**Parameters:**
+- `n_neighbors`: Number of neighbors considered
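+- `sqrt_transform`: Apply a variance-stabilizing square-root transform to cell-state changes and velocities before computing correlations (default: False)
+- `approx`: Compute correlations in a reduced PCA space instead of the full gene space (faster for large datasets)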
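+
+Conceptually, the graph holds cosine correlations between each cell's velocity and its displacements to neighboring cells, and transition probabilities come from passing those correlations through an exponential kernel and row-normalizing. The dense sketch below follows that description under simplifying assumptions (explicit neighbor lists, a kernel `scale` of 10, positive and negative correlations kept together); it is not scVelo's sparse implementation.
+
+```python
+import numpy as np
+
+def cosine_velocity_graph(X, V, neighbors):
+    """G[i, j] = cosine(V[i], X[j] - X[i]) for each neighbor j of cell i; 0 elsewhere."""
+    n_cells = X.shape[0]
+    G = np.zeros((n_cells, n_cells))
+    for i, nbrs in enumerate(neighbors):
+        for j in nbrs:
+            dx = X[j] - X[i]
+            denom = np.linalg.norm(dx) * np.linalg.norm(V[i])
+            if denom > 0:
+                G[i, j] = dx @ V[i] / denom
+    return G
+
+def transition_probabilities(G, neighbors, scale=10.0):
+    """Row-stochastic P with P[i, j] proportional to exp(scale * G[i, j]) over neighbors of i."""
+    P = np.zeros_like(G)
+    for i, nbrs in enumerate(neighbors):
+        w = np.exp(scale * G[i, nbrs])
+        P[i, nbrs] = w / w.sum()
+    return P
+
+# Five cells on a line, all moving in the +x direction
+X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
+V = np.tile([1.0, 0.0], (5, 1))
+neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
+G = cosine_velocity_graph(X, V, neighbors)
+P = transition_probabilities(G, neighbors)
+print(G[2, 1], G[2, 3], P[2, 3].round(4))  # -1.0 1.0 1.0
+```
+
+Cell 2 moves almost surely to cell 3, the neighbor its velocity points toward; larger `scale` values make the kernel sharper.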
+
+## Latent Time Interpretation
+
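+Gene-specific latent time (`adata.layers['fit_t']`) locates each cell on that gene's fitted trajectory:
+- Before the gene's switching time (`adata.var['fit_t_']`): induction phase, α > 0
+- After the switching time: repression phase, α = 0, expression decays toward the off steady state
+
+**Shared latent time** (`adata.obs['latent_time']`, scaled to [0, 1]) combines the gene-specific times of well-fit genes (those passing `min_likelihood`) relative to inferred root cells: 0 marks the root cells and 1 the most advanced cells.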
+
+## Quality Metrics
+
+### Gene-level
+- `fit_likelihood`: Goodness-of-fit of dynamical model (0-1; higher = better)
+  - Use for filtering driver genes: `adata.var[adata.var['fit_likelihood'] > 0.1]`
+- `fit_alpha`: Transcription rate during induction
+- `fit_gamma`: mRNA degradation rate
+- `fit_r2`: R² of kinetic fit
+
+### Cell-level
+- `velocity_length`: Magnitude of velocity vector (cell speed)
+- `velocity_confidence`: Coherence of the cell's velocity with its neighbors' velocities (mean correlation, at most 1; higher = more coherent)
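+
+The coherence score can be sketched as the mean Pearson correlation, computed across genes, between each cell's velocity vector and those of its neighbors. This is a simplified illustration assuming dense arrays and a hypothetical `neighbors` list; scVelo's `scv.tl.velocity_confidence` works on the stored kNN graph and filters genes, so values differ in detail.
+
+```python
+import numpy as np
+
+def velocity_confidence(V, neighbors):
+    """Mean Pearson correlation (across genes) of each cell's velocity with its neighbors'."""
+    Vc = V - V.mean(axis=1, keepdims=True)  # center each cell's velocity vector
+    norms = np.linalg.norm(Vc, axis=1)
+    conf = np.empty(V.shape[0])
+    for i, nbrs in enumerate(neighbors):
+        corr = (Vc[nbrs] @ Vc[i]) / (norms[nbrs] * norms[i])
+        conf[i] = corr.mean()
+    return conf
+
+# Cells 0 and 1 move the same way; cell 2 moves opposite to its neighbor (cell 0)
+V = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]])
+print(velocity_confidence(V, [[1], [0], [0]]))  # approximately [1, 1, -1]
+```
+
+Opposing velocities produce negative values, so low confidence flags regions where neighboring arrows disagree.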
+
+### Dataset-level
+```python
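+# Check overall velocity quality
+scv.pl.proportions(adata, groupby='leiden')  # Spliced/unspliced proportions per cluster
+scv.tl.velocity_confidence(adata)
+scv.pl.scatter(adata, color='velocity_confidence', cmap='coolwarm')
+print(adata.obs.groupby('leiden')['velocity_confidence'].mean())  # Mean confidence per cluster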
+```
+
+## Parameter Tuning Guide
+
+| Parameter | Function | Default | When to Change |
+|-----------|----------|---------|----------------|
+| `min_shared_counts` | Filter genes | 20 | Increase for deep sequencing; decrease for shallow |
+| `n_top_genes` | HVG selection | 2000 | Increase for complex datasets |
+| `n_neighbors` | kNN graph | 30 | Decrease for small datasets; increase for noisy |
+| `n_pcs` | PCA dimensions | 30 | Match to elbow in scree plot |
+| `root_key` | Root cells for latent time / pseudotime | None (inferred) | Set if the root (progenitor) cells are known |
+
+## Integration with Other Tools
+
+### CellRank (Fate Prediction)
+
+```python
+import cellrank as cr
+from cellrank.kernels import VelocityKernel, ConnectivityKernel
+
+# Combine velocity and connectivity kernels
+vk = VelocityKernel(adata).compute_transition_matrix()
+ck = ConnectivityKernel(adata).compute_transition_matrix()
+combined = 0.8 * vk + 0.2 * ck
+
+# Compute macrostates, then select terminal states among them
+g = cr.estimators.GPCCA(combined)
+g.compute_macrostates(n_states=4, cluster_key='leiden')
+g.plot_macrostates(which="all")
+g.predict_terminal_states()
+
+# Compute fate probabilities toward the terminal states (requires terminal states)
+g.compute_fate_probabilities()
+g.plot_fate_probabilities()
+```
+
+### Scanpy Integration
+
+scVelo works natively with Scanpy's AnnData:
+
+```python
+import scanpy as sc
+import scvelo as scv
+
+# Run standard Scanpy pipeline first
+sc.pp.normalize_total(adata)
+sc.pp.log1p(adata)
+sc.pp.highly_variable_genes(adata)
+sc.pp.pca(adata)
+sc.pp.neighbors(adata)
+sc.tl.umap(adata)
+sc.tl.leiden(adata)
+
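+# Then add velocity on top, restricted to the highly variable genes
+# (fitting the dynamical model on every gene is slow)
+adata = adata[:, adata.var['highly_variable']].copy()
+scv.pp.moments(adata)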
+scv.tl.recover_dynamics(adata)
+scv.tl.velocity(adata, mode='dynamical')
+scv.tl.velocity_graph(adata)
+scv.tl.latent_time(adata)
+```
--- a/scientific-skills/scvelo/scripts/rna_velocity_workflow.py
+++ b/scientific-skills/scvelo/scripts/rna_velocity_workflow.py
@@ -0,0 +1,232 @@
+"""
+RNA Velocity Analysis Workflow using scVelo
+===========================================
+Complete pipeline from raw data to velocity visualization.
+
+Usage:
+    python rna_velocity_workflow.py    # demo on the scVelo pancreas dataset (downloaded on first run)
+
+Or import and use run_velocity_analysis() with your AnnData object.
+"""
+
+import scvelo as scv
+import scanpy as sc
+import numpy as np
+import matplotlib
+matplotlib.use('Agg')  # Non-interactive backend
+import matplotlib.pyplot as plt
+import os
+
+
+def run_velocity_analysis(
+    adata,
+    groupby="leiden",
+    n_top_genes=2000,
+    n_neighbors=30,
+    mode="dynamical",
+    n_jobs=4,
+    output_dir="velocity_results",
+):
+    """
+    Complete RNA velocity analysis workflow.
+
+    Parameters
+    ----------
+    adata : AnnData
+        AnnData object with 'spliced' and 'unspliced' layers.
+        Should already have UMAP and cluster annotations.
+    groupby : str
+        Column in adata.obs for cell type labels.
+    n_top_genes : int
+        Number of top highly variable genes.
+    n_neighbors : int
+        Number of neighbors for moment computation.
+    mode : str
+        Velocity model: 'stochastic' (fast) or 'dynamical' (accurate).
+    n_jobs : int
+        Parallel jobs for dynamical model fitting.
+    output_dir : str
+        Directory for saving output figures.
+
+    Returns
+    -------
+    AnnData with velocity annotations.
+    """
+    os.makedirs(output_dir, exist_ok=True)
+
+    # ── Settings ──────────────────────────────────────────────────────────────
+    scv.settings.verbosity = 2
+    scv.settings.figdir = output_dir
+
+    # ── Step 1: Check layers ───────────────────────────────────────────────────
+    # Explicit errors instead of assert (asserts are stripped under `python -O`)
+    for layer in ("spliced", "unspliced"):
+        if layer not in adata.layers:
+            raise KeyError(f"Missing '{layer}' layer. Run velocyto first.")
+    print(f"Input: {adata.n_obs} cells × {adata.n_vars} genes")
+
+    # ── Step 2: Preprocessing ─────────────────────────────────────────────────
+    print("Step 1/5: Preprocessing...")
+    scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)
+
+    if "neighbors" not in adata.uns:
+        sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=30)
+
+    scv.pp.moments(adata, n_pcs=30, n_neighbors=n_neighbors)
+    print(f"  {adata.n_vars} genes retained after filtering")
+
+    # ── Step 3: Velocity estimation ────────────────────────────────────────────
+    print(f"Step 2/5: Fitting velocity model ({mode})...")
+    if mode == "dynamical":
+        scv.tl.recover_dynamics(adata, n_jobs=n_jobs)
+    scv.tl.velocity(adata, mode=mode)
+    scv.tl.velocity_graph(adata)
+    print("  Velocity graph computed")
+
+    # ── Step 4: Downstream analyses ────────────────────────────────────────────
+    print("Step 3/5: Computing latent time and confidence...")
+    scv.tl.velocity_confidence(adata)
+    scv.tl.velocity_pseudotime(adata)
+
+    if mode == "dynamical":
+        scv.tl.latent_time(adata)
+
+    if groupby in adata.obs.columns:
+        scv.tl.rank_velocity_genes(adata, groupby=groupby, min_corr=0.3)
+
+    # ── Step 5: Visualization ─────────────────────────────────────────────────
+    print("Step 4/5: Generating figures...")
+
+    # Stream plot
+    scv.pl.velocity_embedding_stream(
+        adata,
+        basis="umap",
+        color=groupby,
+        title="RNA Velocity",
+        save="velocity_stream.png",  # saved under scv.settings.figdir (= output_dir)
+    )
+
+    # Arrow plot
+    scv.pl.velocity_embedding(
+        adata,
+        arrow_length=3,
+        arrow_size=2,
+        color=groupby,
+        basis="umap",
+        save="velocity_arrows.png",
+    )
+
+    # Pseudotime
+    scv.pl.scatter(
+        adata,
+        color="velocity_pseudotime",
+        cmap="gnuplot",
+        title="Velocity Pseudotime",
+        save="pseudotime.png",
+    )
+
+    if mode == "dynamical" and "latent_time" in adata.obs:
+        scv.pl.scatter(
+            adata,
+            color="latent_time",
+            color_map="gnuplot",
+            title="Latent Time",
+            save="latent_time.png",
+        )
+
+    # Speed and coherence
+    scv.pl.scatter(
+        adata,
+        c=["velocity_length", "velocity_confidence"],
+        cmap="coolwarm",
+        perc=[5, 95],
+        save="velocity_quality.png",
+    )
+
+    # Top driver genes heatmap (dynamical only)
+    if mode == "dynamical" and "fit_likelihood" in adata.var:
+        top_genes = adata.var["fit_likelihood"].sort_values(ascending=False).index[:50]
+        scv.pl.heatmap(
+            adata,
+            var_names=top_genes,
+            sortby="latent_time",
+            col_color=groupby,
+            n_convolve=50,
+            save="driver_gene_heatmap.png",
+        )
+
+    # ── Step 6: Save results ───────────────────────────────────────────────────
+    print("Step 5/5: Saving results...")
+    output_h5ad = os.path.join(output_dir, "adata_velocity.h5ad")
+    adata.write_h5ad(output_h5ad)
+    print(f"  Saved to {output_h5ad}")
+
+    # Summary statistics
+    confidence = adata.obs["velocity_confidence"].dropna()
+    print("\nSummary:")
+    print(f"  Velocity model: {mode}")
+    print(f"  Cells: {adata.n_obs}")
+    print(f"  Velocity genes: {int(adata.var['velocity_genes'].sum())}")
+    print(f"  Mean velocity confidence: {confidence.mean():.3f}")
+    print(f"  High-confidence cells (>0.7): {(confidence > 0.7).sum()} ({(confidence > 0.7).mean():.1%})")
+
+    if mode == "dynamical" and "fit_likelihood" in adata.var:
+        good_genes = (adata.var["fit_likelihood"] > 0.1).sum()
+        print(f"  Well-fit genes (likelihood>0.1): {good_genes}")
+
+    print(f"\nOutput files saved to: {output_dir}/")
+    return adata
+
+
+def load_from_loom(loom_path, processed_h5ad=None):
+    """
+    Load velocity data from velocyto loom file.
+
+    Args:
+        loom_path: Path to velocyto output loom file
+        processed_h5ad: Optional path to pre-processed Scanpy h5ad file
+    """
+    adata_loom = scv.read(loom_path, cache=True)
+    adata_loom.var_names_make_unique()  # loom files can contain duplicate gene names
+
+    if processed_h5ad:
+        adata_processed = sc.read_h5ad(processed_h5ad)
+        # Merge: keep processed metadata and add velocity layers
+        adata = scv.utils.merge(adata_processed, adata_loom)
+    else:
+        adata = adata_loom
+        # Run basic Scanpy pipeline
+        sc.pp.normalize_total(adata, target_sum=1e4)
+        sc.pp.log1p(adata)
+        sc.pp.highly_variable_genes(adata, n_top_genes=3000)
+        sc.pp.pca(adata)
+        sc.pp.neighbors(adata)
+        sc.tl.umap(adata)
+        sc.tl.leiden(adata, resolution=0.5)
+
+    return adata
+
+
+if __name__ == "__main__":
+    # Demo on the real pancreas dataset shipped via scv.datasets (downloaded on first use)
+    print("scVelo RNA Velocity Workflow - Demo Mode")
+    print("=" * 50)
+
+    # Load example dataset
+    adata = scv.datasets.pancreas()
+    print(f"Loaded pancreas dataset: {adata}")
+
+    # Run analysis
+    adata = run_velocity_analysis(
+        adata,
+        groupby="clusters",
+        n_top_genes=2000,
+        mode="dynamical",
+        n_jobs=2,
+        output_dir="pancreas_velocity",
+    )
+
+    print("\nAnalysis complete!")
+    print("Key results:")
+    print("  adata.layers['velocity']: velocity per gene per cell")
+    print("  adata.obs['latent_time']: pseudotime from dynamics")
+    print("  adata.obs['velocity_confidence']: per-cell confidence")
+    if "rank_velocity_genes" in adata.uns:
+        print("  adata.uns['rank_velocity_genes']: driver genes per cluster")