Files
claude-scientific-skills/scientific-skills/scvelo/references/velocity_models.md
huangkuanlin 7f94783fab Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
2026-03-03 07:15:36 -05:00

169 lines
4.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# scVelo Velocity Models Reference
## Mathematical Framework
RNA velocity is based on the kinetic model of transcription:
```
dx_s/dt = β·x_u - γ·x_s (spliced dynamics)
dx_u/dt = α(t) - β·x_u (unspliced dynamics)
```
Where:
- `x_s`: spliced mRNA abundance
- `x_u`: unspliced (pre-mRNA) abundance
- `α(t)`: transcription rate (varies over time)
- `β`: splicing rate
- `γ`: degradation rate
**Velocity** is defined as: `v = dx_s/dt = β·x_u - γ·x_s`
- **v > 0**: Gene is being upregulated (more unspliced than expected at steady state)
- **v < 0**: Gene is being downregulated (less unspliced than expected)
## Model Comparison
### Steady-State (Velocyto, original)
- Assumes constant α (transcription rate)
- Fits γ using linear regression on steady-state cells
- **Limitation**: Requires identifiable steady states; assumes constant transcription
```python
# Use with scVelo for backward compatibility
scv.tl.velocity(adata, mode='steady_state')
```
### Stochastic Model (scVelo v1)
- Extends steady-state with variance/covariance terms
- Models cell-to-cell variability in mRNA counts
- More robust to noise than steady-state
```python
scv.tl.velocity(adata, mode='stochastic')
```
### Dynamical Model (scVelo v2, recommended)
- Jointly estimates all kinetic rates (α, β, γ) and cell-specific latent time
- Does not assume steady state
- Identifies induction vs. repression phases
- Computes fit_likelihood per gene (quality measure)
```python
scv.tl.recover_dynamics(adata, n_jobs=4)
scv.tl.velocity(adata, mode='dynamical')
```
**Kinetic states identified by dynamical model:**
| State | Description |
|-------|-------------|
| Induction | α > 0, x_u increasing |
| Steady-state on | α > 0, constant high expression |
| Repression | α = 0, x_u decreasing |
| Steady-state off | α = 0, constant low expression |
## Velocity Graph
The velocity graph connects cells based on their velocity similarity to neighboring cells' states:
```python
scv.tl.velocity_graph(adata)
# Stored in adata.uns['velocity_graph']
# Entry [i,j] = probability that cell i transitions to cell j
```
**Parameters:**
- `n_neighbors`: Number of neighbors considered
- `sqrt_transform`: Apply sqrt transform to data (default: False for spliced)
- `approx`: Use approximate nearest neighbor search (faster for large datasets)
## Latent Time Interpretation
Latent time τ ∈ [0, 1] for each gene represents:
- τ = 0: Gene is at onset of induction
- τ = 0.5: Gene is at peak of induction (for a complete cycle)
- τ = 1: Gene has returned to steady-state off
**Shared latent time** is computed by taking the average over all velocity genes, weighted by fit_likelihood.
## Quality Metrics
### Gene-level
- `fit_likelihood`: Goodness-of-fit of dynamical model (0-1; higher = better)
- Use for filtering driver genes: `adata.var[adata.var['fit_likelihood'] > 0.1]`
- `fit_alpha`: Transcription rate during induction
- `fit_gamma`: mRNA degradation rate
- `fit_r2`: R² of kinetic fit
### Cell-level
- `velocity_length`: Magnitude of velocity vector (cell speed)
- `velocity_confidence`: Coherence of velocity with neighboring cells (0-1)
### Dataset-level
```python
# Check overall velocity quality
scv.pl.proportions(adata) # Ratio of spliced/unspliced per cell
scv.pl.velocity_confidence(adata, groupby='leiden')
```
## Parameter Tuning Guide
| Parameter | Function | Default | When to Change |
|-----------|----------|---------|----------------|
| `min_shared_counts` | Filter genes | 20 | Increase for deep sequencing; decrease for shallow |
| `n_top_genes` | HVG selection | 2000 | Increase for complex datasets |
| `n_neighbors` | kNN graph | 30 | Decrease for small datasets; increase for noisy |
| `n_pcs` | PCA dimensions | 30 | Match to elbow in scree plot |
| `t_max_rank` | Latent time constraint | None | Set if known developmental direction |
## Integration with Other Tools
### CellRank (Fate Prediction)
```python
import cellrank as cr
from cellrank.kernels import VelocityKernel, ConnectivityKernel
# Combine velocity and connectivity kernels
vk = VelocityKernel(adata).compute_transition_matrix()
ck = ConnectivityKernel(adata).compute_transition_matrix()
combined = 0.8 * vk + 0.2 * ck
# Compute macrostates (terminal and initial states)
g = cr.estimators.GPCCA(combined)
g.compute_macrostates(n_states=4, cluster_key='leiden')
g.plot_macrostates(which="all")
# Compute fate probabilities
g.compute_fate_probabilities()
g.plot_fate_probabilities()
```
### Scanpy Integration
scVelo works natively with Scanpy's AnnData:
```python
import scanpy as sc
import scvelo as scv
# Run standard Scanpy pipeline first
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)
# Then add velocity on top
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
```