# scVelo Velocity Models Reference

## Mathematical Framework

RNA velocity is based on a kinetic model of transcription, splicing, and degradation:

```
dx_s/dt = β·x_u - γ·x_s   (spliced dynamics)
dx_u/dt = α(t) - β·x_u    (unspliced dynamics)
```

Where:
- `x_s`: spliced mRNA abundance
- `x_u`: unspliced (pre-mRNA) abundance
- `α(t)`: transcription rate (varies over time)
- `β`: splicing rate
- `γ`: degradation rate

**Velocity** is defined as: `v = dx_s/dt = β·x_u - γ·x_s`

- **v > 0**: Gene is being upregulated (more unspliced than expected at steady state)
- **v < 0**: Gene is being downregulated (less unspliced than expected)
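As a minimal numerical sketch of the sign convention above, the snippet below evaluates `v = β·x_u - γ·x_s` for three hypothetical cells. The rate constants and abundances are made-up illustrative values, not fitted parameters.

```python
import numpy as np

# Illustrative (made-up) rate constants for a single gene
beta = 1.0   # splicing rate
gamma = 0.5  # degradation rate

# Unspliced and spliced abundances for three hypothetical cells
x_u = np.array([2.0, 1.0, 0.5])
x_s = np.array([2.0, 2.0, 2.0])

# Velocity v = beta * x_u - gamma * x_s
v = beta * x_u - gamma * x_s

# At steady state x_u = (gamma / beta) * x_s, so v = 0
labels = np.where(v > 0, "upregulated", np.where(v < 0, "downregulated", "steady"))
for vel, label in zip(v, labels):
    print(f"v = {vel:+.2f} -> {label}")
```

The middle cell sits exactly on the steady-state line `x_u = (γ/β)·x_s`, so its velocity is zero.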

## Model Comparison

### Steady-State (Velocyto, original)

- Assumes constant α (transcription rate)
- Fits the steady-state ratio γ/β by linear regression on cells assumed to be at steady state (extreme expression quantiles)
- **Limitation**: Requires identifiable steady states; assumes constant transcription

```python
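import scvelo as scv

# Steady-state model, comparable to the original velocyto approach
# ('deterministic' is scVelo's documented name for this mode)
scv.tl.velocity(adata, mode='deterministic')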
```
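The regression behind the steady-state model can be sketched in plain NumPy. This is a simplified illustration, not velocyto's or scVelo's implementation: the helper name `fit_steady_state_gamma`, the choice of total expression to pick extreme-quantile cells, and the synthetic data are all assumptions made for the example, and β is fixed to 1 so the slope is γ in units of β.

```python
import numpy as np

def fit_steady_state_gamma(x_u, x_s, quantile=0.95):
    """Least-squares slope through the origin, fitted on extreme-quantile cells."""
    x_u = np.asarray(x_u, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    # Assumption: cells in the lower and upper tails of total expression are at steady state
    total = x_u + x_s
    lo, hi = np.quantile(total, [1 - quantile, quantile])
    steady = (total <= lo) | (total >= hi)
    # Slope of x_u ~ gamma_ratio * x_s (gamma measured in units of beta)
    return float(x_u[steady] @ x_s[steady] / (x_s[steady] @ x_s[steady]))

rng = np.random.default_rng(0)
x_s = rng.uniform(0.0, 10.0, size=500)
x_u = 0.4 * x_s                      # noiseless cells lying exactly on the steady-state line

gamma_ratio = fit_steady_state_gamma(x_u, x_s)
velocity = x_u - gamma_ratio * x_s   # residual from the steady-state line (beta = 1)
```

Because every synthetic cell lies on the line `x_u = 0.4·x_s`, the fitted ratio recovers 0.4 and all velocities are zero. In real data the residuals from the fitted line are the velocity estimates.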

### Stochastic Model (scVelo default)

- Extends the steady-state model with second-order moments (variance and covariance of unspliced and spliced counts)
- Models cell-to-cell variability in mRNA counts
- More robust to noise than steady-state

```python
scv.tl.velocity(adata, mode='stochastic')
```

### Dynamical Model (recommended)

- Jointly estimates all kinetic rates (α, β, γ) and cell-specific latent time
- Does not assume steady state
- Identifies induction vs. repression phases
- Computes a per-gene `fit_likelihood` (stored in `adata.var`) as a fit-quality measure

```python
# recover_dynamics must run first: it fits the per-gene kinetics that mode='dynamical' relies on
scv.tl.recover_dynamics(adata, n_jobs=4)
scv.tl.velocity(adata, mode='dynamical')
```

**Kinetic states identified by dynamical model:**

| State | Description |
|-------|-------------|
| Induction | α > 0, x_u increasing |
| Steady-state on | α > 0, constant high expression |
| Repression | α = 0, x_u decreasing |
| Steady-state off | α = 0, constant low expression |
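The induction and repression rows correspond to solving the kinetic equations with a constant α within each phase. The sketch below evaluates the standard closed-form solution of that linear ODE system, assuming β ≠ γ. The function names, rate values, and switching time are illustrative assumptions, not scVelo's internal parameterization.

```python
import numpy as np

def unspliced(tau, alpha, beta, u0):
    """Closed-form x_u after time tau in a phase with constant alpha."""
    return u0 * np.exp(-beta * tau) + alpha / beta * (1 - np.exp(-beta * tau))

def spliced(tau, alpha, beta, gamma, u0, s0):
    """Closed-form x_s after time tau in a phase with constant alpha (requires beta != gamma)."""
    exp_b, exp_g = np.exp(-beta * tau), np.exp(-gamma * tau)
    return (s0 * exp_g + alpha / gamma * (1 - exp_g)
            + (alpha - beta * u0) / (gamma - beta) * (exp_g - exp_b))

alpha, beta, gamma = 2.0, 1.0, 0.5   # illustrative values
t_switch = 5.0                        # hypothetical switching time

# Induction: alpha > 0, starting from the off state (0, 0)
u_sw = unspliced(t_switch, alpha, beta, 0.0)
s_sw = spliced(t_switch, alpha, beta, gamma, 0.0, 0.0)

# Repression: alpha = 0, starting from the state reached at the switch
tau = np.linspace(0.0, 20.0, 5)
u_rep = unspliced(tau, 0.0, beta, u_sw)
s_rep = spliced(tau, 0.0, beta, gamma, u_sw, s_sw)
```

Letting the induction phase run for a long time drives the state toward `(α/β, α/γ)`, the "steady-state on" row. Letting repression run drives it back toward `(0, 0)`, the "steady-state off" row.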

## Velocity Graph

The velocity graph scores, for each cell, how well its velocity vector points toward each neighboring cell (cosine similarity between the velocity and the displacement to the neighbor's state):

```python
scv.tl.velocity_graph(adata)
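# Stored in adata.uns['velocity_graph'] (sparse matrix)
# Entry [i, j] = cosine similarity between cell i's velocity and the displacement from cell i to cell j
# Transition probabilities are derived from it, e.g. with scv.utils.get_transition_matrix(adata)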
```
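The idea can be illustrated with a toy NumPy sketch: compare each cell's velocity with the displacement toward its neighbors, then turn the similarities into row-normalized transition probabilities with an exponential kernel. This is a simplified stand-in under stated assumptions (dense arrays, hypothetical helper names, an arbitrary kernel width `sigma`), not scVelo's implementation.

```python
import numpy as np

def cosine_graph(X, V, neighbors):
    """Cosine similarity between each cell's velocity and the displacement to its neighbors."""
    n = X.shape[0]
    graph = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            delta = X[j] - X[i]
            denom = np.linalg.norm(delta) * np.linalg.norm(V[i])
            if denom > 0:
                graph[i, j] = delta @ V[i] / denom
    return graph

def transition_probabilities(graph, neighbors, sigma=0.1):
    """Row-normalized exponential kernel over each cell's neighbors (sigma is an assumed width)."""
    T = np.zeros_like(graph)
    for i, nbrs in enumerate(neighbors):
        weights = np.exp(graph[i, nbrs] / sigma)
        T[i, nbrs] = weights / weights.sum()
    return T

# Three cells on a line, all moving to the right
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
V = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
neighbors = [[1], [0, 2], [1]]

graph = cosine_graph(X, V, neighbors)
T = transition_probabilities(graph, neighbors)
```

For the middle cell, the similarity is +1 toward the cell on its right and -1 toward the cell on its left, so nearly all of its transition probability goes to the right-hand neighbor.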

**Parameters:**
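- `n_neighbors`: Number of neighbors considered per cell
- `sqrt_transform`: Variance-stabilizing square-root transform applied to velocities and cell-state differences before computing similarities (off by default)
- `approx`: Compute similarities in a reduced PCA space instead of on the full gene matrix (faster for large datasets)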

## Latent Time Interpretation

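Gene-specific latent time, rescaled to τ ∈ [0, 1], places each cell along a gene's fitted trajectory:
- τ = 0: Gene is at onset of induction
- τ ≈ 0.5: Gene is near its peak, at the switch from induction to repression (exact only for a complete, symmetric cycle; the fitted switching time is gene-specific)
- τ = 1: Gene has returned to steady-state off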

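**Shared latent time** (`scv.tl.latent_time`, stored in `adata.obs['latent_time']`) combines the gene-specific times of well-fit velocity genes, selected by `fit_likelihood`, into a single time per cell scaled to [0, 1].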

## Quality Metrics

### Gene-level
- `fit_likelihood`: Goodness-of-fit of dynamical model (0-1; higher = better)
  - Use for filtering driver genes: `adata.var[adata.var['fit_likelihood'] > 0.1]`
- `fit_alpha`: Transcription rate during induction
- `fit_gamma`: mRNA degradation rate
- `fit_r2`: R² of kinetic fit
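A short pandas sketch of using these columns to rank candidate driver genes. The DataFrame below is a synthetic stand-in for `adata.var` after `scv.tl.recover_dynamics`; the gene names and values are made up, and the NaN row mimics a gene that was not fitted (an assumption about how unfitted genes appear).

```python
import pandas as pd

# Synthetic stand-in for adata.var after scv.tl.recover_dynamics (values are made up)
var = pd.DataFrame(
    {
        "fit_likelihood": [0.45, 0.05, 0.30, float("nan")],
        "fit_alpha": [3.2, 0.4, 1.1, float("nan")],
        "fit_gamma": [0.6, 0.2, 0.9, float("nan")],
    },
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)

# Keep fitted genes above the likelihood threshold, best fits first
fitted = var.dropna(subset=["fit_likelihood"])
drivers = fitted[fitted["fit_likelihood"] > 0.1].sort_values("fit_likelihood", ascending=False)
top_genes = drivers.index.tolist()
```

Sorting by `fit_likelihood` puts the best-explained genes first, which is usually the order wanted when plotting phase portraits for the top few.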

### Cell-level
- `velocity_length`: Magnitude of velocity vector (cell speed)
- `velocity_confidence`: Coherence of velocity with neighboring cells (0-1)
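In scVelo both metrics are written to `adata.obs` by `scv.tl.velocity_confidence(adata)`. The NumPy sketch below illustrates the underlying idea only: speed as the norm of the (gene-centered) velocity vector, and confidence as the mean correlation with neighbors' velocities, clipped at zero. The helper name and toy data are assumptions, and scVelo's actual computation may differ in detail.

```python
import numpy as np

def velocity_length_and_confidence(V, neighbors):
    """Per-cell speed and mean correlation with neighbors' velocities."""
    V = np.asarray(V, dtype=float)
    Vc = V - V.mean(axis=1, keepdims=True)          # center each cell's velocity across genes
    length = np.linalg.norm(Vc, axis=1)
    unit = Vc / np.where(length > 0, length, 1.0)[:, None]
    confidence = np.array([
        np.mean(unit[nbrs] @ unit[i]) for i, nbrs in enumerate(neighbors)
    ])
    return length, np.clip(confidence, 0.0, None)   # negative correlations count as zero confidence

V = np.array([
    [1.0, -1.0, 0.0],
    [2.0, -2.0, 0.0],    # same direction as cell 0, twice as fast
    [-1.0, 1.0, 0.0],    # opposite direction
])
neighbors = [[1], [0], [0, 1]]
length, confidence = velocity_length_and_confidence(V, neighbors)
```

Cells 0 and 1 agree in direction and get confidence 1 despite different speeds, while cell 2 points against both of its neighbors and gets confidence 0.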

### Dataset-level
```python
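# Check overall velocity quality
scv.pl.proportions(adata)  # Proportions of spliced/unspliced counts, overall and per cluster
scv.tl.velocity_confidence(adata)  # Adds 'velocity_length' and 'velocity_confidence' to adata.obs
scv.pl.scatter(adata, c=['velocity_length', 'velocity_confidence'], cmap='coolwarm', perc=[5, 95])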
```

## Parameter Tuning Guide

| Parameter | Purpose | Typical value | When to Change |
|-----------|----------|---------|----------------|
| `min_shared_counts` | Filter genes | 20 | Increase for deep sequencing; decrease for shallow |
| `n_top_genes` | HVG selection | 2000 | Increase for complex datasets |
| `n_neighbors` | kNN graph | 30 | Decrease for small datasets; increase for noisy data |
| `n_pcs` | PCA dimensions | 30 | Match to elbow in scree plot |
| `t_max_rank` | Latent time constraint | None | Set if known developmental direction |

## Integration with Other Tools

### CellRank (Fate Prediction)

```python
import cellrank as cr
from cellrank.kernels import VelocityKernel, ConnectivityKernel

# Combine velocity and connectivity kernels
vk = VelocityKernel(adata).compute_transition_matrix()
ck = ConnectivityKernel(adata).compute_transition_matrix()
combined = 0.8 * vk + 0.2 * ck

# Compute macrostates (candidate initial, intermediate, and terminal states)
g = cr.estimators.GPCCA(combined)
g.compute_macrostates(n_states=4, cluster_key='leiden')
g.plot_macrostates(which="all")

# Pick terminal states among the macrostates, then compute fate probabilities toward them
g.predict_terminal_states()
g.compute_fate_probabilities()
g.plot_fate_probabilities()
```

### Scanpy Integration

scVelo operates on the same AnnData object as Scanpy, so both pipelines can share one `adata`:

```python
import scanpy as sc
import scvelo as scv

# Run standard Scanpy pipeline first (adata must carry 'spliced' and 'unspliced' layers)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)

# Then add velocity on top
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
```