mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization. - Added a script for running RNA velocity analysis with customizable parameters and output options. - Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation. - Included references for velocity models and their mathematical framework, along with a comparison of different models. - Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
169 lines
4.9 KiB
Markdown
# scVelo Velocity Models Reference

## Mathematical Framework

RNA velocity is based on the kinetic model of transcription:

```
dx_s/dt = β·x_u - γ·x_s        (spliced dynamics)
dx_u/dt = α(t) - β·x_u         (unspliced dynamics)
```

Where:
- `x_s`: spliced mRNA abundance
- `x_u`: unspliced (pre-mRNA) abundance
- `α(t)`: transcription rate (varies over time)
- `β`: splicing rate
- `γ`: degradation rate

**Velocity** is defined as: `v = dx_s/dt = β·x_u - γ·x_s`

- **v > 0**: Gene is being upregulated (more unspliced mRNA than expected at steady state)
- **v < 0**: Gene is being downregulated (less unspliced mRNA than expected at steady state)

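As a minimal numerical sketch of the kinetic model above, the two equations can be integrated with a forward-Euler scheme. The rate values, step size, and function names are illustrative assumptions, not part of scVelo. With constant α the system relaxes to `x_u = α/β` and `x_s = α/γ`, and the sign of `v` separates induction from repression.

```python
def simulate(alpha, beta, gamma, u0=0.0, s0=0.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration of the unspliced/spliced kinetics (constant alpha)."""
    u, s = u0, s0
    for _ in range(int(round(t_end / dt))):
        du = alpha - beta * u        # dx_u/dt
        ds = beta * u - gamma * s    # dx_s/dt
        u, s = u + dt * du, s + dt * ds
    return u, s


def velocity(u, s, beta, gamma):
    """RNA velocity v = dx_s/dt."""
    return beta * u - gamma * s


# Hypothetical rates, chosen only for illustration
ALPHA, BETA, GAMMA = 2.0, 1.0, 0.5

u_ss, s_ss = simulate(ALPHA, BETA, GAMMA)              # long run: steady state
u_on, s_on = simulate(ALPHA, BETA, GAMMA, t_end=1.0)   # early induction
u_off, s_off = simulate(0.0, BETA, GAMMA, u0=u_ss, s0=s_ss, t_end=1.0)  # repression

print("steady state:", round(u_ss, 3), round(s_ss, 3))
print("induction velocity positive:", velocity(u_on, s_on, BETA, GAMMA) > 0)
print("repression velocity negative:", velocity(u_off, s_off, BETA, GAMMA) < 0)
```
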
## Model Comparison

### Steady-State (Velocyto, original)

- Assumes constant α (transcription rate)
- Fits γ (relative to β) by linear regression on cells in the extreme expression quantiles, which are assumed to be at steady state
- **Limitation**: Requires identifiable steady states; assumes constant transcription

```python
# Available in scVelo for backward compatibility;
# the documented mode name for the steady-state model is 'deterministic'
scv.tl.velocity(adata, mode='deterministic')
```

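The steady-state fit can be sketched as a least-squares line through the origin, `x_u ≈ γ·x_s`, estimated on the cells with the lowest and highest expression. This is a simplified illustration, not scVelo's implementation: ranking cells by spliced abundance alone, the `perc` cutoff, and the function names are assumptions, and γ is expressed in units where β = 1.

```python
def fit_gamma(u, s, perc=5):
    """Slope of u ~ gamma * s (no intercept), fitted on extreme-quantile cells."""
    n = len(s)
    k = max(1, n * perc // 100)                   # cells taken from each tail
    order = sorted(range(n), key=lambda i: s[i])  # rank cells by spliced abundance
    idx = order[:k] + order[-k:]
    num = sum(u[i] * s[i] for i in idx)
    den = sum(s[i] * s[i] for i in idx)
    return num / den


def steady_state_velocity(u, s, gamma):
    """Residual from the steady-state line: v = u - gamma * s (beta = 1)."""
    return [ui - gamma * si for ui, si in zip(u, s)]


# Synthetic cells lying on the steady-state line u = 0.5 * s ...
s = [float(i) for i in range(100)]
u = [0.5 * si for si in s]
# ... plus one cell with excess unspliced mRNA (being upregulated)
s.append(50.0)
u.append(40.0)

gamma = fit_gamma(u, s)
v = steady_state_velocity(u, s, gamma)
print("gamma:", gamma)
print("velocity of the extra cell:", v[-1])
```

Cells above the fitted line get positive velocity and cells below it get negative velocity, which is why the fit depends on the extreme cells really being at steady state.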
### Stochastic Model (scVelo v1)

- Extends steady-state with variance/covariance terms
- Models cell-to-cell variability in mRNA counts
- More robust to noise than steady-state

```python
scv.tl.velocity(adata, mode='stochastic')
```

### Dynamical Model (scVelo v2, recommended)

- Jointly estimates all kinetic rates (α, β, γ) and cell-specific latent time
- Does not assume steady state
- Identifies induction vs. repression phases
- Computes `fit_likelihood` per gene (quality measure)

```python
scv.tl.recover_dynamics(adata, n_jobs=4)
scv.tl.velocity(adata, mode='dynamical')
```

**Kinetic states identified by dynamical model:**

| State | Description |
|-------|-------------|
| Induction | α > 0, x_u increasing |
| Steady-state on | α > 0, constant high expression |
| Repression | α = 0, x_u decreasing |
| Steady-state off | α = 0, constant low expression |

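The four kinetic states can be traced with the closed-form solution of the kinetic equations for a piecewise-constant α. The sketch below is illustrative only: the rates, the switch time `T_SWITCH`, and the helper name are assumptions, and the formula requires β ≠ γ.

```python
import math


def analytic_state(tau, alpha, beta, gamma, u0=0.0, s0=0.0):
    """Closed-form (x_u, x_s) after time tau with constant alpha (beta != gamma)."""
    eb, eg = math.exp(-beta * tau), math.exp(-gamma * tau)
    u = u0 * eb + alpha / beta * (1.0 - eb)
    s = (s0 * eg + alpha / gamma * (1.0 - eg)
         + (alpha - beta * u0) / (gamma - beta) * (eg - eb))
    return u, s


# Hypothetical rates and switch time, for illustration only
ALPHA, BETA, GAMMA, T_SWITCH = 2.0, 1.0, 0.5, 5.0

# Induction: alpha > 0, starting from the "off" state (0, 0)
induction = [analytic_state(t, ALPHA, BETA, GAMMA) for t in (0.0, 1.0, 2.0)]

# Repression: alpha = 0, starting from the state reached at the switch
u_sw, s_sw = analytic_state(T_SWITCH, ALPHA, BETA, GAMMA)
repression = [analytic_state(t, 0.0, BETA, GAMMA, u0=u_sw, s0=s_sw)
              for t in (0.0, 1.0, 2.0)]

print("induction  x_u:", [round(u, 3) for u, _ in induction])
print("repression x_u:", [round(u, 3) for u, _ in repression])
```

Letting `tau` grow large in the induction phase approaches the "steady-state on" values `α/β` and `α/γ`; doing the same in the repression phase approaches the "steady-state off" values of zero.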
## Velocity Graph

The velocity graph scores each cell's candidate transitions by the cosine similarity between its velocity vector and the displacement to each neighboring cell's state:

```python
scv.tl.velocity_graph(adata)
# Stored in adata.uns['velocity_graph'] (sparse, cells × cells)
# Entry [i, j] = cosine similarity between cell i's velocity and the
# displacement from cell i to cell j; convert to transition probabilities
# with scv.utils.get_transition_matrix(adata)
```

**Parameters:**
- `n_neighbors`: Number of neighbors considered
- `sqrt_transform`: Apply a variance-stabilizing sqrt transform to velocities and state differences before computing cosine similarities (off unless enabled)
- `approx`: Compute the graph on a reduced PCA representation instead of the full gene space (faster for large datasets)

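A minimal sketch of the idea behind the velocity graph, for a single cell and a handful of neighbors. It is not scVelo's implementation: the helper names are hypothetical, and the exponential kernel with `scale=10.0` is an assumed way of turning cosine similarities into transition probabilities.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def velocity_graph_row(x_i, v_i, neighbors):
    """Cosine similarity between v_i and the displacement to each neighbor."""
    return [cosine([xj - xi for xj, xi in zip(x_j, x_i)], v_i) for x_j in neighbors]


def transition_probabilities(cosines, scale=10.0):
    """Exponential kernel over cosine similarities, normalized to sum to 1."""
    weights = [math.exp(scale * c) for c in cosines]
    total = sum(weights)
    return [w / total for w in weights]


# One cell at the origin moving along +x, with three neighbors
x_i, v_i = [0.0, 0.0], [1.0, 0.0]
neighbors = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]

row = velocity_graph_row(x_i, v_i, neighbors)
probs = transition_probabilities(row)
print("cosines:", row)
print("most likely successor:", probs.index(max(probs)))
```
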
## Latent Time Interpretation

For each gene, latent time τ ∈ [0, 1] marks a cell's position along that gene's induction-repression cycle:
- τ = 0: Gene is at onset of induction
- τ = 0.5: Gene is at peak of induction (for a complete cycle)
- τ = 1: Gene has returned to steady-state off

**Shared latent time** is computed by taking the average over all velocity genes, weighted by `fit_likelihood`.

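The likelihood-weighted average described above can be sketched as follows. This only illustrates the weighting; scVelo's own procedure involves further steps, and the final rescaling to [0, 1], the input layout, and the function name are assumptions.

```python
def shared_latent_time(gene_times, likelihoods):
    """Likelihood-weighted mean of per-gene latent times, rescaled to [0, 1].

    gene_times[g][c] is the latent time of cell c under gene g;
    likelihoods[g] is the fit likelihood used as the weight of gene g.
    """
    total = sum(likelihoods)
    n_cells = len(gene_times[0])
    shared = [
        sum(w * times[c] for w, times in zip(likelihoods, gene_times)) / total
        for c in range(n_cells)
    ]
    lo, hi = min(shared), max(shared)
    if hi == lo:                      # all cells share one time: nothing to rescale
        return shared
    return [(t - lo) / (hi - lo) for t in shared]


# Two genes, three cells; the better-fitting gene dominates the average
gene_times = [[0.0, 0.5, 1.0],
              [0.2, 0.4, 0.9]]
likelihoods = [0.8, 0.2]

shared = shared_latent_time(gene_times, likelihoods)
print([round(t, 3) for t in shared])
```
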
## Quality Metrics

### Gene-level
- `fit_likelihood`: Goodness-of-fit of dynamical model (0-1; higher = better)
  - Use for filtering driver genes: `adata.var[adata.var['fit_likelihood'] > 0.1]`
- `fit_alpha`: Transcription rate during induction
- `fit_gamma`: mRNA degradation rate
- `fit_r2`: R² of kinetic fit

### Cell-level
- `velocity_length`: Magnitude of velocity vector (cell speed)
- `velocity_confidence`: Coherence of velocity with neighboring cells (0-1)

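The two cell-level metrics can be illustrated with a simplified sketch: speed as the Euclidean norm of the velocity vector, and confidence as the mean cosine similarity with the neighbors' velocities. The function names are hypothetical, and scVelo's own computation differs in detail (for example, it centers the velocity vectors first), so values from this sketch are not directly comparable to `adata.obs['velocity_confidence']`.

```python
import math


def velocity_length(v):
    """Speed of a cell: Euclidean norm of its velocity vector."""
    return math.sqrt(sum(x * x for x in v))


def velocity_confidence(v_i, neighbor_velocities):
    """Mean cosine similarity between v_i and each neighbor's velocity."""
    sims = []
    for v_j in neighbor_velocities:
        denom = velocity_length(v_i) * velocity_length(v_j)
        dot = sum(a * b for a, b in zip(v_i, v_j))
        sims.append(dot / denom if denom > 0 else 0.0)
    return sum(sims) / len(sims)


v = [1.0, 0.0]
coherent = velocity_confidence(v, [[2.0, 0.0], [1.0, 0.1]])     # neighbors agree
incoherent = velocity_confidence(v, [[-1.0, 0.0], [0.0, 1.0]])  # neighbors disagree

print("length:", velocity_length([3.0, 4.0]))
print("coherent:", round(coherent, 3), "incoherent:", round(incoherent, 3))
```
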
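### Dataset-level
```python
# Check overall velocity quality
scv.pl.proportions(adata)  # Spliced/unspliced proportions, overall and per cluster
scv.tl.velocity_confidence(adata)  # Adds 'velocity_length' and 'velocity_confidence' to adata.obs
scv.pl.scatter(adata, color=['velocity_length', 'velocity_confidence'], cmap='coolwarm', perc=[5, 95])
adata.obs.groupby('leiden')[['velocity_length', 'velocity_confidence']].mean()  # Per-cluster summary
```
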
## Parameter Tuning Guide

| Parameter | Purpose | Default | When to Change |
|-----------|---------|---------|----------------|
| `min_shared_counts` | Filter genes | 20 | Increase for deep sequencing; decrease for shallow |
| `n_top_genes` | HVG selection | 2000 | Increase for complex datasets |
| `n_neighbors` | kNN graph | 30 | Decrease for small datasets; increase for noisy |
| `n_pcs` | PCA dimensions | 30 | Match to elbow in scree plot |
| `t_max_rank` | Latent time constraint | None | Set if known developmental direction |

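To show where the first four parameters are passed, here is a small wrapper around scVelo's preprocessing calls. The helper `preprocess_for_velocity` is hypothetical; its defaults simply mirror the values in the table, and `t_max_rank` is left out because it belongs to a later step.

```python
def preprocess_for_velocity(adata, min_shared_counts=20, n_top_genes=2000,
                            n_pcs=30, n_neighbors=30):
    """Gene filtering, normalization, and moment computation in one call.

    Hypothetical convenience wrapper; `adata` must contain 'spliced' and
    'unspliced' layers.
    """
    # Imported lazily so the helper can be defined without scVelo installed
    import scvelo as scv

    # Gene filtering, normalization, and highly-variable-gene selection
    scv.pp.filter_and_normalize(
        adata, min_shared_counts=min_shared_counts, n_top_genes=n_top_genes
    )
    # PCA + kNN graph, then first/second-order moments over the neighbors
    scv.pp.moments(adata, n_pcs=n_pcs, n_neighbors=n_neighbors)
    return adata
```

For a small dataset, a call such as `preprocess_for_velocity(adata, n_neighbors=15)` lowers the neighborhood size while keeping the other values unchanged.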
## Integration with Other Tools

### CellRank (Fate Prediction)

```python
import cellrank as cr
from cellrank.kernels import VelocityKernel, ConnectivityKernel

# Combine velocity and connectivity kernels
vk = VelocityKernel(adata).compute_transition_matrix()
ck = ConnectivityKernel(adata).compute_transition_matrix()
combined = 0.8 * vk + 0.2 * ck

# Compute macrostates, then select the terminal ones
g = cr.estimators.GPCCA(combined)
g.compute_macrostates(n_states=4, cluster_key='leiden')
g.predict_terminal_states()
g.plot_macrostates(which="all")

# Compute fate probabilities towards the terminal states
g.compute_fate_probabilities()
g.plot_fate_probabilities()
```

### Scanpy Integration

scVelo operates on the same AnnData objects as Scanpy:

```python
import scanpy as sc
import scvelo as scv

# adata must contain 'spliced' and 'unspliced' layers
# Run standard Scanpy pipeline first
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)
sc.tl.leiden(adata)

# Then add velocity on top
scv.pp.moments(adata)
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
```