Add scVelo RNA velocity analysis workflow and IQ-TREE reference documentation

- Introduced a comprehensive RNA velocity analysis pipeline using scVelo, including data loading, preprocessing, velocity estimation, and visualization.
- Added a script for running RNA velocity analysis with customizable parameters and output options.
- Created detailed documentation for IQ-TREE 2 phylogenetic inference, covering command syntax, model selection, bootstrapping methods, and output interpretation.
- Included references for velocity models and their mathematical framework, along with a comparison of different models.
- Enhanced the scVelo skill documentation with installation instructions, use cases, and best practices for RNA velocity analysis.
huangkuanlin
2026-03-03 07:15:36 -05:00
parent b271271df4
commit 7f94783fab
27 changed files with 6961 additions and 0 deletions

@@ -0,0 +1,178 @@
# BindingDB Affinity Query Reference
## Affinity Measurement Types
### Ki (Inhibition Constant)
- **Definition**: Equilibrium constant for inhibitor-enzyme complex dissociation
- **Equation**: Ki = [E][I]/[EI]
- **Usage**: Enzyme inhibition; preferred for mechanistic studies
- **Note**: Independent of substrate concentration (unlike IC50)
### Kd (Dissociation Constant)
- **Definition**: Thermodynamic binding equilibrium constant
- **Equation**: Kd = [A][B]/[AB]
- **Usage**: Direct binding assays (SPR, ITC, fluorescence anisotropy)
- **Note**: True measure of binding strength; lower = tighter binding
### IC50 (Half-Maximal Inhibitory Concentration)
- **Definition**: Concentration of inhibitor that reduces target activity by 50%
- **Usage**: Most common in drug discovery; assay-dependent
- **Conversion to Ki**: Cheng-Prusoff equation (competitive inhibition): Ki = IC50 / (1 + [S]/Km)
- **Note**: Depends on substrate concentration and assay conditions
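The Cheng-Prusoff conversion listed above can be sketched as a small helper. The name `ic50_to_ki` and its arguments are illustrative only and are not part of BindingDB; the formula assumes a competitive inhibitor, with `[S]` and `Km` given in the same units.

```python
def ic50_to_ki(ic50_nM, substrate_conc, km):
    """Estimate Ki (nM) from IC50 (nM) with the Cheng-Prusoff equation.

    Assumes competitive inhibition; substrate_conc and km must share units.
    """
    if km <= 0:
        raise ValueError("km must be positive")
    if substrate_conc < 0:
        raise ValueError("substrate_conc must be non-negative")
    return ic50_nM / (1.0 + substrate_conc / km)


# When [S] equals Km, the estimated Ki is half the measured IC50.
ki_nM = ic50_to_ki(100.0, substrate_conc=10.0, km=10.0)
```

Because the correction factor depends on `[S]/Km`, IC50 values from assays run at different substrate concentrations are only comparable after a conversion like this one.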
### EC50 (Half-Maximal Effective Concentration)
- **Definition**: Concentration that produces 50% of maximal effect
- **Usage**: Cell-based assays, agonist studies
### Kinetics Parameters
- **kon**: Association rate constant (M⁻¹s⁻¹); describes how fast complex forms
- **koff**: Dissociation rate constant (s⁻¹); describes how fast complex dissociates
- **Residence time**: τ = 1/koff; longer residence = more sustained effect
- **Kd from kinetics**: Kd = koff/kon
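The two kinetic relations above (Kd = koff/kon and τ = 1/koff) can be combined in a few lines. This is a minimal sketch; the function name `kinetics_summary` and the choice to report Kd in nM are assumptions made for illustration.

```python
def kinetics_summary(kon_per_M_s, koff_per_s):
    """Return (Kd in nM, residence time in seconds) from rate constants.

    kon_per_M_s: association rate constant in M^-1 s^-1
    koff_per_s:  dissociation rate constant in s^-1
    """
    if kon_per_M_s <= 0 or koff_per_s <= 0:
        raise ValueError("rate constants must be positive")
    kd_M = koff_per_s / kon_per_M_s      # Kd = koff / kon, in molar
    residence_time_s = 1.0 / koff_per_s  # tau = 1 / koff
    return kd_M * 1e9, residence_time_s  # report Kd in nM


# kon = 1e6 M^-1 s^-1 and koff = 1e-3 s^-1 give roughly 1 nM and 1000 s.
kd_nM, tau_s = kinetics_summary(kon_per_M_s=1e6, koff_per_s=1e-3)
```

Two ligands with the same Kd can differ in residence time, so it can be useful to keep both numbers when kinetic data are available.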
## Common API Query Patterns
### By UniProt ID (REST API)
```python
import requests


def query_by_uniprot(uniprot_id, affinity_type="Ki"):
    """
    REST API query for BindingDB affinities by UniProt target ID.
    """
    url = "https://www.bindingdb.org/axis2/services/BDBService/getLigandsByUniprotID"
    params = {
        "uniprot_id": uniprot_id,
        "cutoff": "10000",  # nM threshold
        "affinity_type": affinity_type,
        "response": "json"
    }
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error page
    return response.json()


# Important targets (gene symbol -> UniProt accession)
COMMON_TARGETS = {
    "ABL1": "P00519",   # Imatinib, dasatinib target
    "EGFR": "P00533",   # Erlotinib, gefitinib target
    "BRAF": "P15056",   # Vemurafenib, dabrafenib target
    "CDK2": "P24941",   # Cell cycle kinase
    "HDAC1": "Q13547",  # Histone deacetylase
    "BRD4": "O60885",   # BET bromodomain reader
    "MDM2": "Q00987",   # p53 negative regulator
    "BCL2": "P10415",   # Antiapoptotic protein
    "PCSK9": "Q8NBP7",  # Cholesterol regulator
    "JAK2": "O60674",   # Cytokine signaling kinase
}
```
### By PubChem CID (REST API)
```python
import requests


def query_by_pubchem_cid(pubchem_cid):
    """Get all binding data for a specific compound by PubChem CID."""
    url = "https://www.bindingdb.org/axis2/services/BDBService/getAffinitiesByCID"
    params = {"cid": pubchem_cid, "response": "json"}
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors
    return response.json()


# Example: Imatinib PubChem CID = 5291
imatinib_data = query_by_pubchem_cid(5291)
```
### By Target Name
```python
import requests


def query_by_target_name(target_name, affinity_cutoff=100):
    """Query BindingDB by target name; affinity_cutoff is in nM."""
    url = "https://www.bindingdb.org/axis2/services/BDBService/getAffinitiesByTarget"
    params = {
        "target_name": target_name,
        "cutoff": affinity_cutoff,
        "response": "json"
    }
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors
    return response.json()
```
## Dataset Download Guide
### Available Files
| File | Size | Contents |
|------|------|---------|
| `BindingDB_All.tsv.zip` | ~3.5 GB | All data: ~2.9M records |
| `BindingDB_All.sdf.zip` | ~7 GB | All data with 3D structures |
| `BindingDB_IC50.tsv` | ~1.5 GB | IC50 data only |
| `BindingDB_Ki.tsv` | ~0.8 GB | Ki data only |
| `BindingDB_Kd.tsv` | ~0.2 GB | Kd data only |
| `BindingDB_EC50.tsv` | ~0.5 GB | EC50 data only |
| `tdc_bindingdb_*` | Various | TDC-formatted subsets |
### Efficient Loading
```python
import pandas as pd


# For large files, use chunking
def load_bindingdb_chunked(filepath, uniprot_ids, affinity_col="Ki (nM)", chunk_size=100000):
    """Load BindingDB in chunks, keeping rows for the given targets that report affinity_col."""
    results = []
    for chunk in pd.read_csv(filepath, sep="\t", chunksize=chunk_size,
                             low_memory=False, on_bad_lines='skip'):
        # Filter for target
        mask = chunk["UniProt (SwissProt) Primary ID of Target Chain"].isin(uniprot_ids)
        # Keep only rows with a value for the requested affinity type
        if affinity_col in chunk.columns:
            mask &= chunk[affinity_col].notna()
        if mask.any():
            results.append(chunk[mask])
    if results:
        return pd.concat(results)
    return pd.DataFrame()
```
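`BindingDB_All.tsv.zip` is distributed as a zip archive, so it can be convenient to stream it without extracting it to disk first. The following is a minimal sketch, assuming the archive contains at least one tab-separated `.tsv` member; `iter_bindingdb_zip` is a hypothetical helper name, not part of BindingDB or pandas.

```python
import zipfile

import pandas as pd


def iter_bindingdb_zip(zip_path, chunk_size=100_000):
    """Yield DataFrame chunks from the first .tsv member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        tsv_names = [name for name in archive.namelist() if name.endswith(".tsv")]
        if not tsv_names:
            raise ValueError(f"no .tsv member found in {zip_path}")
        # Open the member as a binary stream; pandas decodes it while reading.
        with archive.open(tsv_names[0]) as handle:
            reader = pd.read_csv(handle, sep="\t", chunksize=chunk_size,
                                 dtype=str, on_bad_lines="skip")
            for chunk in reader:
                yield chunk
```

Each yielded chunk can be filtered with the same target mask used in `load_bindingdb_chunked`. Reading every column as `str` avoids mixed-type inference at the cost of converting affinity columns to numbers later.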
## pKi / pIC50 Conversion
Converting raw affinity to logarithmic scale (common in ML):
```python
import numpy as np


def to_log_affinity(affinity_nM):
    """Convert nM affinity to pAffinity (negative log molar)."""
    affinity_M = affinity_nM * 1e-9  # Convert nM to M
    return -np.log10(affinity_M)
# Examples:
# 1 nM → pAffinity = 9.0
# 10 nM → pAffinity = 8.0
# 100 nM → pAffinity = 7.0
# 1 μM → pAffinity = 6.0
# 10 μM → pAffinity = 5.0
```
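Affinity cells in an exported file may arrive as text and may carry a qualifier such as `>10000` or `<1`. This is an assumption about the export format and is worth checking against the actual file. The sketch below, using the hypothetical helpers `parse_affinity_nM` and `to_p_affinity`, parses such cells with the standard library only and flips the qualifier, since a larger nM value corresponds to a smaller pAffinity.

```python
import math


def parse_affinity_nM(raw):
    """Return (qualifier, value_nM) for a raw cell, or None if unparseable."""
    if raw is None:
        return None
    text = str(raw).strip()
    qualifier = "="
    if text and text[0] in "<>":
        qualifier, text = text[0], text[1:].strip()
    try:
        value = float(text)
    except ValueError:
        return None
    # Reject NaN, infinities, and non-positive values (log10 is undefined there).
    if not math.isfinite(value) or value <= 0:
        return None
    return qualifier, value


def to_p_affinity(raw):
    """Return (qualifier, pAffinity) for a raw nM cell, or None if unparseable."""
    parsed = parse_affinity_nM(raw)
    if parsed is None:
        return None
    qualifier, value_nM = parsed
    # ">" in nM means "weaker than", which is "<" on the pAffinity scale.
    flipped = {"<": ">", ">": "<", "=": "="}[qualifier]
    return flipped, 9.0 - math.log10(value_nM)  # -log10(nM * 1e-9)
```

Keeping the qualifier lets a downstream step decide whether censored measurements should be dropped, kept as bounds, or treated as exact values.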
## Quality Filters
When using BindingDB data for ML or SAR:
```python
import pandas as pd


def filter_quality(df):
    """Apply quality filters to BindingDB data."""
    # 1. Require valid SMILES
    df = df[df["Ligand SMILES"].notna() & (df["Ligand SMILES"] != "")]
    # 2. Require valid affinity
    df = df[df["Ki (nM)"].notna() | df["IC50 (nM)"].notna()]
    # 3. Filter extreme values (artifacts)
    for col in ["Ki (nM)", "IC50 (nM)", "Kd (nM)"]:
        if col in df.columns:
            # Columns may be read as text; non-numeric entries become NaN and are kept
            values = pd.to_numeric(df[col], errors="coerce")
            df = df[~(values > 1e6)]  # Remove > 1 mM (non-specific)
    # 4. Use only human targets
    if "Target Source Organism According to Curator or DataSource" in df.columns:
        df = df[df["Target Source Organism According to Curator or DataSource"].str.contains(
            "Homo sapiens", na=False
        )]
    return df
```