BindingDB Affinity Query Reference
Affinity Measurement Types
Ki (Inhibition Constant)
- Definition: Equilibrium constant for inhibitor-enzyme complex dissociation
- Equation: Ki = [E][I]/[EI]
- Usage: Enzyme inhibition; preferred for mechanistic studies
- Note: Independent of substrate concentration (unlike IC50)
Kd (Dissociation Constant)
- Definition: Thermodynamic binding equilibrium constant
- Equation: Kd = [A][B]/[AB]
- Usage: Direct binding assays (SPR, ITC, fluorescence anisotropy)
- Note: True measure of binding strength; lower = tighter binding
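To make "lower = tighter binding" concrete, the Kd definition can be rearranged into a fractional occupancy for a simple 1:1 binding model. The sketch below assumes the free ligand concentration is approximately the total ligand concentration (ligand in large excess over target); the function name `fraction_bound` is illustrative and not part of any BindingDB tooling.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of target occupied at equilibrium for 1:1 binding.

    Derived from Kd = [A][B]/[AB], assuming free ligand ~ total ligand.
    Both arguments must use the same concentration unit (nM here).
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand_nM must be >= 0 and kd_nM must be > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# At a ligand concentration equal to Kd, half of the target is occupied.
print(fraction_bound(10.0, 10.0))  # 0.5
```

At the same ligand concentration, a compound with a ten-fold lower Kd occupies a larger fraction of the target, which is why Kd values are compared directly.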
IC50 (Half-Maximal Inhibitory Concentration)
- Definition: Concentration of inhibitor that reduces target activity by 50%
- Usage: Most common in drug discovery; assay-dependent
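- Conversion to Ki: Cheng-Prusoff equation (valid for competitive inhibition): Ki = IC50 / (1 + [S]/Km), where [S] is the substrate concentration and Km is the Michaelis constant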
- Note: Depends on substrate concentration and assay conditions
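The Cheng-Prusoff conversion above can be sketched as a small helper. This is a minimal illustration that assumes competitive inhibition and that `substrate_conc` and `km` are given in the same unit; the function name `ic50_to_ki` is hypothetical.

```python
def ic50_to_ki(ic50_nM: float, substrate_conc: float, km: float) -> float:
    """Convert IC50 to Ki with the Cheng-Prusoff equation.

    Ki = IC50 / (1 + [S]/Km), assuming competitive inhibition.
    substrate_conc and km must share one unit; the result is in nM.
    """
    if km <= 0 or substrate_conc < 0:
        raise ValueError("km must be > 0 and substrate_conc must be >= 0")
    return ic50_nM / (1.0 + substrate_conc / km)


# With [S] equal to Km, Ki is half of the measured IC50.
print(ic50_to_ki(100.0, substrate_conc=10.0, km=10.0))  # 50.0
```

When the assay's substrate concentration or Km is not reported, the conversion cannot be applied, which is one reason IC50 and Ki records are usually kept separate.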
EC50 (Half-Maximal Effective Concentration)
- Definition: Concentration that produces 50% of maximal effect
- Usage: Cell-based assays, agonist studies
Kinetics Parameters
- kon: Association rate constant (M⁻¹s⁻¹); describes how fast complex forms
- koff: Dissociation rate constant (s⁻¹); describes how fast complex dissociates
- Residence time: τ = 1/koff; longer residence = more sustained effect
- Kd from kinetics: Kd = koff/kon
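The kinetic relationships above are simple enough to check numerically. The following sketch uses hypothetical helper names and assumes kon is in M⁻¹s⁻¹ and koff in s⁻¹, as listed.

```python
def kd_from_rates(kon_per_M_s: float, koff_per_s: float) -> float:
    """Equilibrium dissociation constant in molar units: Kd = koff / kon."""
    if kon_per_M_s <= 0 or koff_per_s < 0:
        raise ValueError("kon must be > 0 and koff must be >= 0")
    return koff_per_s / kon_per_M_s


def residence_time_s(koff_per_s: float) -> float:
    """Residence time in seconds: tau = 1 / koff."""
    if koff_per_s <= 0:
        raise ValueError("koff must be > 0")
    return 1.0 / koff_per_s


# Illustrative values: kon = 1e6 M^-1 s^-1, koff = 1e-3 s^-1
kd_M = kd_from_rates(1e6, 1e-3)   # about 1e-9 M, i.e. roughly 1 nM
tau = residence_time_s(1e-3)      # about 1000 s
print(f"Kd = {kd_M * 1e9:.2f} nM, residence time = {tau:.0f} s")
```

Two compounds can share the same Kd while differing in residence time, because Kd depends only on the ratio of the two rate constants.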
Common API Query Patterns
By UniProt ID (REST API)
import requests

def query_by_uniprot(uniprot_id, affinity_type="Ki"):
    """
    REST API query for BindingDB affinities by UniProt target ID.
    """
    url = "https://www.bindingdb.org/axis2/services/BDBService/getLigandsByUniprotID"
    params = {
        "uniprot_id": uniprot_id,
        "cutoff": "10000",  # nM threshold
        "affinity_type": affinity_type,
        "response": "json"
    }
    # A timeout keeps the call from hanging if the server does not answer
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error page
    return response.json()
# Important targets
COMMON_TARGETS = {
    "ABL1": "P00519",   # Imatinib, dasatinib target
    "EGFR": "P00533",   # Erlotinib, gefitinib target
    "BRAF": "P15056",   # Vemurafenib, dabrafenib target
    "CDK2": "P24941",   # Cell cycle kinase
    "HDAC1": "Q13547",  # Histone deacetylase
    "BRD4": "O60885",   # BET bromodomain reader
    "MDM2": "Q00987",   # p53 negative regulator
    "BCL2": "P10415",   # Antiapoptotic protein
    "PCSK9": "Q8NBP7",  # Cholesterol regulator
    "JAK2": "O60674",   # Cytokine signaling kinase
}
By PubChem CID (REST API)
def query_by_pubchem_cid(pubchem_cid):
    """Get all binding data for a specific compound by PubChem CID."""
    url = "https://www.bindingdb.org/axis2/services/BDBService/getAffinitiesByCID"
    params = {"cid": pubchem_cid, "response": "json"}
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors before decoding JSON
    return response.json()

# Example: Imatinib PubChem CID = 5291
imatinib_data = query_by_pubchem_cid(5291)
By Target Name
def query_by_target_name(target_name, affinity_cutoff=100):
    """Query BindingDB by target name (affinity_cutoff in nM)."""
    url = "https://www.bindingdb.org/axis2/services/BDBService/getAffinitiesByTarget"
    params = {
        "target_name": target_name,
        "cutoff": affinity_cutoff,
        "response": "json"
    }
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Raise on HTTP errors before decoding JSON
    return response.json()
Dataset Download Guide
Available Files
| File | Size | Contents |
|---|---|---|
| BindingDB_All.tsv.zip | ~3.5 GB | All data: ~2.9M records |
| BindingDB_All.sdf.zip | ~7 GB | All data with 3D structures |
| BindingDB_IC50.tsv | ~1.5 GB | IC50 data only |
| BindingDB_Ki.tsv | ~0.8 GB | Ki data only |
| BindingDB_Kd.tsv | ~0.2 GB | Kd data only |
| BindingDB_EC50.tsv | ~0.5 GB | EC50 data only |
| tdc_bindingdb_* | Various | TDC-formatted subsets |
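The zipped TSV files do not have to be unpacked before inspection. As a minimal sketch, pandas can read a zip archive directly, provided the archive contains exactly one file (assumed here for the BindingDB downloads, not verified). The helper name `read_bindingdb_zip` is hypothetical, and `nrows` is used to peek at the first records without loading the whole file.

```python
import pandas as pd


def read_bindingdb_zip(zip_path, columns=None, nrows=None):
    """Read a zipped, tab-separated BindingDB export without unpacking it.

    columns: optional list of column names to keep (reduces memory use).
    nrows:   optional number of data rows to read, useful for a quick look.
    """
    return pd.read_csv(
        zip_path,
        sep="\t",
        compression="zip",     # archive must contain a single file
        usecols=columns,
        nrows=nrows,
        low_memory=False,
        on_bad_lines="skip",   # requires pandas >= 1.3
    )


# Usage (path is a placeholder for a locally downloaded file):
# preview = read_bindingdb_zip("BindingDB_All.tsv.zip", nrows=5)
# print(preview.columns.tolist())
```

Reading only a few rows first is a cheap way to confirm the exact column names before writing filters against them.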
Efficient Loading
import pandas as pd

# For large files, use chunking
def load_bindingdb_chunked(filepath, uniprot_ids, affinity_col="Ki (nM)", chunk_size=100000):
    """Load BindingDB in chunks, keeping rows for the given targets that report affinity_col."""
    results = []
    for chunk in pd.read_csv(filepath, sep="\t", chunksize=chunk_size,
                             low_memory=False, on_bad_lines='skip'):
        # Filter for target
        mask = chunk["UniProt (SwissProt) Primary ID of Target Chain"].isin(uniprot_ids)
        # Keep only rows that have a value for the requested affinity type
        mask &= chunk[affinity_col].notna()
        if mask.any():
            results.append(chunk[mask])
    if results:
        return pd.concat(results)
    return pd.DataFrame()
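The same chunked pattern also works for summaries that never need the full table in memory. The sketch below counts records per UniProt ID while streaming through the file; it reads only the target column via `usecols`. The function name is illustrative, and the column name is taken from the loader above.

```python
from collections import Counter

import pandas as pd

TARGET_COL = "UniProt (SwissProt) Primary ID of Target Chain"


def count_records_per_target(filepath, chunk_size=100_000):
    """Stream a BindingDB TSV and count rows per UniProt target ID."""
    counts = Counter()
    for chunk in pd.read_csv(filepath, sep="\t", usecols=[TARGET_COL],
                             chunksize=chunk_size, on_bad_lines="skip"):
        # value_counts() tallies one chunk; Counter.update() adds the tallies up
        counts.update(chunk[TARGET_COL].dropna().value_counts().to_dict())
    return counts


# Usage (path is a placeholder):
# counts = count_records_per_target("BindingDB_All.tsv")
# print(counts.most_common(10))
```

Because only one column is parsed, this pass is typically much lighter than loading full rows, and it helps decide which targets have enough data to be worth extracting.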
pKi / pIC50 Conversion
Converting raw affinity to logarithmic scale (common in ML):
import numpy as np

def to_log_affinity(affinity_nM):
    """Convert nM affinity to pAffinity (negative log molar)."""
    affinity_M = affinity_nM * 1e-9  # Convert nM to M
    return -np.log10(affinity_M)

# Examples:
#   1 nM  → pAffinity = 9.0
#   10 nM → pAffinity = 8.0
#   100 nM → pAffinity = 7.0
#   1 μM  → pAffinity = 6.0
#   10 μM → pAffinity = 5.0
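In practice the affinity columns may be read as text and can contain qualified values such as ">10000" or "<1", so a direct multiplication fails. The following sketch strips such qualifiers and converts a whole column to pAffinity. Treating a censored value as if it were exact is a simplifying assumption made only for illustration, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def parse_affinity_nM(series: pd.Series) -> pd.Series:
    """Turn affinity strings such as '12', ' 3.5 ' or '>10000' into floats (nM)."""
    cleaned = series.astype(str).str.strip().str.lstrip("<>=~ ")
    return pd.to_numeric(cleaned, errors="coerce")  # unparseable entries become NaN


def to_p_affinity(series: pd.Series) -> pd.Series:
    """Convert an nM affinity column to pAffinity = -log10(molar) = 9 - log10(nM)."""
    values = parse_affinity_nM(series)
    values = values.where(values > 0)  # zero or negative entries become NaN
    return 9.0 - np.log10(values)


raw = pd.Series(["10", ">10000", " 1 ", None, "0"])
print(to_p_affinity(raw).tolist())  # [8.0, 5.0, 9.0, nan, nan]
```

For modeling, it is often preferable to keep the qualifier in a separate column or to drop censored records rather than silently treating them as exact measurements.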
Quality Filters
When using BindingDB data for ML or SAR:
import pandas as pd

def filter_quality(df):
    """Apply quality filters to BindingDB data."""
    # 1. Require valid SMILES
    df = df[df["Ligand SMILES"].notna() & (df["Ligand SMILES"] != "")]
    # 2. Require valid affinity
    df = df[df["Ki (nM)"].notna() | df["IC50 (nM)"].notna()]
    # 3. Filter extreme values (artifacts)
    for col in ["Ki (nM)", "IC50 (nM)", "Kd (nM)"]:
        if col in df.columns:
            # Columns may hold strings such as ">10000"; coerce so the comparison is numeric
            values = pd.to_numeric(df[col], errors="coerce")
            df = df[~(values > 1e6)]  # Remove > 1 mM (non-specific)
    # 4. Use only human targets
    if "Target Source Organism According to Curator or DataSource" in df.columns:
        df = df[df["Target Source Organism According to Curator or DataSource"].str.contains(
            "Homo sapiens", na=False
        )]
    return df
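After filtering, the same ligand-target pair often appears several times because it was measured in different assays or publications. One common way to handle this for ML or SAR, shown here only as a sketch, is to take the median pAffinity per pair. The median is an assumption of this example rather than a BindingDB recommendation, the function name is hypothetical, and the column names follow those used above.

```python
import numpy as np
import pandas as pd

TARGET_COL = "UniProt (SwissProt) Primary ID of Target Chain"


def aggregate_replicates(df: pd.DataFrame, affinity_col: str = "Ki (nM)") -> pd.DataFrame:
    """Collapse repeated ligand-target measurements to a median pAffinity."""
    work = df[["Ligand SMILES", TARGET_COL, affinity_col]].copy()
    # Coerce text to numbers; qualified values such as ">10000" become NaN
    work[affinity_col] = pd.to_numeric(work[affinity_col], errors="coerce")
    work = work[work[affinity_col] > 0]  # drops NaN, zero, and negative values
    # Aggregate on the log scale so outliers do not dominate
    work["pAffinity"] = 9.0 - np.log10(work[affinity_col])
    return (
        work.groupby(["Ligand SMILES", TARGET_COL], as_index=False)["pAffinity"]
        .median()
    )


demo = pd.DataFrame({
    "Ligand SMILES": ["CCO", "CCO", "CCN"],
    TARGET_COL: ["P00519", "P00519", "P00519"],
    "Ki (nM)": [10, 1000, 100],
})
print(aggregate_replicates(demo))
```

Aggregating on the log scale keeps a single discordant measurement from shifting the summary by orders of magnitude, as a mean of raw nM values would.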