

BindingDB Affinity Query Reference

Affinity Measurement Types

Ki (Inhibition Constant)

  • Definition: Equilibrium dissociation constant of the enzyme-inhibitor (EI) complex
  • Equation: Ki = [E][I]/[EI]
  • Usage: Enzyme inhibition; preferred for mechanistic studies
  • Note: Independent of substrate concentration (unlike IC50)

Kd (Dissociation Constant)

  • Definition: Equilibrium dissociation constant of a binding interaction (AB ⇌ A + B)
  • Equation: Kd = [A][B]/[AB]
  • Usage: Direct binding assays (SPR, ITC, fluorescence anisotropy)
  • Note: Direct measure of binding affinity; lower Kd = tighter binding
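Rearranging the Kd expression gives the fraction of target occupied at a given ligand concentration. A minimal sketch, assuming a simple 1:1 binding model with A as the ligand, B as the target, and ligand in large excess (free ligand roughly equal to total ligand); `fractional_occupancy` is a hypothetical helper name:

```python
def fractional_occupancy(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of target B bound by ligand A at free ligand concentration [A].

    From Kd = [A][B]/[AB]:  theta = [AB]/([B] + [AB]) = [A]/([A] + Kd).
    Assumes 1:1 binding and no ligand depletion (free [A] ~= total [A]).
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand concentration must be >= 0 and Kd > 0")
    return ligand_nM / (ligand_nM + kd_nM)


# At [A] = Kd the target is half occupied; at [A] = 9 x Kd it is 90% occupied.
print(fractional_occupancy(10.0, 10.0))  # 0.5
print(fractional_occupancy(90.0, 10.0))  # 0.9
```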

IC50 (Half-Maximal Inhibitory Concentration)

  • Definition: Concentration of inhibitor that reduces target activity by 50%
  • Usage: Most common in drug discovery; assay-dependent
  • Conversion to Ki: Cheng-Prusoff equation, Ki = IC50 / (1 + [S]/Km), valid for competitive inhibitors under Michaelis-Menten kinetics
  • Note: Depends on substrate concentration and assay conditions
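The Cheng-Prusoff conversion is a one-line calculation once [S] and Km are known. A minimal sketch, assuming competitive inhibition and that [S] and Km are supplied in the same units; the function name and the example assay values are hypothetical:

```python
def cheng_prusoff_ki(ic50_nM: float, substrate_conc: float, km: float) -> float:
    """Estimate Ki from IC50 via Cheng-Prusoff: Ki = IC50 / (1 + [S]/Km).

    Only valid for competitive inhibition. substrate_conc and km must share
    units; the result has the units of ic50_nM (nM here).
    """
    if km <= 0 or substrate_conc < 0:
        raise ValueError("Km must be > 0 and [S] must be >= 0")
    return ic50_nM / (1.0 + substrate_conc / km)


# Hypothetical assay run at [S] = Km: Ki is half the measured IC50.
print(cheng_prusoff_ki(100.0, substrate_conc=20.0, km=20.0))  # 50.0
```

When [S] is much smaller than Km, Ki ≈ IC50; at [S] = Km the two differ by a factor of two, which is one reason IC50 values from different assays are not directly comparable.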

EC50 (Half-Maximal Effective Concentration)

  • Definition: Concentration that produces 50% of maximal effect
  • Usage: Cell-based assays, agonist studies
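EC50 (like IC50) is usually read off a fitted sigmoidal dose-response curve. A minimal sketch of the Hill equation, assuming a zero baseline and a user-chosen Hill coefficient; `hill_response` and its parameter names are illustrative:

```python
def hill_response(conc: float, ec50: float, e_max: float = 1.0, hill: float = 1.0) -> float:
    """Hill dose-response: E = Emax * C^n / (EC50^n + C^n), baseline assumed to be 0."""
    if conc < 0 or ec50 <= 0:
        raise ValueError("conc must be >= 0 and ec50 > 0")
    c_n = conc ** hill
    return e_max * c_n / (ec50 ** hill + c_n)


print(hill_response(50.0, ec50=50.0))   # 0.5, half-maximal by definition
print(hill_response(500.0, ec50=50.0))  # 10/11, about 0.909
```

At C = EC50 the response is half-maximal for any Hill coefficient; larger coefficients only make the curve steeper around that point.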

Kinetics Parameters

  • kon: Association rate constant (M⁻¹s⁻¹); describes how fast complex forms
  • koff: Dissociation rate constant (s⁻¹); describes how fast complex dissociates
  • Residence time: τ = 1/koff; longer residence time often, though not always, correlates with more sustained target engagement
  • Kd from kinetics: Kd = koff/kon
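The kinetic relations above can be checked with a few lines of arithmetic, including the unit bookkeeping (s⁻¹ divided by M⁻¹s⁻¹ gives M). A sketch with hypothetical rate constants; the dissociative half-life ln(2)/koff is the standard first-order result and is added here for convenience:

```python
import math


def kinetic_summary(kon_per_M_s: float, koff_per_s: float) -> dict:
    """Derive Kd (nM), residence time, and dissociative half-life from kon/koff."""
    if kon_per_M_s <= 0 or koff_per_s <= 0:
        raise ValueError("rate constants must be positive")
    kd_M = koff_per_s / kon_per_M_s   # (s^-1) / (M^-1 s^-1) = M
    tau_s = 1.0 / koff_per_s          # residence time in seconds
    return {
        "Kd_nM": kd_M * 1e9,
        "residence_time_min": tau_s / 60.0,
        "half_life_s": math.log(2) / koff_per_s,  # first-order dissociation
    }


# Hypothetical rates: kon = 1e6 M^-1 s^-1, koff = 1e-3 s^-1
# -> Kd ~ 1 nM, residence time ~ 16.7 min, half-life ~ 693 s
print(kinetic_summary(1e6, 1e-3))
```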

Common API Query Patterns

By UniProt ID (REST API)

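import requests

def query_by_uniprot(uniprot_id, affinity_type="Ki", cutoff_nM=10000, timeout=60):
    """
    REST API query for BindingDB affinities by UniProt target ID.
    """
    # Endpoint and parameter names as used in this reference; confirm them
    # against BindingDB's web-services documentation before relying on them.
    url = "https://www.bindingdb.org/axis2/services/BDBService/getLigandsByUniprotID"
    params = {
        "uniprot_id": uniprot_id,
        "cutoff": str(cutoff_nM),  # affinity threshold in nM
        "affinity_type": affinity_type,
        "response": "json",
    }
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()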

# Frequently queried drug targets: gene symbol -> UniProt accession
COMMON_TARGETS = {
    "ABL1": "P00519",    # Imatinib, dasatinib target
    "EGFR": "P00533",    # Erlotinib, gefitinib target
    "BRAF": "P15056",    # Vemurafenib, dabrafenib target
    "CDK2": "P24941",    # Cell cycle kinase
    "HDAC1": "Q13547",   # Histone deacetylase
    "BRD4": "O60885",    # BET bromodomain reader
    "MDM2": "Q00987",    # p53 negative regulator
    "BCL2": "P10415",    # Antiapoptotic protein
    "PCSK9": "Q8NBP7",   # Cholesterol regulator
    "JAK2": "O60674",    # Cytokine signaling kinase
}
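To look up several of the targets above in one pass, a small loop with a pause between requests keeps the load on the public server modest and stops one failure from aborting the whole run. A sketch, assuming `query_by_uniprot` from above is passed in as `fetch` (which also lets the loop be tested without network access); `batch_query` and the one-second default delay are arbitrary choices, not BindingDB requirements:

```python
import time
from typing import Callable, Dict, Mapping


def batch_query(targets: Mapping[str, str],
                fetch: Callable[[str], object],
                delay_s: float = 1.0) -> Dict[str, object]:
    """Call fetch(uniprot_id) for each target; store exceptions instead of aborting."""
    results: Dict[str, object] = {}
    for i, (symbol, uniprot_id) in enumerate(targets.items()):
        if i > 0:
            time.sleep(delay_s)  # pause between consecutive requests
        try:
            results[symbol] = fetch(uniprot_id)
        # requests' errors subclass OSError; JSON decode errors subclass ValueError
        except (OSError, ValueError) as exc:
            results[symbol] = exc
    return results


# Usage (performs real HTTP requests):
# data = batch_query(COMMON_TARGETS, lambda uid: query_by_uniprot(uid, "Ki"))
# failed = [s for s, r in data.items() if isinstance(r, Exception)]
```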

By PubChem CID (REST API)

import requests

def query_by_pubchem_cid(pubchem_cid, timeout=60):
    """Get all BindingDB binding data for a specific compound by PubChem CID."""
    url = "https://www.bindingdb.org/axis2/services/BDBService/getAffinitiesByCID"
    params = {"cid": pubchem_cid, "response": "json"}
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()

# Example: imatinib (PubChem CID 5291)
imatinib_data = query_by_pubchem_cid(5291)
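If only a compound name is known, the CID can be looked up first. A sketch using PubChem's PUG REST name-to-CID lookup (a separate NCBI service, not part of BindingDB); `cid_lookup_url`, `parse_cids`, and `name_to_cids` are hypothetical helpers, and the parser assumes the `IdentifierList` → `CID` layout that PUG REST returns for `/cids/JSON` requests:

```python
from urllib.parse import quote

import requests

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(name: str) -> str:
    """Build the PUG REST URL that maps a compound name to PubChem CIDs."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/cids/JSON"


def parse_cids(payload: dict) -> list:
    """Extract the CID list from a PUG REST /cids/JSON response (empty if absent)."""
    return payload.get("IdentifierList", {}).get("CID", [])


def name_to_cids(name: str, timeout: float = 30.0) -> list:
    """Resolve a compound name to CIDs (network call)."""
    response = requests.get(cid_lookup_url(name), timeout=timeout)
    response.raise_for_status()
    return parse_cids(response.json())


# cids = name_to_cids("imatinib")
# data = query_by_pubchem_cid(cids[0])
```

A name can map to more than one CID, so it is worth checking the returned list rather than assuming a single hit.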

By Target Name

import requests

def query_by_target_name(target_name, affinity_cutoff=100, timeout=60):
    """Query BindingDB by target name; affinity_cutoff is in nM."""
    url = "https://www.bindingdb.org/axis2/services/BDBService/getAffinitiesByTarget"
    params = {
        "target_name": target_name,
        "cutoff": affinity_cutoff,
        "response": "json",
    }
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()
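Public endpoints occasionally rate-limit or return transient 5xx errors. A sketch of a `requests` session that retries such failures with exponential backoff, using urllib3's `Retry` via `HTTPAdapter`; `make_session`, the retry count, the backoff factor, and the status list are illustrative assumptions, not values BindingDB prescribes:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Return a Session that retries rate-limit and server errors with backoff."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,                      # sleep grows exponentially per retry
        status_forcelist=(429, 500, 502, 503, 504),  # retry on these HTTP status codes
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


# Usage: replace requests.get(url, params=params, timeout=60) in the query
# functions above with session.get(url, params=params, timeout=60).
session = make_session()
```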

Dataset Download Guide

Available Files

| File                  | Size (approx.) | Contents                      |
|-----------------------|----------------|-------------------------------|
| BindingDB_All.tsv.zip | ~3.5 GB        | All data (~2.9M records)      |
| BindingDB_All.sdf.zip | ~7 GB          | All data with 3D structures   |
| BindingDB_IC50.tsv    | ~1.5 GB        | IC50 data only                |
| BindingDB_Ki.tsv      | ~0.8 GB        | Ki data only                  |
| BindingDB_Kd.tsv      | ~0.2 GB        | Kd data only                  |
| BindingDB_EC50.tsv    | ~0.5 GB        | EC50 data only                |
| tdc_bindingdb_*       | Varies         | TDC-formatted subsets         |
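Column names can differ between releases, so it helps to read only the header of a downloaded file before writing filters. A minimal sketch, assuming a local copy of one of the files above; pandas infers zip compression from the `.zip` extension as long as the archive holds a single file, and `list_columns` is a hypothetical helper:

```python
import pandas as pd


def list_columns(filepath: str) -> list:
    """Return the column names of a BindingDB TSV (plain or single-file .zip)."""
    header = pd.read_csv(filepath, sep="\t", nrows=0)  # parse the header row only
    return list(header.columns)


# cols = list_columns("BindingDB_All.tsv.zip")
# [c for c in cols if "UniProt" in c]
```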

Efficient Loading

import pandas as pd

# For large files, use chunking
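def load_bindingdb_chunked(filepath, uniprot_ids, affinity_col="Ki (nM)", chunk_size=100000):
    """Load BindingDB in chunks, keeping rows for the given targets.

    If affinity_col is not None, only rows with a value in that column are kept.
    """
    if isinstance(uniprot_ids, str):
        uniprot_ids = [uniprot_ids]  # accept a single accession as well as a list
    target_col = "UniProt (SwissProt) Primary ID of Target Chain"
    results = []
    for chunk in pd.read_csv(filepath, sep="\t", chunksize=chunk_size,
                             low_memory=False, on_bad_lines="skip"):
        # Filter for target, then for rows reporting the requested affinity type
        mask = chunk[target_col].isin(uniprot_ids)
        if affinity_col is not None:
            mask &= chunk[affinity_col].notna()
        if mask.any():
            results.append(chunk[mask])

    if results:
        return pd.concat(results, ignore_index=True)
    return pd.DataFrame()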
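Parsing every column of the full file is wasteful when only a few are needed. A sketch that combines chunking with `usecols` and reads values as strings, so qualified entries such as `>10000` survive unchanged; `load_columns_for_targets` is a hypothetical name, and `usecols` raises a `ValueError` if a requested column is missing from the file:

```python
import pandas as pd

TARGET_COL = "UniProt (SwissProt) Primary ID of Target Chain"


def load_columns_for_targets(filepath, uniprot_ids, columns, chunk_size=100_000):
    """Stream only `columns` (plus the target column) and keep rows for uniprot_ids."""
    wanted = list(dict.fromkeys([TARGET_COL, *columns]))  # de-duplicate, keep order
    ids = {uniprot_ids} if isinstance(uniprot_ids, str) else set(uniprot_ids)
    parts = []
    with pd.read_csv(filepath, sep="\t", usecols=wanted, chunksize=chunk_size,
                     dtype=str, on_bad_lines="skip") as reader:
        for chunk in reader:
            parts.append(chunk[chunk[TARGET_COL].isin(ids)])
    if not parts:
        return pd.DataFrame(columns=wanted)
    return pd.concat(parts, ignore_index=True)


# df = load_columns_for_targets("BindingDB_All.tsv", ["P00519", "P00533"],
#                               ["Ligand SMILES", "Ki (nM)", "IC50 (nM)"])
```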

pKi / pIC50 Conversion

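Converting raw nM affinities to a negative-log molar scale (pKi, pIC50, pKd) is common in ML, because it compresses values that span several orders of magnitude: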

import numpy as np

def to_log_affinity(affinity_nM):
    """Convert nM affinity to pAffinity (negative log molar)."""
    affinity_M = affinity_nM * 1e-9  # Convert nM to M
    return -np.log10(affinity_M)

# Examples:
# 1 nM   → pAffinity = 9.0
# 10 nM  → pAffinity = 8.0
# 100 nM → pAffinity = 7.0
# 1 μM   → pAffinity = 6.0
# 10 μM  → pAffinity = 5.0
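Converting back, and collapsing replicate measurements of the same ligand-target pair, are common next steps. A sketch, assuming a DataFrame that already has a numeric `pAffinity` column (for example from `to_log_affinity`); `from_log_affinity`, `aggregate_replicates`, and the default key columns are illustrative. Averaging in log space corresponds to a geometric mean of the raw values; the median used here is less sensitive to outlying measurements:

```python
import numpy as np
import pandas as pd


def from_log_affinity(p_affinity):
    """Invert pAffinity to nM: 10**(-p) M = 10**(9 - p) nM."""
    return np.power(10.0, 9.0 - np.asarray(p_affinity, dtype=float))


def aggregate_replicates(df, keys=("Ligand SMILES",
                                   "UniProt (SwissProt) Primary ID of Target Chain")):
    """Collapse repeated measurements per ligand-target pair to a median pAffinity."""
    out = (df.groupby(list(keys))["pAffinity"]
             .agg(["median", "count"])
             .reset_index())
    return out.rename(columns={"median": "pAffinity", "count": "n_measurements"})


# from_log_affinity(9.0) -> 1.0 nM; from_log_affinity(6.0) -> 1000.0 nM (1 μM)
```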

Quality Filters

When using BindingDB data for ML or SAR:

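import pandas as pd

AFFINITY_COLS = ["Ki (nM)", "IC50 (nM)", "Kd (nM)", "EC50 (nM)"]
ORGANISM_COL = "Target Source Organism According to Curator or DataSource"

def filter_quality(df):
    """Apply quality filters to BindingDB data."""
    df = df.copy()  # avoid modifying the caller's DataFrame

    # 1. Coerce affinity columns to numbers; censored entries such as ">10000"
    #    and other non-numeric text become NaN
    aff_cols = [c for c in AFFINITY_COLS if c in df.columns]
    for col in aff_cols:
        df[col] = pd.to_numeric(df[col], errors="coerce")

    # 2. Require valid SMILES
    smiles = df["Ligand SMILES"]
    df = df[smiles.notna() & (smiles.astype(str).str.strip() != "")]

    # 3. Require at least one valid affinity value
    df = df[df[aff_cols].notna().any(axis=1)]

    # 4. Filter extreme values (> 1e6 nM = 1 mM; likely non-specific or artifacts)
    for col in aff_cols:
        df = df[~(df[col] > 1e6)]

    # 5. Use only human targets
    if ORGANISM_COL in df.columns:
        df = df[df[ORGANISM_COL].str.contains("Homo sapiens", na=False, regex=False)]

    return df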
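BindingDB stores many affinities as text with a qualifier, for example `>10000`, where only a bound on the true value is known. Rather than discarding that information when converting to numbers, it can be kept as a separate censoring flag. A minimal sketch, assuming qualifiers are limited to `<`, `>`, `<=`, `>=`, and `=`; `split_qualifier` is a hypothetical helper:

```python
import pandas as pd


def split_qualifier(values: pd.Series) -> pd.DataFrame:
    """Split strings like '>10000' or '<0.1' into qualifier, numeric value, and censor flag.

    'qualifier' is '' for exact values; 'value' is NaN when the text is not numeric.
    """
    parts = values.astype(str).str.extract(r"^\s*([<>]=?|=)?\s*(.*?)\s*$")
    return pd.DataFrame({
        "qualifier": parts[0].fillna(""),
        "value": pd.to_numeric(parts[1], errors="coerce").astype(float),
        "censored": parts[0].isin([">", ">=", "<", "<="]),
    }, index=values.index)


# ki = split_qualifier(df["Ki (nM)"])
# exact_only = df[~ki["censored"] & ki["value"].notna()]
```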