---
name: medchem
description: Python library for molecular filtering and prioritization in drug discovery. Use when applying medicinal chemistry rules (Rule of Five, CNS, leadlike), detecting structural alerts (PAINS, NIBR, Lilly demerits), analyzing chemical groups, calculating molecular complexity, or filtering compound libraries. Works with SMILES strings and RDKit mol objects, with built-in parallelization for large datasets.
---

# Medchem

## Overview

Medchem is a Python library for molecular filtering and prioritization in drug discovery workflows. It provides hundreds of well-established and novel molecular filters, structural alerts, and medicinal chemistry rules to efficiently triage and prioritize compound libraries at scale.

**Key Principle:** Rules and filters are always context-specific. Avoid blindly applying filters: marketed drugs often don't pass standard medchem filters, and prodrugs may intentionally violate rules. Use these tools as guidelines combined with domain expertise.

## Installation

Install medchem via conda or pip:

```bash
# Via conda-forge (micromamba shown; mamba or conda work the same way)
micromamba install -c conda-forge medchem

# Via pip
pip install medchem
```

## Core Capabilities

### 1. Medicinal Chemistry Rules

Apply established drug-likeness rules to molecules using the `medchem.rules` module.

**Available Rules:**

- Rule of Five (Lipinski)
- Rule of Oprea
- Rule of CNS
- Rule of leadlike (soft and strict)
- Rule of three
- Rule of REOS
- Rule of drug
- Rule of Veber
- Golden triangle
- PAINS filters

**Single Rule Application:**

```python
import medchem as mc

# Apply Rule of Five to a SMILES string
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
passes = mc.rules.basic_rules.rule_of_five(smiles)
# Returns: True

# Check specific rules
passes_oprea = mc.rules.basic_rules.rule_of_oprea(smiles)
passes_cns = mc.rules.basic_rules.rule_of_cns(smiles)
```

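The Rule of Five itself is just four thresholds on precomputed descriptors. The minimal sketch below spells them out in plain Python. It is not medchem's implementation: the `Descriptors` class and both helpers are hypothetical, and the aspirin values are rounded approximations. Lipinski's original formulation flags compounds that break more than one threshold, while stricter filters reject any violation, so the tolerance is left as a parameter here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""
    mw: float     # molecular weight (Da)
    clogp: float  # calculated logP
    hbd: int      # H-bond donors (Lipinski: count of NH and OH)
    hba: int      # H-bond acceptors (Lipinski: count of N and O)


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five thresholds are exceeded."""
    return sum([d.mw > 500, d.clogp > 5, d.hbd > 5, d.hba > 10])


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    """Strict by default; use max_violations=1 for Lipinski's original tolerance."""
    return ro5_violations(d) <= max_violations


# Aspirin with rounded, approximate descriptor values
aspirin = Descriptors(mw=180.16, clogp=1.31, hbd=1, hba=4)
print(ro5_violations(aspirin), passes_ro5(aspirin))  # 0 True
```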
**Multiple Rules with RuleFilters:**

```python
import datamol as dm
import medchem as mc

# Load molecules (example SMILES: aspirin and caffeine; replace with your own list)
smiles_list = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]
mols = [dm.to_mol(smiles) for smiles in smiles_list]

# Create filter with multiple rules
rfilter = mc.rules.RuleFilters(
    rule_list=[
        "rule_of_five",
        "rule_of_oprea",
        "rule_of_cns",
        "rule_of_leadlike_soft",
    ]
)

# Apply filters with parallelization
results = rfilter(
    mols=mols,
    n_jobs=-1,  # Use all CPU cores
    progress=True,
)
```

**Result Format:**

Results are returned as a pandas DataFrame with one row per molecule. It has one boolean column per requested rule, plus summary columns (`pass_all`, `pass_any`) that indicate whether the molecule passes every rule or at least one of them.

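Whatever container a rule filter returns, the per-molecule summary usually reduces to two questions: does the molecule pass every requested rule, and does it pass at least one? The hypothetical `summarize` helper below shows that reduction in plain Python using made-up outcomes. The `pass_all` / `pass_any` keys are used here for illustration only.

```python
from typing import Dict, List


def summarize(rule_results: Dict[str, List[bool]]) -> List[Dict[str, bool]]:
    """Turn per-rule boolean lists into one row per molecule, adding pass_all/pass_any."""
    lengths = {len(values) for values in rule_results.values()}
    if len(lengths) > 1:
        raise ValueError("every rule needs one result per molecule")
    n_mols = lengths.pop() if lengths else 0
    rows = []
    for i in range(n_mols):
        row = {name: values[i] for name, values in rule_results.items()}
        per_rule = list(row.values())
        row["pass_all"] = all(per_rule)  # passes every requested rule
        row["pass_any"] = any(per_rule)  # passes at least one rule
        rows.append(row)
    return rows


# Made-up outcomes for three molecules under two rules
table = summarize({
    "rule_of_five": [True, True, False],
    "rule_of_cns": [True, False, False],
})
for row in table:
    print(row)
```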
### 2. Structural Alert Filters

Detect potentially problematic structural patterns using the `medchem.structural` module.

**Available Filters:**

1. **Common Alerts** - General structural alerts derived from ChEMBL curation and literature
2. **NIBR Filters** - Novartis Institutes for BioMedical Research filter set
3. **Lilly Demerits** - Eli Lilly's demerit-based system (275 rules, molecules rejected at >100 demerits)

**Common Alerts:**

```python
import datamol as dm
import medchem as mc

# Create filter
alert_filter = mc.structural.CommonAlertsFilters()

# Check single molecule
mol = dm.to_mol("c1ccccc1")
has_alerts, details = alert_filter.check_mol(mol)

# Batch filtering with parallelization (mol_list: a list of RDKit molecules)
mol_list = [dm.to_mol(smi) for smi in ["CCO", "c1ccccc1O"]]
results = alert_filter(
    mols=mol_list,
    n_jobs=-1,
    progress=True,
)
```

**NIBR Filters:**

```python
import medchem as mc

# Apply NIBR filters
nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)
```

**Lilly Demerits:**

```python
import medchem as mc

# Calculate Lilly demerits
lilly = mc.structural.LillyDemeritsFilters()
results = lilly(mols=mol_list, n_jobs=-1)

# Each result includes demerit score and whether it passes (≤100 demerits)
```

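The demerit system is additive. Each matched pattern contributes a fixed number of demerits, and a molecule is rejected once its total exceeds the cutoff. The sketch below models only that additive part. The pattern names and weights are hypothetical, and the published Lilly rule set also contains patterns that reject a molecule outright. Use medchem's `LillyDemeritsFilters` for real scoring.

```python
from typing import Dict, Iterable, Tuple

DEMERIT_CUTOFF = 100  # totals above this are rejected


def demerit_score(matched: Iterable[str], weights: Dict[str, int]) -> Tuple[int, bool]:
    """Sum the demerits of every matched pattern; return (total, passes)."""
    total = sum(weights[name] for name in matched)  # repeated matches count each time
    return total, total <= DEMERIT_CUTOFF


# Hypothetical pattern names and demerit weights, for illustration only
weights = {"pattern_a": 50, "pattern_b": 30, "pattern_c": 40}
print(demerit_score(["pattern_a", "pattern_b"], weights))               # (80, True)
print(demerit_score(["pattern_a", "pattern_b", "pattern_c"], weights))  # (120, False)
```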
### 3. Functional API for High-Level Operations

The `medchem.functional` module provides convenient functions for common workflows.

**Quick Filtering:**

```python
import medchem as mc

# Apply NIBR filters to a list
filter_ok = mc.functional.nibr_filter(
    mols=mol_list,
    n_jobs=-1
)

# Apply common alerts
alert_results = mc.functional.common_alerts_filter(
    mols=mol_list,
    n_jobs=-1
)
```

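This sketch assumes that each functional helper returns one boolean per input molecule, in input order. Under that assumption, selecting the molecules that pass several filters is an element-wise AND across the masks. The version below uses only the standard library. The `select_passing` helper and the mask values are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_passing(items: Sequence[T], *masks: Sequence[bool]) -> List[T]:
    """Keep items whose entry is True in every mask (element-wise AND)."""
    for mask in masks:
        if len(mask) != len(items):
            raise ValueError("each mask needs exactly one entry per item")
    return [item for item, *flags in zip(items, *masks) if all(flags)]


# Made-up SMILES and mask values for illustration
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
nibr_ok = [True, False, True]
alerts_ok = [True, True, False]
print(select_passing(smiles, nibr_ok, alerts_ok))  # ['CCO']
```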
### 4. Chemical Groups Detection

Identify specific chemical groups and functional groups using `medchem.groups`.

**Available Groups:**

- Hinge binders
- Phosphate binders
- Michael acceptors
- Reactive groups
- Custom SMARTS patterns

**Usage:**

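```python
import datamol as dm
import medchem as mc

# Example molecules (indole and acrylamide); replace with your own
mol_list = [dm.to_mol(smi) for smi in ["c1ccc2[nH]ccc2c1", "C=CC(=O)N"]]
mol = mol_list[0]

# Create group detector
group = mc.groups.ChemicalGroup(groups=["hinge_binders"])

# Check for matches, one molecule at a time
has_matches = [group.has_match(m) for m in mol_list]

# Get detailed match information for a single molecule
matches = group.get_matches(mol)
```
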
### 5. Named Catalogs

Access curated collections of chemical structures through `medchem.catalogs`.

**Available Catalogs:**

- Functional groups
- Protecting groups
- Common reagents
- Standard fragments

**Usage:**

```python
import medchem as mc

# Access named catalogs
catalogs = mc.catalogs.NamedCatalogs

# Use catalog for matching
catalog = catalogs.get("functional_groups")
matches = catalog.get_matches(mol)
```

### 6. Molecular Complexity

Calculate complexity metrics that approximate synthetic accessibility using `medchem.complexity`.

**Common Metrics:**

- Bertz complexity
- Whitlock complexity
- Barone complexity

**Usage:**

```python
import medchem as mc

# Calculate complexity
complexity_score = mc.complexity.calculate_complexity(mol)

# Filter by complexity threshold
complex_filter = mc.complexity.ComplexityFilter(max_complexity=500)
results = complex_filter(mols=mol_list)
```

### 7. Constraints Filtering

Apply custom property-based constraints using `medchem.constraints`.

**Example Constraints:**

- Molecular weight ranges
- LogP bounds
- TPSA limits
- Rotatable bond counts

**Usage:**

```python
import medchem as mc

# Define constraints
constraints = mc.constraints.Constraints(
    mw_range=(200, 500),
    logp_range=(-2, 5),
    tpsa_max=140,
    rotatable_bonds_max=10
)

# Apply constraints
results = constraints(mols=mol_list, n_jobs=-1)
```

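Property constraints like the ones in the example above are range checks on precomputed descriptors. The library-independent sketch below reuses the same bounds: MW 200 to 500, logP -2 to 5, TPSA at most 140, and at most 10 rotatable bonds. It reports which bounds a molecule breaks. The bounds are treated as inclusive here (an assumption), and the `violations` helper, property keys, and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (lower, upper) bounds, inclusive; None means unbounded on that side
BOUNDS: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "mw": (200, 500),
    "logp": (-2, 5),
    "tpsa": (None, 140),
    "rotatable_bonds": (None, 10),
}


def violations(props: Dict[str, float], bounds=BOUNDS) -> List[str]:
    """Return the names of properties that fall outside their bounds."""
    failed = []
    for name, (lower, upper) in bounds.items():
        value = props[name]
        if (lower is not None and value < lower) or (upper is not None and value > upper):
            failed.append(name)
    return failed


# Hypothetical precomputed properties for one molecule
mol_props = {"mw": 412.5, "logp": 5.6, "tpsa": 88.0, "rotatable_bonds": 7}
print(violations(mol_props))  # ['logp']
```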
### 8. Medchem Query Language

Use a specialized query language for complex filtering criteria.

**Query Examples:**

```
# Molecules passing Ro5 AND not having common alerts
"rule_of_five AND NOT common_alerts"

# CNS-like molecules with low complexity
"rule_of_cns AND complexity < 400"

# Leadlike molecules without Lilly demerits
"rule_of_leadlike AND lilly_demerits == 0"
```

**Usage:**

```python
import medchem as mc

# Parse and apply query
query = mc.query.parse("rule_of_five AND NOT common_alerts")
results = query.apply(mols=mol_list, n_jobs=-1)
```

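A query such as `"rule_of_five AND NOT common_alerts"` expresses boolean logic over per-molecule results. The self-contained evaluator below illustrates those semantics. It is a hypothetical sketch, not medchem's parser or grammar. It handles only `AND`, `OR`, `NOT`, and parentheses over precomputed flags, with `NOT` binding tightest and `OR` loosest. Comparisons such as `complexity < 400` are out of scope.

```python
import re
from typing import Dict, List

# Identifiers (rule/alert names and the keywords AND, OR, NOT) and parentheses
_TOKEN = re.compile(r"\s*(\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(query: str) -> List[str]:
    """Split a query into identifier and parenthesis tokens."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected input: {query[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(query: str, flags: Dict[str, bool]) -> bool:
    """Evaluate the query (NOT > AND > OR precedence) against per-molecule flags."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # parse first so every token is consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in ("AND", "OR", "NOT", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return flags[token]  # KeyError for an unknown rule or alert name

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens: {tokens[pos:]}")
    return result


# Made-up results for one molecule
flags = {"rule_of_five": True, "common_alerts": False, "rule_of_cns": False}
print(evaluate("rule_of_five AND NOT common_alerts", flags))  # True
```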
## Workflow Patterns

### Pattern 1: Initial Triage of Compound Library

Filter a large compound collection to identify drug-like candidates.

```python
import datamol as dm
import medchem as mc
import pandas as pd

# Load compound library
df = pd.read_csv("compounds.csv")
mols = [dm.to_mol(smi) for smi in df["smiles"]]

# Apply primary filters
rule_filter = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])
rule_results = rule_filter(mols=mols, n_jobs=-1, progress=True)

# Apply structural alerts
alert_filter = mc.structural.CommonAlertsFilters()
alert_results = alert_filter(mols=mols, n_jobs=-1, progress=True)

# Combine results (both tables have one row per input molecule, in input order)
df["passes_rules"] = rule_results["pass_all"].to_numpy()
df["passes_alerts"] = alert_results["pass_filter"].to_numpy()
df["drug_like"] = df["passes_rules"] & df["passes_alerts"]

# Save filtered compounds
filtered_df = df[df["drug_like"]]
filtered_df.to_csv("filtered_compounds.csv", index=False)
```

### Pattern 2: Lead Optimization Filtering

Apply stricter criteria during lead optimization.

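```python
import numpy as np
import medchem as mc

# Create comprehensive filter (candidate_mols: list of RDKit molecules)
filters = {
    "rules": mc.rules.RuleFilters(rule_list=["rule_of_leadlike_strict"]),
    "alerts": mc.structural.NIBRFilters(),
    "lilly": mc.structural.LillyDemeritsFilters(),
    "complexity": mc.complexity.ComplexityFilter(max_complexity=400),
}

# Apply all filters; each returns one row per molecule, in input order
results = {}
for name, filt in filters.items():
    results[name] = filt(mols=candidate_mols, n_jobs=-1)


def pass_mask(result):
    """RuleFilters reports `pass_all`; structural filters report `pass_filter`."""
    column = "pass_all" if "pass_all" in result.columns else "pass_filter"
    return result[column].to_numpy(dtype=bool)


# Identify compounds passing all filters (element-wise AND; calling the
# built-in all() on whole columns raises "truth value ... is ambiguous")
passes_all = np.logical_and.reduce([pass_mask(r) for r in results.values()])
```
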
### Pattern 3: Identify Specific Chemical Groups

Find molecules containing specific functional groups or scaffolds.

```python
import medchem as mc

# Create group detector for multiple groups
group_detector = mc.groups.ChemicalGroup(
    groups=["hinge_binders", "phosphate_binders"]
)

# Screen library: one boolean per molecule in mol_list
matches = [group_detector.has_match(mol) for mol in mol_list]

# Keep molecules that contain at least one of the requested groups
mol_with_groups = [mol for mol, match in zip(mol_list, matches) if match]
```

## Best Practices

1. **Context Matters**: Don't blindly apply filters. Understand the biological target and chemical space.

2. **Combine Multiple Filters**: Use rules, structural alerts, and domain knowledge together for better decisions.

3. **Use Parallelization**: For large datasets (>1000 molecules), always use `n_jobs=-1` for parallel processing.

4. **Iterative Refinement**: Start with broad filters (Ro5), then apply more specific criteria (CNS, leadlike) as needed.

5. **Document Filtering Decisions**: Track which molecules were filtered out and why for reproducibility.

6. **Validate Results**: Remember that marketed drugs often fail standard filters; use these as guidelines, not absolute rules.

7. **Consider Prodrugs**: Molecules designed as prodrugs may intentionally violate standard medicinal chemistry rules.

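Practices 4 and 5 fit together naturally. Apply filters as ordered stages, from broad to specific, and record the first stage at which each compound failed. The stdlib-only sketch below does this. The `run_funnel` helper, record fields, and thresholds are hypothetical placeholders, not medchem defaults. In practice each stage's predicate would wrap the rule or alert results computed earlier.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Stage = Tuple[str, Callable[[Record], bool]]


def run_funnel(records: List[Record], stages: List[Stage]) -> Tuple[List[Record], Dict[str, str]]:
    """Apply stages in order; return survivors and, for each rejected id, the stage it failed."""
    survivors: List[Record] = list(records)
    rejected: Dict[str, str] = {}
    for stage_name, keep in stages:
        next_round = []
        for rec in survivors:
            if keep(rec):
                next_round.append(rec)
            else:
                rejected[str(rec["id"])] = stage_name  # first failing stage only
        print(f"{stage_name}: {len(survivors)} -> {len(next_round)}")
        survivors = next_round
    return survivors, rejected


# Hypothetical records with precomputed properties and illustrative thresholds
records = [
    {"id": "cpd1", "mw": 320.0, "tpsa": 70.0},
    {"id": "cpd2", "mw": 610.0, "tpsa": 95.0},
    {"id": "cpd3", "mw": 410.0, "tpsa": 120.0},
]
stages: List[Stage] = [
    ("broad: MW <= 500", lambda r: r["mw"] <= 500),
    ("specific: TPSA <= 90", lambda r: r["tpsa"] <= 90),
]
kept, reasons = run_funnel(records, stages)
print([r["id"] for r in kept], reasons)
```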
## Resources

### references/api_guide.md

Comprehensive API reference covering all medchem modules with detailed function signatures, parameters, and return types.

### references/rules_catalog.md

Complete catalog of available rules, filters, and alerts with descriptions, thresholds, and literature references.

### scripts/filter_molecules.py

Production-ready script for batch filtering workflows. Supports multiple input formats (CSV, SDF, SMILES), configurable filter combinations, and detailed reporting.

**Usage:**

```bash
python scripts/filter_molecules.py input.csv --rules rule_of_five,rule_of_cns --alerts nibr --output filtered.csv
```

## Documentation

Official documentation: https://medchem-docs.datamol.io/

GitHub repository: https://github.com/datamol-io/medchem