# Rowan RDKit-Native API Reference

## Overview

The RDKit-native API provides a simplified interface for users working with RDKit molecules. Functions automatically handle:

1. Converting RDKit molecules to Rowan's internal format
2. Allocating cloud compute resources
3. Executing multi-step workflows
4. Monitoring job completion
5. Returning RDKit-compatible results

## Table of Contents

1. [pKa Functions](#pka-functions)
2. [Tautomer Functions](#tautomer-functions)
3. [Conformer Functions](#conformer-functions)
4. [Energy Functions](#energy-functions)
5. [Optimization Functions](#optimization-functions)
6. [Batch Processing Patterns](#batch-processing-patterns)
7. [Performance Considerations](#performance-considerations)
8. [Comparison with Full API](#comparison-with-full-api)

---

## pKa Functions

### `run_pka`

Calculate pKa for a single molecule.

```python
import rowan
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccccc1O")  # Phenol
result = rowan.run_pka(mol)

print(f"Strongest acid pKa: {result.strongest_acid}")
print(f"Strongest base pKa: {result.strongest_base}")
print(f"Microscopic pKas: {result.microscopic_pkas}")
```

**Parameters:**
- `mol` (rdkit.Chem.Mol): RDKit molecule object

**Returns:** `PKAResult` object with attributes:
- `strongest_acid`: float - pKa of most acidic proton
- `strongest_base`: float - pKa of most basic site
- `microscopic_pkas`: list - Site-specific pKa values
- `tautomer_populations`: dict - Populations at pH 7
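
To relate an acid pKa to the ionization state at a given pH, the Henderson-Hasselbalch relation can be applied to the returned value. The sketch below is plain Python and does not call Rowan; `fraction_deprotonated` is a hypothetical helper, and the phenol pKa of 10.0 is an approximate textbook value used for illustration, not Rowan output.

```python
def fraction_deprotonated(pka: float, ph: float = 7.0) -> float:
    """Fraction of a monoprotic acid present as its conjugate base at a given pH."""
    # Henderson-Hasselbalch: [A-] / [HA] = 10 ** (pH - pKa)
    ratio = 10.0 ** (ph - pka)
    return ratio / (1.0 + ratio)


approx_phenol_pka = 10.0  # illustrative value, not a computed result
fraction = fraction_deprotonated(approx_phenol_pka, ph=7.0)
print(f"Deprotonated at pH 7: {fraction:.4%}")
```

In practice the `pka` argument would come from `result.strongest_acid`. The relation treats the site as an isolated monoprotic acid, so it is only a rough guide for molecules with several coupled ionizable groups.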

---

### `batch_pka`

Calculate pKa for multiple molecules in parallel.

```python
import rowan
from rdkit import Chem

smiles_list = ["CCO", "CC(=O)O", "c1ccccc1O", "c1ccccc1N"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

results = rowan.batch_pka(mols)

for smi, result in zip(smiles_list, results):
    if result is not None:
        print(f"{smi}: pKa = {result.strongest_acid:.2f}")
    else:
        print(f"{smi}: Failed")
```

**Parameters:**
- `mols` (list[rdkit.Chem.Mol]): List of RDKit molecules

**Returns:** `list[PKAResult | None]` - Results for each molecule (None if failed)
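
Because failed entries come back as `None` in the same positions as their inputs, it is often convenient to split a batch into successes and failures before further processing. The helper below is a hypothetical sketch that works on any list of results; the values are stand-ins rather than real `PKAResult` objects, and it assumes results are returned in input order, as the example above does.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_batch(
    labels: Sequence[str], results: Sequence[Optional[T]]
) -> Tuple[List[Tuple[str, T]], List[str]]:
    """Pair each label with its result and separate failures (None) from successes."""
    if len(labels) != len(results):
        raise ValueError("labels and results must have the same length")
    succeeded = [(label, res) for label, res in zip(labels, results) if res is not None]
    failed = [label for label, res in zip(labels, results) if res is None]
    return succeeded, failed


# Stand-in values for illustration only
succeeded, failed = split_batch(["CCO", "CC(=O)O", "c1ccccc1N"], [15.9, 4.76, None])
print(f"Succeeded: {succeeded}")
print(f"Failed: {failed}")
```

Checking `res is not None` rather than truthiness avoids discarding a valid result object that happens to evaluate as false.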

---

## Tautomer Functions

### `run_tautomers`

Enumerate and rank tautomers.

```python
import rowan
from rdkit import Chem

mol = Chem.MolFromSmiles("Oc1ncnc2[nH]cnc12")  # Hypoxanthine
result = rowan.run_tautomers(mol)

print(f"Number of tautomers: {len(result.tautomers)}")
for i, (taut, pop) in enumerate(zip(result.tautomers, result.populations)):
    print(f"Tautomer {i}: {Chem.MolToSmiles(taut)}, Population: {pop:.1%}")
```

**Parameters:**
- `mol` (rdkit.Chem.Mol): RDKit molecule object

**Returns:** `TautomerResult` object with attributes:
- `tautomers`: list[rdkit.Chem.Mol] - Tautomer structures
- `energies`: list[float] - Relative energies (kcal/mol)
- `populations`: list[float] - Boltzmann populations at 298 K
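
The link between relative energies and Boltzmann populations can be made concrete with a short worked calculation. This is a sketch of the standard formula, p_i = exp(-E_i / RT) / sum_j exp(-E_j / RT); this reference does not state whether Rowan computes `populations` in exactly this way, and the energies below are made up.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_energies_kcal, temperature=298.0):
    """Convert relative energies (kcal/mol) into normalized Boltzmann populations."""
    rt = R_KCAL_PER_MOL_K * temperature
    lowest = min(rel_energies_kcal)
    # Shift by the minimum so the most stable species has weight exactly 1.0
    weights = [math.exp(-(e - lowest) / rt) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


example_energies = [0.0, 1.0, 3.0]  # made-up relative energies in kcal/mol
pops = boltzmann_populations(example_energies)
for energy, pop in zip(example_energies, pops):
    print(f"{energy:.1f} kcal/mol -> {pop:.1%}")
```

At room temperature RT is roughly 0.59 kcal/mol, so a species a few kcal/mol above the minimum contributes very little to the ensemble.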

---

### `batch_tautomers`

Enumerate tautomers for multiple molecules.

```python
import rowan
from rdkit import Chem

# Example inputs: hypoxanthine and 2-hydroxypyridine
smiles_list = ["Oc1ncnc2[nH]cnc12", "Oc1ccccn1"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
results = rowan.batch_tautomers(mols)

for smi, result in zip(smiles_list, results):
    if result:
        print(f"{smi}: {len(result.tautomers)} tautomers")
```

**Parameters:**
- `mols` (list[rdkit.Chem.Mol]): List of RDKit molecules

**Returns:** `list[TautomerResult | None]`

---

## Conformer Functions

### `run_conformers`

Generate and optimize a conformer ensemble.

```python
import rowan
from rdkit import Chem

mol = Chem.MolFromSmiles("CCCC")  # Butane
result = rowan.run_conformers(mol)

print(f"Number of conformers: {len(result.conformers)}")
print(f"Energy range: {result.energy_range:.2f} kcal/mol")

# Get lowest energy conformer
best_conformer = result.lowest_energy_conformer
# Use min() rather than assuming the energies list is sorted
print(f"Lowest energy: {min(result.energies):.4f} Hartree")
```

**Parameters:**
- `mol` (rdkit.Chem.Mol): RDKit molecule object

**Returns:** `ConformerResult` object with attributes:
- `conformers`: list[rdkit.Chem.Mol] - Conformer structures (with 3D coordinates)
- `energies`: list[float] - Energies in Hartree
- `lowest_energy_conformer`: rdkit.Chem.Mol - Global minimum
- `energy_range`: float - Energy span in kcal/mol
- `boltzmann_weights`: list[float] - Population weights
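
Note that `energies` is reported in Hartree while `energy_range` is in kcal/mol. The sketch below shows the unit conversion (1 Hartree is about 627.509 kcal/mol) on made-up energies; `relative_energies_kcal` is a hypothetical helper, not part of the Rowan API.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474


def relative_energies_kcal(energies_hartree):
    """Energies relative to the lowest one, converted from Hartree to kcal/mol."""
    lowest = min(energies_hartree)
    return [(e - lowest) * HARTREE_TO_KCAL_PER_MOL for e in energies_hartree]


example_energies = [-158.4321, -158.4307, -158.4290]  # made-up values in Hartree
relative = relative_energies_kcal(example_energies)
energy_range = max(relative) - min(relative)

print([round(value, 2) for value in relative])
print(f"Energy range: {energy_range:.2f} kcal/mol")
```

Subtracting the minimum before converting keeps the numbers small and makes the relative ordering of conformers easy to read.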

---

### `batch_conformers`

Generate conformers for multiple molecules.

```python
import rowan
from rdkit import Chem

# Example inputs: butane, 1-butanol, diethyl ether
smiles_list = ["CCCC", "CCCCO", "CCOCC"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
results = rowan.batch_conformers(mols)

for smi, result in zip(smiles_list, results):
    if result:
        print(f"{smi}: {len(result.conformers)} conformers, range = {result.energy_range:.2f} kcal/mol")
```

**Parameters:**
- `mols` (list[rdkit.Chem.Mol]): List of RDKit molecules

**Returns:** `list[ConformerResult | None]`

---

## Energy Functions

### `run_energy`

Calculate a single-point energy.

```python
import rowan
from rdkit import Chem
from rdkit.Chem import AllChem

# Create molecule with 3D coordinates
mol = Chem.MolFromSmiles("CCO")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol)
AllChem.MMFFOptimizeMolecule(mol)

result = rowan.run_energy(mol)

print(f"Energy: {result.energy:.6f} Hartree")
print(f"Dipole moment: {result.dipole_magnitude:.2f} Debye")
```

**Parameters:**
- `mol` (rdkit.Chem.Mol): RDKit molecule with 3D coordinates

**Returns:** `EnergyResult` object with attributes:
- `energy`: float - Total energy (Hartree)
- `dipole`: tuple[float, float, float] - Dipole vector
- `dipole_magnitude`: float - Dipole magnitude (Debye)
- `mulliken_charges`: list[float] - Atomic charges
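
Two quick sanity checks on these fields: the dipole magnitude is the Euclidean norm of the dipole vector, and the atomic partial charges should sum to the net molecular charge. The sketch below uses made-up numbers and hypothetical helper names; the units of the `dipole` vector are not stated in this reference, so the norm is simply in whatever units the vector uses.

```python
import math


def dipole_norm(dipole):
    """Euclidean norm of a 3-component dipole vector (same units as the input)."""
    x, y, z = dipole
    return math.sqrt(x * x + y * y + z * z)


def total_charge(partial_charges):
    """Sum of atomic partial charges; expected to be close to the net molecular charge."""
    return math.fsum(partial_charges)


example_dipole = (1.2, -0.5, 0.3)        # made-up vector
example_charges = [-0.4, 0.1, 0.1, 0.2]  # made-up charges for a neutral species

print(f"Dipole norm: {dipole_norm(example_dipole):.3f}")
print(f"Net charge from partial charges: {total_charge(example_charges):+.3f}")
```

A charge sum that differs noticeably from the expected net charge usually points to a mismatch between the submitted structure and the intended protonation state.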

---

### `batch_energy`

Calculate energies for multiple molecules.

```python
import rowan
from rdkit import Chem
from rdkit.Chem import AllChem

# Molecules must have 3D coordinates
mols_3d = []
for smi in ["CCO", "CC(=O)O"]:
    mol = Chem.AddHs(Chem.MolFromSmiles(smi))
    AllChem.EmbedMolecule(mol)
    mols_3d.append(mol)

results = rowan.batch_energy(mols_3d)

for mol, result in zip(mols_3d, results):
    if result:
        print(f"{Chem.MolToSmiles(mol)}: E = {result.energy:.6f} Ha")
```

**Parameters:**
- `mols` (list[rdkit.Chem.Mol]): List of molecules with 3D coordinates

**Returns:** `list[EnergyResult | None]`

---

## Optimization Functions

### `run_optimization`

Optimize molecular geometry.

```python
import rowan
from rdkit import Chem
from rdkit.Chem import AllChem

# Start from initial guess
mol = Chem.MolFromSmiles("CC(=O)O")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol)

result = rowan.run_optimization(mol)

print(f"Final energy: {result.energy:.6f} Hartree")
print(f"Converged: {result.converged}")

# Get optimized structure
optimized_mol = result.molecule
```

**Parameters:**
- `mol` (rdkit.Chem.Mol): RDKit molecule (3D coordinates optional)

**Returns:** `OptimizationResult` object with attributes:
- `molecule`: rdkit.Chem.Mol - Optimized structure
- `energy`: float - Final energy (Hartree)
- `converged`: bool - Optimization convergence
- `n_steps`: int - Number of optimization steps

---

### `batch_optimization`

Optimize multiple molecules.

```python
import rowan
from rdkit import Chem

smiles_list = ["CCO", "CC(=O)O", "c1ccccc1O"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

results = rowan.batch_optimization(mols)

for mol, result in zip(mols, results):
    if result and result.converged:
        print(f"{Chem.MolToSmiles(mol)}: E = {result.energy:.6f} Ha")
```

**Parameters:**
- `mols` (list[rdkit.Chem.Mol]): List of RDKit molecules

**Returns:** `list[OptimizationResult | None]`

---

## Batch Processing Patterns

### Parallel Processing with Progress

```python
import rowan
from rdkit import Chem

smiles_list = ["CCO", "CC(=O)O", "c1ccccc1O", "c1ccc(O)c(O)c1"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

# Batch functions automatically distribute across multiple workers
print("Submitting batch pKa calculations...")
results = rowan.batch_pka(mols)

# Process results
for smi, result in zip(smiles_list, results):
    if result:
        print(f"{smi}: pKa = {result.strongest_acid:.2f}")
    else:
        print(f"{smi}: calculation failed")
```

### Error Handling

```python
import rowan
from rdkit import Chem

def safe_pka(smiles):
    """Safely calculate pKa with error handling."""
    try:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return None, "Invalid SMILES"

        result = rowan.run_pka(mol)
        return result, None

    except rowan.RowanAPIError as e:
        return None, f"API error: {e}"
    except Exception as e:
        return None, f"Error: {e}"

# Usage
result, error = safe_pka("c1ccccc1O")
if error:
    print(f"Failed: {error}")
else:
    print(f"pKa: {result.strongest_acid}")
```

### Combining with RDKit Workflows

```python
import rowan
from rdkit import Chem
from rdkit.Chem import Descriptors

# Load molecules
smiles_list = ["CCO", "CC(=O)O", "c1ccccc1O"]
mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]

# Filter by RDKit descriptors first
filtered_mols = [
    mol for mol in mols
    if mol and Descriptors.MolWt(mol) < 500
]

# Calculate pKa only for filtered set
pka_results = rowan.batch_pka(filtered_mols)

# Combine results
for mol, pka in zip(filtered_mols, pka_results):
    if pka:
        mw = Descriptors.MolWt(mol)
        print(f"{Chem.MolToSmiles(mol)}: MW={mw:.1f}, pKa={pka.strongest_acid:.2f}")
```

### Virtual Screening Pipeline

```python
import rowan
from rdkit import Chem
from rdkit.Chem import Descriptors
import pandas as pd

def screen_compounds(smiles_list):
    """Screen compounds for drug-likeness and calculate pKa."""
    results = []

    mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
    valid_mols = [(smi, mol) for smi, mol in zip(smiles_list, mols) if mol]

    # Batch pKa calculation
    pka_results = rowan.batch_pka([mol for _, mol in valid_mols])

    for (smi, mol), pka in zip(valid_mols, pka_results):
        result = {
            'smiles': smi,
            'mw': Descriptors.MolWt(mol),
            'logp': Descriptors.MolLogP(mol),
            'hbd': Descriptors.NumHDonors(mol),
            'hba': Descriptors.NumHAcceptors(mol),
            'pka': pka.strongest_acid if pka else None
        }
        results.append(result)

    return pd.DataFrame(results)

# Usage
compound_library = ["CCO", "CC(=O)O", "c1ccccc1O"]  # example input
df = screen_compounds(compound_library)
print(df[df['pka'].notna()].sort_values('pka'))
```
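
The pipeline above collects the descriptors used by Lipinski's rule of five but does not apply the rule itself. A minimal sketch of that last step follows, operating on plain dictionaries with the same `mw`, `logp`, `hbd`, and `hba` keys. `lipinski_violations` is a hypothetical helper and the descriptor values are made up for illustration.

```python
def lipinski_violations(row: dict) -> int:
    """Count rule-of-five violations for a row with 'mw', 'logp', 'hbd', 'hba' keys."""
    checks = [
        row["mw"] > 500,   # molecular weight above 500 Da
        row["logp"] > 5,   # logP above 5
        row["hbd"] > 5,    # more than 5 hydrogen-bond donors
        row["hba"] > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


rows = [
    {"smiles": "CCO", "mw": 46.07, "logp": 0.0, "hbd": 1, "hba": 1},
    {"smiles": "hypothetical-large-compound", "mw": 812.0, "logp": 6.3, "hbd": 7, "hba": 12},
]

# The rule is commonly applied as "no more than one violation"
drug_like = [row for row in rows if lipinski_violations(row) <= 1]
print([row["smiles"] for row in drug_like])
```

With the DataFrame returned by `screen_compounds`, the same function could be applied row by row, for example via `df.apply(lipinski_violations, axis=1)`.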

---

## Performance Considerations

1. **Batch functions are more efficient** - Submit multiple molecules at once rather than one by one
2. **Fractional credits** - Low-cost calculations may use < 1 credit (e.g., 0.17 credits for fast pKa)
3. **Automatic parallelization** - Batch functions distribute work across Rowan's compute cluster
4. **Results caching** - Previously calculated molecules may return faster
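
For very large libraries, one way to follow the first point while keeping each submission manageable is to split the input into fixed-size chunks and pass each chunk to a batch function. The sketch below shows only the chunking step in plain Python. `chunked` is a hypothetical helper, the chunk size of 3 is arbitrary, and this reference does not state whether Rowan imposes any batch size limit.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Seven simple alcohols as placeholder inputs
smiles = ["C" * n + "O" for n in range(1, 8)]
batches = list(chunked(smiles, 3))
print([len(batch) for batch in batches])
```

Each chunk would then be converted to RDKit molecules and submitted with a call such as `rowan.batch_pka`, with the per-chunk results concatenated in order.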

---

## Comparison with Full API

| Feature | RDKit-Native | Full API |
|---------|--------------|----------|
| Input format | RDKit Mol | stjames.Molecule |
| Output format | RDKit Mol + results | Workflow object |
| Workflow control | Automatic | Manual wait/fetch |
| Folder organization | No | Yes |
| Advanced parameters | Default only | Full control |

Use the RDKit-native API for quick calculations; use the full API for complex workflows or when you need fine-grained control.