Biomni API Reference
This document provides comprehensive API documentation for the Biomni biomedical AI agent system.
Core Classes
A1 Agent
The primary agent class for executing biomedical research tasks.
Initialization
from biomni.agent import A1
agent = A1(
    path='./data',                   # Path to biomedical knowledge base
    llm='claude-sonnet-4-20250514',  # LLM model identifier
    timeout=None,                    # Optional timeout in seconds
    verbose=True                     # Enable detailed logging
)
Parameters:
- path (str, required): Directory path where the biomedical knowledge base is stored or will be downloaded. First-time initialization downloads ~11GB of data.
- llm (str, optional): LLM model identifier. Defaults to the value in default_config.llm. Supports multiple providers (see LLM Providers section).
- timeout (int, optional): Maximum execution time in seconds for agent operations. Overrides default_config.timeout_seconds.
- verbose (bool, optional): Enable verbose logging for debugging. Default: True.
Returns: A1 agent instance ready for task execution.
Methods
go(task_description: str) -> None
Execute a biomedical research task autonomously.
agent.go("Analyze this scRNA-seq dataset and identify cell types")
Parameters:
- task_description (str, required): Natural language description of the biomedical task to execute. Be specific about:
  - Data location and format
  - Desired analysis or output
  - Any specific methods or parameters
  - Expected results format
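The four points above can be folded into a small prompt-building helper. This is only a sketch: `build_task_description`, its parameter names, and the example file name are hypothetical and not part of Biomni. The result is a plain string of the kind that would be passed to `agent.go(...)`.

```python
def build_task_description(data_location: str, analysis: str,
                           methods: str = "", output_format: str = "") -> str:
    """Assemble a specific task description (hypothetical helper, not a Biomni API)."""
    if not data_location.strip() or not analysis.strip():
        raise ValueError("data_location and analysis are required")
    parts = [
        f"Data: {data_location.strip()}",
        f"Analysis: {analysis.strip()}",
    ]
    # Optional details are included only when provided.
    if methods.strip():
        parts.append(f"Methods: {methods.strip()}")
    if output_format.strip():
        parts.append(f"Expected output: {output_format.strip()}")
    return "\n".join(parts)


# Illustrative values only; the file name is a placeholder.
task = build_task_description(
    data_location="data/pbmc.h5ad (AnnData, raw counts)",
    analysis="Identify cell types",
    methods="Leiden clustering, resolution 0.5",
    output_format="CSV of cluster labels plus a UMAP plot",
)
print(task)
```

Keeping each element on its own labeled line makes it easy to see at a glance which of the four details a task description is still missing.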
Behavior:
- Decomposes the task into executable steps
- Retrieves relevant biomedical knowledge from the data lake
- Generates and executes Python/R code
- Provides results and visualizations
- Handles errors and retries with refinement
Notes:
- Executes generated code with system privileges, so run the agent in a sandboxed environment
- Long-running tasks may require timeout adjustments
- Intermediate results are displayed during execution
save_conversation_history(output_path: str, format: str = 'pdf') -> None
Export conversation history and execution trace as a formatted report.
agent.save_conversation_history(
    output_path='./reports/analysis_log.pdf',
    format='pdf'
)
Parameters:
- output_path (str, required): File path for the output report
- format (str, optional): Output format. Options: 'pdf', 'markdown'. Default: 'pdf'
Requirements:
- For PDF: Install one of: WeasyPrint, markdown2pdf, or Pandoc
pip install weasyprint  # Recommended
# or
pip install markdown2pdf
# or install Pandoc system-wide
Report Contents:
- Task description and parameters
- Retrieved biomedical knowledge
- Generated code with execution traces
- Results, visualizations, and outputs
- Timestamps and execution metadata
add_mcp(config_path: str) -> None
Add Model Context Protocol (MCP) tools to extend agent capabilities.
agent.add_mcp(config_path='./mcp_tools_config.json')
Parameters:
- config_path (str, required): Path to MCP configuration JSON file
MCP Configuration Format:
{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {
        "param1": "string",
        "param2": "integer"
      }
    }
  ]
}
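A configuration file can be sanity-checked before it is handed to `add_mcp`. The sketch below only checks the shape shown above. `validate_mcp_config` is a hypothetical helper, not a Biomni function, and treating all four keys as required is an assumption drawn from the example rather than from a published schema.

```python
import json

REQUIRED_TOOL_KEYS = ("name", "endpoint", "description", "parameters")


def validate_mcp_config(config: dict) -> list:
    """Return a list of problems found; an empty list means the shape looks right.

    Hypothetical pre-flight check based only on the format shown above.
    It does not contact any endpoint.
    """
    problems = []
    tools = config.get("tools")
    if not isinstance(tools, list):
        return ["top-level 'tools' must be a list"]
    for i, tool in enumerate(tools):
        if not isinstance(tool, dict):
            problems.append(f"tools[{i}] must be an object")
            continue
        for key in REQUIRED_TOOL_KEYS:
            if key not in tool:
                problems.append(f"tools[{i}] is missing '{key}'")
        if "parameters" in tool and not isinstance(tool["parameters"], dict):
            problems.append(f"tools[{i}].parameters must be an object")
    return problems


example = json.loads("""
{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {"param1": "string", "param2": "integer"}
    }
  ]
}
""")
print(validate_mcp_config(example))                     # []
print(validate_mcp_config({"tools": [{"name": "x"}]}))  # three missing keys
```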
Use Cases:
- Connect to laboratory information systems
- Integrate proprietary databases
- Access specialized computational resources
- Link to institutional data repositories
Configuration
default_config
Global configuration object for Biomni settings.
from biomni.config import default_config
Attributes
llm: str
Default LLM model identifier for all agent instances.
default_config.llm = "claude-sonnet-4-20250514"
Supported Models:
Anthropic:
- claude-sonnet-4-20250514 (Recommended)
- claude-opus-4-20250514
- claude-3-5-sonnet-20241022
- claude-3-opus-20240229
OpenAI:
- gpt-4o
- gpt-4
- gpt-4-turbo
- gpt-3.5-turbo
Azure OpenAI:
- azure/gpt-4
- azure/<deployment-name>
Google Gemini:
- gemini/gemini-pro
- gemini/gemini-1.5-pro
Groq:
- groq/llama-3.1-70b-versatile
- groq/mixtral-8x7b-32768
Ollama (Local):
- ollama/llama3
- ollama/mistral
- ollama/<model-name>
AWS Bedrock:
- bedrock/anthropic.claude-v2
- bedrock/anthropic.claude-3-sonnet
Custom/Biomni-R0:
- openai/biomni-r0 (requires local SGLang deployment)
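The identifiers above follow a visible naming pattern: a `provider/` prefix for most providers, and bare `claude-` or `gpt-` names for Anthropic and OpenAI. A minimal sketch of classifying an identifier by that pattern follows. `provider_of` is a hypothetical helper inferred from the list, not how Biomni itself routes requests.

```python
def provider_of(model_id: str) -> str:
    """Guess the provider from an identifier in the list above (hypothetical helper)."""
    known_prefixes = {"azure", "gemini", "groq", "ollama", "bedrock", "openai"}
    if "/" in model_id:
        prefix = model_id.split("/", 1)[0]
        return prefix if prefix in known_prefixes else "unknown"
    # Unprefixed identifiers in the list are Anthropic or OpenAI models.
    if model_id.startswith("claude-"):
        return "anthropic"
    if model_id.startswith("gpt-"):
        return "openai"
    return "unknown"


for model in ("claude-sonnet-4-20250514", "gpt-4o", "ollama/llama3", "openai/biomni-r0"):
    print(model, "->", provider_of(model))
```

Note that `openai/biomni-r0` is classified as `openai` here because of its prefix, even though the list describes it as a local deployment behind an OpenAI-compatible endpoint.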
timeout_seconds: int
Default timeout for agent operations in seconds.
default_config.timeout_seconds = 1200 # 20 minutes
Recommended Values:
- Simple tasks (QC, basic analysis): 300-600 seconds
- Medium tasks (differential expression, clustering): 600-1200 seconds
- Complex tasks (full pipelines, ML models): 1200-3600 seconds
- Very complex tasks: 3600+ seconds
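The recommended ranges can be captured in a lookup table so that scripts choose a timeout by task tier instead of hard-coding numbers. This is a sketch under assumptions: the tier names and `timeout_for` are made up for illustration, and each value is simply the upper end of the corresponding range above. The open-ended "very complex" tier is left out because no upper bound is given.

```python
# Upper ends of the recommended ranges above; tier names are illustrative.
RECOMMENDED_TIMEOUTS = {
    "simple": 600,    # QC, basic analysis
    "medium": 1200,   # differential expression, clustering
    "complex": 3600,  # full pipelines, ML models
}


def timeout_for(tier: str) -> int:
    """Return a timeout in seconds for a named tier (hypothetical helper)."""
    try:
        return RECOMMENDED_TIMEOUTS[tier]
    except KeyError:
        expected = ", ".join(sorted(RECOMMENDED_TIMEOUTS))
        raise ValueError(f"unknown tier {tier!r}; expected one of: {expected}") from None


print(timeout_for("medium"))  # 1200
```

The returned value can then be assigned to `default_config.timeout_seconds` or passed as the `timeout` argument when constructing an agent.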
data_path: str
Default path to biomedical knowledge base.
default_config.data_path = "/path/to/biomni/data"
Storage Requirements:
- Initial download: ~11GB
- Extracted size: ~15GB
- Additional working space: ~5-10GB recommended
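Because the first initialization triggers a large download, it can be worth checking free disk space beforehand. The sketch below uses only the standard library. `has_room_for_data` is a hypothetical helper, and the 36 GB default is a conservative assumption that the download archive (~11GB), the extracted data (~15GB), and the upper end of the working space (~10GB) all coexist on the same filesystem.

```python
import shutil

GB = 10 ** 9  # decimal gigabyte


def has_room_for_data(path: str, required_gb: float = 36.0) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free.

    Hypothetical helper; the default threshold is an assumption, not a Biomni value.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


if not has_room_for_data("."):
    print("Less than 36 GB free; consider pointing the data path at a larger volume.")
```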
api_base: str
Custom API endpoint for LLM providers (advanced usage).
# For local Biomni-R0 deployment
default_config.api_base = "http://localhost:30000/v1"
# For custom OpenAI-compatible endpoints
default_config.api_base = "https://your-endpoint.com/v1"
max_retries: int
Number of retry attempts for failed operations.
default_config.max_retries = 3
Methods
reset() -> None
Reset all configuration values to system defaults.
default_config.reset()
Database Query System
Biomni includes a retrieval-augmented generation (RAG) system for querying the biomedical knowledge base.
Query Functions
query_genes(query: str, top_k: int = 10) -> List[Dict]
Query gene information from integrated databases.
from biomni.database import query_genes
results = query_genes(
    query="genes involved in p53 pathway",
    top_k=20
)
Parameters:
- query (str): Natural language or gene identifier query
- top_k (int): Number of results to return
Returns: List of dictionaries containing:
- gene_symbol: Official gene symbol
- gene_name: Full gene name
- description: Functional description
- pathways: Associated biological pathways
- go_terms: Gene Ontology annotations
- diseases: Associated diseases
- similarity_score: Relevance score (0-1)
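Since each record carries a `similarity_score`, results can be filtered and ranked after the query returns. The sketch below works on plain dictionaries shaped like the fields listed above. `filter_by_score` is a hypothetical helper, and the sample records are placeholders, not real query output.

```python
from typing import Dict, List


def filter_by_score(results: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep records at or above min_score, highest score first (hypothetical helper)."""
    def score(record: Dict) -> float:
        # Records without a score are treated as least relevant.
        return record.get("similarity_score", 0.0)

    kept = [r for r in results if score(r) >= min_score]
    return sorted(kept, key=score, reverse=True)


# Placeholder records; gene symbols and scores are made up for illustration.
sample = [
    {"gene_symbol": "GENE_A", "similarity_score": 0.42},
    {"gene_symbol": "GENE_B", "similarity_score": 0.91},
    {"gene_symbol": "GENE_C", "similarity_score": 0.77},
]
top = filter_by_score(sample, min_score=0.5)
print([r["gene_symbol"] for r in top])  # ['GENE_B', 'GENE_C']
```

The same helper applies to any of the query functions whose records include a `similarity_score` field.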
query_proteins(query: str, top_k: int = 10) -> List[Dict]
Query protein information from UniProt and other sources.
from biomni.database import query_proteins
results = query_proteins(
    query="kinase proteins in cell cycle",
    top_k=15
)
Returns: List of dictionaries with protein metadata:
- uniprot_id: UniProt accession
- protein_name: Protein name
- function: Functional annotation
- domains: Protein domains
- subcellular_location: Cellular localization
- similarity_score: Relevance score
query_drugs(query: str, top_k: int = 10) -> List[Dict]
Query drug and compound information.
from biomni.database import query_drugs
results = query_drugs(
    query="FDA approved cancer drugs targeting EGFR",
    top_k=10
)
Returns: Drug information including:
- drug_name: Common name
- drugbank_id: DrugBank identifier
- indication: Therapeutic indication
- mechanism: Mechanism of action
- targets: Molecular targets
- approval_status: Regulatory status
- smiles: Chemical structure (SMILES notation)
query_diseases(query: str, top_k: int = 10) -> List[Dict]
Query disease information from clinical databases.
from biomni.database import query_diseases
results = query_diseases(
    query="autoimmune diseases affecting joints",
    top_k=10
)
Returns: Disease data:
- disease_name: Standard disease name
- disease_id: Ontology identifier
- symptoms: Clinical manifestations
- associated_genes: Genetic associations
- prevalence: Epidemiological data
query_pathways(query: str, top_k: int = 10) -> List[Dict]
Query biological pathways from KEGG, Reactome, and other sources.
from biomni.database import query_pathways
results = query_pathways(
    query="immune response signaling pathways",
    top_k=15
)
Returns: Pathway information:
- pathway_name: Pathway name
- pathway_id: Database identifier
- genes: Genes in pathway
- description: Functional description
- source: Database source (KEGG, Reactome, etc.)
Data Structures
TaskResult
Result object returned by complex agent operations.
from typing import Any, Dict, Optional

class TaskResult:
    success: bool          # Whether task completed successfully
    output: Any            # Task output (varies by task)
    code: str              # Generated code
    execution_time: float  # Execution time in seconds
    error: Optional[str]   # Error message if failed
    metadata: Dict         # Additional metadata
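To show how these fields might be consumed, the sketch below defines a local stand-in with the same field names and a one-line status formatter. Modeling it as a dataclass with defaults is an assumption made for the example; the source does not say how Biomni implements `TaskResult`, and `summarize` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TaskResult:
    """Local stand-in mirroring the fields listed above (not Biomni's class)."""
    success: bool
    output: Any = None
    code: str = ""
    execution_time: float = 0.0
    error: Optional[str] = None
    metadata: Dict = field(default_factory=dict)


def summarize(result: TaskResult) -> str:
    """One-line status string for logs (hypothetical helper)."""
    if result.success:
        return f"ok in {result.execution_time:.1f}s"
    return f"failed in {result.execution_time:.1f}s: {result.error or 'unknown error'}"


print(summarize(TaskResult(success=True, execution_time=12.34)))
print(summarize(TaskResult(success=False, execution_time=3.0, error="missing file")))
```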
BiomedicalEntity
Base class for biomedical entities in the knowledge base.
from typing import Dict, List

class BiomedicalEntity:
    entity_id: str         # Unique identifier
    entity_type: str       # Type (gene, protein, drug, etc.)
    name: str              # Entity name
    description: str       # Description
    attributes: Dict       # Additional attributes
    references: List[str]  # Literature references
Utility Functions
download_data(path: str, force: bool = False) -> None
Manually download or update the biomedical knowledge base.
from biomni.utils import download_data
download_data(
    path='./data',
    force=True  # Force re-download
)
validate_environment() -> Dict[str, bool]
Check if the environment is properly configured.
from biomni.utils import validate_environment
status = validate_environment()
# Returns: {
#     'conda_env': True,
#     'api_keys': True,
#     'data_available': True,
#     'dependencies': True
# }
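A status dictionary of this shape is easy to turn into an actionable message. The sketch below operates on a hand-written stand-in for the returned dictionary, so it runs without Biomni installed. `failed_checks` is a hypothetical helper.

```python
from typing import Dict, List


def failed_checks(status: Dict[str, bool]) -> List[str]:
    """Names of checks that did not pass, in sorted order (hypothetical helper)."""
    return sorted(name for name, ok in status.items() if not ok)


# Stand-in for the dictionary shown above, with one check failing.
status = {
    "conda_env": True,
    "api_keys": False,
    "data_available": True,
    "dependencies": True,
}
missing = failed_checks(status)
if missing:
    print("Environment not ready:", ", ".join(missing))
```

Running a check like this before constructing an agent surfaces a missing API key or dataset before any task is started.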
list_available_models() -> List[str]
Get a list of available LLM models based on configured API keys.
from biomni.utils import list_available_models
models = list_available_models()
# Returns: ['claude-sonnet-4-20250514', 'gpt-4o', ...]
Error Handling
Common Exceptions
BiomniConfigError
Raised when configuration is invalid or incomplete.
from biomni.agent import A1
from biomni.exceptions import BiomniConfigError

try:
    agent = A1(path='./data')
except BiomniConfigError as e:
    print(f"Configuration error: {e}")
BiomniExecutionError
Raised when code generation or execution fails.
from biomni.exceptions import BiomniExecutionError
try:
    agent.go("invalid task")
except BiomniExecutionError as e:
    print(f"Execution failed: {e}")
    # Access failed code: e.code
    # Access error details: e.details
BiomniDataError
Raised when knowledge base or data access fails.
from biomni.database import query_genes
from biomni.exceptions import BiomniDataError

try:
    results = query_genes("unknown query format")
except BiomniDataError as e:
    print(f"Data access error: {e}")
BiomniTimeoutError
Raised when operations exceed timeout limit.
from biomni.exceptions import BiomniTimeoutError
try:
    agent.go("very complex long-running task")
except BiomniTimeoutError as e:
    print(f"Task timed out after {e.duration} seconds")
    # Partial results may be available: e.partial_results
Best Practices
Efficient Knowledge Retrieval
Pre-query databases for relevant context before complex tasks:
from biomni.database import query_genes, query_pathways
# Gather relevant biological context first
genes = query_genes("cell cycle genes", top_k=50)
pathways = query_pathways("cell cycle regulation", top_k=20)
# Then execute task with enriched context
agent.go(f"""
Analyze the cell cycle progression in this dataset.
Focus on these genes: {[g['gene_symbol'] for g in genes]}
Consider these pathways: {[p['pathway_name'] for p in pathways]}
""")
Error Recovery
Implement robust error handling for production workflows:
from biomni.config import default_config
from biomni.exceptions import BiomniExecutionError, BiomniTimeoutError

max_attempts = 3
for attempt in range(max_attempts):
    try:
        agent.go("complex biomedical task")
        break
    except BiomniTimeoutError:
        # Increase timeout and retry
        default_config.timeout_seconds *= 2
        print(f"Timeout, retrying with {default_config.timeout_seconds}s timeout")
    except BiomniExecutionError as e:
        # Refine task based on error
        print(f"Execution failed: {e}, refining task...")
        # Optionally modify task description
else:
    # Runs only if the loop finished without reaching `break`
    print("Task failed after max attempts")
Memory Management
For large-scale analyses, manage memory explicitly:
import gc

num_chunks = 4  # Placeholder: set to the number of chunk files you have

# Process datasets in chunks
for chunk_id in range(num_chunks):
    agent.go(f"Process data chunk {chunk_id} located at data/chunk_{chunk_id}.h5ad")
    # Force garbage collection between chunks
    gc.collect()
    # Save intermediate results
    agent.save_conversation_history(f"./reports/chunk_{chunk_id}.pdf")
Reproducibility
Ensure reproducible analyses by:
- Fixing random seeds:
agent.go("Set random seed to 42 for all analyses, then perform clustering...")
- Logging configuration:
import json
from datetime import datetime

from biomni.config import default_config

config_log = {
    'llm': default_config.llm,
    'timeout': default_config.timeout_seconds,
    'data_path': default_config.data_path,
    'timestamp': datetime.now().isoformat()
}
with open('config_log.json', 'w') as f:
    json.dump(config_log, f, indent=2)
- Saving execution traces:
# Always save detailed reports
agent.save_conversation_history('./reports/full_analysis.pdf')
Performance Optimization
Model Selection Strategy
Choose models based on task characteristics:
# For exploratory, simple tasks
default_config.llm = "gpt-3.5-turbo" # Fast, cost-effective
# For standard biomedical analyses
default_config.llm = "claude-sonnet-4-20250514" # Recommended
# For complex reasoning and hypothesis generation
default_config.llm = "claude-opus-4-20250514" # Highest quality
# For specialized biological reasoning
default_config.llm = "openai/biomni-r0" # Requires local deployment
Timeout Tuning
Set appropriate timeouts based on task complexity:
# Quick queries and simple analyses
agent = A1(path='./data', timeout=300)
# Standard workflows
agent = A1(path='./data', timeout=1200)
# Full pipelines with ML training
agent = A1(path='./data', timeout=3600)
Caching and Reuse
Reuse agent instances for multiple related tasks:
# Create agent once
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Execute multiple related tasks
tasks = [
    "Load and QC the scRNA-seq dataset",
    "Perform clustering with resolution 0.5",
    "Identify marker genes for each cluster",
    "Annotate cell types based on markers"
]
for task in tasks:
    agent.go(task)
# Save complete workflow
agent.save_conversation_history('./reports/full_workflow.pdf')