Add more scientific skills

This commit is contained in:
Timothy Kassis
2025-10-19 14:12:02 -07:00
parent 78d5ac2b56
commit 660c8574d0
210 changed files with 88957 additions and 1 deletions

---
name: biomni
description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning.
---
# Biomni
## Overview
Biomni is a general-purpose biomedical AI agent that autonomously executes research tasks across diverse biomedical subfields. It combines large language model reasoning with retrieval-augmented planning and code-based execution to enhance scientific productivity and hypothesis generation. The system operates with an ~11GB biomedical knowledge base covering molecular, genomic, and clinical domains.
## Quick Start
Initialize and use the Biomni agent with these basic steps:
```python
from biomni.agent import A1
# Initialize agent with data path and LLM model
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Execute a biomedical research task
agent.go("Your biomedical task description")
```
The agent will autonomously decompose the task, retrieve relevant biomedical knowledge, generate and execute code, and provide results.
## Installation and Setup
### Environment Preparation
1. **Set up the conda environment:**
- Follow instructions in `biomni_env/README.md` from the repository
- Activate the environment: `conda activate biomni_e1`
2. **Install the package:**
```bash
pip install biomni --upgrade
```
Or install from source:
```bash
git clone https://github.com/snap-stanford/biomni.git
cd biomni
pip install -e .
```
3. **Configure API keys:**
Set up credentials via environment variables or `.env` file:
```bash
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here" # Optional
```
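To fail fast before initializing the agent, credentials can be checked up front. This is a minimal sketch; `check_api_keys` is illustrative and not part of the Biomni API:

```python
import os

def check_api_keys(required=("ANTHROPIC_API_KEY",)):
    """Return the names of required keys missing from the environment."""
    return [name for name in required if not os.environ.get(name)]

missing = check_api_keys()
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```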
4. **Data initialization:**
On first use, the agent will automatically download the ~11GB biomedical knowledge base.
### LLM Provider Configuration
Biomni supports multiple LLM providers. Configure the default provider using:
```python
from biomni.config import default_config
# Set the default LLM model
default_config.llm = "claude-sonnet-4-20250514" # Anthropic
# default_config.llm = "gpt-4" # OpenAI
# default_config.llm = "azure/gpt-4" # Azure OpenAI
# default_config.llm = "gemini/gemini-pro" # Google Gemini
# Set timeout (optional)
default_config.timeout_seconds = 1200
# Set data path (optional)
default_config.data_path = "./custom/data/path"
```
Refer to `references/llm_providers.md` for detailed configuration options for each provider.
## Core Biomedical Research Tasks
### 1. CRISPR Screening and Design
Execute CRISPR screening tasks including guide RNA design, off-target analysis, and screening experiment planning:
```python
agent.go("Design a CRISPR screening experiment to identify genes involved in cancer cell resistance to drug X")
```
The agent will:
- Retrieve relevant gene databases
- Design guide RNAs with specificity analysis
- Plan experimental controls and readout strategies
- Generate analysis code for screening results
### 2. Single-Cell RNA-seq Analysis
Perform comprehensive scRNA-seq analysis workflows:
```python
agent.go("Analyze this 10X Genomics scRNA-seq dataset, identify cell types, and find differentially expressed genes between clusters")
```
Capabilities include:
- Quality control and preprocessing
- Dimensionality reduction and clustering
- Cell type annotation using marker databases
- Differential expression analysis
- Pathway enrichment analysis
### 3. Molecular Property Prediction (ADMET)
Predict absorption, distribution, metabolism, excretion, and toxicity properties:
```python
agent.go("Predict ADMET properties for these drug candidates: [SMILES strings]")
```
The agent handles:
- Molecular descriptor calculation
- Property prediction using integrated models
- Toxicity screening
- Drug-likeness assessment
### 4. Genomic Analysis
Execute genomic data analysis tasks:
```python
agent.go("Perform GWAS analysis to identify SNPs associated with disease phenotype in this cohort")
```
Supports:
- Genome-wide association studies (GWAS)
- Variant calling and annotation
- Population genetics analysis
- Functional genomics integration
### 5. Protein Structure and Function
Analyze protein sequences and structures:
```python
agent.go("Predict the structure of this protein sequence and identify potential binding sites")
```
Capabilities:
- Sequence analysis and domain identification
- Structure prediction integration
- Binding site prediction
- Protein-protein interaction analysis
### 6. Disease Diagnosis and Classification
Perform disease classification from multi-omics data:
```python
agent.go("Build a classifier to diagnose disease X from patient RNA-seq and clinical data")
```
### 7. Systems Biology and Pathway Analysis
Analyze biological pathways and networks:
```python
agent.go("Identify dysregulated pathways in this differential expression dataset")
```
### 8. Drug Discovery and Repurposing
Support drug discovery workflows:
```python
agent.go("Identify FDA-approved drugs that could be repurposed for treating disease Y based on mechanism of action")
```
## Advanced Features
### Custom Configuration per Agent
Override global configuration for specific agent instances:
```python
agent = A1(
    path='./project_data',
    llm='gpt-4o',
    timeout=1800
)
```
### Conversation History and Reporting
Save execution traces as formatted PDF reports:
```python
# After executing tasks
agent.save_conversation_history(
    output_path='./reports/experiment_log.pdf',
    format='pdf'
)
```
Requires one of: WeasyPrint, markdown2pdf, or Pandoc.
### Model Context Protocol (MCP) Integration
Extend agent capabilities with external tools:
```python
# Add MCP-compatible tools
agent.add_mcp(config_path='./mcp_config.json')
```
MCP enables integration with:
- Laboratory information management systems (LIMS)
- Specialized bioinformatics databases
- Custom analysis pipelines
- External computational resources
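A minimal configuration file can be generated programmatically. The tool entry below follows the MCP configuration format documented in `references/api_reference.md`; the LIMS lookup tool, its endpoint, and its parameter names are hypothetical examples:

```python
import json

# Hypothetical tool entry; name, endpoint, and parameters are placeholders
mcp_config = {
    "tools": [
        {
            "name": "lims_lookup",
            "endpoint": "http://localhost:8000/tool",
            "description": "Look up sample metadata in the institutional LIMS",
            "parameters": {"sample_id": "string"},
        }
    ]
}

# Write the config so it can be passed to agent.add_mcp()
with open("mcp_config.json", "w") as f:
    json.dump(mcp_config, f, indent=2)

# agent.add_mcp(config_path='./mcp_config.json')
```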
### Using Biomni-R0 (Specialized Reasoning Model)
Deploy the 32B parameter Biomni-R0 model for enhanced biological reasoning:
```bash
# Install SGLang
pip install "sglang[all]"
# Deploy Biomni-R0
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --port 30000 \
    --trust-remote-code
```
Then configure the agent:
```python
from biomni.config import default_config
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
```
Biomni-R0 provides specialized reasoning for:
- Complex multi-step biological workflows
- Hypothesis generation and evaluation
- Experimental design optimization
- Literature-informed analysis
## Best Practices
### Task Specification
Provide clear, specific task descriptions:
✅ **Good:** "Analyze this scRNA-seq dataset (file: data.h5ad) to identify T cell subtypes, then perform differential expression analysis comparing activated vs. resting T cells"
❌ **Vague:** "Analyze my RNA-seq data"
### Data Organization
Structure data directories for efficient retrieval:
```
project/
├── data/ # Biomni knowledge base
├── raw_data/ # Your experimental data
├── results/ # Analysis outputs
└── reports/ # Generated reports
```
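This layout can be scaffolded in a few lines; `scaffold_project` is an illustrative helper, not part of the Biomni API:

```python
from pathlib import Path

def scaffold_project(root="project"):
    """Create the recommended directory layout under `root`."""
    for sub in ("data", "raw_data", "results", "reports"):
        Path(root, sub).mkdir(parents=True, exist_ok=True)
    return Path(root)

root = scaffold_project()
```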
### Iterative Refinement
Use iterative task execution for complex analyses:
```python
# Step 1: Exploratory analysis
agent.go("Load and perform initial QC on the proteomics dataset")
# Step 2: Based on results, refine analysis
agent.go("Based on the QC results, remove low-quality samples and normalize using method X")
# Step 3: Downstream analysis
agent.go("Perform differential abundance analysis with adjusted parameters")
```
### Security Considerations
**CRITICAL:** Biomni executes LLM-generated code with full system privileges. For production use:
1. **Use sandboxed environments:** Deploy in Docker containers or VMs with restricted permissions
2. **Validate sensitive operations:** Review code before execution for file access, network calls, or credential usage
3. **Limit data access:** Restrict agent access to only necessary data directories
4. **Monitor execution:** Log all executed code for audit trails
Never run Biomni with:
- Unrestricted file system access
- Direct access to sensitive credentials
- Network access to production systems
- Elevated system privileges
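As one concrete starting point, the sandboxing advice above can be captured in a container image. This is a sketch, not an official image: the base image, user name, and `run_task.py` entry script are assumptions.

```dockerfile
# Sketch of a restricted container for running Biomni-generated code
FROM python:3.11-slim

# Run as an unprivileged user rather than root
RUN useradd --create-home biomni_user
USER biomni_user
WORKDIR /home/biomni_user

RUN pip install --user biomni

# run_task.py is a hypothetical entry script that initializes A1 and
# calls agent.go(); mount data read-only at run time, e.g.:
#   docker run --rm -v "$PWD/data:/home/biomni_user/data:ro" biomni-sandbox
COPY run_task.py .
CMD ["python", "run_task.py"]
```

Pair this with host-level controls (restricted mounts, firewall rules limiting outbound traffic to the LLM API) to satisfy the points above.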
### Model Selection Guidelines
Choose models based on task complexity:
- **Claude Sonnet 4:** Recommended for most biomedical tasks, excellent biological reasoning
- **GPT-4/GPT-4o:** Strong general capabilities, good for diverse tasks
- **Biomni-R0:** Specialized for complex biological reasoning, multi-step workflows
- **Smaller models:** Use for simple, well-defined tasks to reduce cost
## Evaluation and Benchmarking
The Biomni-Eval1 benchmark contains 433 evaluation instances across 10 biological task types:
- GWAS analysis
- Disease diagnosis
- Gene detection and classification
- Molecular property prediction
- Pathway analysis
- Protein function prediction
- Drug response prediction
- Variant interpretation
- Cell type annotation
- Biomarker discovery
Use the benchmark to:
- Evaluate custom agent configurations
- Compare LLM providers for specific tasks
- Validate analysis pipelines
## Troubleshooting
### Common Issues
**Issue:** Data download fails or times out
**Solution:** Manually download the knowledge base or increase timeout settings
**Issue:** Package dependency conflicts
**Solution:** Some optional dependencies are not installed by default due to conflicts. Install the specific packages manually and uncomment the relevant code sections as documented in the repository
**Issue:** LLM API errors
**Solution:** Verify API key configuration, check rate limits, ensure sufficient credits
**Issue:** Memory errors with large datasets
**Solution:** Process data in chunks, use data subsampling, or deploy on higher-memory instances
### Getting Help
For detailed troubleshooting:
- Review the Biomni GitHub repository issues
- Check `references/api_reference.md` for detailed API documentation
- Consult `references/task_examples.md` for comprehensive task patterns
## Resources
### references/
Detailed reference documentation for advanced usage:
- **api_reference.md:** Complete API documentation for A1 agent, configuration objects, and utility functions
- **llm_providers.md:** Comprehensive guide for configuring all supported LLM providers (Anthropic, OpenAI, Azure, Gemini, Groq, Ollama, AWS Bedrock)
- **task_examples.md:** Extensive collection of biomedical task examples with code patterns
### scripts/
Helper scripts for common operations:
- **setup_environment.py:** Automated environment setup and validation
- **generate_report.py:** Enhanced PDF report generation with custom formatting
Load reference documentation as needed:
```python
# Claude can read reference files when needed for detailed information
# Example: "Check references/llm_providers.md for Azure OpenAI configuration"
```

# Biomni API Reference
This document provides comprehensive API documentation for the Biomni biomedical AI agent system.
## Core Classes
### A1 Agent
The primary agent class for executing biomedical research tasks.
#### Initialization
```python
from biomni.agent import A1
agent = A1(
    path='./data',                    # Path to biomedical knowledge base
    llm='claude-sonnet-4-20250514',   # LLM model identifier
    timeout=None,                     # Optional timeout in seconds
    verbose=True                      # Enable detailed logging
)
```
**Parameters:**
- `path` (str, required): Directory path where the biomedical knowledge base is stored or will be downloaded. First-time initialization will download ~11GB of data.
- `llm` (str, optional): LLM model identifier. Defaults to the value in `default_config.llm`. Supports multiple providers (see LLM Providers section).
- `timeout` (int, optional): Maximum execution time in seconds for agent operations. Overrides `default_config.timeout_seconds`.
- `verbose` (bool, optional): Enable verbose logging for debugging. Default: True.
**Returns:** A1 agent instance ready for task execution.
#### Methods
##### `go(task_description: str) -> None`
Execute a biomedical research task autonomously.
```python
agent.go("Analyze this scRNA-seq dataset and identify cell types")
```
**Parameters:**
- `task_description` (str, required): Natural language description of the biomedical task to execute. Be specific about:
- Data location and format
- Desired analysis or output
- Any specific methods or parameters
- Expected results format
**Behavior:**
1. Decomposes the task into executable steps
2. Retrieves relevant biomedical knowledge from the data lake
3. Generates and executes Python/R code
4. Provides results and visualizations
5. Handles errors and retries with refinement
**Notes:**
- Executes code with system privileges - use in sandboxed environments
- Long-running tasks may require timeout adjustments
- Intermediate results are displayed during execution
##### `save_conversation_history(output_path: str, format: str = 'pdf') -> None`
Export conversation history and execution trace as a formatted report.
```python
agent.save_conversation_history(
    output_path='./reports/analysis_log.pdf',
    format='pdf'
)
```
**Parameters:**
- `output_path` (str, required): File path for the output report
- `format` (str, optional): Output format. Options: 'pdf', 'markdown'. Default: 'pdf'
**Requirements:**
- For PDF: Install one of: WeasyPrint, markdown2pdf, or Pandoc
```bash
pip install weasyprint # Recommended
# or
pip install markdown2pdf
# or install Pandoc system-wide
```
**Report Contents:**
- Task description and parameters
- Retrieved biomedical knowledge
- Generated code with execution traces
- Results, visualizations, and outputs
- Timestamps and execution metadata
##### `add_mcp(config_path: str) -> None`
Add Model Context Protocol (MCP) tools to extend agent capabilities.
```python
agent.add_mcp(config_path='./mcp_tools_config.json')
```
**Parameters:**
- `config_path` (str, required): Path to MCP configuration JSON file
**MCP Configuration Format:**
```json
{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {
        "param1": "string",
        "param2": "integer"
      }
    }
  ]
}
```
**Use Cases:**
- Connect to laboratory information systems
- Integrate proprietary databases
- Access specialized computational resources
- Link to institutional data repositories
## Configuration
### default_config
Global configuration object for Biomni settings.
```python
from biomni.config import default_config
```
#### Attributes
##### `llm: str`
Default LLM model identifier for all agent instances.
```python
default_config.llm = "claude-sonnet-4-20250514"
```
**Supported Models:**
**Anthropic:**
- `claude-sonnet-4-20250514` (Recommended)
- `claude-opus-4-20250514`
- `claude-3-5-sonnet-20241022`
- `claude-3-opus-20240229`
**OpenAI:**
- `gpt-4o`
- `gpt-4`
- `gpt-4-turbo`
- `gpt-3.5-turbo`
**Azure OpenAI:**
- `azure/gpt-4`
- `azure/<deployment-name>`
**Google Gemini:**
- `gemini/gemini-pro`
- `gemini/gemini-1.5-pro`
**Groq:**
- `groq/llama-3.1-70b-versatile`
- `groq/mixtral-8x7b-32768`
**Ollama (Local):**
- `ollama/llama3`
- `ollama/mistral`
- `ollama/<model-name>`
**AWS Bedrock:**
- `bedrock/anthropic.claude-v2`
- `bedrock/anthropic.claude-3-sonnet`
**Custom/Biomni-R0:**
- `openai/biomni-r0` (requires local SGLang deployment)
##### `timeout_seconds: int`
Default timeout for agent operations in seconds.
```python
default_config.timeout_seconds = 1200 # 20 minutes
```
**Recommended Values:**
- Simple tasks (QC, basic analysis): 300-600 seconds
- Medium tasks (differential expression, clustering): 600-1200 seconds
- Complex tasks (full pipelines, ML models): 1200-3600 seconds
- Very complex tasks: 3600+ seconds
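These tiers can be encoded as a small lookup so workflow scripts stay consistent. The tier names and helper below are illustrative, not part of the Biomni API:

```python
# Map the task tiers above to timeout values in seconds
TIMEOUTS = {
    "simple": 600,     # QC, basic analysis
    "medium": 1200,    # differential expression, clustering
    "complex": 3600,   # full pipelines, ML models
}

def timeout_for(tier: str) -> int:
    # Default to the generous end for unrecognized tiers
    return TIMEOUTS.get(tier, 3600)

# default_config.timeout_seconds = timeout_for("medium")
```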
##### `data_path: str`
Default path to biomedical knowledge base.
```python
default_config.data_path = "/path/to/biomni/data"
```
**Storage Requirements:**
- Initial download: ~11GB
- Extracted size: ~15GB
- Additional working space: ~5-10GB recommended
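Before the first download it can be worth verifying free disk space against these figures using the standard library; the ~30GB threshold below is simply the sum of the figures above:

```python
import shutil

def has_space_for_knowledge_base(path=".", required_gb=30):
    """Return True if `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

if not has_space_for_knowledge_base("./"):
    print("Warning: less than 30GB free; knowledge base setup may fail")
```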
##### `api_base: str`
Custom API endpoint for LLM providers (advanced usage).
```python
# For local Biomni-R0 deployment
default_config.api_base = "http://localhost:30000/v1"
# For custom OpenAI-compatible endpoints
default_config.api_base = "https://your-endpoint.com/v1"
```
##### `max_retries: int`
Number of retry attempts for failed operations.
```python
default_config.max_retries = 3
```
#### Methods
##### `reset() -> None`
Reset all configuration values to system defaults.
```python
default_config.reset()
```
## Database Query System
Biomni includes a retrieval-augmented generation (RAG) system for querying the biomedical knowledge base.
### Query Functions
#### `query_genes(query: str, top_k: int = 10) -> List[Dict]`
Query gene information from integrated databases.
```python
from biomni.database import query_genes
results = query_genes(
    query="genes involved in p53 pathway",
    top_k=20
)
```
**Parameters:**
- `query` (str): Natural language or gene identifier query
- `top_k` (int): Number of results to return
**Returns:** List of dictionaries containing:
- `gene_symbol`: Official gene symbol
- `gene_name`: Full gene name
- `description`: Functional description
- `pathways`: Associated biological pathways
- `go_terms`: Gene Ontology annotations
- `diseases`: Associated diseases
- `similarity_score`: Relevance score (0-1)
#### `query_proteins(query: str, top_k: int = 10) -> List[Dict]`
Query protein information from UniProt and other sources.
```python
from biomni.database import query_proteins
results = query_proteins(
    query="kinase proteins in cell cycle",
    top_k=15
)
```
**Returns:** List of dictionaries with protein metadata:
- `uniprot_id`: UniProt accession
- `protein_name`: Protein name
- `function`: Functional annotation
- `domains`: Protein domains
- `subcellular_location`: Cellular localization
- `similarity_score`: Relevance score
#### `query_drugs(query: str, top_k: int = 10) -> List[Dict]`
Query drug and compound information.
```python
from biomni.database import query_drugs
results = query_drugs(
    query="FDA approved cancer drugs targeting EGFR",
    top_k=10
)
```
**Returns:** Drug information including:
- `drug_name`: Common name
- `drugbank_id`: DrugBank identifier
- `indication`: Therapeutic indication
- `mechanism`: Mechanism of action
- `targets`: Molecular targets
- `approval_status`: Regulatory status
- `smiles`: Chemical structure (SMILES notation)
#### `query_diseases(query: str, top_k: int = 10) -> List[Dict]`
Query disease information from clinical databases.
```python
from biomni.database import query_diseases
results = query_diseases(
query="autoimmune diseases affecting joints",
top_k=10
)
```
**Returns:** Disease data:
- `disease_name`: Standard disease name
- `disease_id`: Ontology identifier
- `symptoms`: Clinical manifestations
- `associated_genes`: Genetic associations
- `prevalence`: Epidemiological data
#### `query_pathways(query: str, top_k: int = 10) -> List[Dict]`
Query biological pathways from KEGG, Reactome, and other sources.
```python
from biomni.database import query_pathways
results = query_pathways(
    query="immune response signaling pathways",
    top_k=15
)
```
**Returns:** Pathway information:
- `pathway_name`: Pathway name
- `pathway_id`: Database identifier
- `genes`: Genes in pathway
- `description`: Functional description
- `source`: Database source (KEGG, Reactome, etc.)
## Data Structures
### TaskResult
Result object returned by complex agent operations.
```python
from typing import Any, Dict, Optional

class TaskResult:
    success: bool            # Whether task completed successfully
    output: Any              # Task output (varies by task)
    code: str                # Generated code
    execution_time: float    # Execution time in seconds
    error: Optional[str]     # Error message if failed
    metadata: Dict           # Additional metadata
```
### BiomedicalEntity
Base class for biomedical entities in the knowledge base.
```python
from typing import Dict, List

class BiomedicalEntity:
    entity_id: str           # Unique identifier
    entity_type: str         # Type (gene, protein, drug, etc.)
    name: str                # Entity name
    description: str         # Description
    attributes: Dict         # Additional attributes
    references: List[str]    # Literature references
```
## Utility Functions
### `download_data(path: str, force: bool = False) -> None`
Manually download or update the biomedical knowledge base.
```python
from biomni.utils import download_data
download_data(
    path='./data',
    force=True  # Force re-download
)
```
### `validate_environment() -> Dict[str, bool]`
Check if the environment is properly configured.
```python
from biomni.utils import validate_environment
status = validate_environment()
# Returns: {
# 'conda_env': True,
# 'api_keys': True,
# 'data_available': True,
# 'dependencies': True
# }
```
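When gating a pipeline on this result, a small helper can turn any failed check into an explicit error. The helper is illustrative, not part of the Biomni API:

```python
def assert_environment_ready(status: dict) -> None:
    """Raise if any environment check in `status` is False."""
    failed = [check for check, ok in status.items() if not ok]
    if failed:
        raise RuntimeError(f"Environment checks failed: {', '.join(failed)}")

# Example with the result shape shown above
assert_environment_ready({
    'conda_env': True,
    'api_keys': True,
    'data_available': True,
    'dependencies': True,
})
```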
### `list_available_models() -> List[str]`
Get a list of available LLM models based on configured API keys.
```python
from biomni.utils import list_available_models
models = list_available_models()
# Returns: ['claude-sonnet-4-20250514', 'gpt-4o', ...]
```
## Error Handling
### Common Exceptions
#### `BiomniConfigError`
Raised when configuration is invalid or incomplete.
```python
from biomni.exceptions import BiomniConfigError
try:
    agent = A1(path='./data')
except BiomniConfigError as e:
    print(f"Configuration error: {e}")
```
#### `BiomniExecutionError`
Raised when code generation or execution fails.
```python
from biomni.exceptions import BiomniExecutionError
try:
    agent.go("invalid task")
except BiomniExecutionError as e:
    print(f"Execution failed: {e}")
    # Access failed code: e.code
    # Access error details: e.details
```
#### `BiomniDataError`
Raised when knowledge base or data access fails.
```python
from biomni.exceptions import BiomniDataError
try:
    results = query_genes("unknown query format")
except BiomniDataError as e:
    print(f"Data access error: {e}")
```
#### `BiomniTimeoutError`
Raised when operations exceed timeout limit.
```python
from biomni.exceptions import BiomniTimeoutError
try:
    agent.go("very complex long-running task")
except BiomniTimeoutError as e:
    print(f"Task timed out after {e.duration} seconds")
    # Partial results may be available: e.partial_results
```
## Best Practices
### Efficient Knowledge Retrieval
Pre-query databases for relevant context before complex tasks:
```python
from biomni.database import query_genes, query_pathways
# Gather relevant biological context first
genes = query_genes("cell cycle genes", top_k=50)
pathways = query_pathways("cell cycle regulation", top_k=20)
# Then execute task with enriched context
agent.go(f"""
Analyze the cell cycle progression in this dataset.
Focus on these genes: {[g['gene_symbol'] for g in genes]}
Consider these pathways: {[p['pathway_name'] for p in pathways]}
""")
```
### Error Recovery
Implement robust error handling for production workflows:
```python
from biomni.exceptions import BiomniExecutionError, BiomniTimeoutError
max_attempts = 3
for attempt in range(max_attempts):
    try:
        agent.go("complex biomedical task")
        break
    except BiomniTimeoutError:
        # Increase timeout and retry
        default_config.timeout_seconds *= 2
        print(f"Timeout, retrying with {default_config.timeout_seconds}s timeout")
    except BiomniExecutionError as e:
        # Refine task based on error
        print(f"Execution failed: {e}, refining task...")
        # Optionally modify task description
else:
    print("Task failed after max attempts")
```
### Memory Management
For large-scale analyses, manage memory explicitly:
```python
import gc
# Process datasets in chunks
for chunk_id in range(num_chunks):
    agent.go(f"Process data chunk {chunk_id} located at data/chunk_{chunk_id}.h5ad")
    # Force garbage collection between chunks
    gc.collect()
    # Save intermediate results
    agent.save_conversation_history(f"./reports/chunk_{chunk_id}.pdf")
```
### Reproducibility
Ensure reproducible analyses by:
1. **Fixing random seeds:**
```python
agent.go("Set random seed to 42 for all analyses, then perform clustering...")
```
2. **Logging configuration:**
```python
import json
from datetime import datetime

config_log = {
    'llm': default_config.llm,
    'timeout': default_config.timeout_seconds,
    'data_path': default_config.data_path,
    'timestamp': datetime.now().isoformat()
}
with open('config_log.json', 'w') as f:
    json.dump(config_log, f, indent=2)
```
3. **Saving execution traces:**
```python
# Always save detailed reports
agent.save_conversation_history('./reports/full_analysis.pdf')
```
## Performance Optimization
### Model Selection Strategy
Choose models based on task characteristics:
```python
# For exploratory, simple tasks
default_config.llm = "gpt-3.5-turbo" # Fast, cost-effective
# For standard biomedical analyses
default_config.llm = "claude-sonnet-4-20250514" # Recommended
# For complex reasoning and hypothesis generation
default_config.llm = "claude-opus-4-20250514" # Highest quality
# For specialized biological reasoning
default_config.llm = "openai/biomni-r0" # Requires local deployment
```
### Timeout Tuning
Set appropriate timeouts based on task complexity:
```python
# Quick queries and simple analyses
agent = A1(path='./data', timeout=300)
# Standard workflows
agent = A1(path='./data', timeout=1200)
# Full pipelines with ML training
agent = A1(path='./data', timeout=3600)
```
### Caching and Reuse
Reuse agent instances for multiple related tasks:
```python
# Create agent once
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Execute multiple related tasks
tasks = [
    "Load and QC the scRNA-seq dataset",
    "Perform clustering with resolution 0.5",
    "Identify marker genes for each cluster",
    "Annotate cell types based on markers"
]
for task in tasks:
    agent.go(task)
# Save complete workflow
agent.save_conversation_history('./reports/full_workflow.pdf')
```

# LLM Provider Configuration Guide
This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.
## Overview
Biomni supports multiple LLM providers through a unified interface. Configure providers using:
- Environment variables
- `.env` files
- Runtime configuration via `default_config`
## Quick Reference Table
| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
|----------|----------------|------------------|------|------------------|
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |
## Anthropic Claude (Recommended)
### Overview
Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.
### Setup
1. **Obtain API Key:**
- Sign up at https://console.anthropic.com/
- Navigate to API Keys section
- Generate a new key
2. **Configure Environment:**
**Option A: Environment Variable**
```bash
export ANTHROPIC_API_KEY="sk-ant-api03-..."
```
**Option B: .env File**
```bash
# .env file in project root
ANTHROPIC_API_KEY=sk-ant-api03-...
```
3. **Set Model in Code:**
```python
from biomni.config import default_config
# Claude Sonnet 4 (Recommended)
default_config.llm = "claude-sonnet-4-20250514"
# Claude Opus 4 (Most capable)
default_config.llm = "claude-opus-4-20250514"
# Claude 3.5 Sonnet (Previous version)
default_config.llm = "claude-3-5-sonnet-20241022"
```
### Available Models
| Model | Context Window | Strengths | Best For |
|-------|---------------|-----------|----------|
| `claude-sonnet-4-20250514` | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| `claude-opus-4-20250514` | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| `claude-3-5-sonnet-20241022` | 200K tokens | Fast, reliable | Standard workflows |
| `claude-3-opus-20240229` | 200K tokens | Strong reasoning | Legacy support |
### Advanced Configuration
```python
from biomni.config import default_config
# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800
# Optional: Custom API endpoint (for proxy/enterprise)
default_config.api_base = "https://your-proxy.com/v1"
```
### Cost Estimation
Approximate costs per 1M tokens (as of January 2025):
- Input: $3-15 depending on model
- Output: $15-75 depending on model
For a typical biomedical analysis (~50K tokens total): $0.50-$2.00
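These figures make a quick budget check easy to script. This is a sketch; the 40K input / 10K output split is an assumed illustration of a ~50K-token analysis:

```python
def estimate_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Estimate cost in dollars given per-million-token rates."""
    return round(input_tokens * in_rate_per_m / 1e6
                 + output_tokens * out_rate_per_m / 1e6, 2)

# ~50K tokens at the top-end rates above ($15 in / $75 out) ≈ $1.35
cost = estimate_cost(40_000, 10_000, 15.0, 75.0)
```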
## OpenAI
### Overview
OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.
### Setup
1. **Obtain API Key:**
- Sign up at https://platform.openai.com/
- Navigate to API Keys
- Create new secret key
2. **Configure Environment:**
```bash
export OPENAI_API_KEY="sk-proj-..."
```
Or in `.env`:
```
OPENAI_API_KEY=sk-proj-...
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "gpt-4o" # Recommended
# default_config.llm = "gpt-4" # Previous flagship
# default_config.llm = "gpt-4-turbo" # Fast variant
# default_config.llm = "gpt-3.5-turbo" # Budget option
```
### Available Models
| Model | Context Window | Strengths | Cost |
|-------|---------------|-----------|------|
| `gpt-4o` | 128K tokens | Fast, multimodal | Medium |
| `gpt-4-turbo` | 128K tokens | Fast inference | Medium |
| `gpt-4` | 8K tokens | Reliable | High |
| `gpt-3.5-turbo` | 16K tokens | Fast, cheap | Low |
### Cost Optimization
```python
# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"
# For production analysis (quality-focused)
default_config.llm = "gpt-4o"
```
## Azure OpenAI
### Overview
Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.
### Setup
1. **Azure Prerequisites:**
- Active Azure subscription
- Azure OpenAI resource created
- Model deployment configured
2. **Environment Variables:**
```bash
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
```
3. **Configuration:**
```python
from biomni.config import default_config
# Option 1: Use deployment name
default_config.llm = "azure/your-deployment-name"
# Option 2: Specify endpoint explicitly
default_config.llm = "azure/gpt-4"
default_config.api_base = "https://your-resource.openai.azure.com/"
```
### Deployment Setup
Azure OpenAI requires explicit model deployments:
1. Navigate to Azure OpenAI Studio
2. Create deployment for desired model (e.g., GPT-4)
3. Note the deployment name
4. Use deployment name in Biomni configuration
### Example Configuration
```python
from biomni.config import default_config
import os
# Set Azure credentials
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'
# Configure Biomni to use Azure deployment
default_config.llm = "azure/gpt-4-biomni" # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
```
## Google Gemini
### Overview
Google's Gemini models offer multimodal capabilities and competitive performance.
### Setup
1. **Obtain API Key:**
- Visit https://makersuite.google.com/app/apikey
- Create new API key
2. **Environment Configuration:**
```bash
export GEMINI_API_KEY="your-key"
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "gemini/gemini-1.5-pro"
# Or: default_config.llm = "gemini/gemini-pro"
```
### Available Models
| Model | Context Window | Strengths |
|-------|---------------|-----------|
| `gemini/gemini-1.5-pro` | 1M tokens | Very large context, multimodal |
| `gemini/gemini-pro` | 32K tokens | Balanced performance |
### Use Cases
Gemini excels at:
- Tasks requiring very large context windows
- Multimodal analysis (when incorporating images)
- Cost-effective alternative to GPT-4
```python
# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400 # May need longer timeout
```
## Groq
### Overview
Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.
### Setup
1. **Get API Key:**
- Sign up at https://console.groq.com/
- Generate API key
2. **Configure:**
```bash
export GROQ_API_KEY="gsk_..."
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "groq/llama-3.1-70b-versatile"
# Or: default_config.llm = "groq/mixtral-8x7b-32768"
```
### Available Models
| Model | Context Window | Speed | Quality |
|-------|---------------|-------|---------|
| `groq/llama-3.1-70b-versatile` | 128K tokens | Very Fast | Good |
| `groq/mixtral-8x7b-32768` | 32K tokens | Very Fast | Good |
| `groq/llama3-70b-8192` | 8K tokens | Ultra Fast | Moderate |
### Best Practices
```python
# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600 # Groq is fast
# Note: Quality may be lower than GPT-4/Claude for complex tasks
# Recommended for: QC, simple analyses, testing workflows
```
## Ollama (Local Deployment)
### Overview
Run LLMs entirely locally for offline use, data privacy, or cost savings.
### Setup
1. **Install Ollama:**
```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com/download
```
2. **Pull Models:**
```bash
ollama pull llama3 # Meta Llama 3 (8B)
ollama pull mixtral # Mixtral (47B)
ollama pull codellama # Code-specialized
ollama pull medllama2     # Medical domain (if available)
```
3. **Start Ollama Server:**
```bash
ollama serve # Runs on http://localhost:11434
```
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "ollama/llama3"
default_config.api_base = "http://localhost:11434"
```
### Hardware Requirements
Minimum recommendations:
- **8B models:** 16GB RAM, CPU inference acceptable
- **70B models:** 64GB RAM, GPU highly recommended
- **Storage:** 5-50GB per model
### Model Selection
```python
# Fast, local, good for testing
default_config.llm = "ollama/llama3"
# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"
# Code generation tasks
default_config.llm = "ollama/codellama"
```
### Advantages & Limitations
**Advantages:**
- Complete data privacy
- No API costs
- Offline operation
- Unlimited usage
**Limitations:**
- Lower quality than GPT-4/Claude for complex tasks
- Requires significant hardware
- Slower inference (especially on CPU)
- May struggle with specialized biomedical knowledge
## AWS Bedrock
### Overview
AWS-managed LLM service offering multiple model providers.
### Setup
1. **AWS Prerequisites:**
- AWS account with Bedrock access
- Model access enabled in Bedrock console
- AWS credentials configured
2. **Configure AWS Credentials:**
```bash
# Option 1: AWS CLI
aws configure
# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"
```
3. **Enable Model Access:**
- Navigate to AWS Bedrock console
- Request access to desired models
- Wait for approval (may take hours/days)
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
# Or: default_config.llm = "bedrock/anthropic.claude-v2"
```
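Once access requests are approved, it can help to confirm which models the current credentials can actually see before pointing Biomni at one. A hedged sketch using boto3's `list_foundation_models` (assumes boto3 is installed; it returns an empty list when boto3, credentials, or network access are unavailable):

```python
def list_accessible_bedrock_models(region: str = "us-east-1") -> list:
    """Return Bedrock foundation-model IDs visible to the current AWS
    credentials, or an empty list if boto3/credentials/network are missing."""
    try:
        import boto3  # only needed for Bedrock; not a Biomni dependency

        client = boto3.client("bedrock", region_name=region)
        resp = client.list_foundation_models()
        return [m["modelId"] for m in resp.get("modelSummaries", [])]
    except Exception:
        # ImportError, missing credentials, AccessDenied, network errors, ...
        return []
```

If the model you requested does not appear in the returned list, the approval has not propagated yet or the IAM policy below is missing.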
### Available Models
Bedrock provides access to:
- Anthropic Claude models
- Amazon Titan models
- AI21 Jurassic models
- Cohere Command models
- Meta Llama models
### IAM Permissions
Required IAM policy:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "arn:aws:bedrock:*::foundation-model/*"
}
]
}
```
### Example Configuration
```python
from biomni.config import default_config
import boto3
# Verify AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
print(f"AWS Access Key: {credentials.access_key[:8]}...")
# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
default_config.timeout_seconds = 1800
```
## Biomni-R0 (Local Specialized Model)
### Overview
Biomni-R0 is a 32B-parameter reasoning model specifically trained for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.
### Setup
1. **Hardware Requirements:**
- GPU with 48GB+ VRAM (e.g., A100, H100)
- Or multi-GPU setup (2x 24GB)
- 100GB+ storage for model weights
2. **Install Dependencies:**
```bash
pip install "sglang[all]"
pip install flashinfer # Optional but recommended
```
3. **Deploy Model:**
```bash
python -m sglang.launch_server \
--model-path snap-stanford/biomni-r0 \
--host 0.0.0.0 \
--port 30000 \
--trust-remote-code \
--mem-fraction-static 0.8
```
For multi-GPU:
```bash
python -m sglang.launch_server \
--model-path snap-stanford/biomni-r0 \
--host 0.0.0.0 \
--port 30000 \
--trust-remote-code \
--tp 2 # Tensor parallelism across 2 GPUs
```
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
default_config.timeout_seconds = 2400 # Longer for complex reasoning
```
### When to Use Biomni-R0
Biomni-R0 excels at:
- Multi-step biological reasoning
- Complex experimental design
- Hypothesis generation and evaluation
- Literature-informed analysis
- Tasks requiring deep biological knowledge
```python
# For complex biological reasoning tasks
default_config.llm = "openai/biomni-r0"
agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")
```
### Performance Comparison
| Model | Speed | Biological Reasoning | Code Quality | Cost |
|-------|-------|---------------------|--------------|------|
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |
## Multi-Provider Strategy
### Intelligent Model Selection
Use different models for different task types:
```python
from biomni.agent import A1
from biomni.config import default_config
# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
if task_complexity == "simple":
default_config.llm = "gpt-3.5-turbo"
default_config.timeout_seconds = 300
elif task_complexity == "medium":
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1200
else: # complex
default_config.llm = "openai/biomni-r0"
default_config.timeout_seconds = 2400
return A1(path='./data')
# Strategy 2: Fallback on failure
def execute_with_fallback(task):
models = [
"claude-sonnet-4-20250514",
"gpt-4o",
"claude-opus-4-20250514"
]
for model in models:
try:
default_config.llm = model
agent = A1(path='./data')
agent.go(task)
return
except Exception as e:
print(f"Failed with {model}: {e}, trying next...")
    raise RuntimeError("All models failed")
```
### Cost Optimization Strategy
```python
# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent.go("Quick exploratory analysis of dataset structure")
# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent.go("Comprehensive differential expression analysis with pathway enrichment")
# Phase 3: Complex reasoning with specialized models
default_config.llm = "openai/biomni-r0"
agent.go("Generate biological hypotheses based on multi-omics integration")
```
## Troubleshooting
### Common Issues
**Issue: "API key not found"**
- Verify environment variable is set: `echo $ANTHROPIC_API_KEY`
- Check `.env` file exists and is in correct location
- Try setting key programmatically: `os.environ['ANTHROPIC_API_KEY'] = 'key'`
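When a `.env` file is present but not being picked up, a minimal loader can make the problem visible; this is a sketch of what packages like python-dotenv do, using only the standard library (the key names in the usage note are illustrative):

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict:
    """Minimal .env loader: reads KEY=VALUE lines, ignores blanks and
    '#' comments, strips surrounding quotes, and updates os.environ."""
    loaded = {}
    env = Path(path)
    if not env.exists():
        return loaded
    for line in env.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

Calling `load_env_file()` before initializing the agent and printing the returned dict shows exactly which keys (e.g. `ANTHROPIC_API_KEY`) were found.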
**Issue: "Rate limit exceeded"**
- Implement exponential backoff and retry
- Upgrade API tier if available
- Switch to alternative provider temporarily
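The backoff-and-retry advice can be sketched as a small generic wrapper (this is a common pattern, not part of the Biomni API):

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            # delay doubles each attempt, jitter avoids synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Usage would look like `with_backoff(lambda: agent.go(task))`; in production you would narrow the `except` clause to the provider's rate-limit exception.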
**Issue: "Model not found"**
- Verify model identifier is correct
- Check API key has access to requested model
- For Azure: ensure deployment exists with exact name
**Issue: "Timeout errors"**
- Increase `default_config.timeout_seconds`
- Break complex tasks into smaller steps
- Consider using faster model for initial phases
**Issue: "Connection refused (Ollama/Biomni-R0)"**
- Verify local server is running
- Check port is not blocked by firewall
- Confirm `api_base` URL is correct
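The three local-server checks above can be automated with a quick reachability probe, sketched here with only the standard library (the default URL matches the Ollama setup earlier in this document):

```python
import urllib.error
import urllib.request

def server_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 3.0) -> bool:
    """Return True if a local LLM server (Ollama, sglang, ...) answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return True   # server answered, even if with an error status code
    except (urllib.error.URLError, OSError, ValueError):
        return False  # connection refused, timeout, DNS failure, bad URL
```

If this returns `False` for your configured `api_base`, fix the server or firewall before debugging Biomni itself.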
### Testing Configuration
```python
from biomni.utils import list_available_models, validate_environment
# Check environment setup
status = validate_environment()
print("Environment Status:", status)
# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)
# Test specific model
try:
from biomni.agent import A1
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
agent.go("Print 'Configuration successful!'")
except Exception as e:
print(f"Configuration test failed: {e}")
```
## Best Practices Summary
1. **For most users:** Start with Claude Sonnet 4 or GPT-4o
2. **For cost sensitivity:** Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production
3. **For privacy/offline:** Deploy Ollama locally
4. **For complex reasoning:** Use Biomni-R0 if hardware available
5. **For enterprise:** Consider Azure OpenAI or AWS Bedrock
6. **For speed:** Use Groq for rapid iteration
7. **Always:**
- Set appropriate timeouts
- Implement error handling and retries
- Log model and configuration for reproducibility
- Test configuration before production use
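The reproducibility point can be as simple as writing the active configuration to a log before each run; a sketch (the record fields are illustrative, not a Biomni API):

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)

def log_run_config(llm: str, timeout_seconds: int, path: str = "./data") -> str:
    """Serialize the run configuration as one JSON line so a result can be
    traced back to the exact model and settings that produced it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "llm": llm,
        "timeout_seconds": timeout_seconds,
        "data_path": path,
    }
    line = json.dumps(record, sort_keys=True)
    logging.info("biomni run config: %s", line)
    return line
```

Calling `log_run_config(default_config.llm, default_config.timeout_seconds)` just before `agent.go(...)` leaves an auditable trail in the log.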



@@ -0,0 +1,381 @@
#!/usr/bin/env python3
"""
Enhanced PDF Report Generation for Biomni
This script provides advanced PDF report generation with custom formatting,
styling, and metadata for Biomni analysis results.
"""
import argparse
import sys
from pathlib import Path
from datetime import datetime
from typing import Optional, Dict, Any
def generate_markdown_report(
title: str,
sections: list,
metadata: Optional[Dict[str, Any]] = None,
output_path: str = "report.md"
) -> str:
"""
Generate a formatted markdown report.
Args:
title: Report title
sections: List of dicts with 'heading' and 'content' keys
metadata: Optional metadata dict (author, date, etc.)
output_path: Path to save markdown file
Returns:
Path to generated markdown file
"""
md_content = []
# Title
md_content.append(f"# {title}\n")
# Metadata
if metadata:
md_content.append("---\n")
for key, value in metadata.items():
md_content.append(f"**{key}:** {value} \n")
md_content.append("---\n\n")
# Sections
for section in sections:
heading = section.get('heading', 'Section')
content = section.get('content', '')
level = section.get('level', 2) # Default to h2
md_content.append(f"{'#' * level} {heading}\n\n")
md_content.append(f"{content}\n\n")
# Write to file
output = Path(output_path)
output.write_text('\n'.join(md_content))
return str(output)
def convert_to_pdf_weasyprint(
markdown_path: str,
output_path: str,
css_style: Optional[str] = None
) -> bool:
"""
Convert markdown to PDF using WeasyPrint.
Args:
markdown_path: Path to markdown file
output_path: Path for output PDF
css_style: Optional CSS stylesheet path
Returns:
True if successful, False otherwise
"""
try:
import markdown
from weasyprint import HTML, CSS
# Read markdown
with open(markdown_path, 'r') as f:
md_content = f.read()
# Convert to HTML
html_content = markdown.markdown(
md_content,
extensions=['tables', 'fenced_code', 'codehilite']
)
# Wrap in HTML template
html_template = f"""
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Biomni Report</title>
<style>
body {{
font-family: 'Helvetica', 'Arial', sans-serif;
line-height: 1.6;
color: #333;
max-width: 800px;
margin: 40px auto;
padding: 20px;
}}
h1 {{
color: #2c3e50;
border-bottom: 3px solid #3498db;
padding-bottom: 10px;
}}
h2 {{
color: #34495e;
margin-top: 30px;
border-bottom: 1px solid #bdc3c7;
padding-bottom: 5px;
}}
h3 {{
color: #7f8c8d;
}}
code {{
background-color: #f4f4f4;
padding: 2px 6px;
border-radius: 3px;
font-family: 'Courier New', monospace;
}}
pre {{
background-color: #f4f4f4;
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}}
table {{
border-collapse: collapse;
width: 100%;
margin: 20px 0;
}}
th, td {{
border: 1px solid #ddd;
padding: 12px;
text-align: left;
}}
th {{
background-color: #3498db;
color: white;
}}
tr:nth-child(even) {{
background-color: #f9f9f9;
}}
.metadata {{
background-color: #ecf0f1;
padding: 15px;
border-radius: 5px;
margin: 20px 0;
}}
</style>
</head>
<body>
{html_content}
</body>
</html>
"""
# Generate PDF
pdf = HTML(string=html_template)
# Add custom CSS if provided
stylesheets = []
if css_style and Path(css_style).exists():
stylesheets.append(CSS(filename=css_style))
pdf.write_pdf(output_path, stylesheets=stylesheets)
return True
    except ImportError as e:
        print(f"Error: missing dependency '{e.name}'. Install with: pip install markdown weasyprint")
return False
except Exception as e:
print(f"Error generating PDF: {e}")
return False
def convert_to_pdf_pandoc(markdown_path: str, output_path: str) -> bool:
"""
Convert markdown to PDF using Pandoc.
Args:
markdown_path: Path to markdown file
output_path: Path for output PDF
Returns:
True if successful, False otherwise
"""
try:
import subprocess
# Check if pandoc is installed
result = subprocess.run(
['pandoc', '--version'],
capture_output=True,
text=True
)
if result.returncode != 0:
print("Error: Pandoc not installed")
return False
# Convert with pandoc
result = subprocess.run(
[
'pandoc',
markdown_path,
'-o', output_path,
'--pdf-engine=pdflatex',
'-V', 'geometry:margin=1in',
'--toc'
],
capture_output=True,
text=True
)
if result.returncode != 0:
print(f"Pandoc error: {result.stderr}")
return False
return True
except FileNotFoundError:
print("Error: Pandoc not found. Install from https://pandoc.org/")
return False
except Exception as e:
print(f"Error: {e}")
return False
def create_biomni_report(
conversation_history: list,
output_path: str = "biomni_report.pdf",
method: str = "weasyprint"
) -> bool:
"""
Create a formatted PDF report from Biomni conversation history.
Args:
conversation_history: List of conversation turns
output_path: Output PDF path
method: Conversion method ('weasyprint' or 'pandoc')
Returns:
True if successful
"""
# Prepare report sections
metadata = {
'Date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'Tool': 'Biomni AI Agent',
'Report Type': 'Analysis Summary'
}
sections = []
# Executive Summary
sections.append({
'heading': 'Executive Summary',
'level': 2,
'content': 'This report contains the complete analysis workflow executed by the Biomni biomedical AI agent.'
})
# Conversation history
for i, turn in enumerate(conversation_history, 1):
sections.append({
'heading': f'Task {i}: {turn.get("task", "Analysis")}',
'level': 2,
'content': f'**Input:**\n```\n{turn.get("input", "")}\n```\n\n**Output:**\n{turn.get("output", "")}'
})
# Generate markdown
    md_path = str(Path(output_path).with_suffix('.md'))
generate_markdown_report(
title="Biomni Analysis Report",
sections=sections,
metadata=metadata,
output_path=md_path
)
# Convert to PDF
if method == 'weasyprint':
success = convert_to_pdf_weasyprint(md_path, output_path)
elif method == 'pandoc':
success = convert_to_pdf_pandoc(md_path, output_path)
else:
print(f"Unknown method: {method}")
return False
if success:
print(f"✓ Report generated: {output_path}")
print(f" Markdown: {md_path}")
else:
print("✗ Failed to generate PDF")
print(f" Markdown available: {md_path}")
return success
def main():
"""CLI for report generation."""
parser = argparse.ArgumentParser(
description='Generate formatted PDF reports for Biomni analyses'
)
parser.add_argument(
'input',
type=str,
help='Input markdown file or conversation history'
)
parser.add_argument(
'-o', '--output',
type=str,
default='biomni_report.pdf',
help='Output PDF path (default: biomni_report.pdf)'
)
parser.add_argument(
'-m', '--method',
type=str,
choices=['weasyprint', 'pandoc'],
default='weasyprint',
help='Conversion method (default: weasyprint)'
)
parser.add_argument(
'--css',
type=str,
help='Custom CSS stylesheet path'
)
args = parser.parse_args()
# Check if input is markdown or conversation history
input_path = Path(args.input)
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
return 1
# If input is markdown, convert directly
if input_path.suffix == '.md':
if args.method == 'weasyprint':
success = convert_to_pdf_weasyprint(
str(input_path),
args.output,
args.css
)
else:
success = convert_to_pdf_pandoc(str(input_path), args.output)
return 0 if success else 1
# Otherwise, assume it's conversation history (JSON)
try:
import json
with open(input_path) as f:
history = json.load(f)
success = create_biomni_report(
history,
args.output,
args.method
)
return 0 if success else 1
except json.JSONDecodeError:
print("Error: Input file is not valid JSON or markdown")
return 1
if __name__ == "__main__":
sys.exit(main())


@@ -0,0 +1,230 @@
#!/usr/bin/env python3
"""
Biomni Environment Setup and Validation Script
This script helps users set up and validate their Biomni environment,
including checking dependencies, API keys, and data availability.
"""
import os
import sys
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple
def check_python_version() -> Tuple[bool, str]:
"""Check if Python version is compatible."""
version = sys.version_info
if version.major == 3 and version.minor >= 8:
return True, f"Python {version.major}.{version.minor}.{version.micro}"
else:
return False, f"Python {version.major}.{version.minor} - requires Python 3.8+"
def check_conda_env() -> Tuple[bool, str]:
"""Check if running in biomni conda environment."""
conda_env = os.environ.get('CONDA_DEFAULT_ENV', None)
if conda_env == 'biomni_e1':
return True, f"Conda environment: {conda_env}"
else:
return False, f"Not in biomni_e1 environment (current: {conda_env})"
def check_package_installed(package: str) -> bool:
"""Check if a Python package is installed."""
try:
__import__(package)
return True
except ImportError:
return False
def check_dependencies() -> Tuple[bool, List[str]]:
"""Check for required and optional dependencies."""
required = ['biomni']
    optional = ['weasyprint', 'markdown']  # needed for PDF report generation
missing_required = [pkg for pkg in required if not check_package_installed(pkg)]
missing_optional = [pkg for pkg in optional if not check_package_installed(pkg)]
messages = []
success = len(missing_required) == 0
if missing_required:
messages.append(f"Missing required packages: {', '.join(missing_required)}")
messages.append("Install with: pip install biomni --upgrade")
else:
messages.append("Required packages: ✓")
if missing_optional:
messages.append(f"Missing optional packages: {', '.join(missing_optional)}")
messages.append("For PDF reports, install: pip install weasyprint")
return success, messages
def check_api_keys() -> Tuple[bool, Dict[str, bool]]:
"""Check which API keys are configured."""
api_keys = {
'ANTHROPIC_API_KEY': os.environ.get('ANTHROPIC_API_KEY'),
'OPENAI_API_KEY': os.environ.get('OPENAI_API_KEY'),
'GEMINI_API_KEY': os.environ.get('GEMINI_API_KEY'),
'GROQ_API_KEY': os.environ.get('GROQ_API_KEY'),
}
configured = {key: bool(value) for key, value in api_keys.items()}
has_any = any(configured.values())
return has_any, configured
def check_data_directory(data_path: str = './data') -> Tuple[bool, str]:
"""Check if Biomni data directory exists and has content."""
path = Path(data_path)
if not path.exists():
return False, f"Data directory not found at {data_path}"
# Check if directory has files (data has been downloaded)
files = list(path.glob('*'))
if len(files) == 0:
        return False, "Data directory exists but is empty; run the agent once to download data."
# Rough size check (should be ~11GB)
total_size = sum(f.stat().st_size for f in path.rglob('*') if f.is_file())
size_gb = total_size / (1024**3)
if size_gb < 1:
return False, f"Data directory exists but seems incomplete ({size_gb:.1f} GB)"
return True, f"Data directory: {data_path} ({size_gb:.1f} GB) ✓"
def check_disk_space(required_gb: float = 20) -> Tuple[bool, str]:
"""Check if sufficient disk space is available."""
try:
import shutil
stat = shutil.disk_usage('.')
free_gb = stat.free / (1024**3)
if free_gb >= required_gb:
return True, f"Disk space: {free_gb:.1f} GB available ✓"
else:
return False, f"Low disk space: {free_gb:.1f} GB (need {required_gb} GB)"
except Exception as e:
return False, f"Could not check disk space: {e}"
def test_biomni_import() -> Tuple[bool, str]:
"""Test if Biomni can be imported and initialized."""
try:
from biomni.agent import A1
from biomni.config import default_config
return True, "Biomni import successful ✓"
except ImportError as e:
return False, f"Cannot import Biomni: {e}"
except Exception as e:
return False, f"Biomni import error: {e}"
def suggest_fixes(results: Dict[str, Tuple[bool, any]]) -> List[str]:
"""Generate suggestions for fixing issues."""
suggestions = []
if not results['python'][0]:
suggestions.append("➜ Upgrade Python to 3.8 or higher")
if not results['conda'][0]:
suggestions.append("➜ Activate biomni environment: conda activate biomni_e1")
if not results['dependencies'][0]:
suggestions.append("➜ Install Biomni: pip install biomni --upgrade")
if not results['api_keys'][0]:
suggestions.append("➜ Set API key: export ANTHROPIC_API_KEY='your-key'")
suggestions.append(" Or create .env file with API keys")
if not results['data'][0]:
suggestions.append("➜ Data will auto-download on first agent.go() call")
if not results['disk_space'][0]:
suggestions.append("➜ Free up disk space (need ~20GB total)")
return suggestions
def main():
"""Run all environment checks and display results."""
print("=" * 60)
print("Biomni Environment Validation")
print("=" * 60)
print()
# Run all checks
results = {}
print("Checking Python version...")
results['python'] = check_python_version()
print(f" {results['python'][1]}")
print()
print("Checking conda environment...")
results['conda'] = check_conda_env()
print(f" {results['conda'][1]}")
print()
print("Checking dependencies...")
results['dependencies'] = check_dependencies()
for msg in results['dependencies'][1]:
print(f" {msg}")
print()
print("Checking API keys...")
results['api_keys'] = check_api_keys()
has_keys, key_status = results['api_keys']
for key, configured in key_status.items():
        status = "✓" if configured else "✗"
print(f" {key}: {status}")
print()
print("Checking Biomni data directory...")
results['data'] = check_data_directory()
print(f" {results['data'][1]}")
print()
print("Checking disk space...")
results['disk_space'] = check_disk_space()
print(f" {results['disk_space'][1]}")
print()
print("Testing Biomni import...")
results['biomni_import'] = test_biomni_import()
print(f" {results['biomni_import'][1]}")
print()
# Summary
print("=" * 60)
all_passed = all(result[0] for result in results.values())
if all_passed:
print("✓ All checks passed! Environment is ready.")
print()
print("Quick start:")
print(" from biomni.agent import A1")
print(" agent = A1(path='./data', llm='claude-sonnet-4-20250514')")
print(" agent.go('Your biomedical task')")
else:
print("⚠ Some checks failed. See suggestions below:")
print()
suggestions = suggest_fixes(results)
for suggestion in suggestions:
print(suggestion)
print("=" * 60)
return 0 if all_passed else 1
if __name__ == "__main__":
sys.exit(main())