mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00

Add more scientific skills

scientific-packages/biomni/SKILL.md (new file, 375 lines)
---
name: biomni
description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning.
---

# Biomni

## Overview

Biomni is a general-purpose biomedical AI agent that autonomously executes research tasks across diverse biomedical subfields. It combines large language model reasoning with retrieval-augmented planning and code-based execution to enhance scientific productivity and hypothesis generation. The system operates with an ~11GB biomedical knowledge base covering molecular, genomic, and clinical domains.

## Quick Start

Initialize and use the Biomni agent with these basic steps:

```python
from biomni.agent import A1

# Initialize agent with data path and LLM model
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute a biomedical research task
agent.go("Your biomedical task description")
```

The agent will autonomously decompose the task, retrieve relevant biomedical knowledge, generate and execute code, and provide results.

## Installation and Setup

### Environment Preparation

1. **Set up the conda environment:**
   - Follow instructions in `biomni_env/README.md` from the repository
   - Activate the environment: `conda activate biomni_e1`

2. **Install the package:**
   ```bash
   pip install biomni --upgrade
   ```

   Or install from source:
   ```bash
   git clone https://github.com/snap-stanford/biomni.git
   cd biomni
   pip install -e .
   ```

3. **Configure API keys:**

   Set up credentials via environment variables or a `.env` file:
   ```bash
   export ANTHROPIC_API_KEY="your-key-here"
   export OPENAI_API_KEY="your-key-here"  # Optional
   ```

4. **Data initialization:**

   On first use, the agent will automatically download the ~11GB biomedical knowledge base.

### LLM Provider Configuration

Biomni supports multiple LLM providers. Configure the default provider using:

```python
from biomni.config import default_config

# Set the default LLM model
default_config.llm = "claude-sonnet-4-20250514"  # Anthropic
# default_config.llm = "gpt-4"                   # OpenAI
# default_config.llm = "azure/gpt-4"             # Azure OpenAI
# default_config.llm = "gemini/gemini-pro"       # Google Gemini

# Set timeout (optional)
default_config.timeout_seconds = 1200

# Set data path (optional)
default_config.data_path = "./custom/data/path"
```

Refer to `references/llm_providers.md` for detailed configuration options for each provider.

## Core Biomedical Research Tasks

### 1. CRISPR Screening and Design

Execute CRISPR screening tasks including guide RNA design, off-target analysis, and screening experiment planning:

```python
agent.go("Design a CRISPR screening experiment to identify genes involved in cancer cell resistance to drug X")
```

The agent will:
- Retrieve relevant gene databases
- Design guide RNAs with specificity analysis
- Plan experimental controls and readout strategies
- Generate analysis code for screening results

### 2. Single-Cell RNA-seq Analysis

Perform comprehensive scRNA-seq analysis workflows:

```python
agent.go("Analyze this 10X Genomics scRNA-seq dataset, identify cell types, and find differentially expressed genes between clusters")
```

Capabilities include:
- Quality control and preprocessing
- Dimensionality reduction and clustering
- Cell type annotation using marker databases
- Differential expression analysis
- Pathway enrichment analysis

### 3. Molecular Property Prediction (ADMET)

Predict absorption, distribution, metabolism, excretion, and toxicity properties:

```python
agent.go("Predict ADMET properties for these drug candidates: [SMILES strings]")
```

The agent handles:
- Molecular descriptor calculation
- Property prediction using integrated models
- Toxicity screening
- Drug-likeness assessment

### 4. Genomic Analysis

Execute genomic data analysis tasks:

```python
agent.go("Perform GWAS analysis to identify SNPs associated with disease phenotype in this cohort")
```

Supports:
- Genome-wide association studies (GWAS)
- Variant calling and annotation
- Population genetics analysis
- Functional genomics integration

### 5. Protein Structure and Function

Analyze protein sequences and structures:

```python
agent.go("Predict the structure of this protein sequence and identify potential binding sites")
```

Capabilities:
- Sequence analysis and domain identification
- Structure prediction integration
- Binding site prediction
- Protein-protein interaction analysis

### 6. Disease Diagnosis and Classification

Perform disease classification from multi-omics data:

```python
agent.go("Build a classifier to diagnose disease X from patient RNA-seq and clinical data")
```

### 7. Systems Biology and Pathway Analysis

Analyze biological pathways and networks:

```python
agent.go("Identify dysregulated pathways in this differential expression dataset")
```

### 8. Drug Discovery and Repurposing

Support drug discovery workflows:

```python
agent.go("Identify FDA-approved drugs that could be repurposed for treating disease Y based on mechanism of action")
```

## Advanced Features

### Custom Configuration per Agent

Override global configuration for specific agent instances:

```python
agent = A1(
    path='./project_data',
    llm='gpt-4o',
    timeout=1800
)
```

### Conversation History and Reporting

Save execution traces as formatted PDF reports:

```python
# After executing tasks
agent.save_conversation_history(
    output_path='./reports/experiment_log.pdf',
    format='pdf'
)
```

Requires one of: WeasyPrint, markdown2pdf, or Pandoc.

### Model Context Protocol (MCP) Integration

Extend agent capabilities with external tools:

```python
# Add MCP-compatible tools
agent.add_mcp(config_path='./mcp_config.json')
```

MCP enables integration with:
- Laboratory information management systems (LIMS)
- Specialized bioinformatics databases
- Custom analysis pipelines
- External computational resources

### Using Biomni-R0 (Specialized Reasoning Model)

Deploy the 32B parameter Biomni-R0 model for enhanced biological reasoning:

```bash
# Install SGLang
pip install "sglang[all]"

# Deploy Biomni-R0
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --port 30000 \
    --trust-remote-code
```

Then configure the agent:

```python
from biomni.config import default_config

default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
```

Biomni-R0 provides specialized reasoning for:
- Complex multi-step biological workflows
- Hypothesis generation and evaluation
- Experimental design optimization
- Literature-informed analysis

## Best Practices

### Task Specification

Provide clear, specific task descriptions:

✅ **Good:** "Analyze this scRNA-seq dataset (file: data.h5ad) to identify T cell subtypes, then perform differential expression analysis comparing activated vs. resting T cells"

❌ **Vague:** "Analyze my RNA-seq data"

### Data Organization

Structure data directories for efficient retrieval:

```
project/
├── data/        # Biomni knowledge base
├── raw_data/    # Your experimental data
├── results/     # Analysis outputs
└── reports/     # Generated reports
```
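
The layout above can be created in one step with bash brace expansion (the `project/` name is illustrative; the subdirectory names come from the tree above):

```shell
# Create the recommended directory layout in one command
mkdir -p project/{data,raw_data,results,reports}
```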

### Iterative Refinement

Use iterative task execution for complex analyses:

```python
# Step 1: Exploratory analysis
agent.go("Load and perform initial QC on the proteomics dataset")

# Step 2: Based on results, refine analysis
agent.go("Based on the QC results, remove low-quality samples and normalize using method X")

# Step 3: Downstream analysis
agent.go("Perform differential abundance analysis with adjusted parameters")
```

### Security Considerations

**CRITICAL:** Biomni executes LLM-generated code with full system privileges. For production use:

1. **Use sandboxed environments:** Deploy in Docker containers or VMs with restricted permissions
2. **Validate sensitive operations:** Review code before execution for file access, network calls, or credential usage
3. **Limit data access:** Restrict agent access to only necessary data directories
4. **Monitor execution:** Log all executed code for audit trails

Never run Biomni with:
- Unrestricted file system access
- Direct access to sensitive credentials
- Network access to production systems
- Elevated system privileges
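
One way to apply the sandboxing advice is a minimal container image that runs the agent as an unprivileged user with only explicit volume mounts. This is an illustrative sketch, not part of Biomni's documentation; the base image, user name, and mount paths are assumptions:

```dockerfile
# Illustrative sandbox image for running Biomni (versions and paths are examples)
FROM python:3.11-slim

# Run as an unprivileged user so generated code cannot escalate privileges
RUN useradd --create-home biomni
USER biomni
WORKDIR /home/biomni

RUN pip install --user biomni

# Run with dropped capabilities and mount only the directories the agent needs:
#   docker run --rm --cap-drop=ALL \
#     -v "$PWD/data:/home/biomni/data" \
#     -v "$PWD/raw_data:/home/biomni/raw_data:ro" \
#     biomni-sandbox python run_task.py
```

Keep in mind the container still needs outbound network access to reach the configured LLM API endpoint, so restrict networking to that endpoint rather than disabling it entirely.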

### Model Selection Guidelines

Choose models based on task complexity:

- **Claude Sonnet 4:** Recommended for most biomedical tasks, excellent biological reasoning
- **GPT-4/GPT-4o:** Strong general capabilities, good for diverse tasks
- **Biomni-R0:** Specialized for complex biological reasoning, multi-step workflows
- **Smaller models:** Use for simple, well-defined tasks to reduce cost
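
These guidelines can be encoded as a small lookup helper. This is a hypothetical convenience function, not part of the Biomni API; the complexity labels are made up for illustration, while the model identifiers come from this document:

```python
# Hypothetical helper: map a task-complexity tier to a model identifier,
# following the guidelines above.
MODEL_BY_COMPLEXITY = {
    "simple": "gpt-3.5-turbo",                   # small/cheap model for well-defined tasks
    "standard": "claude-sonnet-4-20250514",      # recommended default
    "complex": "claude-opus-4-20250514",         # complex reasoning
    "specialized": "openai/biomni-r0",           # requires local SGLang deployment
}

def pick_model(complexity: str) -> str:
    """Return a model identifier for the given complexity tier."""
    return MODEL_BY_COMPLEXITY[complexity]

print(pick_model("standard"))
```

The returned string can then be assigned to `default_config.llm` or passed as the `llm` argument of `A1`.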

## Evaluation and Benchmarking

The Biomni-Eval1 benchmark contains 433 evaluation instances across 10 biological tasks:

- GWAS analysis
- Disease diagnosis
- Gene detection and classification
- Molecular property prediction
- Pathway analysis
- Protein function prediction
- Drug response prediction
- Variant interpretation
- Cell type annotation
- Biomarker discovery

Use the benchmark to:
- Evaluate custom agent configurations
- Compare LLM providers for specific tasks
- Validate analysis pipelines
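
A typical comparison boils down to per-task accuracy over benchmark instances. The sketch below shows that aggregation with plain Python; the record format (`task`/`correct` fields) is hypothetical, since Biomni-Eval1's actual output schema is not documented here:

```python
from collections import defaultdict

# Hypothetical per-instance results; real Biomni-Eval1 records may differ.
results = [
    {"task": "GWAS analysis", "correct": True},
    {"task": "GWAS analysis", "correct": False},
    {"task": "Cell type annotation", "correct": True},
]

per_task = defaultdict(lambda: [0, 0])  # task -> [num_correct, num_total]
for r in results:
    per_task[r["task"]][0] += int(r["correct"])
    per_task[r["task"]][1] += 1

for task, (hits, total) in sorted(per_task.items()):
    print(f"{task}: {hits}/{total} = {hits / total:.2f}")
```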

## Troubleshooting

### Common Issues

**Issue:** Data download fails or times out
**Solution:** Manually download the knowledge base or increase timeout settings

**Issue:** Package dependency conflicts
**Solution:** Some optional dependencies cannot be installed by default due to conflicts. Install specific packages manually and uncomment the relevant code sections as documented in the repository.

**Issue:** LLM API errors
**Solution:** Verify API key configuration, check rate limits, and ensure sufficient credits

**Issue:** Memory errors with large datasets
**Solution:** Process data in chunks, use data subsampling, or deploy on higher-memory instances

### Getting Help

For detailed troubleshooting:
- Review the Biomni GitHub repository issues
- Check `references/api_reference.md` for detailed API documentation
- Consult `references/task_examples.md` for comprehensive task patterns

## Resources

### references/
Detailed reference documentation for advanced usage:

- **api_reference.md:** Complete API documentation for the A1 agent, configuration objects, and utility functions
- **llm_providers.md:** Comprehensive guide for configuring all supported LLM providers (Anthropic, OpenAI, Azure, Gemini, Groq, Ollama, AWS Bedrock)
- **task_examples.md:** Extensive collection of biomedical task examples with code patterns

### scripts/
Helper scripts for common operations:

- **setup_environment.py:** Automated environment setup and validation
- **generate_report.py:** Enhanced PDF report generation with custom formatting

Load reference documentation as needed:
```python
# Claude can read reference files when needed for detailed information
# Example: "Check references/llm_providers.md for Azure OpenAI configuration"
```

scientific-packages/biomni/references/api_reference.md (new file, 635 lines)

# Biomni API Reference

This document provides comprehensive API documentation for the Biomni biomedical AI agent system.

## Core Classes

### A1 Agent

The primary agent class for executing biomedical research tasks.

#### Initialization

```python
from biomni.agent import A1

agent = A1(
    path='./data',                   # Path to biomedical knowledge base
    llm='claude-sonnet-4-20250514',  # LLM model identifier
    timeout=None,                    # Optional timeout in seconds
    verbose=True                     # Enable detailed logging
)
```

**Parameters:**

- `path` (str, required): Directory path where the biomedical knowledge base is stored or will be downloaded. First-time initialization will download ~11GB of data.
- `llm` (str, optional): LLM model identifier. Defaults to the value in `default_config.llm`. Supports multiple providers (see LLM Providers section).
- `timeout` (int, optional): Maximum execution time in seconds for agent operations. Overrides `default_config.timeout_seconds`.
- `verbose` (bool, optional): Enable verbose logging for debugging. Default: True.

**Returns:** A1 agent instance ready for task execution.

#### Methods

##### `go(task_description: str) -> None`

Execute a biomedical research task autonomously.

```python
agent.go("Analyze this scRNA-seq dataset and identify cell types")
```

**Parameters:**
- `task_description` (str, required): Natural language description of the biomedical task to execute. Be specific about:
  - Data location and format
  - Desired analysis or output
  - Any specific methods or parameters
  - Expected results format

**Behavior:**
1. Decomposes the task into executable steps
2. Retrieves relevant biomedical knowledge from the data lake
3. Generates and executes Python/R code
4. Provides results and visualizations
5. Handles errors and retries with refinement

**Notes:**
- Executes code with system privileges; use in sandboxed environments
- Long-running tasks may require timeout adjustments
- Intermediate results are displayed during execution

##### `save_conversation_history(output_path: str, format: str = 'pdf') -> None`

Export conversation history and execution trace as a formatted report.

```python
agent.save_conversation_history(
    output_path='./reports/analysis_log.pdf',
    format='pdf'
)
```

**Parameters:**
- `output_path` (str, required): File path for the output report
- `format` (str, optional): Output format. Options: 'pdf', 'markdown'. Default: 'pdf'

**Requirements:**
- For PDF: Install one of: WeasyPrint, markdown2pdf, or Pandoc
  ```bash
  pip install weasyprint  # Recommended
  # or
  pip install markdown2pdf
  # or install Pandoc system-wide
  ```

**Report Contents:**
- Task description and parameters
- Retrieved biomedical knowledge
- Generated code with execution traces
- Results, visualizations, and outputs
- Timestamps and execution metadata

##### `add_mcp(config_path: str) -> None`

Add Model Context Protocol (MCP) tools to extend agent capabilities.

```python
agent.add_mcp(config_path='./mcp_tools_config.json')
```

**Parameters:**
- `config_path` (str, required): Path to MCP configuration JSON file

**MCP Configuration Format:**
```json
{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {
        "param1": "string",
        "param2": "integer"
      }
    }
  ]
}
```
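
A configuration in this format can be generated programmatically before calling `add_mcp`. In this sketch the tool name and endpoint are the placeholder values from the schema above, not a real service:

```python
import json

# Write an MCP configuration matching the documented schema; the tool
# name and endpoint are placeholders.
config = {
    "tools": [
        {
            "name": "tool_name",
            "endpoint": "http://localhost:8000/tool",
            "description": "Tool description for LLM",
            "parameters": {"param1": "string", "param2": "integer"},
        }
    ]
}

with open("mcp_tools_config.json", "w") as f:
    json.dump(config, f, indent=2)
```

The resulting file path can then be passed to `agent.add_mcp(config_path='./mcp_tools_config.json')`.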

**Use Cases:**
- Connect to laboratory information systems
- Integrate proprietary databases
- Access specialized computational resources
- Link to institutional data repositories

## Configuration

### default_config

Global configuration object for Biomni settings.

```python
from biomni.config import default_config
```

#### Attributes

##### `llm: str`

Default LLM model identifier for all agent instances.

```python
default_config.llm = "claude-sonnet-4-20250514"
```

**Supported Models:**

**Anthropic:**
- `claude-sonnet-4-20250514` (Recommended)
- `claude-opus-4-20250514`
- `claude-3-5-sonnet-20241022`
- `claude-3-opus-20240229`

**OpenAI:**
- `gpt-4o`
- `gpt-4`
- `gpt-4-turbo`
- `gpt-3.5-turbo`

**Azure OpenAI:**
- `azure/gpt-4`
- `azure/<deployment-name>`

**Google Gemini:**
- `gemini/gemini-pro`
- `gemini/gemini-1.5-pro`

**Groq:**
- `groq/llama-3.1-70b-versatile`
- `groq/mixtral-8x7b-32768`

**Ollama (Local):**
- `ollama/llama3`
- `ollama/mistral`
- `ollama/<model-name>`

**AWS Bedrock:**
- `bedrock/anthropic.claude-v2`
- `bedrock/anthropic.claude-3-sonnet`

**Custom/Biomni-R0:**
- `openai/biomni-r0` (requires local SGLang deployment)

##### `timeout_seconds: int`

Default timeout for agent operations in seconds.

```python
default_config.timeout_seconds = 1200  # 20 minutes
```

**Recommended Values:**
- Simple tasks (QC, basic analysis): 300-600 seconds
- Medium tasks (differential expression, clustering): 600-1200 seconds
- Complex tasks (full pipelines, ML models): 1200-3600 seconds
- Very complex tasks: 3600+ seconds

##### `data_path: str`

Default path to the biomedical knowledge base.

```python
default_config.data_path = "/path/to/biomni/data"
```

**Storage Requirements:**
- Initial download: ~11GB
- Extracted size: ~15GB
- Additional working space: ~5-10GB recommended
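
Adding up the figures above (~11GB archive, ~15GB extracted, ~5-10GB working space), roughly 30GB of free disk is a sensible floor before first initialization. The check below is a rule of thumb derived from this document, not a Biomni API:

```python
import shutil

REQUIRED_GB = 30  # ~11GB download + ~15GB extracted + working space

def enough_space(path: str = ".", required_gb: int = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

if not enough_space():
    print(f"Warning: less than {REQUIRED_GB} GB free; data download may fail.")
```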

##### `api_base: str`

Custom API endpoint for LLM providers (advanced usage).

```python
# For local Biomni-R0 deployment
default_config.api_base = "http://localhost:30000/v1"

# For custom OpenAI-compatible endpoints
default_config.api_base = "https://your-endpoint.com/v1"
```

##### `max_retries: int`

Number of retry attempts for failed operations.

```python
default_config.max_retries = 3
```

#### Methods

##### `reset() -> None`

Reset all configuration values to system defaults.

```python
default_config.reset()
```

## Database Query System

Biomni includes a retrieval-augmented generation (RAG) system for querying the biomedical knowledge base.

### Query Functions

#### `query_genes(query: str, top_k: int = 10) -> List[Dict]`

Query gene information from integrated databases.

```python
from biomni.database import query_genes

results = query_genes(
    query="genes involved in p53 pathway",
    top_k=20
)
```

**Parameters:**
- `query` (str): Natural language or gene identifier query
- `top_k` (int): Number of results to return

**Returns:** List of dictionaries containing:
- `gene_symbol`: Official gene symbol
- `gene_name`: Full gene name
- `description`: Functional description
- `pathways`: Associated biological pathways
- `go_terms`: Gene Ontology annotations
- `diseases`: Associated diseases
- `similarity_score`: Relevance score (0-1)

#### `query_proteins(query: str, top_k: int = 10) -> List[Dict]`

Query protein information from UniProt and other sources.

```python
from biomni.database import query_proteins

results = query_proteins(
    query="kinase proteins in cell cycle",
    top_k=15
)
```

**Returns:** List of dictionaries with protein metadata:
- `uniprot_id`: UniProt accession
- `protein_name`: Protein name
- `function`: Functional annotation
- `domains`: Protein domains
- `subcellular_location`: Cellular localization
- `similarity_score`: Relevance score

#### `query_drugs(query: str, top_k: int = 10) -> List[Dict]`

Query drug and compound information.

```python
from biomni.database import query_drugs

results = query_drugs(
    query="FDA approved cancer drugs targeting EGFR",
    top_k=10
)
```

**Returns:** Drug information including:
- `drug_name`: Common name
- `drugbank_id`: DrugBank identifier
- `indication`: Therapeutic indication
- `mechanism`: Mechanism of action
- `targets`: Molecular targets
- `approval_status`: Regulatory status
- `smiles`: Chemical structure (SMILES notation)

#### `query_diseases(query: str, top_k: int = 10) -> List[Dict]`

Query disease information from clinical databases.

```python
from biomni.database import query_diseases

results = query_diseases(
    query="autoimmune diseases affecting joints",
    top_k=10
)
```

**Returns:** Disease data:
- `disease_name`: Standard disease name
- `disease_id`: Ontology identifier
- `symptoms`: Clinical manifestations
- `associated_genes`: Genetic associations
- `prevalence`: Epidemiological data

#### `query_pathways(query: str, top_k: int = 10) -> List[Dict]`

Query biological pathways from KEGG, Reactome, and other sources.

```python
from biomni.database import query_pathways

results = query_pathways(
    query="immune response signaling pathways",
    top_k=15
)
```

**Returns:** Pathway information:
- `pathway_name`: Pathway name
- `pathway_id`: Database identifier
- `genes`: Genes in pathway
- `description`: Functional description
- `source`: Database source (KEGG, Reactome, etc.)

## Data Structures

### TaskResult

Result object returned by complex agent operations.

```python
class TaskResult:
    success: bool          # Whether task completed successfully
    output: Any            # Task output (varies by task)
    code: str              # Generated code
    execution_time: float  # Execution time in seconds
    error: Optional[str]   # Error message if failed
    metadata: Dict         # Additional metadata
```
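
A caller can branch on these fields to log outcomes uniformly. The dataclass below is a stand-in mirror of the documented fields (it is not imported from biomni), used only to make the summary helper self-contained:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Stand-in mirror of TaskResult for illustration; not the biomni class itself.
@dataclass
class TaskResult:
    success: bool
    output: Any
    code: str
    execution_time: float
    error: Optional[str] = None
    metadata: Dict = field(default_factory=dict)

def summarize(result: TaskResult) -> str:
    """Return a one-line status string for logging."""
    status = "OK" if result.success else f"FAILED ({result.error})"
    return f"{status} in {result.execution_time:.1f}s"

print(summarize(TaskResult(True, {"n_cells": 5000}, "...", 12.3)))
```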

### BiomedicalEntity

Base class for biomedical entities in the knowledge base.

```python
class BiomedicalEntity:
    entity_id: str         # Unique identifier
    entity_type: str       # Type (gene, protein, drug, etc.)
    name: str              # Entity name
    description: str       # Description
    attributes: Dict       # Additional attributes
    references: List[str]  # Literature references
```

## Utility Functions

### `download_data(path: str, force: bool = False) -> None`

Manually download or update the biomedical knowledge base.

```python
from biomni.utils import download_data

download_data(
    path='./data',
    force=True  # Force re-download
)
```

### `validate_environment() -> Dict[str, bool]`

Check if the environment is properly configured.

```python
from biomni.utils import validate_environment

status = validate_environment()
# Returns: {
#     'conda_env': True,
#     'api_keys': True,
#     'data_available': True,
#     'dependencies': True
# }
```
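
A setup script can gate on this status dict and report exactly which checks failed. The dict below is a stand-in with the keys shown above, so the sketch runs without Biomni installed:

```python
# Stand-in for validate_environment() output, using the documented keys.
status = {
    "conda_env": True,
    "api_keys": True,
    "data_available": False,
    "dependencies": True,
}

# Collect the names of failed checks and refuse to proceed if any exist.
missing = [name for name, ok in status.items() if not ok]
if missing:
    print("Fix before running the agent:", ", ".join(missing))
```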

### `list_available_models() -> List[str]`

Get a list of available LLM models based on configured API keys.

```python
from biomni.utils import list_available_models

models = list_available_models()
# Returns: ['claude-sonnet-4-20250514', 'gpt-4o', ...]
```

## Error Handling

### Common Exceptions

#### `BiomniConfigError`

Raised when configuration is invalid or incomplete.

```python
from biomni.exceptions import BiomniConfigError

try:
    agent = A1(path='./data')
except BiomniConfigError as e:
    print(f"Configuration error: {e}")
```

#### `BiomniExecutionError`

Raised when code generation or execution fails.

```python
from biomni.exceptions import BiomniExecutionError

try:
    agent.go("invalid task")
except BiomniExecutionError as e:
    print(f"Execution failed: {e}")
    # Access failed code: e.code
    # Access error details: e.details
```

#### `BiomniDataError`

Raised when knowledge base or data access fails.

```python
from biomni.exceptions import BiomniDataError

try:
    results = query_genes("unknown query format")
except BiomniDataError as e:
    print(f"Data access error: {e}")
```

#### `BiomniTimeoutError`

Raised when operations exceed the timeout limit.

```python
from biomni.exceptions import BiomniTimeoutError

try:
    agent.go("very complex long-running task")
except BiomniTimeoutError as e:
    print(f"Task timed out after {e.duration} seconds")
    # Partial results may be available: e.partial_results
```

## Best Practices

### Efficient Knowledge Retrieval

Pre-query databases for relevant context before complex tasks:

```python
from biomni.database import query_genes, query_pathways

# Gather relevant biological context first
genes = query_genes("cell cycle genes", top_k=50)
pathways = query_pathways("cell cycle regulation", top_k=20)

# Then execute task with enriched context
agent.go(f"""
Analyze the cell cycle progression in this dataset.
Focus on these genes: {[g['gene_symbol'] for g in genes]}
Consider these pathways: {[p['pathway_name'] for p in pathways]}
""")
```

### Error Recovery

Implement robust error handling for production workflows:

```python
from biomni.exceptions import BiomniExecutionError, BiomniTimeoutError

max_attempts = 3
for attempt in range(max_attempts):
    try:
        agent.go("complex biomedical task")
        break
    except BiomniTimeoutError:
        # Increase timeout and retry
        default_config.timeout_seconds *= 2
        print(f"Timeout, retrying with {default_config.timeout_seconds}s timeout")
    except BiomniExecutionError as e:
        # Refine task based on error
        print(f"Execution failed: {e}, refining task...")
        # Optionally modify task description
else:
    # for-else: runs only if the loop finished without a successful break
    print("Task failed after max attempts")
```

### Memory Management

For large-scale analyses, manage memory explicitly:

```python
import gc

num_chunks = 4  # Example: number of pre-split data chunks

# Process datasets in chunks
for chunk_id in range(num_chunks):
    agent.go(f"Process data chunk {chunk_id} located at data/chunk_{chunk_id}.h5ad")

    # Force garbage collection between chunks
    gc.collect()

    # Save intermediate results
    agent.save_conversation_history(f"./reports/chunk_{chunk_id}.pdf")
```

### Reproducibility

Ensure reproducible analyses by:

1. **Fixing random seeds:**
   ```python
   agent.go("Set random seed to 42 for all analyses, then perform clustering...")
   ```

2. **Logging configuration:**
   ```python
   import json
   from datetime import datetime

   config_log = {
       'llm': default_config.llm,
       'timeout': default_config.timeout_seconds,
       'data_path': default_config.data_path,
       'timestamp': datetime.now().isoformat()
   }
   with open('config_log.json', 'w') as f:
       json.dump(config_log, f, indent=2)
   ```

3. **Saving execution traces:**
   ```python
   # Always save detailed reports
   agent.save_conversation_history('./reports/full_analysis.pdf')
   ```

## Performance Optimization

### Model Selection Strategy

Choose models based on task characteristics:

```python
# For exploratory, simple tasks
default_config.llm = "gpt-3.5-turbo"  # Fast, cost-effective

# For standard biomedical analyses
default_config.llm = "claude-sonnet-4-20250514"  # Recommended

# For complex reasoning and hypothesis generation
default_config.llm = "claude-opus-4-20250514"  # Highest quality

# For specialized biological reasoning
default_config.llm = "openai/biomni-r0"  # Requires local deployment
```

### Timeout Tuning

Set appropriate timeouts based on task complexity:

```python
# Quick queries and simple analyses
agent = A1(path='./data', timeout=300)

# Standard workflows
agent = A1(path='./data', timeout=1200)

# Full pipelines with ML training
agent = A1(path='./data', timeout=3600)
```

### Caching and Reuse

Reuse agent instances for multiple related tasks:

```python
# Create agent once
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute multiple related tasks
tasks = [
    "Load and QC the scRNA-seq dataset",
    "Perform clustering with resolution 0.5",
    "Identify marker genes for each cluster",
    "Annotate cell types based on markers"
]

for task in tasks:
    agent.go(task)

# Save complete workflow
agent.save_conversation_history('./reports/full_workflow.pdf')
```
---

*New file: `scientific-packages/biomni/references/llm_providers.md` (649 lines)*

# LLM Provider Configuration Guide

This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.

## Overview

Biomni supports multiple LLM providers through a unified interface. Configure providers using:

- Environment variables
- `.env` files
- Runtime configuration via `default_config`

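As an illustration of the `.env` option, here is a minimal, stdlib-only loader sketch. Biomni may well use a library such as python-dotenv internally; `load_env_file` is a hypothetical helper for illustration, not part of the Biomni API.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Hypothetical helper: load KEY=VALUE pairs from a .env file.

    Blank lines, '#' comments, and lines without '=' are skipped;
    variables already set in the environment are not overwritten.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


# Demo with a placeholder key (not a real credential)
Path(".env").write_text("# demo\nANTHROPIC_API_KEY=sk-ant-demo\n")
load_env_file()
print(os.environ["ANTHROPIC_API_KEY"])  # sk-ant-demo
```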
## Quick Reference Table

| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
|----------|----------------|------------------|------|------------------|
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |

## Anthropic Claude (Recommended)

### Overview

Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.

### Setup

1. **Obtain an API key:**
   - Sign up at https://console.anthropic.com/
   - Navigate to the API Keys section
   - Generate a new key

2. **Configure the environment:**

**Option A: Environment variable**
```bash
export ANTHROPIC_API_KEY="sk-ant-api03-..."
```

**Option B: .env file**
```bash
# .env file in project root
ANTHROPIC_API_KEY=sk-ant-api03-...
```

3. **Set the model in code:**
```python
from biomni.config import default_config

# Claude Sonnet 4 (recommended)
default_config.llm = "claude-sonnet-4-20250514"

# Claude Opus 4 (most capable)
default_config.llm = "claude-opus-4-20250514"

# Claude 3.5 Sonnet (previous generation)
default_config.llm = "claude-3-5-sonnet-20241022"
```

### Available Models

| Model | Context Window | Strengths | Best For |
|-------|---------------|-----------|----------|
| `claude-sonnet-4-20250514` | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| `claude-opus-4-20250514` | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| `claude-3-5-sonnet-20241022` | 200K tokens | Fast, reliable | Standard workflows |
| `claude-3-opus-20240229` | 200K tokens | Strong reasoning | Legacy support |

### Advanced Configuration

```python
from biomni.config import default_config

# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800

# Optional: custom API endpoint (for proxy/enterprise use)
default_config.api_base = "https://your-proxy.com/v1"
```

### Cost Estimation

Approximate costs per 1M tokens (as of January 2025):
- Input: $3-15 depending on model
- Output: $15-75 depending on model

For a typical biomedical analysis (~50K tokens total): $0.50-$2.00.

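The range above can be reproduced with a small back-of-the-envelope calculation. The per-token rates below are assumptions drawn from the quoted ranges, not official pricing:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Rough cost estimate; rates are $ per 1M tokens (assumed Sonnet-tier)."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# ~50K tokens total: 40K input + 10K output at the low end of the quoted rates
print(round(estimate_cost_usd(40_000, 10_000), 2))  # 0.27

# Same token mix at the high end ($15 input / $75 output)
print(round(estimate_cost_usd(40_000, 10_000, in_rate=15.0, out_rate=75.0), 2))  # 1.35
```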
## OpenAI

### Overview

OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.

### Setup

1. **Obtain an API key:**
   - Sign up at https://platform.openai.com/
   - Navigate to API Keys
   - Create a new secret key

2. **Configure the environment:**

```bash
export OPENAI_API_KEY="sk-proj-..."
```

Or in `.env`:
```
OPENAI_API_KEY=sk-proj-...
```

3. **Set the model:**
```python
from biomni.config import default_config

default_config.llm = "gpt-4o"          # Recommended
# default_config.llm = "gpt-4"         # Previous flagship
# default_config.llm = "gpt-4-turbo"   # Fast variant
# default_config.llm = "gpt-3.5-turbo" # Budget option
```

### Available Models

| Model | Context Window | Strengths | Cost |
|-------|---------------|-----------|------|
| `gpt-4o` | 128K tokens | Fast, multimodal | Medium |
| `gpt-4-turbo` | 128K tokens | Fast inference | Medium |
| `gpt-4` | 8K tokens | Reliable | High |
| `gpt-3.5-turbo` | 16K tokens | Fast, cheap | Low |

### Cost Optimization

```python
# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"

# For production analysis (quality-focused)
default_config.llm = "gpt-4o"
```

## Azure OpenAI

### Overview

Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.

### Setup

1. **Azure prerequisites:**
   - Active Azure subscription
   - Azure OpenAI resource created
   - Model deployment configured

2. **Environment variables:**
```bash
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
```

3. **Configuration:**
```python
from biomni.config import default_config

# Option 1: Use the deployment name
default_config.llm = "azure/your-deployment-name"

# Option 2: Specify the endpoint explicitly
default_config.llm = "azure/gpt-4"
default_config.api_base = "https://your-resource.openai.azure.com/"
```

### Deployment Setup

Azure OpenAI requires explicit model deployments:

1. Navigate to Azure OpenAI Studio
2. Create a deployment for the desired model (e.g., GPT-4)
3. Note the deployment name
4. Use the deployment name in the Biomni configuration

### Example Configuration

```python
import os

from biomni.config import default_config

# Set Azure credentials
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'

# Configure Biomni to use the Azure deployment
default_config.llm = "azure/gpt-4-biomni"  # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
```

## Google Gemini

### Overview

Google's Gemini models offer multimodal capabilities and competitive performance.

### Setup

1. **Obtain an API key:**
   - Visit https://makersuite.google.com/app/apikey
   - Create a new API key

2. **Environment configuration:**
```bash
export GEMINI_API_KEY="your-key"
```

3. **Set the model:**
```python
from biomni.config import default_config

default_config.llm = "gemini/gemini-1.5-pro"
# Or: default_config.llm = "gemini/gemini-pro"
```

### Available Models

| Model | Context Window | Strengths |
|-------|---------------|-----------|
| `gemini/gemini-1.5-pro` | 1M tokens | Very large context, multimodal |
| `gemini/gemini-pro` | 32K tokens | Balanced performance |

### Use Cases

Gemini excels at:
- Tasks requiring very large context windows
- Multimodal analysis (when incorporating images)
- Cost-effective alternatives to GPT-4

```python
# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400  # May need a longer timeout
```

## Groq

### Overview

Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.

### Setup

1. **Get an API key:**
   - Sign up at https://console.groq.com/
   - Generate an API key

2. **Configure:**
```bash
export GROQ_API_KEY="gsk_..."
```

3. **Set the model:**
```python
from biomni.config import default_config

default_config.llm = "groq/llama-3.1-70b-versatile"
# Or: default_config.llm = "groq/mixtral-8x7b-32768"
```

### Available Models

| Model | Context Window | Speed | Quality |
|-------|---------------|-------|---------|
| `groq/llama-3.1-70b-versatile` | 32K tokens | Very fast | Good |
| `groq/mixtral-8x7b-32768` | 32K tokens | Very fast | Good |
| `groq/llama-3-70b-8192` | 8K tokens | Ultra fast | Moderate |

### Best Practices

```python
# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600  # Groq is fast

# Note: quality may be lower than GPT-4/Claude for complex tasks.
# Recommended for: QC, simple analyses, testing workflows.
```

## Ollama (Local Deployment)

### Overview

Run LLMs entirely locally for offline use, data privacy, or cost savings.

### Setup

1. **Install Ollama:**
```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from https://ollama.com/download
```

2. **Pull models:**
```bash
ollama pull llama3     # Meta Llama 3 (8B)
ollama pull mixtral    # Mixtral (47B)
ollama pull codellama  # Code-specialized
ollama pull medllama   # Medical domain (if available)
```

3. **Start the Ollama server:**
```bash
ollama serve  # Runs on http://localhost:11434
```

4. **Configure Biomni:**
```python
from biomni.config import default_config

default_config.llm = "ollama/llama3"
default_config.api_base = "http://localhost:11434"
```

### Hardware Requirements

Minimum recommendations:
- **8B models:** 16GB RAM; CPU inference acceptable
- **70B models:** 64GB RAM; GPU highly recommended
- **Storage:** 5-50GB per model

### Model Selection

```python
# Fast, local, good for testing
default_config.llm = "ollama/llama3"

# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"

# Code generation tasks
default_config.llm = "ollama/codellama"
```

### Advantages and Limitations

**Advantages:**
- Complete data privacy
- No API costs
- Offline operation
- Unlimited usage

**Limitations:**
- Lower quality than GPT-4/Claude for complex tasks
- Requires significant hardware
- Slower inference (especially on CPU)
- May struggle with specialized biomedical knowledge

## AWS Bedrock

### Overview

AWS-managed LLM service offering models from multiple providers.

### Setup

1. **AWS prerequisites:**
   - AWS account with Bedrock access
   - Model access enabled in the Bedrock console
   - AWS credentials configured

2. **Configure AWS credentials:**
```bash
# Option 1: AWS CLI
aws configure

# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"
```

3. **Enable model access:**
   - Navigate to the AWS Bedrock console
   - Request access to the desired models
   - Wait for approval (may take hours to days)

4. **Configure Biomni:**
```python
from biomni.config import default_config

default_config.llm = "bedrock/anthropic.claude-3-sonnet"
# Or: default_config.llm = "bedrock/anthropic.claude-v2"
```

### Available Models

Bedrock provides access to:
- Anthropic Claude models
- Amazon Titan models
- AI21 Jurassic models
- Cohere Command models
- Meta Llama models

### IAM Permissions

Required IAM policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```

### Example Configuration

```python
import boto3

from biomni.config import default_config

# Verify AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
print(f"AWS Access Key: {credentials.access_key[:8]}...")

# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
default_config.timeout_seconds = 1800
```

## Biomni-R0 (Local Specialized Model)

### Overview

Biomni-R0 is a 32B-parameter reasoning model specifically trained for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.

### Setup

1. **Hardware requirements:**
   - GPU with 48GB+ VRAM (e.g., A100, H100), or a multi-GPU setup (2x 24GB)
   - 100GB+ storage for model weights

2. **Install dependencies:**
```bash
pip install "sglang[all]"
pip install flashinfer  # Optional but recommended
```

3. **Deploy the model:**
```bash
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code \
    --mem-fraction-static 0.8
```

For multi-GPU:
```bash
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code \
    --tp 2  # Tensor parallelism across 2 GPUs
```

4. **Configure Biomni:**
```python
from biomni.config import default_config

default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
default_config.timeout_seconds = 2400  # Longer for complex reasoning
```

### When to Use Biomni-R0

Biomni-R0 excels at:
- Multi-step biological reasoning
- Complex experimental design
- Hypothesis generation and evaluation
- Literature-informed analysis
- Tasks requiring deep biological knowledge

```python
# For complex biological reasoning tasks
default_config.llm = "openai/biomni-r0"

agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")
```

### Performance Comparison

| Model | Speed | Biological Reasoning | Code Quality | Cost |
|-------|-------|---------------------|--------------|------|
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |

## Multi-Provider Strategy

### Intelligent Model Selection

Use different models for different task types:

```python
from biomni.agent import A1
from biomni.config import default_config

# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
    if task_complexity == "simple":
        default_config.llm = "gpt-3.5-turbo"
        default_config.timeout_seconds = 300
    elif task_complexity == "medium":
        default_config.llm = "claude-sonnet-4-20250514"
        default_config.timeout_seconds = 1200
    else:  # complex
        default_config.llm = "openai/biomni-r0"
        default_config.timeout_seconds = 2400

    return A1(path='./data')

# Strategy 2: Fall back on failure
def execute_with_fallback(task):
    models = [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "claude-opus-4-20250514"
    ]

    for model in models:
        try:
            default_config.llm = model
            agent = A1(path='./data')
            agent.go(task)
            return
        except Exception as e:
            print(f"Failed with {model}: {e}, trying next...")

    raise RuntimeError("All models failed")
```

### Cost Optimization Strategy

```python
# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent.go("Quick exploratory analysis of dataset structure")

# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent.go("Comprehensive differential expression analysis with pathway enrichment")

# Phase 3: Complex reasoning with specialized models
default_config.llm = "openai/biomni-r0"
agent.go("Generate biological hypotheses based on multi-omics integration")
```

## Troubleshooting

### Common Issues

**Issue: "API key not found"**
- Verify the environment variable is set: `echo $ANTHROPIC_API_KEY`
- Check that the `.env` file exists and is in the correct location
- Try setting the key programmatically: `os.environ['ANTHROPIC_API_KEY'] = 'key'`

**Issue: "Rate limit exceeded"**
- Implement exponential backoff and retry
- Upgrade your API tier if available
- Switch to an alternative provider temporarily

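A minimal sketch of the exponential-backoff suggestion. `RuntimeError` stands in for a provider-specific rate-limit exception (e.g. `anthropic.RateLimitError`); the flaky demo call is purely illustrative:

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=0.01):
    """Retry fn() with exponential backoff plus jitter on transient errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:  # stand-in for a provider rate-limit error
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.01))


# Demo: a flaky call that succeeds on the third attempt
state = {"calls": 0}

def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(call_with_backoff(flaky_call))  # ok
```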
**Issue: "Model not found"**
- Verify the model identifier is correct
- Check that the API key has access to the requested model
- For Azure: ensure a deployment exists with the exact name

**Issue: "Timeout errors"**
- Increase `default_config.timeout_seconds`
- Break complex tasks into smaller steps
- Consider using a faster model for initial phases

**Issue: "Connection refused" (Ollama/Biomni-R0)**
- Verify the local server is running
- Check that the port is not blocked by a firewall
- Confirm the `api_base` URL is correct

### Testing Configuration

```python
from biomni.utils import list_available_models, validate_environment

# Check environment setup
status = validate_environment()
print("Environment Status:", status)

# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)

# Test a specific model
try:
    from biomni.agent import A1
    agent = A1(path='./data', llm='claude-sonnet-4-20250514')
    agent.go("Print 'Configuration successful!'")
except Exception as e:
    print(f"Configuration test failed: {e}")
```

## Best Practices Summary

1. **For most users:** Start with Claude Sonnet 4 or GPT-4o
2. **For cost sensitivity:** Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production
3. **For privacy/offline use:** Deploy Ollama locally
4. **For complex reasoning:** Use Biomni-R0 if hardware is available
5. **For enterprise:** Consider Azure OpenAI or AWS Bedrock
6. **For speed:** Use Groq for rapid iteration
7. **Always:**
   - Set appropriate timeouts
   - Implement error handling and retries
   - Log model and configuration for reproducibility
   - Test configuration before production use

---

*New file: `scientific-packages/biomni/references/task_examples.md` (1,472 lines; diff suppressed due to size)*

---

*New file: `scientific-packages/biomni/scripts/generate_report.py` (381 lines)*

#!/usr/bin/env python3
"""
Enhanced PDF Report Generation for Biomni

This script provides advanced PDF report generation with custom formatting,
styling, and metadata for Biomni analysis results.
"""

import argparse
import sys
from pathlib import Path
from datetime import datetime
from typing import Optional, Dict, Any


def generate_markdown_report(
    title: str,
    sections: list,
    metadata: Optional[Dict[str, Any]] = None,
    output_path: str = "report.md"
) -> str:
    """
    Generate a formatted markdown report.

    Args:
        title: Report title
        sections: List of dicts with 'heading' and 'content' keys
        metadata: Optional metadata dict (author, date, etc.)
        output_path: Path to save the markdown file

    Returns:
        Path to the generated markdown file
    """
    md_content = []

    # Title
    md_content.append(f"# {title}\n")

    # Metadata
    if metadata:
        md_content.append("---\n")
        for key, value in metadata.items():
            md_content.append(f"**{key}:** {value}  \n")
        md_content.append("---\n\n")

    # Sections
    for section in sections:
        heading = section.get('heading', 'Section')
        content = section.get('content', '')
        level = section.get('level', 2)  # Default to h2

        md_content.append(f"{'#' * level} {heading}\n\n")
        md_content.append(f"{content}\n\n")

    # Write to file
    output = Path(output_path)
    output.write_text('\n'.join(md_content))

    return str(output)


def convert_to_pdf_weasyprint(
    markdown_path: str,
    output_path: str,
    css_style: Optional[str] = None
) -> bool:
    """
    Convert markdown to PDF using WeasyPrint.

    Args:
        markdown_path: Path to the markdown file
        output_path: Path for the output PDF
        css_style: Optional CSS stylesheet path

    Returns:
        True if successful, False otherwise
    """
    try:
        import markdown
        from weasyprint import HTML, CSS

        # Read markdown
        with open(markdown_path, 'r') as f:
            md_content = f.read()

        # Convert to HTML
        html_content = markdown.markdown(
            md_content,
            extensions=['tables', 'fenced_code', 'codehilite']
        )

        # Wrap in an HTML template
        html_template = f"""
        <!DOCTYPE html>
        <html>
        <head>
            <meta charset="utf-8">
            <title>Biomni Report</title>
            <style>
                body {{
                    font-family: 'Helvetica', 'Arial', sans-serif;
                    line-height: 1.6;
                    color: #333;
                    max-width: 800px;
                    margin: 40px auto;
                    padding: 20px;
                }}
                h1 {{
                    color: #2c3e50;
                    border-bottom: 3px solid #3498db;
                    padding-bottom: 10px;
                }}
                h2 {{
                    color: #34495e;
                    margin-top: 30px;
                    border-bottom: 1px solid #bdc3c7;
                    padding-bottom: 5px;
                }}
                h3 {{
                    color: #7f8c8d;
                }}
                code {{
                    background-color: #f4f4f4;
                    padding: 2px 6px;
                    border-radius: 3px;
                    font-family: 'Courier New', monospace;
                }}
                pre {{
                    background-color: #f4f4f4;
                    padding: 15px;
                    border-radius: 5px;
                    overflow-x: auto;
                }}
                table {{
                    border-collapse: collapse;
                    width: 100%;
                    margin: 20px 0;
                }}
                th, td {{
                    border: 1px solid #ddd;
                    padding: 12px;
                    text-align: left;
                }}
                th {{
                    background-color: #3498db;
                    color: white;
                }}
                tr:nth-child(even) {{
                    background-color: #f9f9f9;
                }}
                .metadata {{
                    background-color: #ecf0f1;
                    padding: 15px;
                    border-radius: 5px;
                    margin: 20px 0;
                }}
            </style>
        </head>
        <body>
            {html_content}
        </body>
        </html>
        """

        # Generate the PDF
        pdf = HTML(string=html_template)

        # Add custom CSS if provided
        stylesheets = []
        if css_style and Path(css_style).exists():
            stylesheets.append(CSS(filename=css_style))

        pdf.write_pdf(output_path, stylesheets=stylesheets)

        return True

    except ImportError:
        print("Error: WeasyPrint not installed. Install with: pip install weasyprint")
        return False
    except Exception as e:
        print(f"Error generating PDF: {e}")
        return False


def convert_to_pdf_pandoc(markdown_path: str, output_path: str) -> bool:
    """
    Convert markdown to PDF using Pandoc.

    Args:
        markdown_path: Path to the markdown file
        output_path: Path for the output PDF

    Returns:
        True if successful, False otherwise
    """
    try:
        import subprocess

        # Check whether pandoc is installed
        result = subprocess.run(
            ['pandoc', '--version'],
            capture_output=True,
            text=True
        )

        if result.returncode != 0:
            print("Error: Pandoc not installed")
            return False

        # Convert with pandoc
        result = subprocess.run(
            [
                'pandoc',
                markdown_path,
                '-o', output_path,
                '--pdf-engine=pdflatex',
                '-V', 'geometry:margin=1in',
                '--toc'
            ],
            capture_output=True,
            text=True
        )

        if result.returncode != 0:
            print(f"Pandoc error: {result.stderr}")
            return False

        return True

    except FileNotFoundError:
        print("Error: Pandoc not found. Install from https://pandoc.org/")
        return False
    except Exception as e:
        print(f"Error: {e}")
        return False


def create_biomni_report(
    conversation_history: list,
    output_path: str = "biomni_report.pdf",
    method: str = "weasyprint"
) -> bool:
    """
    Create a formatted PDF report from Biomni conversation history.

    Args:
        conversation_history: List of conversation turns
        output_path: Output PDF path
        method: Conversion method ('weasyprint' or 'pandoc')

    Returns:
        True if successful
    """
    # Prepare report metadata
    metadata = {
        'Date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
        'Tool': 'Biomni AI Agent',
        'Report Type': 'Analysis Summary'
    }

    sections = []

    # Executive summary
    sections.append({
        'heading': 'Executive Summary',
        'level': 2,
        'content': 'This report contains the complete analysis workflow executed by the Biomni biomedical AI agent.'
    })

    # Conversation history
    for i, turn in enumerate(conversation_history, 1):
        sections.append({
            'heading': f'Task {i}: {turn.get("task", "Analysis")}',
            'level': 2,
            'content': f'**Input:**\n```\n{turn.get("input", "")}\n```\n\n**Output:**\n{turn.get("output", "")}'
        })

    # Generate markdown
    md_path = output_path.replace('.pdf', '.md')
    generate_markdown_report(
        title="Biomni Analysis Report",
        sections=sections,
        metadata=metadata,
        output_path=md_path
    )

    # Convert to PDF
    if method == 'weasyprint':
        success = convert_to_pdf_weasyprint(md_path, output_path)
    elif method == 'pandoc':
        success = convert_to_pdf_pandoc(md_path, output_path)
    else:
        print(f"Unknown method: {method}")
        return False

    if success:
        print(f"✓ Report generated: {output_path}")
        print(f"  Markdown: {md_path}")
    else:
        print("✗ Failed to generate PDF")
        print(f"  Markdown available: {md_path}")

    return success


def main():
|
||||
"""CLI for report generation."""
|
||||
parser = argparse.ArgumentParser(
|
||||
description='Generate formatted PDF reports for Biomni analyses'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'input',
|
||||
type=str,
|
||||
help='Input markdown file or conversation history'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'-o', '--output',
|
||||
type=str,
|
||||
default='biomni_report.pdf',
|
||||
help='Output PDF path (default: biomni_report.pdf)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'-m', '--method',
|
||||
type=str,
|
||||
choices=['weasyprint', 'pandoc'],
|
||||
default='weasyprint',
|
||||
help='Conversion method (default: weasyprint)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--css',
|
||||
type=str,
|
||||
help='Custom CSS stylesheet path'
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# Check if input is markdown or conversation history
|
||||
input_path = Path(args.input)
|
||||
|
||||
if not input_path.exists():
|
||||
print(f"Error: Input file not found: {args.input}")
|
||||
return 1
|
||||
|
||||
# If input is markdown, convert directly
|
||||
if input_path.suffix == '.md':
|
||||
if args.method == 'weasyprint':
|
||||
success = convert_to_pdf_weasyprint(
|
||||
str(input_path),
|
||||
args.output,
|
||||
args.css
|
||||
)
|
||||
else:
|
||||
success = convert_to_pdf_pandoc(str(input_path), args.output)
|
||||
|
||||
return 0 if success else 1
|
||||
|
||||
# Otherwise, assume it's conversation history (JSON)
|
||||
try:
|
||||
import json
|
||||
with open(input_path) as f:
|
||||
history = json.load(f)
|
||||
|
||||
success = create_biomni_report(
|
||||
history,
|
||||
args.output,
|
||||
args.method
|
||||
)
|
||||
|
||||
return 0 if success else 1
|
||||
|
||||
except json.JSONDecodeError:
|
||||
print("Error: Input file is not valid JSON or markdown")
|
||||
return 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
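The argument definitions above can be exercised in isolation to confirm how the defaults resolve. This standalone sketch repeats just the parser setup (no Biomni or conversion code needed) and feeds it a hypothetical input path:

```python
import argparse

# Same argument surface as the report-generation CLI above
parser = argparse.ArgumentParser(
    description='Generate formatted PDF reports for Biomni analyses'
)
parser.add_argument('input', type=str)
parser.add_argument('-o', '--output', type=str, default='biomni_report.pdf')
parser.add_argument('-m', '--method', choices=['weasyprint', 'pandoc'],
                    default='weasyprint')
parser.add_argument('--css', type=str)

# 'analysis.md' is an illustrative path, not a file shipped with Biomni
args = parser.parse_args(['analysis.md'])
print(args.output, args.method, args.css)  # biomni_report.pdf weasyprint None
```

Because the markdown branch is selected purely by the `.md` suffix, any other suffix falls through to the JSON conversation-history path.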
230
scientific-packages/biomni/scripts/setup_environment.py
Normal file
@@ -0,0 +1,230 @@
#!/usr/bin/env python3
"""
Biomni Environment Setup and Validation Script

This script helps users set up and validate their Biomni environment,
including checking dependencies, API keys, and data availability.
"""

import os
import sys
from pathlib import Path
from typing import Any, Dict, List, Tuple


def check_python_version() -> Tuple[bool, str]:
    """Check if the Python version is compatible."""
    version = sys.version_info
    if version.major == 3 and version.minor >= 8:
        return True, f"Python {version.major}.{version.minor}.{version.micro} ✓"
    else:
        return False, f"Python {version.major}.{version.minor} - requires Python 3.8+"


def check_conda_env() -> Tuple[bool, str]:
    """Check if running in the biomni conda environment."""
    conda_env = os.environ.get('CONDA_DEFAULT_ENV', None)
    if conda_env == 'biomni_e1':
        return True, f"Conda environment: {conda_env} ✓"
    else:
        return False, f"Not in biomni_e1 environment (current: {conda_env})"


def check_package_installed(package: str) -> bool:
    """Check if a Python package is installed."""
    try:
        __import__(package)
        return True
    except ImportError:
        return False


def check_dependencies() -> Tuple[bool, List[str]]:
    """Check for required and optional dependencies."""
    required = ['biomni']
    optional = ['weasyprint', 'markdown2pdf']

    missing_required = [pkg for pkg in required if not check_package_installed(pkg)]
    missing_optional = [pkg for pkg in optional if not check_package_installed(pkg)]

    messages = []
    success = len(missing_required) == 0

    if missing_required:
        messages.append(f"Missing required packages: {', '.join(missing_required)}")
        messages.append("Install with: pip install biomni --upgrade")
    else:
        messages.append("Required packages: ✓")

    if missing_optional:
        messages.append(f"Missing optional packages: {', '.join(missing_optional)}")
        messages.append("For PDF reports, install: pip install weasyprint")

    return success, messages


def check_api_keys() -> Tuple[bool, Dict[str, bool]]:
    """Check which API keys are configured."""
    api_keys = {
        'ANTHROPIC_API_KEY': os.environ.get('ANTHROPIC_API_KEY'),
        'OPENAI_API_KEY': os.environ.get('OPENAI_API_KEY'),
        'GEMINI_API_KEY': os.environ.get('GEMINI_API_KEY'),
        'GROQ_API_KEY': os.environ.get('GROQ_API_KEY'),
    }

    configured = {key: bool(value) for key, value in api_keys.items()}
    has_any = any(configured.values())

    return has_any, configured


def check_data_directory(data_path: str = './data') -> Tuple[bool, str]:
    """Check if the Biomni data directory exists and has content."""
    path = Path(data_path)

    if not path.exists():
        return False, f"Data directory not found at {data_path}"

    # Check if the directory has files (i.e., data has been downloaded)
    files = list(path.glob('*'))
    if len(files) == 0:
        return False, "Data directory exists but is empty. Run the agent once to download."

    # Rough size check (should be ~11 GB)
    total_size = sum(f.stat().st_size for f in path.rglob('*') if f.is_file())
    size_gb = total_size / (1024**3)

    if size_gb < 1:
        return False, f"Data directory exists but seems incomplete ({size_gb:.1f} GB)"

    return True, f"Data directory: {data_path} ({size_gb:.1f} GB) ✓"


def check_disk_space(required_gb: float = 20) -> Tuple[bool, str]:
    """Check if sufficient disk space is available."""
    try:
        import shutil
        stat = shutil.disk_usage('.')
        free_gb = stat.free / (1024**3)

        if free_gb >= required_gb:
            return True, f"Disk space: {free_gb:.1f} GB available ✓"
        else:
            return False, f"Low disk space: {free_gb:.1f} GB (need {required_gb} GB)"
    except Exception as e:
        return False, f"Could not check disk space: {e}"


def test_biomni_import() -> Tuple[bool, str]:
    """Test if Biomni can be imported."""
    try:
        from biomni.agent import A1
        from biomni.config import default_config
        return True, "Biomni import successful ✓"
    except ImportError as e:
        return False, f"Cannot import Biomni: {e}"
    except Exception as e:
        return False, f"Biomni import error: {e}"


def suggest_fixes(results: Dict[str, Tuple[bool, Any]]) -> List[str]:
    """Generate suggestions for fixing issues."""
    suggestions = []

    if not results['python'][0]:
        suggestions.append("➜ Upgrade Python to 3.8 or higher")

    if not results['conda'][0]:
        suggestions.append("➜ Activate the biomni environment: conda activate biomni_e1")

    if not results['dependencies'][0]:
        suggestions.append("➜ Install Biomni: pip install biomni --upgrade")

    if not results['api_keys'][0]:
        suggestions.append("➜ Set an API key: export ANTHROPIC_API_KEY='your-key'")
        suggestions.append("  Or create a .env file with API keys")

    if not results['data'][0]:
        suggestions.append("➜ Data will auto-download on the first agent.go() call")

    if not results['disk_space'][0]:
        suggestions.append("➜ Free up disk space (need ~20 GB total)")

    return suggestions


def main():
    """Run all environment checks and display results."""
    print("=" * 60)
    print("Biomni Environment Validation")
    print("=" * 60)
    print()

    # Run all checks
    results = {}

    print("Checking Python version...")
    results['python'] = check_python_version()
    print(f"  {results['python'][1]}")
    print()

    print("Checking conda environment...")
    results['conda'] = check_conda_env()
    print(f"  {results['conda'][1]}")
    print()

    print("Checking dependencies...")
    results['dependencies'] = check_dependencies()
    for msg in results['dependencies'][1]:
        print(f"  {msg}")
    print()

    print("Checking API keys...")
    results['api_keys'] = check_api_keys()
    has_keys, key_status = results['api_keys']
    for key, configured in key_status.items():
        status = "✓" if configured else "✗"
        print(f"  {key}: {status}")
    print()

    print("Checking Biomni data directory...")
    results['data'] = check_data_directory()
    print(f"  {results['data'][1]}")
    print()

    print("Checking disk space...")
    results['disk_space'] = check_disk_space()
    print(f"  {results['disk_space'][1]}")
    print()

    print("Testing Biomni import...")
    results['biomni_import'] = test_biomni_import()
    print(f"  {results['biomni_import'][1]}")
    print()

    # Summary
    print("=" * 60)
    all_passed = all(result[0] for result in results.values())

    if all_passed:
        print("✓ All checks passed! Environment is ready.")
        print()
        print("Quick start:")
        print("  from biomni.agent import A1")
        print("  agent = A1(path='./data', llm='claude-sonnet-4-20250514')")
        print("  agent.go('Your biomedical task')")
    else:
        print("⚠ Some checks failed. See suggestions below:")
        print()
        suggestions = suggest_fixes(results)
        for suggestion in suggestions:
            print(suggestion)

    print("=" * 60)

    return 0 if all_passed else 1


if __name__ == "__main__":
    sys.exit(main())
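All of the check functions above share one shape: a (passed, message) tuple, which main() aggregates with all() to derive the exit code. A minimal standalone sketch of that pattern (function names here are illustrative, not part of the script):

```python
import sys
from typing import Callable, List, Tuple

def check_python(min_minor: int = 8) -> Tuple[bool, str]:
    # Same contract as the script's checks: (passed, human-readable message)
    v = sys.version_info
    ok = v.major == 3 and v.minor >= min_minor
    return ok, f"Python {v.major}.{v.minor}.{v.micro}"

def run_checks(checks: List[Callable[[], Tuple[bool, str]]]) -> int:
    results = [check() for check in checks]
    for ok, msg in results:
        print(("✓" if ok else "✗"), msg)
    # Exit code 0 only when every check passed, mirroring main()
    return 0 if all(ok for ok, _ in results) else 1

exit_code = run_checks([check_python])
```

Keeping each check side-effect free and pushing all printing into the runner makes the individual checks easy to reuse or test in isolation, which is why the real script returns messages instead of printing inside each check.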