# LLM Provider Configuration Guide
This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.
## Overview
Biomni supports multiple LLM providers through a unified interface. Configure providers using:
- Environment variables
- `.env` files
- Runtime configuration via `default_config`
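For example, a minimal session can combine these: load keys from a `.env` file, then pick a model at runtime. This is a sketch, assuming the third-party `python-dotenv` package for `.env` loading (whether Biomni also loads `.env` files automatically is not covered here):

```python
from dotenv import load_dotenv  # assumption: python-dotenv is installed

from biomni.config import default_config

# Load API keys (e.g., ANTHROPIC_API_KEY) from a .env file in the project root
load_dotenv()

# Runtime configuration: select the model for subsequent agent runs
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1200
```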
## Quick Reference Table
| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
|---|---|---|---|---|
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |
## Anthropic Claude (Recommended)

### Overview
Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.
### Setup

1. **Obtain API Key:**
   - Sign up at https://console.anthropic.com/
   - Navigate to the API Keys section
   - Generate a new key

2. **Configure Environment:**

   Option A: Environment Variable

   ```bash
   export ANTHROPIC_API_KEY="sk-ant-api03-..."
   ```

   Option B: `.env` File

   ```bash
   # .env file in project root
   ANTHROPIC_API_KEY=sk-ant-api03-...
   ```

3. **Set Model in Code:**

   ```python
   from biomni.config import default_config

   # Claude Sonnet 4 (Recommended)
   default_config.llm = "claude-sonnet-4-20250514"

   # Claude Opus 4 (Most capable)
   default_config.llm = "claude-opus-4-20250514"

   # Claude 3.5 Sonnet (Previous version)
   default_config.llm = "claude-3-5-sonnet-20241022"
   ```
### Available Models

| Model | Context Window | Strengths | Best For |
|---|---|---|---|
| `claude-sonnet-4-20250514` | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| `claude-opus-4-20250514` | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| `claude-3-5-sonnet-20241022` | 200K tokens | Fast, reliable | Standard workflows |
| `claude-3-opus-20240229` | 200K tokens | Strong reasoning | Legacy support |
### Advanced Configuration

```python
from biomni.config import default_config

# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800

# Optional: Custom API endpoint (for proxy/enterprise)
default_config.api_base = "https://your-proxy.com/v1"
```
### Cost Estimation
Approximate costs per 1M tokens (as of January 2025):
- Input: $3-15 depending on model
- Output: $15-75 depending on model
For a typical biomedical analysis (~50K tokens total): $0.50-$2.00
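As a back-of-envelope check, you can estimate a run's cost from its token counts. The rates in this sketch are illustrative values picked from the ranges above, not live pricing:

```python
# Rough cost estimate for a single analysis (rates are illustrative, not live pricing)
input_tokens = 35_000
output_tokens = 15_000

input_rate_per_m = 5.0    # $ per 1M input tokens (within the $3-15 range above)
output_rate_per_m = 25.0  # $ per 1M output tokens (within the $15-75 range above)

cost = (input_tokens / 1e6) * input_rate_per_m + (output_tokens / 1e6) * output_rate_per_m
print(f"Estimated cost: ${cost:.2f}")  # ~$0.55 at these rates
```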
## OpenAI

### Overview
OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.
### Setup

1. **Obtain API Key:**
   - Sign up at https://platform.openai.com/
   - Navigate to API Keys
   - Create a new secret key

2. **Configure Environment:**

   ```bash
   export OPENAI_API_KEY="sk-proj-..."
   ```

   Or in `.env`:

   ```bash
   OPENAI_API_KEY=sk-proj-...
   ```

3. **Set Model:**

   ```python
   from biomni.config import default_config

   default_config.llm = "gpt-4o"            # Recommended
   # default_config.llm = "gpt-4"           # Previous flagship
   # default_config.llm = "gpt-4-turbo"     # Fast variant
   # default_config.llm = "gpt-3.5-turbo"   # Budget option
   ```
### Available Models

| Model | Context Window | Strengths | Cost |
|---|---|---|---|
| `gpt-4o` | 128K tokens | Fast, multimodal | Medium |
| `gpt-4-turbo` | 128K tokens | Fast inference | Medium |
| `gpt-4` | 8K tokens | Reliable | High |
| `gpt-3.5-turbo` | 16K tokens | Fast, cheap | Low |
### Cost Optimization

```python
# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"

# For production analysis (quality-focused)
default_config.llm = "gpt-4o"
```
## Azure OpenAI

### Overview
Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.
### Setup

1. **Azure Prerequisites:**
   - Active Azure subscription
   - Azure OpenAI resource created
   - Model deployment configured

2. **Environment Variables:**

   ```bash
   export AZURE_OPENAI_API_KEY="your-key"
   export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
   export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
   ```

3. **Configuration:**

   ```python
   from biomni.config import default_config

   # Option 1: Use deployment name
   default_config.llm = "azure/your-deployment-name"

   # Option 2: Specify endpoint explicitly
   default_config.llm = "azure/gpt-4"
   default_config.api_base = "https://your-resource.openai.azure.com/"
   ```
### Deployment Setup
Azure OpenAI requires explicit model deployments:
- Navigate to Azure OpenAI Studio
- Create deployment for desired model (e.g., GPT-4)
- Note the deployment name
- Use deployment name in Biomni configuration
### Example Configuration

```python
from biomni.config import default_config
import os

# Set Azure credentials
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'

# Configure Biomni to use Azure deployment
default_config.llm = "azure/gpt-4-biomni"  # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
```
## Google Gemini

### Overview
Google's Gemini models offer multimodal capabilities and competitive performance.
### Setup

1. **Obtain API Key:**
   - Visit https://makersuite.google.com/app/apikey
   - Create a new API key

2. **Environment Configuration:**

   ```bash
   export GEMINI_API_KEY="your-key"
   ```

3. **Set Model:**

   ```python
   from biomni.config import default_config

   default_config.llm = "gemini/gemini-1.5-pro"
   # Or: default_config.llm = "gemini/gemini-pro"
   ```
### Available Models

| Model | Context Window | Strengths |
|---|---|---|
| `gemini/gemini-1.5-pro` | 1M tokens | Very large context, multimodal |
| `gemini/gemini-pro` | 32K tokens | Balanced performance |
### Use Cases

Gemini excels at:

- Tasks requiring very large context windows
- Multimodal analysis (when incorporating images)
- Cost-effective alternative to GPT-4

```python
# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400  # May need longer timeout
```
## Groq

### Overview
Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.
### Setup

1. **Get API Key:**
   - Sign up at https://console.groq.com/
   - Generate an API key

2. **Configure:**

   ```bash
   export GROQ_API_KEY="gsk_..."
   ```

3. **Set Model:**

   ```python
   from biomni.config import default_config

   default_config.llm = "groq/llama-3.1-70b-versatile"
   # Or: default_config.llm = "groq/mixtral-8x7b-32768"
   ```
### Available Models

| Model | Context Window | Speed | Quality |
|---|---|---|---|
| `groq/llama-3.1-70b-versatile` | 32K tokens | Very Fast | Good |
| `groq/mixtral-8x7b-32768` | 32K tokens | Very Fast | Good |
| `groq/llama-3-70b-8192` | 8K tokens | Ultra Fast | Moderate |
### Best Practices

```python
# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600  # Groq is fast

# Note: Quality may be lower than GPT-4/Claude for complex tasks
# Recommended for: QC, simple analyses, testing workflows
```
## Ollama (Local Deployment)

### Overview
Run LLMs entirely locally for offline use, data privacy, or cost savings.
### Setup

1. **Install Ollama:**

   ```bash
   # macOS/Linux
   curl -fsSL https://ollama.com/install.sh | sh

   # Or download from https://ollama.com/download
   ```

2. **Pull Models:**

   ```bash
   ollama pull llama3      # Meta Llama 3 (8B)
   ollama pull mixtral     # Mixtral (47B)
   ollama pull codellama   # Code-specialized
   ollama pull medllama    # Medical domain (if available)
   ```

3. **Start Ollama Server:**

   ```bash
   ollama serve
   # Runs on http://localhost:11434
   ```

4. **Configure Biomni:**

   ```python
   from biomni.config import default_config

   default_config.llm = "ollama/llama3"
   default_config.api_base = "http://localhost:11434"
   ```
### Hardware Requirements
Minimum recommendations:
- 8B models: 16GB RAM, CPU inference acceptable
- 70B models: 64GB RAM, GPU highly recommended
- Storage: 5-50GB per model
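As a quick sanity check before pulling a large model, you can compare total system memory against these guidelines. A minimal sketch using the third-party `psutil` package (the thresholds mirror the recommendations above):

```python
import psutil  # third-party: pip install psutil

# Total system RAM in GiB
total_gib = psutil.virtual_memory().total / (1024 ** 3)

if total_gib >= 64:
    print("Meets the 70B-model guideline (a GPU is still highly recommended)")
elif total_gib >= 16:
    print("Meets the 8B-model guideline (e.g., ollama/llama3)")
else:
    print("Below the 16GB guideline; consider a smaller or quantized model")
```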
### Model Selection

```python
# Fast, local, good for testing
default_config.llm = "ollama/llama3"

# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"

# Code generation tasks
default_config.llm = "ollama/codellama"
```
### Advantages & Limitations

**Advantages:**
- Complete data privacy
- No API costs
- Offline operation
- Unlimited usage
**Limitations:**
- Lower quality than GPT-4/Claude for complex tasks
- Requires significant hardware
- Slower inference (especially on CPU)
- May struggle with specialized biomedical knowledge
## AWS Bedrock

### Overview
AWS-managed LLM service offering multiple model providers.
### Setup

1. **AWS Prerequisites:**
   - AWS account with Bedrock access
   - Model access enabled in the Bedrock console
   - AWS credentials configured

2. **Configure AWS Credentials:**

   ```bash
   # Option 1: AWS CLI
   aws configure

   # Option 2: Environment variables
   export AWS_ACCESS_KEY_ID="your-key"
   export AWS_SECRET_ACCESS_KEY="your-secret"
   export AWS_REGION="us-east-1"
   ```

3. **Enable Model Access:**
   - Navigate to the AWS Bedrock console
   - Request access to desired models
   - Wait for approval (may take hours/days)

4. **Configure Biomni:**

   ```python
   from biomni.config import default_config

   default_config.llm = "bedrock/anthropic.claude-3-sonnet"
   # Or: default_config.llm = "bedrock/anthropic.claude-v2"
   ```
### Available Models
Bedrock provides access to:
- Anthropic Claude models
- Amazon Titan models
- AI21 Jurassic models
- Cohere Command models
- Meta Llama models
### IAM Permissions

Required IAM policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```
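Because access is granted per model, it can help to confirm which foundation models your account can actually see before configuring Biomni. A minimal sketch assuming AWS credentials are already configured (`list_foundation_models` is part of Bedrock's standard control-plane API):

```python
import boto3

# Control-plane client for Bedrock (region is an assumption; match your setup)
bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the foundation models visible to this account
response = bedrock.list_foundation_models()
for summary in response["modelSummaries"]:
    print(summary["modelId"])
```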
### Example Configuration

```python
from biomni.config import default_config
import boto3

# Verify AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
print(f"AWS Access Key: {credentials.access_key[:8]}...")

# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
default_config.timeout_seconds = 1800
```
## Biomni-R0 (Local Specialized Model)

### Overview

Biomni-R0 is a 32B-parameter reasoning model specifically trained for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.
### Setup

1. **Hardware Requirements:**
   - GPU with 48GB+ VRAM (e.g., A100, H100)
   - Or multi-GPU setup (2x 24GB)
   - 100GB+ storage for model weights

2. **Install Dependencies:**

   ```bash
   pip install "sglang[all]"
   pip install flashinfer  # Optional but recommended
   ```

3. **Deploy Model:**

   ```bash
   python -m sglang.launch_server \
     --model-path snap-stanford/biomni-r0 \
     --host 0.0.0.0 \
     --port 30000 \
     --trust-remote-code \
     --mem-fraction-static 0.8
   ```

   For multi-GPU:

   ```bash
   python -m sglang.launch_server \
     --model-path snap-stanford/biomni-r0 \
     --host 0.0.0.0 \
     --port 30000 \
     --trust-remote-code \
     --tp 2  # Tensor parallelism across 2 GPUs
   ```

4. **Configure Biomni:**

   ```python
   from biomni.config import default_config

   default_config.llm = "openai/biomni-r0"
   default_config.api_base = "http://localhost:30000/v1"
   default_config.timeout_seconds = 2400  # Longer for complex reasoning
   ```
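Once the server is running, you can verify the endpoint is reachable before pointing Biomni at it. A minimal sketch using `requests` against the OpenAI-compatible `/v1/models` route that sglang exposes (the port matches the launch command above):

```python
import requests

# Query the OpenAI-compatible model listing served by sglang
response = requests.get("http://localhost:30000/v1/models", timeout=10)
response.raise_for_status()

for model in response.json().get("data", []):
    print("Serving:", model.get("id"))
```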
### When to Use Biomni-R0

Biomni-R0 excels at:

- Multi-step biological reasoning
- Complex experimental design
- Hypothesis generation and evaluation
- Literature-informed analysis
- Tasks requiring deep biological knowledge

```python
# For complex biological reasoning tasks
default_config.llm = "openai/biomni-r0"

agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")
```
### Performance Comparison
| Model | Speed | Biological Reasoning | Code Quality | Cost |
|---|---|---|---|---|
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |
## Multi-Provider Strategy

### Intelligent Model Selection
Use different models for different task types:
```python
from biomni.agent import A1
from biomni.config import default_config

# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
    if task_complexity == "simple":
        default_config.llm = "gpt-3.5-turbo"
        default_config.timeout_seconds = 300
    elif task_complexity == "medium":
        default_config.llm = "claude-sonnet-4-20250514"
        default_config.timeout_seconds = 1200
    else:  # complex
        default_config.llm = "openai/biomni-r0"
        default_config.timeout_seconds = 2400
    return A1(path='./data')

# Strategy 2: Fallback on failure
def execute_with_fallback(task):
    models = [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "claude-opus-4-20250514"
    ]
    for model in models:
        try:
            default_config.llm = model
            agent = A1(path='./data')
            agent.go(task)
            return
        except Exception as e:
            print(f"Failed with {model}: {e}, trying next...")
    raise Exception("All models failed")
```
### Cost Optimization Strategy

```python
# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent.go("Quick exploratory analysis of dataset structure")

# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent.go("Comprehensive differential expression analysis with pathway enrichment")

# Phase 3: Complex reasoning with specialized models
default_config.llm = "openai/biomni-r0"
agent.go("Generate biological hypotheses based on multi-omics integration")
```
## Troubleshooting

### Common Issues

**Issue: "API key not found"**

- Verify the environment variable is set: `echo $ANTHROPIC_API_KEY`
- Check that the `.env` file exists and is in the correct location
- Try setting the key programmatically: `os.environ['ANTHROPIC_API_KEY'] = 'key'`

**Issue: "Rate limit exceeded"**

- Implement exponential backoff and retry (see the sketch below)
- Upgrade your API tier if available
- Switch to an alternative provider temporarily

**Issue: "Model not found"**

- Verify the model identifier is correct
- Check that the API key has access to the requested model
- For Azure: ensure a deployment exists with the exact name

**Issue: "Timeout errors"**

- Increase `default_config.timeout_seconds`
- Break complex tasks into smaller steps
- Consider using a faster model for initial phases

**Issue: "Connection refused (Ollama/Biomni-R0)"**

- Verify the local server is running
- Check that the port is not blocked by a firewall
- Confirm the `api_base` URL is correct
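For the rate-limit case, a minimal retry wrapper with exponential backoff might look like the following. The retry parameters are illustrative, and the sketch assumes `agent.go` raises an exception on failure:

```python
import time

def go_with_backoff(agent, task, max_retries=4, base_delay=2.0):
    """Retry agent.go with exponential backoff on transient errors."""
    for attempt in range(max_retries):
        try:
            return agent.go(task)
        except Exception as e:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            delay = base_delay * (2 ** attempt)  # 2s, 4s, 8s, ...
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {delay:.0f}s")
            time.sleep(delay)
```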
### Testing Configuration

```python
from biomni.utils import list_available_models, validate_environment

# Check environment setup
status = validate_environment()
print("Environment Status:", status)

# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)

# Test specific model
try:
    from biomni.agent import A1
    agent = A1(path='./data', llm='claude-sonnet-4-20250514')
    agent.go("Print 'Configuration successful!'")
except Exception as e:
    print(f"Configuration test failed: {e}")
```
## Best Practices Summary

1. **For most users:** Start with Claude Sonnet 4 or GPT-4o
2. **For cost sensitivity:** Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production
3. **For privacy/offline:** Deploy Ollama locally
4. **For complex reasoning:** Use Biomni-R0 if hardware is available
5. **For enterprise:** Consider Azure OpenAI or AWS Bedrock
6. **For speed:** Use Groq for rapid iteration
7. **Always:**
   - Set appropriate timeouts
   - Implement error handling and retries
   - Log model and configuration for reproducibility (see the sketch below)
   - Test configuration before production use
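For the logging point, a minimal sketch that appends the active configuration to a JSONL file before each run (the field names mirror the `default_config` attributes used throughout this guide):

```python
import json
import time

from biomni.config import default_config

# Snapshot the run configuration for reproducibility
run_record = {
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    "llm": default_config.llm,
    "timeout_seconds": default_config.timeout_seconds,
}
with open("run_config_log.jsonl", "a") as f:
    f.write(json.dumps(run_record) + "\n")
```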