LLM Provider Configuration Guide

This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.

Overview

Biomni supports multiple LLM providers through a unified interface. Configure providers using:

  • Environment variables
  • .env files
  • Runtime configuration via default_config
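
A minimal sketch combining the .env and runtime approaches (assumes the python-dotenv package is installed; the model name follows the examples later in this guide):

import os

from dotenv import load_dotenv
from biomni.config import default_config

# Load keys from a .env file in the project root (a no-op if the file is absent)
load_dotenv()

# Runtime configuration applies to agents created afterwards
default_config.llm = "claude-sonnet-4-20250514"

# Fail early if the expected key never reached the environment
assert os.getenv("ANTHROPIC_API_KEY"), "ANTHROPIC_API_KEY is not set"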

Quick Reference Table

| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
| --- | --- | --- | --- | --- |
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |

Anthropic Claude

Overview

Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.

Setup

  1. Obtain API Key:

    • Sign up at https://console.anthropic.com and create a key on the API Keys page

  2. Configure Environment:

    Option A: Environment Variable

    export ANTHROPIC_API_KEY="sk-ant-api03-..."
    

    Option B: .env File

    # .env file in project root
    ANTHROPIC_API_KEY=sk-ant-api03-...
    
  3. Set Model in Code:

    from biomni.config import default_config
    
    # Claude Sonnet 4 (Recommended)
    default_config.llm = "claude-sonnet-4-20250514"
    
    # Claude Opus 4 (Most capable)
    default_config.llm = "claude-opus-4-20250514"
    
    # Claude 3.5 Sonnet (Previous version)
    default_config.llm = "claude-3-5-sonnet-20241022"
    

Available Models

| Model | Context Window | Strengths | Best For |
| --- | --- | --- | --- |
| claude-sonnet-4-20250514 | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| claude-opus-4-20250514 | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| claude-3-5-sonnet-20241022 | 200K tokens | Fast, reliable | Standard workflows |
| claude-3-opus-20240229 | 200K tokens | Strong reasoning | Legacy support |

Advanced Configuration

from biomni.config import default_config

# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800

# Optional: Custom API endpoint (for proxy/enterprise)
default_config.api_base = "https://your-proxy.com/v1"

Cost Estimation

Approximate costs per 1M tokens (as of January 2025):

  • Input: $3-15 depending on model
  • Output: $15-75 depending on model

For a typical biomedical analysis (~50K tokens total): $0.50-$2.00
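
For a rough sanity check, the arithmetic can be made explicit (the rates below are illustrative values within the ranges above, not current list prices):

# Back-of-the-envelope estimate for a ~50K-token analysis
input_tokens = 40_000            # assumed prompt-side share
output_tokens = 10_000           # assumed completion-side share
input_rate = 8.00 / 1_000_000    # $/token, mid-range of $3-15 per 1M
output_rate = 40.00 / 1_000_000  # $/token, mid-range of $15-75 per 1M

cost = input_tokens * input_rate + output_tokens * output_rate
print(f"Estimated cost: ${cost:.2f}")  # ~$0.72, inside the $0.50-$2.00 window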

OpenAI

Overview

OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.

Setup

  1. Obtain API Key:

    • Create a key at https://platform.openai.com/api-keys

  2. Configure Environment:

    export OPENAI_API_KEY="sk-proj-..."
    

    Or in .env:

    OPENAI_API_KEY=sk-proj-...
    
  3. Set Model:

    from biomni.config import default_config
    
    default_config.llm = "gpt-4o"          # Recommended
    # default_config.llm = "gpt-4"         # Previous flagship
    # default_config.llm = "gpt-4-turbo"   # Fast variant
    # default_config.llm = "gpt-3.5-turbo" # Budget option
    

Available Models

| Model | Context Window | Strengths | Cost |
| --- | --- | --- | --- |
| gpt-4o | 128K tokens | Fast, multimodal | Medium |
| gpt-4-turbo | 128K tokens | Fast inference | Medium |
| gpt-4 | 8K tokens | Reliable | High |
| gpt-3.5-turbo | 16K tokens | Fast, cheap | Low |

Cost Optimization

# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"

# For production analysis (quality-focused)
default_config.llm = "gpt-4o"

Azure OpenAI

Overview

Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.

Setup

  1. Azure Prerequisites:

    • Active Azure subscription
    • Azure OpenAI resource created
    • Model deployment configured
  2. Environment Variables:

    export AZURE_OPENAI_API_KEY="your-key"
    export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
    export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
    
  3. Configuration:

    from biomni.config import default_config
    
    # Option 1: Use deployment name
    default_config.llm = "azure/your-deployment-name"
    
    # Option 2: Specify endpoint explicitly
    default_config.llm = "azure/gpt-4"
    default_config.api_base = "https://your-resource.openai.azure.com/"
    

Deployment Setup

Azure OpenAI requires explicit model deployments:

  1. Navigate to Azure OpenAI Studio
  2. Create deployment for desired model (e.g., GPT-4)
  3. Note the deployment name
  4. Use deployment name in Biomni configuration

Example Configuration

from biomni.config import default_config
import os

# Set Azure credentials
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'

# Configure Biomni to use Azure deployment
default_config.llm = "azure/gpt-4-biomni"  # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']

Google Gemini

Overview

Google's Gemini models offer multimodal capabilities and competitive performance.

Setup

  1. Obtain API Key:

    • Generate a key in Google AI Studio at https://aistudio.google.com/app/apikey

  2. Environment Configuration:

    export GEMINI_API_KEY="your-key"
    
  3. Set Model:

    from biomni.config import default_config
    
    default_config.llm = "gemini/gemini-1.5-pro"
    # Or: default_config.llm = "gemini/gemini-pro"
    

Available Models

| Model | Context Window | Strengths |
| --- | --- | --- |
| gemini/gemini-1.5-pro | 1M tokens | Very large context, multimodal |
| gemini/gemini-pro | 32K tokens | Balanced performance |

Use Cases

Gemini excels at:

  • Tasks requiring very large context windows
  • Multimodal analysis (when incorporating images)
  • Serving as a cost-effective alternative to GPT-4

Example configuration:

# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400  # May need longer timeout

Groq

Overview

Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.

Setup

  1. Get API Key:

    • Create a key at https://console.groq.com/keys

  2. Configure:

    export GROQ_API_KEY="gsk_..."
    
  3. Set Model:

    from biomni.config import default_config
    
    default_config.llm = "groq/llama-3.1-70b-versatile"
    # Or: default_config.llm = "groq/mixtral-8x7b-32768"
    

Available Models

| Model | Context Window | Speed | Quality |
| --- | --- | --- | --- |
| groq/llama-3.1-70b-versatile | 32K tokens | Very Fast | Good |
| groq/mixtral-8x7b-32768 | 32K tokens | Very Fast | Good |
| groq/llama-3-70b-8192 | 8K tokens | Ultra Fast | Moderate |

Best Practices

# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600  # Groq is fast

# Note: Quality may be lower than GPT-4/Claude for complex tasks
# Recommended for: QC, simple analyses, testing workflows

Ollama (Local Deployment)

Overview

Run LLMs entirely locally for offline use, data privacy, or cost savings.

Setup

  1. Install Ollama:

    # macOS/Linux
    curl -fsSL https://ollama.com/install.sh | sh
    
    # Or download from https://ollama.com/download
    
  2. Pull Models:

    ollama pull llama3       # Meta Llama 3 (8B)
    ollama pull mixtral      # Mixtral (47B)
    ollama pull codellama    # Code-specialized
    ollama pull medllama2    # Medical domain (if available)
    
  3. Start Ollama Server:

    ollama serve  # Runs on http://localhost:11434
    
  4. Configure Biomni:

    from biomni.config import default_config
    
    default_config.llm = "ollama/llama3"
    default_config.api_base = "http://localhost:11434"
    

Hardware Requirements

Minimum recommendations:

  • 8B models: 16GB RAM, CPU inference acceptable
  • 70B models: 64GB RAM, GPU highly recommended
  • Storage: 5-50GB per model
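
A quick pre-flight check against these thresholds (a sketch assuming the psutil package is installed):

import psutil  # assumes psutil is installed

# Compare total RAM against the minimums listed above
total_gb = psutil.virtual_memory().total / 1e9
if total_gb < 16:
    print("Below 16GB RAM: stick to small or quantized models")
elif total_gb < 64:
    print("8B models should run; 70B models will likely struggle")
else:
    print("Enough RAM for 70B models (a GPU is still highly recommended)")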

Model Selection

# Fast, local, good for testing
default_config.llm = "ollama/llama3"

# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"

# Code generation tasks
default_config.llm = "ollama/codellama"

Advantages & Limitations

Advantages:

  • Complete data privacy
  • No API costs
  • Offline operation
  • Unlimited usage

Limitations:

  • Lower quality than GPT-4/Claude for complex tasks
  • Requires significant hardware
  • Slower inference (especially on CPU)
  • May struggle with specialized biomedical knowledge

AWS Bedrock

Overview

AWS-managed LLM service offering multiple model providers.

Setup

  1. AWS Prerequisites:

    • AWS account with Bedrock access
    • Model access enabled in Bedrock console
    • AWS credentials configured
  2. Configure AWS Credentials:

    # Option 1: AWS CLI
    aws configure
    
    # Option 2: Environment variables
    export AWS_ACCESS_KEY_ID="your-key"
    export AWS_SECRET_ACCESS_KEY="your-secret"
    export AWS_REGION="us-east-1"
    
  3. Enable Model Access:

    • Navigate to AWS Bedrock console
    • Request access to desired models
    • Wait for approval (may take hours/days)
  4. Configure Biomni:

    from biomni.config import default_config
    
    default_config.llm = "bedrock/anthropic.claude-3-sonnet"
    # Or: default_config.llm = "bedrock/anthropic.claude-v2"
    

Available Models

Bedrock provides access to:

  • Anthropic Claude models
  • Amazon Titan models
  • AI21 Jurassic models
  • Cohere Command models
  • Meta Llama models

IAM Permissions

Required IAM policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
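
Before configuring Biomni, it can be worth confirming that the credentials and model access actually work. A minimal sketch using boto3 (the region is an assumption; adjust to your deployment):

import boto3

# List foundation models visible to the configured credentials.
# An AccessDeniedException here usually means model access has not
# yet been granted in the Bedrock console.
bedrock = boto3.client("bedrock", region_name="us-east-1")
response = bedrock.list_foundation_models()
for summary in response["modelSummaries"]:
    if "claude" in summary["modelId"]:
        print(summary["modelId"])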

Example Configuration

from biomni.config import default_config
import boto3

# Verify AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
print(f"AWS Access Key: {credentials.access_key[:8]}...")

# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
default_config.timeout_seconds = 1800

Biomni-R0 (Local Specialized Model)

Overview

Biomni-R0 is a 32B-parameter reasoning model trained specifically for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.

Setup

  1. Hardware Requirements:

    • GPU with 48GB+ VRAM (e.g., A100, H100)
    • Or multi-GPU setup (2x 24GB)
    • 100GB+ storage for model weights
  2. Install Dependencies:

    pip install "sglang[all]"
    pip install flashinfer  # Optional but recommended
    
  3. Deploy Model:

    python -m sglang.launch_server \
        --model-path snap-stanford/biomni-r0 \
        --host 0.0.0.0 \
        --port 30000 \
        --trust-remote-code \
        --mem-fraction-static 0.8
    

    For multi-GPU:

    python -m sglang.launch_server \
        --model-path snap-stanford/biomni-r0 \
        --host 0.0.0.0 \
        --port 30000 \
        --trust-remote-code \
        --tp 2  # Tensor parallelism across 2 GPUs
    
  4. Configure Biomni:

    from biomni.config import default_config
    
    default_config.llm = "openai/biomni-r0"
    default_config.api_base = "http://localhost:30000/v1"
    default_config.timeout_seconds = 2400  # Longer for complex reasoning
    

When to Use Biomni-R0

Biomni-R0 excels at:

  • Multi-step biological reasoning
  • Complex experimental design
  • Hypothesis generation and evaluation
  • Literature-informed analysis
  • Tasks requiring deep biological knowledge

Example usage:

from biomni.agent import A1
from biomni.config import default_config

# For complex biological reasoning tasks
default_config.llm = "openai/biomni-r0"
agent = A1(path='./data')

agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")

Performance Comparison

| Model | Speed | Biological Reasoning | Code Quality | Cost |
| --- | --- | --- | --- | --- |
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |

Multi-Provider Strategy

Intelligent Model Selection

Use different models for different task types:

from biomni.agent import A1
from biomni.config import default_config

# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
    if task_complexity == "simple":
        default_config.llm = "gpt-3.5-turbo"
        default_config.timeout_seconds = 300
    elif task_complexity == "medium":
        default_config.llm = "claude-sonnet-4-20250514"
        default_config.timeout_seconds = 1200
    else:  # complex
        default_config.llm = "openai/biomni-r0"
        default_config.timeout_seconds = 2400

    return A1(path='./data')

# Strategy 2: Fallback on failure
def execute_with_fallback(task):
    models = [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "claude-opus-4-20250514"
    ]

    for model in models:
        try:
            default_config.llm = model
            agent = A1(path='./data')
            agent.go(task)
            return
        except Exception as e:
            print(f"Failed with {model}: {e}, trying next...")

    raise Exception("All models failed")

Cost Optimization Strategy

from biomni.agent import A1
from biomni.config import default_config

# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent = A1(path='./data')
agent.go("Quick exploratory analysis of dataset structure")

# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent = A1(path='./data')  # Recreate the agent so the new model takes effect
agent.go("Comprehensive differential expression analysis with pathway enrichment")

# Phase 3: Complex reasoning with specialized models
default_config.llm = "openai/biomni-r0"
agent = A1(path='./data')
agent.go("Generate biological hypotheses based on multi-omics integration")

Troubleshooting

Common Issues

Issue: "API key not found"

  • Verify environment variable is set: echo $ANTHROPIC_API_KEY
  • Check .env file exists and is in correct location
  • Try setting key programmatically: os.environ['ANTHROPIC_API_KEY'] = 'key'

Issue: "Rate limit exceeded"

  • Implement exponential backoff and retry (a sketch follows this list)
  • Upgrade API tier if available
  • Switch to alternative provider temporarily
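
A minimal backoff sketch (catching the provider-specific rate-limit exception is left as an assumption; a bare Exception is shown for brevity):

import random
import time

from biomni.agent import A1

def go_with_backoff(task, max_retries=5):
    # Exponential backoff with jitter between attempts
    for attempt in range(max_retries):
        try:
            agent = A1(path='./data')
            return agent.go(task)
        except Exception as e:  # ideally the provider's rate-limit error
            wait = 2 ** attempt + random.random()
            print(f"Attempt {attempt + 1} failed ({e}); retrying in {wait:.1f}s")
            time.sleep(wait)
    raise RuntimeError("Task failed after all retries")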

Issue: "Model not found"

  • Verify model identifier is correct
  • Check API key has access to requested model
  • For Azure: ensure deployment exists with exact name

Issue: "Timeout errors"

  • Increase default_config.timeout_seconds
  • Break complex tasks into smaller steps
  • Consider using faster model for initial phases

Issue: "Connection refused (Ollama/Biomni-R0)"

  • Verify the local server is running (see the port check below)
  • Check that the port is not blocked by a firewall
  • Confirm the api_base URL is correct
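
To separate a dead server from a firewall or URL problem, a quick socket check against the default ports used in this guide:

import socket

def port_open(host="localhost", port=11434, timeout=2.0):
    # Returns True if something is listening on host:port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print("Ollama reachable:", port_open(port=11434))
print("Biomni-R0 reachable:", port_open(port=30000))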

Testing Configuration

from biomni.utils import list_available_models, validate_environment

# Check environment setup
status = validate_environment()
print("Environment Status:", status)

# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)

# Test specific model
try:
    from biomni.agent import A1
    agent = A1(path='./data', llm='claude-sonnet-4-20250514')
    agent.go("Print 'Configuration successful!'")
except Exception as e:
    print(f"Configuration test failed: {e}")

Best Practices Summary

  1. For most users: Start with Claude Sonnet 4 or GPT-4o

  2. For cost sensitivity: Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production

  3. For privacy/offline: Deploy Ollama locally

  4. For complex reasoning: Use Biomni-R0 if hardware available

  5. For enterprise: Consider Azure OpenAI or AWS Bedrock

  6. For speed: Use Groq for rapid iteration

  7. Always:

    • Set appropriate timeouts
    • Implement error handling and retries
    • Log the model and configuration for reproducibility (see the sketch below)
    • Test configuration before production use
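
For the logging point, a minimal sketch (the attribute names follow the configuration examples in this guide):

import json
import logging

from biomni.config import default_config

logging.basicConfig(level=logging.INFO)

# Record the exact model and timeout for each run so results are reproducible
run_config = {
    "llm": default_config.llm,
    "timeout_seconds": default_config.timeout_seconds,
}
logging.info("Biomni run config: %s", json.dumps(run_config))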