# LLM Provider Configuration Guide

This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.

## Overview

Biomni supports multiple LLM providers through a unified interface. Configure providers using:
- Environment variables
- `.env` files
- Runtime configuration via `default_config`
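
If you rely on a `.env` file, the usual Python tool is `python-dotenv`'s `load_dotenv()`, which by default does not override variables that are already set. Whether Biomni loads `.env` for you may depend on its version. The sketch below is a simplified, stdlib-only stand-in (the helper `load_env_file` is hypothetical, not part of Biomni). It shows the same precedence: a variable exported in your shell wins over the file unless you ask for an override.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Minimal .env reader (hypothetical helper, not part of Biomni).

    Reads KEY=VALUE lines, skips blanks and '#' comments, tolerates a leading
    'export ', and strips surrounding quotes. Existing environment variables
    are kept unless override=True. Returns the variables it actually set.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip().strip('"').strip("'")
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Call it once at startup, before creating an agent, so the provider keys are already in `os.environ` when the LLM client is built.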

## Quick Reference Table

| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
|----------|----------------|------------------|------|------------------|
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |

## Anthropic Claude (Recommended)

### Overview

Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.

### Setup

1. **Obtain API Key:**
   - Sign up at https://console.anthropic.com/
   - Navigate to API Keys section
   - Generate a new key

2. **Configure Environment:**

   **Option A: Environment Variable**
   ```bash
   export ANTHROPIC_API_KEY="sk-ant-api03-..."
   ```

   **Option B: .env File**
   ```bash
   # .env file in project root
   ANTHROPIC_API_KEY=sk-ant-api03-...
   ```

3. **Set Model in Code:**
   ```python
   from biomni.config import default_config

   # Claude Sonnet 4 (recommended)
   default_config.llm = "claude-sonnet-4-20250514"

   # Alternatives: uncomment one instead (the last assignment wins)
   # default_config.llm = "claude-opus-4-20250514"     # Claude Opus 4, most capable
   # default_config.llm = "claude-3-5-sonnet-20241022" # Claude 3.5 Sonnet, previous version
   ```

### Available Models

| Model | Context Window | Strengths | Best For |
|-------|---------------|-----------|----------|
| `claude-sonnet-4-20250514` | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| `claude-opus-4-20250514` | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| `claude-3-5-sonnet-20241022` | 200K tokens | Fast, reliable | Standard workflows |
| `claude-3-opus-20240229` | 200K tokens | Strong reasoning | Legacy support |

### Advanced Configuration

```python
from biomni.config import default_config

# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800

# Optional: Custom API endpoint (for proxy/enterprise)
default_config.api_base = "https://your-proxy.com/v1"
```

### Cost Estimation

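Approximate costs per 1M tokens at the time of writing (check Anthropic's pricing page for current rates):
- Input: $3-15 depending on model
- Output: $15-75 depending on model

A typical biomedical analysis (~50K tokens total) costs roughly $0.50-$2.00; the exact figure depends on the model and on how those tokens split between input and output.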
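
Because output tokens cost more than input tokens, the split matters as much as the total. The helper below (hypothetical name `estimate_cost_usd`) does the arithmetic. The 40K-input/10K-output split and the $3/$15 and $15/$75 price pairs are illustrative assumptions formed from the ends of the ranges above; substitute current rates for your model.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate API cost in USD from token counts and per-1M-token prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return round(input_cost + output_cost, 4)


# Illustrative ~50K-token analysis, assumed split: 40K input / 10K output
low_end = estimate_cost_usd(40_000, 10_000, 3.0, 15.0)    # cheaper price pair
high_end = estimate_cost_usd(40_000, 10_000, 15.0, 75.0)  # pricier price pair
print(f"${low_end:.2f} - ${high_end:.2f}")
```

Agents typically resend accumulated context at each step, so count tokens across all calls in a run rather than per prompt.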

## OpenAI

### Overview

OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.

### Setup

1. **Obtain API Key:**
   - Sign up at https://platform.openai.com/
   - Navigate to API Keys
   - Create new secret key

2. **Configure Environment:**

   ```bash
   export OPENAI_API_KEY="sk-proj-..."
   ```

   Or in `.env`:
   ```bash
   OPENAI_API_KEY=sk-proj-...
   ```

3. **Set Model:**
   ```python
   from biomni.config import default_config

   default_config.llm = "gpt-4o"          # Recommended
   # default_config.llm = "gpt-4"         # Previous flagship
   # default_config.llm = "gpt-4-turbo"   # Fast variant
   # default_config.llm = "gpt-3.5-turbo" # Budget option
   ```

### Available Models

| Model | Context Window | Strengths | Cost |
|-------|---------------|-----------|------|
| `gpt-4o` | 128K tokens | Fast, multimodal | Medium |
| `gpt-4-turbo` | 128K tokens | Fast inference | Medium |
| `gpt-4` | 8K tokens | Reliable | High |
| `gpt-3.5-turbo` | 16K tokens | Fast, cheap | Low |

### Cost Optimization

```python
from biomni.config import default_config

# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"

# For production analysis (quality-focused)
# default_config.llm = "gpt-4o"
```

## Azure OpenAI

### Overview

Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.

### Setup

1. **Azure Prerequisites:**
   - Active Azure subscription
   - Azure OpenAI resource created
   - Model deployment configured

2. **Environment Variables:**
   ```bash
   export AZURE_OPENAI_API_KEY="your-key"
   export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
   export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
   ```

3. **Configuration:**
   ```python
   from biomni.config import default_config

   # Option 1: Use your deployment name (the part after "azure/")
   default_config.llm = "azure/your-deployment-name"

   # Option 2: Also set the endpoint explicitly ("gpt-4" here is the deployment name)
   default_config.llm = "azure/gpt-4"
   default_config.api_base = "https://your-resource.openai.azure.com/"
   ```

### Deployment Setup

Azure OpenAI requires explicit model deployments:

1. Navigate to Azure OpenAI Studio
2. Create deployment for desired model (e.g., GPT-4)
3. Note the deployment name
4. Use deployment name in Biomni configuration

### Example Configuration

```python
from biomni.config import default_config
import os

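# Set Azure credentials (placeholders; prefer exporting these in your shell)
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'
os.environ['AZURE_OPENAI_API_VERSION'] = '2024-02-15-preview'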

# Configure Biomni to use Azure deployment
default_config.llm = "azure/gpt-4-biomni"  # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
```
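
A missing variable, or a model name used where a deployment name is expected, is a common cause of Azure errors, so it can help to check the environment before configuring Biomni. The helper names below (`missing_azure_vars`, `azure_llm_string`) are hypothetical. The variable names come from the setup step above, and the `azure/<deployment-name>` format follows the examples in this section.

```python
import os
from typing import Mapping

REQUIRED_AZURE_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
)


def missing_azure_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required Azure variables that are unset or empty."""
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]


def azure_llm_string(deployment_name: str) -> str:
    """Build the Biomni model string from an Azure *deployment* name."""
    name = deployment_name.strip()
    if not name:
        raise ValueError("deployment name must be non-empty")
    return f"azure/{name}"


missing = missing_azure_vars()
if missing:
    print("Missing Azure settings:", ", ".join(missing))
```

If nothing is reported missing, assign `azure_llm_string("gpt-4-biomni")` (with your own deployment name) to `default_config.llm`.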

## Google Gemini

### Overview

Google's Gemini models offer multimodal capabilities and competitive performance.

### Setup

1. **Obtain API Key:**
   - Visit https://makersuite.google.com/app/apikey
   - Create new API key

2. **Environment Configuration:**
   ```bash
   export GEMINI_API_KEY="your-key"
   ```

3. **Set Model:**
   ```python
   from biomni.config import default_config

   default_config.llm = "gemini/gemini-1.5-pro"
   # Or: default_config.llm = "gemini/gemini-pro"
   ```

### Available Models

| Model | Context Window | Strengths |
|-------|---------------|-----------|
| `gemini/gemini-1.5-pro` | 1M tokens | Very large context, multimodal |
| `gemini/gemini-pro` | 32K tokens | Balanced performance |

### Use Cases

Gemini excels at:
- Tasks requiring very large context windows
- Multimodal analysis (when incorporating images)
- Cost-effective alternative to GPT-4

```python
from biomni.config import default_config

# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400  # May need longer timeout
```
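
A rough size check helps decide whether a task actually needs Gemini 1.5 Pro's 1M-token window. The sketch below assumes the common rule of thumb of about 4 characters per token for English text (real tokenizers vary). It uses context windows from the tables in this guide, treated as approximate. The helper names and the 20% output headroom are hypothetical choices.

```python
import math
from typing import Optional

# Context windows (tokens) as listed in this guide's tables, treated as approximate
CONTEXT_WINDOWS = {
    "gemini/gemini-pro": 32_000,
    "gpt-4o": 128_000,
    "claude-sonnet-4-20250514": 200_000,
    "gemini/gemini-1.5-pro": 1_000_000,
}


def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count; ~4 characters per token is a heuristic, not a tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def pick_model_for_context(prompt_tokens: int, headroom: float = 0.2) -> Optional[str]:
    """Return the smallest-window model that fits the prompt plus output headroom."""
    needed = prompt_tokens * (1 + headroom)
    for model, window in sorted(CONTEXT_WINDOWS.items(), key=lambda kv: kv[1]):
        if needed <= window:
            return model
    return None  # nothing listed is large enough; split the task instead
```

The headroom reserves space for the model's reply. The 20% default is arbitrary and may need raising for long generated reports.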

## Groq

### Overview

Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.

### Setup

1. **Get API Key:**
   - Sign up at https://console.groq.com/
   - Generate API key

2. **Configure:**
   ```bash
   export GROQ_API_KEY="gsk_..."
   ```

3. **Set Model:**
   ```python
   from biomni.config import default_config

   default_config.llm = "groq/llama-3.1-70b-versatile"
   # Or: default_config.llm = "groq/mixtral-8x7b-32768"
   ```

### Available Models

| Model | Context Window | Speed | Quality |
|-------|---------------|-------|---------|
| `groq/llama-3.1-70b-versatile` | 32K tokens | Very Fast | Good |
| `groq/mixtral-8x7b-32768` | 32K tokens | Very Fast | Good |
| `groq/llama3-70b-8192` | 8K tokens | Ultra Fast | Moderate |

### Best Practices

```python
from biomni.config import default_config

# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600  # Groq is fast

# Note: Quality may be lower than GPT-4/Claude for complex tasks
# Recommended for: QC, simple analyses, testing workflows
```

## Ollama (Local Deployment)

### Overview

Run LLMs entirely locally for offline use, data privacy, or cost savings.

### Setup

1. **Install Ollama:**
   ```bash
   # macOS/Linux
   curl -fsSL https://ollama.com/install.sh | sh

   # Or download from https://ollama.com/download
   ```

2. **Pull Models:**
   ```bash
   ollama pull llama3       # Meta Llama 3 (8B)
   ollama pull mixtral      # Mixtral (47B)
   ollama pull codellama    # Code-specialized
   ollama pull medllama2    # Medical domain (if available)
   ```

3. **Start Ollama Server:**
   ```bash
   ollama serve  # Runs on http://localhost:11434
   ```

4. **Configure Biomni:**
   ```python
   from biomni.config import default_config

   default_config.llm = "ollama/llama3"
   default_config.api_base = "http://localhost:11434"
   ```
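
Before pointing Biomni at Ollama, you can confirm that the server is up and that the configured model has been pulled. This sketch queries Ollama's `GET /api/tags` endpoint, which lists locally available models. Ollama reports names with a tag (for example `llama3:latest`). The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def parse_ollama_tags(payload: str) -> list[str]:
    """Extract model names from an /api/tags JSON response."""
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", [])]


def local_ollama_models(base_url: str = "http://localhost:11434",
                        timeout: float = 3.0) -> Optional[list[str]]:
    """Return pulled model names, or None if the server can't be reached or parsed."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_ollama_tags(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def has_model(models: list[str], name: str) -> bool:
    """Match an exact tag, or a bare name like 'llama3' against any of its tags."""
    return any(m == name or m.split(":", 1)[0] == name for m in models)
```

For example, `has_model(local_ollama_models() or [], "llama3")` should be `True` once `ollama pull llama3` has finished and `ollama serve` is running.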

### Hardware Requirements

Minimum recommendations:
- **8B models:** 16GB RAM, CPU inference acceptable
- **70B models:** 64GB RAM, GPU highly recommended
- **Storage:** 5-50GB per model
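
To sanity-check memory figures like these, multiply parameter count by bits per parameter. That gives the memory for the weights alone. KV cache and runtime overhead come on top, so treat the result as a lower bound. Ollama's default model tags are usually 4-bit quantized, so both widths are shown. The helper name is hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory needed just to hold model weights, in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and framework overhead, so the result
    is a lower bound on what the runtime actually needs.
    """
    return num_params * bits_per_param / 8 / 1e9


for label, params in [("8B", 8e9), ("32B (Biomni-R0)", 32e9), ("70B", 70e9)]:
    fp16 = weight_memory_gb(params, 16)
    q4 = weight_memory_gb(params, 4)
    print(f"{label:>16}: ~{fp16:.0f} GB at 16-bit, ~{q4:.0f} GB at 4-bit")
```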

### Model Selection

```python
# Fast, local, good for testing
default_config.llm = "ollama/llama3"

# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"

# Code generation tasks
default_config.llm = "ollama/codellama"
```

### Advantages & Limitations

**Advantages:**
- Complete data privacy
- No API costs
- Offline operation
- Unlimited usage

**Limitations:**
- Lower quality than GPT-4/Claude for complex tasks
- Requires significant hardware
- Slower inference (especially on CPU)
- May struggle with specialized biomedical knowledge

## AWS Bedrock

### Overview

AWS-managed LLM service offering multiple model providers.

### Setup

1. **AWS Prerequisites:**
   - AWS account with Bedrock access
   - Model access enabled in Bedrock console
   - AWS credentials configured

2. **Configure AWS Credentials:**
   ```bash
   # Option 1: AWS CLI
   aws configure

   # Option 2: Environment variables
   export AWS_ACCESS_KEY_ID="your-key"
   export AWS_SECRET_ACCESS_KEY="your-secret"
   export AWS_REGION="us-east-1"
   ```

3. **Enable Model Access:**
   - Navigate to AWS Bedrock console
   - Request access to desired models
   - Wait for approval (may take hours/days)

4. **Configure Biomni:**
   ```python
   from biomni.config import default_config

   default_config.llm = "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
   # Or: default_config.llm = "bedrock/anthropic.claude-v2"
   ```

### Available Models

Bedrock provides access to:
- Anthropic Claude models
- Amazon Titan models
- AI21 Jurassic models
- Cohere Command models
- Meta Llama models

### IAM Permissions

Required IAM policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```
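
The wildcard policy above allows every foundation model in every region. To scope it down, foundation-model ARNs follow the pattern `arn:aws:bedrock:<region>::foundation-model/<model-id>`. The sketch below (hypothetical helper `bedrock_invoke_policy`) generates such a policy for an explicit list of model IDs. The region and model ID in the example are assumptions; use the exact IDs shown in your Bedrock console.

```python
import json


def bedrock_invoke_policy(region: str, model_ids: list[str]) -> str:
    """Build an IAM policy JSON that allows invoking only the listed models."""
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(bedrock_invoke_policy("us-east-1", ["anthropic.claude-3-sonnet-20240229-v1:0"]))
```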

### Example Configuration

```python
from biomni.config import default_config
import boto3

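# Verify AWS credentials (get_credentials() returns None if none are found)
session = boto3.Session()
credentials = session.get_credentials()
if credentials is None:
    raise RuntimeError("No AWS credentials found; run `aws configure` or set the AWS_* variables")
print(f"AWS Access Key: {credentials.access_key[:8]}...")

# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
default_config.timeout_seconds = 1800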
```

## Biomni-R0 (Local Specialized Model)

### Overview

Biomni-R0 is a 32B-parameter reasoning model trained specifically for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.

### Setup

1. **Hardware Requirements:**
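   - GPU memory for the weights plus KV cache: at 16-bit precision, 32B parameters alone occupy about 64 GB, so plan on an 80GB GPU (e.g., A100 80GB, H100) unless you run a quantized variant
   - Or a multi-GPU setup with tensor parallelism (`--tp`, shown below) whose combined VRAM comfortably exceeds that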
   - 100GB+ storage for model weights

2. **Install Dependencies:**
   ```bash
   pip install "sglang[all]"
   pip install flashinfer  # Optional but recommended
   ```

3. **Deploy Model:**
   ```bash
   python -m sglang.launch_server \
       --model-path snap-stanford/biomni-r0 \
       --host 0.0.0.0 \
       --port 30000 \
       --trust-remote-code \
       --mem-fraction-static 0.8
   ```

   For multi-GPU:
   ```bash
   python -m sglang.launch_server \
       --model-path snap-stanford/biomni-r0 \
       --host 0.0.0.0 \
       --port 30000 \
       --trust-remote-code \
       --tp 2  # Tensor parallelism across 2 GPUs
   ```

4. **Configure Biomni:**
   ```python
   from biomni.config import default_config

   default_config.llm = "openai/biomni-r0"
   default_config.api_base = "http://localhost:30000/v1"
   default_config.timeout_seconds = 2400  # Longer for complex reasoning
   ```
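
The `/v1` suffix in `api_base` points at sglang's OpenAI-compatible routes. One way to confirm the deployment is therefore to list models at `GET /v1/models`, which in the OpenAI format returns `{"data": [{"id": ...}, ...]}`. This sketch assumes that endpoint is available in your sglang version. The helper names are hypothetical. Compare the reported ID with the model you configured.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def parse_model_ids(payload: str) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in json.loads(payload).get("data", [])]


def served_model_ids(api_base: str = "http://localhost:30000/v1",
                     timeout: float = 5.0) -> Optional[list[str]]:
    """Return model IDs served at api_base, or None if unreachable or unparseable."""
    url = api_base.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_ids(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None
```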

### When to Use Biomni-R0

Biomni-R0 excels at:
- Multi-step biological reasoning
- Complex experimental design
- Hypothesis generation and evaluation
- Literature-informed analysis
- Tasks requiring deep biological knowledge

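```python
from biomni.agent import A1
from biomni.config import default_config

# For complex biological reasoning tasks (assumes the sglang server above is running)
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
agent = A1(path='./data')
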
agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")
```

### Performance Comparison

| Model | Speed | Biological Reasoning | Code Quality | Cost |
|-------|-------|---------------------|--------------|------|
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |

## Multi-Provider Strategy

### Intelligent Model Selection

Use different models for different task types:

```python
from biomni.agent import A1
from biomni.config import default_config

# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
    if task_complexity == "simple":
        default_config.llm = "gpt-3.5-turbo"
        default_config.timeout_seconds = 300
    elif task_complexity == "medium":
        default_config.llm = "claude-sonnet-4-20250514"
        default_config.timeout_seconds = 1200
    else:  # complex
        default_config.llm = "openai/biomni-r0"
        default_config.timeout_seconds = 2400

    return A1(path='./data')

# Strategy 2: Fallback on failure
def execute_with_fallback(task):
    models = [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "claude-opus-4-20250514"
    ]

    last_error = None
    for model in models:
        try:
            default_config.llm = model
            agent = A1(path='./data')
            return agent.go(task)  # result from the first model that succeeds
        except Exception as e:
            print(f"Failed with {model}: {e}, trying next...")
            last_error = e

    raise RuntimeError("All models failed") from last_error
```

### Cost Optimization Strategy

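```python
from biomni.agent import A1
from biomni.config import default_config

# Create a new agent after each model switch so the new setting is picked up

# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent = A1(path='./data')
agent.go("Quick exploratory analysis of dataset structure")

# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent = A1(path='./data')
agent.go("Comprehensive differential expression analysis with pathway enrichment")

# Phase 3: Complex reasoning with specialized models (local Biomni-R0 server)
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
agent = A1(path='./data')
agent.go("Generate biological hypotheses based on multi-omics integration")
```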

## Troubleshooting

### Common Issues

**Issue: "API key not found"**
- Verify environment variable is set: `echo $ANTHROPIC_API_KEY`
- Check `.env` file exists and is in correct location
- Try setting the key programmatically before creating the agent: `os.environ['ANTHROPIC_API_KEY'] = 'key'`
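
To see which providers have credentials visible to the current Python process, you can check the environment variables named in this guide. The mapping and the helper `configured_providers` are illustrative. For Bedrock, credentials may instead come from `aws configure` profiles, which this check does not see. The check reports only set/missing and never prints key values.

```python
import os
from typing import Mapping

# Environment variables used by each provider, as listed in this guide
PROVIDER_ENV_VARS = {
    "Anthropic": ["ANTHROPIC_API_KEY"],
    "OpenAI": ["OPENAI_API_KEY"],
    "Azure OpenAI": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "Google Gemini": ["GEMINI_API_KEY"],
    "Groq": ["GROQ_API_KEY"],
    "AWS Bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def configured_providers(env: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Report, per provider, whether all of its variables are set and non-empty."""
    return {
        provider: all(env.get(var) for var in variables)
        for provider, variables in PROVIDER_ENV_VARS.items()
    }


for provider, ok in configured_providers().items():
    print(f"{provider:<14} {'set' if ok else 'missing'}")
```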

**Issue: "Rate limit exceeded"**
- Implement exponential backoff and retry
- Upgrade API tier if available
- Switch to alternative provider temporarily
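
Exponential backoff can wrap any call, including `agent.go(...)`. The sketch below is a generic helper (hypothetical name `retry_with_backoff`). Exception types differ across provider SDKs, so it retries on whatever exception classes you pass in rather than assuming a specific rate-limit error. Optional full jitter randomizes each delay, a common refinement when many clients retry at once.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    jitter: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with delays base, 2*base, 4*base, ... capped at max_delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            if jitter:
                delay = random.uniform(0, delay)  # "full jitter" variant
            sleep(delay)
    # Only reached when the loop body never ran
    raise ValueError("max_attempts must be at least 1")
```

For example, `retry_with_backoff(lambda: agent.go(task), max_attempts=4)`. Passing a custom `sleep` makes the helper easy to test without real waiting.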

**Issue: "Model not found"**
- Verify model identifier is correct
- Check API key has access to requested model
- For Azure: ensure deployment exists with exact name

**Issue: "Timeout errors"**
- Increase `default_config.timeout_seconds`
- Break complex tasks into smaller steps
- Consider using faster model for initial phases

**Issue: "Connection refused (Ollama/Biomni-R0)"**
- Verify local server is running
- Check port is not blocked by firewall
- Confirm `api_base` URL is correct

### Testing Configuration

```python
from biomni.utils import list_available_models, validate_environment

# Check environment setup
status = validate_environment()
print("Environment Status:", status)

# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)

# Test specific model
try:
    from biomni.agent import A1
    agent = A1(path='./data', llm='claude-sonnet-4-20250514')
    agent.go("Print 'Configuration successful!'")
except Exception as e:
    print(f"Configuration test failed: {e}")
```

## Best Practices Summary

1. **For most users:** Start with Claude Sonnet 4 or GPT-4o
2. **For cost sensitivity:** Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production
3. **For privacy/offline:** Deploy Ollama locally
4. **For complex reasoning:** Use Biomni-R0 if hardware available
5. **For enterprise:** Consider Azure OpenAI or AWS Bedrock
6. **For speed:** Use Groq for rapid iteration

7. **Always:**
   - Set appropriate timeouts
   - Implement error handling and retries
   - Log model and configuration for reproducibility
   - Test configuration before production use
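
For the reproducibility point, one lightweight option is to write the model name and key settings to a JSON file next to your results. The sketch below works on any object with attributes, such as `default_config`. The field list is an assumption based on the settings used in this guide. Missing attributes are recorded as `null` rather than raising, and API keys are deliberately not captured.

```python
import json
import platform
from datetime import datetime, timezone
from types import SimpleNamespace

# Settings used throughout this guide; adjust to what your config exposes
DEFAULT_FIELDS = ("llm", "timeout_seconds", "api_base")


def config_snapshot(cfg: object, fields=DEFAULT_FIELDS) -> dict:
    """Capture selected config attributes plus a UTC timestamp and Python version."""
    snapshot = {name: getattr(cfg, name, None) for name in fields}
    snapshot["recorded_at"] = datetime.now(timezone.utc).isoformat()
    snapshot["python_version"] = platform.python_version()
    return snapshot


def save_snapshot(cfg: object, path: str) -> None:
    """Write the snapshot as pretty-printed JSON."""
    with open(path, "w") as fh:
        json.dump(config_snapshot(cfg), fh, indent=2, default=str)


# Example with a stand-in object; pass biomni's default_config in real use
example_cfg = SimpleNamespace(llm="claude-sonnet-4-20250514", timeout_seconds=1800)
print(json.dumps(config_snapshot(example_cfg), indent=2))
```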