Add more scientific skills

Timothy Kassis
2025-10-19 14:12:02 -07:00
parent 78d5ac2b56
commit 660c8574d0
210 changed files with 88957 additions and 1 deletions

# Biomni API Reference
This document provides comprehensive API documentation for the Biomni biomedical AI agent system.
## Core Classes
### A1 Agent
The primary agent class for executing biomedical research tasks.
#### Initialization
```python
from biomni.agent import A1
agent = A1(
    path='./data',                   # Path to biomedical knowledge base
    llm='claude-sonnet-4-20250514',  # LLM model identifier
    timeout=None,                    # Optional timeout in seconds
    verbose=True                     # Enable detailed logging
)
```
**Parameters:**
- `path` (str, required): Directory path where the biomedical knowledge base is stored or will be downloaded. First-time initialization will download ~11GB of data.
- `llm` (str, optional): LLM model identifier. Defaults to the value in `default_config.llm`. Supports multiple providers (see LLM Providers section).
- `timeout` (int, optional): Maximum execution time in seconds for agent operations. Overrides `default_config.timeout_seconds`.
- `verbose` (bool, optional): Enable verbose logging for debugging. Default: True.
**Returns:** A1 agent instance ready for task execution.
#### Methods
##### `go(task_description: str) -> None`
Execute a biomedical research task autonomously.
```python
agent.go("Analyze this scRNA-seq dataset and identify cell types")
```
**Parameters:**
- `task_description` (str, required): Natural language description of the biomedical task to execute. Be specific about:
- Data location and format
- Desired analysis or output
- Any specific methods or parameters
- Expected results format
**Behavior:**
1. Decomposes the task into executable steps
2. Retrieves relevant biomedical knowledge from the data lake
3. Generates and executes Python/R code
4. Provides results and visualizations
5. Handles errors and retries with refinement
**Notes:**
- Executes generated code with full system privileges; run the agent in a sandboxed environment
- Long-running tasks may require timeout adjustments
- Intermediate results are displayed during execution
##### `save_conversation_history(output_path: str, format: str = 'pdf') -> None`
Export conversation history and execution trace as a formatted report.
```python
agent.save_conversation_history(
    output_path='./reports/analysis_log.pdf',
    format='pdf'
)
```
**Parameters:**
- `output_path` (str, required): File path for the output report
- `format` (str, optional): Output format. Options: 'pdf', 'markdown'. Default: 'pdf'
**Requirements:**
- For PDF: Install one of: WeasyPrint, markdown2pdf, or Pandoc
```bash
pip install weasyprint # Recommended
# or
pip install markdown2pdf
# or install Pandoc system-wide
```
**Report Contents:**
- Task description and parameters
- Retrieved biomedical knowledge
- Generated code with execution traces
- Results, visualizations, and outputs
- Timestamps and execution metadata
##### `add_mcp(config_path: str) -> None`
Add Model Context Protocol (MCP) tools to extend agent capabilities.
```python
agent.add_mcp(config_path='./mcp_tools_config.json')
```
**Parameters:**
- `config_path` (str, required): Path to MCP configuration JSON file
**MCP Configuration Format:**
```json
{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {
        "param1": "string",
        "param2": "integer"
      }
    }
  ]
}
```
**Use Cases:**
- Connect to laboratory information systems
- Integrate proprietary databases
- Access specialized computational resources
- Link to institutional data repositories
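As a sketch of the flow, a configuration matching the format above can be built and written programmatically before registration. The tool name, endpoint, and parameters below are illustrative, not part of the Biomni API:

```python
import json

# Hypothetical MCP tool entry; replace name/endpoint/parameters with your own.
mcp_config = {
    "tools": [
        {
            "name": "lims_lookup",
            "endpoint": "http://localhost:8000/tool",
            "description": "Look up sample metadata in a lab LIMS",
            "parameters": {"sample_id": "string"}
        }
    ]
}

# Write the config file, then register it with an existing agent
with open("mcp_tools_config.json", "w") as f:
    json.dump(mcp_config, f, indent=2)

# agent.add_mcp(config_path="mcp_tools_config.json")
```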
## Configuration
### default_config
Global configuration object for Biomni settings.
```python
from biomni.config import default_config
```
#### Attributes
##### `llm: str`
Default LLM model identifier for all agent instances.
```python
default_config.llm = "claude-sonnet-4-20250514"
```
**Supported Models:**
**Anthropic:**
- `claude-sonnet-4-20250514` (Recommended)
- `claude-opus-4-20250514`
- `claude-3-5-sonnet-20241022`
- `claude-3-opus-20240229`
**OpenAI:**
- `gpt-4o`
- `gpt-4`
- `gpt-4-turbo`
- `gpt-3.5-turbo`
**Azure OpenAI:**
- `azure/gpt-4`
- `azure/<deployment-name>`
**Google Gemini:**
- `gemini/gemini-pro`
- `gemini/gemini-1.5-pro`
**Groq:**
- `groq/llama-3.1-70b-versatile`
- `groq/mixtral-8x7b-32768`
**Ollama (Local):**
- `ollama/llama3`
- `ollama/mistral`
- `ollama/<model-name>`
**AWS Bedrock:**
- `bedrock/anthropic.claude-v2`
- `bedrock/anthropic.claude-3-sonnet`
**Custom/Biomni-R0:**
- `openai/biomni-r0` (requires local SGLang deployment)
##### `timeout_seconds: int`
Default timeout for agent operations in seconds.
```python
default_config.timeout_seconds = 1200 # 20 minutes
```
**Recommended Values:**
- Simple tasks (QC, basic analysis): 300-600 seconds
- Medium tasks (differential expression, clustering): 600-1200 seconds
- Complex tasks (full pipelines, ML models): 1200-3600 seconds
- Very complex tasks: 3600+ seconds
##### `data_path: str`
Default path to biomedical knowledge base.
```python
default_config.data_path = "/path/to/biomni/data"
```
**Storage Requirements:**
- Initial download: ~11GB
- Extracted size: ~15GB
- Additional working space: ~5-10GB recommended
##### `api_base: str`
Custom API endpoint for LLM providers (advanced usage).
```python
# For local Biomni-R0 deployment
default_config.api_base = "http://localhost:30000/v1"
# For custom OpenAI-compatible endpoints
default_config.api_base = "https://your-endpoint.com/v1"
```
##### `max_retries: int`
Number of retry attempts for failed operations.
```python
default_config.max_retries = 3
```
#### Methods
##### `reset() -> None`
Reset all configuration values to system defaults.
```python
default_config.reset()
```
## Database Query System
Biomni includes a retrieval-augmented generation (RAG) system for querying the biomedical knowledge base.
### Query Functions
#### `query_genes(query: str, top_k: int = 10) -> List[Dict]`
Query gene information from integrated databases.
```python
from biomni.database import query_genes
results = query_genes(
    query="genes involved in p53 pathway",
    top_k=20
)
```
**Parameters:**
- `query` (str): Natural language or gene identifier query
- `top_k` (int): Number of results to return
**Returns:** List of dictionaries containing:
- `gene_symbol`: Official gene symbol
- `gene_name`: Full gene name
- `description`: Functional description
- `pathways`: Associated biological pathways
- `go_terms`: Gene Ontology annotations
- `diseases`: Associated diseases
- `similarity_score`: Relevance score (0-1)
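A common follow-up is to filter results by `similarity_score`. The sketch below mocks the result list with the documented keys; in practice it would come from `query_genes(...)`:

```python
# Mocked results with the documented keys (values are illustrative)
results = [
    {"gene_symbol": "TP53", "similarity_score": 0.97},
    {"gene_symbol": "MDM2", "similarity_score": 0.91},
    {"gene_symbol": "CDKN1A", "similarity_score": 0.64},
]

# Keep only high-confidence hits for downstream use
strong_hits = [r["gene_symbol"] for r in results if r["similarity_score"] >= 0.9]
print(strong_hits)  # ['TP53', 'MDM2']
```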
#### `query_proteins(query: str, top_k: int = 10) -> List[Dict]`
Query protein information from UniProt and other sources.
```python
from biomni.database import query_proteins
results = query_proteins(
    query="kinase proteins in cell cycle",
    top_k=15
)
```
**Returns:** List of dictionaries with protein metadata:
- `uniprot_id`: UniProt accession
- `protein_name`: Protein name
- `function`: Functional annotation
- `domains`: Protein domains
- `subcellular_location`: Cellular localization
- `similarity_score`: Relevance score
#### `query_drugs(query: str, top_k: int = 10) -> List[Dict]`
Query drug and compound information.
```python
from biomni.database import query_drugs
results = query_drugs(
    query="FDA approved cancer drugs targeting EGFR",
    top_k=10
)
```
**Returns:** Drug information including:
- `drug_name`: Common name
- `drugbank_id`: DrugBank identifier
- `indication`: Therapeutic indication
- `mechanism`: Mechanism of action
- `targets`: Molecular targets
- `approval_status`: Regulatory status
- `smiles`: Chemical structure (SMILES notation)
#### `query_diseases(query: str, top_k: int = 10) -> List[Dict]`
Query disease information from clinical databases.
```python
from biomni.database import query_diseases
results = query_diseases(
    query="autoimmune diseases affecting joints",
    top_k=10
)
```
**Returns:** Disease data:
- `disease_name`: Standard disease name
- `disease_id`: Ontology identifier
- `symptoms`: Clinical manifestations
- `associated_genes`: Genetic associations
- `prevalence`: Epidemiological data
#### `query_pathways(query: str, top_k: int = 10) -> List[Dict]`
Query biological pathways from KEGG, Reactome, and other sources.
```python
from biomni.database import query_pathways
results = query_pathways(
    query="immune response signaling pathways",
    top_k=15
)
```
**Returns:** Pathway information:
- `pathway_name`: Pathway name
- `pathway_id`: Database identifier
- `genes`: Genes in pathway
- `description`: Functional description
- `source`: Database source (KEGG, Reactome, etc.)
## Data Structures
### TaskResult
Result object returned by complex agent operations.
```python
class TaskResult:
    success: bool          # Whether task completed successfully
    output: Any            # Task output (varies by task)
    code: str              # Generated code
    execution_time: float  # Execution time in seconds
    error: Optional[str]   # Error message if failed
    metadata: Dict         # Additional metadata
```
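A minimal handling pattern for such a result might look as follows. The dataclass here is a stand-in mirroring the fields above purely for illustration; the real class lives in biomni:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Stand-in mirroring the documented TaskResult fields
@dataclass
class TaskResult:
    success: bool
    output: Any = None
    code: str = ""
    execution_time: float = 0.0
    error: Optional[str] = None
    metadata: Dict = field(default_factory=dict)

def summarize(result: TaskResult) -> str:
    # Branch on success and surface timing or the error message
    if result.success:
        return f"completed in {result.execution_time:.1f}s"
    return f"failed: {result.error}"

print(summarize(TaskResult(success=True, execution_time=12.3)))  # completed in 12.3s
```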
### BiomedicalEntity
Base class for biomedical entities in the knowledge base.
```python
class BiomedicalEntity:
    entity_id: str         # Unique identifier
    entity_type: str       # Type (gene, protein, drug, etc.)
    name: str              # Entity name
    description: str       # Description
    attributes: Dict       # Additional attributes
    references: List[str]  # Literature references
```
## Utility Functions
### `download_data(path: str, force: bool = False) -> None`
Manually download or update the biomedical knowledge base.
```python
from biomni.utils import download_data
download_data(
    path='./data',
    force=True  # Force re-download
)
```
### `validate_environment() -> Dict[str, bool]`
Check if the environment is properly configured.
```python
from biomni.utils import validate_environment
status = validate_environment()
# Returns: {
# 'conda_env': True,
# 'api_keys': True,
# 'data_available': True,
# 'dependencies': True
# }
```
### `list_available_models() -> List[str]`
Get a list of available LLM models based on configured API keys.
```python
from biomni.utils import list_available_models
models = list_available_models()
# Returns: ['claude-sonnet-4-20250514', 'gpt-4o', ...]
```
## Error Handling
### Common Exceptions
#### `BiomniConfigError`
Raised when configuration is invalid or incomplete.
```python
from biomni.exceptions import BiomniConfigError
try:
    agent = A1(path='./data')
except BiomniConfigError as e:
    print(f"Configuration error: {e}")
```
#### `BiomniExecutionError`
Raised when code generation or execution fails.
```python
from biomni.exceptions import BiomniExecutionError
try:
    agent.go("invalid task")
except BiomniExecutionError as e:
    print(f"Execution failed: {e}")
    # Access failed code: e.code
    # Access error details: e.details
```
#### `BiomniDataError`
Raised when knowledge base or data access fails.
```python
from biomni.exceptions import BiomniDataError
try:
    results = query_genes("unknown query format")
except BiomniDataError as e:
    print(f"Data access error: {e}")
```
#### `BiomniTimeoutError`
Raised when operations exceed timeout limit.
```python
from biomni.exceptions import BiomniTimeoutError
try:
    agent.go("very complex long-running task")
except BiomniTimeoutError as e:
    print(f"Task timed out after {e.duration} seconds")
    # Partial results may be available: e.partial_results
```
## Best Practices
### Efficient Knowledge Retrieval
Pre-query databases for relevant context before complex tasks:
```python
from biomni.database import query_genes, query_pathways
# Gather relevant biological context first
genes = query_genes("cell cycle genes", top_k=50)
pathways = query_pathways("cell cycle regulation", top_k=20)
# Then execute task with enriched context
agent.go(f"""
Analyze the cell cycle progression in this dataset.
Focus on these genes: {[g['gene_symbol'] for g in genes]}
Consider these pathways: {[p['pathway_name'] for p in pathways]}
""")
```
### Error Recovery
Implement robust error handling for production workflows:
```python
from biomni.config import default_config
from biomni.exceptions import BiomniExecutionError, BiomniTimeoutError

max_attempts = 3
for attempt in range(max_attempts):
    try:
        agent.go("complex biomedical task")
        break
    except BiomniTimeoutError:
        # Increase timeout and retry
        default_config.timeout_seconds *= 2
        print(f"Timeout, retrying with {default_config.timeout_seconds}s timeout")
    except BiomniExecutionError as e:
        # Refine task based on error
        print(f"Execution failed: {e}, refining task...")
        # Optionally modify task description
else:
    print("Task failed after max attempts")
```
### Memory Management
For large-scale analyses, manage memory explicitly:
```python
import gc

# Process datasets in chunks
for chunk_id in range(num_chunks):
    agent.go(f"Process data chunk {chunk_id} located at data/chunk_{chunk_id}.h5ad")

    # Force garbage collection between chunks
    gc.collect()

    # Save intermediate results
    agent.save_conversation_history(f"./reports/chunk_{chunk_id}.pdf")
```
### Reproducibility
Ensure reproducible analyses by:
1. **Fixing random seeds:**
```python
agent.go("Set random seed to 42 for all analyses, then perform clustering...")
```
2. **Logging configuration:**
```python
import json
from datetime import datetime

from biomni.config import default_config

config_log = {
    'llm': default_config.llm,
    'timeout': default_config.timeout_seconds,
    'data_path': default_config.data_path,
    'timestamp': datetime.now().isoformat()
}
with open('config_log.json', 'w') as f:
    json.dump(config_log, f, indent=2)
```
3. **Saving execution traces:**
```python
# Always save detailed reports
agent.save_conversation_history('./reports/full_analysis.pdf')
```
## Performance Optimization
### Model Selection Strategy
Choose models based on task characteristics:
```python
# For exploratory, simple tasks
default_config.llm = "gpt-3.5-turbo" # Fast, cost-effective
# For standard biomedical analyses
default_config.llm = "claude-sonnet-4-20250514" # Recommended
# For complex reasoning and hypothesis generation
default_config.llm = "claude-opus-4-20250514" # Highest quality
# For specialized biological reasoning
default_config.llm = "openai/biomni-r0" # Requires local deployment
```
### Timeout Tuning
Set appropriate timeouts based on task complexity:
```python
# Quick queries and simple analyses
agent = A1(path='./data', timeout=300)
# Standard workflows
agent = A1(path='./data', timeout=1200)
# Full pipelines with ML training
agent = A1(path='./data', timeout=3600)
```
### Caching and Reuse
Reuse agent instances for multiple related tasks:
```python
# Create agent once
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Execute multiple related tasks
tasks = [
    "Load and QC the scRNA-seq dataset",
    "Perform clustering with resolution 0.5",
    "Identify marker genes for each cluster",
    "Annotate cell types based on markers"
]
for task in tasks:
    agent.go(task)
# Save complete workflow
agent.save_conversation_history('./reports/full_workflow.pdf')
```

# LLM Provider Configuration Guide
This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.
## Overview
Biomni supports multiple LLM providers through a unified interface. Configure providers using:
- Environment variables
- `.env` files
- Runtime configuration via `default_config`
## Quick Reference Table
| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
|----------|----------------|------------------|------|------------------|
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |
## Anthropic Claude (Recommended)
### Overview
Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.
### Setup
1. **Obtain API Key:**
- Sign up at https://console.anthropic.com/
- Navigate to API Keys section
- Generate a new key
2. **Configure Environment:**
**Option A: Environment Variable**
```bash
export ANTHROPIC_API_KEY="sk-ant-api03-..."
```
**Option B: .env File**
```bash
# .env file in project root
ANTHROPIC_API_KEY=sk-ant-api03-...
```
3. **Set Model in Code:**
```python
from biomni.config import default_config
# Claude Sonnet 4 (Recommended)
default_config.llm = "claude-sonnet-4-20250514"
# Claude Opus 4 (Most capable)
default_config.llm = "claude-opus-4-20250514"
# Claude 3.5 Sonnet (Previous version)
default_config.llm = "claude-3-5-sonnet-20241022"
```
### Available Models
| Model | Context Window | Strengths | Best For |
|-------|---------------|-----------|----------|
| `claude-sonnet-4-20250514` | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| `claude-opus-4-20250514` | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| `claude-3-5-sonnet-20241022` | 200K tokens | Fast, reliable | Standard workflows |
| `claude-3-opus-20240229` | 200K tokens | Strong reasoning | Legacy support |
### Advanced Configuration
```python
from biomni.config import default_config
# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800
# Optional: Custom API endpoint (for proxy/enterprise)
default_config.api_base = "https://your-proxy.com/v1"
```
### Cost Estimation
Approximate costs per 1M tokens (as of January 2025):
- Input: $3-15 depending on model
- Output: $15-75 depending on model
For a typical biomedical analysis (~50K tokens total): $0.50-$2.00
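As a back-of-envelope check against those figures, cost scales linearly with tokens at the per-1M-token rates above. The 40K/10K input/output split below is an assumption for illustration:

```python
# Rates are USD per 1M tokens (January 2025 figures quoted above)
def estimate_cost(input_tokens, output_tokens, in_rate, out_rate):
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# ~50K-token analysis, assumed split of 40K input / 10K output
low = estimate_cost(40_000, 10_000, in_rate=3, out_rate=15)    # cheapest rates
high = estimate_cost(40_000, 10_000, in_rate=15, out_rate=75)  # most expensive rates
print(f"${low:.2f} - ${high:.2f}")  # $0.27 - $1.35
```

Actual spend depends on the model chosen and how chatty the agent's intermediate steps are.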
## OpenAI
### Overview
OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.
### Setup
1. **Obtain API Key:**
- Sign up at https://platform.openai.com/
- Navigate to API Keys
- Create new secret key
2. **Configure Environment:**
```bash
export OPENAI_API_KEY="sk-proj-..."
```
Or in `.env`:
```
OPENAI_API_KEY=sk-proj-...
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "gpt-4o" # Recommended
# default_config.llm = "gpt-4" # Previous flagship
# default_config.llm = "gpt-4-turbo" # Fast variant
# default_config.llm = "gpt-3.5-turbo" # Budget option
```
### Available Models
| Model | Context Window | Strengths | Cost |
|-------|---------------|-----------|------|
| `gpt-4o` | 128K tokens | Fast, multimodal | Medium |
| `gpt-4-turbo` | 128K tokens | Fast inference | Medium |
| `gpt-4` | 8K tokens | Reliable | High |
| `gpt-3.5-turbo` | 16K tokens | Fast, cheap | Low |
### Cost Optimization
```python
# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"
# For production analysis (quality-focused)
default_config.llm = "gpt-4o"
```
## Azure OpenAI
### Overview
Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.
### Setup
1. **Azure Prerequisites:**
- Active Azure subscription
- Azure OpenAI resource created
- Model deployment configured
2. **Environment Variables:**
```bash
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
```
3. **Configuration:**
```python
from biomni.config import default_config
# Option 1: Use deployment name
default_config.llm = "azure/your-deployment-name"
# Option 2: Specify endpoint explicitly
default_config.llm = "azure/gpt-4"
default_config.api_base = "https://your-resource.openai.azure.com/"
```
### Deployment Setup
Azure OpenAI requires explicit model deployments:
1. Navigate to Azure OpenAI Studio
2. Create deployment for desired model (e.g., GPT-4)
3. Note the deployment name
4. Use deployment name in Biomni configuration
### Example Configuration
```python
from biomni.config import default_config
import os
# Set Azure credentials
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'
# Configure Biomni to use Azure deployment
default_config.llm = "azure/gpt-4-biomni" # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
```
## Google Gemini
### Overview
Google's Gemini models offer multimodal capabilities and competitive performance.
### Setup
1. **Obtain API Key:**
- Visit https://makersuite.google.com/app/apikey
- Create new API key
2. **Environment Configuration:**
```bash
export GEMINI_API_KEY="your-key"
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "gemini/gemini-1.5-pro"
# Or: default_config.llm = "gemini/gemini-pro"
```
### Available Models
| Model | Context Window | Strengths |
|-------|---------------|-----------|
| `gemini/gemini-1.5-pro` | 1M tokens | Very large context, multimodal |
| `gemini/gemini-pro` | 32K tokens | Balanced performance |
### Use Cases
Gemini excels at:
- Tasks requiring very large context windows
- Multimodal analysis (when incorporating images)
- Cost-effective alternative to GPT-4
```python
# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400 # May need longer timeout
```
## Groq
### Overview
Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.
### Setup
1. **Get API Key:**
- Sign up at https://console.groq.com/
- Generate API key
2. **Configure:**
```bash
export GROQ_API_KEY="gsk_..."
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "groq/llama-3.1-70b-versatile"
# Or: default_config.llm = "groq/mixtral-8x7b-32768"
```
### Available Models
| Model | Context Window | Speed | Quality |
|-------|---------------|-------|---------|
| `groq/llama-3.1-70b-versatile` | 32K tokens | Very Fast | Good |
| `groq/mixtral-8x7b-32768` | 32K tokens | Very Fast | Good |
| `groq/llama-3-70b-8192` | 8K tokens | Ultra Fast | Moderate |
### Best Practices
```python
# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600 # Groq is fast
# Note: Quality may be lower than GPT-4/Claude for complex tasks
# Recommended for: QC, simple analyses, testing workflows
```
## Ollama (Local Deployment)
### Overview
Run LLMs entirely locally for offline use, data privacy, or cost savings.
### Setup
1. **Install Ollama:**
```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com/download
```
2. **Pull Models:**
```bash
ollama pull llama3 # Meta Llama 3 (8B)
ollama pull mixtral # Mixtral (47B)
ollama pull codellama # Code-specialized
ollama pull medllama # Medical domain (if available)
```
3. **Start Ollama Server:**
```bash
ollama serve # Runs on http://localhost:11434
```
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "ollama/llama3"
default_config.api_base = "http://localhost:11434"
```
### Hardware Requirements
Minimum recommendations:
- **8B models:** 16GB RAM, CPU inference acceptable
- **70B models:** 64GB RAM, GPU highly recommended
- **Storage:** 5-50GB per model
### Model Selection
```python
# Fast, local, good for testing
default_config.llm = "ollama/llama3"
# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"
# Code generation tasks
default_config.llm = "ollama/codellama"
```
### Advantages & Limitations
**Advantages:**
- Complete data privacy
- No API costs
- Offline operation
- Unlimited usage
**Limitations:**
- Lower quality than GPT-4/Claude for complex tasks
- Requires significant hardware
- Slower inference (especially on CPU)
- May struggle with specialized biomedical knowledge
## AWS Bedrock
### Overview
AWS-managed LLM service offering multiple model providers.
### Setup
1. **AWS Prerequisites:**
- AWS account with Bedrock access
- Model access enabled in Bedrock console
- AWS credentials configured
2. **Configure AWS Credentials:**
```bash
# Option 1: AWS CLI
aws configure
# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"
```
3. **Enable Model Access:**
- Navigate to AWS Bedrock console
- Request access to desired models
- Wait for approval (may take hours/days)
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
# Or: default_config.llm = "bedrock/anthropic.claude-v2"
```
### Available Models
Bedrock provides access to:
- Anthropic Claude models
- Amazon Titan models
- AI21 Jurassic models
- Cohere Command models
- Meta Llama models
### IAM Permissions
Required IAM policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```
### Example Configuration
```python
from biomni.config import default_config
import boto3
# Verify AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
print(f"AWS Access Key: {credentials.access_key[:8]}...")
# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
default_config.timeout_seconds = 1800
```
## Biomni-R0 (Local Specialized Model)
### Overview
Biomni-R0 is a 32B-parameter reasoning model trained specifically for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.
### Setup
1. **Hardware Requirements:**
- GPU with 48GB+ VRAM (e.g., A100, H100)
- Or multi-GPU setup (2x 24GB)
- 100GB+ storage for model weights
2. **Install Dependencies:**
```bash
pip install "sglang[all]"
pip install flashinfer # Optional but recommended
```
3. **Deploy Model:**
```bash
python -m sglang.launch_server \
  --model-path snap-stanford/biomni-r0 \
  --host 0.0.0.0 \
  --port 30000 \
  --trust-remote-code \
  --mem-fraction-static 0.8
```
For multi-GPU:
```bash
python -m sglang.launch_server \
  --model-path snap-stanford/biomni-r0 \
  --host 0.0.0.0 \
  --port 30000 \
  --trust-remote-code \
  --tp 2  # Tensor parallelism across 2 GPUs
```
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
default_config.timeout_seconds = 2400 # Longer for complex reasoning
```
### When to Use Biomni-R0
Biomni-R0 excels at:
- Multi-step biological reasoning
- Complex experimental design
- Hypothesis generation and evaluation
- Literature-informed analysis
- Tasks requiring deep biological knowledge
```python
# For complex biological reasoning tasks
default_config.llm = "openai/biomni-r0"
agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")
```
### Performance Comparison
| Model | Speed | Biological Reasoning | Code Quality | Cost |
|-------|-------|---------------------|--------------|------|
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |
## Multi-Provider Strategy
### Intelligent Model Selection
Use different models for different task types:
```python
from biomni.agent import A1
from biomni.config import default_config
# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
    if task_complexity == "simple":
        default_config.llm = "gpt-3.5-turbo"
        default_config.timeout_seconds = 300
    elif task_complexity == "medium":
        default_config.llm = "claude-sonnet-4-20250514"
        default_config.timeout_seconds = 1200
    else:  # complex
        default_config.llm = "openai/biomni-r0"
        default_config.timeout_seconds = 2400
    return A1(path='./data')

# Strategy 2: Fallback on failure
def execute_with_fallback(task):
    models = [
        "claude-sonnet-4-20250514",
        "gpt-4o",
        "claude-opus-4-20250514"
    ]
    for model in models:
        try:
            default_config.llm = model
            agent = A1(path='./data')
            agent.go(task)
            return
        except Exception as e:
            print(f"Failed with {model}: {e}, trying next...")
    raise Exception("All models failed")
```
### Cost Optimization Strategy
```python
# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent.go("Quick exploratory analysis of dataset structure")
# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent.go("Comprehensive differential expression analysis with pathway enrichment")
# Phase 3: Complex reasoning with specialized models
default_config.llm = "openai/biomni-r0"
agent.go("Generate biological hypotheses based on multi-omics integration")
```
## Troubleshooting
### Common Issues
**Issue: "API key not found"**
- Verify environment variable is set: `echo $ANTHROPIC_API_KEY`
- Check `.env` file exists and is in correct location
- Try setting key programmatically: `os.environ['ANTHROPIC_API_KEY'] = 'key'`
**Issue: "Rate limit exceeded"**
- Implement exponential backoff and retry
- Upgrade API tier if available
- Switch to alternative provider temporarily
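The exponential-backoff advice above can be sketched as a small wrapper. The exception type is a stand-in for whatever rate-limit error your provider's client raises, and the delay constants are illustrative:

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Retry fn on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:  # stand-in for a provider rate-limit exception
            if attempt == max_attempts - 1:
                raise
            # Exponential delay plus jitter to avoid synchronized retries
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

For example, `with_backoff(lambda: agent.go("task"))` would retry the call with growing delays instead of failing on the first rate limit.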
**Issue: "Model not found"**
- Verify model identifier is correct
- Check API key has access to requested model
- For Azure: ensure deployment exists with exact name
**Issue: "Timeout errors"**
- Increase `default_config.timeout_seconds`
- Break complex tasks into smaller steps
- Consider using faster model for initial phases
**Issue: "Connection refused (Ollama/Biomni-R0)"**
- Verify local server is running
- Check port is not blocked by firewall
- Confirm `api_base` URL is correct
### Testing Configuration
```python
from biomni.utils import list_available_models, validate_environment
# Check environment setup
status = validate_environment()
print("Environment Status:", status)
# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)
# Test specific model
try:
    from biomni.agent import A1
    agent = A1(path='./data', llm='claude-sonnet-4-20250514')
    agent.go("Print 'Configuration successful!'")
except Exception as e:
    print(f"Configuration test failed: {e}")
```
## Best Practices Summary
1. **For most users:** Start with Claude Sonnet 4 or GPT-4o
2. **For cost sensitivity:** Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production
3. **For privacy/offline:** Deploy Ollama locally
4. **For complex reasoning:** Use Biomni-R0 if hardware available
5. **For enterprise:** Consider Azure OpenAI or AWS Bedrock
6. **For speed:** Use Groq for rapid iteration
7. **Always:**
- Set appropriate timeouts
- Implement error handling and retries
- Log model and configuration for reproducibility
- Test configuration before production use
