---
name: biomni
description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning.
---

# Biomni

## Overview

Biomni is a general-purpose biomedical AI agent that autonomously executes research tasks across diverse biomedical subfields. It combines large language model reasoning with retrieval-augmented planning and code-based execution to enhance scientific productivity and hypothesis generation. The system operates with an ~11GB biomedical knowledge base covering molecular, genomic, and clinical domains.

## Quick Start

Initialize and use the Biomni agent with these basic steps:

```python
from biomni.agent import A1

# Initialize agent with data path and LLM model
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute a biomedical research task
agent.go("Your biomedical task description")
```

The agent will autonomously decompose the task, retrieve relevant biomedical knowledge, generate and execute code, and provide results.

## Installation and Setup

### Environment Preparation

1. **Set up the conda environment:**
   - Follow the instructions in `biomni_env/README.md` in the repository
   - Activate the environment: `conda activate biomni_e1`

2. **Install the package:**
   ```bash
   pip install biomni --upgrade
   ```

   Or install from source:
   ```bash
   git clone https://github.com/snap-stanford/biomni.git
   cd biomni
   pip install -e .
   ```

3. **Configure API keys:**

   Set up credentials via environment variables or `.env` file:
   ```bash
   export ANTHROPIC_API_KEY="your-key-here"
   export OPENAI_API_KEY="your-key-here"  # Optional
   ```

4. **Data initialization:**

   On first use, the agent will automatically download the ~11GB biomedical knowledge base.
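
Before the first run, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal preflight sketch that uses only the standard library. The helper name `preflight`, the choice of required keys, and the 11 GiB threshold (taken from the approximate knowledge-base size quoted above, with no safety margin) are illustrative assumptions and are not part of Biomni.

```python
import os
import shutil

GIB = 1024 ** 3


def preflight(data_path=".", required_keys=("ANTHROPIC_API_KEY",),
              min_free_gib=11, env=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of Biomni. `data_path` must already exist,
    because free space is measured on the volume that contains it.
    """
    env = os.environ if env is None else env
    problems = []

    # Report every required API key that is unset or empty.
    for name in required_keys:
        if not env.get(name):
            problems.append(f"missing environment variable: {name}")

    # Compare free space on the target volume with the threshold.
    free_gib = shutil.disk_usage(data_path).free / GIB
    if free_gib < min_free_gib:
        problems.append(
            f"only {free_gib:.1f} GiB free at {data_path!r}, "
            f"need about {min_free_gib} GiB"
        )
    return problems
```

Calling `preflight(data_path=".")` before creating the agent and printing any returned messages gives an early, readable failure instead of an interrupted download.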

### LLM Provider Configuration

Biomni supports multiple LLM providers. Configure the default provider using:

```python
from biomni.config import default_config

# Set the default LLM model
default_config.llm = "claude-sonnet-4-20250514"  # Anthropic
# default_config.llm = "gpt-4"  # OpenAI
# default_config.llm = "azure/gpt-4"  # Azure OpenAI
# default_config.llm = "gemini/gemini-pro"  # Google Gemini

# Set timeout (optional)
default_config.timeout_seconds = 1200

# Set data path (optional)
default_config.path = "./custom/data/path"
```

Refer to `references/llm_providers.md` for detailed configuration options for each provider.

## Core Biomedical Research Tasks

### 1. CRISPR Screening and Design

Execute CRISPR screening tasks including guide RNA design, off-target analysis, and screening experiment planning:

```python
agent.go("Design a CRISPR screening experiment to identify genes involved in cancer cell resistance to drug X")
```

The agent will:
- Retrieve relevant gene databases
- Design guide RNAs with specificity analysis
- Plan experimental controls and readout strategies
- Generate analysis code for screening results

### 2. Single-Cell RNA-seq Analysis

Perform comprehensive scRNA-seq analysis workflows:

```python
agent.go("Analyze this 10X Genomics scRNA-seq dataset, identify cell types, and find differentially expressed genes between clusters")
```

Capabilities include:
- Quality control and preprocessing
- Dimensionality reduction and clustering
- Cell type annotation using marker databases
- Differential expression analysis
- Pathway enrichment analysis

### 3. Molecular Property Prediction (ADMET)

Predict absorption, distribution, metabolism, excretion, and toxicity properties:

```python
agent.go("Predict ADMET properties for these drug candidates: [SMILES strings]")
```

The agent handles:
- Molecular descriptor calculation
- Property prediction using integrated models
- Toxicity screening
- Drug-likeness assessment
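
The `[SMILES strings]` placeholder in the example above has to be replaced with real structures. One way to do that, sketched here under the assumption that the agent accepts the compounds as plain text inside the task description, is to build the prompt from a name-to-SMILES mapping. The helper `admet_prompt` is hypothetical; the two SMILES strings are the standard ones for aspirin and caffeine.

```python
def admet_prompt(smiles_by_name):
    """Build an ADMET task description from a name -> SMILES mapping.

    Hypothetical helper, not part of Biomni.
    """
    if not smiles_by_name:
        raise ValueError("provide at least one compound")
    # One bullet per compound keeps names and structures unambiguous.
    lines = [f"- {name}: {smiles}" for name, smiles in smiles_by_name.items()]
    return (
        "Predict ADMET properties for these drug candidates "
        "(name: SMILES):\n" + "\n".join(lines)
    )


candidates = {
    "aspirin": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}
prompt = admet_prompt(candidates)
# agent.go(prompt)  # assumes `agent` was created as in Quick Start
```

Listing each compound on its own line makes it easier to match the agent's output back to the input structures.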

### 4. Genomic Analysis

Execute genomic data analysis tasks:

```python
agent.go("Perform GWAS analysis to identify SNPs associated with disease phenotype in this cohort")
```

Supports:
- Genome-wide association studies (GWAS)
- Variant calling and annotation
- Population genetics analysis
- Functional genomics integration

### 5. Protein Structure and Function

Analyze protein sequences and structures:

```python
agent.go("Predict the structure of this protein sequence and identify potential binding sites")
```

Capabilities:
- Sequence analysis and domain identification
- Structure prediction integration
- Binding site prediction
- Protein-protein interaction analysis

### 6. Disease Diagnosis and Classification

Perform disease classification from multi-omics data:

```python
agent.go("Build a classifier to diagnose disease X from patient RNA-seq and clinical data")
```

### 7. Systems Biology and Pathway Analysis

Analyze biological pathways and networks:

```python
agent.go("Identify dysregulated pathways in this differential expression dataset")
```

### 8. Drug Discovery and Repurposing

Support drug discovery workflows:

```python
agent.go("Identify FDA-approved drugs that could be repurposed for treating disease Y based on mechanism of action")
```

## Advanced Features

### Custom Configuration per Agent

Override global configuration for specific agent instances:

```python
agent = A1(
    path='./project_data',
    llm='gpt-4o',
    timeout_seconds=1800
)
```

### Conversation History and Reporting

Save execution traces as formatted PDF reports:

```python
# After executing tasks, pass the output file path as the first argument
agent.save_conversation_history('./reports/experiment_log.pdf')
```

PDF export requires one of WeasyPrint, markdown2pdf, or Pandoc to be installed.
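
To find out which of these backends is present before generating a report, a best-effort check along the following lines may be useful. It is a sketch, not Biomni's own detection logic: the import names `weasyprint` and `markdown2pdf` and the executable name `pandoc` are assumptions about how each tool is installed.

```python
import importlib.util
import shutil


def available_pdf_backends():
    """List the PDF backends that appear to be installed (best effort).

    Hypothetical helper, not part of Biomni.
    """
    found = []
    # Python packages: the import names below are assumptions.
    for label, module in (("WeasyPrint", "weasyprint"),
                          ("markdown2pdf", "markdown2pdf")):
        if importlib.util.find_spec(module) is not None:
            found.append(label)
    # Pandoc is a command-line tool, so look for it on PATH.
    if shutil.which("pandoc"):
        found.append("Pandoc")
    return found
```

An empty list from `available_pdf_backends()` suggests installing one of the three tools before calling `save_conversation_history`.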

### Model Context Protocol (MCP) Integration

Extend agent capabilities with external tools:

```python
# Add MCP-compatible tools
agent.add_mcp(config_path='./mcp_config.json')
```

MCP enables integration with:
- Laboratory information management systems (LIMS)
- Specialized bioinformatics databases
- Custom analysis pipelines
- External computational resources

### Using Biomni-R0 (Specialized Reasoning Model)

Deploy the 32B-parameter Biomni-R0 model for enhanced biological reasoning:

```bash
# Install SGLang
pip install "sglang[all]"

# Deploy Biomni-R0
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --port 30000 \
    --trust-remote-code
```

Then configure the agent:

```python
from biomni.config import default_config

default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
```

Biomni-R0 provides specialized reasoning for:
- Complex multi-step biological workflows
- Hypothesis generation and evaluation
- Experimental design optimization
- Literature-informed analysis

## Best Practices

### Task Specification

Provide clear, specific task descriptions:

✅ **Good:** "Analyze this scRNA-seq dataset (file: data.h5ad) to identify T cell subtypes, then perform differential expression analysis comparing activated vs. resting T cells"

❌ **Vague:** "Analyze my RNA-seq data"
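
A specific description like the good example above can also be assembled from structured pieces, which makes it harder to forget the input file or the intended steps. The sketch below is one possible way to do this; `build_task` and its parameters are hypothetical and only produce a string to pass to `agent.go`.

```python
def build_task(goal, data_file=None, steps=(), constraints=()):
    """Compose a specific task description from structured pieces.

    Hypothetical helper, not part of Biomni.
    """
    # Normalize the goal so it always ends with exactly one period.
    parts = [goal.strip().rstrip(".") + "."]
    if data_file:
        parts.append(f"Input file: {data_file}.")
    if steps:
        numbered = "; ".join(f"({i}) {s}" for i, s in enumerate(steps, 1))
        parts.append(f"Steps: {numbered}.")
    if constraints:
        parts.append("Constraints: " + "; ".join(constraints) + ".")
    return " ".join(parts)


task = build_task(
    goal="Identify T cell subtypes in this scRNA-seq dataset",
    data_file="data.h5ad",
    steps=[
        "cluster the cells",
        "annotate T cell subtypes",
        "run differential expression for activated vs. resting T cells",
    ],
)
# agent.go(task)  # assumes `agent` was created as in Quick Start
```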

### Data Organization

Structure data directories for efficient retrieval:

```
project/
├── data/              # Biomni knowledge base
├── raw_data/          # Your experimental data
├── results/           # Analysis outputs
└── reports/           # Generated reports
```
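
The layout above can be created programmatically. This is a small sketch using `pathlib`; the function name `create_project_layout` is an assumption, and the directory names are taken from the tree above.

```python
from pathlib import Path

SUBDIRS = ("data", "raw_data", "results", "reports")


def create_project_layout(root):
    """Create the project directories if missing and return their paths."""
    root = Path(root)
    created = {}
    for name in SUBDIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created[name] = path
    return created
```

The returned mapping can then feed the agent constructor, for example `A1(path=str(dirs["data"]), llm=...)` where `dirs = create_project_layout("project")`.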

### Iterative Refinement

Use iterative task execution for complex analyses:

```python
# Step 1: Exploratory analysis
agent.go("Load and perform initial QC on the proteomics dataset")

# Step 2: Based on results, refine analysis
agent.go("Based on the QC results, remove low-quality samples and normalize using method X")

# Step 3: Downstream analysis
agent.go("Perform differential abundance analysis with adjusted parameters")
```
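
When the steps are known in advance, the same pattern can be wrapped in a small runner that records what happened at each step and stops at the first failure. The sketch below assumes only that the agent object has a `go(prompt)` method, as in the examples above; it makes no assumption about what `go` returns. The name `run_steps` is hypothetical.

```python
def run_steps(agent, steps, stop_on_error=True):
    """Run task descriptions in order and collect one record per step.

    `agent` is any object with a `go(prompt)` method. Hypothetical helper,
    not part of Biomni.
    """
    records = []
    for index, prompt in enumerate(steps, start=1):
        try:
            output = agent.go(prompt)
            records.append({"step": index, "prompt": prompt,
                            "ok": True, "output": output})
        except Exception as exc:
            # Keep the failure so it can be inspected after the run.
            records.append({"step": index, "prompt": prompt,
                            "ok": False, "error": repr(exc)})
            if stop_on_error:
                break
    return records
```

Reviewing the records between runs keeps a human in the loop, which matters because later steps often depend on decisions made from earlier results.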

### Security Considerations

**CRITICAL:** Biomni executes LLM-generated code with the full privileges of the user account that runs it. For production use:

1. **Use sandboxed environments:** Deploy in Docker containers or VMs with restricted permissions
2. **Validate sensitive operations:** Review code before execution for file access, network calls, or credential usage
3. **Limit data access:** Restrict agent access to only necessary data directories
4. **Monitor execution:** Log all executed code for audit trails

Never run Biomni with:
- Unrestricted file system access
- Direct access to sensitive credentials
- Network access to production systems
- Elevated system privileges
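
One inexpensive way to reduce credential exposure is to launch the agent in a child process whose environment contains only an allowlist of variables. The sketch below shows the filtering step; `minimal_env` and the default allowlist are assumptions chosen for illustration. This limits what generated code can read from the environment, but it is not a sandbox and does not replace the container or VM isolation recommended above.

```python
import os

# Illustrative allowlist: adjust to what the agent process actually needs.
DEFAULT_ALLOW = ("PATH", "HOME", "LANG", "ANTHROPIC_API_KEY")


def minimal_env(allow=DEFAULT_ALLOW, env=None):
    """Return a copy of the environment with only allowlisted names.

    Hypothetical helper, not part of Biomni.
    """
    env = os.environ if env is None else env
    return {name: env[name] for name in allow if name in env}
```

The result can be passed as the `env` argument of `subprocess.run`, for example `subprocess.run([sys.executable, "run_agent.py"], env=minimal_env())`, where `run_agent.py` is a hypothetical script that creates the agent and calls `go`.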

### Model Selection Guidelines

Choose models based on task complexity:

- **Claude Sonnet 4:** Recommended for most biomedical tasks, excellent biological reasoning
- **GPT-4/GPT-4o:** Strong general capabilities, good for diverse tasks
- **Biomni-R0:** Specialized for complex biological reasoning, multi-step workflows
- **Smaller models:** Use for simple, well-defined tasks to reduce cost

## Evaluation and Benchmarking

The Biomni-Eval1 benchmark contains 433 evaluation instances across 10 biological tasks:

- GWAS analysis
- Disease diagnosis
- Gene detection and classification
- Molecular property prediction
- Pathway analysis
- Protein function prediction
- Drug response prediction
- Variant interpretation
- Cell type annotation
- Biomarker discovery

Use the benchmark to:
- Evaluate custom agent configurations
- Compare LLM providers for specific tasks
- Validate analysis pipelines
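
When comparing configurations, results are usually easier to read per task than as a single overall score. The sketch below aggregates per-task accuracy from `(task, correct)` pairs. That record format is a hypothetical one chosen for illustration; it is not the Biomni-Eval1 data format or API, so adapt the input to however the graded results are stored.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Aggregate (task, correct) pairs into per-task accuracy.

    Hypothetical helper; the record format is an assumption.
    """
    totals = defaultdict(lambda: [0, 0])  # task -> [n_correct, n_total]
    for task, correct in records:
        totals[task][0] += bool(correct)
        totals[task][1] += 1
    return {task: hits / n for task, (hits, n) in totals.items()}
```

Running it once per LLM configuration and comparing the resulting dictionaries shows where one model is stronger on a particular task type.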

## Troubleshooting

### Common Issues

**Issue:** Data download fails or times out
**Solution:** Manually download the knowledge base or increase timeout settings

**Issue:** Package dependency conflicts
**Solution:** Some optional dependencies cannot be installed by default due to conflicts. Install specific packages manually and uncomment relevant code sections as documented in the repository

**Issue:** LLM API errors
**Solution:** Verify API key configuration, check rate limits, ensure sufficient credits

**Issue:** Memory errors with large datasets
**Solution:** Process data in chunks, use data subsampling, or deploy on higher-memory instances
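
For transient LLM API errors such as rate limiting, retrying with exponential backoff is a common remedy. The following is a generic sketch, not a Biomni feature: `retry_with_backoff` is a hypothetical helper, and the default of retrying on any `Exception` should be narrowed to the provider's rate-limit or connection error types in real use.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and retry.

    Hypothetical helper, not part of Biomni. `sleep` is injectable so the
    waiting behavior can be tested without real delays.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A call such as `retry_with_backoff(lambda: agent.go(task))` restarts the whole task on each retry, so any work completed before the failure is repeated; for long tasks it may be preferable to split the work into smaller steps first.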

### Getting Help

For detailed troubleshooting:
- Review the Biomni GitHub repository issues
- Check `references/api_reference.md` for detailed API documentation
- Consult `references/task_examples.md` for comprehensive task patterns

## Resources

### references/
Detailed reference documentation for advanced usage:

- **api_reference.md:** Complete API documentation for A1 agent, configuration objects, and utility functions
- **llm_providers.md:** Comprehensive guide for configuring all supported LLM providers (Anthropic, OpenAI, Azure, Gemini, Groq, Ollama, AWS Bedrock)
- **task_examples.md:** Extensive collection of biomedical task examples with code patterns

### scripts/
Helper scripts for common operations:

- **setup_environment.py:** Automated environment setup and validation
- **generate_report.py:** Enhanced PDF report generation with custom formatting

Load reference documentation only when a task needs the detail. For example, read `references/llm_providers.md` before configuring Azure OpenAI.