mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-28 07:33:45 +08:00
---
name: biomni
description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning.
---

# Biomni

## Overview

Biomni is a general-purpose biomedical AI agent that autonomously executes research tasks across diverse biomedical subfields. It combines large language model reasoning with retrieval-augmented planning and code-based execution to enhance scientific productivity and hypothesis generation. The system operates with an ~11GB biomedical knowledge base covering molecular, genomic, and clinical domains.

## Quick Start

Initialize and use the Biomni agent with these basic steps:

```python
from biomni.agent import A1

# Initialize agent with data path and LLM model
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute a biomedical research task
agent.go("Your biomedical task description")
```

The agent will autonomously decompose the task, retrieve relevant biomedical knowledge, generate and execute code, and provide results.

## Installation and Setup

### Environment Preparation

1. **Set up the conda environment:**
   - Follow instructions in `biomni_env/README.md` from the repository
   - Activate the environment: `conda activate biomni_e1`

2. **Install the package:**
   ```bash
   pip install biomni --upgrade
   ```

   Or install from source:
   ```bash
   git clone https://github.com/snap-stanford/biomni.git
   cd biomni
   pip install -e .
   ```

3. **Configure API keys:**

   Set up credentials via environment variables or `.env` file:
   ```bash
   export ANTHROPIC_API_KEY="your-key-here"
   export OPENAI_API_KEY="your-key-here"  # Optional
   ```

4. **Data initialization:**

   On first use, the agent will automatically download the ~11GB biomedical knowledge base.

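
Before the first run it can help to confirm that the credentials and disk space from the steps above are in place. The following is a minimal sketch using only the standard library; `preflight` and the 15 GB headroom are illustrative assumptions, not part of Biomni.

```python
import os
import shutil

# Required by the setup steps above; OPENAI_API_KEY is optional there.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY",)
# The knowledge base is about 11 GB; the extra headroom is an assumption.
MIN_FREE_GB = 15


def preflight(data_path=".", env=None, min_free_gb=MIN_FREE_GB):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    problems = []
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    # Free space on the filesystem that holds data_path, in GiB.
    free_gb = shutil.disk_usage(data_path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free at {data_path!r}; want {min_free_gb} GB"
        )
    return problems
```

Calling `preflight()` and printing any returned messages before constructing the agent surfaces a missing key or a full disk early, rather than partway through the download.
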
### LLM Provider Configuration

Biomni supports multiple LLM providers. Configure the default provider using:

```python
from biomni.config import default_config

# Set the default LLM model
default_config.llm = "claude-sonnet-4-20250514"  # Anthropic
# default_config.llm = "gpt-4"  # OpenAI
# default_config.llm = "azure/gpt-4"  # Azure OpenAI
# default_config.llm = "gemini/gemini-pro"  # Google Gemini

# Set timeout (optional)
default_config.timeout_seconds = 1200

# Set data path (optional)
default_config.data_path = "./custom/data/path"
```

Refer to `references/llm_providers.md` for detailed configuration options for each provider.

## Core Biomedical Research Tasks

### 1. CRISPR Screening and Design

Execute CRISPR screening tasks including guide RNA design, off-target analysis, and screening experiment planning:

```python
agent.go("Design a CRISPR screening experiment to identify genes involved in cancer cell resistance to drug X")
```

The agent will:
- Retrieve relevant gene databases
- Design guide RNAs with specificity analysis
- Plan experimental controls and readout strategies
- Generate analysis code for screening results

### 2. Single-Cell RNA-seq Analysis

Perform comprehensive scRNA-seq analysis workflows:

```python
agent.go("Analyze this 10X Genomics scRNA-seq dataset, identify cell types, and find differentially expressed genes between clusters")
```

Capabilities include:
- Quality control and preprocessing
- Dimensionality reduction and clustering
- Cell type annotation using marker databases
- Differential expression analysis
- Pathway enrichment analysis

### 3. Molecular Property Prediction (ADMET)

Predict absorption, distribution, metabolism, excretion, and toxicity properties:

```python
agent.go("Predict ADMET properties for these drug candidates: [SMILES strings]")
```

The agent handles:
- Molecular descriptor calculation
- Property prediction using integrated models
- Toxicity screening
- Drug-likeness assessment

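
The `[SMILES strings]` placeholder in the ADMET task has to be filled with real structures. One way to do that is sketched below; `admet_task` is a hypothetical helper, not a Biomni function, and the numbered-list layout of the prompt is an assumption rather than a required format.

```python
def admet_task(smiles_list):
    """Build an ADMET task description from a list of SMILES strings."""
    cleaned = [s.strip() for s in smiles_list if s and s.strip()]
    if not cleaned:
        raise ValueError("at least one SMILES string is required")
    # Number the candidates so results can be matched back to inputs.
    lines = [f"{i}. {smiles}" for i, smiles in enumerate(cleaned, start=1)]
    return "Predict ADMET properties for these drug candidates:\n" + "\n".join(lines)


candidates = ["CCO", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, aspirin
task = admet_task(candidates)
print(task)
# agent.go(task)  # requires an initialized A1 agent
```
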
### 4. Genomic Analysis

Execute genomic data analysis tasks:

```python
agent.go("Perform GWAS analysis to identify SNPs associated with disease phenotype in this cohort")
```

Supports:
- Genome-wide association studies (GWAS)
- Variant calling and annotation
- Population genetics analysis
- Functional genomics integration

### 5. Protein Structure and Function

Analyze protein sequences and structures:

```python
agent.go("Predict the structure of this protein sequence and identify potential binding sites")
```

Capabilities:
- Sequence analysis and domain identification
- Structure prediction integration
- Binding site prediction
- Protein-protein interaction analysis

### 6. Disease Diagnosis and Classification

Perform disease classification from multi-omics data:

```python
agent.go("Build a classifier to diagnose disease X from patient RNA-seq and clinical data")
```

### 7. Systems Biology and Pathway Analysis

Analyze biological pathways and networks:

```python
agent.go("Identify dysregulated pathways in this differential expression dataset")
```

### 8. Drug Discovery and Repurposing

Support drug discovery workflows:

```python
agent.go("Identify FDA-approved drugs that could be repurposed for treating disease Y based on mechanism of action")
```

## Advanced Features

### Custom Configuration per Agent

Override global configuration for specific agent instances:

```python
agent = A1(
    path='./project_data',
    llm='gpt-4o',
    timeout_seconds=1800  # same setting name as default_config.timeout_seconds
)
```

### Conversation History and Reporting

Save execution traces as formatted PDF reports:

```python
# After executing tasks
agent.save_conversation_history(
    output_path='./reports/experiment_log.pdf',
    format='pdf'
)
```

Requires one of: WeasyPrint, markdown2pdf, or Pandoc.

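
Since PDF export depends on one of those three tools, a quick availability check can save a failed export at the end of a long run. This is a rough sketch: it assumes WeasyPrint is importable as `weasyprint` and that markdown2pdf and Pandoc, if installed, are executables on `PATH`. How Biomni itself locates them is not covered here.

```python
import importlib.util
import shutil


def available_pdf_backends():
    """Return the names of PDF backends that look installed."""
    found = []
    # Assumption: WeasyPrint is importable as `weasyprint`.
    if importlib.util.find_spec("weasyprint") is not None:
        found.append("weasyprint")
    # Assumption: markdown2pdf and Pandoc are reachable as executables on PATH.
    for exe in ("markdown2pdf", "pandoc"):
        if shutil.which(exe):
            found.append(exe)
    return found


backends = available_pdf_backends()
print(backends or "no PDF backend found")
```
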
### Model Context Protocol (MCP) Integration

Extend agent capabilities with external tools:

```python
# Add MCP-compatible tools
agent.add_mcp(config_path='./mcp_config.json')
```

MCP enables integration with:
- Laboratory information management systems (LIMS)
- Specialized bioinformatics databases
- Custom analysis pipelines
- External computational resources

### Using Biomni-R0 (Specialized Reasoning Model)

Deploy the 32B parameter Biomni-R0 model for enhanced biological reasoning:

```bash
# Install SGLang
pip install "sglang[all]"

# Deploy Biomni-R0
python -m sglang.launch_server \
  --model-path snap-stanford/biomni-r0 \
  --port 30000 \
  --trust-remote-code
```

Then configure the agent:

```python
from biomni.config import default_config

default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
```

Biomni-R0 provides specialized reasoning for:
- Complex multi-step biological workflows
- Hypothesis generation and evaluation
- Experimental design optimization
- Literature-informed analysis

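
A 32B model can take a while to load, so it may be worth confirming that the local server answers before pointing the agent at it. The sketch below assumes the server exposes an OpenAI-style `GET /v1/models` route at the configured base URL; `server_is_up` is a hypothetical helper, not part of Biomni or SGLang.

```python
import urllib.request


def server_is_up(api_base, timeout=5.0):
    """Return True if GET {api_base}/models answers with HTTP 200."""
    url = api_base.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if not server_is_up("http://localhost:30000/v1", timeout=2.0):
    print("Biomni-R0 endpoint is not reachable yet")
```
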
## Best Practices

### Task Specification

Provide clear, specific task descriptions:

✅ **Good:** "Analyze this scRNA-seq dataset (file: data.h5ad) to identify T cell subtypes, then perform differential expression analysis comparing activated vs. resting T cells"

❌ **Vague:** "Analyze my RNA-seq data"

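
The difference between the two examples is that the good one names the input file, the objective, and the follow-up step. A small helper can make that structure explicit. `build_task` and its six-word minimum are illustrative assumptions, not a Biomni API or an official threshold.

```python
def build_task(objective, input_file=None, steps=()):
    """Compose a specific task description from its parts."""
    # Crude guard against one-line requests; the threshold is arbitrary.
    if len(objective.split()) < 6:
        raise ValueError("objective is too short to be specific")
    parts = [objective.rstrip(".")]
    if input_file:
        parts[0] += f" (file: {input_file})"
    for step in steps:
        parts.append(f"then {step.rstrip('.')}")
    return ", ".join(parts)


task = build_task(
    "Analyze this scRNA-seq dataset to identify T cell subtypes",
    input_file="data.h5ad",
    steps=["perform differential expression analysis comparing activated vs. resting T cells"],
)
print(task)
```
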
### Data Organization

Structure data directories for efficient retrieval:

```
project/
├── data/       # Biomni knowledge base
├── raw_data/   # Your experimental data
├── results/    # Analysis outputs
└── reports/    # Generated reports
```

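
The layout above can be created in one step. This is a minimal sketch with `pathlib`; the directory names come from the tree, while `create_layout` is a hypothetical helper.

```python
from pathlib import Path

SUBDIRS = ("data", "raw_data", "results", "reports")


def create_layout(root):
    """Create the project directories if missing and return their paths."""
    root = Path(root)
    paths = {name: root / name for name in SUBDIRS}
    for path in paths.values():
        # exist_ok makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

For example, `create_layout("project")["data"]` gives the path that would be passed as `path=` when constructing the agent.
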
### Iterative Refinement

Use iterative task execution for complex analyses:

```python
# Step 1: Exploratory analysis
agent.go("Load and perform initial QC on the proteomics dataset")

# Step 2: Based on results, refine analysis
agent.go("Based on the QC results, remove low-quality samples and normalize using method X")

# Step 3: Downstream analysis
agent.go("Perform differential abundance analysis with adjusted parameters")
```

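
When the steps are known up front, the same pattern can be driven from a list while keeping a record of what ran. The sketch below only assumes an object with a `go(task)` method. The return value of `go` is stored untouched because its shape is not specified here, and `EchoAgent` is a stand-in so the loop can be tried without an LLM.

```python
def run_steps(agent, steps, stop_on_error=True):
    """Run task descriptions in order and collect one record per step."""
    records = []
    for step in steps:
        try:
            # The return value of `go` is stored as-is; its shape is not assumed.
            outcome = agent.go(step)
            records.append({"step": step, "ok": True, "outcome": outcome})
        except Exception as exc:
            # Keep the failure so the next task description can be revised.
            records.append({"step": step, "ok": False, "outcome": repr(exc)})
            if stop_on_error:
                break
    return records


class EchoAgent:
    """Stand-in used only to exercise the loop without an LLM."""

    def go(self, task):
        return f"done: {task}"


log = run_steps(EchoAgent(), ["Load and perform initial QC", "Normalize"])
```

Stopping at the first failure is the default because later steps usually depend on earlier results.
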
### Security Considerations

**CRITICAL:** Biomni executes LLM-generated code with the full privileges of the user account that runs it. For production use:

1. **Use sandboxed environments:** Deploy in Docker containers or VMs with restricted permissions
2. **Validate sensitive operations:** Review code before execution for file access, network calls, or credential usage
3. **Limit data access:** Restrict agent access to only necessary data directories
4. **Monitor execution:** Log all executed code for audit trails

Never run Biomni with:
- Unrestricted file system access
- Direct access to sensitive credentials
- Network access to production systems
- Elevated system privileges

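
One inexpensive way to keep unrelated credentials away from generated code is to start the agent process with an allow-listed environment. The sketch below complements a container or VM and does not replace one; `minimal_env` and the default allow-list are assumptions to adapt.

```python
import os


def minimal_env(allowed=("PATH", "HOME", "ANTHROPIC_API_KEY"), source=None):
    """Return a copy of the environment containing only allow-listed names."""
    source = os.environ if source is None else source
    return {name: source[name] for name in allowed if name in source}
```

The result can be passed as `env=` to `subprocess.run` when launching the script that drives the agent, so cloud or database credentials present in the parent shell are not inherited.
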
### Model Selection Guidelines

Choose models based on task complexity:

- **Claude Sonnet 4:** Recommended for most biomedical tasks, excellent biological reasoning
- **GPT-4/GPT-4o:** Strong general capabilities, good for diverse tasks
- **Biomni-R0:** Specialized for complex biological reasoning, multi-step workflows
- **Smaller models:** Use for simple, well-defined tasks to reduce cost

## Evaluation and Benchmarking

The Biomni-Eval1 benchmark contains 433 evaluation instances across 10 biological tasks:

- GWAS analysis
- Disease diagnosis
- Gene detection and classification
- Molecular property prediction
- Pathway analysis
- Protein function prediction
- Drug response prediction
- Variant interpretation
- Cell type annotation
- Biomarker discovery

Use the benchmark to:
- Evaluate custom agent configurations
- Compare LLM providers for specific tasks
- Validate analysis pipelines

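
Comparing configurations is easier with a per-task breakdown than with a single overall number. The sketch below assumes each result is a dict with `task`, `prediction`, and `answer` keys and scores by exact match. Both are assumptions for illustration, since the benchmark's actual record format and scoring rules are not described here.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Map each task name to the fraction of its instances answered correctly."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        totals[record["task"]] += 1
        if record["prediction"] == record["answer"]:
            correct[record["task"]] += 1
    return {task: correct[task] / totals[task] for task in totals}


# Made-up records for illustration only; not real benchmark data.
sample = [
    {"task": "gwas", "prediction": "rs1", "answer": "rs1"},
    {"task": "gwas", "prediction": "rs2", "answer": "rs9"},
    {"task": "cell_type", "prediction": "T cell", "answer": "T cell"},
]
scores = per_task_accuracy(sample)
```
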
## Troubleshooting

### Common Issues

**Issue:** Data download fails or times out
**Solution:** Manually download the knowledge base or increase timeout settings

**Issue:** Package dependency conflicts
**Solution:** Some optional dependencies cannot be installed by default due to conflicts. Install specific packages manually and uncomment relevant code sections as documented in the repository

**Issue:** LLM API errors
**Solution:** Verify API key configuration, check rate limits, ensure sufficient credits

**Issue:** Memory errors with large datasets
**Solution:** Process data in chunks, use data subsampling, or deploy on higher-memory instances

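
For transient LLM API errors such as rate limits, retrying with exponential backoff is a common remedy. The following is a generic sketch, not a Biomni feature. It catches every exception for brevity, whereas real code would retry only on errors known to be transient. Re-running an agent task also repeats any side effects it had.

```python
import random
import time


def with_retries(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # 1x, 2x, 4x ... the base delay, plus jitter to avoid synchronized retries.
            sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
```

A task would be wrapped as `with_retries(lambda: agent.go("..."))`. The `sleep` parameter exists so the delay can be replaced in tests.
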
### Getting Help

For detailed troubleshooting:
- Review the Biomni GitHub repository issues
- Check `references/api_reference.md` for detailed API documentation
- Consult `references/task_examples.md` for comprehensive task patterns

## Resources

### references/
Detailed reference documentation for advanced usage:

- **api_reference.md:** Complete API documentation for A1 agent, configuration objects, and utility functions
- **llm_providers.md:** Comprehensive guide for configuring all supported LLM providers (Anthropic, OpenAI, Azure, Gemini, Groq, Ollama, AWS Bedrock)
- **task_examples.md:** Extensive collection of biomedical task examples with code patterns

### scripts/
Helper scripts for common operations:

- **setup_environment.py:** Automated environment setup and validation
- **generate_report.py:** Enhanced PDF report generation with custom formatting

Load reference documentation as needed:
```python
# Claude can read reference files when needed for detailed information
# Example: "Check references/llm_providers.md for Azure OpenAI configuration"
```