mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git synced 2026-01-26 16:58:56 +08:00

Timothy Kassis 660c8574d0 Add more scientific skills

2025-10-19 14:12:02 -07:00

name: biomni
description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning.

Biomni

Overview

Biomni is a general-purpose biomedical AI agent that autonomously executes research tasks across diverse biomedical subfields. It combines large language model reasoning with retrieval-augmented planning and code-based execution to enhance scientific productivity and hypothesis generation. The system operates with an ~11GB biomedical knowledge base covering molecular, genomic, and clinical domains.

Quick Start

Initialize and use the Biomni agent with these basic steps:

from biomni.agent import A1

# Initialize agent with data path and LLM model
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute a biomedical research task
agent.go("Your biomedical task description")

The agent will autonomously decompose the task, retrieve relevant biomedical knowledge, generate and execute code, and provide results.

Installation and Setup

Environment Preparation

Set up the conda environment:
- Follow instructions in biomni_env/README.md from the repository
- Activate the environment: conda activate biomni_e1

Install the package:

pip install biomni --upgrade

Or install from source:

git clone https://github.com/snap-stanford/biomni.git
cd biomni
pip install -e .

Configure API keys:

Set up credentials via environment variables or a .env file:

export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"  # Optional
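Before initializing the agent, it can help to confirm that the exported keys are actually visible to the Python process. The following is a minimal standard-library sketch; `missing_keys` is a hypothetical helper and not part of Biomni.

```python
import os

# Hypothetical helper (not part of Biomni): list the expected credentials
# that are absent or empty in the given environment mapping.
def missing_keys(required, env=None):
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_keys(["ANTHROPIC_API_KEY"])
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All required credentials are set.")
```

Passing `env` explicitly makes the check easy to exercise without touching the real environment.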

Data initialization:

On first use, the agent will automatically download the ~11GB biomedical knowledge base.
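Because the first run downloads roughly 11 GB, checking free disk space beforehand can avoid a failed download. This is a sketch using only the standard library; `has_room` and the 2 GB safety margin are illustrative assumptions.

```python
import shutil

REQUIRED_GB = 11  # approximate knowledge-base size stated above

# Hypothetical helper: True if the filesystem holding `path` has at least
# `required_gb` gigabytes free, plus a safety margin.
def has_room(path=".", required_gb=REQUIRED_GB, margin_gb=2):
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb + margin_gb

if __name__ == "__main__":
    print("Enough space for the knowledge base:", has_room("."))
```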

LLM Provider Configuration

Biomni supports multiple LLM providers. Configure the default provider using:

from biomni.config import default_config

# Set the default LLM model
default_config.llm = "claude-sonnet-4-20250514"  # Anthropic
# default_config.llm = "gpt-4"  # OpenAI
# default_config.llm = "azure/gpt-4"  # Azure OpenAI
# default_config.llm = "gemini/gemini-pro"  # Google Gemini

# Set timeout (optional)
default_config.timeout_seconds = 1200

# Set data path (optional)
default_config.data_path = "./custom/data/path"

Refer to references/llm_providers.md for detailed configuration options for each provider.
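One way to choose the default model is to look at which API key is present. The sketch below is an assumption-laden illustration: the model identifiers are the ones shown in the configuration example, but the key-to-model mapping and the `pick_llm` helper are hypothetical.

```python
import os

# Model identifiers are taken from the configuration example above; the
# pairing of environment variable to model is an illustrative assumption.
CANDIDATES = [
    ("ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    ("OPENAI_API_KEY", "gpt-4"),
]

def pick_llm(env=None):
    """Return the first model whose API key is set, in priority order."""
    env = os.environ if env is None else env
    for key, model in CANDIDATES:
        if env.get(key):
            return model
    raise RuntimeError("No supported API key found in the environment")
```

The result could then be assigned with `default_config.llm = pick_llm()`.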

Core Biomedical Research Tasks

1. CRISPR Screening and Design

Execute CRISPR screening tasks including guide RNA design, off-target analysis, and screening experiment planning:

agent.go("Design a CRISPR screening experiment to identify genes involved in cancer cell resistance to drug X")

The agent will:

- Retrieve relevant gene databases
- Design guide RNAs with specificity analysis
- Plan experimental controls and readout strategies
- Generate analysis code for screening results

2. Single-Cell RNA-seq Analysis

Perform comprehensive scRNA-seq analysis workflows:

agent.go("Analyze this 10X Genomics scRNA-seq dataset, identify cell types, and find differentially expressed genes between clusters")

Capabilities include:

- Quality control and preprocessing
- Dimensionality reduction and clustering
- Cell type annotation using marker databases
- Differential expression analysis
- Pathway enrichment analysis

3. Molecular Property Prediction (ADMET)

Predict absorption, distribution, metabolism, excretion, and toxicity properties:

agent.go("Predict ADMET properties for these drug candidates: [SMILES strings]")

The agent handles:

- Molecular descriptor calculation
- Property prediction using integrated models
- Toxicity screening
- Drug-likeness assessment
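The `[SMILES strings]` placeholder in the ADMET task can be filled in programmatically. A minimal sketch, assuming the candidates are held in a Python list; `admet_task` is a hypothetical helper, and the two molecules are only examples.

```python
# Hypothetical helper: embed a list of SMILES strings in the task text.
def admet_task(smiles):
    if not smiles:
        raise ValueError("Provide at least one SMILES string")
    listing = ", ".join(smiles)
    return f"Predict ADMET properties for these drug candidates: {listing}"

# Example molecules: ethanol and aspirin.
task = admet_task(["CCO", "CC(=O)Oc1ccccc1C(=O)O"])
# agent.go(task)  # run with an initialized agent
```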

4. Genomic Analysis

Execute genomic data analysis tasks:

agent.go("Perform GWAS analysis to identify SNPs associated with disease phenotype in this cohort")

Supports:

- Genome-wide association studies (GWAS)
- Variant calling and annotation
- Population genetics analysis
- Functional genomics integration

5. Protein Structure and Function

Analyze protein sequences and structures:

agent.go("Predict the structure of this protein sequence and identify potential binding sites")

Capabilities:

- Sequence analysis and domain identification
- Structure prediction integration
- Binding site prediction
- Protein-protein interaction analysis

6. Disease Diagnosis and Classification

Perform disease classification from multi-omics data:

agent.go("Build a classifier to diagnose disease X from patient RNA-seq and clinical data")

7. Systems Biology and Pathway Analysis

Analyze biological pathways and networks:

agent.go("Identify dysregulated pathways in this differential expression dataset")

8. Drug Discovery and Repurposing

Support drug discovery workflows:

agent.go("Identify FDA-approved drugs that could be repurposed for treating disease Y based on mechanism of action")

Advanced Features

Custom Configuration per Agent

Override global configuration for specific agent instances:

agent = A1(
    path='./project_data',
    llm='gpt-4o',
    timeout_seconds=1800  # seconds; same name as default_config.timeout_seconds
)

Conversation History and Reporting

Save execution traces as formatted PDF reports:

# After executing tasks
agent.save_conversation_history(
    output_path='./reports/experiment_log.pdf',
    format='pdf'
)

Requires one of: WeasyPrint, markdown2pdf, or Pandoc.

Model Context Protocol (MCP) Integration

Extend agent capabilities with external tools:

# Add MCP-compatible tools
agent.add_mcp(config_path='./mcp_config.json')

MCP enables integration with:

- Laboratory information management systems (LIMS)
- Specialized bioinformatics databases
- Custom analysis pipelines
- External computational resources

Using Biomni-R0 (Specialized Reasoning Model)

Deploy the 32B-parameter Biomni-R0 model for enhanced biological reasoning:

# Install SGLang
pip install "sglang[all]"

# Deploy Biomni-R0
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --port 30000 \
    --trust-remote-code

Then configure the agent:

from biomni.config import default_config

default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
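Before pointing the agent at the local server, it may be worth confirming that the endpoint responds. The sketch below assumes the server exposes an OpenAI-compatible `/models` route under the configured base URL; that route, and the helper names `models_url` and `server_is_up`, are assumptions rather than documented Biomni behavior.

```python
import http.client
import urllib.request

def models_url(api_base):
    """Build the model-listing URL for an OpenAI-compatible base URL."""
    return api_base.rstrip("/") + "/models"

# Hypothetical health check: True only if the endpoint answers with HTTP 200.
def server_is_up(api_base, timeout=5.0):
    try:
        with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False

if __name__ == "__main__":
    print("Server reachable:", server_is_up("http://localhost:30000/v1"))
```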

Biomni-R0 provides specialized reasoning for:

- Complex multi-step biological workflows
- Hypothesis generation and evaluation
- Experimental design optimization
- Literature-informed analysis

Best Practices

Task Specification

Provide clear, specific task descriptions:

✅ Good: "Analyze this scRNA-seq dataset (file: data.h5ad) to identify T cell subtypes, then perform differential expression analysis comparing activated vs. resting T cells"

❌ Vague: "Analyze my RNA-seq data"
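One way to keep task descriptions specific is to assemble them from structured fields, so that a missing file name or goal is obvious. This is only a sketch; the `TaskSpec` class and its fields are hypothetical and not part of Biomni.

```python
from dataclasses import dataclass

@dataclass
class TaskSpec:
    dataset: str          # path to the input file
    goal: str             # what to identify or compute
    comparison: str = ""  # optional contrast for a follow-up analysis

    def to_prompt(self):
        prompt = f"Analyze this dataset (file: {self.dataset}) to {self.goal}"
        if self.comparison:
            prompt += (
                ", then perform differential expression analysis "
                f"comparing {self.comparison}"
            )
        return prompt

spec = TaskSpec("data.h5ad", "identify T cell subtypes",
                "activated vs. resting T cells")
print(spec.to_prompt())
```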

Data Organization

Structure data directories for efficient retrieval:

project/
├── data/              # Biomni knowledge base
├── raw_data/          # Your experimental data
├── results/           # Analysis outputs
└── reports/           # Generated reports
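The layout above can be created with a few lines of standard-library Python. A minimal sketch; `create_layout` is a hypothetical helper, and the directory names are the ones shown in the tree.

```python
from pathlib import Path

SUBDIRS = ("data", "raw_data", "results", "reports")

# Hypothetical helper: create the project tree, leaving existing folders alone.
def create_layout(root="project"):
    root = Path(root)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root

# Usage: create_layout("project")
```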

Iterative Refinement

Use iterative task execution for complex analyses:

# Step 1: Exploratory analysis
agent.go("Load and perform initial QC on the proteomics dataset")

# Step 2: Based on results, refine analysis
agent.go("Based on the QC results, remove low-quality samples and normalize using method X")

# Step 3: Downstream analysis
agent.go("Perform differential abundance analysis with adjusted parameters")
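When the steps are known in advance, they can be driven from a list. The sketch below takes any callable that accepts a task string (for example `agent.go`); `run_steps` is a hypothetical helper, and it makes no assumption about what the callable returns. In practice, later steps are often written only after inspecting earlier results, which this loop does not do.

```python
# `go` is any callable taking a task string, for example `agent.go`.
def run_steps(go, steps):
    results = []
    for number, task in enumerate(steps, start=1):
        print(f"[step {number}/{len(steps)}] {task}")
        # An exception from `go` stops the sequence at the failing step.
        results.append(go(task))
    return results
```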

Security Considerations

CRITICAL: Biomni executes LLM-generated code with full system privileges. For production use:

- Use sandboxed environments: Deploy in Docker containers or VMs with restricted permissions
- Validate sensitive operations: Review code before execution for file access, network calls, or credential usage
- Limit data access: Restrict agent access to only necessary data directories
- Monitor execution: Log all executed code for audit trails

Never run Biomni with:

- Unrestricted file system access
- Direct access to sensitive credentials
- Network access to production systems
- Elevated system privileges
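As a starting point for an audit trail, each submitted task can be recorded before it is forwarded to the agent. This sketch logs only the task prompts, not the code the agent generates; capturing generated code would require hooking into the agent's execution, which is not shown here. The `audited` wrapper and the JSON Lines log format are assumptions.

```python
import json
import time

# Hypothetical wrapper: append each task prompt to a JSON Lines file,
# then forward the task to the wrapped callable (for example `agent.go`).
def audited(go, log_path):
    def wrapper(task):
        record = {"time": time.strftime("%Y-%m-%dT%H:%M:%S%z"), "task": task}
        with open(log_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
        return go(task)
    return wrapper

# Usage: go = audited(agent.go, "audit.jsonl"); go("Load and QC the dataset")
```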

Model Selection Guidelines

Choose models based on task complexity:

- Claude Sonnet 4: Recommended for most biomedical tasks, excellent biological reasoning
- GPT-4/GPT-4o: Strong general capabilities, good for diverse tasks
- Biomni-R0: Specialized for complex biological reasoning, multi-step workflows
- Smaller models: Use for simple, well-defined tasks to reduce cost

Evaluation and Benchmarking

The Biomni-Eval1 benchmark contains 433 evaluation instances across 10 biological tasks:

- GWAS analysis
- Disease diagnosis
- Gene detection and classification
- Molecular property prediction
- Pathway analysis
- Protein function prediction
- Drug response prediction
- Variant interpretation
- Cell type annotation
- Biomarker discovery

Use the benchmark to:

- Evaluate custom agent configurations
- Compare LLM providers for specific tasks
- Validate analysis pipelines
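When comparing configurations, per-task accuracy is a simple summary. The sketch below assumes results have already been collected as `(task_name, predicted, expected)` tuples; that record shape is an assumption for illustration and not the benchmark's actual format, and exact-match scoring may not suit every task.

```python
from collections import defaultdict

# Each record is (task_name, predicted, expected). This schema is an
# illustrative assumption, not the benchmark's real data format.
def accuracy_by_task(records):
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, expected in records:
        total[task] += 1
        correct[task] += int(predicted == expected)
    return {task: correct[task] / total[task] for task in total}
```

Running the same function over the results from two providers gives directly comparable per-task numbers.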

Troubleshooting

Common Issues

Issue: Data download fails or times out
Solution: Manually download the knowledge base or increase timeout settings

Issue: Package dependency conflicts
Solution: Some optional dependencies cannot be installed by default due to conflicts. Install specific packages manually and uncomment relevant code sections as documented in the repository

Issue: LLM API errors
Solution: Verify API key configuration, check rate limits, ensure sufficient credits
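For transient rate-limit errors, retrying with exponential backoff is a common remedy. The following is a generic sketch, not a Biomni feature; `with_retries` is a hypothetical helper, and catching every `Exception` is a simplification that should be narrowed to the provider's rate-limit error in real use.

```python
import random
import time

# Hypothetical helper: call `call()` up to `attempts` times, waiting longer
# after each failure. `sleep` is injectable so the delay can be replaced.
def with_retries(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each time, plus a little random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))

# Usage: with_retries(lambda: agent.go("Your biomedical task description"))
```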

Issue: Memory errors with large datasets
Solution: Process data in chunks, use data subsampling, or deploy on higher-memory instances
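Chunked processing can be sketched with the standard library alone. The example assumes the data is a CSV file; `iter_chunks` and `count_rows` are hypothetical helpers, and the row count stands in for whatever per-chunk work is actually needed.

```python
import csv
from itertools import islice

# Yield successive lists of at most `size` items from any iterable,
# so only one chunk is held in memory at a time.
def iter_chunks(rows, size):
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def count_rows(csv_path, size=10_000):
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        for chunk in iter_chunks(reader, size):
            total += len(chunk)  # replace with real per-chunk processing
    return total
```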

Getting Help

For detailed troubleshooting:

- Review the Biomni GitHub repository issues
- Check references/api_reference.md for detailed API documentation
- Consult references/task_examples.md for comprehensive task patterns

Resources

references/

Detailed reference documentation for advanced usage:

- api_reference.md: Complete API documentation for A1 agent, configuration objects, and utility functions
- llm_providers.md: Comprehensive guide for configuring all supported LLM providers (Anthropic, OpenAI, Azure, Gemini, Groq, Ollama, AWS Bedrock)
- task_examples.md: Extensive collection of biomedical task examples with code patterns

scripts/

Helper scripts for common operations:

- setup_environment.py: Automated environment setup and validation
- generate_report.py: Enhanced PDF report generation with custom formatting

Load reference documentation as needed:

# Claude can read reference files when needed for detailed information
# Example: "Check references/llm_providers.md for Azure OpenAI configuration"
