skills/claude-scientific-skills

mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git synced 2026-01-26 16:58:56 +08:00

Timothy Kassis 660c8574d0 Add more scientific skills

2025-10-19 14:12:02 -07:00

Biomni API Reference

This document provides comprehensive API documentation for the Biomni biomedical AI agent system.

Core Classes

A1 Agent

The primary agent class for executing biomedical research tasks.

Initialization

from biomni.agent import A1

agent = A1(
    path='./data',              # Path to biomedical knowledge base
    llm='claude-sonnet-4-20250514',  # LLM model identifier
    timeout=None,               # Optional timeout in seconds
    verbose=True               # Enable detailed logging
)

Parameters:

path (str, required): Directory path where the biomedical knowledge base is stored or will be downloaded. First-time initialization will download ~11GB of data.
llm (str, optional): LLM model identifier. Defaults to the value in default_config.llm. Supports multiple providers (see LLM Providers section).
timeout (int, optional): Maximum execution time in seconds for agent operations. Overrides default_config.timeout_seconds.
verbose (bool, optional): Enable verbose logging for debugging. Default: True.

Returns: A1 agent instance ready for task execution.

Methods

`go(task_description: str) -> None`

Execute a biomedical research task autonomously.

agent.go("Analyze this scRNA-seq dataset and identify cell types")

Parameters:

task_description (str, required): Natural language description of the biomedical task to execute. Be specific about:
- Data location and format
- Desired analysis or output
- Any specific methods or parameters
- Expected results format
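
The four points above can be collected into a small prompt builder. This is a minimal sketch: `build_task_description` and its parameter names are hypothetical helpers, not part of Biomni, and the file path is a placeholder. The resulting string is what would be passed to `agent.go`.

```python
def build_task_description(data_path, data_format, analysis,
                           methods=None, output_format=None):
    """Assemble a specific task description (hypothetical helper, not a Biomni API)."""
    lines = [
        f"Data: {data_path} ({data_format})",   # data location and format
        f"Analysis: {analysis}",                # desired analysis or output
    ]
    if methods:
        lines.append("Methods: " + "; ".join(methods))        # specific methods or parameters
    if output_format:
        lines.append(f"Expected output: {output_format}")     # expected results format
    return "\n".join(lines)


task = build_task_description(
    data_path="data/pbmc.h5ad",  # placeholder path
    data_format="AnnData h5ad",
    analysis="identify cell types from the scRNA-seq counts",
    methods=["Leiden clustering, resolution 0.5"],
    output_format="CSV of cluster labels plus a UMAP plot",
)
print(task)
```

Keeping each point on its own line makes it easy to see which detail is missing before the task is submitted.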

Behavior:

Decomposes the task into executable steps
Retrieves relevant biomedical knowledge from the data lake
Generates and executes Python/R code
Provides results and visualizations
Handles errors and retries with refinement

Notes:

Executes generated code with system privileges; run it only in a sandboxed environment
Long-running tasks may require timeout adjustments
Intermediate results are displayed during execution

`save_conversation_history(output_path: str, format: str = 'pdf') -> None`

Export conversation history and execution trace as a formatted report.

agent.save_conversation_history(
    output_path='./reports/analysis_log.pdf',
    format='pdf'
)

Parameters:

output_path (str, required): File path for the output report
format (str, optional): Output format. Options: 'pdf', 'markdown'. Default: 'pdf'

Requirements:

For PDF output, install one of WeasyPrint, markdown2pdf, or Pandoc:

pip install weasyprint  # Recommended
# or
pip install markdown2pdf
# or install Pandoc system-wide

Report Contents:

Task description and parameters
Retrieved biomedical knowledge
Generated code with execution traces
Results, visualizations, and outputs
Timestamps and execution metadata

`add_mcp(config_path: str) -> None`

Add Model Context Protocol (MCP) tools to extend agent capabilities.

agent.add_mcp(config_path='./mcp_tools_config.json')

Parameters:

config_path (str, required): Path to MCP configuration JSON file

MCP Configuration Format:

{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {
        "param1": "string",
        "param2": "integer"
      }
    }
  ]
}

Use Cases:

Connect to laboratory information systems
Integrate proprietary databases
Access specialized computational resources
Link to institutional data repositories
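
A configuration file in the format shown above can be checked before it is handed to `add_mcp`. The sketch below uses only the standard library; `validate_mcp_config` is a hypothetical helper, and treating all four keys from the example as required is an assumption, since this reference does not say which fields are optional.

```python
import json
import os
import tempfile

# Assumption: every key shown in the example configuration is required.
REQUIRED_TOOL_KEYS = {"name", "endpoint", "description", "parameters"}


def validate_mcp_config(config):
    """Return a list of problems found in a config dict (hypothetical helper)."""
    tools = config.get("tools")
    if not isinstance(tools, list):
        return ["'tools' must be a list"]
    problems = []
    for i, tool in enumerate(tools):
        if not isinstance(tool, dict):
            problems.append(f"tool {i}: must be an object")
            continue
        missing = REQUIRED_TOOL_KEYS - set(tool)
        if missing:
            problems.append(f"tool {i}: missing keys {sorted(missing)}")
    return problems


config = {
    "tools": [
        {
            "name": "tool_name",
            "endpoint": "http://localhost:8000/tool",
            "description": "Tool description for LLM",
            "parameters": {"param1": "string", "param2": "integer"},
        }
    ]
}

problems = validate_mcp_config(config)
if problems:
    print("Invalid MCP config:", problems)
else:
    # Write to a temporary directory so the sketch has no lasting side effects.
    path = os.path.join(tempfile.mkdtemp(), "mcp_tools_config.json")
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    print(f"Wrote {path}")
```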

Configuration

default_config

Global configuration object for Biomni settings.

from biomni.config import default_config

Attributes

`llm: str`

Default LLM model identifier for all agent instances.

default_config.llm = "claude-sonnet-4-20250514"

Supported Models:

Anthropic:

claude-sonnet-4-20250514 (Recommended)
claude-opus-4-20250514
claude-3-5-sonnet-20241022
claude-3-opus-20240229

OpenAI:

gpt-4o
gpt-4
gpt-4-turbo
gpt-3.5-turbo

Azure OpenAI:

azure/gpt-4
azure/<deployment-name>

Google Gemini:

gemini/gemini-pro
gemini/gemini-1.5-pro

Groq:

groq/llama-3.1-70b-versatile
groq/mixtral-8x7b-32768

Ollama (Local):

ollama/llama3
ollama/mistral
ollama/<model-name>

AWS Bedrock:

bedrock/anthropic.claude-v2
bedrock/anthropic.claude-3-sonnet

Custom/Biomni-R0:

openai/biomni-r0 (requires local SGLang deployment)
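
Most identifiers in the list above follow a `provider/model` convention, while the Anthropic and OpenAI entries carry no prefix. The sketch below only splits the string on the first slash; `split_provider` is a hypothetical helper, and how Biomni itself resolves providers is not described in this reference.

```python
def split_provider(model_id):
    """Split 'provider/model' into (provider, model); un-prefixed names get provider None."""
    provider, sep, name = model_id.partition("/")
    if not sep:
        return None, model_id
    return provider, name


for model_id in ["claude-sonnet-4-20250514", "azure/gpt-4", "ollama/llama3"]:
    print(model_id, "->", split_provider(model_id))
```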

`timeout_seconds: int`

Default timeout for agent operations in seconds.

default_config.timeout_seconds = 1200  # 20 minutes

Recommended Values:

Simple tasks (QC, basic analysis): 300-600 seconds
Medium tasks (differential expression, clustering): 600-1200 seconds
Complex tasks (full pipelines, ML models): 1200-3600 seconds
Very complex tasks: 3600+ seconds
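
The recommended ranges above can be encoded as a lookup so that scripts pick a timeout consistently. This is a sketch under stated assumptions: the category names and the choice of the upper bound of each range are illustrative, and `pick_timeout` is a hypothetical helper rather than a Biomni function.

```python
# Ranges in seconds, copied from the recommended values above.
TIMEOUT_RANGES = {
    "simple": (300, 600),
    "medium": (600, 1200),
    "complex": (1200, 3600),
}
VERY_COMPLEX_MINIMUM = 3600  # "3600+ seconds": treated here as a floor


def pick_timeout(complexity):
    """Return a timeout in seconds for a task category (hypothetical helper)."""
    if complexity == "very_complex":
        return VERY_COMPLEX_MINIMUM
    _low, high = TIMEOUT_RANGES[complexity]
    return high  # assumption: use the upper bound to leave headroom


print(pick_timeout("medium"))
```

The returned value could then be assigned to `default_config.timeout_seconds` or passed as the `timeout` argument of `A1`.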

`data_path: str`

Default path to biomedical knowledge base.

default_config.data_path = "/path/to/biomni/data"

Storage Requirements:

Initial download: ~11GB
Extracted size: ~15GB
Additional working space: ~5-10GB recommended
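
Free disk space can be checked against these figures before the first download. The sketch below uses `shutil.disk_usage` from the standard library; summing the download, extracted, and upper working-space figures is a conservative assumption, because this reference does not say whether the archive is kept after extraction.

```python
import shutil

GIB = 1024 ** 3
# Assumption: download (~11) + extracted (~15) + working space upper bound (~10).
REQUIRED_GB = 11 + 15 + 10


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return (ok, free_gb) for the filesystem that contains path."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB, free_bytes / GIB


ok, free_gb = has_enough_space(".")
print(f"{free_gb:.1f} GiB free; enough for the knowledge base: {ok}")
```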

`api_base: str`

Custom API endpoint for LLM providers (advanced usage).

# For local Biomni-R0 deployment
default_config.api_base = "http://localhost:30000/v1"

# For custom OpenAI-compatible endpoints
default_config.api_base = "https://your-endpoint.com/v1"

`max_retries: int`

Number of retry attempts for failed operations.

default_config.max_retries = 3

Methods

`reset() -> None`

Reset all configuration values to system defaults.

default_config.reset()

Database Query System

Biomni includes a retrieval-augmented generation (RAG) system for querying the biomedical knowledge base.

Query Functions

`query_genes(query: str, top_k: int = 10) -> List[Dict]`

Query gene information from integrated databases.

from biomni.database import query_genes

results = query_genes(
    query="genes involved in p53 pathway",
    top_k=20
)

Parameters:

query (str): Natural language or gene identifier query
top_k (int): Number of results to return

Returns: List of dictionaries containing:

gene_symbol: Official gene symbol
gene_name: Full gene name
description: Functional description
pathways: Associated biological pathways
go_terms: Gene Ontology annotations
diseases: Associated diseases
similarity_score: Relevance score (0-1)
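
Because each result carries a `similarity_score`, weak matches can be dropped before the gene list is used elsewhere. The sketch below runs on made-up records in the documented shape (the scores are placeholders, not real query output), and `top_symbols` is a hypothetical helper.

```python
# Illustrative records only; the scores are placeholders, not real query results.
sample_results = [
    {"gene_symbol": "MDM2", "similarity_score": 0.88},
    {"gene_symbol": "TP53", "similarity_score": 0.93},
    {"gene_symbol": "GAPDH", "similarity_score": 0.21},
]


def top_symbols(results, min_score=0.5):
    """Return gene symbols with score >= min_score, most relevant first."""
    kept = [r for r in results if r["similarity_score"] >= min_score]
    kept.sort(key=lambda r: r["similarity_score"], reverse=True)
    return [r["gene_symbol"] for r in kept]


print(top_symbols(sample_results))
```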

`query_proteins(query: str, top_k: int = 10) -> List[Dict]`

Query protein information from UniProt and other sources.

from biomni.database import query_proteins

results = query_proteins(
    query="kinase proteins in cell cycle",
    top_k=15
)

Returns: List of dictionaries with protein metadata:

uniprot_id: UniProt accession
protein_name: Protein name
function: Functional annotation
domains: Protein domains
subcellular_location: Cellular localization
similarity_score: Relevance score

`query_drugs(query: str, top_k: int = 10) -> List[Dict]`

Query drug and compound information.

from biomni.database import query_drugs

results = query_drugs(
    query="FDA approved cancer drugs targeting EGFR",
    top_k=10
)

Returns: Drug information including:

drug_name: Common name
drugbank_id: DrugBank identifier
indication: Therapeutic indication
mechanism: Mechanism of action
targets: Molecular targets
approval_status: Regulatory status
smiles: Chemical structure (SMILES notation)

`query_diseases(query: str, top_k: int = 10) -> List[Dict]`

Query disease information from clinical databases.

from biomni.database import query_diseases

results = query_diseases(
    query="autoimmune diseases affecting joints",
    top_k=10
)

Returns: Disease data:

disease_name: Standard disease name
disease_id: Ontology identifier
symptoms: Clinical manifestations
associated_genes: Genetic associations
prevalence: Epidemiological data

`query_pathways(query: str, top_k: int = 10) -> List[Dict]`

Query biological pathways from KEGG, Reactome, and other sources.

from biomni.database import query_pathways

results = query_pathways(
    query="immune response signaling pathways",
    top_k=15
)

Returns: Pathway information:

pathway_name: Pathway name
pathway_id: Database identifier
genes: Genes in pathway
description: Functional description
source: Database source (KEGG, Reactome, etc.)

Data Structures

TaskResult

Result object returned by complex agent operations.

class TaskResult:
    success: bool           # Whether task completed successfully
    output: Any            # Task output (varies by task)
    code: str             # Generated code
    execution_time: float # Execution time in seconds
    error: Optional[str]  # Error message if failed
    metadata: Dict        # Additional metadata
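
The field list above maps naturally onto a dataclass. The following is a stand-alone sketch for illustration, not Biomni's actual definition: the default values and the `summarize` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TaskResult:
    """Illustrative mirror of the documented fields; defaults are assumptions."""
    success: bool
    output: Any = None
    code: str = ""
    execution_time: float = 0.0
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize(result):
    """Produce a one-line status string from a TaskResult (hypothetical helper)."""
    if result.success:
        return f"ok in {result.execution_time:.1f}s"
    return f"failed: {result.error}"


print(summarize(TaskResult(success=True, execution_time=2.5)))
print(summarize(TaskResult(success=False, error="missing input file")))
```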

BiomedicalEntity

Base class for biomedical entities in the knowledge base.

class BiomedicalEntity:
    entity_id: str        # Unique identifier
    entity_type: str      # Type (gene, protein, drug, etc.)
    name: str            # Entity name
    description: str     # Description
    attributes: Dict     # Additional attributes
    references: List[str] # Literature references

Utility Functions

`download_data(path: str, force: bool = False) -> None`

Manually download or update the biomedical knowledge base.

from biomni.utils import download_data

download_data(
    path='./data',
    force=True  # Force re-download
)

`validate_environment() -> Dict[str, bool]`

Check if the environment is properly configured.

from biomni.utils import validate_environment

status = validate_environment()
# Returns: {
#   'conda_env': True,
#   'api_keys': True,
#   'data_available': True,
#   'dependencies': True
# }
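
A status dictionary of this shape can be reduced to the list of failing checks before an agent is created. The sketch below works on a hand-written dictionary so it runs without Biomni installed; `failed_checks` is a hypothetical helper.

```python
def failed_checks(status):
    """Return the names of checks whose value is falsy, in sorted order."""
    return sorted(name for name, ok in status.items() if not ok)


# Hand-written example in the documented shape, with one failing check.
status = {
    "conda_env": True,
    "api_keys": False,
    "data_available": True,
    "dependencies": True,
}

missing = failed_checks(status)
if missing:
    print("Environment not ready:", ", ".join(missing))
else:
    print("Environment OK")
```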

`list_available_models() -> List[str]`

Get a list of available LLM models based on configured API keys.

from biomni.utils import list_available_models

models = list_available_models()
# Returns: ['claude-sonnet-4-20250514', 'gpt-4o', ...]

Error Handling

Common Exceptions

`BiomniConfigError`

Raised when configuration is invalid or incomplete.

from biomni.agent import A1
from biomni.exceptions import BiomniConfigError

try:
    agent = A1(path='./data')
except BiomniConfigError as e:
    print(f"Configuration error: {e}")

`BiomniExecutionError`

Raised when code generation or execution fails.

from biomni.exceptions import BiomniExecutionError

try:
    agent.go("invalid task")
except BiomniExecutionError as e:
    print(f"Execution failed: {e}")
    # Access failed code: e.code
    # Access error details: e.details

`BiomniDataError`

Raised when knowledge base or data access fails.

from biomni.database import query_genes
from biomni.exceptions import BiomniDataError

try:
    results = query_genes("unknown query format")
except BiomniDataError as e:
    print(f"Data access error: {e}")

`BiomniTimeoutError`

Raised when operations exceed timeout limit.

from biomni.exceptions import BiomniTimeoutError

try:
    agent.go("very complex long-running task")
except BiomniTimeoutError as e:
    print(f"Task timed out after {e.duration} seconds")
    # Partial results may be available: e.partial_results

Best Practices

Efficient Knowledge Retrieval

Pre-query databases for relevant context before complex tasks:

from biomni.database import query_genes, query_pathways

# Gather relevant biological context first
genes = query_genes("cell cycle genes", top_k=50)
pathways = query_pathways("cell cycle regulation", top_k=20)

# Then execute task with enriched context
agent.go(f"""
Analyze the cell cycle progression in this dataset.
Focus on these genes: {[g['gene_symbol'] for g in genes]}
Consider these pathways: {[p['pathway_name'] for p in pathways]}
""")

Error Recovery

Implement robust error handling for production workflows:

from biomni.config import default_config
from biomni.exceptions import BiomniExecutionError, BiomniTimeoutError

max_attempts = 3
for attempt in range(max_attempts):
    try:
        agent.go("complex biomedical task")
        break  # Success: stop retrying
    except BiomniTimeoutError:
        # Increase timeout and retry
        default_config.timeout_seconds *= 2
        print(f"Timeout, retrying with {default_config.timeout_seconds}s timeout")
    except BiomniExecutionError as e:
        # Refine task based on error
        print(f"Execution failed: {e}, refining task...")
        # Optionally modify task description
else:
    # for/else: runs only when the loop ends without a break, i.e. every attempt failed
    print("Task failed after max attempts")
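
The retry-with-longer-timeout pattern can also be factored into a reusable function. The sketch below runs without Biomni: `TimeoutStandIn` stands in for `BiomniTimeoutError`, and `run_with_retries` and `flaky_task` are hypothetical names used only for the demonstration.

```python
class TimeoutStandIn(Exception):
    """Stand-in for BiomniTimeoutError so the sketch runs without Biomni."""


def run_with_retries(run_task, timeout, max_attempts=3):
    """Call run_task(timeout); double the timeout after each timeout error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_task(timeout)
        except TimeoutStandIn:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the error
            timeout *= 2


calls = []


def flaky_task(timeout):
    """Demo task that only succeeds once the timeout reaches 1200 seconds."""
    calls.append(timeout)
    if timeout < 1200:
        raise TimeoutStandIn(f"exceeded {timeout}s")
    return "done"


result = run_with_retries(flaky_task, timeout=300)
print(result, calls)
```

Passing the timeout into the callable keeps the retry logic independent of global configuration, which makes it easier to test.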

Memory Management

For large-scale analyses, manage memory explicitly:

import gc

num_chunks = 10  # Example value: number of chunks the dataset was split into

# Process datasets in chunks
for chunk_id in range(num_chunks):
    agent.go(f"Process data chunk {chunk_id} located at data/chunk_{chunk_id}.h5ad")

    # Force garbage collection between chunks
    gc.collect()

    # Save intermediate results
    agent.save_conversation_history(f"./reports/chunk_{chunk_id}.pdf")

Reproducibility

Ensure reproducible analyses by:

Fixing random seeds:

agent.go("Set random seed to 42 for all analyses, then perform clustering...")

Logging configuration:

import json
from datetime import datetime

from biomni.config import default_config

config_log = {
    'llm': default_config.llm,
    'timeout': default_config.timeout_seconds,
    'data_path': default_config.data_path,
    'timestamp': datetime.now().isoformat()
}
with open('config_log.json', 'w') as f:
    json.dump(config_log, f, indent=2)

Saving execution traces:

# Always save detailed reports
agent.save_conversation_history('./reports/full_analysis.pdf')

Performance Optimization

Model Selection Strategy

Choose models based on task characteristics:

# For exploratory, simple tasks
default_config.llm = "gpt-3.5-turbo"  # Fast, cost-effective

# For standard biomedical analyses
default_config.llm = "claude-sonnet-4-20250514"  # Recommended

# For complex reasoning and hypothesis generation
default_config.llm = "claude-opus-4-20250514"  # Highest quality

# For specialized biological reasoning
default_config.llm = "openai/biomni-r0"  # Requires local deployment
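
The same choices can be kept in one table so that scripts select a model by task kind instead of hard-coding identifiers. This is a sketch: the category names and the `choose_model` helper are hypothetical, while the model identifiers are the ones listed above.

```python
# Task kinds are illustrative labels; model identifiers come from the list above.
MODEL_BY_TASK = {
    "exploratory": "gpt-3.5-turbo",
    "standard": "claude-sonnet-4-20250514",
    "complex_reasoning": "claude-opus-4-20250514",
    "specialized_biology": "openai/biomni-r0",  # requires local deployment
}


def choose_model(task_kind, default="claude-sonnet-4-20250514"):
    """Return the model identifier for a task kind, falling back to the default."""
    return MODEL_BY_TASK.get(task_kind, default)


print(choose_model("complex_reasoning"))
```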

Timeout Tuning

Set appropriate timeouts based on task complexity:

# Quick queries and simple analyses
agent = A1(path='./data', timeout=300)

# Standard workflows
agent = A1(path='./data', timeout=1200)

# Full pipelines with ML training
agent = A1(path='./data', timeout=3600)

Caching and Reuse

Reuse agent instances for multiple related tasks:

# Create agent once
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Execute multiple related tasks
tasks = [
    "Load and QC the scRNA-seq dataset",
    "Perform clustering with resolution 0.5",
    "Identify marker genes for each cluster",
    "Annotate cell types based on markers"
]

for task in tasks:
    agent.go(task)

# Save complete workflow
agent.save_conversation_history('./reports/full_workflow.pdf')
