Add more scientific skills

This commit is contained in:
Timothy Kassis
2025-10-19 14:12:02 -07:00
parent 78d5ac2b56
commit 660c8574d0
210 changed files with 88957 additions and 1 deletions

---
name: biomni
description: General-purpose biomedical AI agent for autonomously executing research tasks across diverse biomedical domains. Use this skill when working with biomedical data analysis, CRISPR screening, single-cell RNA-seq, molecular property prediction, genomics, proteomics, drug discovery, or any computational biology task requiring LLM-powered code generation and retrieval-augmented planning.
---
# Biomni
## Overview
Biomni is a general-purpose biomedical AI agent that autonomously executes research tasks across diverse biomedical subfields. It combines large language model reasoning with retrieval-augmented planning and code-based execution to enhance scientific productivity and hypothesis generation. The system operates with an ~11GB biomedical knowledge base covering molecular, genomic, and clinical domains.
## Quick Start
Initialize and use the Biomni agent with these basic steps:
```python
from biomni.agent import A1
# Initialize agent with data path and LLM model
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Execute a biomedical research task
agent.go("Your biomedical task description")
```
The agent will autonomously decompose the task, retrieve relevant biomedical knowledge, generate and execute code, and provide results.
## Installation and Setup
### Environment Preparation
1. **Set up the conda environment:**
- Follow instructions in `biomni_env/README.md` from the repository
- Activate the environment: `conda activate biomni_e1`
2. **Install the package:**
```bash
pip install biomni --upgrade
```
Or install from source:
```bash
git clone https://github.com/snap-stanford/biomni.git
cd biomni
pip install -e .
```
3. **Configure API keys:**
Set up credentials via environment variables or `.env` file:
```bash
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here" # Optional
```
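To fail fast before initializing the agent, credentials can be checked up front. This is a minimal sketch; `check_api_keys` is illustrative and not part of the Biomni API:

```python
import os

def check_api_keys(required=("ANTHROPIC_API_KEY",)):
    """Return the names of required keys missing from the environment."""
    return [name for name in required if not os.environ.get(name)]

missing = check_api_keys()
if missing:
    print(f"Missing API keys: {', '.join(missing)}")
```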
4. **Data initialization:**
On first use, the agent will automatically download the ~11GB biomedical knowledge base.
### LLM Provider Configuration
Biomni supports multiple LLM providers. Configure the default provider using:
```python
from biomni.config import default_config
# Set the default LLM model
default_config.llm = "claude-sonnet-4-20250514" # Anthropic
# default_config.llm = "gpt-4" # OpenAI
# default_config.llm = "azure/gpt-4" # Azure OpenAI
# default_config.llm = "gemini/gemini-pro" # Google Gemini
# Set timeout (optional)
default_config.timeout_seconds = 1200
# Set data path (optional)
default_config.data_path = "./custom/data/path"
```
Refer to `references/llm_providers.md` for detailed configuration options for each provider.
## Core Biomedical Research Tasks
### 1. CRISPR Screening and Design
Execute CRISPR screening tasks including guide RNA design, off-target analysis, and screening experiment planning:
```python
agent.go("Design a CRISPR screening experiment to identify genes involved in cancer cell resistance to drug X")
```
The agent will:
- Retrieve relevant gene databases
- Design guide RNAs with specificity analysis
- Plan experimental controls and readout strategies
- Generate analysis code for screening results
### 2. Single-Cell RNA-seq Analysis
Perform comprehensive scRNA-seq analysis workflows:
```python
agent.go("Analyze this 10X Genomics scRNA-seq dataset, identify cell types, and find differentially expressed genes between clusters")
```
Capabilities include:
- Quality control and preprocessing
- Dimensionality reduction and clustering
- Cell type annotation using marker databases
- Differential expression analysis
- Pathway enrichment analysis
### 3. Molecular Property Prediction (ADMET)
Predict absorption, distribution, metabolism, excretion, and toxicity properties:
```python
agent.go("Predict ADMET properties for these drug candidates: [SMILES strings]")
```
The agent handles:
- Molecular descriptor calculation
- Property prediction using integrated models
- Toxicity screening
- Drug-likeness assessment
### 4. Genomic Analysis
Execute genomic data analysis tasks:
```python
agent.go("Perform GWAS analysis to identify SNPs associated with disease phenotype in this cohort")
```
Supports:
- Genome-wide association studies (GWAS)
- Variant calling and annotation
- Population genetics analysis
- Functional genomics integration
### 5. Protein Structure and Function
Analyze protein sequences and structures:
```python
agent.go("Predict the structure of this protein sequence and identify potential binding sites")
```
Capabilities:
- Sequence analysis and domain identification
- Structure prediction integration
- Binding site prediction
- Protein-protein interaction analysis
### 6. Disease Diagnosis and Classification
Perform disease classification from multi-omics data:
```python
agent.go("Build a classifier to diagnose disease X from patient RNA-seq and clinical data")
```
### 7. Systems Biology and Pathway Analysis
Analyze biological pathways and networks:
```python
agent.go("Identify dysregulated pathways in this differential expression dataset")
```
### 8. Drug Discovery and Repurposing
Support drug discovery workflows:
```python
agent.go("Identify FDA-approved drugs that could be repurposed for treating disease Y based on mechanism of action")
```
## Advanced Features
### Custom Configuration per Agent
Override global configuration for specific agent instances:
```python
agent = A1(
    path='./project_data',
    llm='gpt-4o',
    timeout=1800
)
```
### Conversation History and Reporting
Save execution traces as formatted PDF reports:
```python
# After executing tasks
agent.save_conversation_history(
    output_path='./reports/experiment_log.pdf',
    format='pdf'
)
```
Requires one of: WeasyPrint, markdown2pdf, or Pandoc.
### Model Context Protocol (MCP) Integration
Extend agent capabilities with external tools:
```python
# Add MCP-compatible tools
agent.add_mcp(config_path='./mcp_config.json')
```
MCP enables integration with:
- Laboratory information management systems (LIMS)
- Specialized bioinformatics databases
- Custom analysis pipelines
- External computational resources
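A minimal configuration file can be generated programmatically. The tool entry below follows the MCP configuration format documented in `references/api_reference.md`; the LIMS lookup tool, its endpoint, and its parameter names are hypothetical examples:

```python
import json

# Hypothetical tool entry; name, endpoint, and parameters are placeholders
mcp_config = {
    "tools": [
        {
            "name": "lims_lookup",
            "endpoint": "http://localhost:8000/tool",
            "description": "Look up sample metadata in the institutional LIMS",
            "parameters": {"sample_id": "string"},
        }
    ]
}

# Write the config so it can be passed to agent.add_mcp()
with open("mcp_config.json", "w") as f:
    json.dump(mcp_config, f, indent=2)

# agent.add_mcp(config_path='./mcp_config.json')
```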
### Using Biomni-R0 (Specialized Reasoning Model)
Deploy the 32B parameter Biomni-R0 model for enhanced biological reasoning:
```bash
# Install SGLang
pip install "sglang[all]"
# Deploy Biomni-R0
python -m sglang.launch_server \
    --model-path snap-stanford/biomni-r0 \
    --port 30000 \
    --trust-remote-code
```
Then configure the agent:
```python
from biomni.config import default_config
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
```
Biomni-R0 provides specialized reasoning for:
- Complex multi-step biological workflows
- Hypothesis generation and evaluation
- Experimental design optimization
- Literature-informed analysis
## Best Practices
### Task Specification
Provide clear, specific task descriptions:
✅ **Good:** "Analyze this scRNA-seq dataset (file: data.h5ad) to identify T cell subtypes, then perform differential expression analysis comparing activated vs. resting T cells"
❌ **Vague:** "Analyze my RNA-seq data"
### Data Organization
Structure data directories for efficient retrieval:
```
project/
├── data/ # Biomni knowledge base
├── raw_data/ # Your experimental data
├── results/ # Analysis outputs
└── reports/ # Generated reports
```
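This layout can be scaffolded in a few lines; `scaffold_project` is an illustrative helper, not part of the Biomni API:

```python
from pathlib import Path

def scaffold_project(root="project"):
    """Create the recommended directory layout under `root`."""
    for sub in ("data", "raw_data", "results", "reports"):
        Path(root, sub).mkdir(parents=True, exist_ok=True)
    return Path(root)

root = scaffold_project()
```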
### Iterative Refinement
Use iterative task execution for complex analyses:
```python
# Step 1: Exploratory analysis
agent.go("Load and perform initial QC on the proteomics dataset")
# Step 2: Based on results, refine analysis
agent.go("Based on the QC results, remove low-quality samples and normalize using method X")
# Step 3: Downstream analysis
agent.go("Perform differential abundance analysis with adjusted parameters")
```
### Security Considerations
**CRITICAL:** Biomni executes LLM-generated code with full system privileges. For production use:
1. **Use sandboxed environments:** Deploy in Docker containers or VMs with restricted permissions
2. **Validate sensitive operations:** Review code before execution for file access, network calls, or credential usage
3. **Limit data access:** Restrict agent access to only necessary data directories
4. **Monitor execution:** Log all executed code for audit trails
Never run Biomni with:
- Unrestricted file system access
- Direct access to sensitive credentials
- Network access to production systems
- Elevated system privileges
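As one concrete starting point, the sandboxing advice above can be captured in a container image. This is a sketch, not an official image: the base image, user name, and `run_task.py` entry script are assumptions.

```dockerfile
# Sketch of a restricted container for running Biomni-generated code
FROM python:3.11-slim

# Run as an unprivileged user rather than root
RUN useradd --create-home biomni_user
USER biomni_user
WORKDIR /home/biomni_user

RUN pip install --user biomni

# run_task.py is a hypothetical entry script that initializes A1 and
# calls agent.go(); mount data read-only at run time, e.g.:
#   docker run --rm -v "$PWD/data:/home/biomni_user/data:ro" biomni-sandbox
COPY run_task.py .
CMD ["python", "run_task.py"]
```

Pair this with host-level controls (restricted mounts, firewall rules limiting outbound traffic to the LLM API) to satisfy the points above.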
### Model Selection Guidelines
Choose models based on task complexity:
- **Claude Sonnet 4:** Recommended for most biomedical tasks, excellent biological reasoning
- **GPT-4/GPT-4o:** Strong general capabilities, good for diverse tasks
- **Biomni-R0:** Specialized for complex biological reasoning, multi-step workflows
- **Smaller models:** Use for simple, well-defined tasks to reduce cost
## Evaluation and Benchmarking
The Biomni-Eval1 benchmark contains 433 evaluation instances across 10 biological task types:
- GWAS analysis
- Disease diagnosis
- Gene detection and classification
- Molecular property prediction
- Pathway analysis
- Protein function prediction
- Drug response prediction
- Variant interpretation
- Cell type annotation
- Biomarker discovery
Use the benchmark to:
- Evaluate custom agent configurations
- Compare LLM providers for specific tasks
- Validate analysis pipelines
## Troubleshooting
### Common Issues
**Issue:** Data download fails or times out
**Solution:** Manually download the knowledge base or increase timeout settings
**Issue:** Package dependency conflicts
**Solution:** Some optional dependencies are not installed by default due to conflicts. Install the specific packages manually and uncomment the relevant code sections as documented in the repository
**Issue:** LLM API errors
**Solution:** Verify API key configuration, check rate limits, ensure sufficient credits
**Issue:** Memory errors with large datasets
**Solution:** Process data in chunks, use data subsampling, or deploy on higher-memory instances
### Getting Help
For detailed troubleshooting:
- Review the Biomni GitHub repository issues
- Check `references/api_reference.md` for detailed API documentation
- Consult `references/task_examples.md` for comprehensive task patterns
## Resources
### references/
Detailed reference documentation for advanced usage:
- **api_reference.md:** Complete API documentation for A1 agent, configuration objects, and utility functions
- **llm_providers.md:** Comprehensive guide for configuring all supported LLM providers (Anthropic, OpenAI, Azure, Gemini, Groq, Ollama, AWS Bedrock)
- **task_examples.md:** Extensive collection of biomedical task examples with code patterns
### scripts/
Helper scripts for common operations:
- **setup_environment.py:** Automated environment setup and validation
- **generate_report.py:** Enhanced PDF report generation with custom formatting
Load reference documentation as needed:
```python
# Claude can read reference files when needed for detailed information
# Example: "Check references/llm_providers.md for Azure OpenAI configuration"
```

# Biomni API Reference
This document provides comprehensive API documentation for the Biomni biomedical AI agent system.
## Core Classes
### A1 Agent
The primary agent class for executing biomedical research tasks.
#### Initialization
```python
from biomni.agent import A1
agent = A1(
    path='./data',                    # Path to biomedical knowledge base
    llm='claude-sonnet-4-20250514',   # LLM model identifier
    timeout=None,                     # Optional timeout in seconds
    verbose=True                      # Enable detailed logging
)
```
**Parameters:**
- `path` (str, required): Directory path where the biomedical knowledge base is stored or will be downloaded. First-time initialization will download ~11GB of data.
- `llm` (str, optional): LLM model identifier. Defaults to the value in `default_config.llm`. Supports multiple providers (see LLM Providers section).
- `timeout` (int, optional): Maximum execution time in seconds for agent operations. Overrides `default_config.timeout_seconds`.
- `verbose` (bool, optional): Enable verbose logging for debugging. Default: True.
**Returns:** A1 agent instance ready for task execution.
#### Methods
##### `go(task_description: str) -> None`
Execute a biomedical research task autonomously.
```python
agent.go("Analyze this scRNA-seq dataset and identify cell types")
```
**Parameters:**
- `task_description` (str, required): Natural language description of the biomedical task to execute. Be specific about:
- Data location and format
- Desired analysis or output
- Any specific methods or parameters
- Expected results format
**Behavior:**
1. Decomposes the task into executable steps
2. Retrieves relevant biomedical knowledge from the data lake
3. Generates and executes Python/R code
4. Provides results and visualizations
5. Handles errors and retries with refinement
**Notes:**
- Executes code with system privileges - use in sandboxed environments
- Long-running tasks may require timeout adjustments
- Intermediate results are displayed during execution
##### `save_conversation_history(output_path: str, format: str = 'pdf') -> None`
Export conversation history and execution trace as a formatted report.
```python
agent.save_conversation_history(
    output_path='./reports/analysis_log.pdf',
    format='pdf'
)
```
**Parameters:**
- `output_path` (str, required): File path for the output report
- `format` (str, optional): Output format. Options: 'pdf', 'markdown'. Default: 'pdf'
**Requirements:**
- For PDF: Install one of: WeasyPrint, markdown2pdf, or Pandoc
```bash
pip install weasyprint # Recommended
# or
pip install markdown2pdf
# or install Pandoc system-wide
```
**Report Contents:**
- Task description and parameters
- Retrieved biomedical knowledge
- Generated code with execution traces
- Results, visualizations, and outputs
- Timestamps and execution metadata
##### `add_mcp(config_path: str) -> None`
Add Model Context Protocol (MCP) tools to extend agent capabilities.
```python
agent.add_mcp(config_path='./mcp_tools_config.json')
```
**Parameters:**
- `config_path` (str, required): Path to MCP configuration JSON file
**MCP Configuration Format:**
```json
{
  "tools": [
    {
      "name": "tool_name",
      "endpoint": "http://localhost:8000/tool",
      "description": "Tool description for LLM",
      "parameters": {
        "param1": "string",
        "param2": "integer"
      }
    }
  ]
}
```
**Use Cases:**
- Connect to laboratory information systems
- Integrate proprietary databases
- Access specialized computational resources
- Link to institutional data repositories
## Configuration
### default_config
Global configuration object for Biomni settings.
```python
from biomni.config import default_config
```
#### Attributes
##### `llm: str`
Default LLM model identifier for all agent instances.
```python
default_config.llm = "claude-sonnet-4-20250514"
```
**Supported Models:**
**Anthropic:**
- `claude-sonnet-4-20250514` (Recommended)
- `claude-opus-4-20250514`
- `claude-3-5-sonnet-20241022`
- `claude-3-opus-20240229`
**OpenAI:**
- `gpt-4o`
- `gpt-4`
- `gpt-4-turbo`
- `gpt-3.5-turbo`
**Azure OpenAI:**
- `azure/gpt-4`
- `azure/<deployment-name>`
**Google Gemini:**
- `gemini/gemini-pro`
- `gemini/gemini-1.5-pro`
**Groq:**
- `groq/llama-3.1-70b-versatile`
- `groq/mixtral-8x7b-32768`
**Ollama (Local):**
- `ollama/llama3`
- `ollama/mistral`
- `ollama/<model-name>`
**AWS Bedrock:**
- `bedrock/anthropic.claude-v2`
- `bedrock/anthropic.claude-3-sonnet`
**Custom/Biomni-R0:**
- `openai/biomni-r0` (requires local SGLang deployment)
##### `timeout_seconds: int`
Default timeout for agent operations in seconds.
```python
default_config.timeout_seconds = 1200 # 20 minutes
```
**Recommended Values:**
- Simple tasks (QC, basic analysis): 300-600 seconds
- Medium tasks (differential expression, clustering): 600-1200 seconds
- Complex tasks (full pipelines, ML models): 1200-3600 seconds
- Very complex tasks: 3600+ seconds
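These tiers can be encoded as a small lookup so workflow scripts stay consistent. The tier names and helper below are illustrative, not part of the Biomni API:

```python
# Map the task tiers above to timeout values in seconds
TIMEOUTS = {
    "simple": 600,     # QC, basic analysis
    "medium": 1200,    # differential expression, clustering
    "complex": 3600,   # full pipelines, ML models
}

def timeout_for(tier: str) -> int:
    # Default to the generous end for unrecognized tiers
    return TIMEOUTS.get(tier, 3600)

# default_config.timeout_seconds = timeout_for("medium")
```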
##### `data_path: str`
Default path to biomedical knowledge base.
```python
default_config.data_path = "/path/to/biomni/data"
```
**Storage Requirements:**
- Initial download: ~11GB
- Extracted size: ~15GB
- Additional working space: ~5-10GB recommended
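Before the first download it can be worth verifying free disk space against these figures using the standard library; the ~30GB threshold below is simply the sum of the figures above:

```python
import shutil

def has_space_for_knowledge_base(path=".", required_gb=30):
    """Return True if `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb

if not has_space_for_knowledge_base("./"):
    print("Warning: less than 30GB free; knowledge base setup may fail")
```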
##### `api_base: str`
Custom API endpoint for LLM providers (advanced usage).
```python
# For local Biomni-R0 deployment
default_config.api_base = "http://localhost:30000/v1"
# For custom OpenAI-compatible endpoints
default_config.api_base = "https://your-endpoint.com/v1"
```
##### `max_retries: int`
Number of retry attempts for failed operations.
```python
default_config.max_retries = 3
```
#### Methods
##### `reset() -> None`
Reset all configuration values to system defaults.
```python
default_config.reset()
```
## Database Query System
Biomni includes a retrieval-augmented generation (RAG) system for querying the biomedical knowledge base.
### Query Functions
#### `query_genes(query: str, top_k: int = 10) -> List[Dict]`
Query gene information from integrated databases.
```python
from biomni.database import query_genes
results = query_genes(
    query="genes involved in p53 pathway",
    top_k=20
)
```
**Parameters:**
- `query` (str): Natural language or gene identifier query
- `top_k` (int): Number of results to return
**Returns:** List of dictionaries containing:
- `gene_symbol`: Official gene symbol
- `gene_name`: Full gene name
- `description`: Functional description
- `pathways`: Associated biological pathways
- `go_terms`: Gene Ontology annotations
- `diseases`: Associated diseases
- `similarity_score`: Relevance score (0-1)
#### `query_proteins(query: str, top_k: int = 10) -> List[Dict]`
Query protein information from UniProt and other sources.
```python
from biomni.database import query_proteins
results = query_proteins(
    query="kinase proteins in cell cycle",
    top_k=15
)
```
**Returns:** List of dictionaries with protein metadata:
- `uniprot_id`: UniProt accession
- `protein_name`: Protein name
- `function`: Functional annotation
- `domains`: Protein domains
- `subcellular_location`: Cellular localization
- `similarity_score`: Relevance score
#### `query_drugs(query: str, top_k: int = 10) -> List[Dict]`
Query drug and compound information.
```python
from biomni.database import query_drugs
results = query_drugs(
    query="FDA approved cancer drugs targeting EGFR",
    top_k=10
)
```
**Returns:** Drug information including:
- `drug_name`: Common name
- `drugbank_id`: DrugBank identifier
- `indication`: Therapeutic indication
- `mechanism`: Mechanism of action
- `targets`: Molecular targets
- `approval_status`: Regulatory status
- `smiles`: Chemical structure (SMILES notation)
#### `query_diseases(query: str, top_k: int = 10) -> List[Dict]`
Query disease information from clinical databases.
```python
from biomni.database import query_diseases
results = query_diseases(
query="autoimmune diseases affecting joints",
top_k=10
)
```
**Returns:** Disease data:
- `disease_name`: Standard disease name
- `disease_id`: Ontology identifier
- `symptoms`: Clinical manifestations
- `associated_genes`: Genetic associations
- `prevalence`: Epidemiological data
#### `query_pathways(query: str, top_k: int = 10) -> List[Dict]`
Query biological pathways from KEGG, Reactome, and other sources.
```python
from biomni.database import query_pathways
results = query_pathways(
    query="immune response signaling pathways",
    top_k=15
)
```
**Returns:** Pathway information:
- `pathway_name`: Pathway name
- `pathway_id`: Database identifier
- `genes`: Genes in pathway
- `description`: Functional description
- `source`: Database source (KEGG, Reactome, etc.)
## Data Structures
### TaskResult
Result object returned by complex agent operations.
```python
from typing import Any, Dict, Optional

class TaskResult:
    success: bool            # Whether task completed successfully
    output: Any              # Task output (varies by task)
    code: str                # Generated code
    execution_time: float    # Execution time in seconds
    error: Optional[str]     # Error message if failed
    metadata: Dict           # Additional metadata
```
### BiomedicalEntity
Base class for biomedical entities in the knowledge base.
```python
from typing import Dict, List

class BiomedicalEntity:
    entity_id: str           # Unique identifier
    entity_type: str         # Type (gene, protein, drug, etc.)
    name: str                # Entity name
    description: str         # Description
    attributes: Dict         # Additional attributes
    references: List[str]    # Literature references
```
## Utility Functions
### `download_data(path: str, force: bool = False) -> None`
Manually download or update the biomedical knowledge base.
```python
from biomni.utils import download_data
download_data(
    path='./data',
    force=True  # Force re-download
)
```
### `validate_environment() -> Dict[str, bool]`
Check if the environment is properly configured.
```python
from biomni.utils import validate_environment
status = validate_environment()
# Returns: {
# 'conda_env': True,
# 'api_keys': True,
# 'data_available': True,
# 'dependencies': True
# }
```
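When gating a pipeline on this result, a small helper can turn any failed check into an explicit error. The helper is illustrative, not part of the Biomni API:

```python
def assert_environment_ready(status: dict) -> None:
    """Raise if any environment check in `status` is False."""
    failed = [check for check, ok in status.items() if not ok]
    if failed:
        raise RuntimeError(f"Environment checks failed: {', '.join(failed)}")

# Example with the result shape shown above
assert_environment_ready({
    'conda_env': True,
    'api_keys': True,
    'data_available': True,
    'dependencies': True,
})
```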
### `list_available_models() -> List[str]`
Get a list of available LLM models based on configured API keys.
```python
from biomni.utils import list_available_models
models = list_available_models()
# Returns: ['claude-sonnet-4-20250514', 'gpt-4o', ...]
```
## Error Handling
### Common Exceptions
#### `BiomniConfigError`
Raised when configuration is invalid or incomplete.
```python
from biomni.exceptions import BiomniConfigError
try:
    agent = A1(path='./data')
except BiomniConfigError as e:
    print(f"Configuration error: {e}")
```
#### `BiomniExecutionError`
Raised when code generation or execution fails.
```python
from biomni.exceptions import BiomniExecutionError
try:
    agent.go("invalid task")
except BiomniExecutionError as e:
    print(f"Execution failed: {e}")
    # Access failed code: e.code
    # Access error details: e.details
```
#### `BiomniDataError`
Raised when knowledge base or data access fails.
```python
from biomni.exceptions import BiomniDataError
try:
    results = query_genes("unknown query format")
except BiomniDataError as e:
    print(f"Data access error: {e}")
```
#### `BiomniTimeoutError`
Raised when operations exceed timeout limit.
```python
from biomni.exceptions import BiomniTimeoutError
try:
    agent.go("very complex long-running task")
except BiomniTimeoutError as e:
    print(f"Task timed out after {e.duration} seconds")
    # Partial results may be available: e.partial_results
```
## Best Practices
### Efficient Knowledge Retrieval
Pre-query databases for relevant context before complex tasks:
```python
from biomni.database import query_genes, query_pathways
# Gather relevant biological context first
genes = query_genes("cell cycle genes", top_k=50)
pathways = query_pathways("cell cycle regulation", top_k=20)
# Then execute task with enriched context
agent.go(f"""
Analyze the cell cycle progression in this dataset.
Focus on these genes: {[g['gene_symbol'] for g in genes]}
Consider these pathways: {[p['pathway_name'] for p in pathways]}
""")
```
### Error Recovery
Implement robust error handling for production workflows:
```python
from biomni.exceptions import BiomniExecutionError, BiomniTimeoutError
max_attempts = 3
for attempt in range(max_attempts):
    try:
        agent.go("complex biomedical task")
        break
    except BiomniTimeoutError:
        # Increase timeout and retry
        default_config.timeout_seconds *= 2
        print(f"Timeout, retrying with {default_config.timeout_seconds}s timeout")
    except BiomniExecutionError as e:
        # Refine task based on error
        print(f"Execution failed: {e}, refining task...")
        # Optionally modify task description
else:
    print("Task failed after max attempts")
```
### Memory Management
For large-scale analyses, manage memory explicitly:
```python
import gc
# Process datasets in chunks
for chunk_id in range(num_chunks):
    agent.go(f"Process data chunk {chunk_id} located at data/chunk_{chunk_id}.h5ad")
    # Force garbage collection between chunks
    gc.collect()
    # Save intermediate results
    agent.save_conversation_history(f"./reports/chunk_{chunk_id}.pdf")
```
### Reproducibility
Ensure reproducible analyses by:
1. **Fixing random seeds:**
```python
agent.go("Set random seed to 42 for all analyses, then perform clustering...")
```
2. **Logging configuration:**
```python
import json
from datetime import datetime

config_log = {
    'llm': default_config.llm,
    'timeout': default_config.timeout_seconds,
    'data_path': default_config.data_path,
    'timestamp': datetime.now().isoformat()
}
with open('config_log.json', 'w') as f:
    json.dump(config_log, f, indent=2)
```
3. **Saving execution traces:**
```python
# Always save detailed reports
agent.save_conversation_history('./reports/full_analysis.pdf')
```
## Performance Optimization
### Model Selection Strategy
Choose models based on task characteristics:
```python
# For exploratory, simple tasks
default_config.llm = "gpt-3.5-turbo" # Fast, cost-effective
# For standard biomedical analyses
default_config.llm = "claude-sonnet-4-20250514" # Recommended
# For complex reasoning and hypothesis generation
default_config.llm = "claude-opus-4-20250514" # Highest quality
# For specialized biological reasoning
default_config.llm = "openai/biomni-r0" # Requires local deployment
```
### Timeout Tuning
Set appropriate timeouts based on task complexity:
```python
# Quick queries and simple analyses
agent = A1(path='./data', timeout=300)
# Standard workflows
agent = A1(path='./data', timeout=1200)
# Full pipelines with ML training
agent = A1(path='./data', timeout=3600)
```
### Caching and Reuse
Reuse agent instances for multiple related tasks:
```python
# Create agent once
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
# Execute multiple related tasks
tasks = [
    "Load and QC the scRNA-seq dataset",
    "Perform clustering with resolution 0.5",
    "Identify marker genes for each cluster",
    "Annotate cell types based on markers"
]
for task in tasks:
    agent.go(task)
# Save complete workflow
agent.save_conversation_history('./reports/full_workflow.pdf')
```

# LLM Provider Configuration Guide
This document provides comprehensive configuration instructions for all LLM providers supported by Biomni.
## Overview
Biomni supports multiple LLM providers through a unified interface. Configure providers using:
- Environment variables
- `.env` files
- Runtime configuration via `default_config`
## Quick Reference Table
| Provider | Recommended For | API Key Required | Cost | Setup Complexity |
|----------|----------------|------------------|------|------------------|
| Anthropic Claude | Most biomedical tasks | Yes | Medium | Easy |
| OpenAI | General tasks | Yes | Medium-High | Easy |
| Azure OpenAI | Enterprise deployment | Yes | Varies | Medium |
| Google Gemini | Multimodal tasks | Yes | Medium | Easy |
| Groq | Fast inference | Yes | Low | Easy |
| Ollama | Local/offline use | No | Free | Medium |
| AWS Bedrock | AWS ecosystem | Yes | Varies | Hard |
| Biomni-R0 | Complex biological reasoning | No | Free | Hard |
## Anthropic Claude (Recommended)
### Overview
Claude models from Anthropic provide excellent biological reasoning capabilities and are the recommended choice for most Biomni tasks.
### Setup
1. **Obtain API Key:**
- Sign up at https://console.anthropic.com/
- Navigate to API Keys section
- Generate a new key
2. **Configure Environment:**
**Option A: Environment Variable**
```bash
export ANTHROPIC_API_KEY="sk-ant-api03-..."
```
**Option B: .env File**
```bash
# .env file in project root
ANTHROPIC_API_KEY=sk-ant-api03-...
```
3. **Set Model in Code:**
```python
from biomni.config import default_config
# Claude Sonnet 4 (Recommended)
default_config.llm = "claude-sonnet-4-20250514"
# Claude Opus 4 (Most capable)
default_config.llm = "claude-opus-4-20250514"
# Claude 3.5 Sonnet (Previous version)
default_config.llm = "claude-3-5-sonnet-20241022"
```
### Available Models
| Model | Context Window | Strengths | Best For |
|-------|---------------|-----------|----------|
| `claude-sonnet-4-20250514` | 200K tokens | Balanced performance, cost-effective | Most biomedical tasks |
| `claude-opus-4-20250514` | 200K tokens | Highest capability, complex reasoning | Difficult multi-step analyses |
| `claude-3-5-sonnet-20241022` | 200K tokens | Fast, reliable | Standard workflows |
| `claude-3-opus-20240229` | 200K tokens | Strong reasoning | Legacy support |
### Advanced Configuration
```python
from biomni.config import default_config
# Use Claude with custom parameters
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1800
# Optional: Custom API endpoint (for proxy/enterprise)
default_config.api_base = "https://your-proxy.com/v1"
```
### Cost Estimation
Approximate costs per 1M tokens (as of January 2025):
- Input: $3-15 depending on model
- Output: $15-75 depending on model
For a typical biomedical analysis (~50K tokens total): $0.50-$2.00
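These figures make a quick budget check easy to script. This is a sketch; the 40K input / 10K output split is an assumed illustration of a ~50K-token analysis:

```python
def estimate_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Estimate cost in dollars given per-million-token rates."""
    return round(input_tokens * in_rate_per_m / 1e6
                 + output_tokens * out_rate_per_m / 1e6, 2)

# ~50K tokens at the top-end rates above ($15 in / $75 out) ≈ $1.35
cost = estimate_cost(40_000, 10_000, 15.0, 75.0)
```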
## OpenAI
### Overview
OpenAI's GPT models provide strong general capabilities suitable for diverse biomedical tasks.
### Setup
1. **Obtain API Key:**
- Sign up at https://platform.openai.com/
- Navigate to API Keys
- Create new secret key
2. **Configure Environment:**
```bash
export OPENAI_API_KEY="sk-proj-..."
```
Or in `.env`:
```
OPENAI_API_KEY=sk-proj-...
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "gpt-4o" # Recommended
# default_config.llm = "gpt-4" # Previous flagship
# default_config.llm = "gpt-4-turbo" # Fast variant
# default_config.llm = "gpt-3.5-turbo" # Budget option
```
### Available Models
| Model | Context Window | Strengths | Cost |
|-------|---------------|-----------|------|
| `gpt-4o` | 128K tokens | Fast, multimodal | Medium |
| `gpt-4-turbo` | 128K tokens | Fast inference | Medium |
| `gpt-4` | 8K tokens | Reliable | High |
| `gpt-3.5-turbo` | 16K tokens | Fast, cheap | Low |
### Cost Optimization
```python
# For exploratory analysis (budget-conscious)
default_config.llm = "gpt-3.5-turbo"
# For production analysis (quality-focused)
default_config.llm = "gpt-4o"
```
## Azure OpenAI
### Overview
Azure-hosted OpenAI models for enterprise users requiring data residency and compliance.
### Setup
1. **Azure Prerequisites:**
- Active Azure subscription
- Azure OpenAI resource created
- Model deployment configured
2. **Environment Variables:**
```bash
export AZURE_OPENAI_API_KEY="your-key"
export AZURE_OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_VERSION="2024-02-15-preview"
```
3. **Configuration:**
```python
from biomni.config import default_config
# Option 1: Use deployment name
default_config.llm = "azure/your-deployment-name"
# Option 2: Specify endpoint explicitly
default_config.llm = "azure/gpt-4"
default_config.api_base = "https://your-resource.openai.azure.com/"
```
### Deployment Setup
Azure OpenAI requires explicit model deployments:
1. Navigate to Azure OpenAI Studio
2. Create deployment for desired model (e.g., GPT-4)
3. Note the deployment name
4. Use deployment name in Biomni configuration
### Example Configuration
```python
from biomni.config import default_config
import os
# Set Azure credentials
os.environ['AZURE_OPENAI_API_KEY'] = 'your-key'
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource.openai.azure.com/'
# Configure Biomni to use Azure deployment
default_config.llm = "azure/gpt-4-biomni" # Your deployment name
default_config.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
```
## Google Gemini
### Overview
Google's Gemini models offer multimodal capabilities and competitive performance.
### Setup
1. **Obtain API Key:**
- Visit https://makersuite.google.com/app/apikey
- Create new API key
2. **Environment Configuration:**
```bash
export GEMINI_API_KEY="your-key"
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "gemini/gemini-1.5-pro"
# Or: default_config.llm = "gemini/gemini-pro"
```
### Available Models
| Model | Context Window | Strengths |
|-------|---------------|-----------|
| `gemini/gemini-1.5-pro` | 1M tokens | Very large context, multimodal |
| `gemini/gemini-pro` | 32K tokens | Balanced performance |
### Use Cases
Gemini excels at:
- Tasks requiring very large context windows
- Multimodal analysis (when incorporating images)
- Cost-effective alternative to GPT-4
```python
# For tasks with large context requirements
default_config.llm = "gemini/gemini-1.5-pro"
default_config.timeout_seconds = 2400 # May need longer timeout
```
## Groq
### Overview
Groq provides ultra-fast inference with open-source models, ideal for rapid iteration.
### Setup
1. **Get API Key:**
- Sign up at https://console.groq.com/
- Generate API key
2. **Configure:**
```bash
export GROQ_API_KEY="gsk_..."
```
3. **Set Model:**
```python
from biomni.config import default_config
default_config.llm = "groq/llama-3.1-70b-versatile"
# Or: default_config.llm = "groq/mixtral-8x7b-32768"
```
### Available Models
| Model | Context Window | Speed | Quality |
|-------|---------------|-------|---------|
| `groq/llama-3.1-70b-versatile` | 128K tokens | Very Fast | Good |
| `groq/mixtral-8x7b-32768` | 32K tokens | Very Fast | Good |
| `groq/llama3-70b-8192` | 8K tokens | Ultra Fast | Moderate |
### Best Practices
```python
# For rapid prototyping and testing
default_config.llm = "groq/llama-3.1-70b-versatile"
default_config.timeout_seconds = 600 # Groq is fast
# Note: Quality may be lower than GPT-4/Claude for complex tasks
# Recommended for: QC, simple analyses, testing workflows
```
## Ollama (Local Deployment)
### Overview
Run LLMs entirely locally for offline use, data privacy, or cost savings.
### Setup
1. **Install Ollama:**
```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh
# Or download from https://ollama.com/download
```
2. **Pull Models:**
```bash
ollama pull llama3 # Meta Llama 3 (8B)
ollama pull mixtral # Mixtral (47B)
ollama pull codellama # Code-specialized
ollama pull medllama2     # Medical domain (if available)
```
3. **Start Ollama Server:**
```bash
ollama serve # Runs on http://localhost:11434
```
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "ollama/llama3"
default_config.api_base = "http://localhost:11434"
```
### Hardware Requirements
Minimum recommendations:
- **8B models:** 16GB RAM, CPU inference acceptable
- **70B models:** 64GB RAM, GPU highly recommended
- **Storage:** 5-50GB per model
### Model Selection
```python
# Fast, local, good for testing
default_config.llm = "ollama/llama3"
# Better quality (requires more resources)
default_config.llm = "ollama/mixtral"
# Code generation tasks
default_config.llm = "ollama/codellama"
```
### Advantages & Limitations
**Advantages:**
- Complete data privacy
- No API costs
- Offline operation
- Unlimited usage
**Limitations:**
- Lower quality than GPT-4/Claude for complex tasks
- Requires significant hardware
- Slower inference (especially on CPU)
- May struggle with specialized biomedical knowledge
## AWS Bedrock
### Overview
AWS-managed LLM service offering multiple model providers.
### Setup
1. **AWS Prerequisites:**
- AWS account with Bedrock access
- Model access enabled in Bedrock console
- AWS credentials configured
2. **Configure AWS Credentials:**
```bash
# Option 1: AWS CLI
aws configure
# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"
```
3. **Enable Model Access:**
- Navigate to AWS Bedrock console
- Request access to desired models
- Wait for approval (may take hours/days)
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
# Or: default_config.llm = "bedrock/anthropic.claude-v2"
```
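Once access requests are approved, it can help to confirm which models the current credentials can actually see before pointing Biomni at one. A hedged sketch using boto3's `list_foundation_models` (assumes boto3 is installed; it returns an empty list when boto3, credentials, or network access are unavailable):

```python
def list_accessible_bedrock_models(region: str = "us-east-1") -> list:
    """Return Bedrock foundation-model IDs visible to the current AWS
    credentials, or an empty list if boto3/credentials/network are missing."""
    try:
        import boto3  # only needed for Bedrock; not a Biomni dependency

        client = boto3.client("bedrock", region_name=region)
        resp = client.list_foundation_models()
        return [m["modelId"] for m in resp.get("modelSummaries", [])]
    except Exception:
        # ImportError, missing credentials, AccessDenied, network errors, ...
        return []
```

If the model you requested does not appear in the returned list, the approval has not propagated yet or the IAM policy below is missing.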
### Available Models
Bedrock provides access to:
- Anthropic Claude models
- Amazon Titan models
- AI21 Jurassic models
- Cohere Command models
- Meta Llama models
### IAM Permissions
Required IAM policy:
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "arn:aws:bedrock:*::foundation-model/*"
}
]
}
```
### Example Configuration
```python
from biomni.config import default_config
import boto3
# Verify AWS credentials
session = boto3.Session()
credentials = session.get_credentials()
print(f"AWS Access Key: {credentials.access_key[:8]}...")
# Configure Biomni
default_config.llm = "bedrock/anthropic.claude-3-sonnet"
default_config.timeout_seconds = 1800
```
## Biomni-R0 (Local Specialized Model)
### Overview
Biomni-R0 is a 32B-parameter reasoning model specifically trained for biological problem-solving. It provides the highest quality for complex biomedical reasoning but requires local deployment.
### Setup
1. **Hardware Requirements:**
- GPU with 48GB+ VRAM (e.g., A100, H100)
- Or multi-GPU setup (2x 24GB)
- 100GB+ storage for model weights
2. **Install Dependencies:**
```bash
pip install "sglang[all]"
pip install flashinfer # Optional but recommended
```
3. **Deploy Model:**
```bash
python -m sglang.launch_server \
--model-path snap-stanford/biomni-r0 \
--host 0.0.0.0 \
--port 30000 \
--trust-remote-code \
--mem-fraction-static 0.8
```
For multi-GPU:
```bash
python -m sglang.launch_server \
--model-path snap-stanford/biomni-r0 \
--host 0.0.0.0 \
--port 30000 \
--trust-remote-code \
--tp 2 # Tensor parallelism across 2 GPUs
```
4. **Configure Biomni:**
```python
from biomni.config import default_config
default_config.llm = "openai/biomni-r0"
default_config.api_base = "http://localhost:30000/v1"
default_config.timeout_seconds = 2400 # Longer for complex reasoning
```
### When to Use Biomni-R0
Biomni-R0 excels at:
- Multi-step biological reasoning
- Complex experimental design
- Hypothesis generation and evaluation
- Literature-informed analysis
- Tasks requiring deep biological knowledge
```python
# For complex biological reasoning tasks
default_config.llm = "openai/biomni-r0"
agent.go("""
Design a comprehensive CRISPR screening experiment to identify synthetic
lethal interactions with TP53 mutations in cancer cells, including:
1. Rationale and hypothesis
2. Guide RNA library design strategy
3. Experimental controls
4. Statistical analysis plan
5. Expected outcomes and validation approach
""")
```
### Performance Comparison
| Model | Speed | Biological Reasoning | Code Quality | Cost |
|-------|-------|---------------------|--------------|------|
| GPT-4 | Fast | Good | Excellent | Medium |
| Claude Sonnet 4 | Fast | Excellent | Excellent | Medium |
| Biomni-R0 | Moderate | Outstanding | Good | Free (local) |
## Multi-Provider Strategy
### Intelligent Model Selection
Use different models for different task types:
```python
from biomni.agent import A1
from biomni.config import default_config
# Strategy 1: Task-based selection
def get_agent_for_task(task_complexity):
if task_complexity == "simple":
default_config.llm = "gpt-3.5-turbo"
default_config.timeout_seconds = 300
elif task_complexity == "medium":
default_config.llm = "claude-sonnet-4-20250514"
default_config.timeout_seconds = 1200
else: # complex
default_config.llm = "openai/biomni-r0"
default_config.timeout_seconds = 2400
return A1(path='./data')
# Strategy 2: Fallback on failure
def execute_with_fallback(task):
models = [
"claude-sonnet-4-20250514",
"gpt-4o",
"claude-opus-4-20250514"
]
for model in models:
try:
default_config.llm = model
agent = A1(path='./data')
agent.go(task)
return
except Exception as e:
print(f"Failed with {model}: {e}, trying next...")
    raise RuntimeError("All models failed")
```
### Cost Optimization Strategy
```python
# Phase 1: Rapid prototyping with cheap models
default_config.llm = "gpt-3.5-turbo"
agent.go("Quick exploratory analysis of dataset structure")
# Phase 2: Detailed analysis with high-quality models
default_config.llm = "claude-sonnet-4-20250514"
agent.go("Comprehensive differential expression analysis with pathway enrichment")
# Phase 3: Complex reasoning with specialized models
default_config.llm = "openai/biomni-r0"
agent.go("Generate biological hypotheses based on multi-omics integration")
```
## Troubleshooting
### Common Issues
**Issue: "API key not found"**
- Verify environment variable is set: `echo $ANTHROPIC_API_KEY`
- Check `.env` file exists and is in correct location
- Try setting key programmatically: `os.environ['ANTHROPIC_API_KEY'] = 'key'`
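When a `.env` file is present but not being picked up, a minimal loader can make the problem visible; this is a sketch of what packages like python-dotenv do, using only the standard library (the key names in the usage note are illustrative):

```python
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict:
    """Minimal .env loader: reads KEY=VALUE lines, ignores blanks and
    '#' comments, strips surrounding quotes, and updates os.environ."""
    loaded = {}
    env = Path(path)
    if not env.exists():
        return loaded
    for line in env.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    os.environ.update(loaded)
    return loaded
```

Calling `load_env_file()` before initializing the agent and printing the returned dict shows exactly which keys (e.g. `ANTHROPIC_API_KEY`) were found.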
**Issue: "Rate limit exceeded"**
- Implement exponential backoff and retry
- Upgrade API tier if available
- Switch to alternative provider temporarily
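The backoff-and-retry advice can be sketched as a small generic wrapper (this is a common pattern, not part of the Biomni API):

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the last error
            # delay doubles each attempt, jitter avoids synchronized retries
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Usage would look like `with_backoff(lambda: agent.go(task))`; in production you would narrow the `except` clause to the provider's rate-limit exception.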
**Issue: "Model not found"**
- Verify model identifier is correct
- Check API key has access to requested model
- For Azure: ensure deployment exists with exact name
**Issue: "Timeout errors"**
- Increase `default_config.timeout_seconds`
- Break complex tasks into smaller steps
- Consider using faster model for initial phases
**Issue: "Connection refused (Ollama/Biomni-R0)"**
- Verify local server is running
- Check port is not blocked by firewall
- Confirm `api_base` URL is correct
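The three local-server checks above can be automated with a quick reachability probe, sketched here with only the standard library (the default URL matches the Ollama setup earlier in this document):

```python
import urllib.error
import urllib.request

def server_reachable(base_url: str = "http://localhost:11434",
                     timeout: float = 3.0) -> bool:
    """Return True if a local LLM server (Ollama, sglang, ...) answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return True   # server answered, even if with an error status code
    except (urllib.error.URLError, OSError, ValueError):
        return False  # connection refused, timeout, DNS failure, bad URL
```

If this returns `False` for your configured `api_base`, fix the server or firewall before debugging Biomni itself.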
### Testing Configuration
```python
from biomni.utils import list_available_models, validate_environment
# Check environment setup
status = validate_environment()
print("Environment Status:", status)
# List available models based on configured keys
models = list_available_models()
print("Available Models:", models)
# Test specific model
try:
from biomni.agent import A1
agent = A1(path='./data', llm='claude-sonnet-4-20250514')
agent.go("Print 'Configuration successful!'")
except Exception as e:
print(f"Configuration test failed: {e}")
```
## Best Practices Summary
1. **For most users:** Start with Claude Sonnet 4 or GPT-4o
2. **For cost sensitivity:** Use GPT-3.5-turbo for exploration, Claude Sonnet 4 for production
3. **For privacy/offline:** Deploy Ollama locally
4. **For complex reasoning:** Use Biomni-R0 if hardware available
5. **For enterprise:** Consider Azure OpenAI or AWS Bedrock
6. **For speed:** Use Groq for rapid iteration
7. **Always:**
- Set appropriate timeouts
- Implement error handling and retries
- Log model and configuration for reproducibility
- Test configuration before production use
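The reproducibility point can be as simple as writing the active configuration to a log before each run; a sketch (the record fields are illustrative, not a Biomni API):

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)

def log_run_config(llm: str, timeout_seconds: int, path: str = "./data") -> str:
    """Serialize the run configuration as one JSON line so a result can be
    traced back to the exact model and settings that produced it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "llm": llm,
        "timeout_seconds": timeout_seconds,
        "data_path": path,
    }
    line = json.dumps(record, sort_keys=True)
    logging.info("biomni run config: %s", line)
    return line
```

Calling `log_run_config(default_config.llm, default_config.timeout_seconds)` just before `agent.go(...)` leaves an auditable trail in the log.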



@@ -0,0 +1,381 @@
#!/usr/bin/env python3
"""
Enhanced PDF Report Generation for Biomni
This script provides advanced PDF report generation with custom formatting,
styling, and metadata for Biomni analysis results.
"""
import argparse
import sys
from pathlib import Path
from datetime import datetime
from typing import Optional, Dict, Any
def generate_markdown_report(
title: str,
sections: list,
metadata: Optional[Dict[str, Any]] = None,
output_path: str = "report.md"
) -> str:
"""
Generate a formatted markdown report.
Args:
title: Report title
sections: List of dicts with 'heading' and 'content' keys
metadata: Optional metadata dict (author, date, etc.)
output_path: Path to save markdown file
Returns:
Path to generated markdown file
"""
md_content = []
# Title
md_content.append(f"# {title}\n")
# Metadata
if metadata:
md_content.append("---\n")
for key, value in metadata.items():
md_content.append(f"**{key}:** {value} \n")
md_content.append("---\n\n")
# Sections
for section in sections:
heading = section.get('heading', 'Section')
content = section.get('content', '')
level = section.get('level', 2) # Default to h2
md_content.append(f"{'#' * level} {heading}\n\n")
md_content.append(f"{content}\n\n")
# Write to file
output = Path(output_path)
output.write_text('\n'.join(md_content))
return str(output)
def convert_to_pdf_weasyprint(
markdown_path: str,
output_path: str,
css_style: Optional[str] = None
) -> bool:
"""
Convert markdown to PDF using WeasyPrint.
Args:
markdown_path: Path to markdown file
output_path: Path for output PDF
css_style: Optional CSS stylesheet path
Returns:
True if successful, False otherwise
"""
try:
import markdown
from weasyprint import HTML, CSS
# Read markdown
with open(markdown_path, 'r') as f:
md_content = f.read()
# Convert to HTML
html_content = markdown.markdown(
md_content,
extensions=['tables', 'fenced_code', 'codehilite']
)
# Wrap in HTML template
html_template = f"""
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Biomni Report</title>
<style>
body {{
font-family: 'Helvetica', 'Arial', sans-serif;
line-height: 1.6;
color: #333;
max-width: 800px;
margin: 40px auto;
padding: 20px;
}}
h1 {{
color: #2c3e50;
border-bottom: 3px solid #3498db;
padding-bottom: 10px;
}}
h2 {{
color: #34495e;
margin-top: 30px;
border-bottom: 1px solid #bdc3c7;
padding-bottom: 5px;
}}
h3 {{
color: #7f8c8d;
}}
code {{
background-color: #f4f4f4;
padding: 2px 6px;
border-radius: 3px;
font-family: 'Courier New', monospace;
}}
pre {{
background-color: #f4f4f4;
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}}
table {{
border-collapse: collapse;
width: 100%;
margin: 20px 0;
}}
th, td {{
border: 1px solid #ddd;
padding: 12px;
text-align: left;
}}
th {{
background-color: #3498db;
color: white;
}}
tr:nth-child(even) {{
background-color: #f9f9f9;
}}
.metadata {{
background-color: #ecf0f1;
padding: 15px;
border-radius: 5px;
margin: 20px 0;
}}
</style>
</head>
<body>
{html_content}
</body>
</html>
"""
# Generate PDF
pdf = HTML(string=html_template)
# Add custom CSS if provided
stylesheets = []
if css_style and Path(css_style).exists():
stylesheets.append(CSS(filename=css_style))
pdf.write_pdf(output_path, stylesheets=stylesheets)
return True
    except ImportError as e:
        print(f"Error: missing dependency '{e.name}'. Install with: pip install markdown weasyprint")
return False
except Exception as e:
print(f"Error generating PDF: {e}")
return False
def convert_to_pdf_pandoc(markdown_path: str, output_path: str) -> bool:
"""
Convert markdown to PDF using Pandoc.
Args:
markdown_path: Path to markdown file
output_path: Path for output PDF
Returns:
True if successful, False otherwise
"""
try:
import subprocess
# Check if pandoc is installed
result = subprocess.run(
['pandoc', '--version'],
capture_output=True,
text=True
)
if result.returncode != 0:
print("Error: Pandoc not installed")
return False
# Convert with pandoc
result = subprocess.run(
[
'pandoc',
markdown_path,
'-o', output_path,
'--pdf-engine=pdflatex',
'-V', 'geometry:margin=1in',
'--toc'
],
capture_output=True,
text=True
)
if result.returncode != 0:
print(f"Pandoc error: {result.stderr}")
return False
return True
except FileNotFoundError:
print("Error: Pandoc not found. Install from https://pandoc.org/")
return False
except Exception as e:
print(f"Error: {e}")
return False
def create_biomni_report(
conversation_history: list,
output_path: str = "biomni_report.pdf",
method: str = "weasyprint"
) -> bool:
"""
Create a formatted PDF report from Biomni conversation history.
Args:
conversation_history: List of conversation turns
output_path: Output PDF path
method: Conversion method ('weasyprint' or 'pandoc')
Returns:
True if successful
"""
# Prepare report sections
metadata = {
'Date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'Tool': 'Biomni AI Agent',
'Report Type': 'Analysis Summary'
}
sections = []
# Executive Summary
sections.append({
'heading': 'Executive Summary',
'level': 2,
'content': 'This report contains the complete analysis workflow executed by the Biomni biomedical AI agent.'
})
# Conversation history
for i, turn in enumerate(conversation_history, 1):
sections.append({
'heading': f'Task {i}: {turn.get("task", "Analysis")}',
'level': 2,
'content': f'**Input:**\n```\n{turn.get("input", "")}\n```\n\n**Output:**\n{turn.get("output", "")}'
})
# Generate markdown
    md_path = str(Path(output_path).with_suffix('.md'))
generate_markdown_report(
title="Biomni Analysis Report",
sections=sections,
metadata=metadata,
output_path=md_path
)
# Convert to PDF
if method == 'weasyprint':
success = convert_to_pdf_weasyprint(md_path, output_path)
elif method == 'pandoc':
success = convert_to_pdf_pandoc(md_path, output_path)
else:
print(f"Unknown method: {method}")
return False
if success:
print(f"✓ Report generated: {output_path}")
print(f" Markdown: {md_path}")
else:
print("✗ Failed to generate PDF")
print(f" Markdown available: {md_path}")
return success
def main():
"""CLI for report generation."""
parser = argparse.ArgumentParser(
description='Generate formatted PDF reports for Biomni analyses'
)
parser.add_argument(
'input',
type=str,
help='Input markdown file or conversation history'
)
parser.add_argument(
'-o', '--output',
type=str,
default='biomni_report.pdf',
help='Output PDF path (default: biomni_report.pdf)'
)
parser.add_argument(
'-m', '--method',
type=str,
choices=['weasyprint', 'pandoc'],
default='weasyprint',
help='Conversion method (default: weasyprint)'
)
parser.add_argument(
'--css',
type=str,
help='Custom CSS stylesheet path'
)
args = parser.parse_args()
# Check if input is markdown or conversation history
input_path = Path(args.input)
if not input_path.exists():
print(f"Error: Input file not found: {args.input}")
return 1
# If input is markdown, convert directly
if input_path.suffix == '.md':
if args.method == 'weasyprint':
success = convert_to_pdf_weasyprint(
str(input_path),
args.output,
args.css
)
else:
success = convert_to_pdf_pandoc(str(input_path), args.output)
return 0 if success else 1
# Otherwise, assume it's conversation history (JSON)
try:
import json
with open(input_path) as f:
history = json.load(f)
success = create_biomni_report(
history,
args.output,
args.method
)
return 0 if success else 1
except json.JSONDecodeError:
print("Error: Input file is not valid JSON or markdown")
return 1
if __name__ == "__main__":
sys.exit(main())


@@ -0,0 +1,230 @@
#!/usr/bin/env python3
"""
Biomni Environment Setup and Validation Script
This script helps users set up and validate their Biomni environment,
including checking dependencies, API keys, and data availability.
"""
import os
import sys
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple
def check_python_version() -> Tuple[bool, str]:
"""Check if Python version is compatible."""
version = sys.version_info
if version.major == 3 and version.minor >= 8:
return True, f"Python {version.major}.{version.minor}.{version.micro}"
else:
return False, f"Python {version.major}.{version.minor} - requires Python 3.8+"
def check_conda_env() -> Tuple[bool, str]:
"""Check if running in biomni conda environment."""
conda_env = os.environ.get('CONDA_DEFAULT_ENV', None)
if conda_env == 'biomni_e1':
return True, f"Conda environment: {conda_env}"
else:
return False, f"Not in biomni_e1 environment (current: {conda_env})"
def check_package_installed(package: str) -> bool:
"""Check if a Python package is installed."""
try:
__import__(package)
return True
except ImportError:
return False
def check_dependencies() -> Tuple[bool, List[str]]:
"""Check for required and optional dependencies."""
required = ['biomni']
    optional = ['weasyprint', 'markdown']  # needed for PDF report generation
missing_required = [pkg for pkg in required if not check_package_installed(pkg)]
missing_optional = [pkg for pkg in optional if not check_package_installed(pkg)]
messages = []
success = len(missing_required) == 0
if missing_required:
messages.append(f"Missing required packages: {', '.join(missing_required)}")
messages.append("Install with: pip install biomni --upgrade")
else:
messages.append("Required packages: ✓")
if missing_optional:
messages.append(f"Missing optional packages: {', '.join(missing_optional)}")
messages.append("For PDF reports, install: pip install weasyprint")
return success, messages
def check_api_keys() -> Tuple[bool, Dict[str, bool]]:
"""Check which API keys are configured."""
api_keys = {
'ANTHROPIC_API_KEY': os.environ.get('ANTHROPIC_API_KEY'),
'OPENAI_API_KEY': os.environ.get('OPENAI_API_KEY'),
'GEMINI_API_KEY': os.environ.get('GEMINI_API_KEY'),
'GROQ_API_KEY': os.environ.get('GROQ_API_KEY'),
}
configured = {key: bool(value) for key, value in api_keys.items()}
has_any = any(configured.values())
return has_any, configured
def check_data_directory(data_path: str = './data') -> Tuple[bool, str]:
"""Check if Biomni data directory exists and has content."""
path = Path(data_path)
if not path.exists():
return False, f"Data directory not found at {data_path}"
# Check if directory has files (data has been downloaded)
files = list(path.glob('*'))
if len(files) == 0:
        return False, "Data directory exists but is empty; run the agent once to download data."
# Rough size check (should be ~11GB)
total_size = sum(f.stat().st_size for f in path.rglob('*') if f.is_file())
size_gb = total_size / (1024**3)
if size_gb < 1:
return False, f"Data directory exists but seems incomplete ({size_gb:.1f} GB)"
return True, f"Data directory: {data_path} ({size_gb:.1f} GB) ✓"
def check_disk_space(required_gb: float = 20) -> Tuple[bool, str]:
"""Check if sufficient disk space is available."""
try:
import shutil
stat = shutil.disk_usage('.')
free_gb = stat.free / (1024**3)
if free_gb >= required_gb:
return True, f"Disk space: {free_gb:.1f} GB available ✓"
else:
return False, f"Low disk space: {free_gb:.1f} GB (need {required_gb} GB)"
except Exception as e:
return False, f"Could not check disk space: {e}"
def test_biomni_import() -> Tuple[bool, str]:
"""Test if Biomni can be imported and initialized."""
try:
from biomni.agent import A1
from biomni.config import default_config
return True, "Biomni import successful ✓"
except ImportError as e:
return False, f"Cannot import Biomni: {e}"
except Exception as e:
return False, f"Biomni import error: {e}"
def suggest_fixes(results: Dict[str, Tuple[bool, any]]) -> List[str]:
"""Generate suggestions for fixing issues."""
suggestions = []
if not results['python'][0]:
suggestions.append("➜ Upgrade Python to 3.8 or higher")
if not results['conda'][0]:
suggestions.append("➜ Activate biomni environment: conda activate biomni_e1")
if not results['dependencies'][0]:
suggestions.append("➜ Install Biomni: pip install biomni --upgrade")
if not results['api_keys'][0]:
suggestions.append("➜ Set API key: export ANTHROPIC_API_KEY='your-key'")
suggestions.append(" Or create .env file with API keys")
if not results['data'][0]:
suggestions.append("➜ Data will auto-download on first agent.go() call")
if not results['disk_space'][0]:
suggestions.append("➜ Free up disk space (need ~20GB total)")
return suggestions
def main():
"""Run all environment checks and display results."""
print("=" * 60)
print("Biomni Environment Validation")
print("=" * 60)
print()
# Run all checks
results = {}
print("Checking Python version...")
results['python'] = check_python_version()
print(f" {results['python'][1]}")
print()
print("Checking conda environment...")
results['conda'] = check_conda_env()
print(f" {results['conda'][1]}")
print()
print("Checking dependencies...")
results['dependencies'] = check_dependencies()
for msg in results['dependencies'][1]:
print(f" {msg}")
print()
print("Checking API keys...")
results['api_keys'] = check_api_keys()
has_keys, key_status = results['api_keys']
for key, configured in key_status.items():
        status = "✓" if configured else "✗"
print(f" {key}: {status}")
print()
print("Checking Biomni data directory...")
results['data'] = check_data_directory()
print(f" {results['data'][1]}")
print()
print("Checking disk space...")
results['disk_space'] = check_disk_space()
print(f" {results['disk_space'][1]}")
print()
print("Testing Biomni import...")
results['biomni_import'] = test_biomni_import()
print(f" {results['biomni_import'][1]}")
print()
# Summary
print("=" * 60)
all_passed = all(result[0] for result in results.values())
if all_passed:
print("✓ All checks passed! Environment is ready.")
print()
print("Quick start:")
print(" from biomni.agent import A1")
print(" agent = A1(path='./data', llm='claude-sonnet-4-20250514')")
print(" agent.go('Your biomedical task')")
else:
print("⚠ Some checks failed. See suggestions below:")
print()
suggestions = suggest_fixes(results)
for suggestion in suggestions:
print(suggestion)
print("=" * 60)
return 0 if all_passed else 1
if __name__ == "__main__":
sys.exit(main())