mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
Update Modal skill
},
"metadata": {
"description": "Claude scientific skills from K-Dense Inc",
"version": "2.30.0"
},
"plugins": [
{
### Data Management & Infrastructure

- **LaminDB** - Open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). Provides unified platform combining lakehouse architecture, lineage tracking, feature stores, biological ontologies (via Bionty plugin with 20+ ontologies: genes, proteins, cell types, tissues, diseases, pathways), LIMS, and ELN capabilities through a single Python API. Key features include: automatic data lineage tracking (code, inputs, outputs, environment), versioned artifacts (DataFrame, AnnData, SpatialData, Parquet, Zarr), schema validation and data curation with standardization/synonym mapping, queryable metadata with feature-based filtering, cross-registry traversal, and streaming for large datasets. Supports integrations with workflow managers (Nextflow, Snakemake, Redun), MLOps platforms (Weights & Biases, MLflow, HuggingFace, scVI-tools), cloud storage (S3, GCS, S3-compatible), array stores (TileDB-SOMA, DuckDB), and visualization (Vitessce). Deployment options: local SQLite, cloud storage with SQLite, or cloud storage with PostgreSQL for production. Use cases: scRNA-seq standardization and analysis, flow cytometry/spatial data management, multi-modal dataset integration, computational workflow tracking with reproducibility, biological ontology-based annotation, data lakehouse construction for unified queries, ML pipeline integration with experiment tracking, and FAIR-compliant dataset publishing

- **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200, B200+), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv (recommended)/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container, up to 1,536 GB VRAM), persistent storage via Volumes (v1 and v2) for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs (FastAPI, ASGI, WSGI, WebSockets), parallel execution with `.map()` for batch processing, input concurrency and dynamic batching for I/O-bound workloads, and resource configuration (CPU cores, memory, ephemeral disk up to 3 TiB). Supports custom Docker images, Micromamba/Conda environments, integration with Hugging Face/Weights & Biases, and distributed multi-GPU training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, speech, embeddings), GPU-accelerated training and fine-tuning, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, protein folding and computational biology, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation

### Cheminformatics & Drug Discovery

- **Datamol** - Python library for molecular manipulation and featurization built on RDKit with enhanced workflows and performance optimizations. Provides utilities for molecular I/O (reading/writing SMILES, SDF, MOL files), molecular standardization and sanitization, molecular transformations (tautomer enumeration, stereoisomer generation), molecular featurization (descriptors, fingerprints, graph representations), parallel processing for large datasets, and integration with machine learning pipelines. Features include: optimized RDKit operations, caching for repeated computations, molecular filtering and preprocessing, and seamless integration with pandas DataFrames. Designed for drug discovery and cheminformatics workflows requiring efficient processing of large compound libraries. Use cases: molecular preprocessing for ML models, compound library management, molecular similarity searches, and cheminformatics data pipelines
---
name: modal
description: Cloud computing platform for running Python on GPUs and serverless infrastructure. Use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud. Use this skill whenever the user mentions Modal, serverless GPU compute, deploying ML models to the cloud, serving inference endpoints, running batch processing in the cloud, or needs to scale Python workloads beyond their local machine. Also use when the user wants to run code on H100s, A100s, or other cloud GPUs, or needs to create a web API for a model.
license: Apache-2.0
metadata:
  skill-author: K-Dense Inc.
---

## Overview

Modal is a cloud platform for running Python code serverlessly, with a focus on AI/ML workloads. Key capabilities:

- **GPU compute** on demand (T4, L4, A10, L40S, A100, H100, H200, B200)
- **Serverless functions** with autoscaling from zero to thousands of containers
- **Custom container images** built entirely in Python code
- **Persistent storage** via Volumes for model weights and datasets
- **Web endpoints** for serving models and APIs
- **Scheduled jobs** via cron or fixed intervals
- **Sub-second cold starts** for low-latency inference

Everything in Modal is defined as code — no YAML, no Dockerfiles required (though both are supported).

## When to Use This Skill

Use this skill when the user needs to:

- Deploy or serve AI/ML models in the cloud
- Run GPU-accelerated computations (training, inference, fine-tuning)
- Create serverless web APIs or endpoints
- Scale batch processing jobs in parallel
- Schedule recurring tasks (data pipelines, retraining, scraping)
- Store model weights or datasets in persistent cloud storage
- Run code in custom container environments
- Build job queues or async task processing systems

## Installation and Authentication

### Install

```bash
uv pip install modal
```

### Authenticate

```bash
modal setup
```

This opens a browser for authentication. For CI/CD or headless environments, set environment variables:

```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```

Generate tokens at https://modal.com/settings

Modal offers a free tier with $30/month in credits.

**Reference**: See `references/getting-started.md` for detailed setup and first app walkthrough.

## Core Concepts

### App and Functions

A Modal `App` groups related functions. Functions decorated with `@app.function()` run remotely in the cloud:

```python
import modal

app = modal.App("my-app")

@app.function()
def square(x):
    return x ** 2

@app.local_entrypoint()
def main():
    # .remote() runs in the cloud
    print(square.remote(42))
```

Run with `modal run script.py`. Deploy with `modal deploy script.py`.

**Reference**: See `references/functions.md` for lifecycle hooks, classes, `.map()`, `.spawn()`, and more.

### Container Images

Modal builds container images from Python code. The recommended package installer is `uv`:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch==2.8.0", "transformers", "accelerate")
    .apt_install("git")
)

@app.function(image=image)
def inference(prompt):
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3-8B")
    return pipe(prompt)
```

Key image methods:

- `.uv_pip_install()` — Install Python packages with uv (recommended)
- `.pip_install()` — Install with pip (fallback)
- `.apt_install()` — Install system packages
- `.run_commands()` — Run shell commands during build
- `.run_function()` — Run Python during build (e.g., download model weights)
- `.add_local_python_source()` — Add local modules
- `.env()` — Set environment variables

**Reference**: See `references/images.md` for Dockerfiles, micromamba, caching, GPU build steps.

### GPU Compute

Request GPUs via the `gpu` parameter:

```python
@app.function(gpu="H100")
def train_model():
    import torch
    device = torch.device("cuda")
    # GPU training code here

# Multiple GPUs
@app.function(gpu="H100:4")
def distributed_training():
    ...

# GPU fallback chain
@app.function(gpu=["H100", "A100-80GB", "A100-40GB"])
def flexible_inference():
    ...
```

Available GPUs: T4, L4, A10, L40S, A100-40GB, A100-80GB, H100, H200, B200, B200+

- Up to 8 GPUs per container (except A10: up to 4)
- L40S is recommended for inference (cost/performance balance, 48 GB VRAM)
- H100/A100 can be auto-upgraded to H200/A100-80GB at no extra cost
- Use `gpu="H100!"` to prevent auto-upgrade

**Reference**: See `references/gpu.md` for GPU selection guidance and multi-GPU training.
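The fallback chain above is tried in order until a GPU type with capacity is found. A toy sketch of that selection logic (illustrative only, not Modal's actual scheduler):

```python
def pick_gpu(preferences, available):
    """Return the first preferred GPU type with free capacity, else None."""
    for gpu in preferences:
        if available.get(gpu, 0) > 0:
            return gpu
    return None

# H100s are exhausted, so the request falls back to A100-80GB:
assert pick_gpu(["H100", "A100-80GB", "A100-40GB"],
                {"H100": 0, "A100-80GB": 3}) == "A100-80GB"
```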

### Volumes (Persistent Storage)

Volumes provide distributed, persistent file storage:

```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

@app.function(volumes={"/data": vol})
def save_model():
    # Write to the mounted path
    with open("/data/model.pt", "wb") as f:
        torch.save(model.state_dict(), f)

@app.function(volumes={"/data": vol})
def load_model():
    model.load_state_dict(torch.load("/data/model.pt"))
```

- Optimized for write-once, read-many workloads (model weights, datasets)
- CLI access: `modal volume ls`, `modal volume put`, `modal volume get`
- Background auto-commits every few seconds

**Reference**: See `references/volumes.md` for v2 volumes, concurrent writes, and best practices.

### Secrets

Securely pass credentials to functions:

```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api():
    import os
    api_key = os.environ["API_KEY"]
    # Use the key
```

Create secrets via CLI: `modal secret create my-api-keys API_KEY=sk-xxx`

Or from a `.env` file: `modal.Secret.from_dotenv()`

**Reference**: See `references/secrets.md` for dashboard setup, multiple secrets, and templates.
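For reference, a `.env` file is just KEY=value lines; a minimal parser sketch of that format (illustrative, not Modal's actual loader):

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

assert parse_dotenv("# creds\nAPI_KEY=sk-xxx\n")["API_KEY"] == "sk-xxx"
```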

### Web Endpoints

Serve models and APIs as web endpoints:

```python
@app.function()
@modal.fastapi_endpoint()
def predict(text: str):
    return {"result": model.predict(text)}
```

- `modal serve script.py` — Development with hot reload and temporary URL
- `modal deploy script.py` — Production deployment with permanent URL
- Supports FastAPI, ASGI (Starlette, FastHTML), WSGI (Flask, Django), WebSockets
- Request bodies up to 4 GiB, unlimited response size

**Reference**: See `references/web-endpoints.md` for ASGI/WSGI apps, streaming, auth, and WebSockets.

### Scheduled Jobs

Run functions on a schedule:

```python
@app.function(schedule=modal.Cron("0 9 * * *"))  # Daily at 9 AM UTC
def daily_pipeline():
    # ETL, retraining, scraping, etc.
    ...

@app.function(schedule=modal.Period(hours=6))
def periodic_check():
    ...
```

Deploy with `modal deploy script.py` to activate the schedule.

- `modal.Cron("...")` — Standard cron syntax, stable across deploys
- `modal.Period(hours=N)` — Fixed interval, resets on redeploy
- Monitor runs in the Modal dashboard

**Reference**: See `references/scheduled-jobs.md` for cron syntax and management.
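Cron fields read left to right as minute, hour, day of month, month, day of week. A plain-Python reminder of that ordering (illustrative only, not part of Modal's API):

```python
def describe_cron(expr: str) -> dict:
    """Map the five cron fields to their names, left to right."""
    fields = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(fields, expr.split()))

schedule = describe_cron("0 */6 * * *")
assert schedule["minute"] == "0"   # on the hour
assert schedule["hour"] == "*/6"   # every 6 hours
```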

### Scaling and Concurrency

Modal autoscales containers automatically. Configure limits:

```python
@app.function(
    max_containers=100,    # Upper limit
    min_containers=2,      # Keep warm for low latency
    buffer_containers=5,   # Reserve capacity
    scaledown_window=300,  # Idle seconds before shutdown
)
def process(data):
    ...
```

Process inputs in parallel with `.map()`:

```python
results = list(process.map([item1, item2, item3, ...]))
```

Enable concurrent request handling per container:

```python
@app.function()
@modal.concurrent(max_inputs=10)
async def handle_request(req):
    ...
```

**Reference**: See `references/scaling.md` for `.map()`, `.starmap()`, `.spawn()`, and limits.
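The interplay of these settings can be pictured with a toy sizing rule (a sketch only; Modal's real autoscaler is more sophisticated):

```python
import math

def desired_containers(queued: int, inputs_per_container: int,
                       min_containers: int, max_containers: int,
                       buffer_containers: int) -> int:
    # Toy rule: enough containers to drain the queue, plus an idle buffer,
    # clamped between the configured minimum and maximum.
    need = math.ceil(queued / inputs_per_container) + buffer_containers
    return max(min_containers, min(max_containers, need))

# Demand spike: clamped at max_containers
assert desired_containers(1000, 10, 2, 100, 5) == 100
# Idle: min_containers keeps two warm even with no buffer
assert desired_containers(0, 10, 2, 100, 0) == 2
```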

### Resource Configuration

```python
@app.function(
    cpu=4.0,               # Physical cores (not vCPUs)
    memory=16384,          # MiB
    ephemeral_disk=51200,  # MiB (up to 3 TiB)
    timeout=3600,          # Seconds
)
def heavy_computation():
    ...
```

Defaults: 0.125 CPU cores, 128 MiB memory. Billed on max(request, usage).

**Reference**: See `references/resources.md` for limits and billing details.

## Classes with Lifecycle Hooks

For stateful workloads (e.g., loading a model once and serving many requests):

```python
@app.cls(gpu="L40S", image=image)
class Predictor:
    @modal.enter()
    def load_model(self):
        self.model = load_heavy_model()  # Runs once on container start

    @modal.method()
    def predict(self, text: str):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        ...  # Runs on container shutdown
```

Call with: `Predictor().predict.remote("hello")`

## Common Workflow Patterns

### GPU Model Inference Service

```python
import modal

app = modal.App("llm-service")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("vllm")
)

@app.cls(gpu="H100", image=image, min_containers=1)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3-70B")

    @modal.fastapi_endpoint(method="POST")
    def generate(self, prompt: str, max_tokens: int = 256):
        from vllm import SamplingParams
        outputs = self.llm.generate([prompt], SamplingParams(max_tokens=max_tokens))
        return {"text": outputs[0].outputs[0].text}
```

### Batch Processing Pipeline

```python
app = modal.App("batch-pipeline")
vol = modal.Volume.from_name("pipeline-data", create_if_missing=True)

@app.function(volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int):
    import pandas as pd
    df = pd.read_parquet(f"/data/input/chunk_{chunk_id}.parquet")
    result = heavy_transform(df)
    result.to_parquet(f"/data/output/chunk_{chunk_id}.parquet")
    return len(result)

@app.local_entrypoint()
def main():
    chunk_ids = list(range(100))
    results = list(process_chunk.map(chunk_ids))
    print(f"Processed {sum(results)} total rows")
```
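The fan-out/fan-in shape of `.map()` can be mimicked locally with `concurrent.futures`; results come back in input order, as with Modal's `.map()`. A local stand-in (the per-chunk transform here is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk_id: int) -> int:
    # Stand-in for the real per-chunk transform
    return chunk_id * 10

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_chunk, range(5)))

assert results == [0, 10, 20, 30, 40]  # ordered like the inputs
print(f"Processed {sum(results)} total rows")
```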

### Scheduled Data Pipeline

```python
app = modal.App("etl-pipeline")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[modal.Secret.from_name("db-credentials")],
)
def etl_job():
    import os
    db_url = os.environ["DATABASE_URL"]
    # Extract, transform, load
    ...
```
## CLI Reference

| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate with Modal |
| `modal run script.py` | Run a script's local entrypoint |
| `modal serve script.py` | Dev server with hot reload |
| `modal deploy script.py` | Deploy to production |
| `modal volume ls <name>` | List files in a volume |
| `modal volume put <name> <file>` | Upload file to volume |
| `modal volume get <name> <file>` | Download file from volume |
| `modal secret create <name> K=V` | Create a secret |
| `modal secret list` | List secrets |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop a deployed app |
## Reference Files

Detailed documentation for each topic:

- `references/getting-started.md` — Installation, authentication, first app
- `references/functions.md` — Functions, classes, lifecycle hooks, remote execution
- `references/images.md` — Container images, package installation, caching
- `references/gpu.md` — GPU types, selection, multi-GPU, training
- `references/volumes.md` — Persistent storage, file management, v2 volumes
- `references/secrets.md` — Credentials, environment variables, dotenv
- `references/web-endpoints.md` — FastAPI, ASGI/WSGI, streaming, auth, WebSockets
- `references/scheduled-jobs.md` — Cron, periodic schedules, management
- `references/scaling.md` — Autoscaling, concurrency, .map(), limits
- `references/resources.md` — CPU, memory, disk, timeout configuration
- `references/examples.md` — Common use cases and patterns
- `references/api_reference.md` — Key API classes and methods

Read these files when detailed information is needed beyond this overview.
# Modal API Reference

## Core Classes

### modal.App

The main unit of deployment. Groups related functions.

```python
app = modal.App("my-app")
```

| Method | Description |
|--------|-------------|
| `app.function(**kwargs)` | Decorator to register a function |
| `app.cls(**kwargs)` | Decorator to register a class |
| `app.local_entrypoint()` | Decorator for local entry point |
### modal.Function

A serverless function backed by an autoscaling container pool.

| Method | Description |
|--------|-------------|
| `.remote(*args)` | Execute in the cloud (sync) |
| `.local(*args)` | Execute locally |
| `.spawn(*args)` | Execute async, returns `FunctionCall` |
| `.map(inputs)` | Parallel execution over inputs |
| `.starmap(inputs)` | Parallel execution with multiple args |
| `.from_name(app, fn)` | Reference a deployed function |
| `.update_autoscaler(**kwargs)` | Dynamic scaling update |
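Semantically, `.map` and `.starmap` behave like Python's built-in `map` and `itertools.starmap`, except that each call runs in its own container and the calls proceed in parallel. A minimal local sketch of that correspondence (plain Python, no Modal involved):

```python
import itertools

def square(x):
    # Stand-in for a Modal function; fn.map(inputs) yields fn(x) for each input,
    # but runs the calls in parallel across containers.
    return x * x

def add(a, b):
    # fn.starmap(pairs) unpacks each tuple into positional arguments.
    return a + b

mapped = list(map(square, [1, 2, 3]))                        # like square.map([1, 2, 3])
starmapped = list(itertools.starmap(add, [(1, 2), (3, 4)]))  # like add.starmap([(1, 2), (3, 4)])
print(mapped, starmapped)  # [1, 4, 9] [3, 7]
```

The practical difference is ordering and cost, not semantics: Modal returns results as a generator in input order while fanning the work out to many containers.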
### modal.Cls

A serverless class with lifecycle hooks.

```python
@app.cls(gpu="L40S")
class MyClass:
    @modal.enter()
    def setup(self): ...

    @modal.method()
    def run(self, data): ...

    @modal.exit()
    def cleanup(self): ...
```

| Decorator | Description |
|-----------|-------------|
| `@modal.enter()` | Container startup hook |
| `@modal.exit()` | Container shutdown hook |
| `@modal.method()` | Expose as callable method |
| `@modal.parameter()` | Class-level parameter |

## Image

### modal.Image

Defines the container environment.

| Method | Description |
|--------|-------------|
| `.debian_slim(python_version=)` | Debian base image |
| `.from_registry(tag)` | Docker Hub image |
| `.from_dockerfile(path)` | Build from Dockerfile |
| `.micromamba(python_version=)` | Conda/mamba base |
| `.uv_pip_install(*pkgs)` | Install with uv (recommended) |
| `.pip_install(*pkgs)` | Install with pip |
| `.pip_install_from_requirements(path)` | Install from file |
| `.apt_install(*pkgs)` | Install system packages |
| `.run_commands(*cmds)` | Run shell commands |
| `.run_function(fn)` | Run Python during build |
| `.add_local_dir(local, remote)` | Add directory |
| `.add_local_file(local, remote)` | Add single file |
| `.add_local_python_source(module)` | Add Python module |
| `.env(dict)` | Set environment variables |
| `.imports()` | Context manager for remote imports |
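Each of these methods returns a new image rather than mutating the old one, which is why image definitions are written as a chain of calls. The builder pattern behind that style, in miniature (a pure-Python sketch for illustration, not Modal's actual implementation):

```python
class ImageSketch:
    """Tiny immutable builder mimicking the modal.Image chaining style."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def _with(self, step):
        # Every build step produces a fresh object carrying one more layer.
        return ImageSketch(self.layers + (step,))

    def apt_install(self, *pkgs):
        return self._with(("apt", pkgs))

    def uv_pip_install(self, *pkgs):
        return self._with(("uv_pip", pkgs))

base = ImageSketch()
img = base.apt_install("ffmpeg").uv_pip_install("torch", "numpy")
print(img.layers)
print(base.layers)  # the original is untouched: each call returned a new object
```

Because layers are append-only, Modal can cache each build step and rebuild only the layers after the one that changed.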
## Storage

### modal.Volume

Distributed persistent file storage.

```python
vol = modal.Volume.from_name("name", create_if_missing=True)
```

| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference or create a volume |
| `.commit()` | Force immediate commit |
| `.reload()` | Refresh to see other containers' writes |

Mount: `@app.function(volumes={"/path": vol})`

### modal.NetworkFileSystem

Legacy shared storage (superseded by Volume).
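The commit/reload pair exists because each container works against its own view of the volume: writes become visible to other containers only after a commit, and a long-running container sees others' commits only after a reload. A toy model of those semantics (illustrative only; the class names here are invented for the sketch):

```python
class VolumeSketch:
    """Toy model: committed state is shared; each container has a local view."""

    def __init__(self):
        self.committed = {}

class ContainerView:
    def __init__(self, vol):
        self.vol = vol
        self.local = dict(vol.committed)

    def write(self, path, data):
        self.local[path] = data        # visible only to this container for now

    def commit(self):
        self.vol.committed.update(self.local)   # publish local writes

    def reload(self):
        self.local = dict(self.vol.committed)   # pick up others' commits

vol = VolumeSketch()
a, b = ContainerView(vol), ContainerView(vol)
a.write("/data/x.txt", "hello")
print("/data/x.txt" in b.local)  # False: not committed yet
a.commit()
b.reload()
print(b.local["/data/x.txt"])    # hello
```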
## Secrets

### modal.Secret

Secure credential injection.

| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference a named secret |
| `.from_dict(dict)` | Create inline (dev only) |
| `.from_dotenv()` | Load from .env file |

Usage: `@app.function(secrets=[modal.Secret.from_name("x")])`

Access in function: `os.environ["KEY"]`
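At runtime a secret is just a set of environment variables injected into the container, so function code reads it with plain `os.environ`. A local simulation of that contract (the variable is set by hand here to stand in for what `secrets=[...]` would inject on Modal; the key name is made up):

```python
import os

# On Modal, modal.Secret.from_name(...) would inject this into the container
# environment; here we set it manually to simulate the injected value.
os.environ["API_KEY"] = "sk-test-123"

def handler():
    # Inside a Modal function, this is all that is needed to read a secret.
    return os.environ["API_KEY"]

print(handler())  # sk-test-123
```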
## Scheduling

### modal.Cron

```python
schedule = modal.Cron("0 9 * * *")  # Cron syntax
```

### modal.Period

```python
schedule = modal.Period(hours=6)  # Fixed interval
```

Usage: `@app.function(schedule=modal.Cron("..."))`
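Cron expressions use five space-separated fields: minute, hour, day of month, month, day of week, where `*` means "every". A quick way to see what a given expression says (plain Python, an illustration of the syntax rather than any Modal API):

```python
FIELDS = ["minute", "hour", "day of month", "month", "day of week"]

def explain_cron(expr: str) -> dict:
    """Label each of the five cron fields; '*' means 'every'."""
    values = expr.split()
    assert len(values) == 5, "cron expressions have exactly five fields"
    return dict(zip(FIELDS, values))

print(explain_cron("0 9 * * *"))    # minute=0, hour=9: daily at 09:00
print(explain_cron("0 */6 * * *"))  # hour=*/6: every six hours, on the hour
```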
## Web

### Decorators

| Decorator | Description |
|-----------|-------------|
| `@modal.fastapi_endpoint()` | Simple FastAPI endpoint |
| `@modal.asgi_app()` | Full ASGI app (FastAPI, Starlette) |
| `@modal.wsgi_app()` | Full WSGI app (Flask, Django) |
| `@modal.web_server(port=)` | Custom web server |

### Function Modifiers

| Decorator | Description |
|-----------|-------------|
| `@modal.concurrent(max_inputs=)` | Handle multiple inputs per container |
| `@modal.batched(max_batch_size=, wait_ms=)` | Dynamic input batching |
## GPU Strings

| String | GPU |
|--------|-----|
| `"T4"` | NVIDIA T4 16GB |
| `"L4"` | NVIDIA L4 24GB |
| `"A10"` | NVIDIA A10 24GB |
| `"L40S"` | NVIDIA L40S 48GB |
| `"A100-40GB"` | NVIDIA A100 40GB |
| `"A100-80GB"` | NVIDIA A100 80GB |
| `"H100"` | NVIDIA H100 80GB |
| `"H100!"` | H100 (no auto-upgrade) |
| `"H200"` | NVIDIA H200 141GB |
| `"B200"` | NVIDIA B200 192GB |
| `"B200+"` | B200 or B300, B200 price |
| `"H100:4"` | 4x H100 |
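The strings above follow a `TYPE[:COUNT]` pattern, with the count defaulting to 1. A small helper that splits them, purely illustrative (Modal parses these internally; `parse_gpu` is a hypothetical name, not part of the API):

```python
def parse_gpu(spec: str) -> tuple[str, int]:
    """Split a Modal-style GPU string into (gpu_type, count)."""
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1

print(parse_gpu("H100:4"))     # ('H100', 4)
print(parse_gpu("A100-80GB"))  # ('A100-80GB', 1)
```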
## CLI Commands

| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate |
| `modal run <file>` | Run local entrypoint |
| `modal serve <file>` | Dev server with hot reload |
| `modal deploy <file>` | Production deployment |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop an app |
| `modal volume create <name>` | Create volume |
| `modal volume ls <name>` | List volume files |
| `modal volume put <name> <file>` | Upload to volume |
| `modal volume get <name> <file>` | Download from volume |
| `modal secret create <name> K=V` | Create secret |
| `modal secret list` | List secrets |
| `modal secret delete <name>` | Delete secret |
| `modal token set` | Set auth token |
# Modal Common Examples

## LLM Inference Service (vLLM)

```python
import modal

app = modal.App("vllm-service")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("vllm>=0.6.0")
)

@app.cls(gpu="H100", image=image, min_containers=1)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3-70B-Instruct")

    @modal.method()
    def generate(self, prompt: str, max_tokens: int = 512) -> str:
        from vllm import SamplingParams
        params = SamplingParams(max_tokens=max_tokens, temperature=0.7)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

    @modal.fastapi_endpoint(method="POST")
    def api(self, request: dict):
        text = self.generate(request["prompt"], request.get("max_tokens", 512))
        return {"text": text}
```
## Image Generation (Flux)

```python
import modal

app = modal.App("image-gen")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("diffusers", "torch", "transformers", "accelerate")
)

vol = modal.Volume.from_name("flux-weights", create_if_missing=True)

@app.cls(gpu="L40S", image=image, volumes={"/models": vol})
class ImageGenerator:
    @modal.enter()
    def load(self):
        import torch
        from diffusers import FluxPipeline
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell",
            torch_dtype=torch.bfloat16,
            cache_dir="/models",
        ).to("cuda")

    @modal.method()
    def generate(self, prompt: str) -> bytes:
        import io
        image = self.pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return buf.getvalue()
```
## Speech Transcription (Whisper)

```python
import modal

app = modal.App("transcription")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")
    .uv_pip_install("openai-whisper", "torch")
)

@app.cls(gpu="T4", image=image)
class Transcriber:
    @modal.enter()
    def load(self):
        import whisper
        self.model = whisper.load_model("large-v3")

    @modal.method()
    def transcribe(self, audio_path: str) -> dict:
        return self.model.transcribe(audio_path)
```
## Batch Data Processing

```python
import modal

app = modal.App("batch-processor")

image = modal.Image.debian_slim().uv_pip_install("pandas", "pyarrow")
vol = modal.Volume.from_name("batch-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int) -> dict:
    import pandas as pd
    df = pd.read_parquet(f"/data/input/chunk_{chunk_id:04d}.parquet")
    result = df.groupby("category").agg({"value": ["sum", "mean", "count"]})
    result.to_parquet(f"/data/output/result_{chunk_id:04d}.parquet")
    return {"chunk_id": chunk_id, "rows": len(df)}

@app.local_entrypoint()
def main():
    chunk_ids = list(range(500))
    results = list(process_chunk.map(chunk_ids))
    total = sum(r["rows"] for r in results)
    print(f"Processed {total} total rows across {len(results)} chunks")
```
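The `:04d` in the chunk filenames zero-pads the id so that lexicographic order matches numeric order, which matters when downstream tools list and sort filenames as strings. A quick check of the pattern:

```python
# Zero-padded ids keep string sort identical to numeric sort.
names = [f"chunk_{i:04d}.parquet" for i in (2, 10, 100)]
print(names)  # ['chunk_0002.parquet', 'chunk_0010.parquet', 'chunk_0100.parquet']
assert sorted(names) == names
```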
## Web Scraping at Scale

```python
import modal

app = modal.App("scraper")

image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image, retries=3, timeout=60)
def scrape_url(url: str) -> dict:
    import httpx
    from bs4 import BeautifulSoup
    response = httpx.get(url, follow_redirects=True, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "url": url,
        "title": soup.title.string if soup.title else None,
        "text": soup.get_text()[:5000],
    }

@app.local_entrypoint()
def main():
    urls = ["https://example.com", "https://example.org"]  # Your URL list
    results = list(scrape_url.map(urls))
    for r in results:
        print(f"{r['url']}: {r['title']}")
```
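`retries=3` tells Modal to re-run a failed input up to three more times before surfacing the error, so no retry loop is needed inside the function body. Conceptually (a plain-Python sketch of the idea, not how Modal is implemented, and `flaky` is an invented stand-in for a scrape that fails twice):

```python
def with_retries(fn, args, retries=3):
    """Call fn(*args); on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == retries:
                raise  # out of attempts; surface the error

calls = []

def flaky(url):
    # Simulated transient failures: the first two calls raise, the third succeeds.
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return {"url": url, "title": "ok"}

result = with_retries(flaky, ("https://example.com",))
print(result)  # {'url': 'https://example.com', 'title': 'ok'}
```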
## Protein Structure Prediction

```python
import modal

app = modal.App("protein-folding")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("chai-lab")
)

vol = modal.Volume.from_name("protein-data", create_if_missing=True)

@app.function(gpu="A100-80GB", image=image, volumes={"/data": vol}, timeout=3600)
def fold_protein(sequence: str) -> str:
    from chai_lab.chai1 import run_inference
    output = run_inference(
        fasta_file=write_fasta(sequence, "/data/input.fasta"),
        output_dir="/data/output/",
    )
    return str(output)
```
## Scheduled ETL Pipeline

```python
import modal

app = modal.App("etl")

image = modal.Image.debian_slim().uv_pip_install("pandas", "sqlalchemy", "psycopg2-binary")

@app.function(
    image=image,
    schedule=modal.Cron("0 3 * * *"),  # 3 AM UTC daily
    secrets=[modal.Secret.from_name("database-creds")],
    timeout=7200,
)
def daily_etl():
    import os
    import pandas as pd
    from sqlalchemy import create_engine

    source = create_engine(os.environ["SOURCE_DB"])
    dest = create_engine(os.environ["DEST_DB"])

    df = pd.read_sql("SELECT * FROM events WHERE date = CURRENT_DATE - 1", source)
    df = transform(df)
    df.to_sql("daily_summary", dest, if_exists="append", index=False)
    print(f"Loaded {len(df)} rows")
```
|
## FastAPI with GPU Model

```python
import modal

app = modal.App("api-with-gpu")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("fastapi", "sentence-transformers", "torch")
)

@app.cls(gpu="L40S", image=image, min_containers=1)
class EmbeddingService:
    @modal.enter()
    def load(self):
        from sentence_transformers import SentenceTransformer

        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI

        api = FastAPI()

        @api.post("/embed")
        async def embed(request: dict):
            embeddings = self.model.encode(request["texts"])
            return {"embeddings": embeddings.tolist()}

        @api.get("/health")
        async def health():
            return {"status": "ok"}

        return api
```

## Document OCR Job Queue

```python
import modal

app = modal.App("ocr-queue")

image = modal.Image.debian_slim().apt_install("tesseract-ocr").uv_pip_install("pytesseract", "Pillow")
vol = modal.Volume.from_name("ocr-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol})
def ocr_page(image_path: str) -> str:
    import pytesseract
    from PIL import Image

    img = Image.open(image_path)
    return pytesseract.image_to_string(img)

@app.function(volumes={"/data": vol})
def process_document(doc_id: str):
    import os

    pages = sorted(os.listdir(f"/data/docs/{doc_id}/"))
    paths = [f"/data/docs/{doc_id}/{p}" for p in pages]
    texts = list(ocr_page.map(paths))
    full_text = "\n\n".join(texts)
    with open(f"/data/results/{doc_id}.txt", "w") as f:
        f.write(full_text)
    return {"doc_id": doc_id, "pages": len(texts)}
```
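`process_document` fans page-level OCR out with `.map()` and joins the results back in page order. The same fan-out/fan-in shape can be sketched locally with a thread pool (illustration only; `fake_ocr` is a hypothetical stand-in for the real `ocr_page`):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_ocr(page_path: str) -> str:
    # Stand-in for ocr_page: pretend each page yields its name as text
    return f"text of {page_path}"

def process_document_locally(paths: list[str]) -> str:
    # Fan out per-page work, then join in input order (like .map())
    with ThreadPoolExecutor() as pool:
        texts = list(pool.map(fake_ocr, paths))
    return "\n\n".join(texts)

doc = process_document_locally(["p1.png", "p2.png"])
print(doc)
```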

@@ -1,274 +1,260 @@
# Modal Functions and Classes

## Table of Contents

- [Functions](#functions)
- [Remote Execution](#remote-execution)
- [Classes with Lifecycle Hooks](#classes-with-lifecycle-hooks)
- [Parallel Execution](#parallel-execution)
- [Async Functions](#async-functions)
- [Local Entrypoints](#local-entrypoints)
- [Generators](#generators)

## Functions

### Basic Function

```python
import modal

app = modal.App("my-app")

@app.function()
def compute(x: int, y: int) -> int:
    return x + y
```

### Function Parameters

The `@app.function()` decorator accepts:

| Parameter | Type | Description |
|-----------|------|-------------|
| `image` | `Image` | Container image |
| `gpu` | `str` | GPU type (e.g., `"H100"`, `"A100:2"`) |
| `cpu` | `float` | CPU cores |
| `memory` | `int` | Memory in MiB |
| `timeout` | `int` | Max execution time in seconds |
| `secrets` | `list[Secret]` | Secrets to inject |
| `volumes` | `dict[str, Volume]` | Volumes to mount |
| `schedule` | `Schedule` | Cron or periodic schedule |
| `max_containers` | `int` | Max container count |
| `min_containers` | `int` | Minimum warm containers |
| `retries` | `int` | Retry count on failure |
| `concurrency_limit` | `int` | Max concurrent inputs |
| `ephemeral_disk` | `int` | Disk in MiB |

## Remote Execution

### `.remote()` — Synchronous Call

```python
result = compute.remote(3, 4)  # Runs in the cloud, blocks until done
```

### `.local()` — Local Execution

```python
result = compute.local(3, 4)  # Runs locally (for testing)
```

### `.spawn()` — Async Fire-and-Forget

```python
call = compute.spawn(3, 4)  # Returns immediately
# ... do other work ...
result = call.get()  # Retrieve result later
```

`.spawn()` supports up to 1 million pending inputs.
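A local analogy for the fire-and-forget pattern, using the stdlib `concurrent.futures` (a sketch of the control flow only, not Modal's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def compute(x, y):
    return x + y

with ThreadPoolExecutor() as pool:
    call = pool.submit(compute, 3, 4)  # returns immediately, like .spawn()
    # ... other work could happen here ...
    result = call.result()             # blocks and retrieves, like call.get()

print(result)  # 7
```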
## Classes with Lifecycle Hooks

Use `@app.cls()` for stateful workloads where you want to load resources once:

```python
@app.cls(gpu="L40S", image=image)
class Model:
    @modal.enter()
    def setup(self):
        """Runs once when the container starts."""
        import torch

        self.model = torch.load("/weights/model.pt")
        self.model.eval()

    @modal.method()
    def predict(self, text: str) -> dict:
        """Callable remotely."""
        return self.model(text)

    @modal.exit()
    def teardown(self):
        """Runs when the container shuts down."""
        cleanup_resources()
```

### Lifecycle Decorators

| Decorator | When It Runs |
|-----------|-------------|
| `@modal.enter()` | Once on container startup, before any inputs |
| `@modal.method()` | For each remote call |
| `@modal.exit()` | On container shutdown |

### Calling Class Methods

```python
# Create instance and call method
model = Model()
result = model.predict.remote("Hello world")

# Parallel calls
results = list(model.predict.map(["text1", "text2", "text3"]))
```

### Parameterized Classes

```python
@app.cls()
class Worker:
    model_name: str = modal.parameter()

    @modal.enter()
    def load(self):
        self.model = load_model(self.model_name)

    @modal.method()
    def run(self, data):
        return self.model(data)


# Different model instances autoscale independently
gpt = Worker(model_name="gpt-4")
llama = Worker(model_name="llama-3")
```
## Parallel Execution

### `.map()` — Parallel Processing

Process multiple inputs across containers:

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(1000))
    results = list(process.map(items))
    print(f"Processed {len(results)} items")
```

- Results are returned in the same order as inputs
- Modal autoscales containers to handle the workload
- Use `return_exceptions=True` to collect errors instead of raising
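The ordered-results guarantee mirrors the stdlib `Executor.map`, which also returns results in input order even when workers finish out of order. A local stand-in (not Modal's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

items = list(range(10))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Results come back in input order, regardless of completion order
    results = list(pool.map(square, items))

print(results)
```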
### `.starmap()` — Multi-Argument Parallel

```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```
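`.starmap()` unpacks each tuple into positional arguments, the same convention as the stdlib `itertools.starmap`; a local equivalent of the call above:

```python
from itertools import starmap

def add(x, y):
    return x + y

# Each tuple is unpacked into add's positional arguments
results = list(starmap(add, [(1, 2), (3, 4), (5, 6)]))
print(results)  # [3, 7, 11]
```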
### `.map()` with `order_outputs=False`

For faster throughput when order doesn't matter:

```python
for result in process.map(items, order_outputs=False):
    handle(result)  # Results arrive as they complete
```
## Async Functions

Modal supports async/await natively:

```python
@app.function()
async def fetch_data(url: str) -> str:
    import httpx

    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.text
```

Async functions are especially useful with `@modal.concurrent()` for handling multiple requests per container.
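The same async shape can be exercised locally with stdlib `asyncio`; in this sketch the `httpx` call is replaced with a stub so nothing hits the network, and `asyncio.gather` shows how several awaits overlap the way concurrent requests would in one container:

```python
import asyncio

async def fetch_data(url: str) -> str:
    # Stub for the HTTP call: yield control, then fabricate a body
    await asyncio.sleep(0)
    return f"<body of {url}>"

async def main():
    # Run both "requests" concurrently; results keep argument order
    return await asyncio.gather(
        fetch_data("https://a.example"),
        fetch_data("https://b.example"),
    )

pages = asyncio.run(main())
print(pages)
```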
## Local Entrypoints

The `@app.local_entrypoint()` runs on your machine and orchestrates remote calls:

```python
@app.local_entrypoint()
def main():
    # This code runs locally
    data = load_local_data()

    # These calls run in the cloud
    results = list(process.map(data))

    # Back to local
    save_results(results)
```

You can also define multiple entrypoints and select by function name:

```bash
modal run script.py::train
modal run script.py::evaluate
```
## Generators

Functions can yield results as they're produced:

```python
@app.function()
def generate_data():
    for i in range(100):
        yield process(i)

@app.local_entrypoint()
def main():
    for result in generate_data.remote_gen():
        print(result)
```
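The key property is laziness: values are produced only as the consumer asks for them, so the caller can start handling early results before the function finishes. A plain-Python sketch of that behavior (with a squaring stand-in for `process`):

```python
def generate_data():
    for i in range(100):
        yield i * i  # stand-in for process(i)

stream = generate_data()
# Only the first three values are ever computed here
first_three = [next(stream) for _ in range(3)]
print(first_three)  # [0, 1, 4]
```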
## Retries

Configure automatic retries on failure:

```python
@app.function(retries=3)
def flaky_operation():
    ...
```

For more control, use `modal.Retries`:

```python
@app.function(retries=modal.Retries(max_retries=3, backoff_coefficient=2.0))
def api_call():
    ...
```
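To see what `backoff_coefficient=2.0` implies: the delay before each retry grows geometrically. A sketch, assuming a 1-second initial delay (the actual initial delay is a separate knob on the retry policy):

```python
def retry_delays(max_retries: int, initial_delay: float = 1.0,
                 backoff_coefficient: float = 2.0) -> list[float]:
    """Delay before each retry attempt under exponential backoff."""
    return [initial_delay * backoff_coefficient**attempt for attempt in range(max_retries)]

print(retry_delays(3))  # [1.0, 2.0, 4.0]
```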
## Timeouts

Set maximum execution time:

```python
@app.function(timeout=3600)  # 1 hour
def long_training():
    ...
```

Default timeout is 300 seconds (5 minutes). Maximum is 86400 seconds (24 hours).
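The timeout acts as a deadline on each input. A local analogy of the semantics using `concurrent.futures` (a sketch only, not Modal's mechanism):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow():
    time.sleep(0.5)
    return "done"

with ThreadPoolExecutor() as pool:
    fut = pool.submit(slow)
    try:
        fut.result(timeout=0.05)  # deadline shorter than the work
        timed_out = False
    except TimeoutError:
        timed_out = True

print(timed_out)  # True
```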

@@ -1,92 +1,175 @@
# Modal Getting Started Guide

## Installation

Install Modal using uv (recommended) or pip:

```bash
# Recommended
uv pip install modal

# Alternative
pip install modal
```
## Authentication

### Interactive Setup

```bash
modal setup
```

This opens a browser for authentication and stores credentials locally.

### Headless / CI/CD Setup

For environments without a browser, use token-based authentication:

1. Generate tokens at https://modal.com/settings
2. Set environment variables:

```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```

Or use the CLI:

```bash
modal token set --token-id <id> --token-secret <secret>
```

### Free Tier

Modal provides $30/month in free credits. No credit card required for the free tier.

## Your First App
### Hello World

Create a file `hello.py`:

```python
import modal

app = modal.App("hello-world")

@app.function()
def greet(name: str) -> str:
    return f"Hello, {name}! This ran in the cloud."

@app.local_entrypoint()
def main():
    result = greet.remote("World")
    print(result)
```

Run it:

```bash
modal run hello.py
```

What happens:

1. Modal packages your code
2. Creates a container in the cloud
3. Executes `greet()` remotely
4. Returns the result to your local machine
### Understanding the Flow

- `modal.App("name")` — Creates a named application
- `@app.function()` — Marks a function for remote execution
- `@app.local_entrypoint()` — Defines the local entry point (runs on your machine)
- `.remote()` — Calls the function in the cloud
- `.local()` — Calls the function locally (for testing)

### Running Modes

| Command | Description |
|---------|-------------|
| `modal run script.py` | Run the `@app.local_entrypoint()` function |
| `modal serve script.py` | Start a dev server with hot reload (for web endpoints) |
| `modal deploy script.py` | Deploy to production (persistent) |
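To build intuition for the `.remote()`/`.local()` split, here is a toy stand-in for a function handle (purely illustrative; this is not Modal's API or implementation):

```python
class FakeFunction:
    """Toy stand-in for a Modal function handle (illustration only)."""

    def __init__(self, fn):
        self.fn = fn

    def local(self, *args):
        return self.fn(*args)  # run in-process

    def remote(self, *args):
        # Real Modal would serialize args and run in a cloud container;
        # here we just tag the result so the difference is visible.
        return f"[cloud] {self.fn(*args)}"

greet = FakeFunction(lambda name: f"Hello, {name}!")
print(greet.local("World"))   # Hello, World!
print(greet.remote("World"))  # [cloud] Hello, World!
```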
### A Simple Web Scraper

```python
import modal

app = modal.App("web-scraper")

image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image)
def scrape(url: str) -> str:
    import httpx
    from bs4 import BeautifulSoup

    response = httpx.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()[:1000]

@app.local_entrypoint()
def main():
    result = scrape.remote("https://example.com")
    print(result)
```
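`soup.get_text()` strips the markup and keeps only text nodes. For intuition, the same extraction can be approximated with the stdlib `html.parser` (illustrative only; BeautifulSoup handles malformed real-world HTML far more robustly):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, roughly like soup.get_text()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self) -> str:
        return "".join(self.chunks)

parser = TextExtractor()
parser.feed("<html><body><h1>Hi</h1><p>there</p></body></html>")
print(parser.text()[:1000])
```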
### GPU-Accelerated Inference

```python
import modal

app = modal.App("gpu-inference")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch", "transformers", "accelerate")
)

@app.function(gpu="L40S", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline

    pipe = pipeline("text-generation", model="gpt2", device="cuda")
    result = pipe(prompt, max_length=100)
    return result[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("The future of AI is"))
```
## Project Structure

Modal apps are typically single Python files, but can be organized into modules:

```
my-project/
├── app.py          # Main app with @app.local_entrypoint()
├── inference.py    # Inference functions
├── training.py     # Training functions
└── common.py       # Shared utilities
```

Use `modal.Image.add_local_python_source()` to include local modules in the container image.
## Key Concepts Summary

| Concept | What It Does |
|---------|-------------|
| `App` | Groups related functions into a deployable unit |
| `Function` | A serverless function backed by autoscaling containers |
| `Image` | Defines the container environment (packages, files) |
| `Volume` | Persistent distributed file storage |
| `Secret` | Secure credential injection |
| `Schedule` | Cron or periodic job scheduling |
| `gpu` | GPU type/count for the function |

## Next Steps

- See `functions.md` for advanced function patterns
- See `images.md` for custom container environments
- See `gpu.md` for GPU selection and configuration
- See `web-endpoints.md` for serving APIs

@@ -1,168 +1,174 @@
# Modal GPU Compute

## Table of Contents

- [Available GPUs](#available-gpus)
- [Requesting GPUs](#requesting-gpus)
- [GPU Selection Guide](#gpu-selection-guide)
- [Multi-GPU](#multi-gpu)
- [GPU Fallback Chains](#gpu-fallback-chains)
- [Auto-Upgrades](#auto-upgrades)
- [Multi-GPU Training](#multi-gpu-training)
## Available GPUs

| GPU | VRAM | Max per Container | Best For |
|-----|------|-------------------|----------|
| T4 | 16 GB | 8 | Budget inference, small models |
| L4 | 24 GB | 8 | Inference, video processing |
| A10 | 24 GB | 4 | Inference, fine-tuning small models |
| L40S | 48 GB | 8 | Inference (best cost/perf), medium models |
| A100-40GB | 40 GB | 8 | Training, large model inference |
| A100-80GB | 80 GB | 8 | Training, large models |
| RTX-PRO-6000 | 48 GB | 8 | Rendering, inference |
| H100 | 80 GB | 8 | Large-scale training, fast inference |
| H200 | 141 GB | 8 | Very large models, training |
| B200 | 192 GB | 8 | Largest models, maximum throughput |
| B200+ | 192 GB | 8 | B200 or B300, B200 pricing |
## Requesting GPUs

### Basic Request

```python
@app.function(gpu="H100")
def train():
    import torch

    assert torch.cuda.is_available()
    print(f"Using: {torch.cuda.get_device_name(0)}")
```
### String Shorthand

```python
gpu="T4"          # Single T4
gpu="A100-80GB"   # Single A100 80GB
gpu="H100:4"      # Four H100s
```
### GPU Object (Advanced)

```python
@app.function(gpu=modal.gpu.H100(count=2))
def multi_gpu():
    ...
```
## GPU Selection Guide

### For Inference

| Model Size | Recommended GPU | Why |
|-----------|----------------|-----|
| < 7B params | T4, L4 | Cost-effective, sufficient VRAM |
| 7B-13B params | L40S | Best cost/performance, 48 GB VRAM |
| 13B-70B params | A100-80GB, H100 | Large VRAM, fast memory bandwidth |
| 70B+ params | H100:2+, H200, B200 | Multi-GPU or very large VRAM |

### For Training

| Task | Recommended GPU |
|------|----------------|
| Fine-tuning (LoRA) | L40S, A100-40GB |
| Full fine-tuning small models | A100-80GB |
| Full fine-tuning large models | H100:4+, H200 |
| Pre-training | H100:8, B200:8 |

### General Recommendation

L40S is the best default for inference workloads — it offers an excellent trade-off of cost and performance with 48 GB of GPU RAM.
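A quick way to pick a row in these tables is to estimate weight memory: at fp16/bf16 precision, each parameter costs 2 bytes, so the weights alone need roughly 2 GB per billion parameters. A back-of-envelope helper (activations, KV cache, and optimizer state add more on top, so treat this as a lower bound):

```python
def weight_vram_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed just for model weights (fp16/bf16 = 2 bytes/param)."""
    return n_params_billions * bytes_per_param

print(weight_vram_gb(7.0))   # 14.0 -> comfortably fits on L4 or L40S
print(weight_vram_gb(70.0))  # 140.0 -> needs H200 or multiple GPUs
```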
## Multi-GPU

Request multiple GPUs by appending `:count`:

```python
@app.function(gpu="H100:4")
def distributed():
    import torch

    print(f"GPUs available: {torch.cuda.device_count()}")
    # All 4 GPUs are on the same physical machine
```

- Up to 8 GPUs for most types (up to 4 for A10)
- All GPUs attach to the same physical machine
- Requesting more than 2 GPUs may result in longer wait times
- Maximum VRAM: 8 x B200 = 1,536 GB
## GPU Fallback Chains

Specify a prioritized list of GPU types:

```python
@app.function(gpu=["H100", "A100-80GB", "L40S"])
def flexible():
    # Modal tries H100 first, then A100-80GB, then L40S
    ...
```

Useful for reducing queue times when a specific GPU isn't available.
## Auto-Upgrades

### H100 → H200

Modal may automatically upgrade H100 requests to H200 at no extra cost. To prevent this:

```python
@app.function(gpu="H100!")  # Exclamation mark prevents auto-upgrade
def must_use_h100():
    ...
```

### A100 → A100-80GB

A100-40GB requests may be upgraded to 80GB at no extra cost.

### B200+

`gpu="B200+"` allows Modal to run on B200 or B300 GPUs at B200 pricing. Requires CUDA 13.0+.
## Multi-GPU Training

Modal supports multi-GPU training on a single node. Multi-node training is in private beta.

### PyTorch DDP Example

```python
@app.function(gpu="H100:4", image=image, timeout=86400)
def train_distributed():
    import os

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}")
    # ... training loop with DDP ...
```
||||||
|
|
||||||
For PyTorch Lightning, set strategy to `ddp_spawn` or `ddp_notebook`.
|
### PyTorch Lightning
|
||||||
|
|
||||||
## Performance Considerations
|
When using frameworks that re-execute Python entrypoints (like PyTorch Lightning), either:
|
||||||
|
|
||||||
**Memory-Bound vs Compute-Bound**:
|
1. Set strategy to `ddp_spawn` or `ddp_notebook`
|
||||||
- Running models with small batch sizes is memory-bound
|
2. Or run training as a subprocess
|
||||||
- Newer GPUs have faster arithmetic than memory access
|
|
||||||
- Speedup from newer hardware may not justify cost for memory-bound workloads
|
|
||||||
|
|
||||||
**Optimization**:
|
```python
|
||||||
- Use batching when possible
|
@app.function(gpu="H100:4", image=image)
|
||||||
- Consider L40S before jumping to H100/B200
|
def train():
|
||||||
- Profile to identify bottlenecks
|
import subprocess
|
||||||
|
subprocess.run(["python", "train_script.py"], check=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Hugging Face Accelerate
|
||||||
|
|
||||||
|
```python
|
||||||
|
@app.function(gpu="A100-80GB:4", image=image)
|
||||||
|
def finetune():
|
||||||
|
import subprocess
|
||||||
|
subprocess.run([
|
||||||
|
"accelerate", "launch",
|
||||||
|
"--num_processes", "4",
|
||||||
|
"train.py"
|
||||||
|
], check=True)
|
||||||
|
```
|
||||||
|
|||||||
@@ -1,261 +1,259 @@
|
|||||||
# Modal Container Images

## Table of Contents

- [Overview](#overview)
- [Base Images](#base-images)
- [Installing Packages](#installing-packages)
- [System Packages](#system-packages)
- [Shell Commands](#shell-commands)
- [Running Python During Build](#running-python-during-build)
- [Adding Local Files](#adding-local-files)
- [Environment Variables](#environment-variables)
- [Dockerfiles](#dockerfiles)
- [Alternative Package Managers](#alternative-package-managers)
- [Image Caching](#image-caching)
- [Handling Remote-Only Imports](#handling-remote-only-imports)

## Overview

Every Modal function runs inside a container built from an `Image`. By default, Modal uses a Debian Linux image with the same Python minor version as your local interpreter.

Images are built lazily — Modal only builds or pulls an image when a function using it is first invoked. Layers are cached for fast rebuilds.

## Base Images

```python
# Default: Debian slim with your local Python version
image = modal.Image.debian_slim()

# Specific Python version
image = modal.Image.debian_slim(python_version="3.11")

# From Docker Hub
image = modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04")

# From a Dockerfile
image = modal.Image.from_dockerfile("./Dockerfile")
```
## Installing Packages

### uv (Recommended)

`uv_pip_install` uses the uv package manager for fast, reliable installs:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install(
        "torch==2.8.0",
        "transformers>=4.40",
        "accelerate",
        "scipy",
    )
)
```

Pin versions for reproducibility. uv resolves dependencies faster than pip.

### pip (Fallback)

```python
image = modal.Image.debian_slim().pip_install(
    "numpy==1.26.0",
    "pandas==2.1.0",
)
```

### From requirements.txt

```python
image = modal.Image.debian_slim().pip_install_from_requirements("requirements.txt")
```

### Private Packages

```python
image = (
    modal.Image.debian_slim()
    .pip_install_private_repos(
        "github.com/org/private-repo",
        git_user="username",
        secrets=[modal.Secret.from_name("github-token")],
    )
)
```
## System Packages

Install Linux packages via apt:

```python
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg", "libsndfile1", "git", "curl")
    .uv_pip_install("librosa", "soundfile")
)
```
## Shell Commands

Run arbitrary commands during image build:

```python
image = (
    modal.Image.debian_slim()
    .run_commands(
        "wget https://example.com/data.tar.gz",
        "tar -xzf data.tar.gz -C /opt/data",
        "rm data.tar.gz",
    )
)
```

### With GPU

Some build steps require GPU access (e.g., compiling CUDA kernels):

```python
image = (
    modal.Image.debian_slim()
    .uv_pip_install("torch")
    .run_commands("python -c 'import torch; torch.cuda.is_available()'", gpu="A100")
)
```
## Running Python During Build

Execute Python functions as build steps — useful for downloading model weights:

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("huggingface_hub", "torch", "transformers")
    .run_function(download_model, secrets=[modal.Secret.from_name("huggingface")])
)
```

The resulting filesystem (including downloaded files) is snapshotted into the image.
## Adding Local Files

### Local Directories

```python
image = modal.Image.debian_slim().add_local_dir(
    local_path="./config",
    remote_path="/root/config",
)
```

By default, files are added at container startup (not baked into the image layer). Use `copy=True` to bake them in.

### Local Python Modules

```python
image = modal.Image.debian_slim().add_local_python_source("my_module")
```

This uses Python's import system to find and include the module.

### Individual Files

```python
image = modal.Image.debian_slim().add_local_file(
    local_path="./model_config.json",
    remote_path="/root/config.json",
)
```
## Environment Variables

```python
image = (
    modal.Image.debian_slim()
    .env({
        "TRANSFORMERS_CACHE": "/cache",
        "TOKENIZERS_PARALLELISM": "false",
        "HF_HOME": "/cache/huggingface",
    })
)
```

Names and values must be strings.
## Dockerfiles

Build from existing Dockerfiles:

```python
image = modal.Image.from_dockerfile("./Dockerfile")

# With build context
image = modal.Image.from_dockerfile("./Dockerfile", context_mount=modal.Mount.from_local_dir("."))
```
## Alternative Package Managers

### Micromamba / Conda

For packages requiring coordinated system and Python package installs:

```python
image = (
    modal.Image.micromamba(python_version="3.11")
    .micromamba_install("cudatoolkit=11.8", "cudnn=8.6", channels=["conda-forge"])
    .uv_pip_install("torch")
)
```
## Image Caching

Modal caches images per layer (per method call). Breaking the cache on one layer cascades to all subsequent layers.

### Optimization Tips

1. **Order layers by change frequency**: Put stable dependencies first, frequently changing code last
2. **Pin versions**: Unpinned versions may resolve differently and break cache
3. **Separate large installs**: Put heavy packages (torch, tensorflow) in early layers

### Force Rebuild

```python
# Single layer
image = modal.Image.debian_slim().apt_install("git", force_build=True)
```

```bash
# All images in a run
MODAL_FORCE_BUILD=1 modal run script.py

# Rebuild without updating cache
MODAL_IGNORE_CACHE=1 modal run script.py
```
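The cascade behavior can be modeled with a toy hash chain, where each layer's cache key depends on every layer before it (an illustrative model for this doc, not Modal's actual keying scheme):

```python
import hashlib

def layer_keys(layers):
    """Toy model of layer caching: each key hashes in all previous layers,
    so changing one layer invalidates every later key."""
    keys, h = [], hashlib.sha256()
    for layer in layers:
        h = hashlib.sha256(h.digest() + layer.encode())
        keys.append(h.hexdigest()[:8])
    return keys

a = layer_keys(["apt_install git", "pip_install torch", "add code v1"])
b = layer_keys(["apt_install git", "pip_install torch", "add code v2"])
print(a[:2] == b[:2], a[2] == b[2])  # True False
```

The first two layers keep their keys (cache hits); only the changed final layer rebuilds, which is why code should come last.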
## Handling Remote-Only Imports

When packages are only available in the container (not locally), use conditional imports:

```python
@app.function(image=image)
def process():
    import torch  # Only available in the container
    return torch.cuda.device_count()
```

For module-level imports shared across functions, use the `Image.imports()` context manager:

```python
with image.imports():
    import torch
    import transformers
```

This prevents `ImportError` locally while making the imports available in the container.
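The same guard can be written as a small helper for scripts that run both locally and remotely (an illustrative pattern, not a Modal API):

```python
def safe_import(name):
    """Return the module if importable, else None (e.g., when running locally)."""
    try:
        return __import__(name)
    except ImportError:
        return None

json_mod = safe_import("json")
missing = safe_import("package_that_does_not_exist")
print(json_mod is not None, missing is None)  # True True
```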
# Modal Resource Configuration

## CPU

### Requesting CPU

```python
@app.function(cpu=4.0)
def compute():
    ...
```

- Values are **physical cores**, not vCPUs
- Default: 0.125 cores
- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request

### CPU Limits

- Default soft limit: 16 physical cores above the CPU request
- Default request: 0.125 cores → default limit: 16.125 cores
- Above the limit, the host throttles CPU usage
- Set explicit limits to prevent noisy-neighbor effects:

```python
@app.function(cpu=(1.0, 4.0))  # Request 1 core, limit 4 cores
def bounded_compute():
    ...
```
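The default-limit rule above works out as simple arithmetic (a worked sketch, not a Modal API call):

```python
def soft_cpu_limit(request_cores: float) -> float:
    """Default soft CPU limit: the request plus 16 physical cores."""
    return request_cores + 16.0

print(soft_cpu_limit(0.125))  # 16.125
print(soft_cpu_limit(4.0))    # 20.0
```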
## Memory

### Requesting Memory

```python
@app.function(memory=16384)  # 16 GiB in MiB
def large_data():
    ...
```

- Value in **MiB** (mebibytes)
- Default: 128 MiB

### Memory Limits

Set hard memory limits to OOM-kill containers that exceed them:

```python
@app.function(memory=(8192, 16384))  # 8 GiB request, 16 GiB hard limit
def bounded_memory():
    # Container is killed if usage exceeds 16 GiB
    ...
```

This prevents paying for runaway memory leaks.
## Ephemeral Disk

For temporary storage within a container's lifetime:

```python
@app.function(ephemeral_disk=102400)  # 100 GiB in MiB
def process_dataset():
    # Temporary files at /tmp or anywhere in the container filesystem
    ...
```

- Value in **MiB**
- Default: 512 GiB quota per container
- Maximum: 3,145,728 MiB (3 TiB)
- Data is lost when the container shuts down
- Use Volumes for persistent storage

Larger disk requests increase the memory request at a 20:1 ratio for billing purposes.
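The 20:1 interaction can be sketched numerically (an illustrative helper based on the ratio stated above; exact billing is Modal's):

```python
def billed_memory_mib(memory_request_mib: int, ephemeral_disk_mib: int) -> int:
    """Disk raises the billed memory request at a 20:1 disk-to-memory ratio."""
    disk_implied_mib = ephemeral_disk_mib // 20
    return max(memory_request_mib, disk_implied_mib)

# 500 GiB of disk implies a 25 GiB memory request (if not already higher)
print(billed_memory_mib(128, 512_000))     # 25600
print(billed_memory_mib(32_768, 512_000))  # 32768
```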
## Timeout

```python
@app.function(timeout=3600)  # 1 hour in seconds
def long_running():
    ...
```

- Default: 300 seconds (5 minutes)
- Maximum: 86,400 seconds (24 hours)
- Function is killed when the timeout expires
## Billing

You are charged based on **whichever is higher**: your resource request or actual usage.

| Resource | Billing Basis |
|----------|--------------|
| CPU | max(requested, used) |
| Memory | max(requested, used) |
| GPU | Time GPU is allocated |
| Disk | Increases memory billing at 20:1 ratio |

### Cost Optimization Tips

- Request only what you need
- Use appropriate GPU tiers (L40S over H100 for inference)
- Set `scaledown_window` to minimize idle time
- Use `min_containers=0` when cold starts are acceptable
- Batch inputs with `.map()` instead of individual `.remote()` calls

## Complete Example

```python
@app.function(
    cpu=8.0,                # 8 physical cores
    memory=32768,           # 32 GiB
    gpu="L40S",             # L40S GPU
    ephemeral_disk=204800,  # 200 GiB temp disk
    timeout=7200,           # 2 hours
    max_containers=50,
    min_containers=1,
)
def full_pipeline(data_path: str):
    ...
```
# Modal Scaling and Concurrency

## Table of Contents

- [Autoscaling](#autoscaling)
- [Configuration](#configuration)
- [Parallel Execution](#parallel-execution)
- [Concurrent Inputs](#concurrent-inputs)
- [Dynamic Batching](#dynamic-batching)
- [Dynamic Autoscaler Updates](#dynamic-autoscaler-updates)
- [Limits](#limits)

## Autoscaling

Modal automatically manages a pool of containers for each function:

- Spins up containers when there's no capacity for new inputs
- Spins down idle containers to save costs
- Scales from zero (no cost when idle) to thousands of containers

No configuration needed for basic autoscaling — it works out of the box.
## Configuration

Fine-tune autoscaling behavior:

```python
@app.function(
    max_containers=100,    # Upper limit on container count
    min_containers=2,      # Keep 2 warm (reduces cold starts)
    buffer_containers=5,   # Reserve 5 extra for burst traffic
    scaledown_window=300,  # Wait 5 min idle before shutting down
)
def handle_request(data):
    ...
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_containers` | Unlimited | Hard cap on total containers |
| `min_containers` | 0 | Minimum warm containers (costs money even when idle) |
| `buffer_containers` | 0 | Extra containers to prevent queuing |
| `scaledown_window` | 60 | Seconds of idle time before shutdown |

### Trade-offs

- Higher `min_containers` = lower latency, higher cost
- Higher `buffer_containers` = less queuing, higher cost
- Lower `scaledown_window` = faster cost savings, more cold starts
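The warm-pool cost side of the trade-off is linear and easy to estimate (a sketch with a hypothetical hourly rate; check Modal's pricing page for real numbers):

```python
def idle_warm_pool_cost(min_containers: int, usd_per_container_hour: float, hours: float) -> float:
    """Warm containers bill even when idle; cost scales linearly."""
    return min_containers * usd_per_container_hour * hours

# e.g., 2 warm containers at a hypothetical $2.00/hour, kept warm for a day
print(idle_warm_pool_cost(2, 2.00, 24))  # 96.0
```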
## Parallel Execution

### `.map()` — Process Many Inputs

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(10_000))
    results = list(process.map(items))
```

Modal automatically scales containers to handle the workload. Results maintain input order.

### `.map()` Options

```python
# Unordered results (faster)
for result in process.map(items, order_outputs=False):
    handle(result)

# Collect errors instead of raising
results = list(process.map(items, return_exceptions=True))
for r in results:
    if isinstance(r, Exception):
        print(f"Error: {r}")
```

### `.starmap()` — Multi-Argument

```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```

### `.spawn()` — Fire-and-Forget

```python
# Returns immediately
call = process.spawn(large_data)

# Check status or get result later
result = call.get()
```

Up to 1 million pending `.spawn()` calls.
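The argument-unpacking semantics of `.starmap()` mirror Python's own `itertools.starmap`, which you can use to sanity-check expectations locally:

```python
from itertools import starmap

# Each tuple is unpacked into positional arguments, one call per tuple
pairs = [(1, 2), (3, 4), (5, 6)]
print(list(starmap(lambda x, y: x + y, pairs)))  # [3, 7, 11]
```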
## Concurrent Inputs

By default, each container handles one input at a time. Use `@modal.concurrent` to handle multiple:

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
async def predict(text: str):
    result = await model.predict_async(text)
    return result
```

This is ideal for I/O-bound workloads or async inference where a single GPU can handle multiple requests.

### With Web Endpoints

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=20)
@modal.asgi_app()
def web_service():
    return fastapi_app
```
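Why concurrency helps for I/O-bound work can be seen with plain `asyncio` (a local sketch with a fake model call; no Modal involved):

```python
import asyncio
import time

async def fake_predict(text: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for an I/O-bound model call
    return text.upper()

async def serve(n: int):
    # One "container" overlapping n inputs, as @modal.concurrent would allow
    return await asyncio.gather(*(fake_predict(f"req{i}") for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(10))
elapsed = time.perf_counter() - start
print(results[0], elapsed < 0.5)  # ten overlapped calls finish in ~one call's latency
```

Ten sequential calls would take ~0.5 s; overlapped, they complete in roughly 0.05 s.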
## Dynamic Batching

Collect inputs into batches for efficient GPU utilization:

```python
@app.function(gpu="L40S")
@modal.batched(max_batch_size=32, wait_ms=100)
async def batch_predict(texts: list[str]):
    # Called with up to 32 texts at once
    embeddings = model.encode(texts)
    return list(embeddings)
```

- `max_batch_size` — Maximum inputs per batch
- `wait_ms` — How long to wait for more inputs before processing
- The function receives a list and must return a list of the same length
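
The batching semantics can be sketched locally (a simplified model for intuition, not Modal's implementation): individual inputs are grouped into batches of at most `max_batch_size`, the batch function runs once per batch, and its per-batch results are flattened back so each caller sees exactly one result per input.

```python
def run_batched(inputs, batch_fn, max_batch_size=32):
    """Simplified model of dynamic batching: group individual inputs into
    batches, call batch_fn once per batch, and flatten the results so each
    input maps to exactly one output."""
    results = []
    for start in range(0, len(inputs), max_batch_size):
        batch = inputs[start:start + max_batch_size]
        batch_results = batch_fn(batch)
        # The batch function must return one result per input
        assert len(batch_results) == len(batch)
        results.extend(batch_results)
    return results

# Example: the "embedding" is just the string length here
lengths = run_batched(["a", "bb", "ccc"], lambda ts: [len(t) for t in ts], max_batch_size=2)
print(lengths)  # [1, 2, 3]
```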
## Dynamic Autoscaler Updates

Adjust autoscaling at runtime without redeploying:

```python
f = modal.Function.from_name("my-app", "f")
f.update_autoscaler(max_containers=100)
```

Settings revert to the decorator configuration on the next deploy, or are overridden by further updates:

```python
f.update_autoscaler(min_containers=2, max_containers=10)
f.update_autoscaler(min_containers=4)  # max_containers=10 still in effect
```

### Time-Based Scaling

Adjust the warm pool based on time of day:

```python
@app.function()
def inference_server():
    ...

@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def increase_warm_pool():
    inference_server.update_autoscaler(min_containers=4)

@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
def decrease_warm_pool():
    inference_server.update_autoscaler(min_containers=0)
```

### For Classes

Update the autoscaler for a specific parameterized instance:

```python
MyClass = modal.Cls.from_name("my-app", "MyClass")
obj = MyClass(model_version="3.5")
obj.update_autoscaler(buffer_containers=2)  # type: ignore
```

## Input Concurrency

Process multiple inputs per container with `@modal.concurrent`:

```python
@app.function()
@modal.concurrent(max_inputs=100)
def my_function(input: str):
    # Container can handle up to 100 concurrent inputs
    ...
```

Ideal for I/O-bound workloads:

- Database queries
- External API requests
- Remote Modal Function calls

### Concurrency Mechanisms

**Synchronous functions**: inputs run on separate threads, so code must be thread-safe.

```python
@app.function()
@modal.concurrent(max_inputs=10)
def sync_function():
    time.sleep(1)  # Must be thread-safe
```

**Async functions**: inputs run as separate asyncio tasks, so code must not block the event loop.

```python
@app.function()
@modal.concurrent(max_inputs=10)
async def async_function():
    await asyncio.sleep(1)  # Must not block event loop
```

### Target vs Max Inputs

```python
@app.function()
@modal.concurrent(
    max_inputs=120,     # Hard limit
    target_inputs=100,  # Autoscaler target
)
def my_function(input: str):
    # Allow a 20% burst above the target
    ...
```

The autoscaler aims for `target_inputs`, but containers can burst up to `max_inputs` during scale-up.

## Scaling Limits

Modal enforces limits per function:

| Resource | Limit |
|----------|-------|
| Pending inputs (unassigned) | 2,000 |
| Total inputs (running + pending) | 25,000 |
| Pending `.spawn()` inputs | 1,000,000 |
| Concurrent inputs per `.map()` call | 1,000 |
| Rate limit (web endpoints) | 200 req/s |

Exceeding these limits triggers `Resource Exhausted` errors. Implement retry logic for resilience.
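
Retry logic for these errors can be as simple as exponential backoff around the remote call. A local sketch (the `RuntimeError` and the wrapped function are stand-ins for the real exception type and Modal call):

```python
import time

def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff, e.g. around a call that may hit a
    Resource Exhausted limit. RuntimeError stands in for the real error."""
    for attempt in range(max_retries):
        try:
            return fn(*args)
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```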

## Async Usage

Use the async APIs for arbitrary parallel execution patterns:

```python
@app.function()
async def async_task(x):
    await asyncio.sleep(1)
    return x * 2

@app.local_entrypoint()
async def main():
    tasks = [async_task.remote.aio(i) for i in range(100)]
    results = await asyncio.gather(*tasks)
```

## Common Gotchas

**Incorrect**: using Python's builtin `map` (runs locally and sequentially)

```python
# DON'T DO THIS
results = map(evaluate_model, inputs)
```

**Incorrect**: calling the function first

```python
# DON'T DO THIS
results = evaluate_model(inputs).map()
```

**Correct**: call `.map()` on the Modal function object

```python
# DO THIS
results = evaluate_model.map(inputs)
```

# Modal Scheduled Jobs

## Overview

Modal supports running functions automatically on a schedule, using either cron syntax or fixed intervals. Deploy scheduled functions with `modal deploy` and they run unattended in the cloud.

## Schedule Types

### modal.Cron

Standard cron syntax — stable across deploys:

```python
import modal

app = modal.App("scheduled-tasks")

# Daily at 9 AM UTC
@app.function(schedule=modal.Cron("0 9 * * *"))
def daily_report():
    generate_and_send_report()

# Every Monday at midnight
@app.function(schedule=modal.Cron("0 0 * * 1"))
def weekly_cleanup():
    cleanup_old_data()

# Every 15 minutes
@app.function(schedule=modal.Cron("*/15 * * * *"))
def frequent_check():
    check_system_health()
```

#### Cron Syntax Reference

```
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sun=0)
│ │ │ │ │
* * * * *
```

| Pattern | Meaning |
|---------|---------|
| `0 9 * * *` | Daily at 9:00 AM UTC |
| `0 */6 * * *` | Every 6 hours |
| `*/30 * * * *` | Every 30 minutes |
| `0 0 * * 1` | Every Monday at midnight |
| `0 0 1 * *` | First day of every month |
| `0 9 * * 1-5` | Weekdays at 9 AM |
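
For intuition, here is a minimal matcher for the subset of cron syntax used above (`*`, `*/n` steps, plain numbers, and `a-b` ranges). This is an illustrative sketch, not Modal's scheduler:

```python
def field_matches(field: str, value: int) -> bool:
    """Check one cron field against a value: '*', '*/n', 'a-b', or a number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    if "-" in field:
        lo, hi = map(int, field.split("-"))
        return lo <= value <= hi
    return value == int(field)

def cron_matches(expr: str, minute: int, hour: int, dom: int, month: int, dow: int) -> bool:
    """True if the given time components match all five cron fields."""
    fields = expr.split()
    return all(field_matches(f, v) for f, v in zip(fields, (minute, hour, dom, month, dow)))

# "0 9 * * 1-5": weekdays at 9:00
print(cron_matches("0 9 * * 1-5", 0, 9, 15, 6, 3))  # True (a Wednesday)
print(cron_matches("0 9 * * 1-5", 0, 9, 15, 6, 0))  # False (a Sunday)
```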

### modal.Period

Run at fixed intervals from deployment time:

```python
# Every 5 hours
@app.function(schedule=modal.Period(hours=5))
def periodic_sync():
    sync_data()

# Every 30 minutes
@app.function(schedule=modal.Period(minutes=30))
def poll_updates():
    check_for_updates()

# Every day
@app.function(schedule=modal.Period(days=1))
def daily_task():
    ...
```

`modal.Period` resets its timer on each deployment. If you need a schedule that doesn't shift with deploys, use `modal.Cron`.

### Timezone Support

Cron schedules default to UTC; pass `timezone` to use a local time:

```python
# Daily at 9 AM London time
@app.function(schedule=modal.Cron("0 9 * * *", timezone="Europe/London"))
def uk_morning_task():
    ...

# Daily at 6 AM New York time
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def morning_report():
    ...

# Fridays at 5 PM Tokyo time
@app.function(schedule=modal.Cron("0 17 * * 5", timezone="Asia/Tokyo"))
def friday_evening_jp():
    ...
```

## Deployment

Schedules only activate when deployed:

```bash
modal deploy script.py
```

`modal run` and `modal serve` do not activate schedules. Deployed schedules persist until explicitly stopped.

### Programmatic Deployment

```python
if __name__ == "__main__":
    app.deploy()
```

## Monitoring

- View scheduled runs in the **Apps** section of the Modal dashboard (https://modal.com/apps), with status, duration, and logs for each run
- Failure notifications are available for failed runs
- Use the **"Run Now"** button on the dashboard to trigger a run manually

## Schedule Management

- Schedules cannot be paused — remove the `schedule` parameter and redeploy to stop
- To change a schedule, update the `schedule` parameter and redeploy
- To stop an app entirely, run `modal app stop <name>`

```python
# Update from daily to weekly, then redeploy
@app.function(schedule=modal.Period(days=7))
def task():
    ...
```

```bash
modal deploy script.py
```

## Common Patterns

### ETL Pipeline

```python
@app.function(
    schedule=modal.Cron("0 2 * * *"),  # 2 AM UTC daily
    secrets=[modal.Secret.from_name("db-creds")],
    timeout=7200,
)
def etl_pipeline():
    import os

    data = extract(os.environ["SOURCE_DB_URL"])
    transformed = transform(data)
    load(transformed, os.environ["DEST_DB_URL"])
```

### Model Retraining

```python
data_vol = modal.Volume.from_name("datasets")
model_vol = modal.Volume.from_name("models")

@app.function(
    schedule=modal.Cron("0 0 * * 0"),  # Weekly on Sunday
    gpu="H100",
    volumes={"/data": data_vol, "/models": model_vol},
    timeout=86400,
)
def retrain():
    model = train_on_latest_data("/data/training/")
    torch.save(model.state_dict(), "/models/latest.pt")
    model_vol.commit()
```

### Health Checks

```python
@app.function(
    schedule=modal.Period(minutes=5),
    secrets=[modal.Secret.from_name("slack-webhook")],
)
def health_check():
    import os

    import requests

    status = check_all_services()
    if not status["healthy"]:
        requests.post(os.environ["SLACK_URL"], json={"text": f"Alert: {status}"})
```

### Report Generation

```python
@app.function(
    schedule=modal.Cron("0 9 * * 1"),  # Monday 9 AM
    secrets=[modal.Secret.from_name("email-creds")],
)
def weekly_report():
    report = generate_analytics_report()
    send_email(
        to="team@company.com",
        subject="Weekly Analytics Report",
        body=report,
    )
```

### Data Cleanup

```python
from datetime import datetime, timedelta

@app.function(schedule=modal.Period(hours=6))
def cleanup_old_data():
    # Remove data older than 30 days
    cutoff = datetime.now() - timedelta(days=30)
    delete_old_records(cutoff)
```

## Configuration with Secrets and Volumes

Scheduled functions support all the usual function parameters:

```python
vol = modal.Volume.from_name("data")
secret = modal.Secret.from_name("api-keys")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[secret],
    volumes={"/data": vol},
    cpu=4.0,
    memory=16384,
)
def sync_data():
    import json
    import os

    api_key = os.environ["API_KEY"]
    data = fetch_external_data(api_key)
    with open("/data/latest.json", "w") as f:
        json.dump(data, f)
    vol.commit()
```

## Dynamic Scheduling

Combine scheduled functions with autoscaler updates to change scaling behavior by time of day:

```python
@app.function()
def main_task():
    ...

@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def enable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=5)

@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
def disable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=0)
```

## Error Handling

Scheduled functions that fail will:

- Show the failure in the dashboard
- Send notifications (configurable)
- Run again at the next scheduled time

```python
@app.function(
    schedule=modal.Cron("0 * * * *"),
    retries=3,  # Retry failed runs
    timeout=1800,
)
def robust_task():
    try:
        perform_task()
    except Exception as e:
        # Log the error and alert before re-raising
        print(f"Task failed: {e}")
        send_alert(f"Scheduled task failed: {e}")
        raise
```

## Best Practices

1. **Set timeouts**: always specify a timeout for scheduled functions
2. **Use appropriate schedules**: `Period` for relative timing, `Cron` for absolute times
3. **Monitor failures**: check the dashboard regularly for failed runs
4. **Idempotent operations**: design tasks to handle reruns safely
5. **Resource limits**: set appropriate CPU/memory for scheduled workloads
6. **Timezone awareness**: specify a timezone for cron schedules
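
Idempotency (practice 4) often comes down to deriving a stable key from the schedule window and skipping work that has already completed. A local sketch under illustrative assumptions (the marker directory and daily key are hypothetical choices):

```python
import os
from datetime import datetime, timezone

def run_idempotent(task, marker_dir="/tmp/task-markers"):
    """Run task at most once per UTC day: the date is the stable key, and a
    marker file recorded after success makes reruns a no-op."""
    os.makedirs(marker_dir, exist_ok=True)
    key = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    marker = os.path.join(marker_dir, key)
    if os.path.exists(marker):
        return "skipped"
    result = task()
    open(marker, "w").close()  # Record completion only after success
    return result
```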

# Modal Secrets

## Overview

Modal Secrets securely deliver credentials and other sensitive data to functions as environment variables. Secrets are stored encrypted and are only available to your workspace.

## Creating Secrets

### Via Dashboard

Create and manage secrets at https://modal.com/secrets. Templates are available for:

- Database credentials (Postgres, MongoDB)
- Cloud providers (AWS, GCP, Azure)
- ML platforms (Weights & Biases, Hugging Face)
- And more

### Via CLI

```bash
# Create with key-value pairs
modal secret create my-api-keys API_KEY=sk-xxx DB_PASSWORD=hunter2

# Create from existing environment variables
modal secret create my-env-keys API_KEY=$API_KEY

# List all secrets
modal secret list

# Delete a secret
modal secret delete my-api-keys
```

### Programmatic (Inline)

```python
# From a dictionary (useful for development)
secret = modal.Secret.from_dict({"API_KEY": "sk-xxx"})

# From a .env file
secret = modal.Secret.from_dotenv()

# From a named secret (created via CLI or dashboard)
secret = modal.Secret.from_name("my-api-keys")
```

To forward a local environment variable only when running locally:

```python
import os

if modal.is_local():
    local_secret = modal.Secret.from_dict({"FOO": os.environ["LOCAL_FOO"]})
else:
    local_secret = modal.Secret.from_dict({})

@app.function(secrets=[local_secret])
def some_function():
    import os

    print(os.environ["FOO"])
```

## Using Secrets in Functions

### Basic Usage

```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api():
    import os

    import requests

    api_key = os.environ["API_KEY"]
    response = requests.get(url, headers={"Authorization": f"Bearer {api_key}"})
    return response.json()
```

### Multiple Secrets

```python
@app.function(secrets=[
    modal.Secret.from_name("openai-keys"),
    modal.Secret.from_name("database-creds"),
])
def process():
    import os

    openai_key = os.environ["OPENAI_API_KEY"]
    db_url = os.environ["DATABASE_URL"]
    ...
```

Secrets are applied in order — if two secrets define the same key, the later one wins.
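
This later-wins behavior is ordinary dict-merge semantics; a local sketch of how the resulting environment is assembled:

```python
def merged_env(*secrets):
    """Model how multiple secrets combine: apply each secret's key-value
    pairs in order, so later secrets overwrite earlier ones on conflict."""
    env = {}
    for secret in secrets:
        env.update(secret)
    return env

env = merged_env({"API_KEY": "old", "DB_URL": "x"}, {"API_KEY": "new"})
print(env["API_KEY"])  # new
```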

### With Classes

```python
@app.cls(secrets=[modal.Secret.from_name("huggingface")])
class ModelService:
    @modal.enter()
    def load(self):
        import os

        token = os.environ["HF_TOKEN"]
        self.model = AutoModel.from_pretrained("model-name", token=token)
```

## Environment Variables

### Reserved Runtime Variables

**All containers**:

- `MODAL_CLOUD_PROVIDER` - Cloud provider (AWS/GCP/OCI)
- `MODAL_IMAGE_ID` - Image ID
- `MODAL_REGION` - Region identifier (e.g., us-east-1)
- `MODAL_TASK_ID` - Container task ID

**Function containers**:

- `MODAL_ENVIRONMENT` - Modal Environment name
- `MODAL_IS_REMOTE` - Set to `1` in remote containers
- `MODAL_IDENTITY_TOKEN` - OIDC token for function identity

**Sandbox containers**:

- `MODAL_SANDBOX_ID` - Sandbox ID

### Setting Environment Variables

Via the image:

```python
image = modal.Image.debian_slim().env({"PORT": "6443"})

@app.function(image=image)
def my_function():
    import os

    port = os.environ["PORT"]
```

Via secrets:

```python
secret = modal.Secret.from_dict({"API_KEY": "secret-value"})

@app.function(secrets=[secret])
def my_function():
    import os

    api_key = os.environ["API_KEY"]
```

## Using a .env File

```python
# Reads the .env file from the current directory
@app.function(secrets=[modal.Secret.from_dotenv()])
def local_dev():
    import os

    api_key = os.environ["API_KEY"]
```

The `.env` file format:

```
API_KEY=sk-xxx
DATABASE_URL=postgres://user:pass@host/db
DEBUG=false
```
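
A minimal parser for this format (a sketch of common dotenv behavior, not Modal's exact parser): blank lines and `#` comments are skipped, and everything after the first `=` is the value.

```python
def parse_dotenv(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env

env = parse_dotenv("API_KEY=sk-xxx\n# comment\nDEBUG=false\n")
print(env)  # {'API_KEY': 'sk-xxx', 'DEBUG': 'false'}
```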

## Common Secret Patterns

### AWS Credentials

```python
aws_secret = modal.Secret.from_name("my-aws-secret")

@app.function(secrets=[aws_secret])
def use_aws():
    import boto3

    # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are picked up automatically
    s3 = boto3.client("s3")
```

### Hugging Face Token

```python
hf_secret = modal.Secret.from_name("huggingface")

@app.function(secrets=[hf_secret])
def download_model():
    from transformers import AutoModel

    # HF_TOKEN is used automatically for authentication
    model = AutoModel.from_pretrained("private-model")
```

### Common Secret Templates

| Service | Typical Keys |
|---------|-------------|
| OpenAI | `OPENAI_API_KEY` |
| Hugging Face | `HF_TOKEN` |
| AWS | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Postgres | `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE` |
| Weights & Biases | `WANDB_API_KEY` |
| GitHub | `GITHUB_TOKEN` |

### Database Credentials

```python
db_secret = modal.Secret.from_name("postgres-creds")

@app.function(secrets=[db_secret])
def query_db():
    import os

    import psycopg2

    conn = psycopg2.connect(
        host=os.environ["PGHOST"],
        port=os.environ["PGPORT"],
        user=os.environ["PGUSER"],
        password=os.environ["PGPASSWORD"],
    )
```

## Best Practices

1. **Never hardcode secrets** - Always use Modal Secrets
2. **Use specific secrets** - Create separate secrets for different purposes
3. **Rotate secrets regularly** - Update secrets periodically via the dashboard or CLI
4. **Minimal scope** - Only attach secrets to functions that need them
5. **Environment-specific** - Use different secrets for dev/staging/prod
6. **Prefer `.from_name()` in production** - Reserve `.from_dict()` for development

## Security Notes

- Secrets are encrypted at rest and in transit
- Secrets are only available to functions that explicitly request them, within your workspace
- Secret values are not logged or exposed in dashboards; never log or print them yourself
- Secrets can be scoped to specific environments

# Modal Volumes

## Table of Contents

- [Overview](#overview)
- [Creating Volumes](#creating-volumes)
- [Mounting Volumes](#mounting-volumes)
- [Reading and Writing Files](#reading-and-writing-files)
- [CLI Access](#cli-access)
- [Commits and Reloads](#commits-and-reloads)
- [Concurrent Access](#concurrent-access)
- [Volumes v2](#volumes-v2)
- [Common Patterns](#common-patterns)

## Overview

Modal Volumes provide a high-performance distributed file system for Modal applications, designed for write-once, read-many workloads like ML model weights and distributed data processing.

Key characteristics:

- Persistent across function invocations and deployments
- Mountable by multiple functions simultaneously
- Background auto-commits every few seconds
- Final commit on container shutdown

## Creating Volumes

### In Code (Lazy Creation)

```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True)

# For v2 (beta)
vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2)
```

### Via CLI

```bash
modal volume create my-volume

# v2 volume (beta)
modal volume create my-volume --version=2
```

## Mounting Volumes

Mount volumes to functions via the `volumes` parameter:

```python
vol = modal.Volume.from_name("model-store", create_if_missing=True)

@app.function(volumes={"/models": vol})
def use_model():
    import json

    # Access files at /models/
    with open("/models/config.json") as f:
        config = json.load(f)
```

Mount multiple volumes:

```python
weights_vol = modal.Volume.from_name("weights")
data_vol = modal.Volume.from_name("datasets")

@app.function(volumes={"/weights": weights_vol, "/data": data_vol})
def train():
    ...
```

## Commits and Reloads

### Commits

Persist changes to a Volume:

```python
@app.function(volumes={"/data": vol})
def write_data():
    with open("/data/file.txt", "w") as f:
        f.write("data")
    vol.commit()  # Make changes visible to other containers
```

**Background commits**: Modal automatically commits Volume changes every few seconds and on container shutdown.

### Reloads

Fetch the latest changes from other containers:

```python
@app.function(volumes={"/data": vol})
def read_data():
    vol.reload()  # Fetch latest changes
    with open("/data/file.txt", "r") as f:
        content = f.read()
```

At container creation, the latest Volume state is mounted. A `reload()` is needed to see commits made by other containers after that point.

## Uploading Files

### Batch Upload (Efficient)

```python
import io

vol = modal.Volume.from_name("my-volume")

with vol.batch_upload() as batch:
    batch.put_file("local-path.txt", "/remote-path.txt")
    batch.put_directory("/local/directory/", "/remote/directory")
    batch.put_file(io.BytesIO(b"some data"), "/foobar")
```

### Via Image

```python
image = modal.Image.debian_slim().add_local_dir(
    local_path="/home/user/my_dir",
    remote_path="/app",
)

@app.function(image=image)
def process():
    # Files available at /app
    ...
```

## Downloading Files
|
## Reading and Writing Files
|
||||||
|
|
||||||
### Via CLI
|
### Writing
|
||||||
|
|
||||||
```bash
|
|
||||||
modal volume get my-volume remote.txt local.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
Max file size via CLI: No limit
|
|
||||||
Max file size via dashboard: 16 MB
|
|
||||||
|
|
||||||
### Via Python SDK
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
vol = modal.Volume.from_name("my-volume")
|
@app.function(volumes={"/data": vol})
|
||||||
|
def save_results(results):
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
|
||||||
for data in vol.read_file("path.txt"):
|
os.makedirs("/data/outputs", exist_ok=True)
|
||||||
print(data)
|
with open("/data/outputs/results.json", "w") as f:
|
||||||
|
json.dump(results, f)
|
||||||
```
|
```
|
||||||
|
|
||||||
## Volume Performance
|
### Reading
|
||||||
|
|
||||||
### Volumes v1
|
|
||||||
|
|
||||||
Best for:
|
|
||||||
- <50,000 files (recommended)
|
|
||||||
- <500,000 files (hard limit)
|
|
||||||
- Sequential access patterns
|
|
||||||
- <5 concurrent writers
|
|
||||||
|
|
||||||
### Volumes v2 (Beta)
|
|
||||||
|
|
||||||
Improved for:
|
|
||||||
- Unlimited files
|
|
||||||
- Hundreds of concurrent writers
|
|
||||||
- Random access patterns
|
|
||||||
- Large files (up to 1 TiB)
|
|
||||||
|
|
||||||
Current v2 limits:
|
|
||||||
- Max file size: 1 TiB
|
|
||||||
- Max files per directory: 32,768
|
|
||||||
- Unlimited directory depth
|
|
||||||
|
|
||||||
## Model Storage
|
|
||||||
|
|
||||||
### Saving Model Weights
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
volume = modal.Volume.from_name("model-weights", create_if_missing=True)
|
@app.function(volumes={"/data": vol})
|
||||||
MODEL_DIR = "/models"
|
def load_results():
|
||||||
|
with open("/data/outputs/results.json") as f:
|
||||||
|
return json.load(f)
|
||||||
|
```
|
||||||
|
|
||||||
@app.function(volumes={MODEL_DIR: volume})
|
### Large Files (Model Weights)
|
||||||
def train():
|
|
||||||
|
```python
|
||||||
|
@app.function(volumes={"/models": vol}, gpu="L40S")
|
||||||
|
def save_model():
|
||||||
|
import torch
|
||||||
model = train_model()
|
model = train_model()
|
||||||
save_model(f"{MODEL_DIR}/my_model.pt", model)
|
torch.save(model.state_dict(), "/models/checkpoint.pt")
|
||||||
volume.commit()
|
|
||||||
|
@app.function(volumes={"/models": vol}, gpu="L40S")
|
||||||
|
def load_model():
|
||||||
|
import torch
|
||||||
|
model = MyModel()
|
||||||
|
model.load_state_dict(torch.load("/models/checkpoint.pt"))
|
||||||
|
return model
|
||||||
```
|
```
|
||||||
|
|
||||||
### Loading Model Weights
|
## CLI Access
|
||||||
|
|
||||||
```python
|
|
||||||
@app.function(volumes={MODEL_DIR: volume})
|
|
||||||
def inference(model_id: str):
|
|
||||||
try:
|
|
||||||
model = load_model(f"{MODEL_DIR}/{model_id}")
|
|
||||||
except NotFound:
|
|
||||||
volume.reload() # Fetch latest models
|
|
||||||
model = load_model(f"{MODEL_DIR}/{model_id}")
|
|
||||||
return model.run(request)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Model Checkpointing
|
|
||||||
|
|
||||||
Save checkpoints during long training jobs:
|
|
||||||
|
|
||||||
```python
|
|
||||||
volume = modal.Volume.from_name("checkpoints")
|
|
||||||
VOL_PATH = "/vol"
|
|
||||||
|
|
||||||
@app.function(
|
|
||||||
gpu="A10G",
|
|
||||||
timeout=2*60*60, # 2 hours
|
|
||||||
volumes={VOL_PATH: volume}
|
|
||||||
)
|
|
||||||
def finetune():
|
|
||||||
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
|
|
||||||
|
|
||||||
training_args = Seq2SeqTrainingArguments(
|
|
||||||
output_dir=str(VOL_PATH / "model"), # Checkpoints saved to Volume
|
|
||||||
save_steps=100,
|
|
||||||
# ... more args
|
|
||||||
)
|
|
||||||
|
|
||||||
trainer = Seq2SeqTrainer(model=model, args=training_args, ...)
|
|
||||||
trainer.train()
|
|
||||||
```
|
|
||||||
|
|
||||||
Background commits ensure checkpoints persist even if training is interrupted.
|
|
||||||
|
|
||||||
## CLI Commands
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# List files
|
# List files
|
||||||
modal volume ls my-volume
|
modal volume ls my-volume
|
||||||
|
modal volume ls my-volume /subdir/
|
||||||
|
|
||||||
# Upload
|
# Upload files
|
||||||
modal volume put my-volume local.txt remote.txt
|
modal volume put my-volume local_file.txt
|
||||||
|
modal volume put my-volume local_file.txt /remote/path/file.txt
|
||||||
|
|
||||||
# Download
|
# Download files
|
||||||
modal volume get my-volume remote.txt local.txt
|
modal volume get my-volume /remote/file.txt local_file.txt
|
||||||
|
|
||||||
# Copy within Volume
|
# Delete a volume
|
||||||
modal volume cp my-volume src.txt dst.txt
|
|
||||||
|
|
||||||
# Delete
|
|
||||||
modal volume rm my-volume file.txt
|
|
||||||
|
|
||||||
# List all volumes
|
|
||||||
modal volume list
|
|
||||||
|
|
||||||
# Delete volume
|
|
||||||
modal volume delete my-volume
|
modal volume delete my-volume
|
||||||
```
|
```
|
||||||
|
|
||||||
## Ephemeral Volumes
|
## Commits and Reloads
|
||||||
|
|
||||||
Create temporary volumes that are garbage collected:
|
Modal auto-commits volume changes in the background every few seconds and on container shutdown.
|
||||||
|
|
||||||
|
### Explicit Commit
|
||||||
|
|
||||||
|
Force an immediate commit:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
with modal.Volume.ephemeral() as vol:
|
@app.function(volumes={"/data": vol})
|
||||||
sb = modal.Sandbox.create(
|
def writer():
|
||||||
volumes={"/cache": vol},
|
with open("/data/file.txt", "w") as f:
|
||||||
app=my_app,
|
f.write("hello")
|
||||||
)
|
vol.commit() # Make immediately visible to other containers
|
||||||
# Use volume
|
```
|
||||||
# Automatically cleaned up when context exits
|
|
||||||
|
### Reload
|
||||||
|
|
||||||
|
See changes from other containers:
|
||||||
|
|
||||||
|
```python
|
||||||
|
@app.function(volumes={"/data": vol})
|
||||||
|
def reader():
|
||||||
|
vol.reload() # Refresh to see latest writes
|
||||||
|
with open("/data/file.txt") as f:
|
||||||
|
return f.read()
|
||||||
```
|
```
|
||||||
|
|
||||||
## Concurrent Access
|
## Concurrent Access
|
||||||
|
|
||||||
### Concurrent Reads
|
### v1 Volumes
|
||||||
|
|
||||||
Multiple containers can read simultaneously without issues.
|
- Recommended max 5 concurrent commits
|
||||||
|
- Last write wins for concurrent modifications of the same file
|
||||||
|
- Avoid concurrent modification of identical files
|
||||||
|
- Max 500,000 files (inodes)
|
||||||
|
|
||||||
### Concurrent Writes
|
### v2 Volumes
|
||||||
|
|
||||||
Supported but:
|
- Hundreds of concurrent writers (distinct files)
|
||||||
- Avoid modifying same files concurrently
|
- No file count limit
|
||||||
- Last write wins (data loss possible)
|
- Improved random access performance
|
||||||
- v1: Limit to ~5 concurrent writers
|
- Up to 1 TiB per file, 262,144 files per directory
|
||||||
- v2: Hundreds of concurrent writers supported
|
|
||||||
|
|
||||||
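Since concurrent writes to the same path resolve as last-write-wins, a common workaround is to give each writer its own output path. A minimal stdlib sketch (the naming scheme is illustrative, not a Modal API):

```python
import uuid

def unique_output_path(base_dir: str, task_name: str, suffix: str = ".json") -> str:
    """Build a collision-free path so concurrent writers never touch the same file."""
    return f"{base_dir}/{task_name}-{uuid.uuid4().hex}{suffix}"

# Each concurrent container writes to its own file, e.g.:
# with open(unique_output_path("/data/outputs", "worker"), "w") as f: ...
path_a = unique_output_path("/data/outputs", "worker")
path_b = unique_output_path("/data/outputs", "worker")
```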
## Volumes v2

v2 Volumes (beta) offer significant improvements:

| Feature | v1 | v2 |
|---------|----|----|
| Max files | 500,000 | Unlimited |
| Concurrent writes | ~5 | Hundreds |
| Max file size | No limit | 1 TiB |
| Random access | Limited | Full support |
| HIPAA compliance | No | Yes |
| Hard links | No | Yes |

Enable v2:

```python
vol = modal.Volume.from_name("my-vol-v2", create_if_missing=True, version=2)
```
## Common Patterns

### Model Weight Storage

```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

# Download once during image build
def download_weights():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim()
    .uv_pip_install("huggingface_hub")
    .run_function(download_weights, volumes={"/models": vol})
)
```

### Training Checkpoints

```python
@app.function(volumes={"/checkpoints": vol}, gpu="H100", timeout=86400)
def train():
    for epoch in range(100):
        train_one_epoch()
        torch.save(model.state_dict(), f"/checkpoints/epoch_{epoch}.pt")
        vol.commit()  # Save checkpoint immediately
```
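To resume after an interruption, scan the checkpoint directory for the highest committed epoch. A stdlib-only sketch, assuming checkpoints named `epoch_{n}.pt` (the helper name is hypothetical):

```python
import os
import re
from typing import Optional

def find_latest_checkpoint(ckpt_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered epoch_{n}.pt checkpoint, or None."""
    pattern = re.compile(r"^epoch_(\d+)\.pt$")
    best_epoch, best_path = -1, None
    for name in os.listdir(ckpt_dir):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_epoch:
            best_epoch = int(m.group(1))
            best_path = os.path.join(ckpt_dir, name)
    return best_path
```

Call it at the top of the training function and load the returned state dict before entering the epoch loop.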
### Shared Data Between Functions

```python
data_vol = modal.Volume.from_name("shared-data", create_if_missing=True)

@app.function(volumes={"/data": data_vol})
def preprocess():
    # Write processed data
    df.to_parquet("/data/processed.parquet")

@app.function(volumes={"/data": data_vol})
def analyze():
    data_vol.reload()  # Ensure we see latest data
    df = pd.read_parquet("/data/processed.parquet")
    return df.describe()
```

### Performance Tips

- Volumes are optimized for large files, not many small files
- Keep under 50,000 files and directories for best v1 performance
- Use Parquet or other columnar formats instead of many small CSVs
- For truly temporary data, use `ephemeral_disk` instead of Volumes
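The many-small-files tip can be applied with a stdlib-only packing pass that merges small per-record JSON files into one JSON Lines file before writing to the Volume (Parquet via pandas is the heavier-weight equivalent; file names here are illustrative):

```python
import glob
import json
import os

def pack_records(src_dir: str, dest_path: str) -> int:
    """Merge every *.json file in src_dir into one JSON Lines file; return record count."""
    count = 0
    with open(dest_path, "w") as out:
        for path in sorted(glob.glob(os.path.join(src_dir, "*.json"))):
            with open(path) as f:
                out.write(json.dumps(json.load(f)) + "\n")
            count += 1
    return count
```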
@@ -1,337 +1,254 @@
# Modal Web Endpoints

## Table of Contents

- [Simple Endpoints](#simple-endpoints)
- [Deployment](#deployment)
- [ASGI Apps](#asgi-apps-fastapi-starlette-fasthtml)
- [WSGI Apps](#wsgi-apps-flask-django)
- [Custom Web Servers](#custom-web-servers)
- [WebSockets](#websockets)
- [Authentication](#authentication)
- [Streaming](#streaming)
- [Concurrency](#concurrency)
- [Limits](#limits)

## Simple Endpoints

The easiest way to create a web endpoint:

```python
import modal

app = modal.App("api-service")

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```
### POST Endpoints

```python
@app.function()
@modal.fastapi_endpoint(method="POST")
def predict(data: dict):
    result = model.predict(data["text"])
    return {"prediction": result}
```

### Query Parameters

Parameters are automatically parsed from query strings:

```python
@app.function()
@modal.fastapi_endpoint()
def search(query: str, limit: int = 10):
    return {"results": do_search(query, limit)}
```

Access via: `https://your-app.modal.run?query=hello&limit=5`
## Deployment

### Development Mode

```bash
modal serve script.py
```

- Creates a temporary public URL
- Hot-reloads on file changes
- Perfect for development and testing
- URL expires when you stop the command

### Production Deployment

```bash
modal deploy script.py
```

- Creates a permanent URL
- Runs persistently in the cloud
- Autoscales based on traffic
- URL format: `https://<workspace>--<app-name>-<function-name>.modal.run`
## ASGI Apps (FastAPI, Starlette, FastHTML)

For full framework applications, use `@modal.asgi_app`:

```python
from fastapi import FastAPI

web_app = FastAPI()

@web_app.get("/")
async def root():
    return {"status": "ok"}

@web_app.post("/predict")
async def predict(request: dict):
    return {"result": model.run(request["input"])}

@app.function(image=image, gpu="L40S")
@modal.asgi_app()
def fastapi_app():
    return web_app
```

### With Class Lifecycle

```python
@app.cls(gpu="L40S", image=image)
class InferenceService:
    @modal.enter()
    def load_model(self):
        self.model = load_model()

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI
        app = FastAPI()

        @app.post("/generate")
        async def generate(request: dict):
            return self.model.generate(request["prompt"])

        return app
```
## WSGI Apps (Flask, Django)

```python
from flask import Flask

flask_app = Flask(__name__)

@flask_app.route("/")
def index():
    return {"status": "ok"}

@app.function(image=image)
@modal.wsgi_app()
def flask_server():
    return flask_app
```

WSGI is synchronous: concurrent inputs run on separate threads.
## Custom Web Servers

For non-standard web frameworks (aiohttp, Tornado, TGI):

```python
@app.function(image=image, gpu="H100")
@modal.web_server(port=8000)
def serve():
    import subprocess
    subprocess.Popen([
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "meta-llama/Llama-3-70B",
        "--host", "0.0.0.0",  # Must bind to 0.0.0.0, not localhost
        "--port", "8000",
    ])
```

The application must bind to `0.0.0.0` (not `127.0.0.1`).
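When launching a server as a subprocess like this, a startup probe is a common pattern: block until the port is actually accepting connections before relying on it. A stdlib-only sketch (the timeout value is arbitrary):

```python
import socket
import time

def wait_for_port(port: int, host: str = "127.0.0.1", timeout_s: float = 30.0) -> bool:
    """Poll until a TCP server is accepting connections, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.1)
    return False
```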
## WebSockets

Supported with `@modal.asgi_app`, `@modal.wsgi_app`, and `@modal.web_server`:

```python
from fastapi import FastAPI, WebSocket

web_app = FastAPI()

@web_app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        result = process(data)
        await websocket.send_text(result)

@app.function()
@modal.asgi_app()
def ws_app():
    return web_app
```

- Full WebSocket protocol (RFC 6455)
- Messages up to 2 MiB each
- No RFC 8441 or RFC 7692 support yet
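Given the 2 MiB per-message cap, larger payloads have to be split across several WebSocket messages. A minimal sender-side chunking sketch (the 1 MiB chunk size is an arbitrary choice that leaves headroom under the cap):

```python
def chunk_payload(payload: bytes, chunk_size: int = 1024 * 1024) -> list:
    """Split a payload into chunks that each fit under the 2 MiB message cap."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

# In the handler: for chunk in chunk_payload(big_blob): await websocket.send_bytes(chunk)
```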
## Authentication

### Proxy Auth Tokens (Built-in)

Modal provides first-class endpoint protection via proxy auth tokens:

```python
@app.function()
@modal.fastapi_endpoint()
def protected(text: str):
    return {"result": process(text)}
```

Clients include `Modal-Key` and `Modal-Secret` headers to authenticate.
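A client call might look like this stdlib sketch; the URL and the environment variable names holding the token pair are placeholders, and only the `Modal-Key`/`Modal-Secret` header names come from Modal:

```python
import os
import urllib.request

url = "https://workspace--api-service-protected.modal.run?text=hi"  # placeholder URL
req = urllib.request.Request(
    url,
    headers={
        "Modal-Key": os.environ.get("TOKEN_ID", ""),
        "Modal-Secret": os.environ.get("TOKEN_SECRET", ""),
    },
)
# urllib.request.urlopen(req) would then perform the authenticated request.
```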
### Custom Bearer Tokens

```python
from fastapi import Header, HTTPException

@app.function(secrets=[modal.Secret.from_name("auth-secret")])
@modal.fastapi_endpoint(method="POST")
def secure_predict(data: dict, authorization: str = Header(None)):
    import os
    expected = os.environ["AUTH_TOKEN"]
    if authorization != f"Bearer {expected}":
        raise HTTPException(status_code=401, detail="Unauthorized")
    return {"result": model.predict(data["text"])}
```

### Client IP Access

The client IP address is available for geolocation, rate limiting, and access control.
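For example, a per-IP rate limit can be kept in container memory; inside an endpoint you would key it on the request's client host. A sliding-window sketch (stdlib only; note that each container tracks only its own traffic):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per window_s seconds for each client IP."""
    def __init__(self, max_requests: int, window_s: float):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._hits[ip]
        while q and now - q[0] > self.window_s:
            q.popleft()  # Drop hits that fell out of the window
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```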
## Streaming

### Server-Sent Events (SSE)

```python
from fastapi.responses import StreamingResponse

@app.function(gpu="H100")
@modal.fastapi_endpoint()
def stream_generate(prompt: str):
    def generate():
        for token in model.stream(prompt):
            yield f"data: {token}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
```
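On the receiving side, `data:` frames like the ones emitted here can be decoded with a few lines of stdlib parsing (real clients typically use an SSE library; this sketch handles only single-line `data:` fields in a fully buffered body):

```python
def parse_sse(stream_text: str) -> list:
    """Extract the data payload from each SSE event in a buffered response body."""
    events = []
    for block in stream_text.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events
```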
## Concurrency

Handle multiple requests per container using `@modal.concurrent`:

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
@modal.fastapi_endpoint(method="POST")
async def batch_predict(data: dict):
    return {"result": await model.predict_async(data["text"])}
```
## Limits

- Request body: up to 4 GiB
- Response body: unlimited
- Rate limit: 200 requests/second (5-second burst for new accounts)
- Cold starts occur when no containers are active (use `min_containers` to avoid)
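Clients that may hit the rate limit can retry with exponential backoff. A stdlib sketch (`send` stands in for any callable returning an HTTP status code):

```python
import time

def call_with_backoff(send, max_attempts: int = 5, base_delay_s: float = 0.5):
    """Retry send() while it returns 429, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        time.sleep(base_delay_s * (2 ** attempt))
    return 429
```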