diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 2f68bec..1249037 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -6,7 +6,7 @@ }, "metadata": { "description": "Claude scientific skills from K-Dense Inc", - "version": "2.29.0" + "version": "2.30.0" }, "plugins": [ { diff --git a/docs/scientific-skills.md b/docs/scientific-skills.md index acd853b..608fb14 100644 --- a/docs/scientific-skills.md +++ b/docs/scientific-skills.md @@ -77,7 +77,7 @@ ### Data Management & Infrastructure - **LaminDB** - Open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). Provides unified platform combining lakehouse architecture, lineage tracking, feature stores, biological ontologies (via Bionty plugin with 20+ ontologies: genes, proteins, cell types, tissues, diseases, pathways), LIMS, and ELN capabilities through a single Python API. Key features include: automatic data lineage tracking (code, inputs, outputs, environment), versioned artifacts (DataFrame, AnnData, SpatialData, Parquet, Zarr), schema validation and data curation with standardization/synonym mapping, queryable metadata with feature-based filtering, cross-registry traversal, and streaming for large datasets. Supports integrations with workflow managers (Nextflow, Snakemake, Redun), MLOps platforms (Weights & Biases, MLflow, HuggingFace, scVI-tools), cloud storage (S3, GCS, S3-compatible), array stores (TileDB-SOMA, DuckDB), and visualization (Vitessce). Deployment options: local SQLite, cloud storage with SQLite, or cloud storage with PostgreSQL for production. 
Use cases: scRNA-seq standardization and analysis, flow cytometry/spatial data management, multi-modal dataset integration, computational workflow tracking with reproducibility, biological ontology-based annotation, data lakehouse construction for unified queries, ML pipeline integration with experiment tracking, and FAIR-compliant dataset publishing -- **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container), persistent storage via Volumes for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs, parallel execution with `.map()` for batch processing, input concurrency for I/O-bound workloads, and resource configuration (CPU cores, memory, disk). Supports custom Docker images, integration with Hugging Face/Weights & Biases, FastAPI for web endpoints, and distributed training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, embeddings), GPU-accelerated training, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation +- **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. 
Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200, B200+), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv (recommended)/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container, up to 1,536 GB VRAM), persistent storage via Volumes (v1 and v2) for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs (FastAPI, ASGI, WSGI, WebSockets), parallel execution with `.map()` for batch processing, input concurrency and dynamic batching for I/O-bound workloads, and resource configuration (CPU cores, memory, ephemeral disk up to 3 TiB). Supports custom Docker images, Micromamba/Conda environments, integration with Hugging Face/Weights & Biases, and distributed multi-GPU training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, speech, embeddings), GPU-accelerated training and fine-tuning, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, protein folding and computational biology, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation ### Cheminformatics & Drug Discovery - **Datamol** - Python library for molecular manipulation and featurization built on RDKit with enhanced workflows and performance optimizations. 
Provides utilities for molecular I/O (reading/writing SMILES, SDF, MOL files), molecular standardization and sanitization, molecular transformations (tautomer enumeration, stereoisomer generation), molecular featurization (descriptors, fingerprints, graph representations), parallel processing for large datasets, and integration with machine learning pipelines. Features include: optimized RDKit operations, caching for repeated computations, molecular filtering and preprocessing, and seamless integration with pandas DataFrames. Designed for drug discovery and cheminformatics workflows requiring efficient processing of large compound libraries. Use cases: molecular preprocessing for ML models, compound library management, molecular similarity searches, and cheminformatics data pipelines diff --git a/scientific-skills/modal/SKILL.md b/scientific-skills/modal/SKILL.md index 84df5fd..60fb9d2 100644 --- a/scientific-skills/modal/SKILL.md +++ b/scientific-skills/modal/SKILL.md @@ -1,381 +1,400 @@ --- name: modal -description: Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling. -license: Apache-2.0 license +description: Cloud computing platform for running Python on GPUs and serverless infrastructure. Use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud. Use this skill whenever the user mentions Modal, serverless GPU compute, deploying ML models to the cloud, serving inference endpoints, running batch processing in the cloud, or needs to scale Python workloads beyond their local machine. Also use when the user wants to run code on H100s, A100s, or other cloud GPUs, or needs to create a web API for a model. +license: Apache-2.0 metadata: - skill-author: K-Dense Inc. 
+ skill-author: K-Dense Inc. --- # Modal ## Overview -Modal is a serverless platform for running Python code in the cloud with minimal configuration. Execute functions on powerful GPUs, scale automatically to thousands of containers, and pay only for compute used. +Modal is a cloud platform for running Python code serverlessly, with a focus on AI/ML workloads. Key capabilities: +- **GPU compute** on demand (T4, L4, A10, L40S, A100, H100, H200, B200) +- **Serverless functions** with autoscaling from zero to thousands of containers +- **Custom container images** built entirely in Python code +- **Persistent storage** via Volumes for model weights and datasets +- **Web endpoints** for serving models and APIs +- **Scheduled jobs** via cron or fixed intervals +- **Sub-second cold starts** for low-latency inference -Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits. +Everything in Modal is defined as code — no YAML, no Dockerfiles required (though both are supported). 
## When to Use This Skill -Use Modal for: -- Deploying and serving ML models (LLMs, image generation, embedding models) -- Running GPU-accelerated computation (training, inference, rendering) -- Batch processing large datasets in parallel -- Scheduling compute-intensive jobs (daily data processing, model training) -- Building serverless APIs that need automatic scaling -- Scientific computing requiring distributed compute or specialized hardware +Use this skill to: +- Deploy or serve AI/ML models in the cloud +- Run GPU-accelerated computations (training, inference, fine-tuning) +- Create serverless web APIs or endpoints +- Scale batch processing jobs in parallel +- Schedule recurring tasks (data pipelines, retraining, scraping) +- Store model weights or datasets in persistent cloud storage +- Run code in custom container environments +- Build job queues or async task processing systems -## Authentication and Setup +## Installation and Authentication -Modal requires authentication via API token. - -### Initial Setup +### Install ```bash -# Install Modal -uv uv pip install modal - -# Authenticate (opens browser for login) -modal token new +uv pip install modal ``` -This creates a token stored in `~/.modal.toml`. The token authenticates all Modal operations. +### Authenticate -### Verify Setup +```bash +modal setup +``` + +This opens a browser for authentication. For CI/CD or headless environments, set environment variables: + +```bash +export MODAL_TOKEN_ID= +export MODAL_TOKEN_SECRET= +``` + +Generate tokens at https://modal.com/settings + +Modal offers a free tier with $30/month in credits. + +**Reference**: See `references/getting-started.md` for detailed setup and first app walkthrough. + +## Core Concepts + +### App and Functions + +A Modal `App` groups related functions. 
Functions decorated with `@app.function()` run remotely in the cloud: ```python import modal -app = modal.App("test-app") +app = modal.App("my-app") @app.function() -def hello(): - print("Modal is working!") -``` +def square(x): + return x ** 2 -Run with: `modal run script.py` - -## Core Capabilities - -Modal provides serverless Python execution through Functions that run in containers. Define compute requirements, dependencies, and scaling behavior declaratively. - -### 1. Define Container Images - -Specify dependencies and environment for functions using Modal Images. - -```python -import modal - -# Basic image with Python packages -image = ( - modal.Image.debian_slim(python_version="3.12") - .uv_pip_install("torch", "transformers", "numpy") -) - -app = modal.App("ml-app", image=image) -``` - -**Common patterns:** -- Install Python packages: `.uv_pip_install("pandas", "scikit-learn")` -- Install system packages: `.apt_install("ffmpeg", "git")` -- Use existing Docker images: `modal.Image.from_registry("nvidia/cuda:12.1.0-base")` -- Add local code: `.add_local_python_source("my_module")` - -See `references/images.md` for comprehensive image building documentation. - -### 2. Create Functions - -Define functions that run in the cloud with the `@app.function()` decorator. - -```python -@app.function() -def process_data(file_path: str): - import pandas as pd - df = pd.read_csv(file_path) - return df.describe() -``` - -**Call functions:** -```python -# From local entrypoint @app.local_entrypoint() def main(): - result = process_data.remote("data.csv") - print(result) + # .remote() runs in the cloud + print(square.remote(42)) ``` -Run with: `modal run script.py` +Run with `modal run script.py`. Deploy with `modal deploy script.py`. -See `references/functions.md` for function patterns, deployment, and parameter handling. +**Reference**: See `references/functions.md` for lifecycle hooks, classes, `.map()`, `.spawn()`, and more. -### 3. 
Request GPUs +### Container Images -Attach GPUs to functions for accelerated computation. +Modal builds container images from Python code. The recommended package installer is `uv`: + +```python +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("torch==2.8.0", "transformers", "accelerate") + .apt_install("git") +) + +@app.function(image=image) +def inference(prompt): + from transformers import pipeline + pipe = pipeline("text-generation", model="meta-llama/Llama-3-8B") + return pipe(prompt) +``` + +Key image methods: +- `.uv_pip_install()` — Install Python packages with uv (recommended) +- `.pip_install()` — Install with pip (fallback) +- `.apt_install()` — Install system packages +- `.run_commands()` — Run shell commands during build +- `.run_function()` — Run Python during build (e.g., download model weights) +- `.add_local_python_source()` — Add local modules +- `.env()` — Set environment variables + +**Reference**: See `references/images.md` for Dockerfiles, micromamba, caching, GPU build steps. + +### GPU Compute + +Request GPUs via the `gpu` parameter: ```python @app.function(gpu="H100") def train_model(): import torch - assert torch.cuda.is_available() - # GPU-accelerated code here + device = torch.device("cuda") + # GPU training code here + +# Multiple GPUs +@app.function(gpu="H100:4") +def distributed_training(): + ... + +# GPU fallback chain +@app.function(gpu=["H100", "A100-80GB", "A100-40GB"]) +def flexible_inference(): + ... 
``` -**Available GPU types:** -- `T4`, `L4` - Cost-effective inference -- `A10`, `A100`, `A100-80GB` - Standard training/inference -- `L40S` - Excellent cost/performance balance (48GB) -- `H100`, `H200` - High-performance training -- `B200` - Flagship performance (most powerful) +Available GPUs: T4, L4, A10, L40S, A100-40GB, A100-80GB, H100, H200, B200, B200+ + +- Up to 8 GPUs per container (except A10: up to 4) +- L40S is recommended for inference (cost/performance balance, 48 GB VRAM) +- H100/A100 can be auto-upgraded to H200/A100-80GB at no extra cost +- Use `gpu="H100!"` to prevent auto-upgrade + +**Reference**: See `references/gpu.md` for GPU selection guidance and multi-GPU training. + +### Volumes (Persistent Storage) + +Volumes provide distributed, persistent file storage: -**Request multiple GPUs:** ```python -@app.function(gpu="H100:8") # 8x H100 GPUs -def train_large_model(): - pass +vol = modal.Volume.from_name("model-weights", create_if_missing=True) + +@app.function(volumes={"/data": vol}) +def save_model(): + # Write to the mounted path + with open("/data/model.pt", "wb") as f: + torch.save(model.state_dict(), f) + +@app.function(volumes={"/data": vol}) +def load_model(): + model.load_state_dict(torch.load("/data/model.pt")) ``` -See `references/gpu.md` for GPU selection guidance, CUDA setup, and multi-GPU configuration. +- Optimized for write-once, read-many workloads (model weights, datasets) +- CLI access: `modal volume ls`, `modal volume put`, `modal volume get` +- Background auto-commits every few seconds -### 4. Configure Resources +**Reference**: See `references/volumes.md` for v2 volumes, concurrent writes, and best practices. -Request CPU cores, memory, and disk for functions. 
+### Secrets + +Securely pass credentials to functions: ```python -@app.function( - cpu=8.0, # 8 physical cores - memory=32768, # 32 GiB RAM - ephemeral_disk=10240 # 10 GiB disk -) -def memory_intensive_task(): - pass -``` - -Default allocation: 0.125 CPU cores, 128 MiB memory. Billing based on reservation or actual usage, whichever is higher. - -See `references/resources.md` for resource limits and billing details. - -### 5. Scale Automatically - -Modal autoscales functions from zero to thousands of containers based on demand. - -**Process inputs in parallel:** -```python -@app.function() -def analyze_sample(sample_id: int): - # Process single sample - return result - -@app.local_entrypoint() -def main(): - sample_ids = range(1000) - # Automatically parallelized across containers - results = list(analyze_sample.map(sample_ids)) -``` - -**Configure autoscaling:** -```python -@app.function( - max_containers=100, # Upper limit - min_containers=2, # Keep warm - buffer_containers=5 # Idle buffer for bursts -) -def inference(): - pass -``` - -See `references/scaling.md` for autoscaling configuration, concurrency, and scaling limits. - -### 6. Store Data Persistently - -Use Volumes for persistent storage across function invocations. - -```python -volume = modal.Volume.from_name("my-data", create_if_missing=True) - -@app.function(volumes={"/data": volume}) -def save_results(data): - with open("/data/results.txt", "w") as f: - f.write(data) - volume.commit() # Persist changes -``` - -Volumes persist data between runs, store model weights, cache datasets, and share data between functions. - -See `references/volumes.md` for volume management, commits, and caching patterns. - -### 7. Manage Secrets - -Store API keys and credentials securely using Modal Secrets. 
- -```python -@app.function(secrets=[modal.Secret.from_name("huggingface")]) -def download_model(): +@app.function(secrets=[modal.Secret.from_name("my-api-keys")]) +def call_api(): import os - token = os.environ["HF_TOKEN"] - # Use token for authentication + api_key = os.environ["API_KEY"] + # Use the key ``` -**Create secrets in Modal dashboard or via CLI:** -```bash -modal secret create my-secret KEY=value API_TOKEN=xyz -``` +Create secrets via CLI: `modal secret create my-api-keys API_KEY=sk-xxx` -See `references/secrets.md` for secret management and authentication patterns. +Or from a `.env` file: `modal.Secret.from_dotenv()` -### 8. Deploy Web Endpoints +**Reference**: See `references/secrets.md` for dashboard setup, multiple secrets, and templates. -Serve HTTP endpoints, APIs, and webhooks with `@modal.web_endpoint()`. +### Web Endpoints + +Serve models and APIs as web endpoints: ```python @app.function() -@modal.web_endpoint(method="POST") -def predict(data: dict): - # Process request - result = model.predict(data["input"]) - return {"prediction": result} +@modal.fastapi_endpoint() +def predict(text: str): + return {"result": model.predict(text)} ``` -**Deploy with:** -```bash -modal deploy script.py -``` +- `modal serve script.py` — Development with hot reload and temporary URL +- `modal deploy script.py` — Production deployment with permanent URL +- Supports FastAPI, ASGI (Starlette, FastHTML), WSGI (Flask, Django), WebSockets +- Request bodies up to 4 GiB, unlimited response size -Modal provides HTTPS URL for the endpoint. +**Reference**: See `references/web-endpoints.md` for ASGI/WSGI apps, streaming, auth, and WebSockets. -See `references/web-endpoints.md` for FastAPI integration, streaming, authentication, and WebSocket support. +### Scheduled Jobs -### 9. Schedule Jobs - -Run functions on a schedule with cron expressions. 
+Run functions on a schedule: ```python -@app.function(schedule=modal.Cron("0 2 * * *")) # Daily at 2 AM -def daily_backup(): - # Backup data - pass +@app.function(schedule=modal.Cron("0 9 * * *")) # Daily at 9 AM UTC +def daily_pipeline(): + # ETL, retraining, scraping, etc. + ... -@app.function(schedule=modal.Period(hours=4)) # Every 4 hours -def refresh_cache(): - # Update cache - pass +@app.function(schedule=modal.Period(hours=6)) +def periodic_check(): + ... ``` -Scheduled functions run automatically without manual invocation. +Deploy with `modal deploy script.py` to activate the schedule. -See `references/scheduled-jobs.md` for cron syntax, timezone configuration, and monitoring. +- `modal.Cron("...")` — Standard cron syntax, stable across deploys +- `modal.Period(hours=N)` — Fixed interval, resets on redeploy +- Monitor runs in the Modal dashboard -## Common Workflows +**Reference**: See `references/scheduled-jobs.md` for cron syntax and management. -### Deploy ML Model for Inference +### Scaling and Concurrency + +Modal autoscales containers automatically. Configure limits: + +```python +@app.function( + max_containers=100, # Upper limit + min_containers=2, # Keep warm for low latency + buffer_containers=5, # Reserve capacity + scaledown_window=300, # Idle seconds before shutdown +) +def process(data): + ... +``` + +Process inputs in parallel with `.map()`: + +```python +results = list(process.map([item1, item2, item3, ...])) +``` + +Enable concurrent request handling per container: + +```python +@app.function() +@modal.concurrent(max_inputs=10) +async def handle_request(req): + ... +``` + +**Reference**: See `references/scaling.md` for `.map()`, `.starmap()`, `.spawn()`, and limits. + +### Resource Configuration + +```python +@app.function( + cpu=4.0, # Physical cores (not vCPUs) + memory=16384, # MiB + ephemeral_disk=51200, # MiB (up to 3 TiB) + timeout=3600, # Seconds +) +def heavy_computation(): + ... +``` + +Defaults: 0.125 CPU cores, 128 MiB memory. 
Billed on max(request, usage). + +**Reference**: See `references/resources.md` for limits and billing details. + +## Classes with Lifecycle Hooks + +For stateful workloads (e.g., loading a model once and serving many requests): + +```python +@app.cls(gpu="L40S", image=image) +class Predictor: + @modal.enter() + def load_model(self): + self.model = load_heavy_model() # Runs once on container start + + @modal.method() + def predict(self, text: str): + return self.model(text) + + @modal.exit() + def cleanup(self): + ... # Runs on container shutdown +``` + +Call with: `Predictor().predict.remote("hello")` + +## Common Workflow Patterns + +### GPU Model Inference Service ```python import modal -# Define dependencies -image = modal.Image.debian_slim().uv_pip_install("torch", "transformers") -app = modal.App("llm-inference", image=image) +app = modal.App("llm-service") -# Download model at build time -@app.function() -def download_model(): - from transformers import AutoModel - AutoModel.from_pretrained("bert-base-uncased") +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("vllm") +) -# Serve model -@app.cls(gpu="L40S") -class Model: +@app.cls(gpu="H100", image=image, min_containers=1) +class LLMService: @modal.enter() - def load_model(self): - from transformers import pipeline - self.pipe = pipeline("text-classification", device="cuda") + def load(self): + from vllm import LLM + self.llm = LLM(model="meta-llama/Llama-3-70B") @modal.method() - def predict(self, text: str): - return self.pipe(text) - -@app.local_entrypoint() -def main(): - model = Model() - result = model.predict.remote("Modal is great!") - print(result) + @modal.fastapi_endpoint(method="POST") + def generate(self, prompt: str, max_tokens: int = 256): + outputs = self.llm.generate([prompt], max_tokens=max_tokens) + return {"text": outputs[0].outputs[0].text} ``` -### Batch Process Large Dataset +### Batch Processing Pipeline ```python -@app.function(cpu=2.0, memory=4096) -def 
process_file(file_path: str): +app = modal.App("batch-pipeline") +vol = modal.Volume.from_name("pipeline-data", create_if_missing=True) + +@app.function(volumes={"/data": vol}, cpu=4.0, memory=8192) +def process_chunk(chunk_id: int): import pandas as pd - df = pd.read_csv(file_path) - # Process data - return df.shape[0] + df = pd.read_parquet(f"/data/input/chunk_{chunk_id}.parquet") + result = heavy_transform(df) + result.to_parquet(f"/data/output/chunk_{chunk_id}.parquet") + return len(result) @app.local_entrypoint() def main(): - files = ["file1.csv", "file2.csv", ...] # 1000s of files - # Automatically parallelized across containers - for count in process_file.map(files): - print(f"Processed {count} rows") + chunk_ids = list(range(100)) + results = list(process_chunk.map(chunk_ids)) + print(f"Processed {sum(results)} total rows") ``` -### Train Model on GPU +### Scheduled Data Pipeline ```python +app = modal.App("etl-pipeline") + @app.function( - gpu="A100:2", # 2x A100 GPUs - timeout=3600 # 1 hour timeout + schedule=modal.Cron("0 */6 * * *"), # Every 6 hours + secrets=[modal.Secret.from_name("db-credentials")], ) -def train_model(config: dict): - import torch - # Multi-GPU training code - model = create_model(config) - train(model) - return metrics +def etl_job(): + import os + db_url = os.environ["DATABASE_URL"] + # Extract, transform, load + ... 
``` -## Reference Documentation +## CLI Reference -Detailed documentation for specific features: +| Command | Description | +|---------|-------------| +| `modal setup` | Authenticate with Modal | +| `modal run script.py` | Run a script's local entrypoint | +| `modal serve script.py` | Dev server with hot reload | +| `modal deploy script.py` | Deploy to production | +| `modal volume ls ` | List files in a volume | +| `modal volume put ` | Upload file to volume | +| `modal volume get ` | Download file from volume | +| `modal secret create K=V` | Create a secret | +| `modal secret list` | List secrets | +| `modal app list` | List deployed apps | +| `modal app stop ` | Stop a deployed app | -- **`references/getting-started.md`** - Authentication, setup, basic concepts -- **`references/images.md`** - Image building, dependencies, Dockerfiles -- **`references/functions.md`** - Function patterns, deployment, parameters -- **`references/gpu.md`** - GPU types, CUDA, multi-GPU configuration -- **`references/resources.md`** - CPU, memory, disk management -- **`references/scaling.md`** - Autoscaling, parallel execution, concurrency -- **`references/volumes.md`** - Persistent storage, data management -- **`references/secrets.md`** - Environment variables, authentication -- **`references/web-endpoints.md`** - APIs, webhooks, endpoints -- **`references/scheduled-jobs.md`** - Cron jobs, periodic tasks -- **`references/examples.md`** - Common patterns for scientific computing +## Reference Files -## Best Practices +Detailed documentation for each topic: -1. **Pin dependencies** in `.uv_pip_install()` for reproducible builds -2. **Use appropriate GPU types** - L40S for inference, H100/A100 for training -3. **Leverage caching** - Use Volumes for model weights and datasets -4. **Configure autoscaling** - Set `max_containers` and `min_containers` based on workload -5. **Import packages in function body** if not available locally -6. 
**Use `.map()` for parallel processing** instead of sequential loops -7. **Store secrets securely** - Never hardcode API keys -8. **Monitor costs** - Check Modal dashboard for usage and billing - -## Troubleshooting - -**"Module not found" errors:** -- Add packages to image with `.uv_pip_install("package-name")` -- Import packages inside function body if not available locally - -**GPU not detected:** -- Verify GPU specification: `@app.function(gpu="A100")` -- Check CUDA availability: `torch.cuda.is_available()` - -**Function timeout:** -- Increase timeout: `@app.function(timeout=3600)` -- Default timeout is 5 minutes - -**Volume changes not persisting:** -- Call `volume.commit()` after writing files -- Verify volume mounted correctly in function decorator - -For additional help, see Modal documentation at https://modal.com/docs or join Modal Slack community. +- `references/getting-started.md` — Installation, authentication, first app +- `references/functions.md` — Functions, classes, lifecycle hooks, remote execution +- `references/images.md` — Container images, package installation, caching +- `references/gpu.md` — GPU types, selection, multi-GPU, training +- `references/volumes.md` — Persistent storage, file management, v2 volumes +- `references/secrets.md` — Credentials, environment variables, dotenv +- `references/web-endpoints.md` — FastAPI, ASGI/WSGI, streaming, auth, WebSockets +- `references/scheduled-jobs.md` — Cron, periodic schedules, management +- `references/scaling.md` — Autoscaling, concurrency, .map(), limits +- `references/resources.md` — CPU, memory, disk, timeout configuration +- `references/examples.md` — Common use cases and patterns +- `references/api_reference.md` — Key API classes and methods +Read these files when detailed information is needed beyond this overview. 
diff --git a/scientific-skills/modal/references/api_reference.md b/scientific-skills/modal/references/api_reference.md index b04e194..668e2c8 100644 --- a/scientific-skills/modal/references/api_reference.md +++ b/scientific-skills/modal/references/api_reference.md @@ -1,34 +1,187 @@ -# Reference Documentation for Modal +# Modal API Reference -This is a placeholder for detailed reference documentation. -Replace with actual reference content or delete if not needed. +## Core Classes -Example real reference docs from other skills: -- product-management/references/communication.md - Comprehensive guide for status updates -- product-management/references/context_building.md - Deep-dive on gathering context -- bigquery/references/ - API references and query examples +### modal.App -## When Reference Docs Are Useful +The main unit of deployment. Groups related functions. -Reference docs are ideal for: -- Comprehensive API documentation -- Detailed workflow guides -- Complex multi-step processes -- Information too lengthy for main SKILL.md -- Content that's only needed for specific use cases +```python +app = modal.App("my-app") +``` -## Structure Suggestions +| Method | Description | +|--------|-------------| +| `app.function(**kwargs)` | Decorator to register a function | +| `app.cls(**kwargs)` | Decorator to register a class | +| `app.local_entrypoint()` | Decorator for local entry point | -### API Reference Example -- Overview -- Authentication -- Endpoints with examples -- Error codes -- Rate limits +### modal.Function -### Workflow Guide Example -- Prerequisites -- Step-by-step instructions -- Common patterns -- Troubleshooting -- Best practices +A serverless function backed by an autoscaling container pool. 
+ +| Method | Description | +|--------|-------------| +| `.remote(*args)` | Execute in the cloud (sync) | +| `.local(*args)` | Execute locally | +| `.spawn(*args)` | Execute async, returns `FunctionCall` | +| `.map(inputs)` | Parallel execution over inputs | +| `.starmap(inputs)` | Parallel execution with multiple args | +| `.from_name(app, fn)` | Reference a deployed function | +| `.update_autoscaler(**kwargs)` | Dynamic scaling update | + +### modal.Cls + +A serverless class with lifecycle hooks. + +```python +@app.cls(gpu="L40S") +class MyClass: + @modal.enter() + def setup(self): ... + + @modal.method() + def run(self, data): ... + + @modal.exit() + def cleanup(self): ... +``` + +| Decorator | Description | +|-----------|-------------| +| `@modal.enter()` | Container startup hook | +| `@modal.exit()` | Container shutdown hook | +| `@modal.method()` | Expose as callable method | +| `@modal.parameter()` | Class-level parameter | + +## Image + +### modal.Image + +Defines the container environment. + +| Method | Description | +|--------|-------------| +| `.debian_slim(python_version=)` | Debian base image | +| `.from_registry(tag)` | Docker Hub image | +| `.from_dockerfile(path)` | Build from Dockerfile | +| `.micromamba(python_version=)` | Conda/mamba base | +| `.uv_pip_install(*pkgs)` | Install with uv (recommended) | +| `.pip_install(*pkgs)` | Install with pip | +| `.pip_install_from_requirements(path)` | Install from file | +| `.apt_install(*pkgs)` | Install system packages | +| `.run_commands(*cmds)` | Run shell commands | +| `.run_function(fn)` | Run Python during build | +| `.add_local_dir(local, remote)` | Add directory | +| `.add_local_file(local, remote)` | Add single file | +| `.add_local_python_source(module)` | Add Python module | +| `.env(dict)` | Set environment variables | +| `.imports()` | Context manager for remote imports | + +## Storage + +### modal.Volume + +Distributed persistent file storage. 
+ +```python +vol = modal.Volume.from_name("name", create_if_missing=True) +``` + +| Method | Description | +|--------|-------------| +| `.from_name(name)` | Reference or create a volume | +| `.commit()` | Force immediate commit | +| `.reload()` | Refresh to see other containers' writes | + +Mount: `@app.function(volumes={"/path": vol})` + +### modal.NetworkFileSystem + +Legacy shared storage (superseded by Volume). + +## Secrets + +### modal.Secret + +Secure credential injection. + +| Method | Description | +|--------|-------------| +| `.from_name(name)` | Reference a named secret | +| `.from_dict(dict)` | Create inline (dev only) | +| `.from_dotenv()` | Load from .env file | + +Usage: `@app.function(secrets=[modal.Secret.from_name("x")])` + +Access in function: `os.environ["KEY"]` + +## Scheduling + +### modal.Cron + +```python +schedule = modal.Cron("0 9 * * *") # Cron syntax +``` + +### modal.Period + +```python +schedule = modal.Period(hours=6) # Fixed interval +``` + +Usage: `@app.function(schedule=modal.Cron("..."))` + +## Web + +### Decorators + +| Decorator | Description | +|-----------|-------------| +| `@modal.fastapi_endpoint()` | Simple FastAPI endpoint | +| `@modal.asgi_app()` | Full ASGI app (FastAPI, Starlette) | +| `@modal.wsgi_app()` | Full WSGI app (Flask, Django) | +| `@modal.web_server(port=)` | Custom web server | + +### Function Modifiers + +| Decorator | Description | +|-----------|-------------| +| `@modal.concurrent(max_inputs=)` | Handle multiple inputs per container | +| `@modal.batched(max_batch_size=, wait_ms=)` | Dynamic input batching | + +## GPU Strings + +| String | GPU | +|--------|-----| +| `"T4"` | NVIDIA T4 16GB | +| `"L4"` | NVIDIA L4 24GB | +| `"A10"` | NVIDIA A10 24GB | +| `"L40S"` | NVIDIA L40S 48GB | +| `"A100-40GB"` | NVIDIA A100 40GB | +| `"A100-80GB"` | NVIDIA A100 80GB | +| `"H100"` | NVIDIA H100 80GB | +| `"H100!"` | H100 (no auto-upgrade) | +| `"H200"` | NVIDIA H200 141GB | +| `"B200"` | NVIDIA B200 192GB | +| 
`"B200+"` | B200 or B300, B200 price | +| `"H100:4"` | 4x H100 | + +## CLI Commands + +| Command | Description | +|---------|-------------| +| `modal setup` | Authenticate | +| `modal run ` | Run local entrypoint | +| `modal serve ` | Dev server with hot reload | +| `modal deploy ` | Production deployment | +| `modal app list` | List deployed apps | +| `modal app stop ` | Stop an app | +| `modal volume create ` | Create volume | +| `modal volume ls ` | List volume files | +| `modal volume put ` | Upload to volume | +| `modal volume get ` | Download from volume | +| `modal secret create K=V` | Create secret | +| `modal secret list` | List secrets | +| `modal secret delete ` | Delete secret | +| `modal token set` | Set auth token | diff --git a/scientific-skills/modal/references/examples.md b/scientific-skills/modal/references/examples.md index 1e38654..f0c47c2 100644 --- a/scientific-skills/modal/references/examples.md +++ b/scientific-skills/modal/references/examples.md @@ -1,433 +1,266 @@ -# Common Patterns for Scientific Computing +# Modal Common Examples -## Machine Learning Model Inference - -### Basic Model Serving +## LLM Inference Service (vLLM) ```python import modal -app = modal.App("ml-inference") +app = modal.App("vllm-service") image = ( - modal.Image.debian_slim() - .uv_pip_install("torch", "transformers") + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("vllm>=0.6.0") ) -@app.cls( - image=image, - gpu="L40S", -) -class Model: +@app.cls(gpu="H100", image=image, min_containers=1) +class LLMService: @modal.enter() - def load_model(self): - from transformers import AutoModel, AutoTokenizer - self.model = AutoModel.from_pretrained("bert-base-uncased") - self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased") + def load(self): + from vllm import LLM + self.llm = LLM(model="meta-llama/Llama-3-70B-Instruct") @modal.method() - def predict(self, text: str): - inputs = self.tokenizer(text, return_tensors="pt") - outputs = 
self.model(**inputs) - return outputs.last_hidden_state.mean(dim=1).tolist() + def generate(self, prompt: str, max_tokens: int = 512) -> str: + from vllm import SamplingParams + params = SamplingParams(max_tokens=max_tokens, temperature=0.7) + outputs = self.llm.generate([prompt], params) + return outputs[0].outputs[0].text -@app.local_entrypoint() -def main(): - model = Model() - result = model.predict.remote("Hello world") - print(result) + @modal.fastapi_endpoint(method="POST") + def api(self, request: dict): + text = self.generate(request["prompt"], request.get("max_tokens", 512)) + return {"text": text} ``` -### Model Serving with Volume +## Image Generation (Flux) ```python -volume = modal.Volume.from_name("models", create_if_missing=True) -MODEL_PATH = "/models" +import modal -@app.cls( - image=image, - gpu="A100", - volumes={MODEL_PATH: volume} +app = modal.App("image-gen") + +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("diffusers", "torch", "transformers", "accelerate") ) -class ModelServer: + +vol = modal.Volume.from_name("flux-weights", create_if_missing=True) + +@app.cls(gpu="L40S", image=image, volumes={"/models": vol}) +class ImageGenerator: @modal.enter() def load(self): import torch - self.model = torch.load(f"{MODEL_PATH}/model.pt") - self.model.eval() + from diffusers import FluxPipeline + self.pipe = FluxPipeline.from_pretrained( + "black-forest-labs/FLUX.1-schnell", + torch_dtype=torch.bfloat16, + cache_dir="/models", + ).to("cuda") @modal.method() - def infer(self, data): - import torch - with torch.no_grad(): - return self.model(torch.tensor(data)).tolist() + def generate(self, prompt: str) -> bytes: + image = self.pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0] + import io + buf = io.BytesIO() + image.save(buf, format="PNG") + return buf.getvalue() ``` -## Batch Processing - -### Parallel Data Processing +## Speech Transcription (Whisper) ```python -@app.function( - 
image=modal.Image.debian_slim().uv_pip_install("pandas", "numpy"), - cpu=2.0, - memory=8192 +import modal + +app = modal.App("transcription") + +image = ( + modal.Image.debian_slim(python_version="3.11") + .apt_install("ffmpeg") + .uv_pip_install("openai-whisper", "torch") ) -def process_batch(batch_id: int): - import pandas as pd - # Load batch - df = pd.read_csv(f"s3://bucket/batch_{batch_id}.csv") - - # Process - result = df.apply(lambda row: complex_calculation(row), axis=1) - - # Save result - result.to_csv(f"s3://bucket/results_{batch_id}.csv") - - return batch_id - -@app.local_entrypoint() -def main(): - # Process 100 batches in parallel - results = list(process_batch.map(range(100))) - print(f"Processed {len(results)} batches") -``` - -### Batch Processing with Progress - -```python -@app.function() -def process_item(item_id: int): - # Expensive processing - result = compute_something(item_id) - return result - -@app.local_entrypoint() -def main(): - items = list(range(1000)) - - print(f"Processing {len(items)} items...") - results = [] - for i, result in enumerate(process_item.map(items)): - results.append(result) - if (i + 1) % 100 == 0: - print(f"Completed {i + 1}/{len(items)}") - - print("All items processed!") -``` - -## Data Analysis Pipeline - -### ETL Pipeline - -```python -volume = modal.Volume.from_name("data-pipeline") -DATA_PATH = "/data" - -@app.function( - image=modal.Image.debian_slim().uv_pip_install("pandas", "polars"), - volumes={DATA_PATH: volume}, - cpu=4.0, - memory=16384 -) -def extract_transform_load(): - import polars as pl - - # Extract - raw_data = pl.read_csv(f"{DATA_PATH}/raw/*.csv") - - # Transform - transformed = ( - raw_data - .filter(pl.col("value") > 0) - .group_by("category") - .agg([ - pl.col("value").mean().alias("avg_value"), - pl.col("value").sum().alias("total_value") - ]) - ) - - # Load - transformed.write_parquet(f"{DATA_PATH}/processed/data.parquet") - volume.commit() - - return transformed.shape - 
-@app.function(schedule=modal.Cron("0 2 * * *")) -def daily_pipeline(): - result = extract_transform_load.remote() - print(f"Processed data shape: {result}") -``` - -## GPU-Accelerated Computing - -### Distributed Training - -```python -@app.function( - gpu="A100:2", - image=modal.Image.debian_slim().uv_pip_install("torch", "accelerate"), - timeout=7200, -) -def train_model(): - import torch - from torch.nn.parallel import DataParallel - - # Load data - train_loader = get_data_loader() - - # Initialize model - model = MyModel() - model = DataParallel(model) - model = model.cuda() - - # Train - optimizer = torch.optim.Adam(model.parameters()) - for epoch in range(10): - for batch in train_loader: - loss = train_step(model, batch, optimizer) - print(f"Epoch {epoch}, Loss: {loss}") - - return "Training complete" -``` - -### GPU Batch Inference - -```python -@app.function( - gpu="L40S", - image=modal.Image.debian_slim().uv_pip_install("torch", "transformers") -) -def batch_inference(texts: list[str]): - from transformers import pipeline - - classifier = pipeline("sentiment-analysis", device=0) - results = classifier(texts, batch_size=32) - - return results - -@app.local_entrypoint() -def main(): - # Process 10,000 texts - texts = load_texts() - - # Split into chunks of 100 - chunks = [texts[i:i+100] for i in range(0, len(texts), 100)] - - # Process in parallel on multiple GPUs - all_results = [] - for results in batch_inference.map(chunks): - all_results.extend(results) - - print(f"Processed {len(all_results)} texts") -``` - -## Scientific Computing - -### Molecular Dynamics Simulation - -```python -@app.function( - image=modal.Image.debian_slim().apt_install("openmpi-bin").uv_pip_install("mpi4py", "numpy"), - cpu=16.0, - memory=65536, - timeout=7200, -) -def run_simulation(config: dict): - import numpy as np - - # Initialize system - positions = initialize_positions(config["n_particles"]) - velocities = initialize_velocities(config["temperature"]) - - # Run MD steps - 
for step in range(config["n_steps"]): - forces = compute_forces(positions) - velocities += forces * config["dt"] - positions += velocities * config["dt"] - - if step % 1000 == 0: - energy = compute_energy(positions, velocities) - print(f"Step {step}, Energy: {energy}") - - return positions, velocities -``` - -### Distributed Monte Carlo - -```python -@app.function(cpu=2.0) -def monte_carlo_trial(trial_id: int, n_samples: int): - import random - - count = sum(1 for _ in range(n_samples) - if random.random()**2 + random.random()**2 <= 1) - - return count - -@app.local_entrypoint() -def estimate_pi(): - n_trials = 100 - n_samples_per_trial = 1_000_000 - - # Run trials in parallel - results = list(monte_carlo_trial.map( - range(n_trials), - [n_samples_per_trial] * n_trials - )) - - total_count = sum(results) - total_samples = n_trials * n_samples_per_trial - - pi_estimate = 4 * total_count / total_samples - print(f"Estimated π = {pi_estimate}") -``` - -## Data Processing with Volumes - -### Image Processing Pipeline - -```python -volume = modal.Volume.from_name("images") -IMAGE_PATH = "/images" - -@app.function( - image=modal.Image.debian_slim().uv_pip_install("Pillow", "numpy"), - volumes={IMAGE_PATH: volume} -) -def process_image(filename: str): - from PIL import Image - import numpy as np - - # Load image - img = Image.open(f"{IMAGE_PATH}/raw/{filename}") - - # Process - img_array = np.array(img) - processed = apply_filters(img_array) - - # Save - result_img = Image.fromarray(processed) - result_img.save(f"{IMAGE_PATH}/processed/{filename}") - - return filename - -@app.function(volumes={IMAGE_PATH: volume}) -def process_all_images(): - import os - - # Get all images - filenames = os.listdir(f"{IMAGE_PATH}/raw") - - # Process in parallel - results = list(process_image.map(filenames)) - - volume.commit() - return f"Processed {len(results)} images" -``` - -## Web API for Scientific Computing - -```python -image = 
modal.Image.debian_slim().uv_pip_install("fastapi[standard]", "numpy", "scipy") - -@app.function(image=image) -@modal.fastapi_endpoint(method="POST") -def compute_statistics(data: dict): - import numpy as np - from scipy import stats - - values = np.array(data["values"]) - - return { - "mean": float(np.mean(values)), - "median": float(np.median(values)), - "std": float(np.std(values)), - "skewness": float(stats.skew(values)), - "kurtosis": float(stats.kurtosis(values)) - } -``` - -## Scheduled Data Collection - -```python -@app.function( - schedule=modal.Cron("*/30 * * * *"), # Every 30 minutes - secrets=[modal.Secret.from_name("api-keys")], - volumes={"/data": modal.Volume.from_name("sensor-data")} -) -def collect_sensor_data(): - import requests - import json - from datetime import datetime - - # Fetch from API - response = requests.get( - "https://api.example.com/sensors", - headers={"Authorization": f"Bearer {os.environ['API_KEY']}"} - ) - - data = response.json() - - # Save with timestamp - timestamp = datetime.now().isoformat() - with open(f"/data/{timestamp}.json", "w") as f: - json.dump(data, f) - - volume.commit() - - return f"Collected {len(data)} sensor readings" -``` - -## Best Practices - -### Use Classes for Stateful Workloads - -```python -@app.cls(gpu="A100") -class ModelService: +@app.cls(gpu="T4", image=image) +class Transcriber: @modal.enter() - def setup(self): - # Load once, reuse across requests - self.model = load_heavy_model() + def load(self): + import whisper + self.model = whisper.load_model("large-v3") @modal.method() - def predict(self, x): - return self.model(x) + def transcribe(self, audio_path: str) -> dict: + return self.model.transcribe(audio_path) ``` -### Batch Similar Workloads +## Batch Data Processing ```python -@app.function() -def process_many(items: list): - # More efficient than processing one at a time - return [process(item) for item in items] +import modal + +app = modal.App("batch-processor") + +image = 
modal.Image.debian_slim().uv_pip_install("pandas", "pyarrow") +vol = modal.Volume.from_name("batch-data", create_if_missing=True) + +@app.function(image=image, volumes={"/data": vol}, cpu=4.0, memory=8192) +def process_chunk(chunk_id: int) -> dict: + import pandas as pd + df = pd.read_parquet(f"/data/input/chunk_{chunk_id:04d}.parquet") + result = df.groupby("category").agg({"value": ["sum", "mean", "count"]}) + result.to_parquet(f"/data/output/result_{chunk_id:04d}.parquet") + return {"chunk_id": chunk_id, "rows": len(df)} + +@app.local_entrypoint() +def main(): + chunk_ids = list(range(500)) + results = list(process_chunk.map(chunk_ids)) + total = sum(r["rows"] for r in results) + print(f"Processed {total} total rows across {len(results)} chunks") ``` -### Use Volumes for Large Datasets +## Web Scraping at Scale ```python -# Store large datasets in volumes, not in image -volume = modal.Volume.from_name("dataset") +import modal -@app.function(volumes={"/data": volume}) -def train(): - data = load_from_volume("/data/training.parquet") - model = train_model(data) +app = modal.App("scraper") + +image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4") + +@app.function(image=image, retries=3, timeout=60) +def scrape_url(url: str) -> dict: + import httpx + from bs4 import BeautifulSoup + response = httpx.get(url, follow_redirects=True, timeout=30) + soup = BeautifulSoup(response.text, "html.parser") + return { + "url": url, + "title": soup.title.string if soup.title else None, + "text": soup.get_text()[:5000], + } + +@app.local_entrypoint() +def main(): + urls = ["https://example.com", "https://example.org"] # Your URL list + results = list(scrape_url.map(urls)) + for r in results: + print(f"{r['url']}: {r['title']}") ``` -### Profile Before Scaling to GPUs +## Protein Structure Prediction ```python -# Test on CPU first -@app.function(cpu=4.0) -def test_pipeline(): - ... 
+import modal -# Then scale to GPU if needed -@app.function(gpu="A100") -def gpu_pipeline(): - ... +app = modal.App("protein-folding") + +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("chai-lab") +) + +vol = modal.Volume.from_name("protein-data", create_if_missing=True) + +@app.function(gpu="A100-80GB", image=image, volumes={"/data": vol}, timeout=3600) +def fold_protein(sequence: str) -> str: + from chai_lab.chai1 import run_inference + output = run_inference( + fasta_file=write_fasta(sequence, "/data/input.fasta"), + output_dir="/data/output/", + ) + return str(output) +``` + +## Scheduled ETL Pipeline + +```python +import modal + +app = modal.App("etl") + +image = modal.Image.debian_slim().uv_pip_install("pandas", "sqlalchemy", "psycopg2-binary") + +@app.function( + image=image, + schedule=modal.Cron("0 3 * * *"), # 3 AM UTC daily + secrets=[modal.Secret.from_name("database-creds")], + timeout=7200, +) +def daily_etl(): + import os + import pandas as pd + from sqlalchemy import create_engine + + source = create_engine(os.environ["SOURCE_DB"]) + dest = create_engine(os.environ["DEST_DB"]) + + df = pd.read_sql("SELECT * FROM events WHERE date = CURRENT_DATE - 1", source) + df = transform(df) + df.to_sql("daily_summary", dest, if_exists="append", index=False) + print(f"Loaded {len(df)} rows") +``` + +## FastAPI with GPU Model + +```python +import modal + +app = modal.App("api-with-gpu") + +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("fastapi", "sentence-transformers", "torch") +) + +@app.cls(gpu="L40S", image=image, min_containers=1) +class EmbeddingService: + @modal.enter() + def load(self): + from sentence_transformers import SentenceTransformer + self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda") + + @modal.asgi_app() + def serve(self): + from fastapi import FastAPI + api = FastAPI() + + @api.post("/embed") + async def embed(request: dict): + embeddings = 
self.model.encode(request["texts"]) + return {"embeddings": embeddings.tolist()} + + @api.get("/health") + async def health(): + return {"status": "ok"} + + return api +``` + +## Document OCR Job Queue + +```python +import modal + +app = modal.App("ocr-queue") + +image = modal.Image.debian_slim().uv_pip_install("pytesseract", "Pillow").apt_install("tesseract-ocr") +vol = modal.Volume.from_name("ocr-data", create_if_missing=True) + +@app.function(image=image, volumes={"/data": vol}) +def ocr_page(image_path: str) -> str: + import pytesseract + from PIL import Image + img = Image.open(image_path) + return pytesseract.image_to_string(img) + +@app.function(volumes={"/data": vol}) +def process_document(doc_id: str): + import os + pages = sorted(os.listdir(f"/data/docs/{doc_id}/")) + paths = [f"/data/docs/{doc_id}/{p}" for p in pages] + texts = list(ocr_page.map(paths)) + full_text = "\n\n".join(texts) + with open(f"/data/results/{doc_id}.txt", "w") as f: + f.write(full_text) + return {"doc_id": doc_id, "pages": len(texts)} ``` diff --git a/scientific-skills/modal/references/functions.md b/scientific-skills/modal/references/functions.md index 5e64a9f..bd9a364 100644 --- a/scientific-skills/modal/references/functions.md +++ b/scientific-skills/modal/references/functions.md @@ -1,274 +1,260 @@ -# Modal Functions +# Modal Functions and Classes -## Basic Function Definition +## Table of Contents -Decorate Python functions with `@app.function()`: +- [Functions](#functions) +- [Remote Execution](#remote-execution) +- [Classes with Lifecycle Hooks](#classes-with-lifecycle-hooks) +- [Parallel Execution](#parallel-execution) +- [Async Functions](#async-functions) +- [Local Entrypoints](#local-entrypoints) +- [Generators](#generators) + +## Functions + +### Basic Function ```python import modal -app = modal.App(name="my-app") +app = modal.App("my-app") @app.function() -def my_function(): - print("Hello from Modal!") - return "result" +def compute(x: int, y: int) -> int: + return x 
+ y ``` -## Calling Functions +### Function Parameters -### Remote Execution +The `@app.function()` decorator accepts: -Call `.remote()` to run on Modal: +| Parameter | Type | Description | +|-----------|------|-------------| +| `image` | `Image` | Container image | +| `gpu` | `str` | GPU type (e.g., `"H100"`, `"A100:2"`) | +| `cpu` | `float` | CPU cores | +| `memory` | `int` | Memory in MiB | +| `timeout` | `int` | Max execution time in seconds | +| `secrets` | `list[Secret]` | Secrets to inject | +| `volumes` | `dict[str, Volume]` | Volumes to mount | +| `schedule` | `Schedule` | Cron or periodic schedule | +| `max_containers` | `int` | Max container count | +| `min_containers` | `int` | Minimum warm containers | +| `retries` | `int` | Retry count on failure | +| `concurrency_limit` | `int` | Max concurrent inputs | +| `ephemeral_disk` | `int` | Disk in MiB | + +## Remote Execution + +### `.remote()` — Synchronous Call ```python -@app.local_entrypoint() -def main(): - result = my_function.remote() - print(result) +result = compute.remote(3, 4) # Runs in the cloud, blocks until done ``` -### Local Execution - -Call `.local()` to run locally (useful for testing): +### `.local()` — Local Execution ```python -result = my_function.local() +result = compute.local(3, 4) # Runs locally (for testing) ``` -## Function Parameters - -Functions accept standard Python arguments: +### `.spawn()` — Async Fire-and-Forget ```python -@app.function() -def process(x: int, y: str): - return f"{y}: {x * 2}" - -@app.local_entrypoint() -def main(): - result = process.remote(42, "answer") +call = compute.spawn(3, 4) # Returns immediately +# ... do other work ... +result = call.get() # Retrieve result later ``` -## Deployment +`.spawn()` supports up to 1 million pending inputs. 
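The spawn-then-get flow above behaves much like a standard future. As a rough local analogy only (this uses Python's `concurrent.futures`, not Modal's API, and `compute` below is a plain local stand-in for the Modal function), a sketch might look like:

```python
from concurrent.futures import ThreadPoolExecutor


def compute(x: int, y: int) -> int:
    # Local stand-in for the remote function; an assumption for illustration.
    return x + y


def spawn_then_get() -> int:
    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns a handle immediately, similar in spirit to .spawn()
        call = pool.submit(compute, 3, 4)
        # ... other work could happen here ...
        # result() blocks until the value is ready, similar in spirit to call.get()
        return call.result(timeout=60)


print(spawn_then_get())
```

In the Modal version, `compute.spawn(3, 4)` plays the role of `pool.submit(...)` and `call.get()` the role of `call.result()`.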
-### Ephemeral Apps +## Classes with Lifecycle Hooks -Run temporarily: -```bash -modal run script.py -``` - -### Deployed Apps - -Deploy persistently: -```bash -modal deploy script.py -``` - -Access deployed functions from other code: +Use `@app.cls()` for stateful workloads where you want to load resources once: ```python -f = modal.Function.from_name("my-app", "my_function") -result = f.remote(args) +@app.cls(gpu="L40S", image=image) +class Model: + @modal.enter() + def setup(self): + """Runs once when the container starts.""" + import torch + self.model = torch.load("/weights/model.pt") + self.model.eval() + + @modal.method() + def predict(self, text: str) -> dict: + """Callable remotely.""" + return self.model(text) + + @modal.exit() + def teardown(self): + """Runs when the container shuts down.""" + cleanup_resources() ``` -## Entrypoints +### Lifecycle Decorators -### Local Entrypoint +| Decorator | When It Runs | +|-----------|-------------| +| `@modal.enter()` | Once on container startup, before any inputs | +| `@modal.method()` | For each remote call | +| `@modal.exit()` | On container shutdown | -Code that runs on local machine: +### Calling Class Methods ```python -@app.local_entrypoint() -def main(): - result = my_function.remote() - print(result) +# Create instance and call method +model = Model() +result = model.predict.remote("Hello world") + +# Parallel calls +results = list(model.predict.map(["text1", "text2", "text3"])) ``` -### Remote Entrypoint - -Use `@app.function()` without local_entrypoint - runs entirely on Modal: +### Parameterized Classes ```python -@app.function() -def train_model(): - # All code runs in Modal - ... 
-``` +@app.cls() +class Worker: + model_name: str = modal.parameter() -Invoke with: -```bash -modal run script.py::app.train_model -``` + @modal.enter() + def load(self): + self.model = load_model(self.model_name) -## Argument Parsing + @modal.method() + def run(self, data): + return self.model(data) -Entrypoints with primitive type arguments get automatic CLI parsing: - -```python -@app.local_entrypoint() -def main(foo: int, bar: str): - some_function.remote(foo, bar) -``` - -Run with: -```bash -modal run script.py --foo 1 --bar "hello" -``` - -For custom parsing, accept variable-length arguments: - -```python -import argparse - -@app.function() -def train(*arglist): - parser = argparse.ArgumentParser() - parser.add_argument("--foo", type=int) - args = parser.parse_args(args=arglist) -``` - -## Function Configuration - -Common parameters: - -```python -@app.function( - image=my_image, # Custom environment - gpu="A100", # GPU type - cpu=2.0, # CPU cores - memory=4096, # Memory in MB - timeout=3600, # Timeout in seconds - retries=3, # Number of retries - secrets=[my_secret], # Environment secrets - volumes={"/data": vol}, # Persistent storage -) -def my_function(): - ... 
+# Different model instances autoscale independently +gpt = Worker(model_name="gpt-4") +llama = Worker(model_name="llama-3") ``` ## Parallel Execution -### Map +### `.map()` — Parallel Processing -Run function on multiple inputs in parallel: +Process multiple inputs across containers: ```python @app.function() -def evaluate_model(x): - return x ** 2 +def process(item): + return heavy_computation(item) @app.local_entrypoint() def main(): - inputs = list(range(100)) - for result in evaluate_model.map(inputs): - print(result) + items = list(range(1000)) + results = list(process.map(items)) + print(f"Processed {len(results)} items") ``` -### Starmap +- Results are returned in the same order as inputs +- Modal autoscales containers to handle the workload +- Use `return_exceptions=True` to collect errors instead of raising -For functions with multiple arguments: +### `.starmap()` — Multi-Argument Parallel ```python @app.function() -def add(a, b): - return a + b +def add(x, y): + return x + y -@app.local_entrypoint() -def main(): - results = list(add.starmap([(1, 2), (3, 4)])) - # [3, 7] +results = list(add.starmap([(1, 2), (3, 4), (5, 6)])) +# [3, 7, 11] ``` -### Exception Handling +### `.map()` with `order_outputs=False` + +For faster throughput when order doesn't matter: ```python -results = my_func.map( - range(3), - return_exceptions=True, - wrap_returned_exceptions=False -) -# [0, 1, Exception('error')] +for result in process.map(items, order_outputs=False): + handle(result) # Results arrive as they complete ``` ## Async Functions -Define async functions: +Modal supports async/await natively: ```python @app.function() -async def async_function(x: int): - await asyncio.sleep(1) - return x * 2 - -@app.local_entrypoint() -async def main(): - result = await async_function.remote.aio(42) +async def fetch_data(url: str) -> str: + import httpx + async with httpx.AsyncClient() as client: + response = await client.get(url) + return response.text ``` -## Generator Functions 
+Async functions are especially useful with `@modal.concurrent()` for handling multiple requests per container. -Return iterators for streaming results: +## Local Entrypoints + +The `@app.local_entrypoint()` runs on your machine and orchestrates remote calls: + +```python +@app.local_entrypoint() +def main(): + # This code runs locally + data = load_local_data() + + # These calls run in the cloud + results = list(process.map(data)) + + # Back to local + save_results(results) +``` + +You can also define multiple entrypoints and select by function name: + +```bash +modal run script.py::train +modal run script.py::evaluate +``` + +## Generators + +Functions can yield results as they're produced: ```python @app.function() def generate_data(): - for i in range(10): - yield i + for i in range(100): + yield process(i) @app.local_entrypoint() def main(): - for value in generate_data.remote_gen(): - print(value) + for result in generate_data.remote_gen(): + print(result) ``` -## Spawning Functions +## Retries -Submit functions for background execution: +Configure automatic retries on failure: ```python -@app.function() -def process_job(data): - # Long-running job - return result - -@app.local_entrypoint() -def main(): - # Spawn without waiting - call = process_job.spawn(data) - - # Get result later - result = call.get(timeout=60) +@app.function(retries=3) +def flaky_operation(): + ... ``` -## Programmatic Execution - -Run apps programmatically: +For more control, use `modal.Retries`: ```python -def main(): - with modal.enable_output(): - with app.run(): - result = some_function.remote() +@app.function(retries=modal.Retries(max_retries=3, backoff_coefficient=2.0)) +def api_call(): + ... 
``` -## Specifying Entrypoint +## Timeouts -With multiple functions, specify which to run: +Set maximum execution time: ```python -@app.function() -def f(): - print("Function f") - -@app.function() -def g(): - print("Function g") +@app.function(timeout=3600) # 1 hour +def long_training(): + ... ``` -Run specific function: -```bash -modal run script.py::app.f -modal run script.py::app.g -``` +Default timeout is 300 seconds (5 minutes). Maximum is 86400 seconds (24 hours). diff --git a/scientific-skills/modal/references/getting-started.md b/scientific-skills/modal/references/getting-started.md index 628956d..ce70c24 100644 --- a/scientific-skills/modal/references/getting-started.md +++ b/scientific-skills/modal/references/getting-started.md @@ -1,92 +1,175 @@ -# Getting Started with Modal +# Modal Getting Started Guide -## Sign Up +## Installation -Sign up for free at https://modal.com and get $30/month of credits. +Install Modal using uv (recommended) or pip: + +```bash +# Recommended +uv pip install modal + +# Alternative +pip install modal +``` ## Authentication -Set up authentication using the Modal CLI: +### Interactive Setup ```bash -modal token new +modal setup ``` -This creates credentials in `~/.modal.toml`. Alternatively, set environment variables: -- `MODAL_TOKEN_ID` -- `MODAL_TOKEN_SECRET` +This opens a browser for authentication and stores credentials locally. -## Basic Concepts +### Headless / CI/CD Setup -### Modal is Serverless +For environments without a browser, use token-based authentication: -Modal is a serverless platform - only pay for resources used and spin up containers on demand in seconds. +1. Generate tokens at https://modal.com/settings +2. Set environment variables: -### Core Components +```bash +export MODAL_TOKEN_ID= +export MODAL_TOKEN_SECRET= +``` -**App**: Represents an application running on Modal, grouping one or more Functions for atomic deployment. 
+Or use the CLI: -**Function**: Acts as an independent unit that scales up and down independently. No containers run (and no charges) when there are no live inputs. +```bash +modal token set --token-id --token-secret +``` -**Image**: The environment code runs in - a container snapshot with dependencies installed. +### Free Tier -## First Modal App +Modal provides $30/month in free credits. No credit card required for the free tier. -Create a file `hello_modal.py`: +## Your First App + +### Hello World + +Create a file `hello.py`: ```python import modal -app = modal.App(name="hello-modal") +app = modal.App("hello-world") @app.function() -def hello(): - print("Hello from Modal!") - return "success" +def greet(name: str) -> str: + return f"Hello, {name}! This ran in the cloud." @app.local_entrypoint() def main(): - hello.remote() + result = greet.remote("World") + print(result) ``` -Run with: +Run it: + ```bash -modal run hello_modal.py +modal run hello.py ``` -## Running Apps +What happens: +1. Modal packages your code +2. Creates a container in the cloud +3. Executes `greet()` remotely +4. 
Returns the result to your local machine -### Ephemeral Apps (Development) +### Understanding the Flow -Run temporarily with `modal run`: -```bash -modal run script.py +- `modal.App("name")` — Creates a named application +- `@app.function()` — Marks a function for remote execution +- `@app.local_entrypoint()` — Defines the local entry point (runs on your machine) +- `.remote()` — Calls the function in the cloud +- `.local()` — Calls the function locally (for testing) + +### Running Modes + +| Command | Description | +|---------|-------------| +| `modal run script.py` | Run the `@app.local_entrypoint()` function | +| `modal serve script.py` | Start a dev server with hot reload (for web endpoints) | +| `modal deploy script.py` | Deploy to production (persistent) | + +### A Simple Web Scraper + +```python +import modal + +app = modal.App("web-scraper") + +image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4") + +@app.function(image=image) +def scrape(url: str) -> str: + import httpx + from bs4 import BeautifulSoup + + response = httpx.get(url) + soup = BeautifulSoup(response.text, "html.parser") + return soup.get_text()[:1000] + +@app.local_entrypoint() +def main(): + result = scrape.remote("https://example.com") + print(result) ``` -The app stops when the script exits. Use `--detach` to keep running after client exits. 
+### GPU-Accelerated Inference -### Deployed Apps (Production) +```python +import modal -Deploy persistently with `modal deploy`: -```bash -modal deploy script.py +app = modal.App("gpu-inference") + +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("torch", "transformers", "accelerate") +) + +@app.function(gpu="L40S", image=image) +def generate(prompt: str) -> str: + from transformers import pipeline + pipe = pipeline("text-generation", model="gpt2", device="cuda") + result = pipe(prompt, max_length=100) + return result[0]["generated_text"] + +@app.local_entrypoint() +def main(): + print(generate.remote("The future of AI is")) ``` -View deployed apps at https://modal.com/apps or with: -```bash -modal app list +## Project Structure + +Modal apps are typically single Python files, but can be organized into modules: + +``` +my-project/ +├── app.py # Main app with @app.local_entrypoint() +├── inference.py # Inference functions +├── training.py # Training functions +└── common.py # Shared utilities ``` -Stop deployed apps: -```bash -modal app stop app-name -``` +Use `modal.Image.add_local_python_source()` to include local modules in the container image. 
-## Key Features +## Key Concepts Summary -- **Fast prototyping**: Write Python, run on GPUs in seconds -- **Serverless APIs**: Create web endpoints with a decorator -- **Scheduled jobs**: Run cron jobs in the cloud -- **GPU inference**: Access T4, L4, A10, A100, H100, H200, B200 GPUs -- **Distributed volumes**: Persistent storage for ML models -- **Sandboxes**: Secure containers for untrusted code +| Concept | What It Does | +|---------|-------------| +| `App` | Groups related functions into a deployable unit | +| `Function` | A serverless function backed by autoscaling containers | +| `Image` | Defines the container environment (packages, files) | +| `Volume` | Persistent distributed file storage | +| `Secret` | Secure credential injection | +| `Schedule` | Cron or periodic job scheduling | +| `gpu` | GPU type/count for the function | + +## Next Steps + +- See `functions.md` for advanced function patterns +- See `images.md` for custom container environments +- See `gpu.md` for GPU selection and configuration +- See `web-endpoints.md` for serving APIs diff --git a/scientific-skills/modal/references/gpu.md b/scientific-skills/modal/references/gpu.md index 5f18baa..62d6194 100644 --- a/scientific-skills/modal/references/gpu.md +++ b/scientific-skills/modal/references/gpu.md @@ -1,168 +1,174 @@ -# GPU Acceleration on Modal +# Modal GPU Compute -## Quick Start +## Table of Contents -Run functions on GPUs with the `gpu` parameter: +- [Available GPUs](#available-gpus) +- [Requesting GPUs](#requesting-gpus) +- [GPU Selection Guide](#gpu-selection-guide) +- [Multi-GPU](#multi-gpu) +- [GPU Fallback Chains](#gpu-fallback-chains) +- [Auto-Upgrades](#auto-upgrades) +- [Multi-GPU Training](#multi-gpu-training) -```python -import modal +## Available GPUs -image = modal.Image.debian_slim().pip_install("torch") -app = modal.App(image=image) +| GPU | VRAM | Max per Container | Best For | +|-----|------|-------------------|----------| +| T4 | 16 GB | 8 | Budget inference, small 
models | +| L4 | 24 GB | 8 | Inference, video processing | +| A10 | 24 GB | 4 | Inference, fine-tuning small models | +| L40S | 48 GB | 8 | Inference (best cost/perf), medium models | +| A100-40GB | 40 GB | 8 | Training, large model inference | +| A100-80GB | 80 GB | 8 | Training, large models | +| RTX-PRO-6000 | 48 GB | 8 | Rendering, inference | +| H100 | 80 GB | 8 | Large-scale training, fast inference | +| H200 | 141 GB | 8 | Very large models, training | +| B200 | 192 GB | 8 | Largest models, maximum throughput | +| B200+ | 192 GB | 8 | B200 or B300, B200 pricing | -@app.function(gpu="A100") -def run(): - import torch - assert torch.cuda.is_available() -``` +## Requesting GPUs -## Available GPU Types - -Modal supports the following GPUs: - -- `T4` - Entry-level GPU -- `L4` - Balanced performance and cost -- `A10` - Up to 4 GPUs, 96 GB total -- `A100` - 40GB or 80GB variants -- `A100-40GB` - Specific 40GB variant -- `A100-80GB` - Specific 80GB variant -- `L40S` - 48 GB, excellent for inference -- `H100` / `H100!` - Top-tier Hopper architecture -- `H200` - Improved Hopper with more memory -- `B200` - Latest Blackwell architecture - -See https://modal.com/pricing for pricing. - -## GPU Count - -Request multiple GPUs per container with `:n` syntax: - -```python -@app.function(gpu="H100:8") -def run_llama_405b(): - # 8 H100 GPUs available - ... -``` - -Supported counts: -- B200, H200, H100, A100, L4, T4, L40S: up to 8 GPUs (up to 1,536 GB) -- A10: up to 4 GPUs (up to 96 GB) - -Note: Requesting >2 GPUs may result in longer wait times. - -## GPU Selection Guide - -**For Inference (Recommended)**: Start with L40S -- Excellent cost/performance -- 48 GB memory -- Good for LLaMA, Stable Diffusion, etc. 
- -**For Training**: Consider H100 or A100 -- High compute throughput -- Large memory for batch processing - -**For Memory-Bound Tasks**: H200 or A100-80GB -- More memory capacity -- Better for large models - -## B200 GPUs - -NVIDIA's flagship Blackwell chip: - -```python -@app.function(gpu="B200:8") -def run_deepseek(): - # Most powerful option - ... -``` - -## H200 and H100 GPUs - -Hopper architecture GPUs with excellent software support: +### Basic Request ```python @app.function(gpu="H100") def train(): - ... + import torch + assert torch.cuda.is_available() + print(f"Using: {torch.cuda.get_device_name(0)}") ``` -### Automatic H200 Upgrades - -Modal may upgrade `gpu="H100"` to H200 at no extra cost. H200 provides: -- 141 GB memory (vs 80 GB for H100) -- 4.8 TB/s bandwidth (vs 3.35 TB/s) - -To avoid automatic upgrades (e.g., for benchmarking): -```python -@app.function(gpu="H100!") -def benchmark(): - ... -``` - -## A100 GPUs - -Ampere architecture with 40GB or 80GB variants: +### String Shorthand ```python -# May be automatically upgraded to 80GB -@app.function(gpu="A100") -def qwen_7b(): - ... - -# Specific variants -@app.function(gpu="A100-40GB") -def model_40gb(): - ... - -@app.function(gpu="A100-80GB") -def llama_70b(): - ... +gpu="T4" # Single T4 +gpu="A100-80GB" # Single A100 80GB +gpu="H100:4" # Four H100s ``` -## GPU Fallbacks - -Specify multiple GPU types with fallback: +### GPU Object (Advanced) ```python -@app.function(gpu=["H100", "A100-40GB:2"]) -def run_on_80gb(): - # Tries H100 first, falls back to 2x A100-40GB +@app.function(gpu=modal.gpu.H100(count=2)) +def multi_gpu(): ... ``` -Modal respects ordering and allocates most preferred available GPU. 
+## GPU Selection Guide + +### For Inference + +| Model Size | Recommended GPU | Why | +|-----------|----------------|-----| +| < 7B params | T4, L4 | Cost-effective, sufficient VRAM | +| 7B-13B params | L40S | Best cost/performance, 48 GB VRAM | +| 13B-70B params | A100-80GB, H100 | Large VRAM, fast memory bandwidth | +| 70B+ params | H100:2+, H200, B200 | Multi-GPU or very large VRAM | + +### For Training + +| Task | Recommended GPU | +|------|----------------| +| Fine-tuning (LoRA) | L40S, A100-40GB | +| Full fine-tuning small models | A100-80GB | +| Full fine-tuning large models | H100:4+, H200 | +| Pre-training | H100:8, B200:8 | + +### General Recommendation + +L40S is the best default for inference workloads — it offers an excellent trade-off of cost and performance with 48 GB of GPU RAM. + +## Multi-GPU + +Request multiple GPUs by appending `:count`: + +```python +@app.function(gpu="H100:4") +def distributed(): + import torch + print(f"GPUs available: {torch.cuda.device_count()}") + # All 4 GPUs are on the same physical machine +``` + +- Up to 8 GPUs for most types (up to 4 for A10) +- All GPUs attach to the same physical machine +- Requesting more than 2 GPUs may result in longer wait times +- Maximum VRAM: 8 x B200 = 1,536 GB + +## GPU Fallback Chains + +Specify a prioritized list of GPU types: + +```python +@app.function(gpu=["H100", "A100-80GB", "L40S"]) +def flexible(): + # Modal tries H100 first, then A100-80GB, then L40S + ... +``` + +Useful for reducing queue times when a specific GPU isn't available. + +## Auto-Upgrades + +### H100 → H200 + +Modal may automatically upgrade H100 requests to H200 at no extra cost. To prevent this: + +```python +@app.function(gpu="H100!") # Exclamation mark prevents auto-upgrade +def must_use_h100(): + ... +``` + +### A100 → A100-80GB + +A100-40GB requests may be upgraded to 80GB at no extra cost. + +### B200+ + +`gpu="B200+"` allows Modal to run on B200 or B300 GPUs at B200 pricing. Requires CUDA 13.0+. 
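The string forms described in this file (`"H100"`, `"H100:4"`, `"H100!"`) and the VRAM table can be combined to sanity-check a request before deploying. The sketch below is plain Python, not part of the Modal SDK: `GPU_TABLE`, `parse_gpu_spec`, and `total_vram_gb` are hypothetical helper names, and the numbers are a subset of the rows in the Available GPUs table.

```python
# VRAM per GPU (GB) and max GPUs per container, copied from a subset of
# the "Available GPUs" table. This is an illustrative lookup, not an API.
GPU_TABLE = {
    "T4": (16, 8),
    "L4": (24, 8),
    "A10": (24, 4),
    "L40S": (48, 8),
    "A100-40GB": (40, 8),
    "A100-80GB": (80, 8),
    "H100": (80, 8),
    "H200": (141, 8),
    "B200": (192, 8),
}


def parse_gpu_spec(spec: str) -> tuple[str, int, bool]:
    """Split a spec such as "H100:4" or "H100!" into (type, count, pinned)."""
    name, _, count_text = spec.partition(":")
    pinned = name.endswith("!")  # "!" opts out of auto-upgrades
    name = name.rstrip("!")
    count = int(count_text) if count_text else 1
    if name not in GPU_TABLE:
        raise ValueError(f"unknown GPU type: {name!r}")
    max_count = GPU_TABLE[name][1]
    if not 1 <= count <= max_count:
        raise ValueError(f"{name} supports 1 to {max_count} GPUs, got {count}")
    return name, count, pinned


def total_vram_gb(spec: str) -> int:
    """Total VRAM in GB for a spec, using the table values above."""
    name, count, _ = parse_gpu_spec(spec)
    return GPU_TABLE[name][0] * count
```

For example, `total_vram_gb("B200:8")` reproduces the 1,536 GB maximum quoted in the Multi-GPU section, and `parse_gpu_spec("A10:8")` raises because A10 is capped at 4 per container.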
## Multi-GPU Training -Modal supports multi-GPU training on a single node. Multi-node training is in closed beta. +Modal supports multi-GPU training on a single node. Multi-node training is in private beta. -### PyTorch Example - -For frameworks that re-execute entrypoints, use subprocess or specific strategies: +### PyTorch DDP Example ```python -@app.function(gpu="A100:2") -def train(): - import subprocess - import sys - subprocess.run( - ["python", "train.py"], - stdout=sys.stdout, - stderr=sys.stderr, - check=True, - ) +@app.function(gpu="H100:4", image=image, timeout=86400) +def train_distributed(): + import os + import torch + import torch.distributed as dist + + dist.init_process_group(backend="nccl") + local_rank = int(os.environ.get("LOCAL_RANK", 0)) + device = torch.device(f"cuda:{local_rank}") + # ... training loop with DDP ... ``` -For PyTorch Lightning, set strategy to `ddp_spawn` or `ddp_notebook`. +### PyTorch Lightning -## Performance Considerations +When using frameworks that re-execute Python entrypoints (like PyTorch Lightning), either: -**Memory-Bound vs Compute-Bound**: -- Running models with small batch sizes is memory-bound -- Newer GPUs have faster arithmetic than memory access -- Speedup from newer hardware may not justify cost for memory-bound workloads +1. Set strategy to `ddp_spawn` or `ddp_notebook` +2.
Or run training as a subprocess -**Optimization**: -- Use batching when possible -- Consider L40S before jumping to H100/B200 -- Profile to identify bottlenecks +```python +@app.function(gpu="H100:4", image=image) +def train(): + import subprocess + subprocess.run(["python", "train_script.py"], check=True) +``` + +### Hugging Face Accelerate + +```python +@app.function(gpu="A100-80GB:4", image=image) +def finetune(): + import subprocess + subprocess.run([ + "accelerate", "launch", + "--num_processes", "4", + "train.py" + ], check=True) +``` diff --git a/scientific-skills/modal/references/images.md b/scientific-skills/modal/references/images.md index 476bbf4..2663085 100644 --- a/scientific-skills/modal/references/images.md +++ b/scientific-skills/modal/references/images.md @@ -1,261 +1,259 @@ -# Modal Images +# Modal Container Images + +## Table of Contents + +- [Overview](#overview) +- [Base Images](#base-images) +- [Installing Packages](#installing-packages) +- [System Packages](#system-packages) +- [Shell Commands](#shell-commands) +- [Running Python During Build](#running-python-during-build) +- [Adding Local Files](#adding-local-files) +- [Environment Variables](#environment-variables) +- [Dockerfiles](#dockerfiles) +- [Alternative Package Managers](#alternative-package-managers) +- [Image Caching](#image-caching) +- [Handling Remote-Only Imports](#handling-remote-only-imports) ## Overview -Modal Images define the environment code runs in - containers with dependencies installed. Images are built from method chains starting from a base image. +Every Modal function runs inside a container built from an `Image`. By default, Modal uses a Debian Linux image with the same Python minor version as your local interpreter. + +Images are built lazily — Modal only builds/pulls the image when a function using it is first invoked. Layers are cached for fast rebuilds. 
## Base Images -Start with a base image and chain methods: +```python +# Default: Debian slim with your local Python version +image = modal.Image.debian_slim() + +# Specific Python version +image = modal.Image.debian_slim(python_version="3.11") + +# From Docker Hub +image = modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04") + +# From a Dockerfile +image = modal.Image.from_dockerfile("./Dockerfile") +``` + +## Installing Packages + +### uv (Recommended) + +`uv_pip_install` uses the uv package manager for fast, reliable installs: ```python image = ( - modal.Image.debian_slim(python_version="3.13") - .apt_install("git") - .uv_pip_install("torch<3") - .env({"HALT_AND_CATCH_FIRE": "0"}) - .run_commands("git clone https://github.com/modal-labs/agi") -) -``` - -Available base images: -- `Image.debian_slim()` - Debian Linux with Python -- `Image.micromamba()` - Base with Micromamba package manager -- `Image.from_registry()` - Pull from Docker Hub, ECR, etc. -- `Image.from_dockerfile()` - Build from existing Dockerfile - -## Installing Python Packages - -### With uv (Recommended) - -Use `.uv_pip_install()` for fast package installation: - -```python -image = ( - modal.Image.debian_slim() - .uv_pip_install("pandas==2.2.0", "numpy") -) -``` - -### With pip - -Fallback to standard pip if needed: - -```python -image = ( - modal.Image.debian_slim(python_version="3.13") - .pip_install("pandas==2.2.0", "numpy") -) -``` - -Pin dependencies tightly (e.g., `"torch==2.8.0"`) for reproducibility. 
- -## Installing System Packages - -Install Linux packages with apt: - -```python -image = modal.Image.debian_slim().apt_install("git", "curl") -``` - -## Setting Environment Variables - -Pass a dictionary to `.env()`: - -```python -image = modal.Image.debian_slim().env({"PORT": "6443"}) -``` - -## Running Shell Commands - -Execute commands during image build: - -```python -image = ( - modal.Image.debian_slim() - .apt_install("git") - .run_commands("git clone https://github.com/modal-labs/gpu-glossary") -) -``` - -## Running Python Functions at Build Time - -Download model weights or perform setup: - -```python -def download_models(): - import diffusers - model_name = "segmind/small-sd" - pipe = diffusers.StableDiffusionPipeline.from_pretrained(model_name) - -hf_cache = modal.Volume.from_name("hf-cache") - -image = ( - modal.Image.debian_slim() - .pip_install("diffusers[torch]", "transformers") - .run_function( - download_models, - secrets=[modal.Secret.from_name("huggingface-secret")], - volumes={"/root/.cache/huggingface": hf_cache}, + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install( + "torch==2.8.0", + "transformers>=4.40", + "accelerate", + "scipy", ) ) ``` -## Adding Local Files +Pin versions for reproducibility. uv resolves dependencies faster than pip. -### Add Files or Directories +### pip (Fallback) ```python -image = modal.Image.debian_slim().add_local_dir( - "/user/erikbern/.aws", - remote_path="/root/.aws" +image = modal.Image.debian_slim().pip_install( + "numpy==1.26.0", + "pandas==2.1.0", ) ``` -By default, files are added at container startup. Use `copy=True` to include in built image. 
- -### Add Python Source - -Add importable Python modules: +### From requirements.txt ```python -image = modal.Image.debian_slim().add_local_python_source("local_module") - -@app.function(image=image) -def f(): - import local_module - local_module.do_stuff() +image = modal.Image.debian_slim().pip_install_from_requirements("requirements.txt") ``` -## Using Existing Container Images - -### From Public Registry - -```python -sklearn_image = modal.Image.from_registry("huanjason/scikit-learn") - -@app.function(image=sklearn_image) -def fit_knn(): - from sklearn.neighbors import KNeighborsClassifier - ... -``` - -Can pull from Docker Hub, Nvidia NGC, AWS ECR, GitHub ghcr.io. - -### From Private Registry - -Use Modal Secrets for authentication: - -**Docker Hub**: -```python -secret = modal.Secret.from_name("my-docker-secret") -image = modal.Image.from_registry( - "private-repo/image:tag", - secret=secret -) -``` - -**AWS ECR**: -```python -aws_secret = modal.Secret.from_name("my-aws-secret") -image = modal.Image.from_aws_ecr( - "000000000000.dkr.ecr.us-east-1.amazonaws.com/my-private-registry:latest", - secret=aws_secret, -) -``` - -### From Dockerfile - -```python -image = modal.Image.from_dockerfile("Dockerfile") - -@app.function(image=image) -def fit(): - import sklearn - ... -``` - -Can still extend with other image methods after importing. 
- -## Using Micromamba - -For coordinated installation of Python and system packages: - -```python -numpyro_pymc_image = ( - modal.Image.micromamba() - .micromamba_install("pymc==5.10.4", "numpyro==0.13.2", channels=["conda-forge"]) -) -``` - -## GPU Support at Build Time - -Run build steps on GPU instances: +### Private Packages ```python image = ( modal.Image.debian_slim() - .pip_install("bitsandbytes", gpu="H100") + .pip_install_private_repos( + "github.com/org/private-repo", + git_user="username", + secrets=[modal.Secret.from_name("github-token")], + ) +) +``` + +## System Packages + +Install Linux packages via apt: + +```python +image = ( + modal.Image.debian_slim() + .apt_install("ffmpeg", "libsndfile1", "git", "curl") + .uv_pip_install("librosa", "soundfile") +) +``` + +## Shell Commands + +Run arbitrary commands during image build: + +```python +image = ( + modal.Image.debian_slim() + .run_commands( + "wget https://example.com/data.tar.gz", + "tar -xzf data.tar.gz -C /opt/data", + "rm data.tar.gz", + ) +) +``` + +### With GPU + +Some build steps require GPU access (e.g., compiling CUDA kernels): + +```python +image = ( + modal.Image.debian_slim() + .uv_pip_install("torch") + .run_commands("python -c 'import torch; torch.cuda.is_available()'", gpu="A100") +) +``` + +## Running Python During Build + +Execute Python functions as build steps — useful for downloading model weights: + +```python +def download_model(): + from huggingface_hub import snapshot_download + snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3") + +image = ( + modal.Image.debian_slim(python_version="3.11") + .uv_pip_install("huggingface_hub", "torch", "transformers") + .run_function(download_model, secrets=[modal.Secret.from_name("huggingface")]) +) +``` + +The resulting filesystem (including downloaded files) is snapshotted into the image. 
+ +## Adding Local Files + +### Local Directories + +```python +image = modal.Image.debian_slim().add_local_dir( + local_path="./config", + remote_path="/root/config", +) +``` + +By default, files are added at container startup (not baked into the image layer). Use `copy=True` to bake them in. + +### Local Python Modules + +```python +image = modal.Image.debian_slim().add_local_python_source("my_module") +``` + +This uses Python's import system to find and include the module. + +### Individual Files + +```python +image = modal.Image.debian_slim().add_local_file( + local_path="./model_config.json", + remote_path="/root/config.json", +) +``` + +## Environment Variables + +```python +image = ( + modal.Image.debian_slim() + .env({ + "TRANSFORMERS_CACHE": "/cache", + "TOKENIZERS_PARALLELISM": "false", + "HF_HOME": "/cache/huggingface", + }) +) +``` + +Names and values must be strings. + +## Dockerfiles + +Build from existing Dockerfiles: + +```python +image = modal.Image.from_dockerfile("./Dockerfile") + +# With build context +image = modal.Image.from_dockerfile("./Dockerfile", context_mount=modal.Mount.from_local_dir(".")) +``` + +## Alternative Package Managers + +### Micromamba / Conda + +For packages requiring coordinated system and Python package installs: + +```python +image = ( + modal.Image.micromamba(python_version="3.11") + .micromamba_install("cudatoolkit=11.8", "cudnn=8.6", channels=["conda-forge"]) + .uv_pip_install("torch") ) ``` ## Image Caching -Images are cached per layer. Breaking cache on one layer causes cascading rebuilds for subsequent layers. +Modal caches images per layer (per method call). Breaking the cache on one layer cascades to all subsequent layers. -Define frequently-changing layers last to maximize cache reuse. +### Optimization Tips + +1. **Order layers by change frequency**: Put stable dependencies first, frequently changing code last +2. **Pin versions**: Unpinned versions may resolve differently and break cache +3. 
**Separate large installs**: Put heavy packages (torch, tensorflow) in early layers ### Force Rebuild ```python -image = ( - modal.Image.debian_slim() - .apt_install("git") - .pip_install("slack-sdk", force_build=True) -) +# Single layer +image = modal.Image.debian_slim().apt_install("git", force_build=True) ``` -Or set environment variable: ```bash -MODAL_FORCE_BUILD=1 modal run ... +# All images in a run +MODAL_FORCE_BUILD=1 modal run script.py + +# Rebuild without updating cache +MODAL_IGNORE_CACHE=1 modal run script.py ``` -## Handling Different Local/Remote Packages +## Handling Remote-Only Imports -Import packages only available remotely inside function bodies: +When packages are only available in the container (not locally), use conditional imports: ```python @app.function(image=image) -def my_function(): - import pandas as pd # Only imported remotely - df = pd.DataFrame() - ... +def process(): + import torch # Only available in the container + return torch.cuda.device_count() ``` -Or use the imports context manager: +For module-level imports shared across functions, use the `Image.imports()` context manager: ```python -pandas_image = modal.Image.debian_slim().pip_install("pandas") - -with pandas_image.imports(): - import pandas as pd - -@app.function(image=pandas_image) -def my_function(): - df = pd.DataFrame() +with image.imports(): + import torch + import transformers ``` -## Fast Pull from Registry with eStargz - -Improve pull performance with eStargz compression: - -```bash -docker buildx build --tag "//:" \ - --output type=registry,compression=estargz,force-compression=true,oci-mediatypes=true \ - . -``` - -Supported registries: -- AWS ECR -- Docker Hub -- Google Artifact Registry +This prevents `ImportError` locally while making the imports available in the container. 
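Returning to the Image Caching section above, the cascade rule ("breaking the cache on one layer cascades to all subsequent layers") can be pictured with a toy model. This is only a sketch under the assumption that each layer's cache key depends on its own build step plus everything before it; it does not describe Modal's actual cache keys, and `layer_keys` and `rebuilt_layers` are hypothetical names.

```python
import hashlib


def layer_keys(steps: list[str]) -> list[str]:
    """One cache key per build step; each key chains in all earlier steps."""
    keys = []
    parent = ""
    for step in steps:
        parent = hashlib.sha256((parent + "\n" + step).encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(old_steps: list[str], new_steps: list[str]) -> list[int]:
    """Indices in new_steps that would miss the cache left by old_steps."""
    old = layer_keys(old_steps)
    new = layer_keys(new_steps)
    return [i for i, key in enumerate(new) if i >= len(old) or old[i] != key]


base = ["debian_slim", "apt_install git", "uv_pip_install torch==2.8.0", "app code v1"]

# Changing only the last step rebuilds one layer.
late_change = rebuilt_layers(base, base[:3] + ["app code v2"])

# Changing an early step rebuilds it and every layer after it.
early_change = rebuilt_layers(base, ["debian_slim", "apt_install git curl"] + base[2:])
```

In this model `late_change` contains only the final index, while `early_change` contains every index from the edited step onward, which is why stable dependencies belong in early layers and frequently edited code in late ones.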
diff --git a/scientific-skills/modal/references/resources.md b/scientific-skills/modal/references/resources.md index 94b3ed4..c0ec94e 100644 --- a/scientific-skills/modal/references/resources.md +++ b/scientific-skills/modal/references/resources.md @@ -1,129 +1,117 @@ -# CPU, Memory, and Disk Resources +# Modal Resource Configuration -## Default Resources +## CPU -Each Modal container has default reservations: -- **CPU**: 0.125 cores -- **Memory**: 128 MiB - -Containers can exceed minimum if worker has available resources. - -## CPU Cores - -Request CPU cores as floating-point number: +### Requesting CPU ```python -@app.function(cpu=8.0) -def my_function(): - # Guaranteed access to at least 8 physical cores +@app.function(cpu=4.0) +def compute(): ... ``` -Values correspond to physical cores, not vCPUs. - -Modal sets multi-threading environment variables based on CPU reservation: -- `OPENBLAS_NUM_THREADS` -- `OMP_NUM_THREADS` -- `MKL_NUM_THREADS` - -## Memory - -Request memory in megabytes (integer): - -```python -@app.function(memory=32768) -def my_function(): - # Guaranteed access to at least 32 GiB RAM - ... -``` - -## Resource Limits +- Values are **physical cores**, not vCPUs +- Default: 0.125 cores +- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request ### CPU Limits -Default soft CPU limit: request + 16 cores -- Default request: 0.125 cores → default limit: 16.125 cores -- Above limit, host throttles CPU usage - -Set explicit CPU limit: +- Default soft limit: 16 physical cores above the CPU request +- Set an explicit limit by passing a `(request, limit)` tuple, which helps prevent noisy-neighbor effects: ```python -cpu_request = 1.0 -cpu_limit = 4.0 - -@app.function(cpu=(cpu_request, cpu_limit)) -def f(): +@app.function(cpu=(4.0, 8.0)) # Request 4 cores, limit 8 +def bounded_compute(): ... ``` +## Memory + +### Requesting Memory + +```python +@app.function(memory=16384) # 16 GiB in MiB +def large_data(): + ...
+``` + +- Value in **MiB** (mebibytes) +- Default: 128 MiB + ### Memory Limits -Set hard memory limit to OOM kill containers at threshold: +Set a hard memory limit by passing a `(request, limit)` tuple; containers that exceed the limit are OOM-killed: ```python -mem_request = 1024 # MB -mem_limit = 2048 # MB - -@app.function(memory=(mem_request, mem_limit)) -def f(): - # Container killed if exceeds 2048 MB +@app.function(memory=(4096, 8192)) # 4 GiB request, 8 GiB hard limit +def bounded_memory(): ... ``` -Useful for catching memory leaks early. +This prevents paying for runaway memory leaks. -### Disk Limits +## Ephemeral Disk -Running containers have access to many GBs of SSD disk, limited by: -1. Underlying worker's SSD capacity -2. Per-container disk quota (100s of GBs) - -Hitting limits causes `OSError` on disk writes. - -Request larger disk with `ephemeral_disk`: +For temporary storage within a container's lifetime: ```python -@app.function(ephemeral_disk=10240) # 10 GiB -def process_large_files(): +@app.function(ephemeral_disk=102400) # 100 GiB in MiB +def process_dataset(): + # Temporary files at /tmp or anywhere in the container filesystem ... ``` -Maximum disk size: 3.0 TiB (3,145,728 MiB) -Intended use: dataset processing +- Value in **MiB** +- Default: 512 GiB quota per container +- Maximum: 3,145,728 MiB (3 TiB) +- Data is lost when the container shuts down +- Use Volumes for persistent storage + +Larger disk requests increase the memory request at a 20:1 ratio for billing purposes. + +## Timeout + +```python +@app.function(timeout=3600) # 1 hour in seconds +def long_running(): + ... +``` + +- Default: 300 seconds (5 minutes) +- Maximum: 86,400 seconds (24 hours) +- Function is killed when timeout expires ## Billing -Charged based on whichever is higher: reservation or actual usage. +You are charged based on **whichever is higher**: your resource request or actual usage.
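The two billing rules stated in this file (charge on the higher of request and usage, and a 20:1 disk-to-memory ratio) reduce to simple arithmetic. The following is a rough sketch with hypothetical helper names; it assumes the ratio applies only to an explicitly requested `ephemeral_disk` and that the result is rounded down, neither of which is confirmed here.

```python
def billable(requested: float, used: float) -> float:
    """Billing basis for CPU or memory: whichever is higher."""
    return max(requested, used)


def effective_memory_request_mib(memory_mib: int, ephemeral_disk_mib: int = 0) -> int:
    """Memory request after applying the 20:1 disk-to-memory rule.

    Assumption: the disk-derived floor only raises the request, never lowers it.
    """
    disk_floor_mib = ephemeral_disk_mib // 20
    return max(memory_mib, disk_floor_mib)


# 500 GiB of ephemeral disk with a 16 GiB memory request.
raised = effective_memory_request_mib(memory_mib=16 * 1024, ephemeral_disk_mib=500 * 1024)
```

Under these assumptions, a 500 GiB disk request lifts a 16 GiB memory request to 25 GiB (25,600 MiB), while a 32 GiB memory request would be left unchanged.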
-Disk requests increase memory request at 20:1 ratio: -- Requesting 500 GiB disk → increases memory request to 25 GiB (if not already higher) +| Resource | Billing Basis | +|----------|--------------| +| CPU | max(requested, used) | +| Memory | max(requested, used) | +| GPU | Time GPU is allocated | +| Disk | Increases memory billing at 20:1 ratio | -## Maximum Requests +### Cost Optimization Tips -Modal enforces maximums at Function creation time. Requests exceeding maximum will be rejected with `InvalidError`. +- Request only what you need +- Use appropriate GPU tiers (L40S over H100 for inference) +- Set `scaledown_window` to minimize idle time +- Use `min_containers=0` when cold starts are acceptable +- Batch inputs with `.map()` instead of individual `.remote()` calls -Contact support if you need higher limits. - -## Example: Resource Configuration +## Complete Example ```python @app.function( - cpu=4.0, # 4 physical cores - memory=16384, # 16 GiB RAM - ephemeral_disk=51200, # 50 GiB disk - timeout=3600, # 1 hour timeout + cpu=8.0, # 8 physical cores + memory=32768, # 32 GiB + gpu="L40S", # L40S GPU + ephemeral_disk=204800, # 200 GiB temp disk + timeout=7200, # 2 hours + max_containers=50, + min_containers=1, ) -def process_data(): - # Heavy processing with large files +def full_pipeline(data_path: str): ... ``` - -## Monitoring Resource Usage - -View resource usage in Modal dashboard: -- CPU utilization -- Memory usage -- Disk usage -- GPU metrics (if applicable) - -Access via https://modal.com/apps diff --git a/scientific-skills/modal/references/scaling.md b/scientific-skills/modal/references/scaling.md index 6e74c0e..7b9ffaf 100644 --- a/scientific-skills/modal/references/scaling.md +++ b/scientific-skills/modal/references/scaling.md @@ -1,230 +1,173 @@ -# Scaling Out on Modal +# Modal Scaling and Concurrency -## Automatic Autoscaling +## Table of Contents -Every Modal Function corresponds to an autoscaling pool of containers. 
Modal's autoscaler: -- Spins up containers when no capacity available -- Spins down containers when resources idle -- Scales to zero by default when no inputs to process +- [Autoscaling](#autoscaling) +- [Configuration](#configuration) +- [Parallel Execution](#parallel-execution) +- [Concurrent Inputs](#concurrent-inputs) +- [Dynamic Batching](#dynamic-batching) +- [Dynamic Autoscaler Updates](#dynamic-autoscaler-updates) +- [Limits](#limits) -Autoscaling decisions are made quickly and frequently. +## Autoscaling -## Parallel Execution with `.map()` +Modal automatically manages a pool of containers for each function: +- Spins up containers when there's no capacity for new inputs +- Spins down idle containers to save costs +- Scales from zero (no cost when idle) to thousands of containers -Run function repeatedly with different inputs in parallel: +No configuration needed for basic autoscaling — it works out of the box. -```python -@app.function() -def evaluate_model(x): - return x ** 2 +## Configuration -@app.local_entrypoint() -def main(): - inputs = list(range(100)) - # Runs 100 inputs in parallel across containers - for result in evaluate_model.map(inputs): - print(result) -``` - -### Multiple Arguments with `.starmap()` - -For functions with multiple arguments: - -```python -@app.function() -def add(a, b): - return a + b - -@app.local_entrypoint() -def main(): - results = list(add.starmap([(1, 2), (3, 4)])) - # [3, 7] -``` - -### Exception Handling - -```python -@app.function() -def may_fail(a): - if a == 2: - raise Exception("error") - return a ** 2 - -@app.local_entrypoint() -def main(): - results = list(may_fail.map( - range(3), - return_exceptions=True, - wrap_returned_exceptions=False - )) - # [0, 1, Exception('error')] -``` - -## Autoscaling Configuration - -Configure autoscaler behavior with parameters: +Fine-tune autoscaling behavior: ```python @app.function( - max_containers=100, # Upper limit on containers - min_containers=2, # Keep warm even when 
inactive - buffer_containers=5, # Maintain buffer while active - scaledown_window=60, # Max idle time before scaling down (seconds) + max_containers=100, # Upper limit on container count + min_containers=2, # Keep 2 warm (reduces cold starts) + buffer_containers=5, # Reserve 5 extra for burst traffic + scaledown_window=300, # Wait 5 min idle before shutting down ) -def my_function(): +def handle_request(data): ... ``` -Parameters: -- **max_containers**: Upper limit on total containers -- **min_containers**: Minimum kept warm even when inactive -- **buffer_containers**: Buffer size while function active (additional inputs won't need to queue) -- **scaledown_window**: Maximum idle duration before scale down (seconds) +| Parameter | Default | Description | +|-----------|---------|-------------| +| `max_containers` | Unlimited | Hard cap on total containers | +| `min_containers` | 0 | Minimum warm containers (costs money even when idle) | +| `buffer_containers` | 0 | Extra containers to prevent queuing | +| `scaledown_window` | 60 | Seconds of idle time before shutdown | -Trade-offs: -- Larger warm pool/buffer → Higher cost, lower latency -- Longer scaledown window → Less churn for infrequent requests +### Trade-offs + +- Higher `min_containers` = lower latency, higher cost +- Higher `buffer_containers` = less queuing, higher cost +- Lower `scaledown_window` = faster cost savings, more cold starts + +## Parallel Execution + +### `.map()` — Process Many Inputs + +```python +@app.function() +def process(item): + return heavy_computation(item) + +@app.local_entrypoint() +def main(): + items = list(range(10_000)) + results = list(process.map(items)) +``` + +Modal automatically scales containers to handle the workload. Results maintain input order. 
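As a local analogy for the ordering guarantee (this is standard-library code, not Modal), `concurrent.futures.Executor.map` also returns results in input order even when individual calls finish out of order. The `process` function and its artificial delays are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process(item: int) -> int:
    # Uneven delays so that later inputs often finish before earlier ones.
    time.sleep(0.01 * (5 - item % 5))
    return item * item


def run_all(items: list[int]) -> list[int]:
    """Run process() over items in parallel; results follow input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(process, items))


results = run_all(list(range(10)))
```

The difference on Modal is where the work runs: `.map()` fans inputs out to autoscaled containers rather than to local threads.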
+ +### `.map()` Options + +```python +# Unordered results (faster) +for result in process.map(items, order_outputs=False): + handle(result) + +# Collect errors instead of raising +results = list(process.map(items, return_exceptions=True)) +for r in results: + if isinstance(r, Exception): + print(f"Error: {r}") +``` + +### `.starmap()` — Multi-Argument + +```python +@app.function() +def add(x, y): + return x + y + +results = list(add.starmap([(1, 2), (3, 4), (5, 6)])) +# [3, 7, 11] +``` + +### `.spawn()` — Fire-and-Forget + +```python +# Returns immediately +call = process.spawn(large_data) + +# Check status or get result later +result = call.get() +``` + +Up to 1 million pending `.spawn()` calls. + +## Concurrent Inputs + +By default, each container handles one input at a time. Use `@modal.concurrent` to handle multiple: + +```python +@app.function(gpu="L40S") +@modal.concurrent(max_inputs=10) +async def predict(text: str): + result = await model.predict_async(text) + return result +``` + +This is ideal for I/O-bound workloads or async inference where a single GPU can handle multiple requests. 
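To picture what a per-container cap on concurrent inputs means for an async handler, here is a local sketch using only `asyncio`. It is an analogy under stated assumptions, not how Modal implements `@modal.concurrent`: `run_with_cap` is a hypothetical name, and the short sleep stands in for awaiting I/O or an async inference call.

```python
import asyncio


async def run_with_cap(inputs: list[str], max_inputs: int) -> tuple[list[str], int]:
    """Process inputs concurrently, never more than max_inputs at once.

    Returns the results in input order and the peak concurrency observed.
    """
    gate = asyncio.Semaphore(max_inputs)
    in_flight = 0
    peak = 0

    async def handle(text: str) -> str:
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for awaiting I/O
            in_flight -= 1
            return text.upper()

    results = await asyncio.gather(*(handle(t) for t in inputs))
    return list(results), peak


results, peak = asyncio.run(run_with_cap([f"req{i}" for i in range(25)], max_inputs=10))
```

Because the handler yields while waiting, many inputs overlap on one event loop; a handler that blocks the loop would gain nothing from a higher cap.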
+ +### With Web Endpoints + +```python +@app.function(gpu="L40S") +@modal.concurrent(max_inputs=20) +@modal.asgi_app() +def web_service(): + return fastapi_app +``` + +## Dynamic Batching + +Collect inputs into batches for efficient GPU utilization: + +```python +@app.function(gpu="L40S") +@modal.batched(max_batch_size=32, wait_ms=100) +async def batch_predict(texts: list[str]): + # Called with up to 32 texts at once + embeddings = model.encode(texts) + return list(embeddings) +``` + +- `max_batch_size` — Maximum inputs per batch +- `wait_ms` — How long to wait for more inputs before processing +- The function receives a list and must return a list of the same length ## Dynamic Autoscaler Updates -Update autoscaler settings without redeployment: - -```python -f = modal.Function.from_name("my-app", "f") -f.update_autoscaler(max_containers=100) -``` - -Settings revert to decorator configuration on next deploy, or are overridden by further updates: - -```python -f.update_autoscaler(min_containers=2, max_containers=10) -f.update_autoscaler(min_containers=4) # max_containers=10 still in effect -``` - -### Time-Based Scaling - -Adjust warm pool based on time of day: +Adjust autoscaling at runtime without redeploying: ```python @app.function() -def inference_server(): - ... 
+def scale_up_for_peak(): + process = modal.Function.from_name("my-app", "process") + process.update_autoscaler(min_containers=10, buffer_containers=20) -@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York")) -def increase_warm_pool(): - inference_server.update_autoscaler(min_containers=4) - -@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York")) -def decrease_warm_pool(): - inference_server.update_autoscaler(min_containers=0) -``` - -### For Classes - -Update autoscaler for specific parameter instances: - -```python -MyClass = modal.Cls.from_name("my-app", "MyClass") -obj = MyClass(model_version="3.5") -obj.update_autoscaler(buffer_containers=2) # type: ignore -``` - -## Input Concurrency - -Process multiple inputs per container with `@modal.concurrent`: - -```python @app.function() -@modal.concurrent(max_inputs=100) -def my_function(input: str): - # Container can handle up to 100 concurrent inputs - ... +def scale_down_after_peak(): + process = modal.Function.from_name("my-app", "process") + process.update_autoscaler(min_containers=1, buffer_containers=2) ``` -Ideal for I/O-bound workloads: -- Database queries -- External API requests -- Remote Modal Function calls +Settings revert to the decorator values on the next deployment. 
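One way to drive these updates is to compute the settings from the time of day and pass them on as keyword arguments. The helper below is a hypothetical sketch: the 06:00 to 22:00 peak window is an assumption, and the container counts simply reuse the values from the example above.

```python
def autoscaler_settings(hour: int, peak_start: int = 6, peak_end: int = 22) -> dict[str, int]:
    """Pick warm-pool settings for a given hour of the day (0-23).

    The peak window and the counts are illustrative, not recommendations.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if peak_start <= hour < peak_end:
        return {"min_containers": 10, "buffer_containers": 20}
    return {"min_containers": 1, "buffer_containers": 2}
```

A scheduled function could then call `process.update_autoscaler(**autoscaler_settings(hour))`, using the same two parameters shown in the example above.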
-### Concurrency Mechanisms +## Limits -**Synchronous Functions**: Separate threads (must be thread-safe) +| Resource | Limit | +|----------|-------| +| Pending inputs (unassigned) | 2,000 | +| Total inputs (running + pending) | 25,000 | +| Pending `.spawn()` inputs | 1,000,000 | +| Concurrent inputs per `.map()` | 1,000 | +| Rate limit (web endpoints) | 200 req/s | -```python -@app.function() -@modal.concurrent(max_inputs=10) -def sync_function(): - time.sleep(1) # Must be thread-safe -``` - -**Async Functions**: Separate asyncio tasks (must not block event loop) - -```python -@app.function() -@modal.concurrent(max_inputs=10) -async def async_function(): - await asyncio.sleep(1) # Must not block event loop -``` - -### Target vs Max Inputs - -```python -@app.function() -@modal.concurrent( - max_inputs=120, # Hard limit - target_inputs=100 # Autoscaler target -) -def my_function(input: str): - # Allow 20% burst above target - ... -``` - -Autoscaler aims for `target_inputs`, but containers can burst to `max_inputs` during scale-up. - -## Scaling Limits - -Modal enforces limits per function: -- 2,000 pending inputs (not yet assigned to containers) -- 25,000 total inputs (running + pending) - -For `.spawn()` async jobs: up to 1 million pending inputs. - -Exceeding limits returns `Resource Exhausted` error - retry later. - -Each `.map()` invocation: max 1,000 concurrent inputs. 
- -## Async Usage - -Use async APIs for arbitrary parallel execution patterns: - -```python -@app.function() -async def async_task(x): - await asyncio.sleep(1) - return x * 2 - -@app.local_entrypoint() -async def main(): - tasks = [async_task.remote.aio(i) for i in range(100)] - results = await asyncio.gather(*tasks) -``` - -## Common Gotchas - -**Incorrect**: Using Python's builtin map (runs sequentially) -```python -# DON'T DO THIS -results = map(evaluate_model, inputs) -``` - -**Incorrect**: Calling function first -```python -# DON'T DO THIS -results = evaluate_model(inputs).map() -``` - -**Correct**: Call .map() on Modal function object -```python -# DO THIS -results = evaluate_model.map(inputs) -``` +Exceeding these limits triggers `Resource Exhausted` errors. Implement retry logic for resilience. diff --git a/scientific-skills/modal/references/scheduled-jobs.md b/scientific-skills/modal/references/scheduled-jobs.md index ac9a0e1..5b1cb51 100644 --- a/scientific-skills/modal/references/scheduled-jobs.md +++ b/scientific-skills/modal/references/scheduled-jobs.md @@ -1,303 +1,143 @@ -# Scheduled Jobs and Cron +# Modal Scheduled Jobs -## Basic Scheduling +## Overview -Schedule functions to run automatically at regular intervals or specific times. +Modal supports running functions automatically on a schedule, either using cron syntax or fixed intervals. Deploy scheduled functions with `modal deploy` and they run unattended in the cloud. -### Simple Daily Schedule +## Schedule Types + +### modal.Cron + +Standard cron syntax — stable across deploys: ```python import modal -app = modal.App() +app = modal.App("scheduled-tasks") -@app.function(schedule=modal.Period(days=1)) -def daily_task(): - print("Running daily task") - # Process data, send reports, etc. 
+# Daily at 9 AM UTC +@app.function(schedule=modal.Cron("0 9 * * *")) +def daily_report(): + generate_and_send_report() + +# Every Monday at midnight +@app.function(schedule=modal.Cron("0 0 * * 1")) +def weekly_cleanup(): + cleanup_old_data() + +# Every 15 minutes +@app.function(schedule=modal.Cron("*/15 * * * *")) +def frequent_check(): + check_system_health() ``` -Deploy to activate: -```bash -modal deploy script.py +#### Cron Syntax Reference + +``` +┌───────────── minute (0-59) +│ ┌───────────── hour (0-23) +│ │ ┌───────────── day of month (1-31) +│ │ │ ┌───────────── month (1-12) +│ │ │ │ ┌───────────── day of week (0-6, Sun=0) +│ │ │ │ │ +* * * * * ``` -Function runs every 24 hours from deployment time. +| Pattern | Meaning | +|---------|---------| +| `0 9 * * *` | Daily at 9:00 AM UTC | +| `0 */6 * * *` | Every 6 hours | +| `*/30 * * * *` | Every 30 minutes | +| `0 0 * * 1` | Every Monday at midnight | +| `0 0 1 * *` | First day of every month | +| `0 9 * * 1-5` | Weekdays at 9 AM | -## Schedule Types +### modal.Period -### Period Schedules - -Run at fixed intervals from deployment time: +Fixed interval — resets on each deploy: ```python # Every 5 hours @app.function(schedule=modal.Period(hours=5)) -def every_5_hours(): - ... +def periodic_sync(): + sync_data() # Every 30 minutes @app.function(schedule=modal.Period(minutes=30)) -def every_30_minutes(): - ... +def poll_updates(): + check_for_updates() # Every day @app.function(schedule=modal.Period(days=1)) -def daily(): +def daily_task(): ... ``` -**Note**: Redeploying resets the period timer. +`modal.Period` resets its timer on each deployment. If you need a schedule that doesn't shift with deploys, use `modal.Cron`. -### Cron Schedules +## Deploying Scheduled Functions -Run at specific times using cron syntax: - -```python -# Every Monday at 8 AM UTC -@app.function(schedule=modal.Cron("0 8 * * 1")) -def weekly_report(): - ... 
- -# Daily at 6 AM New York time -@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York")) -def morning_report(): - ... - -# Every hour on the hour -@app.function(schedule=modal.Cron("0 * * * *")) -def hourly(): - ... - -# Every 15 minutes -@app.function(schedule=modal.Cron("*/15 * * * *")) -def quarter_hourly(): - ... -``` - -**Cron syntax**: `minute hour day month day_of_week` -- Minute: 0-59 -- Hour: 0-23 -- Day: 1-31 -- Month: 1-12 -- Day of week: 0-6 (0 = Sunday) - -### Timezone Support - -Specify timezone for cron schedules: - -```python -@app.function(schedule=modal.Cron("0 9 * * *", timezone="Europe/London")) -def uk_morning_task(): - ... - -@app.function(schedule=modal.Cron("0 17 * * 5", timezone="Asia/Tokyo")) -def friday_evening_jp(): - ... -``` - -## Deployment - -### Deploy Scheduled Functions +Schedules only activate when deployed: ```bash modal deploy script.py ``` -Scheduled functions persist until explicitly stopped. - -### Programmatic Deployment - -```python -if __name__ == "__main__": - app.deploy() -``` +`modal run` and `modal serve` do not activate schedules. ## Monitoring -### View Execution Logs +- View scheduled runs in the **Apps** section of the Modal dashboard +- Each run appears with its status, duration, and logs +- Use the **"Run Now"** button on the dashboard to trigger manually -Check https://modal.com/apps for: -- Past execution logs -- Execution history -- Failure notifications +## Management -### Run Manually - -Trigger scheduled function immediately via dashboard "Run now" button. - -## Schedule Management - -### Pausing Schedules - -Schedules cannot be paused. To stop: -1. Remove `schedule` parameter -2. Redeploy app - -### Updating Schedules - -Change schedule parameters and redeploy: - -```python -# Update from daily to weekly -@app.function(schedule=modal.Period(days=7)) -def task(): - ... 
-``` - -```bash -modal deploy script.py -``` +- Schedules cannot be paused — remove the schedule and redeploy to stop +- To change a schedule, update the `schedule` parameter and redeploy +- To stop entirely, either remove the `schedule` parameter or run `modal app stop ` ## Common Patterns -### Data Pipeline +### ETL Pipeline ```python @app.function( - schedule=modal.Cron("0 2 * * *"), # 2 AM daily - timeout=3600, # 1 hour timeout + schedule=modal.Cron("0 2 * * *"), # 2 AM UTC daily + secrets=[modal.Secret.from_name("db-creds")], + timeout=7200, ) def etl_pipeline(): - # Extract data from sources - data = extract_data() - - # Transform data - transformed = transform_data(data) - - # Load to warehouse - load_to_warehouse(transformed) + import os + data = extract(os.environ["SOURCE_DB_URL"]) + transformed = transform(data) + load(transformed, os.environ["DEST_DB_URL"]) ``` ### Model Retraining ```python -volume = modal.Volume.from_name("models") - @app.function( - schedule=modal.Cron("0 0 * * 0"), # Weekly on Sunday midnight - gpu="A100", - timeout=7200, # 2 hours - volumes={"/models": volume} + schedule=modal.Cron("0 0 * * 0"), # Weekly on Sunday + gpu="H100", + volumes={"/data": data_vol, "/models": model_vol}, + timeout=86400, ) -def retrain_model(): - # Load latest data - data = load_training_data() - - # Train model - model = train(data) - - # Save new model - save_model(model, "/models/latest.pt") - volume.commit() +def retrain(): + model = train_on_latest_data("/data/training/") + torch.save(model.state_dict(), "/models/latest.pt") ``` -### Report Generation +### Health Checks ```python @app.function( - schedule=modal.Cron("0 9 * * 1"), # Monday 9 AM - secrets=[modal.Secret.from_name("email-creds")] + schedule=modal.Period(minutes=5), + secrets=[modal.Secret.from_name("slack-webhook")], ) -def weekly_report(): - # Generate report - report = generate_analytics_report() - - # Send email - send_email( - to="team@company.com", - subject="Weekly Analytics Report", 
- body=report - ) +def health_check(): + import os, requests + status = check_all_services() + if not status["healthy"]: + requests.post(os.environ["SLACK_URL"], json={"text": f"Alert: {status}"}) ``` - -### Data Cleanup - -```python -@app.function(schedule=modal.Period(hours=6)) -def cleanup_old_data(): - # Remove data older than 30 days - cutoff = datetime.now() - timedelta(days=30) - delete_old_records(cutoff) -``` - -## Configuration with Secrets and Volumes - -Scheduled functions support all function parameters: - -```python -vol = modal.Volume.from_name("data") -secret = modal.Secret.from_name("api-keys") - -@app.function( - schedule=modal.Cron("0 */6 * * *"), # Every 6 hours - secrets=[secret], - volumes={"/data": vol}, - cpu=4.0, - memory=16384, -) -def sync_data(): - import os - - api_key = os.environ["API_KEY"] - - # Fetch from external API - data = fetch_external_data(api_key) - - # Save to volume - with open("/data/latest.json", "w") as f: - json.dump(data, f) - - vol.commit() -``` - -## Dynamic Scheduling - -Update schedules programmatically: - -```python -@app.function() -def main_task(): - ... - -@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York")) -def enable_high_traffic_mode(): - main_task.update_autoscaler(min_containers=5) - -@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York")) -def disable_high_traffic_mode(): - main_task.update_autoscaler(min_containers=0) -``` - -## Error Handling - -Scheduled functions that fail will: -- Show failure in dashboard -- Send notifications (configurable) -- Retry on next scheduled run - -```python -@app.function( - schedule=modal.Cron("0 * * * *"), - retries=3, # Retry failed runs - timeout=1800 -) -def robust_task(): - try: - perform_task() - except Exception as e: - # Log error - print(f"Task failed: {e}") - # Optionally send alert - send_alert(f"Scheduled task failed: {e}") - raise -``` - -## Best Practices - -1. 
**Set timeouts**: Always specify timeout for scheduled functions -2. **Use appropriate schedules**: Period for relative timing, Cron for absolute -3. **Monitor failures**: Check dashboard regularly for failed runs -4. **Idempotent operations**: Design tasks to handle reruns safely -5. **Resource limits**: Set appropriate CPU/memory for scheduled workloads -6. **Timezone awareness**: Specify timezone for cron schedules diff --git a/scientific-skills/modal/references/secrets.md b/scientific-skills/modal/references/secrets.md index 9aa58eb..cfacb24 100644 --- a/scientific-skills/modal/references/secrets.md +++ b/scientific-skills/modal/references/secrets.md @@ -1,180 +1,119 @@ -# Secrets and Environment Variables +# Modal Secrets + +## Overview + +Modal Secrets securely deliver credentials and sensitive data to functions as environment variables. Secrets are stored encrypted and only available to your workspace. ## Creating Secrets -### Via Dashboard - -Create secrets at https://modal.com/secrets - -Templates available for: -- Database credentials (Postgres, MongoDB) -- Cloud providers (AWS, GCP, Azure) -- ML platforms (Weights & Biases, Hugging Face) -- And more - ### Via CLI ```bash -# Create secret with key-value pairs -modal secret create my-secret KEY1=value1 KEY2=value2 +# Create with key-value pairs +modal secret create my-api-keys API_KEY=sk-xxx DB_PASSWORD=hunter2 -# Use environment variables -modal secret create db-secret PGHOST=uri PGPASSWORD="$PGPASSWORD" +# Create from existing environment variables +modal secret create my-env-keys API_KEY=$API_KEY -# List secrets +# List all secrets modal secret list -# Delete secret -modal secret delete my-secret +# Delete a secret +modal secret delete my-api-keys ``` -### Programmatically +### Via Dashboard -From dictionary: +Navigate to https://modal.com/secrets to create and manage secrets. Templates are available for common services (Postgres, MongoDB, Hugging Face, Weights & Biases, etc.). 
+ +### Programmatic (Inline) ```python -if modal.is_local(): - local_secret = modal.Secret.from_dict({"FOO": os.environ["LOCAL_FOO"]}) -else: - local_secret = modal.Secret.from_dict({}) +# From a dictionary (useful for development) +secret = modal.Secret.from_dict({"API_KEY": "sk-xxx"}) -@app.function(secrets=[local_secret]) -def some_function(): - import os - print(os.environ["FOO"]) +# From a .env file +secret = modal.Secret.from_dotenv() + +# From a named secret (created via CLI or dashboard) +secret = modal.Secret.from_name("my-api-keys") ``` -From .env file: +## Using Secrets in Functions + +### Basic Usage ```python -@app.function(secrets=[modal.Secret.from_dotenv()]) -def some_function(): +@app.function(secrets=[modal.Secret.from_name("my-api-keys")]) +def call_api(): import os - print(os.environ["USERNAME"]) -``` - -## Using Secrets - -Inject secrets into functions: - -```python -@app.function(secrets=[modal.Secret.from_name("my-secret")]) -def some_function(): - import os - secret_key = os.environ["MY_PASSWORD"] - # Use secret - ... + api_key = os.environ["API_KEY"] + # Use the key + response = requests.get(url, headers={"Authorization": f"Bearer {api_key}"}) + return response.json() ``` ### Multiple Secrets ```python @app.function(secrets=[ + modal.Secret.from_name("openai-keys"), modal.Secret.from_name("database-creds"), - modal.Secret.from_name("api-keys"), ]) -def other_function(): - # All keys from both secrets available +def process(): + import os + openai_key = os.environ["OPENAI_API_KEY"] + db_url = os.environ["DATABASE_URL"] ... ``` -Later secrets override earlier ones if keys clash. +Secrets are applied in order — if two secrets define the same key, the later one wins. 
-## Environment Variables - -### Reserved Runtime Variables - -**All Containers**: -- `MODAL_CLOUD_PROVIDER` - Cloud provider (AWS/GCP/OCI) -- `MODAL_IMAGE_ID` - Image ID -- `MODAL_REGION` - Region identifier (e.g., us-east-1) -- `MODAL_TASK_ID` - Container task ID - -**Function Containers**: -- `MODAL_ENVIRONMENT` - Modal Environment name -- `MODAL_IS_REMOTE` - Set to '1' in remote containers -- `MODAL_IDENTITY_TOKEN` - OIDC token for function identity - -**Sandbox Containers**: -- `MODAL_SANDBOX_ID` - Sandbox ID - -### Setting Environment Variables - -Via Image: +### With Classes ```python -image = modal.Image.debian_slim().env({"PORT": "6443"}) - -@app.function(image=image) -def my_function(): - import os - port = os.environ["PORT"] +@app.cls(secrets=[modal.Secret.from_name("huggingface")]) +class ModelService: + @modal.enter() + def load(self): + import os + token = os.environ["HF_TOKEN"] + self.model = AutoModel.from_pretrained("model-name", token=token) ``` -Via Secrets: +### From .env File ```python -secret = modal.Secret.from_dict({"API_KEY": "secret-value"}) - -@app.function(secrets=[secret]) -def my_function(): +# Reads .env file from current directory +@app.function(secrets=[modal.Secret.from_dotenv()]) +def local_dev(): import os api_key = os.environ["API_KEY"] ``` -## Common Secret Patterns +The `.env` file format: -### AWS Credentials - -```python -aws_secret = modal.Secret.from_name("my-aws-secret") - -@app.function(secrets=[aws_secret]) -def use_aws(): - import boto3 - s3 = boto3.client('s3') - # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY automatically used +``` +API_KEY=sk-xxx +DATABASE_URL=postgres://user:pass@host/db +DEBUG=false ``` -### Hugging Face Token +## Common Secret Templates -```python -hf_secret = modal.Secret.from_name("huggingface") - -@app.function(secrets=[hf_secret]) -def download_model(): - from transformers import AutoModel - # HF_TOKEN automatically used for authentication - model = AutoModel.from_pretrained("private-model") 
-``` - -### Database Credentials - -```python -db_secret = modal.Secret.from_name("postgres-creds") - -@app.function(secrets=[db_secret]) -def query_db(): - import psycopg2 - conn = psycopg2.connect( - host=os.environ["PGHOST"], - port=os.environ["PGPORT"], - user=os.environ["PGUSER"], - password=os.environ["PGPASSWORD"], - ) -``` - -## Best Practices - -1. **Never hardcode secrets** - Always use Modal Secrets -2. **Use specific secrets** - Create separate secrets for different purposes -3. **Rotate secrets regularly** - Update secrets periodically -4. **Minimal scope** - Only attach secrets to functions that need them -5. **Environment-specific** - Use different secrets for dev/staging/prod +| Service | Typical Keys | +|---------|-------------| +| OpenAI | `OPENAI_API_KEY` | +| Hugging Face | `HF_TOKEN` | +| AWS | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` | +| Postgres | `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE` | +| Weights & Biases | `WANDB_API_KEY` | +| GitHub | `GITHUB_TOKEN` | ## Security Notes -- Secrets are encrypted at rest -- Only available to functions that explicitly request them -- Not logged or exposed in dashboards -- Can be scoped to specific environments +- Secrets are encrypted at rest and in transit +- Only accessible to functions in your workspace +- Never log or print secret values +- Use `.from_name()` in production (not `.from_dict()`) +- Rotate secrets regularly via the dashboard or CLI diff --git a/scientific-skills/modal/references/volumes.md b/scientific-skills/modal/references/volumes.md index 9cbb1b2..8a7a3d7 100644 --- a/scientific-skills/modal/references/volumes.md +++ b/scientific-skills/modal/references/volumes.md @@ -1,303 +1,247 @@ # Modal Volumes +## Table of Contents + +- [Overview](#overview) +- [Creating Volumes](#creating-volumes) +- [Mounting Volumes](#mounting-volumes) +- [Reading and Writing Files](#reading-and-writing-files) +- [CLI Access](#cli-access) +- [Commits and Reloads](#commits-and-reloads) +- 
[Concurrent Access](#concurrent-access) +- [Volumes v2](#volumes-v2) +- [Common Patterns](#common-patterns) + ## Overview -Modal Volumes provide high-performance distributed file systems for Modal applications. Designed for write-once, read-many workloads like ML model weights and distributed data processing. +Volumes are Modal's distributed file system, optimized for write-once, read-many workloads like storing model weights and distributing them across containers. + +Key characteristics: +- Persistent across function invocations and deployments +- Mountable by multiple functions simultaneously +- Background auto-commits every few seconds +- Final commit on container shutdown ## Creating Volumes +### In Code (Lazy Creation) + +```python +vol = modal.Volume.from_name("my-volume", create_if_missing=True) +``` + ### Via CLI ```bash modal volume create my-volume + +# v2 volume (beta) +modal volume create my-volume --version=2 ``` -For Volumes v2 (beta): -```bash -modal volume create --version=2 my-volume -``` - -### From Code +### Programmatic v2 ```python -vol = modal.Volume.from_name("my-volume", create_if_missing=True) - -# For v2 vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2) ``` -## Using Volumes +## Mounting Volumes -Attach to functions via mount points: +Mount volumes to functions via the `volumes` parameter: ```python -vol = modal.Volume.from_name("my-volume") +vol = modal.Volume.from_name("model-store", create_if_missing=True) -@app.function(volumes={"/data": vol}) -def run(): - with open("/data/xyz.txt", "w") as f: - f.write("hello") - vol.commit() # Persist changes +@app.function(volumes={"/models": vol}) +def use_model(): + # Access files at /models/ + with open("/models/config.json") as f: + config = json.load(f) ``` -## Commits and Reloads - -### Commits - -Persist changes to Volume: +Mount multiple volumes: ```python -@app.function(volumes={"/data": vol}) -def write_data(): - with open("/data/file.txt", "w") as f: - 
f.write("data") - vol.commit() # Make changes visible to other containers -``` +weights_vol = modal.Volume.from_name("weights") +data_vol = modal.Volume.from_name("datasets") -**Background commits**: Modal automatically commits Volume changes every few seconds and on container shutdown. - -### Reloads - -Fetch latest changes from other containers: - -```python -@app.function(volumes={"/data": vol}) -def read_data(): - vol.reload() # Fetch latest changes - with open("/data/file.txt", "r") as f: - content = f.read() -``` - -At container creation, latest Volume state is mounted. Reload needed to see subsequent commits from other containers. - -## Uploading Files - -### Batch Upload (Efficient) - -```python -vol = modal.Volume.from_name("my-volume") - -with vol.batch_upload() as batch: - batch.put_file("local-path.txt", "/remote-path.txt") - batch.put_directory("/local/directory/", "/remote/directory") - batch.put_file(io.BytesIO(b"some data"), "/foobar") -``` - -### Via Image - -```python -image = modal.Image.debian_slim().add_local_dir( - local_path="/home/user/my_dir", - remote_path="/app" -) - -@app.function(image=image) -def process(): - # Files available at /app +@app.function(volumes={"/weights": weights_vol, "/data": data_vol}) +def train(): ... 
``` -## Downloading Files +## Reading and Writing Files -### Via CLI - -```bash -modal volume get my-volume remote.txt local.txt -``` - -Max file size via CLI: No limit -Max file size via dashboard: 16 MB - -### Via Python SDK +### Writing ```python -vol = modal.Volume.from_name("my-volume") +@app.function(volumes={"/data": vol}) +def save_results(results): + import json + import os -for data in vol.read_file("path.txt"): - print(data) + os.makedirs("/data/outputs", exist_ok=True) + with open("/data/outputs/results.json", "w") as f: + json.dump(results, f) ``` -## Volume Performance - -### Volumes v1 - -Best for: -- <50,000 files (recommended) -- <500,000 files (hard limit) -- Sequential access patterns -- <5 concurrent writers - -### Volumes v2 (Beta) - -Improved for: -- Unlimited files -- Hundreds of concurrent writers -- Random access patterns -- Large files (up to 1 TiB) - -Current v2 limits: -- Max file size: 1 TiB -- Max files per directory: 32,768 -- Unlimited directory depth - -## Model Storage - -### Saving Model Weights +### Reading ```python -volume = modal.Volume.from_name("model-weights", create_if_missing=True) -MODEL_DIR = "/models" +@app.function(volumes={"/data": vol}) +def load_results(): + with open("/data/outputs/results.json") as f: + return json.load(f) +``` -@app.function(volumes={MODEL_DIR: volume}) -def train(): +### Large Files (Model Weights) + +```python +@app.function(volumes={"/models": vol}, gpu="L40S") +def save_model(): + import torch model = train_model() - save_model(f"{MODEL_DIR}/my_model.pt", model) - volume.commit() + torch.save(model.state_dict(), "/models/checkpoint.pt") + +@app.function(volumes={"/models": vol}, gpu="L40S") +def load_model(): + import torch + model = MyModel() + model.load_state_dict(torch.load("/models/checkpoint.pt")) + return model ``` -### Loading Model Weights - -```python -@app.function(volumes={MODEL_DIR: volume}) -def inference(model_id: str): - try: - model = load_model(f"{MODEL_DIR}/{model_id}") - 
except NotFound: - volume.reload() # Fetch latest models - model = load_model(f"{MODEL_DIR}/{model_id}") - return model.run(request) -``` - -## Model Checkpointing - -Save checkpoints during long training jobs: - -```python -volume = modal.Volume.from_name("checkpoints") -VOL_PATH = "/vol" - -@app.function( - gpu="A10G", - timeout=2*60*60, # 2 hours - volumes={VOL_PATH: volume} -) -def finetune(): - from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments - - training_args = Seq2SeqTrainingArguments( - output_dir=str(VOL_PATH / "model"), # Checkpoints saved to Volume - save_steps=100, - # ... more args - ) - - trainer = Seq2SeqTrainer(model=model, args=training_args, ...) - trainer.train() -``` - -Background commits ensure checkpoints persist even if training is interrupted. - -## CLI Commands +## CLI Access ```bash # List files modal volume ls my-volume +modal volume ls my-volume /subdir/ -# Upload -modal volume put my-volume local.txt remote.txt +# Upload files +modal volume put my-volume local_file.txt +modal volume put my-volume local_file.txt /remote/path/file.txt -# Download -modal volume get my-volume remote.txt local.txt +# Download files +modal volume get my-volume /remote/file.txt local_file.txt -# Copy within Volume -modal volume cp my-volume src.txt dst.txt - -# Delete -modal volume rm my-volume file.txt - -# List all volumes -modal volume list - -# Delete volume +# Delete a volume modal volume delete my-volume ``` -## Ephemeral Volumes +## Commits and Reloads -Create temporary volumes that are garbage collected: +Modal auto-commits volume changes in the background every few seconds and on container shutdown. 
+ +### Explicit Commit + +Force an immediate commit: ```python -with modal.Volume.ephemeral() as vol: - sb = modal.Sandbox.create( - volumes={"/cache": vol}, - app=my_app, - ) - # Use volume - # Automatically cleaned up when context exits +@app.function(volumes={"/data": vol}) +def writer(): + with open("/data/file.txt", "w") as f: + f.write("hello") + vol.commit() # Make immediately visible to other containers +``` + +### Reload + +See changes from other containers: + +```python +@app.function(volumes={"/data": vol}) +def reader(): + vol.reload() # Refresh to see latest writes + with open("/data/file.txt") as f: + return f.read() ``` ## Concurrent Access -### Concurrent Reads +### v1 Volumes -Multiple containers can read simultaneously without issues. +- Recommended max 5 concurrent commits +- Last write wins for concurrent modifications of the same file +- Avoid concurrent modification of identical files +- Max 500,000 files (inodes) -### Concurrent Writes +### v2 Volumes -Supported but: -- Avoid modifying same files concurrently -- Last write wins (data loss possible) -- v1: Limit to ~5 concurrent writers -- v2: Hundreds of concurrent writers supported +- Hundreds of concurrent writers (distinct files) +- No file count limit +- Improved random access performance +- Up to 1 TiB per file, 262,144 files per directory -## Volume Errors +## Volumes v2 -### "Volume Busy" +v2 Volumes (beta) offer significant improvements: -Cannot reload when files are open: +| Feature | v1 | v2 | +|---------|----|----| +| Max files | 500,000 | Unlimited | +| Concurrent writes | ~5 | Hundreds | +| Max file size | No limit | 1 TiB | +| Random access | Limited | Full support | +| HIPAA compliance | No | Yes | +| Hard links | No | Yes | + +Enable v2: ```python -# WRONG -f = open("/vol/data.txt", "r") -volume.reload() # ERROR: volume busy +vol = modal.Volume.from_name("my-vol-v2", create_if_missing=True, version=2) ``` +## Common Patterns + +### Model Weight Storage + ```python -# CORRECT 
-with open("/vol/data.txt", "r") as f: - data = f.read() -# File closed before reload -volume.reload() +vol = modal.Volume.from_name("model-weights", create_if_missing=True) + +# Download once during image build +def download_weights(): + from huggingface_hub import snapshot_download + snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3") + +image = ( + modal.Image.debian_slim() + .uv_pip_install("huggingface_hub") + .run_function(download_weights, volumes={"/models": vol}) +) ``` -### "File Not Found" - -Remember to use mount point: +### Training Checkpoints ```python -# WRONG - file saved to local disk -with open("/xyz.txt", "w") as f: - f.write("data") - -# CORRECT - file saved to Volume -with open("/data/xyz.txt", "w") as f: - f.write("data") +@app.function(volumes={"/checkpoints": vol}, gpu="H100", timeout=86400) +def train(): + for epoch in range(100): + train_one_epoch() + torch.save(model.state_dict(), f"/checkpoints/epoch_{epoch}.pt") + vol.commit() # Save checkpoint immediately ``` -## Upgrading from v1 to v2 +### Shared Data Between Functions -No automated migration currently. Manual steps: +```python +data_vol = modal.Volume.from_name("shared-data", create_if_missing=True) -1. Create new v2 Volume -2. Copy data using `cp` or `rsync` -3. Update app to use new Volume +@app.function(volumes={"/data": data_vol}) +def preprocess(): + # Write processed data + df.to_parquet("/data/processed.parquet") -```bash -modal volume create --version=2 my-volume-v2 -modal shell --volume my-volume --volume my-volume-v2 - -# In shell: -cp -rp /mnt/my-volume/. /mnt/my-volume-v2/. -sync /mnt/my-volume-v2 +@app.function(volumes={"/data": data_vol}) +def analyze(): + data_vol.reload() # Ensure we see latest data + df = pd.read_parquet("/data/processed.parquet") + return df.describe() ``` -Warning: Deployed apps reference Volumes by ID. Re-deploy after creating new Volume. 
+### Performance Tips + +- Volumes are optimized for large files, not many small files +- Keep under 50,000 files and directories for best v1 performance +- Use Parquet or other columnar formats instead of many small CSVs +- For truly temporary data, use `ephemeral_disk` instead of Volumes diff --git a/scientific-skills/modal/references/web-endpoints.md b/scientific-skills/modal/references/web-endpoints.md index 5b08f86..63036b5 100644 --- a/scientific-skills/modal/references/web-endpoints.md +++ b/scientific-skills/modal/references/web-endpoints.md @@ -1,337 +1,254 @@ -# Web Endpoints +# Modal Web Endpoints -## Quick Start +## Table of Contents -Create web endpoint with single decorator: - -```python -image = modal.Image.debian_slim().pip_install("fastapi[standard]") - -@app.function(image=image) -@modal.fastapi_endpoint() -def hello(): - return "Hello world!" -``` - -## Development and Deployment - -### Development with `modal serve` - -```bash -modal serve server.py -``` - -Creates ephemeral app with live-reloading. Changes to endpoints appear almost immediately. - -### Deployment with `modal deploy` - -```bash -modal deploy server.py -``` - -Creates persistent endpoint with stable URL. 
+- [Simple Endpoints](#simple-endpoints) +- [Deployment](#deployment) +- [ASGI Apps](#asgi-apps-fastapi-starlette-fasthtml) +- [WSGI Apps](#wsgi-apps-flask-django) +- [Custom Web Servers](#custom-web-servers) +- [WebSockets](#websockets) +- [Authentication](#authentication) +- [Streaming](#streaming) +- [Concurrency](#concurrency) +- [Limits](#limits) ## Simple Endpoints -### Query Parameters +The easiest way to create a web endpoint: ```python -@app.function(image=image) -@modal.fastapi_endpoint() -def square(x: int): - return {"square": x**2} -``` +import modal -Call with: -```bash -curl "https://workspace--app-square.modal.run?x=42" -``` - -### POST Requests - -```python -@app.function(image=image) -@modal.fastapi_endpoint(method="POST") -def square(item: dict): - return {"square": item['x']**2} -``` - -Call with: -```bash -curl -X POST -H 'Content-Type: application/json' \ - --data '{"x": 42}' \ - https://workspace--app-square.modal.run -``` - -### Pydantic Models - -```python -from pydantic import BaseModel - -class Item(BaseModel): - name: str - qty: int = 42 +app = modal.App("api-service") @app.function() -@modal.fastapi_endpoint(method="POST") -def process(item: Item): - return {"processed": item.name, "quantity": item.qty} +@modal.fastapi_endpoint() +def hello(name: str = "World"): + return {"message": f"Hello, {name}!"} ``` +### POST Endpoints + +```python +@app.function() +@modal.fastapi_endpoint(method="POST") +def predict(data: dict): + result = model.predict(data["text"]) + return {"prediction": result} +``` + +### Query Parameters + +Parameters are automatically parsed from query strings: + +```python +@app.function() +@modal.fastapi_endpoint() +def search(query: str, limit: int = 10): + return {"results": do_search(query, limit)} +``` + +Access via: `https://your-app.modal.run?query=hello&limit=5` + +## Deployment + +### Development Mode + +```bash +modal serve script.py +``` + +- Creates a temporary public URL +- Hot-reloads on file changes +- 
Perfect for development and testing +- URL expires when you stop the command + +### Production Deployment + +```bash +modal deploy script.py +``` + +- Creates a permanent URL +- Runs persistently in the cloud +- Autoscales based on traffic +- URL format: `https://---.modal.run` + ## ASGI Apps (FastAPI, Starlette, FastHTML) -Serve full ASGI applications: +For full framework applications, use `@modal.asgi_app`: ```python -image = modal.Image.debian_slim().pip_install("fastapi[standard]") +from fastapi import FastAPI -@app.function(image=image) -@modal.concurrent(max_inputs=100) +web_app = FastAPI() + +@web_app.get("/") +async def root(): + return {"status": "ok"} + +@web_app.post("/predict") +async def predict(request: dict): + return {"result": model.run(request["input"])} + +@app.function(image=image, gpu="L40S") @modal.asgi_app() def fastapi_app(): - from fastapi import FastAPI - - web_app = FastAPI() - - @web_app.get("/") - async def root(): - return {"message": "Hello"} - - @web_app.post("/echo") - async def echo(request: Request): - body = await request.json() - return body - return web_app ``` +### With Class Lifecycle + +```python +@app.cls(gpu="L40S", image=image) +class InferenceService: + @modal.enter() + def load_model(self): + self.model = load_model() + + @modal.asgi_app() + def serve(self): + from fastapi import FastAPI + app = FastAPI() + + @app.post("/generate") + async def generate(request: dict): + return self.model.generate(request["prompt"]) + + return app +``` + ## WSGI Apps (Flask, Django) -Serve synchronous web frameworks: - ```python -image = modal.Image.debian_slim().pip_install("flask") +from flask import Flask + +flask_app = Flask(__name__) + +@flask_app.route("/") +def index(): + return {"status": "ok"} @app.function(image=image) -@modal.concurrent(max_inputs=100) @modal.wsgi_app() -def flask_app(): - from flask import Flask, request - - web_app = Flask(__name__) - - @web_app.post("/echo") - def echo(): - return request.json - - return 
web_app +def flask_server(): + return flask_app ``` -## Non-ASGI Web Servers +WSGI is synchronous — concurrent inputs run on separate threads. -For frameworks with custom network binding: +## Custom Web Servers -> ⚠️ **Security Note**: The example below uses `shell=True` for simplicity. In production environments, prefer using `subprocess.Popen()` with a list of arguments to prevent command injection vulnerabilities. +For non-standard web frameworks (aiohttp, Tornado, TGI): ```python -@app.function() -@modal.concurrent(max_inputs=100) -@modal.web_server(8000) -def my_server(): +@app.function(image=image, gpu="H100") +@modal.web_server(port=8000) +def serve(): import subprocess - # Must bind to 0.0.0.0, not 127.0.0.1 - # Use list form instead of shell=True for security - subprocess.Popen(["python", "-m", "http.server", "-d", "/", "8000"]) + subprocess.Popen([ + "python", "-m", "vllm.entrypoints.openai.api_server", + "--model", "meta-llama/Llama-3-70B", + "--host", "0.0.0.0", # Must bind to 0.0.0.0, not localhost + "--port", "8000", + ]) ``` -## Streaming Responses - -Use FastAPI's `StreamingResponse`: - -```python -import time - -def event_generator(): - for i in range(10): - yield f"data: event {i}\n\n".encode() - time.sleep(0.5) - -@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]")) -@modal.fastapi_endpoint() -def stream(): - from fastapi.responses import StreamingResponse - return StreamingResponse( - event_generator(), - media_type="text/event-stream" - ) -``` - -### Streaming from Modal Functions - -```python -@app.function(gpu="any") -def process_gpu(): - for i in range(10): - yield f"data: result {i}\n\n".encode() - time.sleep(1) - -@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]")) -@modal.fastapi_endpoint() -def hook(): - from fastapi.responses import StreamingResponse - return StreamingResponse( - process_gpu.remote_gen(), - media_type="text/event-stream" - ) -``` - -### With .map() - -```python 
-@app.function() -def process_segment(i): - return f"segment {i}\n" - -@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]")) -@modal.fastapi_endpoint() -def stream_parallel(): - from fastapi.responses import StreamingResponse - return StreamingResponse( - process_segment.map(range(10)), - media_type="text/plain" - ) -``` +The application must bind to `0.0.0.0` (not `127.0.0.1`). ## WebSockets -Supported with `@web_server`, `@asgi_app`, and `@wsgi_app`. Maintains single function call per connection. Use with `@modal.concurrent` for multiple simultaneous connections. +Supported with `@modal.asgi_app`, `@modal.wsgi_app`, and `@modal.web_server`: -Full WebSocket protocol (RFC 6455) supported. Messages up to 2 MiB each. +```python +from fastapi import FastAPI, WebSocket + +web_app = FastAPI() + +@web_app.websocket("/ws") +async def websocket_endpoint(websocket: WebSocket): + await websocket.accept() + while True: + data = await websocket.receive_text() + result = process(data) + await websocket.send_text(result) + +@app.function() +@modal.asgi_app() +def ws_app(): + return web_app +``` + +- Full WebSocket protocol (RFC 6455) +- Messages up to 2 MiB each +- No RFC 8441 or RFC 7692 support yet ## Authentication -### Proxy Auth Tokens +### Proxy Auth Tokens (Built-in) -First-class authentication via Modal: +Modal provides first-class endpoint protection via proxy auth tokens: ```python @app.function() @modal.fastapi_endpoint() -def protected(): - return "authenticated!" +def protected(text: str): + return {"result": process(text)} ``` -Protect with tokens in settings, pass in headers: -- `Modal-Key` -- `Modal-Secret` +Clients include `Modal-Key` and `Modal-Secret` headers to authenticate. 
-### Bearer Token Authentication +### Custom Bearer Tokens ```python -from fastapi import Depends, HTTPException, status -from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials +from fastapi import Header, HTTPException -auth_scheme = HTTPBearer() - -@app.function(secrets=[modal.Secret.from_name("auth-token")]) -@modal.fastapi_endpoint() -async def protected(token: HTTPAuthorizationCredentials = Depends(auth_scheme)): +@app.function(secrets=[modal.Secret.from_name("auth-secret")]) +@modal.fastapi_endpoint(method="POST") +def secure_predict(data: dict, authorization: str = Header(None)): import os - if token.credentials != os.environ["AUTH_TOKEN"]: - raise HTTPException( - status_code=status.HTTP_401_UNAUTHORIZED, - detail="Invalid token" - ) - return "success!" + expected = os.environ["AUTH_TOKEN"] + if authorization != f"Bearer {expected}": + raise HTTPException(status_code=401, detail="Unauthorized") + return {"result": model.predict(data["text"])} ``` -### Client IP Address +### Client IP Access + +Available for geolocation, rate limiting, and access control. + +## Streaming + +### Server-Sent Events (SSE) ```python -from fastapi import Request +from fastapi.responses import StreamingResponse -@app.function() +@app.function(gpu="H100") @modal.fastapi_endpoint() -def get_ip(request: Request): - return f"Your IP: {request.client.host}" +def stream_generate(prompt: str): + def generate(): + for token in model.stream(prompt): + yield f"data: {token}\n\n" + return StreamingResponse(generate(), media_type="text/event-stream") ``` -## Web Endpoint URLs +## Concurrency -### Auto-Generated URLs - -Format: `https://---.modal.run` - -With environment suffix: `https://----.modal.run` - -### Custom Labels +Handle multiple requests per container using `@modal.concurrent`: ```python -@app.function() -@modal.fastapi_endpoint(label="api") -def handler(): - ... 
-# URL: https://workspace--api.modal.run +@app.function(gpu="L40S") +@modal.concurrent(max_inputs=10) +@modal.fastapi_endpoint(method="POST") +async def batch_predict(data: dict): + return {"result": await model.predict_async(data["text"])} ``` -### Programmatic URL Retrieval - -```python -@app.function() -@modal.fastapi_endpoint() -def my_endpoint(): - url = my_endpoint.get_web_url() - return {"url": url} - -# From deployed function -f = modal.Function.from_name("app-name", "my_endpoint") -url = f.get_web_url() -``` - -### Custom Domains - -Available on Team and Enterprise plans: - -```python -@app.function() -@modal.fastapi_endpoint(custom_domains=["api.example.com"]) -def hello(message: str): - return {"message": f"hello {message}"} -``` - -Multiple domains: -```python -@modal.fastapi_endpoint(custom_domains=["api.example.com", "api.example.net"]) -``` - -Wildcard domains: -```python -@modal.fastapi_endpoint(custom_domains=["*.example.com"]) -``` - -TLS certificates automatically generated and renewed. - -## Performance - -### Cold Starts - -First request may experience cold start (few seconds). Modal keeps containers alive for subsequent requests. - -### Scaling - -- Autoscaling based on traffic -- Use `@modal.concurrent` for multiple requests per container -- Beyond concurrency limit, additional containers spin up -- Requests queue when at max containers - -### Rate Limits - -Default: 200 requests/second with 5-second burst multiplier -- Excess returns 429 status code -- Contact support to increase limits - -### Size Limits +## Limits - Request body: up to 4 GiB - Response body: unlimited -- WebSocket messages: up to 2 MiB +- Rate limit: 200 requests/second (5-second burst for new accounts) +- Cold starts occur when no containers are active (use `min_containers` to avoid)
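Note on the rate limit in the new `## Limits` list: the removed "Rate Limits" subsection in this diff said that excess requests return a 429 status code, which the new list no longer mentions. A minimal client-side sketch of handling that case with exponential backoff follows. The helper name `call_with_backoff`, the `send` callable, and the default attempt count and delays are hypothetical choices for illustration and are not part of Modal's API.

```python
import random
import time
from typing import Callable, Tuple

# Status code that the removed "Rate Limits" text says excess requests receive.
RATE_LIMITED = 429


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable that performs one HTTP
    request against the endpoint and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != RATE_LIMITED:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: the upper bound doubles each
            # retry (0.5s, 1s, 2s, ...) and is scaled by a factor in 0.5..1.0.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    # Still rate limited after the final attempt; let the caller decide.
    return status, body
```

In practice `send` might wrap `urllib.request.urlopen` and translate `urllib.error.HTTPError.code` into the returned status. The `sleep` parameter is injectable so the retry schedule can be checked without waiting.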