Update Modal skill

This commit is contained in:
Timothy Kassis
2026-03-23 16:21:31 -07:00
parent 71e26ffa6d
commit b75f4e8d08
15 changed files with 2062 additions and 2413 deletions

View File

@@ -6,7 +6,7 @@
},
"metadata": {
"description": "Claude scientific skills from K-Dense Inc",
"version": "2.30.0"
},
"plugins": [
{

View File

@@ -77,7 +77,7 @@
### Data Management & Infrastructure
- **LaminDB** - Open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). Provides unified platform combining lakehouse architecture, lineage tracking, feature stores, biological ontologies (via Bionty plugin with 20+ ontologies: genes, proteins, cell types, tissues, diseases, pathways), LIMS, and ELN capabilities through a single Python API. Key features include: automatic data lineage tracking (code, inputs, outputs, environment), versioned artifacts (DataFrame, AnnData, SpatialData, Parquet, Zarr), schema validation and data curation with standardization/synonym mapping, queryable metadata with feature-based filtering, cross-registry traversal, and streaming for large datasets. Supports integrations with workflow managers (Nextflow, Snakemake, Redun), MLOps platforms (Weights & Biases, MLflow, HuggingFace, scVI-tools), cloud storage (S3, GCS, S3-compatible), array stores (TileDB-SOMA, DuckDB), and visualization (Vitessce). Deployment options: local SQLite, cloud storage with SQLite, or cloud storage with PostgreSQL for production. Use cases: scRNA-seq standardization and analysis, flow cytometry/spatial data management, multi-modal dataset integration, computational workflow tracking with reproducibility, biological ontology-based annotation, data lakehouse construction for unified queries, ML pipeline integration with experiment tracking, and FAIR-compliant dataset publishing
- **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200, B200+), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv (recommended)/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container, up to 1,536 GB VRAM), persistent storage via Volumes (v1 and v2) for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs (FastAPI, ASGI, WSGI, WebSockets), parallel execution with `.map()` for batch processing, input concurrency and dynamic batching for I/O-bound workloads, and resource configuration (CPU cores, memory, ephemeral disk up to 3 TiB). Supports custom Docker images, Micromamba/Conda environments, integration with Hugging Face/Weights & Biases, and distributed multi-GPU training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, speech, embeddings), GPU-accelerated training and fine-tuning, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, protein folding and computational biology, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation
### Cheminformatics & Drug Discovery
- **Datamol** - Python library for molecular manipulation and featurization built on RDKit with enhanced workflows and performance optimizations. Provides utilities for molecular I/O (reading/writing SMILES, SDF, MOL files), molecular standardization and sanitization, molecular transformations (tautomer enumeration, stereoisomer generation), molecular featurization (descriptors, fingerprints, graph representations), parallel processing for large datasets, and integration with machine learning pipelines. Features include: optimized RDKit operations, caching for repeated computations, molecular filtering and preprocessing, and seamless integration with pandas DataFrames. Designed for drug discovery and cheminformatics workflows requiring efficient processing of large compound libraries. Use cases: molecular preprocessing for ML models, compound library management, molecular similarity searches, and cheminformatics data pipelines

View File

@@ -1,7 +1,7 @@
---
name: modal
description: Cloud computing platform for running Python on GPUs and serverless infrastructure. Use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud. Use this skill whenever the user mentions Modal, serverless GPU compute, deploying ML models to the cloud, serving inference endpoints, running batch processing in the cloud, or needs to scale Python workloads beyond their local machine. Also use when the user wants to run code on H100s, A100s, or other cloud GPUs, or needs to create a web API for a model.
license: Apache-2.0
metadata:
  skill-author: K-Dense Inc.
---
@@ -10,372 +10,391 @@ metadata:
## Overview
Modal is a cloud platform for running Python code serverlessly, with a focus on AI/ML workloads. Key capabilities:
- **GPU compute** on demand (T4, L4, A10, L40S, A100, H100, H200, B200)
- **Serverless functions** with autoscaling from zero to thousands of containers
- **Custom container images** built entirely in Python code
- **Persistent storage** via Volumes for model weights and datasets
- **Web endpoints** for serving models and APIs
- **Scheduled jobs** via cron or fixed intervals
- **Sub-second cold starts** for low-latency inference
Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits.
Everything in Modal is defined as code — no YAML, no Dockerfiles required (though both are supported).
## When to Use This Skill
Use this skill when you need to:
- Deploy or serve AI/ML models in the cloud
- Run GPU-accelerated computations (training, inference, fine-tuning)
- Create serverless web APIs or endpoints
- Scale batch processing jobs in parallel
- Schedule recurring tasks (data pipelines, retraining, scraping)
- Store model weights or datasets in persistent cloud storage
- Run code in custom container environments
- Build job queues or async task processing systems
## Installation and Authentication
Modal requires authentication via API token.
### Install
```bash
uv pip install modal
```
### Authenticate
```bash
modal setup
```
This opens a browser for authentication. For CI/CD or headless environments, set environment variables:
```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```
Generate tokens at https://modal.com/settings
Modal offers a free tier with $30/month in credits.
**Reference**: See `references/getting-started.md` for detailed setup and first app walkthrough.
## Core Concepts
### App and Functions
A Modal `App` groups related functions. Functions decorated with `@app.function()` run remotely in the cloud:
```python
import modal

app = modal.App("my-app")

@app.function()
def square(x):
    return x ** 2

@app.local_entrypoint()
def main():
    # .remote() runs in the cloud
    print(square.remote(42))
```
Run with `modal run script.py`. Deploy with `modal deploy script.py`.
**Reference**: See `references/functions.md` for lifecycle hooks, classes, `.map()`, `.spawn()`, and more.
### Container Images
Modal builds container images from Python code. The recommended package installer is `uv`:
```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch==2.8.0", "transformers", "accelerate")
    .apt_install("git")
)

@app.function(image=image)
def inference(prompt):
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3-8B")
    return pipe(prompt)
```
Key image methods:
- `.uv_pip_install()` — Install Python packages with uv (recommended)
- `.pip_install()` — Install with pip (fallback)
- `.apt_install()` — Install system packages
- `.run_commands()` — Run shell commands during build
- `.run_function()` — Run Python during build (e.g., download model weights)
- `.add_local_python_source()` — Add local modules
- `.env()` — Set environment variables
**Reference**: See `references/images.md` for Dockerfiles, micromamba, caching, GPU build steps.
### GPU Compute
Request GPUs via the `gpu` parameter:
```python
@app.function(gpu="H100")
def train_model():
    import torch
    device = torch.device("cuda")
    # GPU training code here

# Multiple GPUs
@app.function(gpu="H100:4")
def distributed_training():
    ...

# GPU fallback chain
@app.function(gpu=["H100", "A100-80GB", "A100-40GB"])
def flexible_inference():
    ...
```
Available GPUs: T4, L4, A10, L40S, A100-40GB, A100-80GB, H100, H200, B200, B200+
- Up to 8 GPUs per container (except A10: up to 4)
- L40S is recommended for inference (cost/performance balance, 48 GB VRAM)
- H100/A100 can be auto-upgraded to H200/A100-80GB at no extra cost
- Use `gpu="H100!"` to prevent auto-upgrade
**Reference**: See `references/gpu.md` for GPU selection guidance and multi-GPU training.
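The GPU request is a plain string. As a quick illustration of the `"TYPE:COUNT"` format, here is a tiny hypothetical helper (`gpu_spec` is not part of the Modal SDK):

```python
# Hypothetical helper (not in the Modal SDK): build a GPU request string
# of the form "TYPE" or "TYPE:COUNT" for @app.function(gpu=...).
def gpu_spec(gpu_type: str, count: int = 1) -> str:
    if not 1 <= count <= 8:
        raise ValueError("Modal allows at most 8 GPUs per container")
    return gpu_type if count == 1 else f"{gpu_type}:{count}"
```

So `@app.function(gpu=gpu_spec("H100", 4))` is equivalent to writing `gpu="H100:4"` directly.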
### Volumes (Persistent Storage)
Volumes provide distributed, persistent file storage:
```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

@app.function(volumes={"/data": vol})
def save_model():
    # Write to the mounted path
    with open("/data/model.pt", "wb") as f:
        torch.save(model.state_dict(), f)

@app.function(volumes={"/data": vol})
def load_model():
    model.load_state_dict(torch.load("/data/model.pt"))
```
- Optimized for write-once, read-many workloads (model weights, datasets)
- CLI access: `modal volume ls`, `modal volume put`, `modal volume get`
- Background auto-commits every few seconds
**Reference**: See `references/volumes.md` for v2 volumes, concurrent writes, and best practices.
### Secrets
Securely pass credentials to functions:
```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api():
    import os
    api_key = os.environ["API_KEY"]
    # Use the key
```
Create secrets via CLI: `modal secret create my-api-keys API_KEY=sk-xxx`
Or from a `.env` file: `modal.Secret.from_dotenv()`
**Reference**: See `references/secrets.md` for dashboard setup, multiple secrets, and templates.
### Web Endpoints
Serve models and APIs as web endpoints:
```python
@app.function()
@modal.fastapi_endpoint()
def predict(text: str):
    return {"result": model.predict(text)}
```
- `modal serve script.py` — Development with hot reload and temporary URL
- `modal deploy script.py` — Production deployment with permanent URL
- Supports FastAPI, ASGI (Starlette, FastHTML), WSGI (Flask, Django), WebSockets
- Request bodies up to 4 GiB, unlimited response size
**Reference**: See `references/web-endpoints.md` for ASGI/WSGI apps, streaming, auth, and WebSockets.
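Once deployed, an endpoint is a plain HTTPS API. A minimal client sketch using only the standard library (the `*.modal.run` URL is illustrative; `modal deploy` prints the real one):

```python
import json
import urllib.request

def build_request(url: str, payload: dict) -> urllib.request.Request:
    # POST JSON to a deployed Modal endpoint
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_endpoint(url: str, payload: dict) -> dict:
    with urllib.request.urlopen(build_request(url, payload)) as resp:
        return json.load(resp)

# call_endpoint("https://your-workspace--predict.modal.run", {"text": "hi"})
```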
### Scheduled Jobs
Run functions on a schedule:
```python
@app.function(schedule=modal.Cron("0 9 * * *"))  # Daily at 9 AM UTC
def daily_pipeline():
    # ETL, retraining, scraping, etc.
    ...

@app.function(schedule=modal.Period(hours=6))
def periodic_check():
    ...
```
Deploy with `modal deploy script.py` to activate the schedule.
- `modal.Cron("...")` — Standard cron syntax, stable across deploys
- `modal.Period(hours=N)` — Fixed interval, resets on redeploy
- Monitor runs in the Modal dashboard
**Reference**: See `references/scheduled-jobs.md` for cron syntax and management.
### Scaling and Concurrency
Modal autoscales containers automatically. Configure limits:
```python
@app.function(
    max_containers=100,    # Upper limit
    min_containers=2,      # Keep warm for low latency
    buffer_containers=5,   # Reserve capacity
    scaledown_window=300,  # Idle seconds before shutdown
)
def process(data):
    ...
```
Process inputs in parallel with `.map()`:
```python
results = list(process.map([item1, item2, item3, ...]))
```
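`.map()` fans calls out across containers but, by default, returns results in input order, just like the builtin `map`. A local sketch of that ordering contract, assuming a pure function:

```python
def square(x):
    return x ** 2

# Remotely this would be: results = list(square.map(range(5)))
# The ordering contract matches the builtin map:
results = list(map(square, range(5)))
```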
Enable concurrent request handling per container:
```python
@app.function()
@modal.concurrent(max_inputs=10)
async def handle_request(req):
    ...
```
**Reference**: See `references/scaling.md` for `.map()`, `.starmap()`, `.spawn()`, and limits.
### Resource Configuration
```python
@app.function(
    cpu=4.0,               # Physical cores (not vCPUs)
    memory=16384,          # MiB
    ephemeral_disk=51200,  # MiB (up to 3 TiB)
    timeout=3600,          # Seconds
)
def heavy_computation():
    ...
```
Defaults: 0.125 CPU cores, 128 MiB memory. Billed on max(request, usage).
**Reference**: See `references/resources.md` for limits and billing details.
## Classes with Lifecycle Hooks
For stateful workloads (e.g., loading a model once and serving many requests):
```python
@app.cls(gpu="L40S", image=image)
class Predictor:
    @modal.enter()
    def load_model(self):
        self.model = load_heavy_model()  # Runs once on container start

    @modal.method()
    def predict(self, text: str):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        ...  # Runs on container shutdown
```
Call with: `Predictor().predict.remote("hello")`
## Common Workflow Patterns
### GPU Model Inference Service
```python
import modal

app = modal.App("llm-service")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("vllm")
)

@app.cls(gpu="H100", image=image, min_containers=1)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3-70B")

    @modal.fastapi_endpoint(method="POST")
    def generate(self, prompt: str, max_tokens: int = 256):
        outputs = self.llm.generate([prompt], max_tokens=max_tokens)
        return {"text": outputs[0].outputs[0].text}
```
### Batch Processing Pipeline
```python
app = modal.App("batch-pipeline")
vol = modal.Volume.from_name("pipeline-data", create_if_missing=True)

@app.function(volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int):
    import pandas as pd
    df = pd.read_parquet(f"/data/input/chunk_{chunk_id}.parquet")
    result = heavy_transform(df)
    result.to_parquet(f"/data/output/chunk_{chunk_id}.parquet")
    return len(result)

@app.local_entrypoint()
def main():
    chunk_ids = list(range(100))
    results = list(process_chunk.map(chunk_ids))
    print(f"Processed {sum(results)} total rows")
```
### Scheduled Data Pipeline
```python
app = modal.App("etl-pipeline")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[modal.Secret.from_name("db-credentials")],
)
def etl_job():
    import os
    db_url = os.environ["DATABASE_URL"]
    # Extract, transform, load
    ...
```
## CLI Reference
| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate with Modal |
| `modal run script.py` | Run a script's local entrypoint |
| `modal serve script.py` | Dev server with hot reload |
| `modal deploy script.py` | Deploy to production |
| `modal volume ls <name>` | List files in a volume |
| `modal volume put <name> <file>` | Upload file to volume |
| `modal volume get <name> <file>` | Download file from volume |
| `modal secret create <name> K=V` | Create a secret |
| `modal secret list` | List secrets |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop a deployed app |
## Reference Files
Detailed documentation for each topic:
- `references/getting-started.md` — Installation, authentication, first app
- `references/functions.md` — Functions, classes, lifecycle hooks, remote execution
- `references/images.md` — Container images, package installation, caching
- `references/gpu.md` — GPU types, selection, multi-GPU, training
- `references/volumes.md` — Persistent storage, file management, v2 volumes
- `references/secrets.md` — Credentials, environment variables, dotenv
- `references/web-endpoints.md` — FastAPI, ASGI/WSGI, streaming, auth, WebSockets
- `references/scheduled-jobs.md` — Cron, periodic schedules, management
- `references/scaling.md` — Autoscaling, concurrency, `.map()`, limits
- `references/resources.md` — CPU, memory, disk, timeout configuration
- `references/examples.md` — Common use cases and patterns
- `references/api_reference.md` — Key API classes and methods
Read these files when detailed information is needed beyond this overview.

View File

@@ -1,34 +1,187 @@
# Modal API Reference
## Core Classes
### modal.App
The main unit of deployment. Groups related functions.
```python
app = modal.App("my-app")
```
| Method | Description |
|--------|-------------|
| `app.function(**kwargs)` | Decorator to register a function |
| `app.cls(**kwargs)` | Decorator to register a class |
| `app.local_entrypoint()` | Decorator for local entry point |
### modal.Function
A serverless function backed by an autoscaling container pool.
| Method | Description |
|--------|-------------|
| `.remote(*args)` | Execute in the cloud (sync) |
| `.local(*args)` | Execute locally |
| `.spawn(*args)` | Execute async, returns `FunctionCall` |
| `.map(inputs)` | Parallel execution over inputs |
| `.starmap(inputs)` | Parallel execution with multiple args |
| `.from_name(app, fn)` | Reference a deployed function |
| `.update_autoscaler(**kwargs)` | Dynamic scaling update |
### modal.Cls
A serverless class with lifecycle hooks.
```python
@app.cls(gpu="L40S")
class MyClass:
    @modal.enter()
    def setup(self): ...

    @modal.method()
    def run(self, data): ...

    @modal.exit()
    def cleanup(self): ...
```
| Decorator | Description |
|-----------|-------------|
| `@modal.enter()` | Container startup hook |
| `@modal.exit()` | Container shutdown hook |
| `@modal.method()` | Expose as callable method |
| `@modal.parameter()` | Class-level parameter |
## Image
### modal.Image
Defines the container environment.
| Method | Description |
|--------|-------------|
| `.debian_slim(python_version=)` | Debian base image |
| `.from_registry(tag)` | Docker Hub image |
| `.from_dockerfile(path)` | Build from Dockerfile |
| `.micromamba(python_version=)` | Conda/mamba base |
| `.uv_pip_install(*pkgs)` | Install with uv (recommended) |
| `.pip_install(*pkgs)` | Install with pip |
| `.pip_install_from_requirements(path)` | Install from file |
| `.apt_install(*pkgs)` | Install system packages |
| `.run_commands(*cmds)` | Run shell commands |
| `.run_function(fn)` | Run Python during build |
| `.add_local_dir(local, remote)` | Add directory |
| `.add_local_file(local, remote)` | Add single file |
| `.add_local_python_source(module)` | Add Python module |
| `.env(dict)` | Set environment variables |
| `.imports()` | Context manager for remote imports |
## Storage
### modal.Volume
Distributed persistent file storage.
```python
vol = modal.Volume.from_name("name", create_if_missing=True)
```
| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference or create a volume |
| `.commit()` | Force immediate commit |
| `.reload()` | Refresh to see other containers' writes |
Mount: `@app.function(volumes={"/path": vol})`
### modal.NetworkFileSystem
Legacy shared storage (superseded by Volume).
## Secrets
### modal.Secret
Secure credential injection.
| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference a named secret |
| `.from_dict(dict)` | Create inline (dev only) |
| `.from_dotenv()` | Load from .env file |
Usage: `@app.function(secrets=[modal.Secret.from_name("x")])`
Access in function: `os.environ["KEY"]`
## Scheduling
### modal.Cron
```python
schedule = modal.Cron("0 9 * * *") # Cron syntax
```
### modal.Period
```python
schedule = modal.Period(hours=6) # Fixed interval
```
Usage: `@app.function(schedule=modal.Cron("..."))`
## Web
### Decorators
| Decorator | Description |
|-----------|-------------|
| `@modal.fastapi_endpoint()` | Simple FastAPI endpoint |
| `@modal.asgi_app()` | Full ASGI app (FastAPI, Starlette) |
| `@modal.wsgi_app()` | Full WSGI app (Flask, Django) |
| `@modal.web_server(port=)` | Custom web server |
### Function Modifiers
| Decorator | Description |
|-----------|-------------|
| `@modal.concurrent(max_inputs=)` | Handle multiple inputs per container |
| `@modal.batched(max_batch_size=, wait_ms=)` | Dynamic input batching |
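A conceptual sketch of what dynamic batching means for the function body (this shows the calling convention, not Modal's implementation): each caller passes a single input, but a `@modal.batched` function receives a list and must return a list of equal length. `embed_batch` below is a hypothetical example:

```python
# With @modal.batched(max_batch_size=4, wait_ms=100) on Modal, each caller
# sends ONE input; Modal groups concurrent calls and invokes the body with
# a list. The body must return one output per input, in order.
def embed_batch(texts: list[str]) -> list[list[int]]:
    # Placeholder "embedding": one output per input
    return [[len(t)] for t in texts]
```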
## GPU Strings
| String | GPU |
|--------|-----|
| `"T4"` | NVIDIA T4 16GB |
| `"L4"` | NVIDIA L4 24GB |
| `"A10"` | NVIDIA A10 24GB |
| `"L40S"` | NVIDIA L40S 48GB |
| `"A100-40GB"` | NVIDIA A100 40GB |
| `"A100-80GB"` | NVIDIA A100 80GB |
| `"H100"` | NVIDIA H100 80GB |
| `"H100!"` | H100 (no auto-upgrade) |
| `"H200"` | NVIDIA H200 141GB |
| `"B200"` | NVIDIA B200 192GB |
| `"B200+"` | B200 or B300, B200 price |
| `"H100:4"` | 4x H100 |
## CLI Commands
| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate |
| `modal run <file>` | Run local entrypoint |
| `modal serve <file>` | Dev server with hot reload |
| `modal deploy <file>` | Production deployment |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop an app |
| `modal volume create <name>` | Create volume |
| `modal volume ls <name>` | List volume files |
| `modal volume put <name> <file>` | Upload to volume |
| `modal volume get <name> <file>` | Download from volume |
| `modal secret create <name> K=V` | Create secret |
| `modal secret list` | List secrets |
| `modal secret delete <name>` | Delete secret |
| `modal token set` | Set auth token |

# Modal Common Examples

## LLM Inference Service (vLLM)

```python
import modal

app = modal.App("vllm-service")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("vllm>=0.6.0")
)

@app.cls(gpu="H100", image=image, min_containers=1)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3-70B-Instruct")

    @modal.method()
    def generate(self, prompt: str, max_tokens: int = 512) -> str:
        from vllm import SamplingParams
        params = SamplingParams(max_tokens=max_tokens, temperature=0.7)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

    @modal.fastapi_endpoint(method="POST")
    def api(self, request: dict):
        text = self.generate(request["prompt"], request.get("max_tokens", 512))
        return {"text": text}
```
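Once deployed, the `api` endpoint above is served at a Modal-generated URL. A minimal client sketch, using only the standard library (the URL below is a placeholder; the real one is printed at deploy time):

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 512) -> dict:
    return {"prompt": prompt, "max_tokens": max_tokens}

def query(url: str, prompt: str, max_tokens: int = 512) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt, max_tokens)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# query("https://<workspace>--vllm-service-llmservice-api.modal.run", "Hello")
```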
## Image Generation (Flux)

```python
import modal

app = modal.App("image-gen")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("diffusers", "torch", "transformers", "accelerate")
)
vol = modal.Volume.from_name("flux-weights", create_if_missing=True)

@app.cls(gpu="L40S", image=image, volumes={"/models": vol})
class ImageGenerator:
    @modal.enter()
    def load(self):
        import torch
        from diffusers import FluxPipeline
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell",
            torch_dtype=torch.bfloat16,
            cache_dir="/models",
        ).to("cuda")

    @modal.method()
    def generate(self, prompt: str) -> bytes:
        import io
        image = self.pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return buf.getvalue()
```
## Speech Transcription (Whisper)

```python
import modal

app = modal.App("transcription")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")
    .uv_pip_install("openai-whisper", "torch")
)

@app.cls(gpu="T4", image=image)
class Transcriber:
    @modal.enter()
    def load(self):
        import whisper
        self.model = whisper.load_model("large-v3")

    @modal.method()
    def transcribe(self, audio_path: str) -> dict:
        return self.model.transcribe(audio_path)
```
## Batch Data Processing

```python
import modal

app = modal.App("batch-processor")

image = modal.Image.debian_slim().uv_pip_install("pandas", "pyarrow")
vol = modal.Volume.from_name("batch-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int) -> dict:
    import pandas as pd
    df = pd.read_parquet(f"/data/input/chunk_{chunk_id:04d}.parquet")
    result = df.groupby("category").agg({"value": ["sum", "mean", "count"]})
    result.to_parquet(f"/data/output/result_{chunk_id:04d}.parquet")
    return {"chunk_id": chunk_id, "rows": len(df)}

@app.local_entrypoint()
def main():
    chunk_ids = list(range(500))
    results = list(process_chunk.map(chunk_ids))
    total = sum(r["rows"] for r in results)
    print(f"Processed {total} total rows across {len(results)} chunks")
```
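The pattern above assumes the input was pre-split into numbered chunk files. A small local helper can derive the chunk ids to fan out over (pure Python, no Modal required; the helper name is illustrative):

```python
def chunk_ids_for(n_rows: int, rows_per_chunk: int) -> list[int]:
    # Ceiling division: how many chunk files are needed to cover n_rows
    n_chunks = (n_rows + rows_per_chunk - 1) // rows_per_chunk
    return list(range(n_chunks))

# e.g. feed the result to process_chunk.map(chunk_ids_for(125_000, 250))
```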
## Web Scraping at Scale

```python
import modal

app = modal.App("scraper")

image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image, retries=3, timeout=60)
def scrape_url(url: str) -> dict:
    import httpx
    from bs4 import BeautifulSoup
    response = httpx.get(url, follow_redirects=True, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "url": url,
        "title": soup.title.string if soup.title else None,
        "text": soup.get_text()[:5000],
    }

@app.local_entrypoint()
def main():
    urls = ["https://example.com", "https://example.org"]  # Your URL list
    results = list(scrape_url.map(urls))
    for r in results:
        print(f"{r['url']}: {r['title']}")
```
## Protein Structure Prediction

```python
import modal

app = modal.App("protein-folding")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("chai-lab")
)
vol = modal.Volume.from_name("protein-data", create_if_missing=True)

def write_fasta(sequence: str, path: str) -> str:
    # Minimal helper: write a single-record FASTA file for the input sequence
    with open(path, "w") as f:
        f.write(f">protein|input\n{sequence}\n")
    return path

@app.function(gpu="A100-80GB", image=image, volumes={"/data": vol}, timeout=3600)
def fold_protein(sequence: str) -> str:
    from chai_lab.chai1 import run_inference
    output = run_inference(
        fasta_file=write_fasta(sequence, "/data/input.fasta"),
        output_dir="/data/output/",
    )
    return str(output)
```
## Scheduled ETL Pipeline

```python
import modal

app = modal.App("etl")

image = modal.Image.debian_slim().uv_pip_install("pandas", "sqlalchemy", "psycopg2-binary")

@app.function(
    image=image,
    schedule=modal.Cron("0 3 * * *"),  # 3 AM UTC daily
    secrets=[modal.Secret.from_name("database-creds")],
    timeout=7200,
)
def daily_etl():
    import os
    import pandas as pd
    from sqlalchemy import create_engine

    source = create_engine(os.environ["SOURCE_DB"])
    dest = create_engine(os.environ["DEST_DB"])
    df = pd.read_sql("SELECT * FROM events WHERE date = CURRENT_DATE - 1", source)
    df = transform(df)  # transform() is your domain-specific cleaning step
    df.to_sql("daily_summary", dest, if_exists="append", index=False)
    print(f"Loaded {len(df)} rows")
```
## FastAPI with GPU Model

```python
import modal

app = modal.App("api-with-gpu")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("fastapi", "sentence-transformers", "torch")
)

@app.cls(gpu="L40S", image=image, min_containers=1)
class EmbeddingService:
    @modal.enter()
    def load(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI

        api = FastAPI()

        @api.post("/embed")
        async def embed(request: dict):
            embeddings = self.model.encode(request["texts"])
            return {"embeddings": embeddings.tolist()}

        @api.get("/health")
        async def health():
            return {"status": "ok"}

        return api
```
## Document OCR Job Queue

```python
import modal

app = modal.App("ocr-queue")

image = modal.Image.debian_slim().apt_install("tesseract-ocr").uv_pip_install("pytesseract", "Pillow")
vol = modal.Volume.from_name("ocr-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol})
def ocr_page(image_path: str) -> str:
    import pytesseract
    from PIL import Image
    img = Image.open(image_path)
    return pytesseract.image_to_string(img)

@app.function(volumes={"/data": vol})
def process_document(doc_id: str):
    import os
    pages = sorted(os.listdir(f"/data/docs/{doc_id}/"))
    paths = [f"/data/docs/{doc_id}/{p}" for p in pages]
    texts = list(ocr_page.map(paths))
    full_text = "\n\n".join(texts)
    with open(f"/data/results/{doc_id}.txt", "w") as f:
        f.write(full_text)
    return {"doc_id": doc_id, "pages": len(texts)}
```

# Modal Functions and Classes

## Table of Contents
- [Functions](#functions)
- [Remote Execution](#remote-execution)
- [Classes with Lifecycle Hooks](#classes-with-lifecycle-hooks)
- [Parallel Execution](#parallel-execution)
- [Async Functions](#async-functions)
- [Local Entrypoints](#local-entrypoints)
- [Generators](#generators)

## Functions

### Basic Function

```python
import modal

app = modal.App("my-app")

@app.function()
def compute(x: int, y: int) -> int:
    return x + y
```
### Function Parameters

The `@app.function()` decorator accepts:

| Parameter | Type | Description |
|-----------|------|-------------|
| `image` | `Image` | Container image |
| `gpu` | `str` | GPU type (e.g., `"H100"`, `"A100:2"`) |
| `cpu` | `float` | CPU cores |
| `memory` | `int` | Memory in MiB |
| `timeout` | `int` | Max execution time in seconds |
| `secrets` | `list[Secret]` | Secrets to inject |
| `volumes` | `dict[str, Volume]` | Volumes to mount |
| `schedule` | `Schedule` | Cron or periodic schedule |
| `max_containers` | `int` | Max container count |
| `min_containers` | `int` | Minimum warm containers |
| `retries` | `int` | Retry count on failure |
| `concurrency_limit` | `int` | Max concurrent inputs |
| `ephemeral_disk` | `int` | Disk in MiB |
## Remote Execution

### `.remote()` — Synchronous Call

```python
result = compute.remote(3, 4)  # Runs in the cloud, blocks until done
```

### `.local()` — Local Execution

```python
result = compute.local(3, 4)  # Runs locally (for testing)
```

### `.spawn()` — Async Fire-and-Forget

```python
call = compute.spawn(3, 4)  # Returns immediately
# ... do other work ...
result = call.get()  # Retrieve result later
```

`.spawn()` supports up to 1 million pending inputs.
## Classes with Lifecycle Hooks

Use `@app.cls()` for stateful workloads where you want to load resources once:

```python
@app.cls(gpu="L40S", image=image)
class Model:
    @modal.enter()
    def setup(self):
        """Runs once when the container starts."""
        import torch
        self.model = torch.load("/weights/model.pt")
        self.model.eval()

    @modal.method()
    def predict(self, text: str) -> dict:
        """Callable remotely."""
        return self.model(text)

    @modal.exit()
    def teardown(self):
        """Runs when the container shuts down."""
        cleanup_resources()
```
### Lifecycle Decorators

| Decorator | When It Runs |
|-----------|-------------|
| `@modal.enter()` | Once on container startup, before any inputs |
| `@modal.method()` | For each remote call |
| `@modal.exit()` | On container shutdown |

### Calling Class Methods

```python
# Create instance and call method
model = Model()
result = model.predict.remote("Hello world")

# Parallel calls
results = list(model.predict.map(["text1", "text2", "text3"]))
```
### Parameterized Classes

```python
@app.cls()
class Worker:
    model_name: str = modal.parameter()

    @modal.enter()
    def load(self):
        self.model = load_model(self.model_name)

    @modal.method()
    def run(self, data):
        return self.model(data)

# Different model instances autoscale independently
gpt = Worker(model_name="gpt-4")
llama = Worker(model_name="llama-3")
```
## Parallel Execution

### `.map()` — Parallel Processing

Process multiple inputs across containers:

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(1000))
    results = list(process.map(items))
    print(f"Processed {len(results)} items")
```

- Results are returned in the same order as inputs
- Modal autoscales containers to handle the workload
- Use `return_exceptions=True` to collect errors instead of raising

### `.starmap()` — Multi-Argument Parallel

```python
@app.function()
def add(x, y):
    return x + y

@app.local_entrypoint()
def main():
    results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
    # [3, 7, 11]
```

### `.map()` with `order_outputs=False`

For faster throughput when order doesn't matter:

```python
for result in process.map(items, order_outputs=False):
    handle(result)  # Results arrive as they complete
```
## Async Functions

Modal supports async/await natively:

```python
@app.function()
async def fetch_data(url: str) -> str:
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.text
```

Async functions are especially useful with `@modal.concurrent()` for handling multiple requests per container.
## Local Entrypoints

The `@app.local_entrypoint()` runs on your machine and orchestrates remote calls:

```python
@app.local_entrypoint()
def main():
    # This code runs locally
    data = load_local_data()
    # These calls run in the cloud
    results = list(process.map(data))
    # Back to local
    save_results(results)
```

You can also define multiple entrypoints and select by function name:

```bash
modal run script.py::train
modal run script.py::evaluate
```
## Generators

Functions can yield results as they're produced:

```python
@app.function()
def generate_data():
    for i in range(100):
        yield process(i)

@app.local_entrypoint()
def main():
    for result in generate_data.remote_gen():
        print(result)
```
## Retries

Configure automatic retries on failure:

```python
@app.function(retries=3)
def flaky_operation():
    ...
```

For more control, use `modal.Retries`:

```python
@app.function(retries=modal.Retries(max_retries=3, backoff_coefficient=2.0))
def api_call():
    ...
```
## Timeouts

Set maximum execution time:

```python
@app.function(timeout=3600)  # 1 hour
def long_training():
    ...
```

Default timeout is 300 seconds (5 minutes). Maximum is 86400 seconds (24 hours).

# Modal Getting Started Guide

## Installation

Install Modal using uv (recommended) or pip:

```bash
# Recommended
uv pip install modal

# Alternative
pip install modal
```

## Authentication

### Interactive Setup

```bash
modal setup
```

This opens a browser for authentication and stores credentials locally.

### Headless / CI/CD Setup

For environments without a browser, use token-based authentication:

1. Generate tokens at https://modal.com/settings
2. Set environment variables:

```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```

Or use the CLI:

```bash
modal token set --token-id <id> --token-secret <secret>
```

### Free Tier

Modal provides $30/month in free credits. No credit card required for the free tier.
## Your First App

### Hello World

Create a file `hello.py`:

```python
import modal

app = modal.App("hello-world")

@app.function()
def greet(name: str) -> str:
    return f"Hello, {name}! This ran in the cloud."

@app.local_entrypoint()
def main():
    result = greet.remote("World")
    print(result)
```

Run it:

```bash
modal run hello.py
```
What happens:

1. Modal packages your code
2. Creates a container in the cloud
3. Executes `greet()` remotely
4. Returns the result to your local machine

### Understanding the Flow

- `modal.App("name")` — Creates a named application
- `@app.function()` — Marks a function for remote execution
- `@app.local_entrypoint()` — Defines the local entry point (runs on your machine)
- `.remote()` — Calls the function in the cloud
- `.local()` — Calls the function locally (for testing)

### Running Modes

| Command | Description |
|---------|-------------|
| `modal run script.py` | Run the `@app.local_entrypoint()` function |
| `modal serve script.py` | Start a dev server with hot reload (for web endpoints) |
| `modal deploy script.py` | Deploy to production (persistent) |
### A Simple Web Scraper

```python
import modal

app = modal.App("web-scraper")
image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image)
def scrape(url: str) -> str:
    import httpx
    from bs4 import BeautifulSoup
    response = httpx.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()[:1000]

@app.local_entrypoint()
def main():
    result = scrape.remote("https://example.com")
    print(result)
```
### GPU-Accelerated Inference

```python
import modal

app = modal.App("gpu-inference")
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch", "transformers", "accelerate")
)

@app.function(gpu="L40S", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline
    pipe = pipeline("text-generation", model="gpt2", device="cuda")
    result = pipe(prompt, max_length=100)
    return result[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("The future of AI is"))
```
## Project Structure

Modal apps are typically single Python files, but can be organized into modules:

```
my-project/
├── app.py          # Main app with @app.local_entrypoint()
├── inference.py    # Inference functions
├── training.py     # Training functions
└── common.py       # Shared utilities
```

Use `modal.Image.add_local_python_source()` to include local modules in the container image.
## Key Concepts Summary

| Concept | What It Does |
|---------|-------------|
| `App` | Groups related functions into a deployable unit |
| `Function` | A serverless function backed by autoscaling containers |
| `Image` | Defines the container environment (packages, files) |
| `Volume` | Persistent distributed file storage |
| `Secret` | Secure credential injection |
| `Schedule` | Cron or periodic job scheduling |
| `gpu` | GPU type/count for the function |

## Next Steps

- See `functions.md` for advanced function patterns
- See `images.md` for custom container environments
- See `gpu.md` for GPU selection and configuration
- See `web-endpoints.md` for serving APIs

# Modal GPU Compute
## Table of Contents
- [Available GPUs](#available-gpus)
- [Requesting GPUs](#requesting-gpus)
- [GPU Selection Guide](#gpu-selection-guide)
- [Multi-GPU](#multi-gpu)
- [GPU Fallback Chains](#gpu-fallback-chains)
- [Auto-Upgrades](#auto-upgrades)
- [Multi-GPU Training](#multi-gpu-training)
## Available GPUs
| GPU | VRAM | Max per Container | Best For |
|-----|------|-------------------|----------|
| T4 | 16 GB | 8 | Budget inference, small models |
| L4 | 24 GB | 8 | Inference, video processing |
| A10 | 24 GB | 4 | Inference, fine-tuning small models |
| L40S | 48 GB | 8 | Inference (best cost/perf), medium models |
| A100-40GB | 40 GB | 8 | Training, large model inference |
| A100-80GB | 80 GB | 8 | Training, large models |
| RTX-PRO-6000 | 48 GB | 8 | Rendering, inference |
| H100 | 80 GB | 8 | Large-scale training, fast inference |
| H200 | 141 GB | 8 | Very large models, training |
| B200 | 192 GB | 8 | Largest models, maximum throughput |
| B200+ | 192 GB | 8 | B200 or B300, B200 pricing |
See https://modal.com/pricing for pricing.
## Requesting GPUs
### Basic Request
```python
@app.function(gpu="H100")
def train():
    import torch
    assert torch.cuda.is_available()
    print(f"Using: {torch.cuda.get_device_name(0)}")
```
### String Shorthand
```python
gpu="T4"         # Single T4
gpu="A100-80GB"  # Single A100 80GB
gpu="H100:4"     # Four H100s
```
### GPU Object (Advanced)
```python
@app.function(gpu=modal.gpu.H100(count=2))
def multi_gpu():
    ...
```
## GPU Selection Guide
### For Inference
| Model Size | Recommended GPU | Why |
|-----------|----------------|-----|
| < 7B params | T4, L4 | Cost-effective, sufficient VRAM |
| 7B-13B params | L40S | Best cost/performance, 48 GB VRAM |
| 13B-70B params | A100-80GB, H100 | Large VRAM, fast memory bandwidth |
| 70B+ params | H100:2+, H200, B200 | Multi-GPU or very large VRAM |
### For Training
| Task | Recommended GPU |
|------|----------------|
| Fine-tuning (LoRA) | L40S, A100-40GB |
| Full fine-tuning small models | A100-80GB |
| Full fine-tuning large models | H100:4+, H200 |
| Pre-training | H100:8, B200:8 |
### General Recommendation
L40S is the best default for inference workloads — it offers an excellent trade-off of cost and performance with 48 GB of GPU RAM.
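The inference table above can be encoded as a small selection helper. This is a sketch: the function name and thresholds are ours, not part of the Modal API, but the returned strings are valid `gpu=` arguments.

```python
# Illustrative helper encoding the inference table above; thresholds and the
# function name are ours, but the returned strings are valid gpu= arguments.
def pick_inference_gpu(params_b: float) -> str:
    if params_b < 7:
        return "L4"          # T4/L4 tier: cost-effective for small models
    if params_b <= 13:
        return "L40S"        # best cost/performance, 48 GB VRAM
    if params_b <= 70:
        return "A100-80GB"   # or "H100": large VRAM, fast memory bandwidth
    return "H100:2"          # 70B+: multi-GPU, H200, or B200

print(pick_inference_gpu(8))  # L40S
```

The result can be passed directly, e.g. `@app.function(gpu=pick_inference_gpu(13))`.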
## Multi-GPU
Request multiple GPUs by appending `:count`:
```python
@app.function(gpu="H100:4")
def distributed():
    import torch
    print(f"GPUs available: {torch.cuda.device_count()}")
    # All 4 GPUs are on the same physical machine
```
- Up to 8 GPUs for most types (up to 4 for A10)
- All GPUs attach to the same physical machine
- Requesting more than 2 GPUs may result in longer wait times
- Maximum VRAM: 8 x B200 = 1,536 GB
## GPU Fallback Chains
Specify a prioritized list of GPU types:
```python
@app.function(gpu=["H100", "A100-80GB", "L40S"])
def flexible():
    # Modal tries H100 first, then A100-80GB, then L40S
    ...
```
Useful for reducing queue times when a specific GPU isn't available.
## Auto-Upgrades
### H100 → H200
Modal may automatically upgrade H100 requests to H200 at no extra cost. To prevent this:
```python
@app.function(gpu="H100!")  # Exclamation mark prevents auto-upgrade
def must_use_h100():
    ...
```
### A100 → A100-80GB
A100-40GB requests may be upgraded to 80GB at no extra cost.
### B200+
`gpu="B200+"` allows Modal to run on B200 or B300 GPUs at B200 pricing. Requires CUDA 13.0+.
## Multi-GPU Training
Modal supports multi-GPU training on a single node. Multi-node training is in private beta.
### PyTorch DDP Example
```python
@app.function(gpu="H100:4", image=image, timeout=86400)
def train_distributed():
    import os

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}")
    # ... training loop with DDP ...
```
### PyTorch Lightning
When using frameworks that re-execute Python entrypoints (like PyTorch Lightning), either:
1. Set strategy to `ddp_spawn` or `ddp_notebook`
2. Or run training as a subprocess
```python
@app.function(gpu="H100:4", image=image)
def train():
    import subprocess

    subprocess.run(["python", "train_script.py"], check=True)
```
### Hugging Face Accelerate
```python
@app.function(gpu="A100-80GB:4", image=image)
def finetune():
    import subprocess

    subprocess.run([
        "accelerate", "launch",
        "--num_processes", "4",
        "train.py",
    ], check=True)
```


@@ -1,261 +1,259 @@
# Modal Container Images
## Table of Contents
- [Overview](#overview)
- [Base Images](#base-images)
- [Installing Packages](#installing-packages)
- [System Packages](#system-packages)
- [Shell Commands](#shell-commands)
- [Running Python During Build](#running-python-during-build)
- [Adding Local Files](#adding-local-files)
- [Environment Variables](#environment-variables)
- [Dockerfiles](#dockerfiles)
- [Alternative Package Managers](#alternative-package-managers)
- [Image Caching](#image-caching)
- [Handling Remote-Only Imports](#handling-remote-only-imports)
## Overview
Every Modal function runs inside a container built from an `Image`. By default, Modal uses a Debian Linux image with the same Python minor version as your local interpreter.
Images are built lazily — Modal only builds/pulls the image when a function using it is first invoked. Layers are cached for fast rebuilds.
## Base Images
Start with a base image and chain methods:
```python
# Default: Debian slim with your local Python version
image = modal.Image.debian_slim()
# Specific Python version
image = modal.Image.debian_slim(python_version="3.11")
# From Docker Hub
image = modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04")
# From a Dockerfile
image = modal.Image.from_dockerfile("./Dockerfile")
```
## Installing Packages
### uv (Recommended)
`uv_pip_install` uses the uv package manager for fast, reliable installs:
```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install(
        "torch==2.8.0",
        "transformers>=4.40",
        "accelerate",
        "scipy",
    )
)
```
Pin versions for reproducibility. uv resolves dependencies faster than pip.
### pip (Fallback)
```python
image = modal.Image.debian_slim().pip_install(
    "numpy==1.26.0",
    "pandas==2.1.0",
)
```
### From requirements.txt
```python
image = modal.Image.debian_slim().pip_install_from_requirements("requirements.txt")
```
### Private Packages
```python
image = (
    modal.Image.debian_slim()
    .pip_install_private_repos(
        "github.com/org/private-repo",
        git_user="username",
        secrets=[modal.Secret.from_name("github-token")],
    )
)
```
## System Packages
Install Linux packages via apt:
```python
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg", "libsndfile1", "git", "curl")
    .uv_pip_install("librosa", "soundfile")
)
```
## Shell Commands
Run arbitrary commands during image build:
```python
image = (
    modal.Image.debian_slim()
    .run_commands(
        "wget https://example.com/data.tar.gz",
        "tar -xzf data.tar.gz -C /opt/data",
        "rm data.tar.gz",
    )
)
```
### With GPU
Some build steps require GPU access (e.g., compiling CUDA kernels):
```python
image = (
    modal.Image.debian_slim()
    .uv_pip_install("torch")
    .run_commands("python -c 'import torch; torch.cuda.is_available()'", gpu="A100")
)
```
## Running Python During Build
Execute Python functions as build steps — useful for downloading model weights:
```python
def download_model():
    from huggingface_hub import snapshot_download

    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("huggingface_hub", "torch", "transformers")
    .run_function(download_model, secrets=[modal.Secret.from_name("huggingface")])
)
```
The resulting filesystem (including downloaded files) is snapshotted into the image.
## Adding Local Files
### Local Directories
```python
image = modal.Image.debian_slim().add_local_dir(
    local_path="./config",
    remote_path="/root/config",
)
```
By default, files are added at container startup (not baked into the image layer). Use `copy=True` to bake them in.
### Local Python Modules
```python
image = modal.Image.debian_slim().add_local_python_source("my_module")
```
This uses Python's import system to find and include the module.
### Individual Files
```python
image = modal.Image.debian_slim().add_local_file(
    local_path="./model_config.json",
    remote_path="/root/config.json",
)
```
## Environment Variables
```python
image = (
    modal.Image.debian_slim()
    .env({
        "TRANSFORMERS_CACHE": "/cache",
        "TOKENIZERS_PARALLELISM": "false",
        "HF_HOME": "/cache/huggingface",
    })
)
```
Names and values must be strings.
## Dockerfiles
Build from existing Dockerfiles:
```python
image = modal.Image.from_dockerfile("./Dockerfile")
# With build context
image = modal.Image.from_dockerfile("./Dockerfile", context_mount=modal.Mount.from_local_dir("."))
```
## Alternative Package Managers
### Micromamba / Conda
For packages requiring coordinated system and Python package installs:
```python
image = (
    modal.Image.micromamba(python_version="3.11")
    .micromamba_install("cudatoolkit=11.8", "cudnn=8.6", channels=["conda-forge"])
    .uv_pip_install("torch")
)
```
## Image Caching
Modal caches images per layer (per method call). Breaking the cache on one layer cascades to all subsequent layers.
Define frequently-changing layers last to maximize cache reuse.
### Optimization Tips
1. **Order layers by change frequency**: Put stable dependencies first, frequently changing code last
2. **Pin versions**: Unpinned versions may resolve differently and break cache
3. **Separate large installs**: Put heavy packages (torch, tensorflow) in early layers
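Putting the three tips together, a well-ordered image might look like this. A sketch, assuming a typical ML image; `my_module` and the package pins are placeholders:

```python
import modal

# Stable, heavy layers first; frequently-changing layers last.
image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("git")                    # system deps: rarely change
    .uv_pip_install("torch==2.8.0")        # heavy, pinned: cache survives code edits
    .uv_pip_install("transformers>=4.40")  # lighter deps: change occasionally
    .add_local_python_source("my_module")  # your code: changes most often
)
```

With this ordering, editing `my_module` invalidates only the final layer; the torch install stays cached.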
### Force Rebuild
```python
# Single layer
image = modal.Image.debian_slim().apt_install("git", force_build=True)
```
```bash
# All images in a run
MODAL_FORCE_BUILD=1 modal run script.py

# Rebuild without updating cache
MODAL_IGNORE_CACHE=1 modal run script.py
```
## Handling Remote-Only Imports
When packages are only available in the container (not locally), use conditional imports:
```python
@app.function(image=image)
def process():
    import torch  # Only available in the container

    return torch.cuda.device_count()
```
For module-level imports shared across functions, use the `Image.imports()` context manager:
```python
with image.imports():
    import torch
    import transformers
```
This prevents `ImportError` locally while making the imports available in the container.


@@ -1,129 +1,117 @@
# Modal Resource Configuration
## CPU
### Requesting CPU
```python
@app.function(cpu=4.0)
def compute():
    ...
```
- Values are **physical cores**, not vCPUs
- Default: 0.125 cores
- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request
### CPU Limits
- Default soft limit: 16 physical cores above the CPU request
- Set explicit limits to prevent noisy-neighbor effects:
```python
@app.function(cpu=(1.0, 4.0))  # Request 1 core, cap at 4
def bounded_compute():
    ...
```
## Memory
### Requesting Memory
```python
@app.function(memory=16384)  # 16 GiB, in MiB
def large_data():
    ...
```
- Value in **MiB** (mebibytes)
- Default: 128 MiB
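Since `memory=` takes MiB, a tiny converter avoids unit slips. The helper is ours, not part of Modal:

```python
def gib_to_mib(gib: float) -> int:
    # memory= expects an integer number of MiB; 1 GiB = 1024 MiB
    return int(gib * 1024)

print(gib_to_mib(16))  # 16384, i.e. memory=gib_to_mib(16) requests 16 GiB
```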
### Memory Limits
Set hard memory limits to OOM-kill containers that exceed them:
```python
@app.function(memory=(8192, 16384))  # 8 GiB request, 16 GiB hard limit
def bounded_memory():
    ...
```
This prevents paying for runaway memory leaks.
## Ephemeral Disk
For temporary storage within a container's lifetime:
```python
@app.function(ephemeral_disk=102400)  # 100 GiB, in MiB
def process_dataset():
    # Temporary files at /tmp or anywhere in the container filesystem
    ...
```
- Value in **MiB**
- Default: 512 GiB quota per container
- Maximum: 3,145,728 MiB (3 TiB)
- Data is lost when the container shuts down
- Use Volumes for persistent storage
## Timeout
```python
@app.function(timeout=3600)  # 1 hour, in seconds
def long_running():
    ...
```
- Default: 300 seconds (5 minutes)
- Maximum: 86,400 seconds (24 hours)
- Function is killed when timeout expires
## Billing
You are charged based on **whichever is higher**: your resource request or actual usage.
| Resource | Billing Basis |
|----------|--------------|
| CPU | max(requested, used) |
| Memory | max(requested, used) |
| GPU | Time GPU is allocated |
| Disk | Increases memory billing at 20:1 ratio |
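The 20:1 disk row can be made concrete with a small helper. This is illustrative; it assumes the ratio applies to request values in MiB:

```python
def billed_memory_mib(memory_mib: int, ephemeral_disk_mib: int) -> int:
    # Disk raises the billed memory request at a 20:1 ratio, unless the
    # explicit memory request is already higher.
    implied = ephemeral_disk_mib // 20
    return max(memory_mib, implied)

# 500 GiB of disk (512000 MiB) implies a 25 GiB (25600 MiB) memory request:
print(billed_memory_mib(16384, 512000))  # 25600
```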
### Cost Optimization Tips
- Request only what you need
- Use appropriate GPU tiers (L40S over H100 for inference)
- Set `scaledown_window` to minimize idle time
- Use `min_containers=0` when cold starts are acceptable
- Batch inputs with `.map()` instead of individual `.remote()` calls
## Complete Example
```python
@app.function(
    cpu=8.0,               # 8 physical cores
    memory=32768,          # 32 GiB
    gpu="L40S",            # L40S GPU
    ephemeral_disk=204800, # 200 GiB temp disk
    timeout=7200,          # 2 hours
    max_containers=50,
    min_containers=1,
)
def full_pipeline(data_path: str):
    ...
```
## Monitoring Resource Usage
View resource usage in Modal dashboard:
- CPU utilization
- Memory usage
- Disk usage
- GPU metrics (if applicable)
Access via https://modal.com/apps


@@ -1,230 +1,173 @@
# Modal Scaling and Concurrency
## Table of Contents
- [Autoscaling](#autoscaling)
- [Configuration](#configuration)
- [Parallel Execution](#parallel-execution)
- [Concurrent Inputs](#concurrent-inputs)
- [Dynamic Batching](#dynamic-batching)
- [Dynamic Autoscaler Updates](#dynamic-autoscaler-updates)
- [Limits](#limits)
## Autoscaling
Modal automatically manages a pool of containers for each function:
- Spins up containers when there's no capacity for new inputs
- Spins down idle containers to save costs
- Scales from zero (no cost when idle) to thousands of containers
No configuration needed for basic autoscaling — it works out of the box.
## Configuration
Fine-tune autoscaling behavior:
```python
@app.function(
    max_containers=100,   # Upper limit on container count
    min_containers=2,     # Keep 2 warm (reduces cold starts)
    buffer_containers=5,  # Reserve 5 extra for burst traffic
    scaledown_window=300, # Wait 5 min idle before shutting down
)
def handle_request(data):
    ...
```
| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_containers` | Unlimited | Hard cap on total containers |
| `min_containers` | 0 | Minimum warm containers (costs money even when idle) |
| `buffer_containers` | 0 | Extra containers to prevent queuing |
| `scaledown_window` | 60 | Seconds of idle time before shutdown |
### Trade-offs
- Higher `min_containers` = lower latency, higher cost
- Higher `buffer_containers` = less queuing, higher cost
- Lower `scaledown_window` = faster cost savings, more cold starts
## Parallel Execution
### `.map()` — Process Many Inputs
```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(10_000))
    results = list(process.map(items))
```
Modal automatically scales containers to handle the workload. Results maintain input order.
### `.map()` Options
```python
# Unordered results (faster)
for result in process.map(items, order_outputs=False):
    handle(result)

# Collect errors instead of raising
results = list(process.map(items, return_exceptions=True))
for r in results:
    if isinstance(r, Exception):
        print(f"Error: {r}")
```
### `.starmap()` — Multi-Argument
```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```
### `.spawn()` — Fire-and-Forget
```python
# Returns immediately
call = process.spawn(large_data)
# Check status or get result later
result = call.get()
```
Up to 1 million pending `.spawn()` calls.
## Concurrent Inputs
By default, each container handles one input at a time. Use `@modal.concurrent` to handle multiple:
```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
async def predict(text: str):
    result = await model.predict_async(text)
    return result
```
This is ideal for I/O-bound workloads or async inference where a single GPU can handle multiple requests.
### With Web Endpoints
```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=20)
@modal.asgi_app()
def web_service():
    return fastapi_app
```
## Dynamic Batching
Collect inputs into batches for efficient GPU utilization:
```python
@app.function(gpu="L40S")
@modal.batched(max_batch_size=32, wait_ms=100)
async def batch_predict(texts: list[str]):
    # Called with up to 32 texts at once
    embeddings = model.encode(texts)
    return list(embeddings)
```
- `max_batch_size` — Maximum inputs per batch
- `wait_ms` — How long to wait for more inputs before processing
- The function receives a list and must return a list of the same length
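The interaction of `max_batch_size` and `wait_ms` can be illustrated with a toy model. This is our simplification, not Modal's actual scheduler: a batch is dispatched when it fills up, or when the wait window since its first input expires.

```python
def form_batches(arrival_times_ms, max_batch_size=32, wait_ms=100):
    """Toy model: flush a batch when it is full, or when the next arrival
    falls outside wait_ms of the batch's first input."""
    batches, current, started = [], [], None
    for t in sorted(arrival_times_ms):
        if current and t - started > wait_ms:
            batches.append(current)  # window expired: flush
            current, started = [], None
        if not current:
            started = t
        current.append(t)
        if len(current) == max_batch_size:
            batches.append(current)  # full: flush immediately
            current, started = [], None
    if current:
        batches.append(current)
    return batches

# A burst of 5 plus a straggler 500 ms later -> batch sizes [4, 1, 1]
print([len(b) for b in form_batches([0, 1, 2, 3, 4, 500], max_batch_size=4)])
```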
## Dynamic Autoscaler Updates
Adjust autoscaling at runtime without redeploying:
```python
def scale_up_for_peak():
    process = modal.Function.from_name("my-app", "process")
    process.update_autoscaler(min_containers=10, buffer_containers=20)

def scale_down_after_peak():
    process = modal.Function.from_name("my-app", "process")
    process.update_autoscaler(min_containers=1, buffer_containers=2)
```
Settings revert to the decorator values on the next deployment.
## Limits
| Resource | Limit |
|----------|-------|
| Pending inputs (unassigned) | 2,000 |
| Total inputs (running + pending) | 25,000 |
| Pending `.spawn()` inputs | 1,000,000 |
| Concurrent inputs per `.map()` | 1,000 |
| Rate limit (web endpoints) | 200 req/s |
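For very large jobs, one way to stay under the pending-input cap is to feed `.map()` in chunks. A sketch; the helper and chunk size are ours:

```python
def chunked(items, size=2000):
    # Yield slices no larger than the 2,000 pending-input cap.
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Usage sketch: for chunk in chunked(all_inputs): results += list(process.map(chunk))
print([len(c) for c in chunked(list(range(2500)))])  # [2000, 500]
```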
Exceeding these limits triggers `Resource Exhausted` errors. Implement retry logic for resilience.
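A minimal retry sketch for such errors, with exponential backoff. The string match on the exception is illustrative; filter on whatever error type your client actually raises:

```python
import time

def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0):
    """Retry fn on 'Resource Exhausted'-style errors with exponential backoff.
    The exception filter is illustrative; match your SDK's actual error."""
    for attempt in range(max_retries):
        try:
            return fn(*args)
        except Exception as exc:
            if "Resource Exhausted" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# Usage sketch: call_with_backoff(process.remote, item)
```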


@@ -1,303 +1,143 @@
# Modal Scheduled Jobs
## Overview
Modal supports running functions automatically on a schedule, either using cron syntax or fixed intervals. Deploy scheduled functions with `modal deploy` and they run unattended in the cloud.
## Schedule Types
### modal.Cron
Standard cron syntax — stable across deploys:
```python
import modal

app = modal.App("scheduled-tasks")

# Daily at 9 AM UTC
@app.function(schedule=modal.Cron("0 9 * * *"))
def daily_report():
    generate_and_send_report()

# Daily at 6 AM New York time
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def morning_report():
    ...

# Every Monday at midnight
@app.function(schedule=modal.Cron("0 0 * * 1"))
def weekly_cleanup():
    cleanup_old_data()

# Every 15 minutes
@app.function(schedule=modal.Cron("*/15 * * * *"))
def frequent_check():
    check_system_health()
```
#### Cron Syntax Reference
```
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sun=0)
│ │ │ │ │
* * * * *
```
| Pattern | Meaning |
|---------|---------|
| `0 9 * * *` | Daily at 9:00 AM UTC |
| `0 */6 * * *` | Every 6 hours |
| `*/30 * * * *` | Every 30 minutes |
| `0 0 * * 1` | Every Monday at midnight |
| `0 0 1 * *` | First day of every month |
| `0 9 * * 1-5` | Weekdays at 9 AM |
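As a sanity check on the five-field syntax, this standalone snippet (pure Python, no Modal required, and an illustrative helper rather than Modal's own parser) expands a single cron field into the values it matches:

```python
def expand_field(field: str, lo: int, hi: int) -> list:
    """Expand one cron field (e.g. '*/15', '1-5', '0,30') into matching values."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_str = part.split("/")
            step = int(step_str)
        if part == "*":
            candidates = range(lo, hi + 1, step)
        elif "-" in part:
            start, end = part.split("-")
            candidates = range(int(start), int(end) + 1, step)
        else:
            candidates = [int(part)]
        values.update(candidates)
    return sorted(values)

print(expand_field("*/15", 0, 59))  # minutes matched by */15 -> [0, 15, 30, 45]
print(expand_field("1-5", 0, 6))    # weekdays Mon-Fri -> [1, 2, 3, 4, 5]
```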
### modal.Period
Fixed interval — resets on each deploy:
```python
# Every 5 hours
@app.function(schedule=modal.Period(hours=5))
def periodic_sync():
    sync_data()

# Every 30 minutes
@app.function(schedule=modal.Period(minutes=30))
def poll_updates():
    check_for_updates()

# Every day
@app.function(schedule=modal.Period(days=1))
def daily_task():
    ...
```
`modal.Period` resets its timer on each deployment. If you need a schedule that doesn't shift with deploys, use `modal.Cron`.
### Timezone Support
Specify a timezone for cron schedules (they default to UTC):
```python
@app.function(schedule=modal.Cron("0 9 * * *", timezone="Europe/London"))
def uk_morning_task():
    ...

@app.function(schedule=modal.Cron("0 17 * * 5", timezone="Asia/Tokyo"))
def friday_evening_jp():
    ...
```
## Deploying Scheduled Functions
Schedules only activate when deployed:
```bash
modal deploy script.py
```
Scheduled functions persist until explicitly stopped.
### Programmatic Deployment
```python
if __name__ == "__main__":
app.deploy()
```
`modal run` and `modal serve` do not activate schedules.
## Monitoring
Check the **Apps** section of the Modal dashboard (https://modal.com/apps) for:
- Past execution logs and history, with status and duration for each scheduled run
- Failure notifications
- A **"Run Now"** button to trigger a scheduled function manually
## Management
- Schedules cannot be paused — remove the `schedule` parameter and redeploy to stop
- To change a schedule, update the `schedule` parameter and redeploy
- To stop an app entirely, run `modal app stop <name>`
## Common Patterns
### ETL Pipeline
```python
@app.function(
    schedule=modal.Cron("0 2 * * *"),  # 2 AM UTC daily
    secrets=[modal.Secret.from_name("db-creds")],
    timeout=7200,
)
def etl_pipeline():
    import os
    data = extract(os.environ["SOURCE_DB_URL"])
    transformed = transform(data)
    load(transformed, os.environ["DEST_DB_URL"])
```
### Model Retraining
```python
data_vol = modal.Volume.from_name("datasets")
model_vol = modal.Volume.from_name("models")

@app.function(
    schedule=modal.Cron("0 0 * * 0"),  # Weekly on Sunday
    gpu="H100",
    volumes={"/data": data_vol, "/models": model_vol},
    timeout=86400,
)
def retrain():
    import torch
    model = train_on_latest_data("/data/training/")
    torch.save(model.state_dict(), "/models/latest.pt")
    model_vol.commit()
```
### Health Checks
```python
@app.function(
    schedule=modal.Period(minutes=5),
    secrets=[modal.Secret.from_name("slack-webhook")],
)
def health_check():
    import os, requests
    status = check_all_services()
    if not status["healthy"]:
        requests.post(os.environ["SLACK_URL"], json={"text": f"Alert: {status}"})
```
### Data Cleanup
```python
from datetime import datetime, timedelta

@app.function(schedule=modal.Period(hours=6))
def cleanup_old_data():
    # Remove data older than 30 days
    cutoff = datetime.now() - timedelta(days=30)
    delete_old_records(cutoff)
```
## Configuration with Secrets and Volumes
Scheduled functions support all function parameters:
```python
vol = modal.Volume.from_name("data")
secret = modal.Secret.from_name("api-keys")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[secret],
    volumes={"/data": vol},
    cpu=4.0,
    memory=16384,
)
def sync_data():
    import json
    import os
    api_key = os.environ["API_KEY"]
    # Fetch from external API
    data = fetch_external_data(api_key)
    # Save to volume
    with open("/data/latest.json", "w") as f:
        json.dump(data, f)
    vol.commit()
```
## Dynamic Scheduling
Update schedules programmatically:
```python
@app.function()
def main_task():
    ...

@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def enable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=5)

@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
def disable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=0)
```
## Error Handling
Scheduled functions that fail will:
- Show failure in dashboard
- Send notifications (configurable)
- Retry on next scheduled run
```python
@app.function(
    schedule=modal.Cron("0 * * * *"),
    retries=3,  # Retry failed runs
    timeout=1800,
)
def robust_task():
    try:
        perform_task()
    except Exception as e:
        # Log error
        print(f"Task failed: {e}")
        # Optionally send alert
        send_alert(f"Scheduled task failed: {e}")
        raise
```
## Best Practices
1. **Set timeouts**: Always specify timeout for scheduled functions
2. **Use appropriate schedules**: Period for relative timing, Cron for absolute
3. **Monitor failures**: Check dashboard regularly for failed runs
4. **Idempotent operations**: Design tasks to handle reruns safely
5. **Resource limits**: Set appropriate CPU/memory for scheduled workloads
6. **Timezone awareness**: Specify timezone for cron schedules
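Practice 4 (idempotent operations) can be sketched without any Modal machinery: key each unit of work deterministically (for example by its scheduled window) and skip keys already processed. The in-memory set here is an illustrative stand-in for a persistent ledger such as a database table or a Volume-backed file:

```python
processed = set()  # stand-in for a persistent ledger

def run_once(job_key, task):
    """Run `task` only if `job_key` has not been processed before."""
    if job_key in processed:
        return "skipped"
    result = task()
    processed.add(job_key)
    return result

# A retry or rerun of the same scheduled window does no duplicate work
first = run_once("etl:2026-03-23", lambda: "done")
second = run_once("etl:2026-03-23", lambda: "done")
print(first, second)  # done skipped
```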

View File

@@ -1,180 +1,119 @@
# Modal Secrets
## Overview
Modal Secrets securely deliver credentials and sensitive data to functions as environment variables. Secrets are stored encrypted and only available to your workspace.
## Creating Secrets
### Via CLI
```bash
# Create with key-value pairs
modal secret create my-api-keys API_KEY=sk-xxx DB_PASSWORD=hunter2

# Create from existing environment variables
modal secret create my-env-keys API_KEY=$API_KEY

# List all secrets
modal secret list

# Delete a secret
modal secret delete my-api-keys
```
### Via Dashboard
Navigate to https://modal.com/secrets to create and manage secrets. Templates are available for common services (Postgres, MongoDB, Hugging Face, Weights & Biases, etc.).
### Programmatic (Inline)
```python
# From a dictionary (useful for development)
secret = modal.Secret.from_dict({"API_KEY": "sk-xxx"})

# From a .env file
secret = modal.Secret.from_dotenv()

# From a named secret (created via CLI or dashboard)
secret = modal.Secret.from_name("my-api-keys")
```
## Using Secrets in Functions
### Basic Usage
```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api():
    import os
    import requests
    api_key = os.environ["API_KEY"]
    response = requests.get(url, headers={"Authorization": f"Bearer {api_key}"})
    return response.json()
```
### Multiple Secrets
```python
@app.function(secrets=[
    modal.Secret.from_name("openai-keys"),
    modal.Secret.from_name("database-creds"),
])
def process():
    import os
    openai_key = os.environ["OPENAI_API_KEY"]
    db_url = os.environ["DATABASE_URL"]
    ...
```
Secrets are applied in order — if two secrets define the same key, the later one wins.
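The precedence rule behaves like a left-to-right dictionary merge over each secret's key-value pairs; a local sketch (plain dicts standing in for attached secrets):

```python
# Two secrets attached in order: first "a", then "b"
secret_a = {"API_KEY": "from-a", "REGION": "us-east-1"}
secret_b = {"API_KEY": "from-b"}

# Later secrets override earlier ones, key by key
env = {**secret_a, **secret_b}
print(env["API_KEY"])  # from-b
print(env["REGION"])   # us-east-1
```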
### With Classes
```python
@app.cls(secrets=[modal.Secret.from_name("huggingface")])
class ModelService:
    @modal.enter()
    def load(self):
        import os
        from transformers import AutoModel
        token = os.environ["HF_TOKEN"]
        self.model = AutoModel.from_pretrained("model-name", token=token)
```
### From .env File
```python
# Reads .env file from current directory
@app.function(secrets=[modal.Secret.from_dotenv()])
def local_dev():
    import os
    api_key = os.environ["API_KEY"]
```
The `.env` file format:
```
API_KEY=sk-xxx
DATABASE_URL=postgres://user:pass@host/db
DEBUG=false
```
## Common Secret Templates
| Service | Typical Keys |
|---------|-------------|
| OpenAI | `OPENAI_API_KEY` |
| Hugging Face | `HF_TOKEN` |
| AWS | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Postgres | `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE` |
| Weights & Biases | `WANDB_API_KEY` |
| GitHub | `GITHUB_TOKEN` |
## Security Notes
- Secrets are encrypted at rest and in transit
- Only accessible to functions in your workspace
- Never log or print secret values
- Use `.from_name()` in production (not `.from_dict()`)
- Rotate secrets regularly via the dashboard or CLI

View File

@@ -1,303 +1,247 @@
# Modal Volumes
## Table of Contents
- [Overview](#overview)
- [Creating Volumes](#creating-volumes)
- [Mounting Volumes](#mounting-volumes)
- [Reading and Writing Files](#reading-and-writing-files)
- [CLI Access](#cli-access)
- [Commits and Reloads](#commits-and-reloads)
- [Concurrent Access](#concurrent-access)
- [Volumes v2](#volumes-v2)
- [Common Patterns](#common-patterns)
## Overview
Volumes are Modal's distributed file system, optimized for write-once, read-many workloads like storing model weights and distributing them across containers.
Key characteristics:
- Persistent across function invocations and deployments
- Mountable by multiple functions simultaneously
- Background auto-commits every few seconds
- Final commit on container shutdown
## Creating Volumes
### Via CLI
```bash
modal volume create my-volume

# v2 volume (beta)
modal volume create my-volume --version=2
```
### From Code
```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True)

# For v2
vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2)
```
## Mounting Volumes
Mount volumes to functions via the `volumes` parameter:
```python
vol = modal.Volume.from_name("model-store", create_if_missing=True)

@app.function(volumes={"/models": vol})
def use_model():
    import json
    # Access files at /models/
    with open("/models/config.json") as f:
        config = json.load(f)
```
Mount multiple volumes:
```python
weights_vol = modal.Volume.from_name("weights")
data_vol = modal.Volume.from_name("datasets")

@app.function(volumes={"/weights": weights_vol, "/data": data_vol})
def train():
    ...
```
## Reading and Writing Files
### Writing
```python
@app.function(volumes={"/data": vol})
def save_results(results):
    import json
    import os
    os.makedirs("/data/outputs", exist_ok=True)
    with open("/data/outputs/results.json", "w") as f:
        json.dump(results, f)
```
### Reading
```python
@app.function(volumes={"/data": vol})
def load_results():
    import json
    with open("/data/outputs/results.json") as f:
        return json.load(f)
```
### Large Files (Model Weights)
```python
@app.function(volumes={"/models": vol}, gpu="L40S")
def save_model():
    import torch
    model = train_model()
    torch.save(model.state_dict(), "/models/checkpoint.pt")

@app.function(volumes={"/models": vol}, gpu="L40S")
def load_model():
    import torch
    model = MyModel()
    model.load_state_dict(torch.load("/models/checkpoint.pt"))
    return model
```
## Model Checkpointing
Save checkpoints during long training jobs:
```python
volume = modal.Volume.from_name("checkpoints")
VOL_PATH = "/vol"

@app.function(
    gpu="A10G",
    timeout=2 * 60 * 60,  # 2 hours
    volumes={VOL_PATH: volume},
)
def finetune():
    from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
    training_args = Seq2SeqTrainingArguments(
        output_dir=f"{VOL_PATH}/model",  # Checkpoints saved to Volume
        save_steps=100,
        # ... more args
    )
    trainer = Seq2SeqTrainer(model=model, args=training_args, ...)
    trainer.train()
```
Background commits ensure checkpoints persist even if training is interrupted.
## CLI Access
```bash
# List files
modal volume ls my-volume
modal volume ls my-volume /subdir/

# Upload files
modal volume put my-volume local_file.txt
modal volume put my-volume local_file.txt /remote/path/file.txt

# Download files
modal volume get my-volume /remote/file.txt local_file.txt

# Copy within a volume
modal volume cp my-volume src.txt dst.txt

# Delete a file
modal volume rm my-volume file.txt

# List all volumes
modal volume list

# Delete a volume
modal volume delete my-volume
```
## Commits and Reloads
Modal auto-commits volume changes in the background every few seconds and on container shutdown.
### Explicit Commit
Force an immediate commit:
```python
@app.function(volumes={"/data": vol})
def writer():
    with open("/data/file.txt", "w") as f:
        f.write("hello")
    vol.commit()  # Make immediately visible to other containers
```
### Reload
See changes from other containers:
```python
@app.function(volumes={"/data": vol})
def reader():
    vol.reload()  # Refresh to see latest writes
    with open("/data/file.txt") as f:
        return f.read()
```
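When two containers modify the same file, commits follow last-write-wins semantics. A toy model of that behavior (plain Python; the `commit` helper here is an illustrative stand-in for `Volume.commit`, not Modal's implementation):

```python
volume_state = {}  # committed state visible to all containers

def commit(local_changes):
    """Publish a container's changed paths; clashing paths are overwritten."""
    volume_state.update(local_changes)

commit({"/data/file.txt": "written by container A"})
commit({"/data/file.txt": "written by container B"})
print(volume_state["/data/file.txt"])  # container B's write survives
```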
## Concurrent Access
### v1 Volumes
- Recommended max 5 concurrent commits
- Last write wins for concurrent modifications of the same file
- Avoid concurrent modification of identical files
- Max 500,000 files (inodes)
### v2 Volumes
- Hundreds of concurrent writers (distinct files)
- No file count limit
- Improved random access performance
- Up to 1 TiB per file, 262,144 files per directory
## Volumes v2
v2 Volumes (beta) offer significant improvements:
| Feature | v1 | v2 |
|---------|----|----|
| Max files | 500,000 | Unlimited |
| Concurrent writes | ~5 | Hundreds |
| Max file size | No limit | 1 TiB |
| Random access | Limited | Full support |
| HIPAA compliance | No | Yes |
| Hard links | No | Yes |
Enable v2:
```python
vol = modal.Volume.from_name("my-vol-v2", create_if_missing=True, version=2)
```
## Common Patterns
### Model Weight Storage
```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

# Download once during image build
def download_weights():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim()
    .uv_pip_install("huggingface_hub")
    .run_function(download_weights, volumes={"/models": vol})
)
```
### Training Checkpoints
```python
@app.function(volumes={"/checkpoints": vol}, gpu="H100", timeout=86400)
def train():
    import torch
    for epoch in range(100):
        train_one_epoch()
        torch.save(model.state_dict(), f"/checkpoints/epoch_{epoch}.pt")
        vol.commit()  # Save checkpoint immediately
```
### Shared Data Between Functions
```python
data_vol = modal.Volume.from_name("shared-data", create_if_missing=True)

@app.function(volumes={"/data": data_vol})
def preprocess():
    # Write processed data
    df.to_parquet("/data/processed.parquet")

@app.function(volumes={"/data": data_vol})
def analyze():
    import pandas as pd
    data_vol.reload()  # Ensure we see latest data
    df = pd.read_parquet("/data/processed.parquet")
    return df.describe()
```
### Performance Tips
- Volumes are optimized for large files, not many small files
- Keep under 50,000 files and directories for best v1 performance
- Use Parquet or other columnar formats instead of many small CSVs
- For truly temporary data, use `ephemeral_disk` instead of Volumes
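The "few large files" tip is easy to demonstrate locally: consolidating many records into one file replaces a thousand per-file filesystem operations (one inode each) with a single write. Pure stdlib, no Modal or Parquet needed:

```python
import json
import os
import tempfile

records = [{"id": i, "value": i * i} for i in range(1000)]

with tempfile.TemporaryDirectory() as d:
    # Anti-pattern: one tiny file per record -> 1000 inodes
    for r in records:
        with open(os.path.join(d, f"{r['id']}.json"), "w") as f:
            json.dump(r, f)
    small_file_count = len(os.listdir(d))

    # Better: one consolidated JSONL file -> a single inode
    with open(os.path.join(d, "all.jsonl"), "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")

print(small_file_count)  # 1000
```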

View File

@@ -1,337 +1,254 @@
# Modal Web Endpoints
## Table of Contents
- [Simple Endpoints](#simple-endpoints)
- [Deployment](#deployment)
- [ASGI Apps](#asgi-apps-fastapi-starlette-fasthtml)
- [WSGI Apps](#wsgi-apps-flask-django)
- [Custom Web Servers](#custom-web-servers)
- [WebSockets](#websockets)
- [Authentication](#authentication)
- [Streaming](#streaming)
- [Concurrency](#concurrency)
- [Limits](#limits)
## Simple Endpoints
The easiest way to create a web endpoint:
```python
import modal

app = modal.App("api-service")

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```
### POST Endpoints
```python
@app.function()
@modal.fastapi_endpoint(method="POST")
def predict(data: dict):
    result = model.predict(data["text"])
    return {"prediction": result}
```
### Query Parameters
Parameters are automatically parsed from query strings:
```python
@app.function()
@modal.fastapi_endpoint()
def search(query: str, limit: int = 10):
    return {"results": do_search(query, limit)}
```
Access via: `https://your-app.modal.run?query=hello&limit=5`
## Deployment
### Development Mode
```bash
modal serve script.py
```
- Creates a temporary public URL
- Hot-reloads on file changes
- Perfect for development and testing
- URL expires when you stop the command
### Production Deployment
```bash
modal deploy script.py
```
- Creates a permanent URL
- Runs persistently in the cloud
- Autoscales based on traffic
- URL format: `https://<workspace>--<app-name>-<function-name>.modal.run`
## ASGI Apps (FastAPI, Starlette, FastHTML)
For full framework applications, use `@modal.asgi_app`:
```python
from fastapi import FastAPI

image = modal.Image.debian_slim().pip_install("fastapi[standard]")

web_app = FastAPI()

@web_app.get("/")
async def root():
    return {"status": "ok"}

@web_app.post("/predict")
async def predict(request: dict):
    return {"result": model.run(request["input"])}

@app.function(image=image, gpu="L40S")
@modal.asgi_app()
def fastapi_app():
    return web_app
```
### With Class Lifecycle
```python
@app.cls(gpu="L40S", image=image)
class InferenceService:
    @modal.enter()
    def load_model(self):
        self.model = load_model()

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI
        web_app = FastAPI()

        @web_app.post("/generate")
        async def generate(request: dict):
            return self.model.generate(request["prompt"])

        return web_app
```
## WSGI Apps (Flask, Django)
```python
from flask import Flask

image = modal.Image.debian_slim().pip_install("flask")

flask_app = Flask(__name__)

@flask_app.route("/")
def index():
    return {"status": "ok"}

@app.function(image=image)
@modal.wsgi_app()
def flask_server():
    return flask_app
```
WSGI is synchronous — concurrent inputs run on separate threads.
## Custom Web Servers
For non-standard web frameworks (aiohttp, Tornado, TGI):
```python
@app.function(image=image, gpu="H100")
@modal.web_server(port=8000)
def serve():
    import subprocess
    subprocess.Popen([
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "meta-llama/Llama-3-70B",
        "--host", "0.0.0.0",  # Must bind to 0.0.0.0, not localhost
        "--port", "8000",
    ])
```
The application must bind to `0.0.0.0` (not `127.0.0.1`).
## WebSockets
Supported with `@modal.asgi_app`, `@modal.wsgi_app`, and `@modal.web_server`. Each connection holds a single function call; combine with `@modal.concurrent` for multiple simultaneous connections.
```python
from fastapi import FastAPI, WebSocket

web_app = FastAPI()

@web_app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        result = process(data)
        await websocket.send_text(result)

@app.function()
@modal.asgi_app()
def ws_app():
    return web_app
```
- Full WebSocket protocol (RFC 6455)
- Messages up to 2 MiB each
- No RFC 8441 or RFC 7692 support yet
## Authentication
### Proxy Auth Tokens (Built-in)
Modal provides first-class endpoint protection via proxy auth tokens:
```python
@app.function()
@modal.fastapi_endpoint()
def protected():
    return "authenticated!"
```
Create tokens in your workspace settings and require them on the endpoint. Clients authenticate by passing the token in the `Modal-Key` and `Modal-Secret` headers.
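Client-side, the token pair travels as plain HTTP headers. A minimal sketch (the URL and token values are placeholders):

```python
import urllib.request

def proxy_auth_request(url: str, token_id: str, token_secret: str):
    """Build an HTTP request carrying Modal proxy-auth headers."""
    req = urllib.request.Request(url)
    req.add_header("Modal-Key", token_id)
    req.add_header("Modal-Secret", token_secret)
    return req

req = proxy_auth_request(
    "https://workspace--example-app-protected.modal.run",  # placeholder URL
    "wk-example-id",       # placeholder token ID
    "ws-example-secret",   # placeholder token secret
)
# urllib.request.urlopen(req)  # performs the authenticated call
```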
### Custom Bearer Tokens
Validate a token stored in a Modal Secret:
```python
from fastapi import Depends, HTTPException, status
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

auth_scheme = HTTPBearer()

@app.function(secrets=[modal.Secret.from_name("auth-token")])
@modal.fastapi_endpoint()
async def protected(token: HTTPAuthorizationCredentials = Depends(auth_scheme)):
    import os
    if token.credentials != os.environ["AUTH_TOKEN"]:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid token"
        )
    return "success!"
```
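A matching client passes the same token in the `Authorization` header. A sketch with placeholder URL and token:

```python
import json
import urllib.request

def bearer_request(url: str, token: str, payload: dict):
    """Build a JSON POST request carrying a Bearer token."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
    )
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Content-Type", "application/json")
    return req

req = bearer_request(
    "https://workspace--example-app-protected.modal.run",  # placeholder URL
    "my-secret-token",                                     # placeholder token
    {"text": "hello"},
)
# urllib.request.urlopen(req)  # performs the call
```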
### Client IP Address
The client's IP address is available from the request object, e.g. for geolocation, rate limiting, and access control:
```python
from fastapi import Request

@app.function()
@modal.fastapi_endpoint()
def get_ip(request: Request):
    return f"Your IP: {request.client.host}"
```
## Concurrency
Handle multiple requests per container using `@modal.concurrent`:
```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
@modal.fastapi_endpoint(method="POST")
async def batch_predict(data: dict):
    return {"result": await model.predict_async(data["text"])}
```
## Web Endpoint URLs
### Auto-Generated URLs
Format: `https://<workspace>--<app>-<function>.modal.run`
With environment suffix: `https://<workspace>-<suffix>--<app>-<function>.modal.run`
### Custom Labels
```python
@app.function()
@modal.fastapi_endpoint(label="api")
def handler():
    ...

# URL: https://workspace--api.modal.run
```
### Programmatic URL Retrieval
```python
@app.function()
@modal.fastapi_endpoint()
def my_endpoint():
    url = my_endpoint.get_web_url()
    return {"url": url}

# From a deployed function:
f = modal.Function.from_name("app-name", "my_endpoint")
url = f.get_web_url()
```
### Custom Domains
Available on Team and Enterprise plans:
```python
@app.function()
@modal.fastapi_endpoint(custom_domains=["api.example.com"])
def hello(message: str):
return {"message": f"hello {message}"}
```
Multiple domains:
```python
@modal.fastapi_endpoint(custom_domains=["api.example.com", "api.example.net"])
```
Wildcard domains:
```python
@modal.fastapi_endpoint(custom_domains=["*.example.com"])
```
TLS certificates automatically generated and renewed.
## Performance
### Cold Starts
The first request may hit a cold start (a few seconds) while a container boots. Modal keeps containers warm for subsequent requests; set `min_containers` to keep containers running and avoid cold starts entirely.
### Scaling
- Autoscaling based on traffic
- Use `@modal.concurrent` for multiple requests per container
- Beyond concurrency limit, additional containers spin up
- Requests queue when at max containers
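The scaling knobs above are set on the function decorator. A configuration sketch (parameter names reflect recent Modal releases; verify them against the current API reference):

```python
@app.function(
    min_containers=1,     # keep one container warm (avoids cold starts)
    max_containers=20,    # upper bound on scale-out
    buffer_containers=2,  # extra idle containers held ready for traffic spikes
)
@modal.concurrent(max_inputs=8)  # up to 8 concurrent requests per container
@modal.fastapi_endpoint()
def handler():
    ...
```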
### Rate Limits
Default: 200 requests/second with a 5-second burst multiplier
- Excess requests receive a 429 status code
- Contact support to increase limits
### Size Limits
- Request body: up to 4 GiB
- Response body: unlimited
- WebSocket messages: up to 2 MiB