mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
Update Modal skill
},
"metadata": {
"description": "Claude scientific skills from K-Dense Inc",
"version": "2.30.0"
},
"plugins": [
{
### Data Management & Infrastructure

- **LaminDB** - Open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). Provides unified platform combining lakehouse architecture, lineage tracking, feature stores, biological ontologies (via Bionty plugin with 20+ ontologies: genes, proteins, cell types, tissues, diseases, pathways), LIMS, and ELN capabilities through a single Python API. Key features include: automatic data lineage tracking (code, inputs, outputs, environment), versioned artifacts (DataFrame, AnnData, SpatialData, Parquet, Zarr), schema validation and data curation with standardization/synonym mapping, queryable metadata with feature-based filtering, cross-registry traversal, and streaming for large datasets. Supports integrations with workflow managers (Nextflow, Snakemake, Redun), MLOps platforms (Weights & Biases, MLflow, HuggingFace, scVI-tools), cloud storage (S3, GCS, S3-compatible), array stores (TileDB-SOMA, DuckDB), and visualization (Vitessce). Deployment options: local SQLite, cloud storage with SQLite, or cloud storage with PostgreSQL for production. Use cases: scRNA-seq standardization and analysis, flow cytometry/spatial data management, multi-modal dataset integration, computational workflow tracking with reproducibility, biological ontology-based annotation, data lakehouse construction for unified queries, ML pipeline integration with experiment tracking, and FAIR-compliant dataset publishing

- **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200, B200+), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv (recommended)/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container, up to 1,536 GB VRAM), persistent storage via Volumes (v1 and v2) for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs (FastAPI, ASGI, WSGI, WebSockets), parallel execution with `.map()` for batch processing, input concurrency and dynamic batching for I/O-bound workloads, and resource configuration (CPU cores, memory, ephemeral disk up to 3 TiB). Supports custom Docker images, Micromamba/Conda environments, integration with Hugging Face/Weights & Biases, and distributed multi-GPU training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, speech, embeddings), GPU-accelerated training and fine-tuning, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, protein folding and computational biology, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation

### Cheminformatics & Drug Discovery

- **Datamol** - Python library for molecular manipulation and featurization built on RDKit with enhanced workflows and performance optimizations. Provides utilities for molecular I/O (reading/writing SMILES, SDF, MOL files), molecular standardization and sanitization, molecular transformations (tautomer enumeration, stereoisomer generation), molecular featurization (descriptors, fingerprints, graph representations), parallel processing for large datasets, and integration with machine learning pipelines. Features include: optimized RDKit operations, caching for repeated computations, molecular filtering and preprocessing, and seamless integration with pandas DataFrames. Designed for drug discovery and cheminformatics workflows requiring efficient processing of large compound libraries. Use cases: molecular preprocessing for ML models, compound library management, molecular similarity searches, and cheminformatics data pipelines
---
name: modal
description: Cloud computing platform for running Python on GPUs and serverless infrastructure. Use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud. Use this skill whenever the user mentions Modal, serverless GPU compute, deploying ML models to the cloud, serving inference endpoints, running batch processing in the cloud, or needs to scale Python workloads beyond their local machine. Also use when the user wants to run code on H100s, A100s, or other cloud GPUs, or needs to create a web API for a model.
license: Apache-2.0
metadata:
  skill-author: K-Dense Inc.
---

## Overview

Modal is a cloud platform for running Python code serverlessly, with a focus on AI/ML workloads. Key capabilities:

- **GPU compute** on demand (T4, L4, A10, L40S, A100, H100, H200, B200)
- **Serverless functions** with autoscaling from zero to thousands of containers
- **Custom container images** built entirely in Python code
- **Persistent storage** via Volumes for model weights and datasets
- **Web endpoints** for serving models and APIs
- **Scheduled jobs** via cron or fixed intervals
- **Sub-second cold starts** for low-latency inference

Everything in Modal is defined as code — no YAML, no Dockerfiles required (though both are supported).

## When to Use This Skill

Use this skill when the user needs to:

- Deploy or serve AI/ML models in the cloud
- Run GPU-accelerated computations (training, inference, fine-tuning)
- Create serverless web APIs or endpoints
- Scale batch processing jobs in parallel
- Schedule recurring tasks (data pipelines, retraining, scraping)
- Store model weights or datasets in persistent cloud storage
- Run code in custom container environments
- Build job queues or async task processing systems

## Installation and Authentication

### Install

```bash
uv pip install modal
```

### Authenticate

```bash
modal setup
```

This opens a browser for authentication. For CI/CD or headless environments, set environment variables:

```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```

Generate tokens at https://modal.com/settings

Modal offers a free tier with $30/month in credits.

**Reference**: See `references/getting-started.md` for detailed setup and first app walkthrough.

## Core Concepts

### App and Functions

A Modal `App` groups related functions. Functions decorated with `@app.function()` run remotely in the cloud:

```python
import modal

app = modal.App("my-app")

@app.function()
def square(x):
    return x ** 2

@app.local_entrypoint()
def main():
    # .remote() runs in the cloud
    print(square.remote(42))
```

Run with `modal run script.py`. Deploy with `modal deploy script.py`.

**Reference**: See `references/functions.md` for lifecycle hooks, classes, `.map()`, `.spawn()`, and more.

### Container Images

Modal builds container images from Python code. The recommended package installer is `uv`:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch==2.8.0", "transformers", "accelerate")
    .apt_install("git")
)

@app.function(image=image)
def inference(prompt):
    from transformers import pipeline
    pipe = pipeline("text-generation", model="meta-llama/Llama-3-8B")
    return pipe(prompt)
```

Key image methods:

- `.uv_pip_install()` — Install Python packages with uv (recommended)
- `.pip_install()` — Install with pip (fallback)
- `.apt_install()` — Install system packages
- `.run_commands()` — Run shell commands during build
- `.run_function()` — Run Python during build (e.g., download model weights)
- `.add_local_python_source()` — Add local modules
- `.env()` — Set environment variables

**Reference**: See `references/images.md` for Dockerfiles, micromamba, caching, GPU build steps.

### GPU Compute

Request GPUs via the `gpu` parameter:

```python
@app.function(gpu="H100")
def train_model():
    import torch
    device = torch.device("cuda")
    # GPU training code here

# Multiple GPUs
@app.function(gpu="H100:4")
def distributed_training():
    ...

# GPU fallback chain
@app.function(gpu=["H100", "A100-80GB", "A100-40GB"])
def flexible_inference():
    ...
```

Available GPUs: T4, L4, A10, L40S, A100-40GB, A100-80GB, H100, H200, B200, B200+

- Up to 8 GPUs per container (except A10: up to 4)
- L40S is recommended for inference (cost/performance balance, 48 GB VRAM)
- H100/A100 can be auto-upgraded to H200/A100-80GB at no extra cost
- Use `gpu="H100!"` to prevent auto-upgrade

**Reference**: See `references/gpu.md` for GPU selection guidance and multi-GPU training.
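The fallback chain above is tried in order until a GPU type with capacity is found. A toy sketch of that selection logic (illustrative only, not Modal's actual scheduler):

```python
def pick_gpu(preferences, available):
    """Return the first preferred GPU type with free capacity, else None."""
    for gpu in preferences:
        if available.get(gpu, 0) > 0:
            return gpu
    return None

# H100s are exhausted, so the request falls back to A100-80GB:
assert pick_gpu(["H100", "A100-80GB", "A100-40GB"],
                {"H100": 0, "A100-80GB": 3}) == "A100-80GB"
```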

### Volumes (Persistent Storage)

Volumes provide distributed, persistent file storage:

```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

@app.function(volumes={"/data": vol})
def save_model():
    # Write to the mounted path
    with open("/data/model.pt", "wb") as f:
        torch.save(model.state_dict(), f)

@app.function(volumes={"/data": vol})
def load_model():
    model.load_state_dict(torch.load("/data/model.pt"))
```

- Optimized for write-once, read-many workloads (model weights, datasets)
- CLI access: `modal volume ls`, `modal volume put`, `modal volume get`
- Background auto-commits every few seconds

**Reference**: See `references/volumes.md` for v2 volumes, concurrent writes, and best practices.

### Secrets

Securely pass credentials to functions:

```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api():
    import os
    api_key = os.environ["API_KEY"]
    # Use the key
```

Create secrets via CLI: `modal secret create my-api-keys API_KEY=sk-xxx`

Or from a `.env` file: `modal.Secret.from_dotenv()`

**Reference**: See `references/secrets.md` for dashboard setup, multiple secrets, and templates.
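For reference, a `.env` file is just KEY=value lines; a minimal parser sketch of that format (illustrative, not Modal's actual loader):

```python
def parse_dotenv(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

assert parse_dotenv("# creds\nAPI_KEY=sk-xxx\n")["API_KEY"] == "sk-xxx"
```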

### Web Endpoints

Serve models and APIs as web endpoints:

```python
@app.function()
@modal.fastapi_endpoint()
def predict(text: str):
    return {"result": model.predict(text)}
```

- `modal serve script.py` — Development with hot reload and temporary URL
- `modal deploy script.py` — Production deployment with permanent URL
- Supports FastAPI, ASGI (Starlette, FastHTML), WSGI (Flask, Django), WebSockets
- Request bodies up to 4 GiB, unlimited response size

**Reference**: See `references/web-endpoints.md` for ASGI/WSGI apps, streaming, auth, and WebSockets.

### Scheduled Jobs

Run functions on a schedule:

```python
@app.function(schedule=modal.Cron("0 9 * * *"))  # Daily at 9 AM UTC
def daily_pipeline():
    # ETL, retraining, scraping, etc.
    ...

@app.function(schedule=modal.Period(hours=6))
def periodic_check():
    ...
```

Deploy with `modal deploy script.py` to activate the schedule.

- `modal.Cron("...")` — Standard cron syntax, stable across deploys
- `modal.Period(hours=N)` — Fixed interval, resets on redeploy
- Monitor runs in the Modal dashboard

**Reference**: See `references/scheduled-jobs.md` for cron syntax and management.
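Cron fields read left to right as minute, hour, day of month, month, day of week. A plain-Python reminder of that ordering (illustrative only, not part of Modal's API):

```python
def describe_cron(expr: str) -> dict:
    """Map the five cron fields to their names, left to right."""
    fields = ["minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(fields, expr.split()))

schedule = describe_cron("0 */6 * * *")
assert schedule["minute"] == "0"   # on the hour
assert schedule["hour"] == "*/6"   # every 6 hours
```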

### Scaling and Concurrency

Modal autoscales containers automatically. Configure limits:

```python
@app.function(
    max_containers=100,    # Upper limit
    min_containers=2,      # Keep warm for low latency
    buffer_containers=5,   # Reserve capacity
    scaledown_window=300,  # Idle seconds before shutdown
)
def process(data):
    ...
```

Process inputs in parallel with `.map()`:

```python
results = list(process.map([item1, item2, item3, ...]))
```

Enable concurrent request handling per container:

```python
@app.function()
@modal.concurrent(max_inputs=10)
async def handle_request(req):
    ...
```

**Reference**: See `references/scaling.md` for `.map()`, `.starmap()`, `.spawn()`, and limits.
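The interplay of these settings can be pictured with a toy sizing rule (a sketch only; Modal's real autoscaler is more sophisticated):

```python
import math

def desired_containers(queued: int, inputs_per_container: int,
                       min_containers: int, max_containers: int,
                       buffer_containers: int) -> int:
    # Toy rule: enough containers to drain the queue, plus an idle buffer,
    # clamped between the configured minimum and maximum.
    need = math.ceil(queued / inputs_per_container) + buffer_containers
    return max(min_containers, min(max_containers, need))

# Demand spike: clamped at max_containers
assert desired_containers(1000, 10, 2, 100, 5) == 100
# Idle: min_containers keeps two warm even with no buffer
assert desired_containers(0, 10, 2, 100, 0) == 2
```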

### Resource Configuration

```python
@app.function(
    cpu=4.0,               # Physical cores (not vCPUs)
    memory=16384,          # MiB
    ephemeral_disk=51200,  # MiB (up to 3 TiB)
    timeout=3600,          # Seconds
)
def heavy_computation():
    ...
```

Defaults: 0.125 CPU cores, 128 MiB memory. Billed on max(request, usage).

**Reference**: See `references/resources.md` for limits and billing details.

## Classes with Lifecycle Hooks

For stateful workloads (e.g., loading a model once and serving many requests):

```python
@app.cls(gpu="L40S", image=image)
class Predictor:
    @modal.enter()
    def load_model(self):
        self.model = load_heavy_model()  # Runs once on container start

    @modal.method()
    def predict(self, text: str):
        return self.model(text)

    @modal.exit()
    def cleanup(self):
        ...  # Runs on container shutdown
```

Call with: `Predictor().predict.remote("hello")`

## Common Workflow Patterns

### GPU Model Inference Service

```python
import modal

app = modal.App("llm-service")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("vllm")
)

@app.cls(gpu="H100", image=image, min_containers=1)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3-70B")

    @modal.fastapi_endpoint(method="POST")
    def generate(self, prompt: str, max_tokens: int = 256):
        from vllm import SamplingParams
        outputs = self.llm.generate([prompt], SamplingParams(max_tokens=max_tokens))
        return {"text": outputs[0].outputs[0].text}
```

### Batch Processing Pipeline

```python
app = modal.App("batch-pipeline")
vol = modal.Volume.from_name("pipeline-data", create_if_missing=True)

@app.function(volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int):
    import pandas as pd
    df = pd.read_parquet(f"/data/input/chunk_{chunk_id}.parquet")
    result = heavy_transform(df)
    result.to_parquet(f"/data/output/chunk_{chunk_id}.parquet")
    return len(result)

@app.local_entrypoint()
def main():
    chunk_ids = list(range(100))
    results = list(process_chunk.map(chunk_ids))
    print(f"Processed {sum(results)} total rows")
```
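The fan-out/fan-in shape of `.map()` can be mimicked locally with `concurrent.futures`; results come back in input order, as with Modal's `.map()`. A local stand-in (the per-chunk transform here is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk_id: int) -> int:
    # Stand-in for the real per-chunk transform
    return chunk_id * 10

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_chunk, range(5)))

assert results == [0, 10, 20, 30, 40]  # ordered like the inputs
print(f"Processed {sum(results)} total rows")
```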

### Scheduled Data Pipeline

```python
app = modal.App("etl-pipeline")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[modal.Secret.from_name("db-credentials")],
)
def etl_job():
    import os
    db_url = os.environ["DATABASE_URL"]
    # Extract, transform, load
    ...
```
## CLI Reference

| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate with Modal |
| `modal run script.py` | Run a script's local entrypoint |
| `modal serve script.py` | Dev server with hot reload |
| `modal deploy script.py` | Deploy to production |
| `modal volume ls <name>` | List files in a volume |
| `modal volume put <name> <file>` | Upload file to volume |
| `modal volume get <name> <file>` | Download file from volume |
| `modal secret create <name> K=V` | Create a secret |
| `modal secret list` | List secrets |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop a deployed app |
## Reference Files

Detailed documentation for each topic:

- `references/getting-started.md` — Installation, authentication, first app
- `references/functions.md` — Functions, classes, lifecycle hooks, remote execution
- `references/images.md` — Container images, package installation, caching
- `references/gpu.md` — GPU types, selection, multi-GPU, training
- `references/volumes.md` — Persistent storage, file management, v2 volumes
- `references/secrets.md` — Credentials, environment variables, dotenv
- `references/web-endpoints.md` — FastAPI, ASGI/WSGI, streaming, auth, WebSockets
- `references/scheduled-jobs.md` — Cron, periodic schedules, management
- `references/scaling.md` — Autoscaling, concurrency, .map(), limits
- `references/resources.md` — CPU, memory, disk, timeout configuration
- `references/examples.md` — Common use cases and patterns
- `references/api_reference.md` — Key API classes and methods

Read these files when detailed information is needed beyond this overview.
# Modal API Reference

## Core Classes

### modal.App

The main unit of deployment. Groups related functions.

```python
app = modal.App("my-app")
```

| Method | Description |
|--------|-------------|
| `app.function(**kwargs)` | Decorator to register a function |
| `app.cls(**kwargs)` | Decorator to register a class |
| `app.local_entrypoint()` | Decorator for local entry point |
### modal.Function

A serverless function backed by an autoscaling container pool.

| Method | Description |
|--------|-------------|
| `.remote(*args)` | Execute in the cloud (sync) |
| `.local(*args)` | Execute locally |
| `.spawn(*args)` | Execute async, returns `FunctionCall` |
| `.map(inputs)` | Parallel execution over inputs |
| `.starmap(inputs)` | Parallel execution with multiple args |
| `.from_name(app, fn)` | Reference a deployed function |
| `.update_autoscaler(**kwargs)` | Dynamic scaling update |
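Semantically, `.map` and `.starmap` behave like Python's built-in `map` and `itertools.starmap`, except that each call runs in its own container and the calls proceed in parallel. A minimal local sketch of that correspondence (plain Python, no Modal involved):

```python
import itertools

def square(x):
    # Stand-in for a Modal function; fn.map(inputs) yields fn(x) for each input,
    # but runs the calls in parallel across containers.
    return x * x

def add(a, b):
    # fn.starmap(pairs) unpacks each tuple into positional arguments.
    return a + b

mapped = list(map(square, [1, 2, 3]))                        # like square.map([1, 2, 3])
starmapped = list(itertools.starmap(add, [(1, 2), (3, 4)]))  # like add.starmap([(1, 2), (3, 4)])
print(mapped, starmapped)  # [1, 4, 9] [3, 7]
```

The practical difference is ordering and cost, not semantics: Modal returns results as a generator in input order while fanning the work out to many containers.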
### modal.Cls

A serverless class with lifecycle hooks.

```python
@app.cls(gpu="L40S")
class MyClass:
    @modal.enter()
    def setup(self): ...

    @modal.method()
    def run(self, data): ...

    @modal.exit()
    def cleanup(self): ...
```

| Decorator | Description |
|-----------|-------------|
| `@modal.enter()` | Container startup hook |
| `@modal.exit()` | Container shutdown hook |
| `@modal.method()` | Expose as callable method |
| `@modal.parameter()` | Class-level parameter |

## Image

### modal.Image

Defines the container environment.

| Method | Description |
|--------|-------------|
| `.debian_slim(python_version=)` | Debian base image |
| `.from_registry(tag)` | Docker Hub image |
| `.from_dockerfile(path)` | Build from Dockerfile |
| `.micromamba(python_version=)` | Conda/mamba base |
| `.uv_pip_install(*pkgs)` | Install with uv (recommended) |
| `.pip_install(*pkgs)` | Install with pip |
| `.pip_install_from_requirements(path)` | Install from file |
| `.apt_install(*pkgs)` | Install system packages |
| `.run_commands(*cmds)` | Run shell commands |
| `.run_function(fn)` | Run Python during build |
| `.add_local_dir(local, remote)` | Add directory |
| `.add_local_file(local, remote)` | Add single file |
| `.add_local_python_source(module)` | Add Python module |
| `.env(dict)` | Set environment variables |
| `.imports()` | Context manager for remote imports |
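Each of these methods returns a new image rather than mutating the old one, which is why image definitions are written as a chain of calls. The builder pattern behind that style, in miniature (a pure-Python sketch for illustration, not Modal's actual implementation):

```python
class ImageSketch:
    """Tiny immutable builder mimicking the modal.Image chaining style."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def _with(self, step):
        # Every build step produces a fresh object carrying one more layer.
        return ImageSketch(self.layers + (step,))

    def apt_install(self, *pkgs):
        return self._with(("apt", pkgs))

    def uv_pip_install(self, *pkgs):
        return self._with(("uv_pip", pkgs))

base = ImageSketch()
img = base.apt_install("ffmpeg").uv_pip_install("torch", "numpy")
print(img.layers)
print(base.layers)  # the original is untouched: each call returned a new object
```

Because layers are append-only, Modal can cache each build step and rebuild only the layers after the one that changed.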
## Storage

### modal.Volume

Distributed persistent file storage.

```python
vol = modal.Volume.from_name("name", create_if_missing=True)
```

| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference or create a volume |
| `.commit()` | Force immediate commit |
| `.reload()` | Refresh to see other containers' writes |

Mount: `@app.function(volumes={"/path": vol})`

### modal.NetworkFileSystem

Legacy shared storage (superseded by Volume).
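The commit/reload pair exists because each container works against its own view of the volume: writes become visible to other containers only after a commit, and a long-running container sees others' commits only after a reload. A toy model of those semantics (illustrative only; the class names here are invented for the sketch):

```python
class VolumeSketch:
    """Toy model: committed state is shared; each container has a local view."""

    def __init__(self):
        self.committed = {}

class ContainerView:
    def __init__(self, vol):
        self.vol = vol
        self.local = dict(vol.committed)

    def write(self, path, data):
        self.local[path] = data        # visible only to this container for now

    def commit(self):
        self.vol.committed.update(self.local)   # publish local writes

    def reload(self):
        self.local = dict(self.vol.committed)   # pick up others' commits

vol = VolumeSketch()
a, b = ContainerView(vol), ContainerView(vol)
a.write("/data/x.txt", "hello")
print("/data/x.txt" in b.local)  # False: not committed yet
a.commit()
b.reload()
print(b.local["/data/x.txt"])    # hello
```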
## Secrets

### modal.Secret

Secure credential injection.

| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference a named secret |
| `.from_dict(dict)` | Create inline (dev only) |
| `.from_dotenv()` | Load from .env file |

Usage: `@app.function(secrets=[modal.Secret.from_name("x")])`

Access in function: `os.environ["KEY"]`
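At runtime a secret is just a set of environment variables injected into the container, so function code reads it with plain `os.environ`. A local simulation of that contract (the variable is set by hand here to stand in for what `secrets=[...]` would inject on Modal; the key name is made up):

```python
import os

# On Modal, modal.Secret.from_name(...) would inject this into the container
# environment; here we set it manually to simulate the injected value.
os.environ["API_KEY"] = "sk-test-123"

def handler():
    # Inside a Modal function, this is all that is needed to read a secret.
    return os.environ["API_KEY"]

print(handler())  # sk-test-123
```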
## Scheduling

### modal.Cron

```python
schedule = modal.Cron("0 9 * * *")  # Cron syntax
```

### modal.Period

```python
schedule = modal.Period(hours=6)  # Fixed interval
```

Usage: `@app.function(schedule=modal.Cron("..."))`
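Cron expressions use five space-separated fields: minute, hour, day of month, month, day of week, where `*` means "every". A quick way to see what a given expression says (plain Python, an illustration of the syntax rather than any Modal API):

```python
FIELDS = ["minute", "hour", "day of month", "month", "day of week"]

def explain_cron(expr: str) -> dict:
    """Label each of the five cron fields; '*' means 'every'."""
    values = expr.split()
    assert len(values) == 5, "cron expressions have exactly five fields"
    return dict(zip(FIELDS, values))

print(explain_cron("0 9 * * *"))    # minute=0, hour=9: daily at 09:00
print(explain_cron("0 */6 * * *"))  # hour=*/6: every six hours, on the hour
```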
## Web

### Decorators

| Decorator | Description |
|-----------|-------------|
| `@modal.fastapi_endpoint()` | Simple FastAPI endpoint |
| `@modal.asgi_app()` | Full ASGI app (FastAPI, Starlette) |
| `@modal.wsgi_app()` | Full WSGI app (Flask, Django) |
| `@modal.web_server(port=)` | Custom web server |

### Function Modifiers

| Decorator | Description |
|-----------|-------------|
| `@modal.concurrent(max_inputs=)` | Handle multiple inputs per container |
| `@modal.batched(max_batch_size=, wait_ms=)` | Dynamic input batching |
## GPU Strings

| String | GPU |
|--------|-----|
| `"T4"` | NVIDIA T4 16GB |
| `"L4"` | NVIDIA L4 24GB |
| `"A10"` | NVIDIA A10 24GB |
| `"L40S"` | NVIDIA L40S 48GB |
| `"A100-40GB"` | NVIDIA A100 40GB |
| `"A100-80GB"` | NVIDIA A100 80GB |
| `"H100"` | NVIDIA H100 80GB |
| `"H100!"` | H100 (no auto-upgrade) |
| `"H200"` | NVIDIA H200 141GB |
| `"B200"` | NVIDIA B200 192GB |
| `"B200+"` | B200 or B300, B200 price |
| `"H100:4"` | 4x H100 |
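The strings above follow a `TYPE[:COUNT]` pattern, with the count defaulting to 1. A small helper that splits them, purely illustrative (Modal parses these internally; `parse_gpu` is a hypothetical name, not part of the API):

```python
def parse_gpu(spec: str) -> tuple[str, int]:
    """Split a Modal-style GPU string into (gpu_type, count)."""
    gpu_type, _, count = spec.partition(":")
    return gpu_type, int(count) if count else 1

print(parse_gpu("H100:4"))     # ('H100', 4)
print(parse_gpu("A100-80GB"))  # ('A100-80GB', 1)
```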
## CLI Commands

| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate |
| `modal run <file>` | Run local entrypoint |
| `modal serve <file>` | Dev server with hot reload |
| `modal deploy <file>` | Production deployment |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop an app |
| `modal volume create <name>` | Create volume |
| `modal volume ls <name>` | List volume files |
| `modal volume put <name> <file>` | Upload to volume |
| `modal volume get <name> <file>` | Download from volume |
| `modal secret create <name> K=V` | Create secret |
| `modal secret list` | List secrets |
| `modal secret delete <name>` | Delete secret |
| `modal token set` | Set auth token |
# Modal Common Examples

## LLM Inference Service (vLLM)

```python
import modal

app = modal.App("vllm-service")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("vllm>=0.6.0")
)

@app.cls(gpu="H100", image=image, min_containers=1)
class LLMService:
    @modal.enter()
    def load(self):
        from vllm import LLM
        self.llm = LLM(model="meta-llama/Llama-3-70B-Instruct")

    @modal.method()
    def generate(self, prompt: str, max_tokens: int = 512) -> str:
        from vllm import SamplingParams
        params = SamplingParams(max_tokens=max_tokens, temperature=0.7)
        outputs = self.llm.generate([prompt], params)
        return outputs[0].outputs[0].text

    @modal.fastapi_endpoint(method="POST")
    def api(self, request: dict):
        text = self.generate(request["prompt"], request.get("max_tokens", 512))
        return {"text": text}
```
## Image Generation (Flux)

```python
import modal

app = modal.App("image-gen")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("diffusers", "torch", "transformers", "accelerate")
)

vol = modal.Volume.from_name("flux-weights", create_if_missing=True)

@app.cls(gpu="L40S", image=image, volumes={"/models": vol})
class ImageGenerator:
    @modal.enter()
    def load(self):
        import torch
        from diffusers import FluxPipeline
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell",
            torch_dtype=torch.bfloat16,
            cache_dir="/models",
        ).to("cuda")

    @modal.method()
    def generate(self, prompt: str) -> bytes:
        import io
        image = self.pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        return buf.getvalue()
```
## Speech Transcription (Whisper)

```python
import modal

app = modal.App("transcription")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .apt_install("ffmpeg")
    .uv_pip_install("openai-whisper", "torch")
)

@app.cls(gpu="T4", image=image)
class Transcriber:
    @modal.enter()
    def load(self):
        import whisper
        self.model = whisper.load_model("large-v3")

    @modal.method()
    def transcribe(self, audio_path: str) -> dict:
        return self.model.transcribe(audio_path)
```
## Batch Data Processing

```python
import modal

app = modal.App("batch-processor")

image = modal.Image.debian_slim().uv_pip_install("pandas", "pyarrow")
vol = modal.Volume.from_name("batch-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int) -> dict:
    import pandas as pd
    df = pd.read_parquet(f"/data/input/chunk_{chunk_id:04d}.parquet")
    result = df.groupby("category").agg({"value": ["sum", "mean", "count"]})
    result.to_parquet(f"/data/output/result_{chunk_id:04d}.parquet")
    return {"chunk_id": chunk_id, "rows": len(df)}

@app.local_entrypoint()
def main():
    chunk_ids = list(range(500))
    results = list(process_chunk.map(chunk_ids))
    total = sum(r["rows"] for r in results)
    print(f"Processed {total} total rows across {len(results)} chunks")
```
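The `:04d` in the chunk filenames zero-pads the id so that lexicographic order matches numeric order, which matters when downstream tools list and sort filenames as strings. A quick check of the pattern:

```python
# Zero-padded ids keep string sort identical to numeric sort.
names = [f"chunk_{i:04d}.parquet" for i in (2, 10, 100)]
print(names)  # ['chunk_0002.parquet', 'chunk_0010.parquet', 'chunk_0100.parquet']
assert sorted(names) == names
```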
## Web Scraping at Scale

```python
import modal

app = modal.App("scraper")

image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image, retries=3, timeout=60)
def scrape_url(url: str) -> dict:
    import httpx
    from bs4 import BeautifulSoup
    response = httpx.get(url, follow_redirects=True, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")
    return {
        "url": url,
        "title": soup.title.string if soup.title else None,
        "text": soup.get_text()[:5000],
    }

@app.local_entrypoint()
def main():
    urls = ["https://example.com", "https://example.org"]  # Your URL list
    results = list(scrape_url.map(urls))
    for r in results:
        print(f"{r['url']}: {r['title']}")
```
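`retries=3` tells Modal to re-run a failed input up to three more times before surfacing the error, so no retry loop is needed inside the function body. Conceptually (a plain-Python sketch of the idea, not how Modal is implemented, and `flaky` is an invented stand-in for a scrape that fails twice):

```python
def with_retries(fn, args, retries=3):
    """Call fn(*args); on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == retries:
                raise  # out of attempts; surface the error

calls = []

def flaky(url):
    # Simulated transient failures: the first two calls raise, the third succeeds.
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return {"url": url, "title": "ok"}

result = with_retries(flaky, ("https://example.com",))
print(result)  # {'url': 'https://example.com', 'title': 'ok'}
```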
## Protein Structure Prediction

```python
import modal

app = modal.App("protein-folding")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("chai-lab")
)

vol = modal.Volume.from_name("protein-data", create_if_missing=True)

@app.function(gpu="A100-80GB", image=image, volumes={"/data": vol}, timeout=3600)
def fold_protein(sequence: str) -> str:
    from chai_lab.chai1 import run_inference
    output = run_inference(
        fasta_file=write_fasta(sequence, "/data/input.fasta"),
        output_dir="/data/output/",
    )
    return str(output)
```
## Scheduled ETL Pipeline

```python
import modal

app = modal.App("etl")

image = modal.Image.debian_slim().uv_pip_install("pandas", "sqlalchemy", "psycopg2-binary")

@app.function(
    image=image,
    schedule=modal.Cron("0 3 * * *"),  # 3 AM UTC daily
    secrets=[modal.Secret.from_name("database-creds")],
    timeout=7200,
)
def daily_etl():
    import os
    import pandas as pd
    from sqlalchemy import create_engine

    source = create_engine(os.environ["SOURCE_DB"])
    dest = create_engine(os.environ["DEST_DB"])

    df = pd.read_sql("SELECT * FROM events WHERE date = CURRENT_DATE - 1", source)
    df = transform(df)
    df.to_sql("daily_summary", dest, if_exists="append", index=False)
    print(f"Loaded {len(df)} rows")
```
|
## FastAPI with GPU Model

```python
import modal

app = modal.App("api-with-gpu")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("fastapi", "sentence-transformers", "torch")
)

@app.cls(gpu="L40S", image=image, min_containers=1)
class EmbeddingService:
    @modal.enter()
    def load(self):
        from sentence_transformers import SentenceTransformer

        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI

        api = FastAPI()

        @api.post("/embed")
        async def embed(request: dict):
            embeddings = self.model.encode(request["texts"])
            return {"embeddings": embeddings.tolist()}

        @api.get("/health")
        async def health():
            return {"status": "ok"}

        return api
```

## Document OCR Job Queue

```python
import modal

app = modal.App("ocr-queue")

image = modal.Image.debian_slim().apt_install("tesseract-ocr").uv_pip_install("pytesseract", "Pillow")
vol = modal.Volume.from_name("ocr-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol})
def ocr_page(image_path: str) -> str:
    import pytesseract
    from PIL import Image

    img = Image.open(image_path)
    return pytesseract.image_to_string(img)

@app.function(volumes={"/data": vol})
def process_document(doc_id: str):
    import os

    pages = sorted(os.listdir(f"/data/docs/{doc_id}/"))
    paths = [f"/data/docs/{doc_id}/{p}" for p in pages]
    texts = list(ocr_page.map(paths))
    full_text = "\n\n".join(texts)
    with open(f"/data/results/{doc_id}.txt", "w") as f:
        f.write(full_text)
    return {"doc_id": doc_id, "pages": len(texts)}
```
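`process_document` fans page-level OCR out with `.map()` and joins the results back in page order. The same fan-out/fan-in shape can be sketched locally with a thread pool (illustration only; `fake_ocr` is a hypothetical stand-in for the real `ocr_page`):

```python
from concurrent.futures import ThreadPoolExecutor

def fake_ocr(page_path: str) -> str:
    # Stand-in for ocr_page: pretend each page yields its name as text
    return f"text of {page_path}"

def process_document_locally(paths: list[str]) -> str:
    # Fan out per-page work, then join in input order (like .map())
    with ThreadPoolExecutor() as pool:
        texts = list(pool.map(fake_ocr, paths))
    return "\n\n".join(texts)

doc = process_document_locally(["p1.png", "p2.png"])
print(doc)
```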

@@ -1,274 +1,260 @@
# Modal Functions and Classes

## Table of Contents

- [Functions](#functions)
- [Remote Execution](#remote-execution)
- [Classes with Lifecycle Hooks](#classes-with-lifecycle-hooks)
- [Parallel Execution](#parallel-execution)
- [Async Functions](#async-functions)
- [Local Entrypoints](#local-entrypoints)
- [Generators](#generators)

## Functions

### Basic Function

```python
import modal

app = modal.App("my-app")

@app.function()
def compute(x: int, y: int) -> int:
    return x + y
```

### Function Parameters

The `@app.function()` decorator accepts:

| Parameter | Type | Description |
|-----------|------|-------------|
| `image` | `Image` | Container image |
| `gpu` | `str` | GPU type (e.g., `"H100"`, `"A100:2"`) |
| `cpu` | `float` | CPU cores |
| `memory` | `int` | Memory in MiB |
| `timeout` | `int` | Max execution time in seconds |
| `secrets` | `list[Secret]` | Secrets to inject |
| `volumes` | `dict[str, Volume]` | Volumes to mount |
| `schedule` | `Schedule` | Cron or periodic schedule |
| `max_containers` | `int` | Max container count |
| `min_containers` | `int` | Minimum warm containers |
| `retries` | `int` | Retry count on failure |
| `concurrency_limit` | `int` | Max concurrent inputs |
| `ephemeral_disk` | `int` | Disk in MiB |

## Remote Execution

### `.remote()` — Synchronous Call

```python
result = compute.remote(3, 4)  # Runs in the cloud, blocks until done
```

### `.local()` — Local Execution

```python
result = compute.local(3, 4)  # Runs locally (for testing)
```

### `.spawn()` — Async Fire-and-Forget

```python
call = compute.spawn(3, 4)  # Returns immediately
# ... do other work ...
result = call.get()  # Retrieve result later
```

`.spawn()` supports up to 1 million pending inputs.
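A local analogy for the fire-and-forget pattern, using the stdlib `concurrent.futures` (a sketch of the control flow only, not Modal's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def compute(x, y):
    return x + y

with ThreadPoolExecutor() as pool:
    call = pool.submit(compute, 3, 4)  # returns immediately, like .spawn()
    # ... other work could happen here ...
    result = call.result()             # blocks and retrieves, like call.get()

print(result)  # 7
```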
## Classes with Lifecycle Hooks

Use `@app.cls()` for stateful workloads where you want to load resources once:

```python
@app.cls(gpu="L40S", image=image)
class Model:
    @modal.enter()
    def setup(self):
        """Runs once when the container starts."""
        import torch

        self.model = torch.load("/weights/model.pt")
        self.model.eval()

    @modal.method()
    def predict(self, text: str) -> dict:
        """Callable remotely."""
        return self.model(text)

    @modal.exit()
    def teardown(self):
        """Runs when the container shuts down."""
        cleanup_resources()
```

### Lifecycle Decorators

| Decorator | When It Runs |
|-----------|-------------|
| `@modal.enter()` | Once on container startup, before any inputs |
| `@modal.method()` | For each remote call |
| `@modal.exit()` | On container shutdown |

### Calling Class Methods

```python
# Create instance and call method
model = Model()
result = model.predict.remote("Hello world")

# Parallel calls
results = list(model.predict.map(["text1", "text2", "text3"]))
```

### Parameterized Classes

```python
@app.cls()
class Worker:
    model_name: str = modal.parameter()

    @modal.enter()
    def load(self):
        self.model = load_model(self.model_name)

    @modal.method()
    def run(self, data):
        return self.model(data)


# Different model instances autoscale independently
gpt = Worker(model_name="gpt-4")
llama = Worker(model_name="llama-3")
```
## Parallel Execution

### `.map()` — Parallel Processing

Process multiple inputs across containers:

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(1000))
    results = list(process.map(items))
    print(f"Processed {len(results)} items")
```

- Results are returned in the same order as inputs
- Modal autoscales containers to handle the workload
- Use `return_exceptions=True` to collect errors instead of raising
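The ordered-results guarantee mirrors the stdlib `Executor.map`, which also returns results in input order even when workers finish out of order. A local stand-in (not Modal's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

items = list(range(10))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Results come back in input order, regardless of completion order
    results = list(pool.map(square, items))

print(results)
```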
### `.starmap()` — Multi-Argument Parallel

```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```
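`.starmap()` unpacks each tuple into positional arguments, the same convention as the stdlib `itertools.starmap`; a local equivalent of the call above:

```python
from itertools import starmap

def add(x, y):
    return x + y

# Each tuple is unpacked into add's positional arguments
results = list(starmap(add, [(1, 2), (3, 4), (5, 6)]))
print(results)  # [3, 7, 11]
```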
### `.map()` with `order_outputs=False`

For faster throughput when order doesn't matter:

```python
for result in process.map(items, order_outputs=False):
    handle(result)  # Results arrive as they complete
```
## Async Functions

Modal supports async/await natively:

```python
@app.function()
async def fetch_data(url: str) -> str:
    import httpx

    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.text
```

Async functions are especially useful with `@modal.concurrent()` for handling multiple requests per container.
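The same async shape can be exercised locally with stdlib `asyncio`; in this sketch the `httpx` call is replaced with a stub so nothing hits the network, and `asyncio.gather` shows how several awaits overlap the way concurrent requests would in one container:

```python
import asyncio

async def fetch_data(url: str) -> str:
    # Stub for the HTTP call: yield control, then fabricate a body
    await asyncio.sleep(0)
    return f"<body of {url}>"

async def main():
    # Run both "requests" concurrently; results keep argument order
    return await asyncio.gather(
        fetch_data("https://a.example"),
        fetch_data("https://b.example"),
    )

pages = asyncio.run(main())
print(pages)
```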
## Local Entrypoints

The `@app.local_entrypoint()` runs on your machine and orchestrates remote calls:

```python
@app.local_entrypoint()
def main():
    # This code runs locally
    data = load_local_data()

    # These calls run in the cloud
    results = list(process.map(data))

    # Back to local
    save_results(results)
```

You can also define multiple entrypoints and select by function name:

```bash
modal run script.py::train
modal run script.py::evaluate
```
## Generators

Functions can yield results as they're produced:

```python
@app.function()
def generate_data():
    for i in range(100):
        yield process(i)

@app.local_entrypoint()
def main():
    for result in generate_data.remote_gen():
        print(result)
```
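The key property is laziness: values are produced only as the consumer asks for them, so the caller can start handling early results before the function finishes. A plain-Python sketch of that behavior (with a squaring stand-in for `process`):

```python
def generate_data():
    for i in range(100):
        yield i * i  # stand-in for process(i)

stream = generate_data()
# Only the first three values are ever computed here
first_three = [next(stream) for _ in range(3)]
print(first_three)  # [0, 1, 4]
```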
## Retries

Configure automatic retries on failure:

```python
@app.function(retries=3)
def flaky_operation():
    ...
```

For more control, use `modal.Retries`:

```python
@app.function(retries=modal.Retries(max_retries=3, backoff_coefficient=2.0))
def api_call():
    ...
```
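To see what `backoff_coefficient=2.0` implies: the delay before each retry grows geometrically. A sketch, assuming a 1-second initial delay (the actual initial delay is a separate knob on the retry policy):

```python
def retry_delays(max_retries: int, initial_delay: float = 1.0,
                 backoff_coefficient: float = 2.0) -> list[float]:
    """Delay before each retry attempt under exponential backoff."""
    return [initial_delay * backoff_coefficient**attempt for attempt in range(max_retries)]

print(retry_delays(3))  # [1.0, 2.0, 4.0]
```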
## Timeouts

Set maximum execution time:

```python
@app.function(timeout=3600)  # 1 hour
def long_training():
    ...
```

Default timeout is 300 seconds (5 minutes). Maximum is 86400 seconds (24 hours).
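The timeout acts as a deadline on each input. A local analogy of the semantics using `concurrent.futures` (a sketch only, not Modal's mechanism):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow():
    time.sleep(0.5)
    return "done"

with ThreadPoolExecutor() as pool:
    fut = pool.submit(slow)
    try:
        fut.result(timeout=0.05)  # deadline shorter than the work
        timed_out = False
    except TimeoutError:
        timed_out = True

print(timed_out)  # True
```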

@@ -1,92 +1,175 @@
# Modal Getting Started Guide

## Installation

Install Modal using uv (recommended) or pip:

```bash
# Recommended
uv pip install modal

# Alternative
pip install modal
```
## Authentication

### Interactive Setup

```bash
modal setup
```

This opens a browser for authentication and stores credentials locally.

### Headless / CI/CD Setup

For environments without a browser, use token-based authentication:

1. Generate tokens at https://modal.com/settings
2. Set environment variables:

```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```

Or use the CLI:

```bash
modal token set --token-id <id> --token-secret <secret>
```

### Free Tier

Modal provides $30/month in free credits. No credit card required for the free tier.

## Your First App
### Hello World

Create a file `hello.py`:

```python
import modal

app = modal.App("hello-world")

@app.function()
def greet(name: str) -> str:
    return f"Hello, {name}! This ran in the cloud."

@app.local_entrypoint()
def main():
    result = greet.remote("World")
    print(result)
```

Run it:

```bash
modal run hello.py
```

What happens:

1. Modal packages your code
2. Creates a container in the cloud
3. Executes `greet()` remotely
4. Returns the result to your local machine
### Understanding the Flow

- `modal.App("name")` — Creates a named application
- `@app.function()` — Marks a function for remote execution
- `@app.local_entrypoint()` — Defines the local entry point (runs on your machine)
- `.remote()` — Calls the function in the cloud
- `.local()` — Calls the function locally (for testing)

### Running Modes

| Command | Description |
|---------|-------------|
| `modal run script.py` | Run the `@app.local_entrypoint()` function |
| `modal serve script.py` | Start a dev server with hot reload (for web endpoints) |
| `modal deploy script.py` | Deploy to production (persistent) |
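To build intuition for the `.remote()`/`.local()` split, here is a toy stand-in for a function handle (purely illustrative; this is not Modal's API or implementation):

```python
class FakeFunction:
    """Toy stand-in for a Modal function handle (illustration only)."""

    def __init__(self, fn):
        self.fn = fn

    def local(self, *args):
        return self.fn(*args)  # run in-process

    def remote(self, *args):
        # Real Modal would serialize args and run in a cloud container;
        # here we just tag the result so the difference is visible.
        return f"[cloud] {self.fn(*args)}"

greet = FakeFunction(lambda name: f"Hello, {name}!")
print(greet.local("World"))   # Hello, World!
print(greet.remote("World"))  # [cloud] Hello, World!
```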
### A Simple Web Scraper

```python
import modal

app = modal.App("web-scraper")

image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image)
def scrape(url: str) -> str:
    import httpx
    from bs4 import BeautifulSoup

    response = httpx.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()[:1000]

@app.local_entrypoint()
def main():
    result = scrape.remote("https://example.com")
    print(result)
```
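`soup.get_text()` strips the markup and keeps only text nodes. For intuition, the same extraction can be approximated with the stdlib `html.parser` (illustrative only; BeautifulSoup handles malformed real-world HTML far more robustly):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes, roughly like soup.get_text()."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self) -> str:
        return "".join(self.chunks)

parser = TextExtractor()
parser.feed("<html><body><h1>Hi</h1><p>there</p></body></html>")
print(parser.text()[:1000])
```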
### GPU-Accelerated Inference

```python
import modal

app = modal.App("gpu-inference")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch", "transformers", "accelerate")
)

@app.function(gpu="L40S", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline

    pipe = pipeline("text-generation", model="gpt2", device="cuda")
    result = pipe(prompt, max_length=100)
    return result[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("The future of AI is"))
```
## Project Structure

Modal apps are typically single Python files, but can be organized into modules:

```
my-project/
├── app.py          # Main app with @app.local_entrypoint()
├── inference.py    # Inference functions
├── training.py     # Training functions
└── common.py       # Shared utilities
```

Use `modal.Image.add_local_python_source()` to include local modules in the container image.
## Key Concepts Summary

| Concept | What It Does |
|---------|-------------|
| `App` | Groups related functions into a deployable unit |
| `Function` | A serverless function backed by autoscaling containers |
| `Image` | Defines the container environment (packages, files) |
| `Volume` | Persistent distributed file storage |
| `Secret` | Secure credential injection |
| `Schedule` | Cron or periodic job scheduling |
| `gpu` | GPU type/count for the function |

## Next Steps

- See `functions.md` for advanced function patterns
- See `images.md` for custom container environments
- See `gpu.md` for GPU selection and configuration
- See `web-endpoints.md` for serving APIs

@@ -1,168 +1,174 @@
# Modal GPU Compute

## Table of Contents

- [Available GPUs](#available-gpus)
- [Requesting GPUs](#requesting-gpus)
- [GPU Selection Guide](#gpu-selection-guide)
- [Multi-GPU](#multi-gpu)
- [GPU Fallback Chains](#gpu-fallback-chains)
- [Auto-Upgrades](#auto-upgrades)
- [Multi-GPU Training](#multi-gpu-training)
## Available GPUs

| GPU | VRAM | Max per Container | Best For |
|-----|------|-------------------|----------|
| T4 | 16 GB | 8 | Budget inference, small models |
| L4 | 24 GB | 8 | Inference, video processing |
| A10 | 24 GB | 4 | Inference, fine-tuning small models |
| L40S | 48 GB | 8 | Inference (best cost/perf), medium models |
| A100-40GB | 40 GB | 8 | Training, large model inference |
| A100-80GB | 80 GB | 8 | Training, large models |
| RTX-PRO-6000 | 48 GB | 8 | Rendering, inference |
| H100 | 80 GB | 8 | Large-scale training, fast inference |
| H200 | 141 GB | 8 | Very large models, training |
| B200 | 192 GB | 8 | Largest models, maximum throughput |
| B200+ | 192 GB | 8 | B200 or B300, B200 pricing |
## Requesting GPUs

### Basic Request

```python
@app.function(gpu="H100")
def train():
    import torch

    assert torch.cuda.is_available()
    print(f"Using: {torch.cuda.get_device_name(0)}")
```
### String Shorthand

```python
gpu="T4"          # Single T4
gpu="A100-80GB"   # Single A100 80GB
gpu="H100:4"      # Four H100s
```
### GPU Object (Advanced)

```python
@app.function(gpu=modal.gpu.H100(count=2))
def multi_gpu():
    ...
```
## GPU Selection Guide

### For Inference

| Model Size | Recommended GPU | Why |
|-----------|----------------|-----|
| < 7B params | T4, L4 | Cost-effective, sufficient VRAM |
| 7B-13B params | L40S | Best cost/performance, 48 GB VRAM |
| 13B-70B params | A100-80GB, H100 | Large VRAM, fast memory bandwidth |
| 70B+ params | H100:2+, H200, B200 | Multi-GPU or very large VRAM |

### For Training

| Task | Recommended GPU |
|------|----------------|
| Fine-tuning (LoRA) | L40S, A100-40GB |
| Full fine-tuning small models | A100-80GB |
| Full fine-tuning large models | H100:4+, H200 |
| Pre-training | H100:8, B200:8 |

### General Recommendation

L40S is the best default for inference workloads — it offers an excellent trade-off of cost and performance with 48 GB of GPU RAM.
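A quick way to pick a row in these tables is to estimate weight memory: at fp16/bf16 precision, each parameter costs 2 bytes, so the weights alone need roughly 2 GB per billion parameters. A back-of-envelope helper (activations, KV cache, and optimizer state add more on top, so treat this as a lower bound):

```python
def weight_vram_gb(n_params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed just for model weights (fp16/bf16 = 2 bytes/param)."""
    return n_params_billions * bytes_per_param

print(weight_vram_gb(7.0))   # 14.0 -> comfortably fits on L4 or L40S
print(weight_vram_gb(70.0))  # 140.0 -> needs H200 or multiple GPUs
```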
## Multi-GPU

Request multiple GPUs by appending `:count`:

```python
@app.function(gpu="H100:4")
def distributed():
    import torch

    print(f"GPUs available: {torch.cuda.device_count()}")
    # All 4 GPUs are on the same physical machine
```

- Up to 8 GPUs for most types (up to 4 for A10)
- All GPUs attach to the same physical machine
- Requesting more than 2 GPUs may result in longer wait times
- Maximum VRAM: 8 x B200 = 1,536 GB
## GPU Fallback Chains

Specify a prioritized list of GPU types:

```python
@app.function(gpu=["H100", "A100-80GB", "L40S"])
def flexible():
    # Modal tries H100 first, then A100-80GB, then L40S
    ...
```

Useful for reducing queue times when a specific GPU isn't available.
## Auto-Upgrades

### H100 → H200

Modal may automatically upgrade H100 requests to H200 at no extra cost. To prevent this:

```python
@app.function(gpu="H100!")  # Exclamation mark prevents auto-upgrade
def must_use_h100():
    ...
```

### A100 → A100-80GB

A100-40GB requests may be upgraded to 80GB at no extra cost.

### B200+

`gpu="B200+"` allows Modal to run on B200 or B300 GPUs at B200 pricing. Requires CUDA 13.0+.
## Multi-GPU Training

Modal supports multi-GPU training on a single node. Multi-node training is in private beta.

### PyTorch DDP Example

```python
@app.function(gpu="H100:4", image=image, timeout=86400)
def train_distributed():
    import os

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}")
    # ... training loop with DDP ...
```
||||||
|
|
||||||
For PyTorch Lightning, set strategy to `ddp_spawn` or `ddp_notebook`.
|
### PyTorch Lightning
|
||||||
|
|
||||||
## Performance Considerations
|
When using frameworks that re-execute Python entrypoints (like PyTorch Lightning), either:
|
||||||
|
|
||||||
**Memory-Bound vs Compute-Bound**:
|
1. Set strategy to `ddp_spawn` or `ddp_notebook`
|
||||||
- Running models with small batch sizes is memory-bound
|
2. Or run training as a subprocess
|
||||||
- Newer GPUs have faster arithmetic than memory access
|
|
||||||
- Speedup from newer hardware may not justify cost for memory-bound workloads
|
|
||||||
|
|
||||||
**Optimization**:
|
```python
|
||||||
- Use batching when possible
|
@app.function(gpu="H100:4", image=image)
|
||||||
- Consider L40S before jumping to H100/B200
|
def train():
|
||||||
- Profile to identify bottlenecks
|
import subprocess
|
||||||
|
subprocess.run(["python", "train_script.py"], check=True)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Hugging Face Accelerate
|
||||||
|
|
||||||
|
```python
|
||||||
|
@app.function(gpu="A100-80GB:4", image=image)
|
||||||
|
def finetune():
|
||||||
|
import subprocess
|
||||||
|
subprocess.run([
|
||||||
|
"accelerate", "launch",
|
||||||
|
"--num_processes", "4",
|
||||||
|
"train.py"
|
||||||
|
], check=True)
|
||||||
|
```
|
||||||
|
|||||||
@@ -1,261 +1,259 @@
|
|||||||
# Modal Container Images

## Table of Contents

- [Overview](#overview)
- [Base Images](#base-images)
- [Installing Packages](#installing-packages)
- [System Packages](#system-packages)
- [Shell Commands](#shell-commands)
- [Running Python During Build](#running-python-during-build)
- [Adding Local Files](#adding-local-files)
- [Environment Variables](#environment-variables)
- [Dockerfiles](#dockerfiles)
- [Alternative Package Managers](#alternative-package-managers)
- [Image Caching](#image-caching)
- [Handling Remote-Only Imports](#handling-remote-only-imports)

## Overview

Every Modal function runs inside a container built from an `Image`. By default, Modal uses a Debian Linux image with the same Python minor version as your local interpreter.

Images are built lazily — Modal only builds or pulls an image when a function using it is first invoked. Layers are cached for fast rebuilds.

## Base Images

```python
# Default: Debian slim with your local Python version
image = modal.Image.debian_slim()

# Specific Python version
image = modal.Image.debian_slim(python_version="3.11")

# From Docker Hub
image = modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04")

# From a Dockerfile
image = modal.Image.from_dockerfile("./Dockerfile")
```
## Installing Packages

### uv (Recommended)

`uv_pip_install` uses the uv package manager for fast, reliable installs:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install(
        "torch==2.8.0",
        "transformers>=4.40",
        "accelerate",
        "scipy",
    )
)
```

Pin versions for reproducibility. uv resolves dependencies faster than pip.

### pip (Fallback)

```python
image = modal.Image.debian_slim().pip_install(
    "numpy==1.26.0",
    "pandas==2.1.0",
)
```

### From requirements.txt

```python
image = modal.Image.debian_slim().pip_install_from_requirements("requirements.txt")
```

### Private Packages

```python
image = (
    modal.Image.debian_slim()
    .pip_install_private_repos(
        "github.com/org/private-repo",
        git_user="username",
        secrets=[modal.Secret.from_name("github-token")],
    )
)
```
## System Packages

Install Linux packages via apt:

```python
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg", "libsndfile1", "git", "curl")
    .uv_pip_install("librosa", "soundfile")
)
```
## Shell Commands

Run arbitrary commands during image build:

```python
image = (
    modal.Image.debian_slim()
    .run_commands(
        "wget https://example.com/data.tar.gz",
        "tar -xzf data.tar.gz -C /opt/data",
        "rm data.tar.gz",
    )
)
```

### With GPU

Some build steps require GPU access (e.g., compiling CUDA kernels):

```python
image = (
    modal.Image.debian_slim()
    .uv_pip_install("torch")
    .run_commands("python -c 'import torch; torch.cuda.is_available()'", gpu="A100")
)
```
## Running Python During Build

Execute Python functions as build steps — useful for downloading model weights:

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("huggingface_hub", "torch", "transformers")
    .run_function(download_model, secrets=[modal.Secret.from_name("huggingface")])
)
```

The resulting filesystem (including downloaded files) is snapshotted into the image.
## Adding Local Files

### Local Directories

```python
image = modal.Image.debian_slim().add_local_dir(
    local_path="./config",
    remote_path="/root/config",
)
```

By default, files are added at container startup (not baked into the image layer). Use `copy=True` to bake them in.

### Local Python Modules

```python
image = modal.Image.debian_slim().add_local_python_source("my_module")
```

This uses Python's import system to find and include the module.

### Individual Files

```python
image = modal.Image.debian_slim().add_local_file(
    local_path="./model_config.json",
    remote_path="/root/config.json",
)
```
## Environment Variables

```python
image = (
    modal.Image.debian_slim()
    .env({
        "TRANSFORMERS_CACHE": "/cache",
        "TOKENIZERS_PARALLELISM": "false",
        "HF_HOME": "/cache/huggingface",
    })
)
```

Names and values must be strings.
## Dockerfiles

Build from existing Dockerfiles:

```python
image = modal.Image.from_dockerfile("./Dockerfile")

# With build context
image = modal.Image.from_dockerfile("./Dockerfile", context_mount=modal.Mount.from_local_dir("."))
```
## Alternative Package Managers

### Micromamba / Conda

For packages requiring coordinated system and Python package installs:

```python
image = (
    modal.Image.micromamba(python_version="3.11")
    .micromamba_install("cudatoolkit=11.8", "cudnn=8.6", channels=["conda-forge"])
    .uv_pip_install("torch")
)
```
## Image Caching

Modal caches images per layer (per method call). Breaking the cache on one layer cascades to all subsequent layers.

### Optimization Tips

1. **Order layers by change frequency**: Put stable dependencies first, frequently changing code last
2. **Pin versions**: Unpinned versions may resolve differently and break cache
3. **Separate large installs**: Put heavy packages (torch, tensorflow) in early layers

### Force Rebuild

```python
# Single layer
image = modal.Image.debian_slim().apt_install("git", force_build=True)
```

```bash
# All images in a run
MODAL_FORCE_BUILD=1 modal run script.py

# Rebuild without updating cache
MODAL_IGNORE_CACHE=1 modal run script.py
```
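The cascade behavior can be modeled with a toy hash chain, where each layer's cache key depends on every layer before it (an illustrative model for this doc, not Modal's actual keying scheme):

```python
import hashlib

def layer_keys(layers):
    """Toy model of layer caching: each key hashes in all previous layers,
    so changing one layer invalidates every later key."""
    keys, h = [], hashlib.sha256()
    for layer in layers:
        h = hashlib.sha256(h.digest() + layer.encode())
        keys.append(h.hexdigest()[:8])
    return keys

a = layer_keys(["apt_install git", "pip_install torch", "add code v1"])
b = layer_keys(["apt_install git", "pip_install torch", "add code v2"])
print(a[:2] == b[:2], a[2] == b[2])  # True False
```

The first two layers keep their keys (cache hits); only the changed final layer rebuilds, which is why code should come last.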
## Handling Remote-Only Imports

When packages are only available in the container (not locally), use conditional imports:

```python
@app.function(image=image)
def process():
    import torch  # Only available in the container
    return torch.cuda.device_count()
```

For module-level imports shared across functions, use the `Image.imports()` context manager:

```python
with image.imports():
    import torch
    import transformers
```

This prevents `ImportError` locally while making the imports available in the container.
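The same guard can be written as a small helper for scripts that run both locally and remotely (an illustrative pattern, not a Modal API):

```python
def safe_import(name):
    """Return the module if importable, else None (e.g., when running locally)."""
    try:
        return __import__(name)
    except ImportError:
        return None

json_mod = safe_import("json")
missing = safe_import("package_that_does_not_exist")
print(json_mod is not None, missing is None)  # True True
```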
# Modal Resource Configuration

## CPU

### Requesting CPU

```python
@app.function(cpu=4.0)
def compute():
    ...
```

- Values are **physical cores**, not vCPUs
- Default: 0.125 cores
- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request

### CPU Limits

- Default soft limit: 16 physical cores above the CPU request
- Default request: 0.125 cores → default limit: 16.125 cores
- Above the limit, the host throttles CPU usage
- Set explicit limits to prevent noisy-neighbor effects:

```python
@app.function(cpu=(1.0, 4.0))  # Request 1 core, limit 4 cores
def bounded_compute():
    ...
```
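The default-limit rule above works out as simple arithmetic (a worked sketch, not a Modal API call):

```python
def soft_cpu_limit(request_cores: float) -> float:
    """Default soft CPU limit: the request plus 16 physical cores."""
    return request_cores + 16.0

print(soft_cpu_limit(0.125))  # 16.125
print(soft_cpu_limit(4.0))    # 20.0
```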
## Memory

### Requesting Memory

```python
@app.function(memory=16384)  # 16 GiB in MiB
def large_data():
    ...
```

- Value in **MiB** (mebibytes)
- Default: 128 MiB

### Memory Limits

Set hard memory limits to OOM-kill containers that exceed them:

```python
@app.function(memory=(8192, 16384))  # 8 GiB request, 16 GiB hard limit
def bounded_memory():
    # Container is killed if usage exceeds 16 GiB
    ...
```

This prevents paying for runaway memory leaks.
## Ephemeral Disk

For temporary storage within a container's lifetime:

```python
@app.function(ephemeral_disk=102400)  # 100 GiB in MiB
def process_dataset():
    # Temporary files at /tmp or anywhere in the container filesystem
    ...
```

- Value in **MiB**
- Default: 512 GiB quota per container
- Maximum: 3,145,728 MiB (3 TiB)
- Data is lost when the container shuts down
- Use Volumes for persistent storage

Larger disk requests increase the memory request at a 20:1 ratio for billing purposes.
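The 20:1 interaction can be sketched numerically (an illustrative helper based on the ratio stated above; exact billing is Modal's):

```python
def billed_memory_mib(memory_request_mib: int, ephemeral_disk_mib: int) -> int:
    """Disk raises the billed memory request at a 20:1 disk-to-memory ratio."""
    disk_implied_mib = ephemeral_disk_mib // 20
    return max(memory_request_mib, disk_implied_mib)

# 500 GiB of disk implies a 25 GiB memory request (if not already higher)
print(billed_memory_mib(128, 512_000))     # 25600
print(billed_memory_mib(32_768, 512_000))  # 32768
```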
## Timeout

```python
@app.function(timeout=3600)  # 1 hour in seconds
def long_running():
    ...
```

- Default: 300 seconds (5 minutes)
- Maximum: 86,400 seconds (24 hours)
- Function is killed when the timeout expires
## Billing

You are charged based on **whichever is higher**: your resource request or actual usage.

| Resource | Billing Basis |
|----------|--------------|
| CPU | max(requested, used) |
| Memory | max(requested, used) |
| GPU | Time GPU is allocated |
| Disk | Increases memory billing at 20:1 ratio |

### Cost Optimization Tips

- Request only what you need
- Use appropriate GPU tiers (L40S over H100 for inference)
- Set `scaledown_window` to minimize idle time
- Use `min_containers=0` when cold starts are acceptable
- Batch inputs with `.map()` instead of individual `.remote()` calls

## Complete Example

```python
@app.function(
    cpu=8.0,                # 8 physical cores
    memory=32768,           # 32 GiB
    gpu="L40S",             # L40S GPU
    ephemeral_disk=204800,  # 200 GiB temp disk
    timeout=7200,           # 2 hours
    max_containers=50,
    min_containers=1,
)
def full_pipeline(data_path: str):
    ...
```
# Modal Scaling and Concurrency

## Table of Contents

- [Autoscaling](#autoscaling)
- [Configuration](#configuration)
- [Parallel Execution](#parallel-execution)
- [Concurrent Inputs](#concurrent-inputs)
- [Dynamic Batching](#dynamic-batching)
- [Dynamic Autoscaler Updates](#dynamic-autoscaler-updates)
- [Limits](#limits)

## Autoscaling

Modal automatically manages a pool of containers for each function:

- Spins up containers when there's no capacity for new inputs
- Spins down idle containers to save costs
- Scales from zero (no cost when idle) to thousands of containers

No configuration needed for basic autoscaling — it works out of the box.
## Configuration

Fine-tune autoscaling behavior:

```python
@app.function(
    max_containers=100,    # Upper limit on container count
    min_containers=2,      # Keep 2 warm (reduces cold starts)
    buffer_containers=5,   # Reserve 5 extra for burst traffic
    scaledown_window=300,  # Wait 5 min idle before shutting down
)
def handle_request(data):
    ...
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_containers` | Unlimited | Hard cap on total containers |
| `min_containers` | 0 | Minimum warm containers (costs money even when idle) |
| `buffer_containers` | 0 | Extra containers to prevent queuing |
| `scaledown_window` | 60 | Seconds of idle time before shutdown |

### Trade-offs

- Higher `min_containers` = lower latency, higher cost
- Higher `buffer_containers` = less queuing, higher cost
- Lower `scaledown_window` = faster cost savings, more cold starts
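The warm-pool cost side of the trade-off is linear and easy to estimate (a sketch with a hypothetical hourly rate; check Modal's pricing page for real numbers):

```python
def idle_warm_pool_cost(min_containers: int, usd_per_container_hour: float, hours: float) -> float:
    """Warm containers bill even when idle; cost scales linearly."""
    return min_containers * usd_per_container_hour * hours

# e.g., 2 warm containers at a hypothetical $2.00/hour, kept warm for a day
print(idle_warm_pool_cost(2, 2.00, 24))  # 96.0
```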
## Parallel Execution

### `.map()` — Process Many Inputs

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(10_000))
    results = list(process.map(items))
```

Modal automatically scales containers to handle the workload. Results maintain input order.

### `.map()` Options

```python
# Unordered results (faster)
for result in process.map(items, order_outputs=False):
    handle(result)

# Collect errors instead of raising
results = list(process.map(items, return_exceptions=True))
for r in results:
    if isinstance(r, Exception):
        print(f"Error: {r}")
```

### `.starmap()` — Multi-Argument

```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```

### `.spawn()` — Fire-and-Forget

```python
# Returns immediately
call = process.spawn(large_data)

# Check status or get result later
result = call.get()
```

Up to 1 million pending `.spawn()` calls.
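The argument-unpacking semantics of `.starmap()` mirror Python's own `itertools.starmap`, which you can use to sanity-check expectations locally:

```python
from itertools import starmap

# Each tuple is unpacked into positional arguments, one call per tuple
pairs = [(1, 2), (3, 4), (5, 6)]
print(list(starmap(lambda x, y: x + y, pairs)))  # [3, 7, 11]
```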
## Concurrent Inputs

By default, each container handles one input at a time. Use `@modal.concurrent` to handle multiple:

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
async def predict(text: str):
    result = await model.predict_async(text)
    return result
```

This is ideal for I/O-bound workloads or async inference where a single GPU can handle multiple requests.

### With Web Endpoints

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=20)
@modal.asgi_app()
def web_service():
    return fastapi_app
```
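Why concurrency helps for I/O-bound work can be seen with plain `asyncio` (a local sketch with a fake model call; no Modal involved):

```python
import asyncio
import time

async def fake_predict(text: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for an I/O-bound model call
    return text.upper()

async def serve(n: int):
    # One "container" overlapping n inputs, as @modal.concurrent would allow
    return await asyncio.gather(*(fake_predict(f"req{i}") for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(10))
elapsed = time.perf_counter() - start
print(results[0], elapsed < 0.5)  # ten overlapped calls finish in ~one call's latency
```

Ten sequential calls would take ~0.5 s; overlapped, they complete in roughly 0.05 s.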
## Dynamic Batching

Collect inputs into batches for efficient GPU utilization:

```python
@app.function(gpu="L40S")
@modal.batched(max_batch_size=32, wait_ms=100)
async def batch_predict(texts: list[str]):
    # Called with up to 32 texts at once
    embeddings = model.encode(texts)
    return list(embeddings)
```

- `max_batch_size` — Maximum inputs per batch
- `wait_ms` — How long to wait for more inputs before processing
- The function receives a list and must return a list of the same length
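
The batching semantics can be sketched locally (a simplified model for intuition, not Modal's implementation): individual inputs are grouped into batches of at most `max_batch_size`, the batch function runs once per batch, and its per-batch results are flattened back so each caller sees exactly one result per input.

```python
def run_batched(inputs, batch_fn, max_batch_size=32):
    """Simplified model of dynamic batching: group individual inputs into
    batches, call batch_fn once per batch, and flatten the results so each
    input maps to exactly one output."""
    results = []
    for start in range(0, len(inputs), max_batch_size):
        batch = inputs[start:start + max_batch_size]
        batch_results = batch_fn(batch)
        # The batch function must return one result per input
        assert len(batch_results) == len(batch)
        results.extend(batch_results)
    return results

# Example: the "embedding" is just the string length here
lengths = run_batched(["a", "bb", "ccc"], lambda ts: [len(t) for t in ts], max_batch_size=2)
print(lengths)  # [1, 2, 3]
```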
## Dynamic Autoscaler Updates

Adjust autoscaling at runtime without redeploying:

```python
f = modal.Function.from_name("my-app", "f")
f.update_autoscaler(max_containers=100)
```

Settings revert to the decorator configuration on the next deploy, or are overridden by further updates:

```python
f.update_autoscaler(min_containers=2, max_containers=10)
f.update_autoscaler(min_containers=4)  # max_containers=10 still in effect
```

### Time-Based Scaling

Adjust the warm pool based on time of day:

```python
@app.function()
def inference_server():
    ...

@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def increase_warm_pool():
    inference_server.update_autoscaler(min_containers=4)

@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
def decrease_warm_pool():
    inference_server.update_autoscaler(min_containers=0)
```

### For Classes

Update the autoscaler for a specific parameterized instance:

```python
MyClass = modal.Cls.from_name("my-app", "MyClass")
obj = MyClass(model_version="3.5")
obj.update_autoscaler(buffer_containers=2)  # type: ignore
```

## Input Concurrency

Process multiple inputs per container with `@modal.concurrent`:

```python
@app.function()
@modal.concurrent(max_inputs=100)
def my_function(input: str):
    # Container can handle up to 100 concurrent inputs
    ...
```

Ideal for I/O-bound workloads:

- Database queries
- External API requests
- Remote Modal Function calls

### Concurrency Mechanisms

**Synchronous functions**: inputs run on separate threads, so code must be thread-safe.

```python
@app.function()
@modal.concurrent(max_inputs=10)
def sync_function():
    time.sleep(1)  # Must be thread-safe
```

**Async functions**: inputs run as separate asyncio tasks, so code must not block the event loop.

```python
@app.function()
@modal.concurrent(max_inputs=10)
async def async_function():
    await asyncio.sleep(1)  # Must not block event loop
```

### Target vs Max Inputs

```python
@app.function()
@modal.concurrent(
    max_inputs=120,     # Hard limit
    target_inputs=100,  # Autoscaler target
)
def my_function(input: str):
    # Allow a 20% burst above the target
    ...
```

The autoscaler aims for `target_inputs`, but containers can burst up to `max_inputs` during scale-up.

## Scaling Limits

Modal enforces limits per function:

| Resource | Limit |
|----------|-------|
| Pending inputs (unassigned) | 2,000 |
| Total inputs (running + pending) | 25,000 |
| Pending `.spawn()` inputs | 1,000,000 |
| Concurrent inputs per `.map()` call | 1,000 |
| Rate limit (web endpoints) | 200 req/s |

Exceeding these limits triggers `Resource Exhausted` errors. Implement retry logic for resilience.
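
Retry logic for these errors can be as simple as exponential backoff around the remote call. A local sketch (the `RuntimeError` and the wrapped function are stand-ins for the real exception type and Modal call):

```python
import time

def call_with_backoff(fn, *args, max_retries=5, base_delay=1.0):
    """Retry fn with exponential backoff, e.g. around a call that may hit a
    Resource Exhausted limit. RuntimeError stands in for the real error."""
    for attempt in range(max_retries):
        try:
            return fn(*args)
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```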

## Async Usage

Use the async APIs for arbitrary parallel execution patterns:

```python
@app.function()
async def async_task(x):
    await asyncio.sleep(1)
    return x * 2

@app.local_entrypoint()
async def main():
    tasks = [async_task.remote.aio(i) for i in range(100)]
    results = await asyncio.gather(*tasks)
```

## Common Gotchas

**Incorrect**: using Python's builtin `map` (runs locally and sequentially)

```python
# DON'T DO THIS
results = map(evaluate_model, inputs)
```

**Incorrect**: calling the function first

```python
# DON'T DO THIS
results = evaluate_model(inputs).map()
```

**Correct**: call `.map()` on the Modal function object

```python
# DO THIS
results = evaluate_model.map(inputs)
```

# Modal Scheduled Jobs

## Overview

Modal supports running functions automatically on a schedule, using either cron syntax or fixed intervals. Deploy scheduled functions with `modal deploy` and they run unattended in the cloud.

## Schedule Types

### modal.Cron

Standard cron syntax — stable across deploys:

```python
import modal

app = modal.App("scheduled-tasks")

# Daily at 9 AM UTC
@app.function(schedule=modal.Cron("0 9 * * *"))
def daily_report():
    generate_and_send_report()

# Every Monday at midnight
@app.function(schedule=modal.Cron("0 0 * * 1"))
def weekly_cleanup():
    cleanup_old_data()

# Every 15 minutes
@app.function(schedule=modal.Cron("*/15 * * * *"))
def frequent_check():
    check_system_health()
```

#### Cron Syntax Reference

```
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sun=0)
│ │ │ │ │
* * * * *
```

| Pattern | Meaning |
|---------|---------|
| `0 9 * * *` | Daily at 9:00 AM UTC |
| `0 */6 * * *` | Every 6 hours |
| `*/30 * * * *` | Every 30 minutes |
| `0 0 * * 1` | Every Monday at midnight |
| `0 0 1 * *` | First day of every month |
| `0 9 * * 1-5` | Weekdays at 9 AM |
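
For intuition, here is a minimal matcher for the subset of cron syntax used above (`*`, `*/n` steps, plain numbers, and `a-b` ranges). This is an illustrative sketch, not Modal's scheduler:

```python
def field_matches(field: str, value: int) -> bool:
    """Check one cron field against a value: '*', '*/n', 'a-b', or a number."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    if "-" in field:
        lo, hi = map(int, field.split("-"))
        return lo <= value <= hi
    return value == int(field)

def cron_matches(expr: str, minute: int, hour: int, dom: int, month: int, dow: int) -> bool:
    """True if the given time components match all five cron fields."""
    fields = expr.split()
    return all(field_matches(f, v) for f, v in zip(fields, (minute, hour, dom, month, dow)))

# "0 9 * * 1-5": weekdays at 9:00
print(cron_matches("0 9 * * 1-5", 0, 9, 15, 6, 3))  # True (a Wednesday)
print(cron_matches("0 9 * * 1-5", 0, 9, 15, 6, 0))  # False (a Sunday)
```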

### modal.Period

Run at fixed intervals from deployment time:

```python
# Every 5 hours
@app.function(schedule=modal.Period(hours=5))
def periodic_sync():
    sync_data()

# Every 30 minutes
@app.function(schedule=modal.Period(minutes=30))
def poll_updates():
    check_for_updates()

# Every day
@app.function(schedule=modal.Period(days=1))
def daily_task():
    ...
```

`modal.Period` resets its timer on each deployment. If you need a schedule that doesn't shift with deploys, use `modal.Cron`.

### Timezone Support

Cron schedules default to UTC; pass `timezone` to use a local time:

```python
# Daily at 9 AM London time
@app.function(schedule=modal.Cron("0 9 * * *", timezone="Europe/London"))
def uk_morning_task():
    ...

# Daily at 6 AM New York time
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def morning_report():
    ...

# Fridays at 5 PM Tokyo time
@app.function(schedule=modal.Cron("0 17 * * 5", timezone="Asia/Tokyo"))
def friday_evening_jp():
    ...
```

## Deployment

Schedules only activate when deployed:

```bash
modal deploy script.py
```

`modal run` and `modal serve` do not activate schedules. Deployed schedules persist until explicitly stopped.

### Programmatic Deployment

```python
if __name__ == "__main__":
    app.deploy()
```

## Monitoring

- View scheduled runs in the **Apps** section of the Modal dashboard (https://modal.com/apps), with status, duration, and logs for each run
- Failure notifications are available for failed runs
- Use the **"Run Now"** button on the dashboard to trigger a run manually

## Schedule Management

- Schedules cannot be paused — remove the `schedule` parameter and redeploy to stop
- To change a schedule, update the `schedule` parameter and redeploy
- To stop an app entirely, run `modal app stop <name>`

```python
# Update from daily to weekly, then redeploy
@app.function(schedule=modal.Period(days=7))
def task():
    ...
```

```bash
modal deploy script.py
```

## Common Patterns

### ETL Pipeline

```python
@app.function(
    schedule=modal.Cron("0 2 * * *"),  # 2 AM UTC daily
    secrets=[modal.Secret.from_name("db-creds")],
    timeout=7200,
)
def etl_pipeline():
    import os

    data = extract(os.environ["SOURCE_DB_URL"])
    transformed = transform(data)
    load(transformed, os.environ["DEST_DB_URL"])
```

### Model Retraining

```python
data_vol = modal.Volume.from_name("datasets")
model_vol = modal.Volume.from_name("models")

@app.function(
    schedule=modal.Cron("0 0 * * 0"),  # Weekly on Sunday
    gpu="H100",
    volumes={"/data": data_vol, "/models": model_vol},
    timeout=86400,
)
def retrain():
    model = train_on_latest_data("/data/training/")
    torch.save(model.state_dict(), "/models/latest.pt")
    model_vol.commit()
```

### Health Checks

```python
@app.function(
    schedule=modal.Period(minutes=5),
    secrets=[modal.Secret.from_name("slack-webhook")],
)
def health_check():
    import os

    import requests

    status = check_all_services()
    if not status["healthy"]:
        requests.post(os.environ["SLACK_URL"], json={"text": f"Alert: {status}"})
```

### Report Generation

```python
@app.function(
    schedule=modal.Cron("0 9 * * 1"),  # Monday 9 AM
    secrets=[modal.Secret.from_name("email-creds")],
)
def weekly_report():
    report = generate_analytics_report()
    send_email(
        to="team@company.com",
        subject="Weekly Analytics Report",
        body=report,
    )
```

### Data Cleanup

```python
from datetime import datetime, timedelta

@app.function(schedule=modal.Period(hours=6))
def cleanup_old_data():
    # Remove data older than 30 days
    cutoff = datetime.now() - timedelta(days=30)
    delete_old_records(cutoff)
```

## Configuration with Secrets and Volumes

Scheduled functions support all the usual function parameters:

```python
vol = modal.Volume.from_name("data")
secret = modal.Secret.from_name("api-keys")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[secret],
    volumes={"/data": vol},
    cpu=4.0,
    memory=16384,
)
def sync_data():
    import json
    import os

    api_key = os.environ["API_KEY"]
    data = fetch_external_data(api_key)
    with open("/data/latest.json", "w") as f:
        json.dump(data, f)
    vol.commit()
```

## Dynamic Scheduling

Combine scheduled functions with autoscaler updates to change scaling behavior by time of day:

```python
@app.function()
def main_task():
    ...

@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def enable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=5)

@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
def disable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=0)
```

## Error Handling

Scheduled functions that fail will:

- Show the failure in the dashboard
- Send notifications (configurable)
- Run again at the next scheduled time

```python
@app.function(
    schedule=modal.Cron("0 * * * *"),
    retries=3,  # Retry failed runs
    timeout=1800,
)
def robust_task():
    try:
        perform_task()
    except Exception as e:
        # Log the error and alert before re-raising
        print(f"Task failed: {e}")
        send_alert(f"Scheduled task failed: {e}")
        raise
```

## Best Practices

1. **Set timeouts**: always specify a timeout for scheduled functions
2. **Use appropriate schedules**: `Period` for relative timing, `Cron` for absolute times
3. **Monitor failures**: check the dashboard regularly for failed runs
4. **Idempotent operations**: design tasks to handle reruns safely
5. **Resource limits**: set appropriate CPU/memory for scheduled workloads
6. **Timezone awareness**: specify a timezone for cron schedules
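
Idempotency (practice 4) often comes down to deriving a stable key from the schedule window and skipping work that has already completed. A local sketch under illustrative assumptions (the marker directory and daily key are hypothetical choices):

```python
import os
from datetime import datetime, timezone

def run_idempotent(task, marker_dir="/tmp/task-markers"):
    """Run task at most once per UTC day: the date is the stable key, and a
    marker file recorded after success makes reruns a no-op."""
    os.makedirs(marker_dir, exist_ok=True)
    key = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    marker = os.path.join(marker_dir, key)
    if os.path.exists(marker):
        return "skipped"
    result = task()
    open(marker, "w").close()  # Record completion only after success
    return result
```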

# Modal Secrets

## Overview

Modal Secrets securely deliver credentials and other sensitive data to functions as environment variables. Secrets are stored encrypted and are only available to your workspace.

## Creating Secrets

### Via Dashboard

Create and manage secrets at https://modal.com/secrets. Templates are available for:

- Database credentials (Postgres, MongoDB)
- Cloud providers (AWS, GCP, Azure)
- ML platforms (Weights & Biases, Hugging Face)
- And more

### Via CLI

```bash
# Create with key-value pairs
modal secret create my-api-keys API_KEY=sk-xxx DB_PASSWORD=hunter2

# Create from existing environment variables
modal secret create my-env-keys API_KEY=$API_KEY

# List all secrets
modal secret list

# Delete a secret
modal secret delete my-api-keys
```

### Programmatic (Inline)

```python
# From a dictionary (useful for development)
secret = modal.Secret.from_dict({"API_KEY": "sk-xxx"})

# From a .env file
secret = modal.Secret.from_dotenv()

# From a named secret (created via CLI or dashboard)
secret = modal.Secret.from_name("my-api-keys")
```

To forward a local environment variable only when running locally:

```python
import os

if modal.is_local():
    local_secret = modal.Secret.from_dict({"FOO": os.environ["LOCAL_FOO"]})
else:
    local_secret = modal.Secret.from_dict({})

@app.function(secrets=[local_secret])
def some_function():
    import os

    print(os.environ["FOO"])
```

## Using Secrets in Functions

### Basic Usage

```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api():
    import os

    import requests

    api_key = os.environ["API_KEY"]
    response = requests.get(url, headers={"Authorization": f"Bearer {api_key}"})
    return response.json()
```

### Multiple Secrets

```python
@app.function(secrets=[
    modal.Secret.from_name("openai-keys"),
    modal.Secret.from_name("database-creds"),
])
def process():
    import os

    openai_key = os.environ["OPENAI_API_KEY"]
    db_url = os.environ["DATABASE_URL"]
    ...
```

Secrets are applied in order — if two secrets define the same key, the later one wins.
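
This later-wins behavior is ordinary dict-merge semantics; a local sketch of how the resulting environment is assembled:

```python
def merged_env(*secrets):
    """Model how multiple secrets combine: apply each secret's key-value
    pairs in order, so later secrets overwrite earlier ones on conflict."""
    env = {}
    for secret in secrets:
        env.update(secret)
    return env

env = merged_env({"API_KEY": "old", "DB_URL": "x"}, {"API_KEY": "new"})
print(env["API_KEY"])  # new
```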

### With Classes

```python
@app.cls(secrets=[modal.Secret.from_name("huggingface")])
class ModelService:
    @modal.enter()
    def load(self):
        import os

        token = os.environ["HF_TOKEN"]
        self.model = AutoModel.from_pretrained("model-name", token=token)
```

## Environment Variables

### Reserved Runtime Variables

**All containers**:

- `MODAL_CLOUD_PROVIDER` - Cloud provider (AWS/GCP/OCI)
- `MODAL_IMAGE_ID` - Image ID
- `MODAL_REGION` - Region identifier (e.g., us-east-1)
- `MODAL_TASK_ID` - Container task ID

**Function containers**:

- `MODAL_ENVIRONMENT` - Modal Environment name
- `MODAL_IS_REMOTE` - Set to `1` in remote containers
- `MODAL_IDENTITY_TOKEN` - OIDC token for function identity

**Sandbox containers**:

- `MODAL_SANDBOX_ID` - Sandbox ID

### Setting Environment Variables

Via the image:

```python
image = modal.Image.debian_slim().env({"PORT": "6443"})

@app.function(image=image)
def my_function():
    import os

    port = os.environ["PORT"]
```

Via secrets:

```python
secret = modal.Secret.from_dict({"API_KEY": "secret-value"})

@app.function(secrets=[secret])
def my_function():
    import os

    api_key = os.environ["API_KEY"]
```

## Using a .env File

```python
# Reads the .env file from the current directory
@app.function(secrets=[modal.Secret.from_dotenv()])
def local_dev():
    import os

    api_key = os.environ["API_KEY"]
```

The `.env` file format:

```
API_KEY=sk-xxx
DATABASE_URL=postgres://user:pass@host/db
DEBUG=false
```
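
A minimal parser for this format (a sketch of common dotenv behavior, not Modal's exact parser): blank lines and `#` comments are skipped, and everything after the first `=` is the value.

```python
def parse_dotenv(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        env[key.strip()] = value.strip()
    return env

env = parse_dotenv("API_KEY=sk-xxx\n# comment\nDEBUG=false\n")
print(env)  # {'API_KEY': 'sk-xxx', 'DEBUG': 'false'}
```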

## Common Secret Patterns

### AWS Credentials

```python
aws_secret = modal.Secret.from_name("my-aws-secret")

@app.function(secrets=[aws_secret])
def use_aws():
    import boto3

    # AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are picked up automatically
    s3 = boto3.client("s3")
```

### Hugging Face Token

```python
hf_secret = modal.Secret.from_name("huggingface")

@app.function(secrets=[hf_secret])
def download_model():
    from transformers import AutoModel

    # HF_TOKEN is used automatically for authentication
    model = AutoModel.from_pretrained("private-model")
```

### Common Secret Templates

| Service | Typical Keys |
|---------|-------------|
| OpenAI | `OPENAI_API_KEY` |
| Hugging Face | `HF_TOKEN` |
| AWS | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Postgres | `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE` |
| Weights & Biases | `WANDB_API_KEY` |
| GitHub | `GITHUB_TOKEN` |

### Database Credentials

```python
db_secret = modal.Secret.from_name("postgres-creds")

@app.function(secrets=[db_secret])
def query_db():
    import os

    import psycopg2

    conn = psycopg2.connect(
        host=os.environ["PGHOST"],
        port=os.environ["PGPORT"],
        user=os.environ["PGUSER"],
        password=os.environ["PGPASSWORD"],
    )
```

## Best Practices

1. **Never hardcode secrets** - Always use Modal Secrets
2. **Use specific secrets** - Create separate secrets for different purposes
3. **Rotate secrets regularly** - Update secrets periodically via the dashboard or CLI
4. **Minimal scope** - Only attach secrets to functions that need them
5. **Environment-specific** - Use different secrets for dev/staging/prod
6. **Prefer `.from_name()` in production** - Reserve `.from_dict()` for development

## Security Notes

- Secrets are encrypted at rest and in transit
- Secrets are only available to functions that explicitly request them, within your workspace
- Secret values are not logged or exposed in dashboards; never log or print them yourself
- Secrets can be scoped to specific environments

# Modal Volumes

## Table of Contents

- [Overview](#overview)
- [Creating Volumes](#creating-volumes)
- [Mounting Volumes](#mounting-volumes)
- [Reading and Writing Files](#reading-and-writing-files)
- [CLI Access](#cli-access)
- [Commits and Reloads](#commits-and-reloads)
- [Concurrent Access](#concurrent-access)
- [Volumes v2](#volumes-v2)
- [Common Patterns](#common-patterns)

## Overview

Modal Volumes provide a high-performance distributed file system for Modal applications, designed for write-once, read-many workloads like ML model weights and distributed data processing.

Key characteristics:

- Persistent across function invocations and deployments
- Mountable by multiple functions simultaneously
- Background auto-commits every few seconds
- Final commit on container shutdown

## Creating Volumes

### In Code (Lazy Creation)

```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True)

# For v2 (beta)
vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2)
```

### Via CLI

```bash
modal volume create my-volume

# v2 volume (beta)
modal volume create my-volume --version=2
```

## Mounting Volumes

Mount volumes to functions via the `volumes` parameter:

```python
vol = modal.Volume.from_name("model-store", create_if_missing=True)

@app.function(volumes={"/models": vol})
def use_model():
    import json

    # Access files at /models/
    with open("/models/config.json") as f:
        config = json.load(f)
```

Mount multiple volumes:

```python
weights_vol = modal.Volume.from_name("weights")
data_vol = modal.Volume.from_name("datasets")

@app.function(volumes={"/weights": weights_vol, "/data": data_vol})
def train():
    ...
```

## Commits and Reloads

### Commits

Persist changes to a Volume:

```python
@app.function(volumes={"/data": vol})
def write_data():
    with open("/data/file.txt", "w") as f:
        f.write("data")
    vol.commit()  # Make changes visible to other containers
```

**Background commits**: Modal automatically commits Volume changes every few seconds and on container shutdown.

### Reloads

Fetch the latest changes from other containers:

```python
@app.function(volumes={"/data": vol})
def read_data():
    vol.reload()  # Fetch latest changes
    with open("/data/file.txt", "r") as f:
        content = f.read()
```

At container creation, the latest Volume state is mounted. A `reload()` is needed to see commits made by other containers after that point.

## Uploading Files

### Batch Upload (Efficient)

```python
import io

vol = modal.Volume.from_name("my-volume")

with vol.batch_upload() as batch:
    batch.put_file("local-path.txt", "/remote-path.txt")
    batch.put_directory("/local/directory/", "/remote/directory")
    batch.put_file(io.BytesIO(b"some data"), "/foobar")
```

### Via Image

```python
image = modal.Image.debian_slim().add_local_dir(
    local_path="/home/user/my_dir",
    remote_path="/app",
)

@app.function(image=image)
def process():
    # Files available at /app
    ...
```

## Downloading Files
|
## Reading and Writing Files
|
||||||
|
|
||||||
### Via CLI
|
### Writing
|
||||||
|
|
||||||
```bash
|
|
||||||
modal volume get my-volume remote.txt local.txt
|
|
||||||
```
|
|
||||||
|
|
||||||
Max file size via CLI: No limit
|
|
||||||
Max file size via dashboard: 16 MB
|
|
||||||
|
|
||||||
### Via Python SDK
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
vol = modal.Volume.from_name("my-volume")
|
@app.function(volumes={"/data": vol})
|
||||||
|
def save_results(results):
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
|
||||||
for data in vol.read_file("path.txt"):
|
os.makedirs("/data/outputs", exist_ok=True)
|
||||||
print(data)
|
with open("/data/outputs/results.json", "w") as f:
|
||||||
|
json.dump(results, f)
|
||||||
```
|
```
|
||||||
|
|
||||||
## Volume Performance
|
### Reading
|
||||||
|
|
||||||
### Volumes v1
|
|
||||||
|
|
||||||
Best for:
|
|
||||||
- <50,000 files (recommended)
|
|
||||||
- <500,000 files (hard limit)
|
|
||||||
- Sequential access patterns
|
|
||||||
- <5 concurrent writers
|
|
||||||
|
|
||||||
### Volumes v2 (Beta)
|
|
||||||
|
|
||||||
Improved for:
|
|
||||||
- Unlimited files
|
|
||||||
- Hundreds of concurrent writers
|
|
||||||
- Random access patterns
|
|
||||||
- Large files (up to 1 TiB)
|
|
||||||
|
|
||||||
Current v2 limits:
|
|
||||||
- Max file size: 1 TiB
|
|
||||||
- Max files per directory: 32,768
|
|
||||||
- Unlimited directory depth
|
|
||||||
|
|
||||||
## Model Storage
|
|
||||||
|
|
||||||
### Saving Model Weights
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
volume = modal.Volume.from_name("model-weights", create_if_missing=True)
|
@app.function(volumes={"/data": vol})
|
||||||
MODEL_DIR = "/models"
|
def load_results():
|
||||||
|
with open("/data/outputs/results.json") as f:
|
||||||
|
return json.load(f)
|
||||||
|
```
|
||||||
|
|
||||||
@app.function(volumes={MODEL_DIR: volume})
|
### Large Files (Model Weights)
|
||||||
def train():
|
|
||||||
|
```python
|
||||||
|
@app.function(volumes={"/models": vol}, gpu="L40S")
|
||||||
|
def save_model():
|
||||||
|
import torch
|
||||||
model = train_model()
|
model = train_model()
|
||||||
save_model(f"{MODEL_DIR}/my_model.pt", model)
|
torch.save(model.state_dict(), "/models/checkpoint.pt")
|
||||||
volume.commit()
|
|
||||||
|
@app.function(volumes={"/models": vol}, gpu="L40S")
|
||||||
|
def load_model():
|
||||||
|
import torch
|
||||||
|
model = MyModel()
|
||||||
|
model.load_state_dict(torch.load("/models/checkpoint.pt"))
|
||||||
|
return model
|
||||||
```
|
```
|
||||||
|
|
||||||
### Loading Model Weights
|
## CLI Access
|
||||||
|
|
||||||
```python
|
|
||||||
@app.function(volumes={MODEL_DIR: volume})
|
|
||||||
def inference(model_id: str):
|
|
||||||
try:
|
|
||||||
model = load_model(f"{MODEL_DIR}/{model_id}")
|
|
||||||
except NotFound:
|
|
||||||
volume.reload() # Fetch latest models
|
|
||||||
model = load_model(f"{MODEL_DIR}/{model_id}")
|
|
||||||
return model.run(request)
|
|
||||||
```
|
|
||||||
|
|
||||||
## Model Checkpointing
|
|
||||||
|
|
||||||
Save checkpoints during long training jobs:
|
|
||||||
|
|
||||||
```python
|
|
||||||
volume = modal.Volume.from_name("checkpoints")
|
|
||||||
VOL_PATH = "/vol"
|
|
||||||
|
|
||||||
@app.function(
|
|
||||||
gpu="A10G",
|
|
||||||
timeout=2*60*60, # 2 hours
|
|
||||||
volumes={VOL_PATH: volume}
|
|
||||||
)
|
|
||||||
def finetune():
|
|
||||||
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
|
|
||||||
|
|
||||||
training_args = Seq2SeqTrainingArguments(
|
|
||||||
output_dir=str(VOL_PATH / "model"), # Checkpoints saved to Volume
|
|
||||||
save_steps=100,
|
|
||||||
# ... more args
|
|
||||||
)
|
|
||||||
|
|
||||||
trainer = Seq2SeqTrainer(model=model, args=training_args, ...)
|
|
||||||
trainer.train()
|
|
||||||
```
|
|
||||||
|
|
||||||
Background commits ensure checkpoints persist even if training is interrupted.
|
|
||||||
|
|
||||||
## CLI Commands
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# List files
|
# List files
|
||||||
modal volume ls my-volume
|
modal volume ls my-volume
|
||||||
|
modal volume ls my-volume /subdir/
|
||||||
|
|
||||||
# Upload
|
# Upload files
|
||||||
modal volume put my-volume local.txt remote.txt
|
modal volume put my-volume local_file.txt
|
||||||
|
modal volume put my-volume local_file.txt /remote/path/file.txt
|
||||||
|
|
||||||
# Download
|
# Download files
|
||||||
modal volume get my-volume remote.txt local.txt
|
modal volume get my-volume /remote/file.txt local_file.txt
|
||||||
|
|
||||||
# Copy within Volume
|
# Delete a volume
|
||||||
modal volume cp my-volume src.txt dst.txt
|
|
||||||
|
|
||||||
# Delete
|
|
||||||
modal volume rm my-volume file.txt
|
|
||||||
|
|
||||||
# List all volumes
|
|
||||||
modal volume list
|
|
||||||
|
|
||||||
# Delete volume
|
|
||||||
modal volume delete my-volume
|
modal volume delete my-volume
|
||||||
```
|
```
|
||||||
|
|
||||||
## Ephemeral Volumes
|
## Commits and Reloads
|
||||||
|
|
||||||
Create temporary volumes that are garbage collected:
|
Modal auto-commits volume changes in the background every few seconds and on container shutdown.
|
||||||
|
|
||||||
|
### Explicit Commit
|
||||||
|
|
||||||
|
Force an immediate commit:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
with modal.Volume.ephemeral() as vol:
|
@app.function(volumes={"/data": vol})
|
||||||
sb = modal.Sandbox.create(
|
def writer():
|
||||||
volumes={"/cache": vol},
|
with open("/data/file.txt", "w") as f:
|
||||||
app=my_app,
|
f.write("hello")
|
||||||
)
|
vol.commit() # Make immediately visible to other containers
|
||||||
# Use volume
|
```
|
||||||
# Automatically cleaned up when context exits
|
|
||||||
|
### Reload
|
||||||
|
|
||||||
|
See changes from other containers:
|
||||||
|
|
||||||
|
```python
|
||||||
|
@app.function(volumes={"/data": vol})
|
||||||
|
def reader():
|
||||||
|
vol.reload() # Refresh to see latest writes
|
||||||
|
with open("/data/file.txt") as f:
|
||||||
|
return f.read()
|
||||||
```
|
```
|
||||||
|
|
||||||
## Concurrent Access
|
## Concurrent Access
|
||||||
|
|
||||||
### Concurrent Reads
|
### v1 Volumes
|
||||||
|
|
||||||
Multiple containers can read simultaneously without issues.
|
- Recommended max 5 concurrent commits
|
||||||
|
- Last write wins for concurrent modifications of the same file
|
||||||
|
- Avoid concurrent modification of identical files
|
||||||
|
- Max 500,000 files (inodes)
|
||||||
|
|
||||||
### Concurrent Writes
|
### v2 Volumes
|
||||||
|
|
||||||
Supported but:
|
- Hundreds of concurrent writers (distinct files)
|
||||||
- Avoid modifying same files concurrently
|
- No file count limit
|
||||||
- Last write wins (data loss possible)
|
- Improved random access performance
|
||||||
- v1: Limit to ~5 concurrent writers
|
- Up to 1 TiB per file, 262,144 files per directory
|
||||||
- v2: Hundreds of concurrent writers supported
|
|
||||||
|
|
||||||
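Since concurrent writes to the same path resolve as last-write-wins, a common workaround is to give each writer its own output path. A minimal stdlib sketch (the naming scheme is illustrative, not a Modal API):

```python
import uuid

def unique_output_path(base_dir: str, task_name: str, suffix: str = ".json") -> str:
    """Build a collision-free path so concurrent writers never touch the same file."""
    return f"{base_dir}/{task_name}-{uuid.uuid4().hex}{suffix}"

# Each concurrent container writes to its own file, e.g.:
# with open(unique_output_path("/data/outputs", "worker"), "w") as f: ...
path_a = unique_output_path("/data/outputs", "worker")
path_b = unique_output_path("/data/outputs", "worker")
```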
## Volumes v2

v2 Volumes (beta) offer significant improvements:

| Feature | v1 | v2 |
|---------|----|----|
| Max files | 500,000 | Unlimited |
| Concurrent writes | ~5 | Hundreds |
| Max file size | No limit | 1 TiB |
| Random access | Limited | Full support |
| HIPAA compliance | No | Yes |
| Hard links | No | Yes |

Enable v2:

```python
vol = modal.Volume.from_name("my-vol-v2", create_if_missing=True, version=2)
```
## Common Patterns

### Model Weight Storage

```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

# Download once during image build
def download_weights():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim()
    .uv_pip_install("huggingface_hub")
    .run_function(download_weights, volumes={"/models": vol})
)
```

### Training Checkpoints

```python
@app.function(volumes={"/checkpoints": vol}, gpu="H100", timeout=86400)
def train():
    for epoch in range(100):
        train_one_epoch()
        torch.save(model.state_dict(), f"/checkpoints/epoch_{epoch}.pt")
        vol.commit()  # Save checkpoint immediately
```
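To resume after an interruption, scan the checkpoint directory for the highest committed epoch. A stdlib-only sketch, assuming checkpoints named `epoch_{n}.pt` (the helper name is hypothetical):

```python
import os
import re
from typing import Optional

def find_latest_checkpoint(ckpt_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered epoch_{n}.pt checkpoint, or None."""
    pattern = re.compile(r"^epoch_(\d+)\.pt$")
    best_epoch, best_path = -1, None
    for name in os.listdir(ckpt_dir):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_epoch:
            best_epoch = int(m.group(1))
            best_path = os.path.join(ckpt_dir, name)
    return best_path
```

Call it at the top of the training function and load the returned state dict before entering the epoch loop.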
### Shared Data Between Functions

```python
data_vol = modal.Volume.from_name("shared-data", create_if_missing=True)

@app.function(volumes={"/data": data_vol})
def preprocess():
    # Write processed data
    df.to_parquet("/data/processed.parquet")

@app.function(volumes={"/data": data_vol})
def analyze():
    data_vol.reload()  # Ensure we see latest data
    df = pd.read_parquet("/data/processed.parquet")
    return df.describe()
```

### Performance Tips

- Volumes are optimized for large files, not many small files
- Keep under 50,000 files and directories for best v1 performance
- Use Parquet or other columnar formats instead of many small CSVs
- For truly temporary data, use `ephemeral_disk` instead of Volumes
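The many-small-files tip can be applied with a stdlib-only packing pass that merges small per-record JSON files into one JSON Lines file before writing to the Volume (Parquet via pandas is the heavier-weight equivalent; file names here are illustrative):

```python
import glob
import json
import os

def pack_records(src_dir: str, dest_path: str) -> int:
    """Merge every *.json file in src_dir into one JSON Lines file; return record count."""
    count = 0
    with open(dest_path, "w") as out:
        for path in sorted(glob.glob(os.path.join(src_dir, "*.json"))):
            with open(path) as f:
                out.write(json.dumps(json.load(f)) + "\n")
            count += 1
    return count
```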
@@ -1,337 +1,254 @@
# Modal Web Endpoints

## Table of Contents

- [Simple Endpoints](#simple-endpoints)
- [Deployment](#deployment)
- [ASGI Apps](#asgi-apps-fastapi-starlette-fasthtml)
- [WSGI Apps](#wsgi-apps-flask-django)
- [Custom Web Servers](#custom-web-servers)
- [WebSockets](#websockets)
- [Authentication](#authentication)
- [Streaming](#streaming)
- [Concurrency](#concurrency)
- [Limits](#limits)

## Simple Endpoints

The easiest way to create a web endpoint:

```python
import modal

app = modal.App("api-service")

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```
### POST Endpoints

```python
@app.function()
@modal.fastapi_endpoint(method="POST")
def predict(data: dict):
    result = model.predict(data["text"])
    return {"prediction": result}
```

### Query Parameters

Parameters are automatically parsed from query strings:

```python
@app.function()
@modal.fastapi_endpoint()
def search(query: str, limit: int = 10):
    return {"results": do_search(query, limit)}
```

Access via: `https://your-app.modal.run?query=hello&limit=5`
## Deployment

### Development Mode

```bash
modal serve script.py
```

- Creates a temporary public URL
- Hot-reloads on file changes
- Perfect for development and testing
- URL expires when you stop the command

### Production Deployment

```bash
modal deploy script.py
```

- Creates a permanent URL
- Runs persistently in the cloud
- Autoscales based on traffic
- URL format: `https://<workspace>--<app-name>-<function-name>.modal.run`
## ASGI Apps (FastAPI, Starlette, FastHTML)

For full framework applications, use `@modal.asgi_app`:

```python
from fastapi import FastAPI

web_app = FastAPI()

@web_app.get("/")
async def root():
    return {"status": "ok"}

@web_app.post("/predict")
async def predict(request: dict):
    return {"result": model.run(request["input"])}

@app.function(image=image, gpu="L40S")
@modal.asgi_app()
def fastapi_app():
    return web_app
```

### With Class Lifecycle

```python
@app.cls(gpu="L40S", image=image)
class InferenceService:
    @modal.enter()
    def load_model(self):
        self.model = load_model()

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI
        app = FastAPI()

        @app.post("/generate")
        async def generate(request: dict):
            return self.model.generate(request["prompt"])

        return app
```
## WSGI Apps (Flask, Django)

```python
from flask import Flask

flask_app = Flask(__name__)

@flask_app.route("/")
def index():
    return {"status": "ok"}

@app.function(image=image)
@modal.wsgi_app()
def flask_server():
    return flask_app
```

WSGI is synchronous: concurrent inputs run on separate threads.
## Custom Web Servers

For non-standard web frameworks (aiohttp, Tornado, TGI):

```python
@app.function(image=image, gpu="H100")
@modal.web_server(port=8000)
def serve():
    import subprocess
    subprocess.Popen([
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "meta-llama/Llama-3-70B",
        "--host", "0.0.0.0",  # Must bind to 0.0.0.0, not localhost
        "--port", "8000",
    ])
```

The application must bind to `0.0.0.0` (not `127.0.0.1`).
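When launching a server as a subprocess like this, a startup probe is a common pattern: block until the port is actually accepting connections before relying on it. A stdlib-only sketch (the timeout value is arbitrary):

```python
import socket
import time

def wait_for_port(port: int, host: str = "127.0.0.1", timeout_s: float = 30.0) -> bool:
    """Poll until a TCP server is accepting connections, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.1)
    return False
```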
## WebSockets

Supported with `@modal.asgi_app`, `@modal.wsgi_app`, and `@modal.web_server`:

```python
from fastapi import FastAPI, WebSocket

web_app = FastAPI()

@web_app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        result = process(data)
        await websocket.send_text(result)

@app.function()
@modal.asgi_app()
def ws_app():
    return web_app
```

- Full WebSocket protocol (RFC 6455)
- Messages up to 2 MiB each
- No RFC 8441 or RFC 7692 support yet
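Given the 2 MiB per-message cap, larger payloads have to be split across several WebSocket messages. A minimal sender-side chunking sketch (the 1 MiB chunk size is an arbitrary choice that leaves headroom under the cap):

```python
def chunk_payload(payload: bytes, chunk_size: int = 1024 * 1024) -> list:
    """Split a payload into chunks that each fit under the 2 MiB message cap."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]

# In the handler: for chunk in chunk_payload(big_blob): await websocket.send_bytes(chunk)
```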
## Authentication

### Proxy Auth Tokens (Built-in)

Modal provides first-class endpoint protection via proxy auth tokens:

```python
@app.function()
@modal.fastapi_endpoint()
def protected(text: str):
    return {"result": process(text)}
```

Clients include `Modal-Key` and `Modal-Secret` headers to authenticate.
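A client call might look like this stdlib sketch; the URL and the environment variable names holding the token pair are placeholders, and only the `Modal-Key`/`Modal-Secret` header names come from Modal:

```python
import os
import urllib.request

url = "https://workspace--api-service-protected.modal.run?text=hi"  # placeholder URL
req = urllib.request.Request(
    url,
    headers={
        "Modal-Key": os.environ.get("TOKEN_ID", ""),
        "Modal-Secret": os.environ.get("TOKEN_SECRET", ""),
    },
)
# urllib.request.urlopen(req) would then perform the authenticated request.
```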
### Custom Bearer Tokens

```python
from fastapi import Header, HTTPException

@app.function(secrets=[modal.Secret.from_name("auth-secret")])
@modal.fastapi_endpoint(method="POST")
def secure_predict(data: dict, authorization: str = Header(None)):
    import os
    expected = os.environ["AUTH_TOKEN"]
    if authorization != f"Bearer {expected}":
        raise HTTPException(status_code=401, detail="Unauthorized")
    return {"result": model.predict(data["text"])}
```

### Client IP Access

The client IP address is available for geolocation, rate limiting, and access control.
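For example, a per-IP rate limit can be kept in container memory; inside an endpoint you would key it on the request's client host. A sliding-window sketch (stdlib only; note that each container tracks only its own traffic):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per window_s seconds for each client IP."""
    def __init__(self, max_requests: int, window_s: float):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self._hits[ip]
        while q and now - q[0] > self.window_s:
            q.popleft()  # Drop hits that fell out of the window
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```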
## Streaming

### Server-Sent Events (SSE)

```python
from fastapi.responses import StreamingResponse

@app.function(gpu="H100")
@modal.fastapi_endpoint()
def stream_generate(prompt: str):
    def generate():
        for token in model.stream(prompt):
            yield f"data: {token}\n\n"
    return StreamingResponse(generate(), media_type="text/event-stream")
```
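On the receiving side, `data:` frames like the ones emitted here can be decoded with a few lines of stdlib parsing (real clients typically use an SSE library; this sketch handles only single-line `data:` fields in a fully buffered body):

```python
def parse_sse(stream_text: str) -> list:
    """Extract the data payload from each SSE event in a buffered response body."""
    events = []
    for block in stream_text.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(line[len("data: "):])
    return events
```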
## Concurrency

Handle multiple requests per container using `@modal.concurrent`:

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
@modal.fastapi_endpoint(method="POST")
async def batch_predict(data: dict):
    return {"result": await model.predict_async(data["text"])}
```
## Limits

- Request body: up to 4 GiB
- Response body: unlimited
- Rate limit: 200 requests/second (5-second burst for new accounts)
- Cold starts occur when no containers are active (use `min_containers` to avoid)
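Clients that may hit the rate limit can retry with exponential backoff. A stdlib sketch (`send` stands in for any callable returning an HTTP status code):

```python
import time

def call_with_backoff(send, max_attempts: int = 5, base_delay_s: float = 0.5):
    """Retry send() while it returns 429, doubling the delay each attempt."""
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        time.sleep(base_delay_s * (2 ** attempt))
    return 429
```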