Compare commits

...

9 Commits

Author SHA1 Message Date
Timothy Kassis
71add64426 Remove planning with files skill because it is specific to Claude Code 2026-03-25 14:40:17 -07:00
Timothy Kassis
d9b2503391 Make writing skills more explicit 2026-03-25 11:54:41 -07:00
Timothy Kassis
04a7be2319 Add Security Disclaimer section to README
Introduced a new section outlining the security implications of using agent skills, emphasizing the importance of reviewing skills before installation. Included recommendations for safe usage and a reminder of the review process for contributions.
2026-03-25 09:31:25 -07:00
Timothy Kassis
cb364cc3d8 Bump version 2026-03-23 16:27:05 -07:00
Timothy Kassis
f93d13b08e Improve token discovery for Modal 2026-03-23 16:26:43 -07:00
Timothy Kassis
b75f4e8d08 Update Modal skill 2026-03-23 16:21:31 -07:00
Timothy Kassis
71e26ffa6d Add planning with files skill from @OthmanAdi 2026-03-23 14:34:03 -07:00
Timothy Kassis
1531326a59 Add K-Dense BYOK AI co-scientist to README with features and links 2026-03-22 17:50:53 -07:00
Timothy Kassis
903caa6a26 Add writing skills 2026-03-20 09:59:52 -07:00
17 changed files with 2510 additions and 2414 deletions

View File

@@ -6,7 +6,7 @@
},
"metadata": {
"description": "Claude scientific skills from K-Dense Inc",
- "version": "2.28.0"
+ "version": "2.31.0"
},
"plugins": [
{

View File

@@ -1,5 +1,7 @@
# Claude Scientific Skills
> **New: [K-Dense BYOK](https://github.com/K-Dense-AI/k-dense-byok)** — A free, open-source AI co-scientist that runs on your desktop, powered by Claude Scientific Skills. Bring your own API keys, pick from 40+ models, and get a full research workspace with web search, file handling, 250+ scientific databases, and access to all 170+ skills in this repo. Your data stays on your computer, and you can optionally scale to cloud compute via [Modal](https://modal.com/) for heavy workloads. [Get started here.](https://github.com/K-Dense-AI/k-dense-byok)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE.md)
[![Skills](https://img.shields.io/badge/Skills-170-brightgreen.svg)](#whats-included)
[![Databases](https://img.shields.io/badge/Databases-250%2B-orange.svg)](#whats-included)
@@ -72,6 +74,7 @@ Each skill includes:
- [What's Included](#whats-included)
- [Why Use This?](#why-use-this)
- [Getting Started](#getting-started)
- [Security Disclaimer](#-security-disclaimer)
- [Support Open Source](#-support-the-open-source-community)
- [Prerequisites](#prerequisites)
- [Quick Examples](#quick-examples)
@@ -170,6 +173,30 @@ cp -r /path/to/claude-scientific-skills/scientific-skills/* .cursor/skills/
---
## ⚠️ Security Disclaimer
> **Skills can execute code and influence your coding agent's behavior. Review what you install.**
Agent Skills are powerful — they can instruct your AI agent to run arbitrary code, install packages, make network requests, and modify files on your system. A malicious or poorly written skill has the potential to steer your coding agent into harmful behavior.
We take security seriously. All contributions go through a review process, and we run LLM-based security scans (via [Cisco AI Defense Skill Scanner](https://github.com/cisco-ai-defense/skill-scanner)) on every skill in this repository. However, as a small team with a growing number of community contributions, we cannot guarantee that every skill has been exhaustively reviewed for all possible risks.
**It is ultimately your responsibility to review the skills you install and decide which ones to trust.**
We recommend the following:
- **Do not install everything at once.** Only install the skills you actually need for your work. While installing the full collection was reasonable when K-Dense created and maintained every skill, the repository now includes many community contributions that we may not have reviewed as thoroughly.
- **Read the `SKILL.md` before installing.** Each skill's documentation describes what it does, what packages it uses, and what external services it connects to. If something looks suspicious, don't install it.
- **Check the contribution history.** Skills authored by K-Dense (`K-Dense-AI`) have been through our internal review process. Community-contributed skills have been reviewed to the best of our ability, but with limited resources.
- **Run the security scanner yourself.** Before installing third-party skills, scan them locally:
```bash
uv pip install cisco-ai-skill-scanner
skill-scanner scan /path/to/skill --use-behavioral
```
- **Report anything suspicious.** If you find a skill that looks malicious or behaves unexpectedly, please [open an issue](https://github.com/K-Dense-AI/claude-scientific-skills/issues) immediately so we can investigate.
---
## ❤️ Support the Open Source Community
Claude Scientific Skills is powered by **50+ incredible open source projects** maintained by dedicated developers and research communities worldwide. Projects like Biopython, Scanpy, RDKit, scikit-learn, PyTorch Lightning, and many others form the foundation of these skills.
@@ -187,7 +214,7 @@ Claude Scientific Skills is powered by **50+ incredible open source projects** m
## ⚙️ Prerequisites
- - **Python**: 3.9+ (3.12+ recommended for best compatibility)
+ - **Python**: 3.11+ (3.12+ recommended for best compatibility)
- **uv**: Python package manager (required for installing skill dependencies)
- **Client**: Any agent that supports the [Agent Skills](https://agentskills.io/) standard (Cursor, Claude Code, Gemini CLI, Codex, etc.)
- **System**: macOS, Linux, or Windows with WSL2

View File

@@ -77,7 +77,7 @@
### Data Management & Infrastructure ### Data Management & Infrastructure
- **LaminDB** - Open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). Provides unified platform combining lakehouse architecture, lineage tracking, feature stores, biological ontologies (via Bionty plugin with 20+ ontologies: genes, proteins, cell types, tissues, diseases, pathways), LIMS, and ELN capabilities through a single Python API. Key features include: automatic data lineage tracking (code, inputs, outputs, environment), versioned artifacts (DataFrame, AnnData, SpatialData, Parquet, Zarr), schema validation and data curation with standardization/synonym mapping, queryable metadata with feature-based filtering, cross-registry traversal, and streaming for large datasets. Supports integrations with workflow managers (Nextflow, Snakemake, Redun), MLOps platforms (Weights & Biases, MLflow, HuggingFace, scVI-tools), cloud storage (S3, GCS, S3-compatible), array stores (TileDB-SOMA, DuckDB), and visualization (Vitessce). Deployment options: local SQLite, cloud storage with SQLite, or cloud storage with PostgreSQL for production. Use cases: scRNA-seq standardization and analysis, flow cytometry/spatial data management, multi-modal dataset integration, computational workflow tracking with reproducibility, biological ontology-based annotation, data lakehouse construction for unified queries, ML pipeline integration with experiment tracking, and FAIR-compliant dataset publishing - **LaminDB** - Open-source data framework for biology that makes data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). Provides unified platform combining lakehouse architecture, lineage tracking, feature stores, biological ontologies (via Bionty plugin with 20+ ontologies: genes, proteins, cell types, tissues, diseases, pathways), LIMS, and ELN capabilities through a single Python API. Key features include: automatic data lineage tracking (code, inputs, outputs, environment), versioned artifacts (DataFrame, AnnData, SpatialData, Parquet, Zarr), schema validation and data curation with standardization/synonym mapping, queryable metadata with feature-based filtering, cross-registry traversal, and streaming for large datasets. Supports integrations with workflow managers (Nextflow, Snakemake, Redun), MLOps platforms (Weights & Biases, MLflow, HuggingFace, scVI-tools), cloud storage (S3, GCS, S3-compatible), array stores (TileDB-SOMA, DuckDB), and visualization (Vitessce). Deployment options: local SQLite, cloud storage with SQLite, or cloud storage with PostgreSQL for production. Use cases: scRNA-seq standardization and analysis, flow cytometry/spatial data management, multi-modal dataset integration, computational workflow tracking with reproducibility, biological ontology-based annotation, data lakehouse construction for unified queries, ML pipeline integration with experiment tracking, and FAIR-compliant dataset publishing
- **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container), persistent storage via Volumes for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs, parallel execution with `.map()` for batch processing, input concurrency for I/O-bound workloads, and resource configuration (CPU cores, memory, disk). Supports custom Docker images, integration with Hugging Face/Weights & Biases, FastAPI for web endpoints, and distributed training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, embeddings), GPU-accelerated training, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation - **Modal** - Serverless cloud platform for running Python code with minimal configuration, specialized for AI/ML workloads and scientific computing. Execute functions on powerful GPUs (T4, L4, A10, A100, L40S, H100, H200, B200, B200+), scale automatically from zero to thousands of containers, and pay only for compute used. Key features include: declarative container image building with uv (recommended)/pip/apt package management, automatic autoscaling with configurable limits and buffer containers, GPU acceleration with multi-GPU support (up to 8 GPUs per container, up to 1,536 GB VRAM), persistent storage via Volumes (v1 and v2) for model weights and datasets, secret management for API keys and credentials, scheduled jobs with cron expressions, web endpoints for deploying serverless APIs (FastAPI, ASGI, WSGI, WebSockets), parallel execution with `.map()` for batch processing, input concurrency and dynamic batching for I/O-bound workloads, and resource configuration (CPU cores, memory, ephemeral disk up to 3 TiB). Supports custom Docker images, Micromamba/Conda environments, integration with Hugging Face/Weights & Biases, and distributed multi-GPU training. Free tier includes $30/month credits. Use cases: ML model deployment and inference (LLMs, image generation, speech, embeddings), GPU-accelerated training and fine-tuning, batch processing large datasets in parallel, scheduled compute-intensive jobs, serverless API deployment with autoscaling, protein folding and computational biology, scientific computing requiring distributed compute or specialized hardware, and data pipeline automation
### Cheminformatics & Drug Discovery ### Cheminformatics & Drug Discovery
- **Datamol** - Python library for molecular manipulation and featurization built on RDKit with enhanced workflows and performance optimizations. Provides utilities for molecular I/O (reading/writing SMILES, SDF, MOL files), molecular standardization and sanitization, molecular transformations (tautomer enumeration, stereoisomer generation), molecular featurization (descriptors, fingerprints, graph representations), parallel processing for large datasets, and integration with machine learning pipelines. Features include: optimized RDKit operations, caching for repeated computations, molecular filtering and preprocessing, and seamless integration with pandas DataFrames. Designed for drug discovery and cheminformatics workflows requiring efficient processing of large compound libraries. Use cases: molecular preprocessing for ML models, compound library management, molecular similarity searches, and cheminformatics data pipelines - **Datamol** - Python library for molecular manipulation and featurization built on RDKit with enhanced workflows and performance optimizations. Provides utilities for molecular I/O (reading/writing SMILES, SDF, MOL files), molecular standardization and sanitization, molecular transformations (tautomer enumeration, stereoisomer generation), molecular featurization (descriptors, fingerprints, graph representations), parallel processing for large datasets, and integration with machine learning pipelines. Features include: optimized RDKit operations, caching for repeated computations, molecular filtering and preprocessing, and seamless integration with pandas DataFrames. Designed for drug discovery and cheminformatics workflows requiring efficient processing of large compound libraries. Use cases: molecular preprocessing for ML models, compound library management, molecular similarity searches, and cheminformatics data pipelines

View File

@@ -1,381 +1,406 @@
---
name: modal
- description: Run Python code in the cloud with serverless containers, GPUs, and autoscaling. Use when deploying ML models, running batch processing jobs, scheduling compute-intensive tasks, or serving APIs that require GPU acceleration or dynamic scaling.
+ description: Cloud computing platform for running Python on GPUs and serverless infrastructure. Use when deploying AI/ML models, running GPU-accelerated workloads, serving web endpoints, scheduling batch jobs, or scaling Python code to the cloud. Use this skill whenever the user mentions Modal, serverless GPU compute, deploying ML models to the cloud, serving inference endpoints, running batch processing in the cloud, or needs to scale Python workloads beyond their local machine. Also use when the user wants to run code on H100s, A100s, or other cloud GPUs, or needs to create a web API for a model.
- license: Apache-2.0 license
+ license: Apache-2.0
metadata:
  skill-author: K-Dense Inc.
---
# Modal
## Overview
- Modal is a serverless platform for running Python code in the cloud with minimal configuration. Execute functions on powerful GPUs, scale automatically to thousands of containers, and pay only for compute used.
+ Modal is a cloud platform for running Python code serverlessly, with a focus on AI/ML workloads. Key capabilities:
- **GPU compute** on demand (T4, L4, A10, L40S, A100, H100, H200, B200)
- **Serverless functions** with autoscaling from zero to thousands of containers
- **Custom container images** built entirely in Python code
- **Persistent storage** via Volumes for model weights and datasets
- **Web endpoints** for serving models and APIs
- **Scheduled jobs** via cron or fixed intervals
- **Sub-second cold starts** for low-latency inference
- Modal is particularly suited for AI/ML workloads, high-performance batch processing, scheduled jobs, GPU inference, and serverless APIs. Sign up for free at https://modal.com and receive $30/month in credits.
+ Everything in Modal is defined as code — no YAML, no Dockerfiles required (though both are supported).
## When to Use This Skill
- Use Modal for:
- - Deploying and serving ML models (LLMs, image generation, embedding models)
- - Running GPU-accelerated computation (training, inference, rendering)
- - Batch processing large datasets in parallel
- - Scheduling compute-intensive jobs (daily data processing, model training)
- - Building serverless APIs that need automatic scaling
- - Scientific computing requiring distributed compute or specialized hardware
+ Use this skill when:
+ - Deploy or serve AI/ML models in the cloud
+ - Run GPU-accelerated computations (training, inference, fine-tuning)
+ - Create serverless web APIs or endpoints
+ - Scale batch processing jobs in parallel
+ - Schedule recurring tasks (data pipelines, retraining, scraping)
+ - Need persistent cloud storage for model weights or datasets
+ - Want to run code in custom container environments
+ - Build job queues or async task processing systems
## Authentication and Setup ## Installation and Authentication
Modal requires authentication via API token. ### Install
### Initial Setup
```bash ```bash
# Install Modal uv pip install modal
uv uv pip install modal
# Authenticate (opens browser for login)
modal token new
``` ```
This creates a token stored in `~/.modal.toml`. The token authenticates all Modal operations. ### Authenticate
### Verify Setup Prefer existing credentials before creating new ones:
1. Check whether `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` are already present in the current environment.
2. If not, check for those values in a local `.env` file and load them if appropriate for the workflow.
3. Only fall back to interactive `modal setup` or generating fresh tokens if neither source already provides credentials.
```bash
modal setup
```
This opens a browser for authentication. For CI/CD or headless environments, use environment variables:
```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```
If tokens are not already available in the environment or `.env`, generate them at https://modal.com/settings
Modal offers a free tier with $30/month in credits.
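As a rough illustration of the discovery order described above, here is a minimal Python sketch; the helper name and the simple `.env` check are assumptions, not part of the skill itself:

```python
# Hypothetical helper illustrating the token-discovery order described above.
import os
from pathlib import Path

def modal_credentials_present() -> bool:
    # 1. Environment variables already set?
    if os.environ.get("MODAL_TOKEN_ID") and os.environ.get("MODAL_TOKEN_SECRET"):
        return True
    # 2. Values available in a local .env file?
    env_file = Path(".env")
    if env_file.exists() and "MODAL_TOKEN_ID" in env_file.read_text():
        return True
    # 3. Otherwise fall back to `modal setup` or generating tokens in the dashboard.
    return False
```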
**Reference**: See `references/getting-started.md` for detailed setup and first app walkthrough.
## Core Concepts
### App and Functions
A Modal `App` groups related functions. Functions decorated with `@app.function()` run remotely in the cloud:
```python ```python
import modal import modal
app = modal.App("test-app") app = modal.App("my-app")
@app.function() @app.function()
def hello(): def square(x):
print("Modal is working!") return x ** 2
```
Run with: `modal run script.py`
## Core Capabilities
Modal provides serverless Python execution through Functions that run in containers. Define compute requirements, dependencies, and scaling behavior declaratively.
### 1. Define Container Images
Specify dependencies and environment for functions using Modal Images.
```python
import modal
# Basic image with Python packages
image = (
modal.Image.debian_slim(python_version="3.12")
.uv_pip_install("torch", "transformers", "numpy")
)
app = modal.App("ml-app", image=image)
```
**Common patterns:**
- Install Python packages: `.uv_pip_install("pandas", "scikit-learn")`
- Install system packages: `.apt_install("ffmpeg", "git")`
- Use existing Docker images: `modal.Image.from_registry("nvidia/cuda:12.1.0-base")`
- Add local code: `.add_local_python_source("my_module")`
See `references/images.md` for comprehensive image building documentation.
### 2. Create Functions
Define functions that run in the cloud with the `@app.function()` decorator.
```python
@app.function()
def process_data(file_path: str):
import pandas as pd
df = pd.read_csv(file_path)
return df.describe()
```
**Call functions:**
```python
# From local entrypoint
@app.local_entrypoint() @app.local_entrypoint()
def main(): def main():
result = process_data.remote("data.csv") # .remote() runs in the cloud
print(result) print(square.remote(42))
``` ```
Run with: `modal run script.py` Run with `modal run script.py`. Deploy with `modal deploy script.py`.
See `references/functions.md` for function patterns, deployment, and parameter handling. **Reference**: See `references/functions.md` for lifecycle hooks, classes, `.map()`, `.spawn()`, and more.
### 3. Request GPUs ### Container Images
Attach GPUs to functions for accelerated computation. Modal builds container images from Python code. The recommended package installer is `uv`:
```python
image = (
modal.Image.debian_slim(python_version="3.11")
.uv_pip_install("torch==2.8.0", "transformers", "accelerate")
.apt_install("git")
)
@app.function(image=image)
def inference(prompt):
from transformers import pipeline
pipe = pipeline("text-generation", model="meta-llama/Llama-3-8B")
return pipe(prompt)
```
Key image methods:
- `.uv_pip_install()` — Install Python packages with uv (recommended)
- `.pip_install()` — Install with pip (fallback)
- `.apt_install()` — Install system packages
- `.run_commands()` — Run shell commands during build
- `.run_function()` — Run Python during build (e.g., download model weights)
- `.add_local_python_source()` — Add local modules
- `.env()` — Set environment variables
**Reference**: See `references/images.md` for Dockerfiles, micromamba, caching, GPU build steps.
### GPU Compute
Request GPUs via the `gpu` parameter:
```python ```python
@app.function(gpu="H100") @app.function(gpu="H100")
def train_model(): def train_model():
import torch import torch
assert torch.cuda.is_available() device = torch.device("cuda")
# GPU-accelerated code here # GPU training code here
# Multiple GPUs
@app.function(gpu="H100:4")
def distributed_training():
...
# GPU fallback chain
@app.function(gpu=["H100", "A100-80GB", "A100-40GB"])
def flexible_inference():
...
``` ```
**Available GPU types:** Available GPUs: T4, L4, A10, L40S, A100-40GB, A100-80GB, H100, H200, B200, B200+
- `T4`, `L4` - Cost-effective inference
- `A10`, `A100`, `A100-80GB` - Standard training/inference - Up to 8 GPUs per container (except A10: up to 4)
- `L40S` - Excellent cost/performance balance (48GB) - L40S is recommended for inference (cost/performance balance, 48 GB VRAM)
- `H100`, `H200` - High-performance training - H100/A100 can be auto-upgraded to H200/A100-80GB at no extra cost
- `B200` - Flagship performance (most powerful) - Use `gpu="H100!"` to prevent auto-upgrade
**Reference**: See `references/gpu.md` for GPU selection guidance and multi-GPU training.
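For example, a small sketch of the no-upgrade suffix mentioned above; the app and function names are illustrative:

```python
import modal

app = modal.App("gpu-pinning-demo")  # illustrative app name

# "H100!" requests an H100 and opts out of the automatic upgrade to H200.
@app.function(gpu="H100!")
def latency_sensitive_inference(prompt: str) -> str:
    ...
```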
### Volumes (Persistent Storage)
Volumes provide distributed, persistent file storage:
**Request multiple GPUs:**
```python ```python
@app.function(gpu="H100:8") # 8x H100 GPUs vol = modal.Volume.from_name("model-weights", create_if_missing=True)
def train_large_model():
pass @app.function(volumes={"/data": vol})
def save_model():
# Write to the mounted path
with open("/data/model.pt", "wb") as f:
torch.save(model.state_dict(), f)
@app.function(volumes={"/data": vol})
def load_model():
model.load_state_dict(torch.load("/data/model.pt"))
``` ```
See `references/gpu.md` for GPU selection guidance, CUDA setup, and multi-GPU configuration. - Optimized for write-once, read-many workloads (model weights, datasets)
- CLI access: `modal volume ls`, `modal volume put`, `modal volume get`
- Background auto-commits every few seconds
### 4. Configure Resources **Reference**: See `references/volumes.md` for v2 volumes, concurrent writes, and best practices.
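A short sketch of reading data committed by another container, using the `reload()` method listed in the API reference; the app, volume, and file names are illustrative:

```python
import modal

app = modal.App("volume-reload-demo")  # illustrative app name
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

@app.function(volumes={"/data": vol})
def read_latest_weights() -> int:
    vol.reload()  # refresh the mount to pick up commits made by other containers
    with open("/data/model.pt", "rb") as f:
        return len(f.read())
```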
Request CPU cores, memory, and disk for functions. ### Secrets
Securely pass credentials to functions:
```python ```python
@app.function( @app.function(secrets=[modal.Secret.from_name("my-api-keys")])
cpu=8.0, # 8 physical cores def call_api():
memory=32768, # 32 GiB RAM
ephemeral_disk=10240 # 10 GiB disk
)
def memory_intensive_task():
pass
```
Default allocation: 0.125 CPU cores, 128 MiB memory. Billing based on reservation or actual usage, whichever is higher.
See `references/resources.md` for resource limits and billing details.
### 5. Scale Automatically
Modal autoscales functions from zero to thousands of containers based on demand.
**Process inputs in parallel:**
```python
@app.function()
def analyze_sample(sample_id: int):
# Process single sample
return result
@app.local_entrypoint()
def main():
sample_ids = range(1000)
# Automatically parallelized across containers
results = list(analyze_sample.map(sample_ids))
```
**Configure autoscaling:**
```python
@app.function(
max_containers=100, # Upper limit
min_containers=2, # Keep warm
buffer_containers=5 # Idle buffer for bursts
)
def inference():
pass
```
See `references/scaling.md` for autoscaling configuration, concurrency, and scaling limits.
### 6. Store Data Persistently
Use Volumes for persistent storage across function invocations.
```python
volume = modal.Volume.from_name("my-data", create_if_missing=True)
@app.function(volumes={"/data": volume})
def save_results(data):
with open("/data/results.txt", "w") as f:
f.write(data)
volume.commit() # Persist changes
```
Volumes persist data between runs, store model weights, cache datasets, and share data between functions.
See `references/volumes.md` for volume management, commits, and caching patterns.
### 7. Manage Secrets
Store API keys and credentials securely using Modal Secrets.
```python
@app.function(secrets=[modal.Secret.from_name("huggingface")])
def download_model():
import os import os
token = os.environ["HF_TOKEN"] api_key = os.environ["API_KEY"]
# Use token for authentication # Use the key
``` ```
**Create secrets in Modal dashboard or via CLI:** Create secrets via CLI: `modal secret create my-api-keys API_KEY=sk-xxx`
```bash
modal secret create my-secret KEY=value API_TOKEN=xyz
```
See `references/secrets.md` for secret management and authentication patterns. Or from a `.env` file: `modal.Secret.from_dotenv()`
### 8. Deploy Web Endpoints **Reference**: See `references/secrets.md` for dashboard setup, multiple secrets, and templates.
Serve HTTP endpoints, APIs, and webhooks with `@modal.web_endpoint()`. ### Web Endpoints
Serve models and APIs as web endpoints:
```python ```python
@app.function() @app.function()
@modal.web_endpoint(method="POST") @modal.fastapi_endpoint()
def predict(data: dict): def predict(text: str):
# Process request return {"result": model.predict(text)}
result = model.predict(data["input"])
return {"prediction": result}
``` ```
**Deploy with:** - `modal serve script.py` — Development with hot reload and temporary URL
```bash - `modal deploy script.py` — Production deployment with permanent URL
modal deploy script.py - Supports FastAPI, ASGI (Starlette, FastHTML), WSGI (Flask, Django), WebSockets
``` - Request bodies up to 4 GiB, unlimited response size
Modal provides HTTPS URL for the endpoint. **Reference**: See `references/web-endpoints.md` for ASGI/WSGI apps, streaming, auth, and WebSockets.
See `references/web-endpoints.md` for FastAPI integration, streaming, authentication, and WebSocket support. ### Scheduled Jobs
### 9. Schedule Jobs Run functions on a schedule:
Run functions on a schedule with cron expressions.
```python ```python
@app.function(schedule=modal.Cron("0 2 * * *")) # Daily at 2 AM @app.function(schedule=modal.Cron("0 9 * * *")) # Daily at 9 AM UTC
def daily_backup(): def daily_pipeline():
# Backup data # ETL, retraining, scraping, etc.
pass ...
@app.function(schedule=modal.Period(hours=4)) # Every 4 hours @app.function(schedule=modal.Period(hours=6))
def refresh_cache(): def periodic_check():
# Update cache ...
pass
``` ```
Scheduled functions run automatically without manual invocation. Deploy with `modal deploy script.py` to activate the schedule.
See `references/scheduled-jobs.md` for cron syntax, timezone configuration, and monitoring. - `modal.Cron("...")` — Standard cron syntax, stable across deploys
- `modal.Period(hours=N)` — Fixed interval, resets on redeploy
- Monitor runs in the Modal dashboard
## Common Workflows **Reference**: See `references/scheduled-jobs.md` for cron syntax and management.
### Deploy ML Model for Inference ### Scaling and Concurrency
Modal autoscales containers automatically. Configure limits:
```python
@app.function(
max_containers=100, # Upper limit
min_containers=2, # Keep warm for low latency
buffer_containers=5, # Reserve capacity
scaledown_window=300, # Idle seconds before shutdown
)
def process(data):
...
```
Process inputs in parallel with `.map()`:
```python
results = list(process.map([item1, item2, item3, ...]))
```
Enable concurrent request handling per container:
```python
@app.function()
@modal.concurrent(max_inputs=10)
async def handle_request(req):
...
```
**Reference**: See `references/scaling.md` for `.map()`, `.starmap()`, `.spawn()`, and limits.
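A brief sketch of `.starmap()` and `.spawn()` from the reference above; the function and app names are illustrative:

```python
import modal

app = modal.App("scaling-demo")  # illustrative app name

@app.function()
def add(x: int, y: int) -> int:
    return x + y

@app.local_entrypoint()
def main():
    # .starmap() unpacks each tuple into positional arguments
    sums = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
    # .spawn() returns immediately; fetch the result later with .get()
    call = add.spawn(10, 20)
    print(sums, call.get())
```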
### Resource Configuration
```python
@app.function(
cpu=4.0, # Physical cores (not vCPUs)
memory=16384, # MiB
ephemeral_disk=51200, # MiB (up to 3 TiB)
timeout=3600, # Seconds
)
def heavy_computation():
...
```
Defaults: 0.125 CPU cores, 128 MiB memory. Billed on max(request, usage).
**Reference**: See `references/resources.md` for limits and billing details.
## Classes with Lifecycle Hooks
For stateful workloads (e.g., loading a model once and serving many requests):
```python
@app.cls(gpu="L40S", image=image)
class Predictor:
@modal.enter()
def load_model(self):
self.model = load_heavy_model() # Runs once on container start
@modal.method()
def predict(self, text: str):
return self.model(text)
@modal.exit()
def cleanup(self):
... # Runs on container shutdown
```
Call with: `Predictor().predict.remote("hello")`
## Common Workflow Patterns
### GPU Model Inference Service
```python ```python
import modal import modal
# Define dependencies app = modal.App("llm-service")
image = modal.Image.debian_slim().uv_pip_install("torch", "transformers")
app = modal.App("llm-inference", image=image)
# Download model at build time image = (
@app.function() modal.Image.debian_slim(python_version="3.11")
def download_model(): .uv_pip_install("vllm")
from transformers import AutoModel )
AutoModel.from_pretrained("bert-base-uncased")
# Serve model @app.cls(gpu="H100", image=image, min_containers=1)
@app.cls(gpu="L40S") class LLMService:
class Model:
@modal.enter() @modal.enter()
def load_model(self): def load(self):
from transformers import pipeline from vllm import LLM
self.pipe = pipeline("text-classification", device="cuda") self.llm = LLM(model="meta-llama/Llama-3-70B")
@modal.method() @modal.method()
def predict(self, text: str): @modal.fastapi_endpoint(method="POST")
return self.pipe(text) def generate(self, prompt: str, max_tokens: int = 256):
outputs = self.llm.generate([prompt], max_tokens=max_tokens)
@app.local_entrypoint() return {"text": outputs[0].outputs[0].text}
def main():
model = Model()
result = model.predict.remote("Modal is great!")
print(result)
``` ```
### Batch Process Large Dataset ### Batch Processing Pipeline
```python ```python
@app.function(cpu=2.0, memory=4096) app = modal.App("batch-pipeline")
def process_file(file_path: str): vol = modal.Volume.from_name("pipeline-data", create_if_missing=True)
@app.function(volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int):
import pandas as pd import pandas as pd
df = pd.read_csv(file_path) df = pd.read_parquet(f"/data/input/chunk_{chunk_id}.parquet")
# Process data result = heavy_transform(df)
return df.shape[0] result.to_parquet(f"/data/output/chunk_{chunk_id}.parquet")
return len(result)
@app.local_entrypoint() @app.local_entrypoint()
def main(): def main():
files = ["file1.csv", "file2.csv", ...] # 1000s of files chunk_ids = list(range(100))
# Automatically parallelized across containers results = list(process_chunk.map(chunk_ids))
for count in process_file.map(files): print(f"Processed {sum(results)} total rows")
print(f"Processed {count} rows")
``` ```
### Train Model on GPU ### Scheduled Data Pipeline
```python ```python
app = modal.App("etl-pipeline")
@app.function( @app.function(
gpu="A100:2", # 2x A100 GPUs schedule=modal.Cron("0 */6 * * *"), # Every 6 hours
timeout=3600 # 1 hour timeout secrets=[modal.Secret.from_name("db-credentials")],
) )
def train_model(config: dict): def etl_job():
import torch import os
# Multi-GPU training code db_url = os.environ["DATABASE_URL"]
model = create_model(config) # Extract, transform, load
train(model) ...
return metrics
``` ```
## Reference Documentation ## CLI Reference
Detailed documentation for specific features: | Command | Description |
|---------|-------------|
| `modal setup` | Authenticate with Modal |
| `modal run script.py` | Run a script's local entrypoint |
| `modal serve script.py` | Dev server with hot reload |
| `modal deploy script.py` | Deploy to production |
| `modal volume ls <name>` | List files in a volume |
| `modal volume put <name> <file>` | Upload file to volume |
| `modal volume get <name> <file>` | Download file from volume |
| `modal secret create <name> K=V` | Create a secret |
| `modal secret list` | List secrets |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop a deployed app |
- **`references/getting-started.md`** - Authentication, setup, basic concepts ## Reference Files
- **`references/images.md`** - Image building, dependencies, Dockerfiles
- **`references/functions.md`** - Function patterns, deployment, parameters
- **`references/gpu.md`** - GPU types, CUDA, multi-GPU configuration
- **`references/resources.md`** - CPU, memory, disk management
- **`references/scaling.md`** - Autoscaling, parallel execution, concurrency
- **`references/volumes.md`** - Persistent storage, data management
- **`references/secrets.md`** - Environment variables, authentication
- **`references/web-endpoints.md`** - APIs, webhooks, endpoints
- **`references/scheduled-jobs.md`** - Cron jobs, periodic tasks
- **`references/examples.md`** - Common patterns for scientific computing
## Best Practices Detailed documentation for each topic:
1. **Pin dependencies** in `.uv_pip_install()` for reproducible builds - `references/getting-started.md` — Installation, authentication, first app
2. **Use appropriate GPU types** - L40S for inference, H100/A100 for training - `references/functions.md` — Functions, classes, lifecycle hooks, remote execution
3. **Leverage caching** - Use Volumes for model weights and datasets - `references/images.md` — Container images, package installation, caching
4. **Configure autoscaling** - Set `max_containers` and `min_containers` based on workload - `references/gpu.md` — GPU types, selection, multi-GPU, training
5. **Import packages in function body** if not available locally - `references/volumes.md` — Persistent storage, file management, v2 volumes
6. **Use `.map()` for parallel processing** instead of sequential loops - `references/secrets.md` — Credentials, environment variables, dotenv
7. **Store secrets securely** - Never hardcode API keys - `references/web-endpoints.md` — FastAPI, ASGI/WSGI, streaming, auth, WebSockets
8. **Monitor costs** - Check Modal dashboard for usage and billing - `references/scheduled-jobs.md` Cron, periodic schedules, management
- `references/scaling.md` — Autoscaling, concurrency, .map(), limits
## Troubleshooting - `references/resources.md` — CPU, memory, disk, timeout configuration
- `references/examples.md` — Common use cases and patterns
**"Module not found" errors:** - `references/api_reference.md` — Key API classes and methods
- Add packages to image with `.uv_pip_install("package-name")`
- Import packages inside function body if not available locally
**GPU not detected:**
- Verify GPU specification: `@app.function(gpu="A100")`
- Check CUDA availability: `torch.cuda.is_available()`
**Function timeout:**
- Increase timeout: `@app.function(timeout=3600)`
- Default timeout is 5 minutes
**Volume changes not persisting:**
- Call `volume.commit()` after writing files
- Verify volume mounted correctly in function decorator
For additional help, see Modal documentation at https://modal.com/docs or join Modal Slack community.
Read these files when detailed information is needed beyond this overview.

View File

@@ -1,34 +1,187 @@
- # Reference Documentation for Modal
- This is a placeholder for detailed reference documentation.
- Replace with actual reference content or delete if not needed.
- Example real reference docs from other skills:
- - product-management/references/communication.md - Comprehensive guide for status updates
- - product-management/references/context_building.md - Deep-dive on gathering context
- - bigquery/references/ - API references and query examples
- ## When Reference Docs Are Useful
- Reference docs are ideal for:
- - Comprehensive API documentation
- - Detailed workflow guides
- - Complex multi-step processes
- - Information too lengthy for main SKILL.md
- - Content that's only needed for specific use cases
- ## Structure Suggestions
- ### API Reference Example
- - Overview
- - Authentication
- - Endpoints with examples
- - Error codes
- - Rate limits
- ### Workflow Guide Example
- - Prerequisites
- - Step-by-step instructions
- - Common patterns
- - Troubleshooting
- - Best practices
# Modal API Reference
## Core Classes
### modal.App
The main unit of deployment. Groups related functions.
```python
app = modal.App("my-app")
```
| Method | Description |
|--------|-------------|
| `app.function(**kwargs)` | Decorator to register a function |
| `app.cls(**kwargs)` | Decorator to register a class |
| `app.local_entrypoint()` | Decorator for local entry point |
### modal.Function
A serverless function backed by an autoscaling container pool.
| Method | Description |
|--------|-------------|
| `.remote(*args)` | Execute in the cloud (sync) |
| `.local(*args)` | Execute locally |
| `.spawn(*args)` | Execute async, returns `FunctionCall` |
| `.map(inputs)` | Parallel execution over inputs |
| `.starmap(inputs)` | Parallel execution with multiple args |
| `.from_name(app, fn)` | Reference a deployed function |
| `.update_autoscaler(**kwargs)` | Dynamic scaling update |
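A hedged sketch of calling a previously deployed function by name, per `.from_name()` in the table above; the app and function names are placeholders:

```python
import modal

# Look up a function on an app that was previously deployed with `modal deploy`.
compute = modal.Function.from_name("my-app", "compute")  # placeholder names

result = compute.remote(21)   # synchronous remote call
handle = compute.spawn(42)    # asynchronous; returns a FunctionCall
print(result, handle.get())
```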
### modal.Cls
A serverless class with lifecycle hooks.
```python
@app.cls(gpu="L40S")
class MyClass:
@modal.enter()
def setup(self): ...
@modal.method()
def run(self, data): ...
@modal.exit()
def cleanup(self): ...
```
| Decorator | Description |
|-----------|-------------|
| `@modal.enter()` | Container startup hook |
| `@modal.exit()` | Container shutdown hook |
| `@modal.method()` | Expose as callable method |
| `@modal.parameter()` | Class-level parameter |
## Image
### modal.Image
Defines the container environment.
| Method | Description |
|--------|-------------|
| `.debian_slim(python_version=)` | Debian base image |
| `.from_registry(tag)` | Docker Hub image |
| `.from_dockerfile(path)` | Build from Dockerfile |
| `.micromamba(python_version=)` | Conda/mamba base |
| `.uv_pip_install(*pkgs)` | Install with uv (recommended) |
| `.pip_install(*pkgs)` | Install with pip |
| `.pip_install_from_requirements(path)` | Install from file |
| `.apt_install(*pkgs)` | Install system packages |
| `.run_commands(*cmds)` | Run shell commands |
| `.run_function(fn)` | Run Python during build |
| `.add_local_dir(local, remote)` | Add directory |
| `.add_local_file(local, remote)` | Add single file |
| `.add_local_python_source(module)` | Add Python module |
| `.env(dict)` | Set environment variables |
| `.imports()` | Context manager for remote imports |
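As an illustration, a sketch chaining several of the methods above, including `.run_function()` and `.imports()`; package and model names are illustrative:

```python
import modal

def download_weights():
    # Runs once at image-build time so weights are baked into the image.
    from huggingface_hub import snapshot_download
    snapshot_download("bert-base-uncased")  # illustrative model id

image = (
    modal.Image.debian_slim(python_version="3.12")
    .uv_pip_install("torch", "transformers", "huggingface_hub")
    .apt_install("git")
    .env({"HF_HOME": "/cache"})
    .run_function(download_weights)
)

with image.imports():
    # These imports are only evaluated inside the remote container.
    import torch
```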
## Storage
### modal.Volume
Distributed persistent file storage.
```python
vol = modal.Volume.from_name("name", create_if_missing=True)
```
| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference or create a volume |
| `.commit()` | Force immediate commit |
| `.reload()` | Refresh to see other containers' writes |
Mount: `@app.function(volumes={"/path": vol})`
### modal.NetworkFileSystem
Legacy shared storage (superseded by Volume).
## Secrets
### modal.Secret
Secure credential injection.
| Method | Description |
|--------|-------------|
| `.from_name(name)` | Reference a named secret |
| `.from_dict(dict)` | Create inline (dev only) |
| `.from_dotenv()` | Load from .env file |
Usage: `@app.function(secrets=[modal.Secret.from_name("x")])`
Access in function: `os.environ["KEY"]`
## Scheduling
### modal.Cron
```python
schedule = modal.Cron("0 9 * * *") # Cron syntax
```
### modal.Period
```python
schedule = modal.Period(hours=6) # Fixed interval
```
Usage: `@app.function(schedule=modal.Cron("..."))`
## Web
### Decorators
| Decorator | Description |
|-----------|-------------|
| `@modal.fastapi_endpoint()` | Simple FastAPI endpoint |
| `@modal.asgi_app()` | Full ASGI app (FastAPI, Starlette) |
| `@modal.wsgi_app()` | Full WSGI app (Flask, Django) |
| `@modal.web_server(port=)` | Custom web server |
### Function Modifiers
| Decorator | Description |
|-----------|-------------|
| `@modal.concurrent(max_inputs=)` | Handle multiple inputs per container |
| `@modal.batched(max_batch_size=, wait_ms=)` | Dynamic input batching |
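A minimal sketch of dynamic batching with `@modal.batched()`; the app and function are illustrative. Callers invoke the function with single values, while the decorated function receives lists and must return results in the same order:

```python
import modal

app = modal.App("batching-demo")  # illustrative app name

@app.function()
@modal.batched(max_batch_size=32, wait_ms=100)
def square_many(xs: list[int]) -> list[int]:
    # Modal groups pending single-item calls into one list per invocation.
    return [x * x for x in xs]
```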
## GPU Strings
| String | GPU |
|--------|-----|
| `"T4"` | NVIDIA T4 16GB |
| `"L4"` | NVIDIA L4 24GB |
| `"A10"` | NVIDIA A10 24GB |
| `"L40S"` | NVIDIA L40S 48GB |
| `"A100-40GB"` | NVIDIA A100 40GB |
| `"A100-80GB"` | NVIDIA A100 80GB |
| `"H100"` | NVIDIA H100 80GB |
| `"H100!"` | H100 (no auto-upgrade) |
| `"H200"` | NVIDIA H200 141GB |
| `"B200"` | NVIDIA B200 192GB |
| `"B200+"` | B200 or B300, B200 price |
| `"H100:4"` | 4x H100 |
## CLI Commands
| Command | Description |
|---------|-------------|
| `modal setup` | Authenticate |
| `modal run <file>` | Run local entrypoint |
| `modal serve <file>` | Dev server with hot reload |
| `modal deploy <file>` | Production deployment |
| `modal app list` | List deployed apps |
| `modal app stop <name>` | Stop an app |
| `modal volume create <name>` | Create volume |
| `modal volume ls <name>` | List volume files |
| `modal volume put <name> <file>` | Upload to volume |
| `modal volume get <name> <file>` | Download from volume |
| `modal secret create <name> K=V` | Create secret |
| `modal secret list` | List secrets |
| `modal secret delete <name>` | Delete secret |
| `modal token set` | Set auth token |

View File

@@ -1,433 +1,266 @@
# Common Patterns for Scientific Computing # Modal Common Examples
## Machine Learning Model Inference ## LLM Inference Service (vLLM)
### Basic Model Serving
```python ```python
import modal import modal
app = modal.App("ml-inference") app = modal.App("vllm-service")
image = ( image = (
modal.Image.debian_slim() modal.Image.debian_slim(python_version="3.11")
.uv_pip_install("torch", "transformers") .uv_pip_install("vllm>=0.6.0")
) )
@app.cls( @app.cls(gpu="H100", image=image, min_containers=1)
image=image, class LLMService:
gpu="L40S",
)
class Model:
@modal.enter() @modal.enter()
def load_model(self): def load(self):
from transformers import AutoModel, AutoTokenizer from vllm import LLM
self.model = AutoModel.from_pretrained("bert-base-uncased") self.llm = LLM(model="meta-llama/Llama-3-70B-Instruct")
self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
@modal.method() @modal.method()
def predict(self, text: str): def generate(self, prompt: str, max_tokens: int = 512) -> str:
inputs = self.tokenizer(text, return_tensors="pt") from vllm import SamplingParams
outputs = self.model(**inputs) params = SamplingParams(max_tokens=max_tokens, temperature=0.7)
return outputs.last_hidden_state.mean(dim=1).tolist() outputs = self.llm.generate([prompt], params)
return outputs[0].outputs[0].text
@app.local_entrypoint() @modal.fastapi_endpoint(method="POST")
def main(): def api(self, request: dict):
model = Model() text = self.generate(request["prompt"], request.get("max_tokens", 512))
result = model.predict.remote("Hello world") return {"text": text}
print(result)
``` ```
### Model Serving with Volume ## Image Generation (Flux)
```python ```python
volume = modal.Volume.from_name("models", create_if_missing=True) import modal
MODEL_PATH = "/models"
@app.cls( app = modal.App("image-gen")
image=image,
gpu="A100", image = (
volumes={MODEL_PATH: volume} modal.Image.debian_slim(python_version="3.11")
.uv_pip_install("diffusers", "torch", "transformers", "accelerate")
) )
class ModelServer:
vol = modal.Volume.from_name("flux-weights", create_if_missing=True)
@app.cls(gpu="L40S", image=image, volumes={"/models": vol})
class ImageGenerator:
@modal.enter() @modal.enter()
def load(self): def load(self):
import torch import torch
self.model = torch.load(f"{MODEL_PATH}/model.pt") from diffusers import FluxPipeline
self.model.eval() self.pipe = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-schnell",
torch_dtype=torch.bfloat16,
cache_dir="/models",
).to("cuda")
@modal.method() @modal.method()
def infer(self, data): def generate(self, prompt: str) -> bytes:
import torch image = self.pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
with torch.no_grad(): import io
return self.model(torch.tensor(data)).tolist() buf = io.BytesIO()
image.save(buf, format="PNG")
return buf.getvalue()
``` ```
## Batch Processing ## Speech Transcription (Whisper)
### Parallel Data Processing
```python ```python
@app.function( import modal
image=modal.Image.debian_slim().uv_pip_install("pandas", "numpy"),
cpu=2.0, app = modal.App("transcription")
memory=8192
image = (
modal.Image.debian_slim(python_version="3.11")
.apt_install("ffmpeg")
.uv_pip_install("openai-whisper", "torch")
) )
def process_batch(batch_id: int):
import pandas as pd
# Load batch @app.cls(gpu="T4", image=image)
df = pd.read_csv(f"s3://bucket/batch_{batch_id}.csv") class Transcriber:
# Process
result = df.apply(lambda row: complex_calculation(row), axis=1)
# Save result
result.to_csv(f"s3://bucket/results_{batch_id}.csv")
return batch_id
@app.local_entrypoint()
def main():
# Process 100 batches in parallel
results = list(process_batch.map(range(100)))
print(f"Processed {len(results)} batches")
```
### Batch Processing with Progress
```python
@app.function()
def process_item(item_id: int):
# Expensive processing
result = compute_something(item_id)
return result
@app.local_entrypoint()
def main():
items = list(range(1000))
print(f"Processing {len(items)} items...")
results = []
for i, result in enumerate(process_item.map(items)):
results.append(result)
if (i + 1) % 100 == 0:
print(f"Completed {i + 1}/{len(items)}")
print("All items processed!")
```
## Data Analysis Pipeline
### ETL Pipeline
```python
volume = modal.Volume.from_name("data-pipeline")
DATA_PATH = "/data"
@app.function(
image=modal.Image.debian_slim().uv_pip_install("pandas", "polars"),
volumes={DATA_PATH: volume},
cpu=4.0,
memory=16384
)
def extract_transform_load():
import polars as pl
# Extract
raw_data = pl.read_csv(f"{DATA_PATH}/raw/*.csv")
# Transform
transformed = (
raw_data
.filter(pl.col("value") > 0)
.group_by("category")
.agg([
pl.col("value").mean().alias("avg_value"),
pl.col("value").sum().alias("total_value")
])
)
# Load
transformed.write_parquet(f"{DATA_PATH}/processed/data.parquet")
volume.commit()
return transformed.shape
@app.function(schedule=modal.Cron("0 2 * * *"))
def daily_pipeline():
result = extract_transform_load.remote()
print(f"Processed data shape: {result}")
```
## GPU-Accelerated Computing
### Distributed Training
```python
@app.function(
gpu="A100:2",
image=modal.Image.debian_slim().uv_pip_install("torch", "accelerate"),
timeout=7200,
)
def train_model():
import torch
from torch.nn.parallel import DataParallel
# Load data
train_loader = get_data_loader()
# Initialize model
model = MyModel()
model = DataParallel(model)
model = model.cuda()
# Train
optimizer = torch.optim.Adam(model.parameters())
for epoch in range(10):
for batch in train_loader:
loss = train_step(model, batch, optimizer)
print(f"Epoch {epoch}, Loss: {loss}")
return "Training complete"
```
### GPU Batch Inference
```python
@app.function(
gpu="L40S",
image=modal.Image.debian_slim().uv_pip_install("torch", "transformers")
)
def batch_inference(texts: list[str]):
from transformers import pipeline
classifier = pipeline("sentiment-analysis", device=0)
results = classifier(texts, batch_size=32)
return results
@app.local_entrypoint()
def main():
# Process 10,000 texts
texts = load_texts()
# Split into chunks of 100
chunks = [texts[i:i+100] for i in range(0, len(texts), 100)]
# Process in parallel on multiple GPUs
all_results = []
for results in batch_inference.map(chunks):
all_results.extend(results)
print(f"Processed {len(all_results)} texts")
```
## Scientific Computing
### Molecular Dynamics Simulation
```python
@app.function(
image=modal.Image.debian_slim().apt_install("openmpi-bin").uv_pip_install("mpi4py", "numpy"),
cpu=16.0,
memory=65536,
timeout=7200,
)
def run_simulation(config: dict):
import numpy as np
# Initialize system
positions = initialize_positions(config["n_particles"])
velocities = initialize_velocities(config["temperature"])
# Run MD steps
for step in range(config["n_steps"]):
forces = compute_forces(positions)
velocities += forces * config["dt"]
positions += velocities * config["dt"]
if step % 1000 == 0:
energy = compute_energy(positions, velocities)
print(f"Step {step}, Energy: {energy}")
return positions, velocities
```
### Distributed Monte Carlo
```python
@app.function(cpu=2.0)
def monte_carlo_trial(trial_id: int, n_samples: int):
import random
count = sum(1 for _ in range(n_samples)
if random.random()**2 + random.random()**2 <= 1)
return count
@app.local_entrypoint()
def estimate_pi():
n_trials = 100
n_samples_per_trial = 1_000_000
# Run trials in parallel
results = list(monte_carlo_trial.map(
range(n_trials),
[n_samples_per_trial] * n_trials
))
total_count = sum(results)
total_samples = n_trials * n_samples_per_trial
pi_estimate = 4 * total_count / total_samples
print(f"Estimated π = {pi_estimate}")
```
## Data Processing with Volumes
### Image Processing Pipeline
```python
volume = modal.Volume.from_name("images")
IMAGE_PATH = "/images"
@app.function(
image=modal.Image.debian_slim().uv_pip_install("Pillow", "numpy"),
volumes={IMAGE_PATH: volume}
)
def process_image(filename: str):
from PIL import Image
import numpy as np
# Load image
img = Image.open(f"{IMAGE_PATH}/raw/{filename}")
# Process
img_array = np.array(img)
processed = apply_filters(img_array)
# Save
result_img = Image.fromarray(processed)
result_img.save(f"{IMAGE_PATH}/processed/{filename}")
return filename
@app.function(volumes={IMAGE_PATH: volume})
def process_all_images():
import os
# Get all images
filenames = os.listdir(f"{IMAGE_PATH}/raw")
# Process in parallel
results = list(process_image.map(filenames))
volume.commit()
return f"Processed {len(results)} images"
```
## Web API for Scientific Computing
```python
image = modal.Image.debian_slim().uv_pip_install("fastapi[standard]", "numpy", "scipy")
@app.function(image=image)
@modal.fastapi_endpoint(method="POST")
def compute_statistics(data: dict):
import numpy as np
from scipy import stats
values = np.array(data["values"])
return {
"mean": float(np.mean(values)),
"median": float(np.median(values)),
"std": float(np.std(values)),
"skewness": float(stats.skew(values)),
"kurtosis": float(stats.kurtosis(values))
}
```
## Scheduled Data Collection
```python
@app.function(
schedule=modal.Cron("*/30 * * * *"), # Every 30 minutes
secrets=[modal.Secret.from_name("api-keys")],
volumes={"/data": modal.Volume.from_name("sensor-data")}
)
def collect_sensor_data():
import requests
import json
from datetime import datetime
# Fetch from API
response = requests.get(
"https://api.example.com/sensors",
headers={"Authorization": f"Bearer {os.environ['API_KEY']}"}
)
data = response.json()
# Save with timestamp
timestamp = datetime.now().isoformat()
with open(f"/data/{timestamp}.json", "w") as f:
json.dump(data, f)
volume.commit()
return f"Collected {len(data)} sensor readings"
```
## Best Practices
### Use Classes for Stateful Workloads
```python
@app.cls(gpu="A100")
class ModelService:
@modal.enter() @modal.enter()
def setup(self): def load(self):
# Load once, reuse across requests import whisper
self.model = load_heavy_model() self.model = whisper.load_model("large-v3")
@modal.method() @modal.method()
def predict(self, x): def transcribe(self, audio_path: str) -> dict:
return self.model(x) return self.model.transcribe(audio_path)
``` ```
### Batch Similar Workloads ## Batch Data Processing
```python ```python
@app.function() import modal
def process_many(items: list):
# More efficient than processing one at a time app = modal.App("batch-processor")
return [process(item) for item in items]
image = modal.Image.debian_slim().uv_pip_install("pandas", "pyarrow")
vol = modal.Volume.from_name("batch-data", create_if_missing=True)
@app.function(image=image, volumes={"/data": vol}, cpu=4.0, memory=8192)
def process_chunk(chunk_id: int) -> dict:
import pandas as pd
df = pd.read_parquet(f"/data/input/chunk_{chunk_id:04d}.parquet")
result = df.groupby("category").agg({"value": ["sum", "mean", "count"]})
result.to_parquet(f"/data/output/result_{chunk_id:04d}.parquet")
return {"chunk_id": chunk_id, "rows": len(df)}
@app.local_entrypoint()
def main():
chunk_ids = list(range(500))
results = list(process_chunk.map(chunk_ids))
total = sum(r["rows"] for r in results)
print(f"Processed {total} total rows across {len(results)} chunks")
``` ```
### Use Volumes for Large Datasets ## Web Scraping at Scale
```python ```python
# Store large datasets in volumes, not in image import modal
volume = modal.Volume.from_name("dataset")
@app.function(volumes={"/data": volume}) app = modal.App("scraper")
def train():
data = load_from_volume("/data/training.parquet") image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")
model = train_model(data)
@app.function(image=image, retries=3, timeout=60)
def scrape_url(url: str) -> dict:
import httpx
from bs4 import BeautifulSoup
response = httpx.get(url, follow_redirects=True, timeout=30)
soup = BeautifulSoup(response.text, "html.parser")
return {
"url": url,
"title": soup.title.string if soup.title else None,
"text": soup.get_text()[:5000],
}
@app.local_entrypoint()
def main():
urls = ["https://example.com", "https://example.org"] # Your URL list
results = list(scrape_url.map(urls))
for r in results:
print(f"{r['url']}: {r['title']}")
``` ```
### Profile Before Scaling to GPUs ## Protein Structure Prediction
```python ```python
# Test on CPU first import modal
@app.function(cpu=4.0)
def test_pipeline():
...
# Then scale to GPU if needed app = modal.App("protein-folding")
@app.function(gpu="A100")
def gpu_pipeline(): image = (
... modal.Image.debian_slim(python_version="3.11")
.uv_pip_install("chai-lab")
)
vol = modal.Volume.from_name("protein-data", create_if_missing=True)
@app.function(gpu="A100-80GB", image=image, volumes={"/data": vol}, timeout=3600)
def fold_protein(sequence: str) -> str:
from chai_lab.chai1 import run_inference
output = run_inference(
fasta_file=write_fasta(sequence, "/data/input.fasta"),
output_dir="/data/output/",
)
return str(output)
```
## Scheduled ETL Pipeline

```python
import modal

app = modal.App("etl")

image = modal.Image.debian_slim().uv_pip_install("pandas", "sqlalchemy", "psycopg2-binary")

@app.function(
    image=image,
    schedule=modal.Cron("0 3 * * *"),  # 3 AM UTC daily
    secrets=[modal.Secret.from_name("database-creds")],
    timeout=7200,
)
def daily_etl():
    import os
    import pandas as pd
    from sqlalchemy import create_engine

    source = create_engine(os.environ["SOURCE_DB"])
    dest = create_engine(os.environ["DEST_DB"])
    df = pd.read_sql("SELECT * FROM events WHERE date = CURRENT_DATE - 1", source)
    df = transform(df)
    df.to_sql("daily_summary", dest, if_exists="append", index=False)
    print(f"Loaded {len(df)} rows")
```

## FastAPI with GPU Model

```python
import modal

app = modal.App("api-with-gpu")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("fastapi", "sentence-transformers", "torch")
)

@app.cls(gpu="L40S", image=image, min_containers=1)
class EmbeddingService:
    @modal.enter()
    def load(self):
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI

        api = FastAPI()

        @api.post("/embed")
        async def embed(request: dict):
            embeddings = self.model.encode(request["texts"])
            return {"embeddings": embeddings.tolist()}

        @api.get("/health")
        async def health():
            return {"status": "ok"}

        return api
```

## Document OCR Job Queue

```python
import modal

app = modal.App("ocr-queue")

image = modal.Image.debian_slim().uv_pip_install("pytesseract", "Pillow").apt_install("tesseract-ocr")

vol = modal.Volume.from_name("ocr-data", create_if_missing=True)

@app.function(image=image, volumes={"/data": vol})
def ocr_page(image_path: str) -> str:
    import pytesseract
    from PIL import Image

    img = Image.open(image_path)
    return pytesseract.image_to_string(img)

@app.function(volumes={"/data": vol})
def process_document(doc_id: str):
    import os

    pages = sorted(os.listdir(f"/data/docs/{doc_id}/"))
    paths = [f"/data/docs/{doc_id}/{p}" for p in pages]
    texts = list(ocr_page.map(paths))
    full_text = "\n\n".join(texts)
    with open(f"/data/results/{doc_id}.txt", "w") as f:
        f.write(full_text)
    return {"doc_id": doc_id, "pages": len(texts)}
```

View File

@@ -1,274 +1,260 @@
# Modal Functions and Classes

## Table of Contents

- [Functions](#functions)
- [Remote Execution](#remote-execution)
- [Classes with Lifecycle Hooks](#classes-with-lifecycle-hooks)
- [Parallel Execution](#parallel-execution)
- [Async Functions](#async-functions)
- [Local Entrypoints](#local-entrypoints)
- [Generators](#generators)

## Functions

### Basic Function

```python
import modal

app = modal.App("my-app")

@app.function()
def compute(x: int, y: int) -> int:
    return x + y
```

### Function Parameters

The `@app.function()` decorator accepts:

| Parameter | Type | Description |
|-----------|------|-------------|
| `image` | `Image` | Container image |
| `gpu` | `str` | GPU type (e.g., `"H100"`, `"A100:2"`) |
| `cpu` | `float` | CPU cores |
| `memory` | `int` | Memory in MiB |
| `timeout` | `int` | Max execution time in seconds |
| `secrets` | `list[Secret]` | Secrets to inject |
| `volumes` | `dict[str, Volume]` | Volumes to mount |
| `schedule` | `Schedule` | Cron or periodic schedule |
| `max_containers` | `int` | Max container count |
| `min_containers` | `int` | Minimum warm containers |
| `retries` | `int` | Retry count on failure |
| `concurrency_limit` | `int` | Max concurrent inputs |
| `ephemeral_disk` | `int` | Disk in MiB |

## Remote Execution

### `.remote()` — Synchronous Call

```python
result = compute.remote(3, 4)  # Runs in the cloud, blocks until done
```

### `.local()` — Local Execution

```python
result = compute.local(3, 4)  # Runs locally (for testing)
```

### `.spawn()` — Async Fire-and-Forget

```python
call = compute.spawn(3, 4)  # Returns immediately
# ... do other work ...
result = call.get()  # Retrieve result later
```

`.spawn()` supports up to 1 million pending inputs.

## Classes with Lifecycle Hooks

Use `@app.cls()` for stateful workloads where you want to load resources once:

```python
@app.cls(gpu="L40S", image=image)
class Model:
    @modal.enter()
    def setup(self):
        """Runs once when the container starts."""
        import torch
        self.model = torch.load("/weights/model.pt")
        self.model.eval()

    @modal.method()
    def predict(self, text: str) -> dict:
        """Callable remotely."""
        return self.model(text)

    @modal.exit()
    def teardown(self):
        """Runs when the container shuts down."""
        cleanup_resources()
```

### Lifecycle Decorators

| Decorator | When It Runs |
|-----------|-------------|
| `@modal.enter()` | Once on container startup, before any inputs |
| `@modal.method()` | For each remote call |
| `@modal.exit()` | On container shutdown |

### Calling Class Methods

```python
# Create instance and call method
model = Model()
result = model.predict.remote("Hello world")

# Parallel calls
results = list(model.predict.map(["text1", "text2", "text3"]))
```

### Parameterized Classes

```python
@app.cls()
class Worker:
    model_name: str = modal.parameter()

    @modal.enter()
    def load(self):
        self.model = load_model(self.model_name)

    @modal.method()
    def run(self, data):
        return self.model(data)

# Different model instances autoscale independently
gpt = Worker(model_name="gpt-4")
llama = Worker(model_name="llama-3")
```

## Parallel Execution

### `.map()` — Parallel Processing

Process multiple inputs across containers:

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(1000))
    results = list(process.map(items))
    print(f"Processed {len(results)} items")
```

- Results are returned in the same order as inputs
- Modal autoscales containers to handle the workload
- Use `return_exceptions=True` to collect errors instead of raising (see the sketch below)
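A minimal sketch of collecting failures with `return_exceptions=True` (the `flaky` function and its failure condition are made up for illustration):

```python
@app.function()
def flaky(x):
    if x == 2:
        raise ValueError("bad input")
    return x ** 2

@app.local_entrypoint()
def main():
    results = list(flaky.map(range(4), return_exceptions=True))
    # Exceptions come back in place of results instead of being raised,
    # e.g. [0, 1, ValueError('bad input'), 9]
    for r in results:
        if isinstance(r, Exception):
            print(f"Failed: {r}")
```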
### `.starmap()` — Multi-Argument Parallel

```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```

### `.map()` with `order_outputs=False`

For faster throughput when order doesn't matter:

```python
for result in process.map(items, order_outputs=False):
    handle(result)  # Results arrive as they complete
```

## Async Functions

Modal supports async/await natively:

```python
@app.function()
async def fetch_data(url: str) -> str:
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
    return response.text
```

Async functions are especially useful with `@modal.concurrent()` for handling multiple requests per container.
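For example, a rough sketch of an I/O-bound async function that overlaps many requests in one container (the `max_inputs` value is arbitrary):

```python
@app.function()
@modal.concurrent(max_inputs=50)
async def fetch_status(url: str) -> int:
    import httpx
    # While one request awaits the network, the container picks up other inputs
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
    return response.status_code
```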
## Local Entrypoints

The `@app.local_entrypoint()` runs on your machine and orchestrates remote calls:

```python
@app.local_entrypoint()
def main():
    # This code runs locally
    data = load_local_data()

    # These calls run in the cloud
    results = list(process.map(data))

    # Back to local
    save_results(results)
```

You can also define multiple entrypoints and select by function name:

```bash
modal run script.py::train
modal run script.py::evaluate
```

## Generators

Functions can yield results as they're produced:

```python
@app.function()
def generate_data():
    for i in range(100):
        yield process(i)

@app.local_entrypoint()
def main():
    for result in generate_data.remote_gen():
        print(result)
```

## Retries

Configure automatic retries on failure:

```python
@app.function(retries=3)
def flaky_operation():
    ...
```

For more control, use `modal.Retries`:

```python
@app.function(retries=modal.Retries(max_retries=3, backoff_coefficient=2.0))
def api_call():
    ...
```

## Timeouts

Set maximum execution time:

```python
@app.function(timeout=3600)  # 1 hour
def long_training():
    ...
```

Default timeout is 300 seconds (5 minutes). Maximum is 86400 seconds (24 hours).

View File

@@ -1,92 +1,175 @@
# Modal Getting Started Guide

## Installation

Install Modal using uv (recommended) or pip:

```bash
# Recommended
uv pip install modal

# Alternative
pip install modal
```

## Authentication

### Interactive Setup

```bash
modal setup
```

This opens a browser for authentication and stores credentials locally.

### Headless / CI/CD Setup

For environments without a browser, use token-based authentication:

1. Generate tokens at https://modal.com/settings
2. Set environment variables:

```bash
export MODAL_TOKEN_ID=<your-token-id>
export MODAL_TOKEN_SECRET=<your-token-secret>
```

Or use the CLI:

```bash
modal token set --token-id <id> --token-secret <secret>
```

### Free Tier

Modal provides $30/month in free credits. No credit card required for the free tier.

## Your First App

### Hello World

Create a file `hello.py`:

```python
import modal

app = modal.App("hello-world")

@app.function()
def greet(name: str) -> str:
    return f"Hello, {name}! This ran in the cloud."

@app.local_entrypoint()
def main():
    result = greet.remote("World")
    print(result)
```

Run it:

```bash
modal run hello.py
```

What happens:

1. Modal packages your code
2. Creates a container in the cloud
3. Executes `greet()` remotely
4. Returns the result to your local machine

### Understanding the Flow

- `modal.App("name")` — Creates a named application
- `@app.function()` — Marks a function for remote execution
- `@app.local_entrypoint()` — Defines the local entry point (runs on your machine)
- `.remote()` — Calls the function in the cloud
- `.local()` — Calls the function locally (for testing)

### Running Modes

| Command | Description |
|---------|-------------|
| `modal run script.py` | Run the `@app.local_entrypoint()` function |
| `modal serve script.py` | Start a dev server with hot reload (for web endpoints) |
| `modal deploy script.py` | Deploy to production (persistent) |

### A Simple Web Scraper

```python
import modal

app = modal.App("web-scraper")

image = modal.Image.debian_slim().uv_pip_install("httpx", "beautifulsoup4")

@app.function(image=image)
def scrape(url: str) -> str:
    import httpx
    from bs4 import BeautifulSoup

    response = httpx.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()[:1000]

@app.local_entrypoint()
def main():
    result = scrape.remote("https://example.com")
    print(result)
```

### GPU-Accelerated Inference

```python
import modal

app = modal.App("gpu-inference")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch", "transformers", "accelerate")
)

@app.function(gpu="L40S", image=image)
def generate(prompt: str) -> str:
    from transformers import pipeline

    pipe = pipeline("text-generation", model="gpt2", device="cuda")
    result = pipe(prompt, max_length=100)
    return result[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("The future of AI is"))
```

## Project Structure

Modal apps are typically single Python files, but can be organized into modules:

```
my-project/
├── app.py          # Main app with @app.local_entrypoint()
├── inference.py    # Inference functions
├── training.py     # Training functions
└── common.py       # Shared utilities
```

Use `modal.Image.add_local_python_source()` to include local modules in the container image.
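A sketch of how that might look for the layout above; the module names mirror the tree and `predict` is a hypothetical helper inside `inference.py`:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("torch")
    .add_local_python_source("inference", "training", "common")
)

@app.function(image=image)
def run():
    import inference  # Local module, importable inside the container
    return inference.predict("example input")
```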
## Key Concepts Summary

| Concept | What It Does |
|---------|-------------|
| `App` | Groups related functions into a deployable unit |
| `Function` | A serverless function backed by autoscaling containers |
| `Image` | Defines the container environment (packages, files) |
| `Volume` | Persistent distributed file storage |
| `Secret` | Secure credential injection |
| `Schedule` | Cron or periodic job scheduling |
| `gpu` | GPU type/count for the function |

## Next Steps

- See `functions.md` for advanced function patterns
- See `images.md` for custom container environments
- See `gpu.md` for GPU selection and configuration
- See `web-endpoints.md` for serving APIs

View File

@@ -1,168 +1,174 @@
# Modal GPU Compute

## Table of Contents

- [Available GPUs](#available-gpus)
- [Requesting GPUs](#requesting-gpus)
- [GPU Selection Guide](#gpu-selection-guide)
- [Multi-GPU](#multi-gpu)
- [GPU Fallback Chains](#gpu-fallback-chains)
- [Auto-Upgrades](#auto-upgrades)
- [Multi-GPU Training](#multi-gpu-training)

## Available GPUs

| GPU | VRAM | Max per Container | Best For |
|-----|------|-------------------|----------|
| T4 | 16 GB | 8 | Budget inference, small models |
| L4 | 24 GB | 8 | Inference, video processing |
| A10 | 24 GB | 4 | Inference, fine-tuning small models |
| L40S | 48 GB | 8 | Inference (best cost/perf), medium models |
| A100-40GB | 40 GB | 8 | Training, large model inference |
| A100-80GB | 80 GB | 8 | Training, large models |
| RTX-PRO-6000 | 48 GB | 8 | Rendering, inference |
| H100 | 80 GB | 8 | Large-scale training, fast inference |
| H200 | 141 GB | 8 | Very large models, training |
| B200 | 192 GB | 8 | Largest models, maximum throughput |
| B200+ | 192 GB | 8 | B200 or B300, B200 pricing |

## Requesting GPUs

### Basic Request

```python
@app.function(gpu="H100")
def train():
    import torch
    assert torch.cuda.is_available()
    print(f"Using: {torch.cuda.get_device_name(0)}")
```

### String Shorthand

```python
gpu="T4"          # Single T4
gpu="A100-80GB"   # Single A100 80GB
gpu="H100:4"      # Four H100s
```

### GPU Object (Advanced)

```python
@app.function(gpu=modal.gpu.H100(count=2))
def multi_gpu():
    ...
```

## GPU Selection Guide

### For Inference

| Model Size | Recommended GPU | Why |
|-----------|----------------|-----|
| < 7B params | T4, L4 | Cost-effective, sufficient VRAM |
| 7B-13B params | L40S | Best cost/performance, 48 GB VRAM |
| 13B-70B params | A100-80GB, H100 | Large VRAM, fast memory bandwidth |
| 70B+ params | H100:2+, H200, B200 | Multi-GPU or very large VRAM |

### For Training

| Task | Recommended GPU |
|------|----------------|
| Fine-tuning (LoRA) | L40S, A100-40GB |
| Full fine-tuning small models | A100-80GB |
| Full fine-tuning large models | H100:4+, H200 |
| Pre-training | H100:8, B200:8 |

### General Recommendation

L40S is the best default for inference workloads — it offers an excellent trade-off of cost and performance with 48 GB of GPU RAM.

## Multi-GPU

Request multiple GPUs by appending `:count`:

```python
@app.function(gpu="H100:4")
def distributed():
    import torch
    print(f"GPUs available: {torch.cuda.device_count()}")
    # All 4 GPUs are on the same physical machine
```

- Up to 8 GPUs for most types (up to 4 for A10)
- All GPUs attach to the same physical machine
- Requesting more than 2 GPUs may result in longer wait times
- Maximum VRAM: 8 x B200 = 1,536 GB

## GPU Fallback Chains

Specify a prioritized list of GPU types:

```python
@app.function(gpu=["H100", "A100-80GB", "L40S"])
def flexible():
    # Modal tries H100 first, then A100-80GB, then L40S
    ...
```

Useful for reducing queue times when a specific GPU isn't available.

## Auto-Upgrades

### H100 → H200

Modal may automatically upgrade H100 requests to H200 at no extra cost. To prevent this:

```python
@app.function(gpu="H100!")  # Exclamation mark prevents auto-upgrade
def must_use_h100():
    ...
```

### A100 → A100-80GB

A100-40GB requests may be upgraded to 80GB at no extra cost.

### B200+

`gpu="B200+"` allows Modal to run on B200 or B300 GPUs at B200 pricing. Requires CUDA 13.0+.

## Multi-GPU Training

Modal supports multi-GPU training on a single node. Multi-node training is in private beta.

### PyTorch DDP Example

```python
@app.function(gpu="H100:4", image=image, timeout=86400)
def train_distributed():
    import os
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}")
    # ... training loop with DDP ...
```

### PyTorch Lightning

When using frameworks that re-execute Python entrypoints (like PyTorch Lightning), either:

1. Set strategy to `ddp_spawn` or `ddp_notebook`
2. Or run training as a subprocess

```python
@app.function(gpu="H100:4", image=image)
def train():
    import subprocess
    subprocess.run(["python", "train_script.py"], check=True)
```

### Hugging Face Accelerate

```python
@app.function(gpu="A100-80GB:4", image=image)
def finetune():
    import subprocess
    subprocess.run([
        "accelerate", "launch",
        "--num_processes", "4",
        "train.py",
    ], check=True)
```

View File

@@ -1,261 +1,259 @@
# Modal Container Images

## Table of Contents

- [Overview](#overview)
- [Base Images](#base-images)
- [Installing Packages](#installing-packages)
- [System Packages](#system-packages)
- [Shell Commands](#shell-commands)
- [Running Python During Build](#running-python-during-build)
- [Adding Local Files](#adding-local-files)
- [Environment Variables](#environment-variables)
- [Dockerfiles](#dockerfiles)
- [Alternative Package Managers](#alternative-package-managers)
- [Image Caching](#image-caching)
- [Handling Remote-Only Imports](#handling-remote-only-imports)

## Overview

Every Modal function runs inside a container built from an `Image`. By default, Modal uses a Debian Linux image with the same Python minor version as your local interpreter.

Images are built lazily — Modal only builds/pulls the image when a function using it is first invoked. Layers are cached for fast rebuilds.

## Base Images

```python
# Default: Debian slim with your local Python version
image = modal.Image.debian_slim()

# Specific Python version
image = modal.Image.debian_slim(python_version="3.11")

# From Docker Hub
image = modal.Image.from_registry("nvidia/cuda:12.4.0-devel-ubuntu22.04")

# From a Dockerfile
image = modal.Image.from_dockerfile("./Dockerfile")
```

## Installing Packages

### uv (Recommended)

`uv_pip_install` uses the uv package manager for fast, reliable installs:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install(
        "torch==2.8.0",
        "transformers>=4.40",
        "accelerate",
        "scipy",
    )
)
```

Pin versions for reproducibility. uv resolves dependencies faster than pip.

### pip (Fallback)

```python
image = modal.Image.debian_slim().pip_install(
    "numpy==1.26.0",
    "pandas==2.1.0",
)
```

### From requirements.txt

```python
image = modal.Image.debian_slim().pip_install_from_requirements("requirements.txt")
```

### Private Packages

```python
image = (
    modal.Image.debian_slim()
    .pip_install_private_repos(
        "github.com/org/private-repo",
        git_user="username",
        secrets=[modal.Secret.from_name("github-token")],
    )
)
```

## System Packages

Install Linux packages via apt:

```python
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg", "libsndfile1", "git", "curl")
    .uv_pip_install("librosa", "soundfile")
)
```

## Shell Commands

Run arbitrary commands during image build:

```python
image = (
    modal.Image.debian_slim()
    .run_commands(
        "wget https://example.com/data.tar.gz",
        "tar -xzf data.tar.gz -C /opt/data",
        "rm data.tar.gz",
    )
)
```

### With GPU

Some build steps require GPU access (e.g., compiling CUDA kernels):

```python
image = (
    modal.Image.debian_slim()
    .uv_pip_install("torch")
    .run_commands("python -c 'import torch; torch.cuda.is_available()'", gpu="A100")
)
```

## Running Python During Build

Execute Python functions as build steps — useful for downloading model weights:

```python
def download_model():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim(python_version="3.11")
    .uv_pip_install("huggingface_hub", "torch", "transformers")
    .run_function(download_model, secrets=[modal.Secret.from_name("huggingface")])
)
```

The resulting filesystem (including downloaded files) is snapshotted into the image.

## Adding Local Files

### Local Directories

```python
image = modal.Image.debian_slim().add_local_dir(
    local_path="./config",
    remote_path="/root/config",
)
```

By default, files are added at container startup (not baked into the image layer). Use `copy=True` to bake them in.

### Local Python Modules

```python
image = modal.Image.debian_slim().add_local_python_source("my_module")
```

This uses Python's import system to find and include the module.

### Individual Files

```python
image = modal.Image.debian_slim().add_local_file(
    local_path="./model_config.json",
    remote_path="/root/config.json",
)
```

## Environment Variables

```python
image = (
    modal.Image.debian_slim()
    .env({
        "TRANSFORMERS_CACHE": "/cache",
        "TOKENIZERS_PARALLELISM": "false",
        "HF_HOME": "/cache/huggingface",
    })
)
```

Names and values must be strings.

## Dockerfiles

Build from existing Dockerfiles:

```python
image = modal.Image.from_dockerfile("./Dockerfile")

# With build context
image = modal.Image.from_dockerfile("./Dockerfile", context_mount=modal.Mount.from_local_dir("."))
```

## Alternative Package Managers

### Micromamba / Conda

For packages requiring coordinated system and Python package installs:

```python
image = (
    modal.Image.micromamba(python_version="3.11")
    .micromamba_install("cudatoolkit=11.8", "cudnn=8.6", channels=["conda-forge"])
    .uv_pip_install("torch")
)
```

## Image Caching

Modal caches images per layer (per method call). Breaking the cache on one layer cascades to all subsequent layers.

### Optimization Tips

1. **Order layers by change frequency**: Put stable dependencies first, frequently changing code last
2. **Pin versions**: Unpinned versions may resolve differently and break cache
3. **Separate large installs**: Put heavy packages (torch, tensorflow) in early layers
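A sketch of this ordering (package choices are illustrative): stable system and Python dependencies go in early layers, and frequently edited local code is added last so day-to-day changes only invalidate the final layer:

```python
image = (
    modal.Image.debian_slim(python_version="3.11")
    # 1. Stable system packages: rarely change, so this layer stays cached
    .apt_install("git", "ffmpeg")
    # 2. Heavy, pinned Python dependencies: change occasionally
    .uv_pip_install("torch==2.8.0", "transformers>=4.40")
    # 3. Frequently edited local code: added last, so edits don't rebuild the layers above
    .add_local_python_source("my_module")
)
```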
### Force Rebuild

```python
# Single layer
image = modal.Image.debian_slim().apt_install("git", force_build=True)
```

```bash
# All images in a run
MODAL_FORCE_BUILD=1 modal run script.py

# Rebuild without updating cache
MODAL_IGNORE_CACHE=1 modal run script.py
```

## Handling Remote-Only Imports

When packages are only available in the container (not locally), use conditional imports:

```python
@app.function(image=image)
def process():
    import torch  # Only available in the container
    return torch.cuda.device_count()
```

For module-level imports shared across functions, use the `Image.imports()` context manager:

```python
with image.imports():
    import torch
    import transformers
```

This prevents `ImportError` locally while making the imports available in the container.

View File

@@ -1,129 +1,117 @@
# Modal Resource Configuration

## CPU

### Requesting CPU

```python
@app.function(cpu=4.0)
def compute():
    ...
```

- Values are **physical cores**, not vCPUs
- Default: 0.125 cores
- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request

### CPU Limits

- Default soft limit: 16 physical cores above the CPU request
- Set explicit limits to prevent noisy-neighbor effects by passing a `(request, limit)` tuple:

```python
@app.function(cpu=(4.0, 8.0))  # Request 4 cores, hard limit at 8
def bounded_compute():
    ...
```

## Memory

### Requesting Memory

```python
@app.function(memory=16384)  # 16 GiB in MiB
def large_data():
    ...
```

- Value in **MiB** (mebibytes)
- Default: 128 MiB

### Memory Limits

Set hard memory limits to OOM-kill containers that exceed them by passing a `(request, limit)` tuple:

```python
@app.function(memory=(8192, 16384))  # 8 GiB request, 16 GiB hard limit
def bounded_memory():
    ...
```

This prevents paying for runaway memory leaks.

## Ephemeral Disk

For temporary storage within a container's lifetime:

```python
@app.function(ephemeral_disk=102400)  # 100 GiB in MiB
def process_dataset():
    # Temporary files at /tmp or anywhere in the container filesystem
    ...
```

- Value in **MiB**
- Default: 512 GiB quota per container
- Maximum: 3,145,728 MiB (3 TiB)
- Data is lost when the container shuts down
- Use Volumes for persistent storage

Larger disk requests increase the memory request at a 20:1 ratio for billing purposes. For example, a 500 GiB disk request raises the memory request to at least 25 GiB.

## Timeout

```python
@app.function(timeout=3600)  # 1 hour in seconds
def long_running():
    ...
```

- Default: 300 seconds (5 minutes)
- Maximum: 86,400 seconds (24 hours)
- Function is killed when timeout expires

## Billing

You are charged based on **whichever is higher**: your resource request or actual usage.

| Resource | Billing Basis |
|----------|--------------|
| CPU | max(requested, used) |
| Memory | max(requested, used) |
| GPU | Time GPU is allocated |
| Disk | Increases memory billing at 20:1 ratio |

### Cost Optimization Tips

- Request only what you need
- Use appropriate GPU tiers (L40S over H100 for inference)
- Set `scaledown_window` to minimize idle time
- Use `min_containers=0` when cold starts are acceptable
- Batch inputs with `.map()` instead of individual `.remote()` calls (sketched below)
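To illustrate the last tip (sketch only; `process` and `items` stand in for your function and inputs):

```python
# One round trip per item: slower and harder for the autoscaler to plan around
results = [process.remote(item) for item in items]

# Preferred: submit the whole batch at once and let Modal fan it out
results = list(process.map(items))
```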
## Complete Example

```python
@app.function(
    cpu=8.0,                # 8 physical cores
    memory=32768,           # 32 GiB
    gpu="L40S",             # L40S GPU
    ephemeral_disk=204800,  # 200 GiB temp disk
    timeout=7200,           # 2 hours
    max_containers=50,
    min_containers=1,
)
def full_pipeline(data_path: str):
    ...
```
View File

@@ -1,230 +1,173 @@
# Modal Scaling and Concurrency

## Table of Contents

- [Autoscaling](#autoscaling)
- [Configuration](#configuration)
- [Parallel Execution](#parallel-execution)
- [Concurrent Inputs](#concurrent-inputs)
- [Dynamic Batching](#dynamic-batching)
- [Dynamic Autoscaler Updates](#dynamic-autoscaler-updates)
- [Limits](#limits)

## Autoscaling

Modal automatically manages a pool of containers for each function:

- Spins up containers when there's no capacity for new inputs
- Spins down idle containers to save costs
- Scales from zero (no cost when idle) to thousands of containers

No configuration needed for basic autoscaling — it works out of the box.

## Configuration

Fine-tune autoscaling behavior:

```python
@app.function(
    max_containers=100,    # Upper limit on container count
    min_containers=2,      # Keep 2 warm (reduces cold starts)
    buffer_containers=5,   # Reserve 5 extra for burst traffic
    scaledown_window=300,  # Wait 5 min idle before shutting down
)
def handle_request(data):
    ...
```

| Parameter | Default | Description |
|-----------|---------|-------------|
| `max_containers` | Unlimited | Hard cap on total containers |
| `min_containers` | 0 | Minimum warm containers (costs money even when idle) |
| `buffer_containers` | 0 | Extra containers to prevent queuing |
| `scaledown_window` | 60 | Seconds of idle time before shutdown |

### Trade-offs

- Higher `min_containers` = lower latency, higher cost
- Higher `buffer_containers` = less queuing, higher cost
- Lower `scaledown_window` = faster cost savings, more cold starts

## Parallel Execution

### `.map()` — Process Many Inputs

```python
@app.function()
def process(item):
    return heavy_computation(item)

@app.local_entrypoint()
def main():
    items = list(range(10_000))
    results = list(process.map(items))
```

Modal automatically scales containers to handle the workload. Results maintain input order.

### `.map()` Options

```python
# Unordered results (faster)
for result in process.map(items, order_outputs=False):
    handle(result)

# Collect errors instead of raising
results = list(process.map(items, return_exceptions=True))
for r in results:
    if isinstance(r, Exception):
        print(f"Error: {r}")
```

### `.starmap()` — Multi-Argument

```python
@app.function()
def add(x, y):
    return x + y

results = list(add.starmap([(1, 2), (3, 4), (5, 6)]))
# [3, 7, 11]
```

### `.spawn()` — Fire-and-Forget

```python
# Returns immediately
call = process.spawn(large_data)

# Check status or get result later
result = call.get()
```

Up to 1 million pending `.spawn()` calls.

## Concurrent Inputs

By default, each container handles one input at a time. Use `@modal.concurrent` to handle multiple:

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
async def predict(text: str):
    result = await model.predict_async(text)
    return result
```

This is ideal for I/O-bound workloads or async inference where a single GPU can handle multiple requests.
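The hard cap and the autoscaler's target can be set separately; a sketch using `target_inputs` (values are illustrative):

```python
@app.function()
@modal.concurrent(
    max_inputs=120,     # Hard per-container limit
    target_inputs=100,  # Autoscaler aims for this; containers can burst up to max_inputs
)
def handle(request: str):
    ...
```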
### With Web Endpoints

```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=20)
@modal.asgi_app()
def web_service():
    return fastapi_app
```

## Dynamic Batching

Collect inputs into batches for efficient GPU utilization:

```python
@app.function(gpu="L40S")
@modal.batched(max_batch_size=32, wait_ms=100)
async def batch_predict(texts: list[str]):
    # Called with up to 32 texts at once
    embeddings = model.encode(texts)
    return list(embeddings)
```

- `max_batch_size` — Maximum inputs per batch
- `wait_ms` — How long to wait for more inputs before processing
- The function receives a list and must return a list of the same length

## Dynamic Autoscaler Updates

Adjust autoscaling at runtime without redeploying:

```python
@app.function()
def scale_up_for_peak():
    process = modal.Function.from_name("my-app", "process")
    process.update_autoscaler(min_containers=10, buffer_containers=20)

@app.function()
def scale_down_after_peak():
    process = modal.Function.from_name("my-app", "process")
    process.update_autoscaler(min_containers=1, buffer_containers=2)
```

Settings revert to the decorator values on the next deployment.

## Limits

| Resource | Limit |
|----------|-------|
| Pending inputs (unassigned) | 2,000 |
| Total inputs (running + pending) | 25,000 |
| Pending `.spawn()` inputs | 1,000,000 |
| Concurrent inputs per `.map()` | 1,000 |
| Rate limit (web endpoints) | 200 req/s |

Exceeding these limits triggers `Resource Exhausted` errors. Implement retry logic for resilience.
View File

@@ -1,303 +1,143 @@
# Modal Scheduled Jobs

## Overview

Modal supports running functions automatically on a schedule, either using cron syntax or fixed intervals. Deploy scheduled functions with `modal deploy` and they run unattended in the cloud.

## Schedule Types

### modal.Cron

Standard cron syntax — stable across deploys:

```python
import modal

app = modal.App("scheduled-tasks")

# Daily at 9 AM UTC
@app.function(schedule=modal.Cron("0 9 * * *"))
def daily_report():
    generate_and_send_report()

# Every Monday at midnight
@app.function(schedule=modal.Cron("0 0 * * 1"))
def weekly_cleanup():
    cleanup_old_data()

# Every 15 minutes
@app.function(schedule=modal.Cron("*/15 * * * *"))
def frequent_check():
    check_system_health()
```

#### Cron Syntax Reference

```
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sun=0)
│ │ │ │ │
* * * * *
```

| Pattern | Meaning |
|---------|---------|
| `0 9 * * *` | Daily at 9:00 AM UTC |
| `0 */6 * * *` | Every 6 hours |
| `*/30 * * * *` | Every 30 minutes |
| `0 0 * * 1` | Every Monday at midnight |
| `0 0 1 * *` | First day of every month |
| `0 9 * * 1-5` | Weekdays at 9 AM |
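Cron schedules are evaluated in UTC by default; `modal.Cron` also accepts a `timezone` argument when a job should follow local time (the function body here is a placeholder):

```python
# Daily at 6 AM New York time
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def morning_report():
    send_morning_report()
```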
## Schedule Types ### modal.Period
### Period Schedules Fixed interval — resets on each deploy:
Run at fixed intervals from deployment time:
```python ```python
# Every 5 hours # Every 5 hours
@app.function(schedule=modal.Period(hours=5)) @app.function(schedule=modal.Period(hours=5))
def every_5_hours(): def periodic_sync():
... sync_data()
# Every 30 minutes # Every 30 minutes
@app.function(schedule=modal.Period(minutes=30)) @app.function(schedule=modal.Period(minutes=30))
def every_30_minutes(): def poll_updates():
... check_for_updates()
# Every day # Every day
@app.function(schedule=modal.Period(days=1)) @app.function(schedule=modal.Period(days=1))
def daily(): def daily_task():
... ...
``` ```
**Note**: Redeploying resets the period timer. `modal.Period` resets its timer on each deployment. If you need a schedule that doesn't shift with deploys, use `modal.Cron`.
### Cron Schedules ## Deploying Scheduled Functions
Run at specific times using cron syntax: Schedules only activate when deployed:
```python
# Every Monday at 8 AM UTC
@app.function(schedule=modal.Cron("0 8 * * 1"))
def weekly_report():
...
# Daily at 6 AM New York time
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def morning_report():
...
# Every hour on the hour
@app.function(schedule=modal.Cron("0 * * * *"))
def hourly():
...
# Every 15 minutes
@app.function(schedule=modal.Cron("*/15 * * * *"))
def quarter_hourly():
...
```
**Cron syntax**: `minute hour day month day_of_week`
- Minute: 0-59
- Hour: 0-23
- Day: 1-31
- Month: 1-12
- Day of week: 0-6 (0 = Sunday)
### Timezone Support
Specify timezone for cron schedules:
```python
@app.function(schedule=modal.Cron("0 9 * * *", timezone="Europe/London"))
def uk_morning_task():
...
@app.function(schedule=modal.Cron("0 17 * * 5", timezone="Asia/Tokyo"))
def friday_evening_jp():
...
```
## Deployment
### Deploy Scheduled Functions
```bash ```bash
modal deploy script.py modal deploy script.py
``` ```
Scheduled functions persist until explicitly stopped. `modal run` and `modal serve` do not activate schedules.
### Programmatic Deployment
```python
if __name__ == "__main__":
app.deploy()
```
## Monitoring ## Monitoring
### View Execution Logs - View scheduled runs in the **Apps** section of the Modal dashboard
- Each run appears with its status, duration, and logs
- Use the **"Run Now"** button on the dashboard to trigger manually
Check https://modal.com/apps for: ## Management
- Past execution logs
- Execution history
- Failure notifications
### Run Manually - Schedules cannot be paused — remove the schedule and redeploy to stop
- To change a schedule, update the `schedule` parameter and redeploy
Trigger scheduled function immediately via dashboard "Run now" button. - To stop entirely, either remove the `schedule` parameter or run `modal app stop <name>`
## Schedule Management
### Pausing Schedules
Schedules cannot be paused. To stop:
1. Remove `schedule` parameter
2. Redeploy app
### Updating Schedules
Change schedule parameters and redeploy:
```python
# Update from daily to weekly
@app.function(schedule=modal.Period(days=7))
def task():
...
```
```bash
modal deploy script.py
```
## Common Patterns ## Common Patterns
### Data Pipeline ### ETL Pipeline
```python ```python
@app.function( @app.function(
schedule=modal.Cron("0 2 * * *"), # 2 AM daily schedule=modal.Cron("0 2 * * *"), # 2 AM UTC daily
timeout=3600, # 1 hour timeout secrets=[modal.Secret.from_name("db-creds")],
timeout=7200,
) )
def etl_pipeline(): def etl_pipeline():
# Extract data from sources import os
data = extract_data() data = extract(os.environ["SOURCE_DB_URL"])
transformed = transform(data)
# Transform data load(transformed, os.environ["DEST_DB_URL"])
transformed = transform_data(data)
# Load to warehouse
load_to_warehouse(transformed)
``` ```
### Model Retraining ### Model Retraining
```python ```python
volume = modal.Volume.from_name("models")
@app.function( @app.function(
schedule=modal.Cron("0 0 * * 0"), # Weekly on Sunday midnight schedule=modal.Cron("0 0 * * 0"), # Weekly on Sunday
gpu="A100", gpu="H100",
timeout=7200, # 2 hours volumes={"/data": data_vol, "/models": model_vol},
volumes={"/models": volume} timeout=86400,
) )
def retrain_model(): def retrain():
# Load latest data model = train_on_latest_data("/data/training/")
data = load_training_data() torch.save(model.state_dict(), "/models/latest.pt")
# Train model
model = train(data)
# Save new model
save_model(model, "/models/latest.pt")
volume.commit()
``` ```
### Report Generation ### Health Checks
```python ```python
@app.function( @app.function(
schedule=modal.Cron("0 9 * * 1"), # Monday 9 AM schedule=modal.Period(minutes=5),
secrets=[modal.Secret.from_name("email-creds")] secrets=[modal.Secret.from_name("slack-webhook")],
) )
def weekly_report(): def health_check():
# Generate report import os, requests
report = generate_analytics_report() status = check_all_services()
if not status["healthy"]:
# Send email requests.post(os.environ["SLACK_URL"], json={"text": f"Alert: {status}"})
send_email(
to="team@company.com",
subject="Weekly Analytics Report",
body=report
)
``` ```
### Data Cleanup
```python
@app.function(schedule=modal.Period(hours=6))
def cleanup_old_data():
# Remove data older than 30 days
cutoff = datetime.now() - timedelta(days=30)
delete_old_records(cutoff)
```
## Configuration with Secrets and Volumes
Scheduled functions support all function parameters:
```python
vol = modal.Volume.from_name("data")
secret = modal.Secret.from_name("api-keys")

@app.function(
    schedule=modal.Cron("0 */6 * * *"),  # Every 6 hours
    secrets=[secret],
    volumes={"/data": vol},
    cpu=4.0,
    memory=16384,
)
def sync_data():
    import json
    import os

    api_key = os.environ["API_KEY"]
    # Fetch from external API
    data = fetch_external_data(api_key)
    # Save to volume
    with open("/data/latest.json", "w") as f:
        json.dump(data, f)
    vol.commit()
```
## Dynamic Scheduling
Use one scheduled function to adjust another function's autoscaling programmatically:
```python
@app.function()
def main_task():
    ...

@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def enable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=5)

@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York"))
def disable_high_traffic_mode():
    main_task.update_autoscaler(min_containers=0)
```
## Error Handling
Scheduled functions that fail will:
- Show failure in dashboard
- Send notifications (configurable)
- Retry on next scheduled run
```python
@app.function(
    schedule=modal.Cron("0 * * * *"),
    retries=3,  # Retry failed runs
    timeout=1800,
)
def robust_task():
    try:
        perform_task()
    except Exception as e:
        # Log error
        print(f"Task failed: {e}")
        # Optionally send alert
        send_alert(f"Scheduled task failed: {e}")
        raise
```
## Best Practices
1. **Set timeouts**: Always specify timeout for scheduled functions
2. **Use appropriate schedules**: Period for relative timing, Cron for absolute
3. **Monitor failures**: Check dashboard regularly for failed runs
4. **Idempotent operations**: Design tasks to handle reruns safely
5. **Resource limits**: Set appropriate CPU/memory for scheduled workloads
6. **Timezone awareness**: Specify timezone for cron schedules
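
Taken together, a sketch of a scheduled function that applies these practices; the `cleanup()` helper, schedule, and resource figures are illustrative placeholders, not Modal defaults:
```python
from datetime import date

import modal

app = modal.App("nightly-maintenance")

@app.function(
    schedule=modal.Cron("0 3 * * *", timezone="UTC"),  # absolute time, explicit timezone
    timeout=1800,   # bound the run time
    retries=2,      # tolerate transient failures
    cpu=2.0,
    memory=4096,
)
def nightly_cleanup():
    # Idempotent: keyed by date, so a retried run redoes the same partition safely
    cleanup(partition=date.today().isoformat())
```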

View File

@@ -1,180 +1,119 @@
# Modal Secrets
## Overview
Modal Secrets securely deliver credentials and sensitive data to functions as environment variables. Secrets are stored encrypted and only available to your workspace.
## Creating Secrets

### Via CLI
```bash
# Create with key-value pairs
modal secret create my-api-keys API_KEY=sk-xxx DB_PASSWORD=hunter2

# Create from existing environment variables
modal secret create my-env-keys API_KEY=$API_KEY

# List all secrets
modal secret list

# Delete a secret
modal secret delete my-api-keys
```

### Via Dashboard
Navigate to https://modal.com/secrets to create and manage secrets. Templates are available for common services (Postgres, MongoDB, Hugging Face, Weights & Biases, etc.).

### Programmatic (Inline)
```python
# From a dictionary (useful for development)
secret = modal.Secret.from_dict({"API_KEY": "sk-xxx"})

# From a .env file
secret = modal.Secret.from_dotenv()

# From a named secret (created via CLI or dashboard)
secret = modal.Secret.from_name("my-api-keys")
```

## Using Secrets in Functions

### Basic Usage
```python
@app.function(secrets=[modal.Secret.from_name("my-api-keys")])
def call_api(url: str):
    import os
    import requests

    api_key = os.environ["API_KEY"]
    # Use the key
    response = requests.get(url, headers={"Authorization": f"Bearer {api_key}"})
    return response.json()
```

### Multiple Secrets
```python
@app.function(secrets=[
    modal.Secret.from_name("openai-keys"),
    modal.Secret.from_name("database-creds"),
])
def process():
    import os
    openai_key = os.environ["OPENAI_API_KEY"]
    db_url = os.environ["DATABASE_URL"]
    ...
```
Secrets are applied in order — if two secrets define the same key, the later one wins.
### With Classes
```python
@app.cls(secrets=[modal.Secret.from_name("huggingface")])
class ModelService:
    @modal.enter()
    def load(self):
        import os
        from transformers import AutoModel

        token = os.environ["HF_TOKEN"]
        self.model = AutoModel.from_pretrained("model-name", token=token)
```

### From .env File
```python
# Reads .env file from current directory
@app.function(secrets=[modal.Secret.from_dotenv()])
def local_dev():
    import os
    api_key = os.environ["API_KEY"]
```
The `.env` file format:
```
API_KEY=sk-xxx
DATABASE_URL=postgres://user:pass@host/db
DEBUG=false
```

## Common Secret Templates
| Service | Typical Keys |
|---------|-------------|
| OpenAI | `OPENAI_API_KEY` |
| Hugging Face | `HF_TOKEN` |
| AWS | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Postgres | `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE` |
| Weights & Biases | `WANDB_API_KEY` |
| GitHub | `GITHUB_TOKEN` |
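
As a sketch of how one of these templates is typically consumed, assuming a secret named `postgres-creds` holding the `PG*` keys above and an image with `psycopg2` installed:
```python
@app.function(secrets=[modal.Secret.from_name("postgres-creds")])
def query_db(sql: str):
    import os
    import psycopg2

    conn = psycopg2.connect(
        host=os.environ["PGHOST"],
        port=os.environ["PGPORT"],
        user=os.environ["PGUSER"],
        password=os.environ["PGPASSWORD"],
        dbname=os.environ["PGDATABASE"],
    )
    with conn, conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchall()
```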
## Security Notes
- Secrets are encrypted at rest and in transit
- Only accessible to functions in your workspace
- Never log or print secret values
- Use `.from_name()` in production (not `.from_dict()`)
- Rotate secrets regularly via the dashboard or CLI

View File

@@ -1,303 +1,247 @@
# Modal Volumes
## Table of Contents
- [Overview](#overview)
- [Creating Volumes](#creating-volumes)
- [Mounting Volumes](#mounting-volumes)
- [Reading and Writing Files](#reading-and-writing-files)
- [CLI Access](#cli-access)
- [Commits and Reloads](#commits-and-reloads)
- [Concurrent Access](#concurrent-access)
- [Volumes v2](#volumes-v2)
- [Common Patterns](#common-patterns)
## Overview
Volumes are Modal's distributed file system, optimized for write-once, read-many workloads like storing model weights and distributing them across containers.

## Creating Volumes

### Via CLI
```bash
modal volume create my-volume

# v2 volume (beta)
modal volume create my-volume --version=2
```

### Programmatic
```python
vol = modal.Volume.from_name("my-volume", create_if_missing=True)

# For v2
vol = modal.Volume.from_name("my-volume", create_if_missing=True, version=2)
```
## Mounting Volumes
Mount volumes to functions via the `volumes` parameter:
```python
vol = modal.Volume.from_name("model-store", create_if_missing=True)

@app.function(volumes={"/models": vol})
def use_model():
    import json
    # Access files at /models/
    with open("/models/config.json") as f:
        config = json.load(f)
```
Mount multiple volumes:
```python
weights_vol = modal.Volume.from_name("weights")
data_vol = modal.Volume.from_name("datasets")

@app.function(volumes={"/weights": weights_vol, "/data": data_vol})
def train():
    ...
```
## Reading and Writing Files

### Writing
```python
@app.function(volumes={"/data": vol})
def save_results(results):
    import json
    import os

    os.makedirs("/data/outputs", exist_ok=True)
    with open("/data/outputs/results.json", "w") as f:
        json.dump(results, f)
```

### Reading
```python
@app.function(volumes={"/data": vol})
def load_results():
    import json
    with open("/data/outputs/results.json") as f:
        return json.load(f)
```

### Large Files (Model Weights)
```python
@app.function(volumes={"/models": vol}, gpu="L40S")
def save_model():
    import torch
    model = train_model()
    torch.save(model.state_dict(), "/models/checkpoint.pt")

@app.function(volumes={"/models": vol}, gpu="L40S")
def load_model():
    import torch
    model = MyModel()
    model.load_state_dict(torch.load("/models/checkpoint.pt"))
    return model
```
## CLI Access
```bash
# List files
modal volume ls my-volume
modal volume ls my-volume /subdir/

# Upload files
modal volume put my-volume local_file.txt
modal volume put my-volume local_file.txt /remote/path/file.txt

# Download files
modal volume get my-volume /remote/file.txt local_file.txt

# Delete a volume
modal volume delete my-volume
```
## Commits and Reloads
Modal auto-commits volume changes in the background every few seconds and on container shutdown.

### Explicit Commit
Force an immediate commit:
```python
@app.function(volumes={"/data": vol})
def writer():
    with open("/data/file.txt", "w") as f:
        f.write("hello")
    vol.commit()  # Make immediately visible to other containers
```

### Reload
See changes from other containers:
```python
@app.function(volumes={"/data": vol})
def reader():
    vol.reload()  # Refresh to see latest writes
    with open("/data/file.txt") as f:
        return f.read()
```
## Concurrent Access

### v1 Volumes
- Recommended max 5 concurrent commits
- Last write wins for concurrent modifications of the same file
- Avoid concurrent modification of identical files
- Max 500,000 files (inodes)
### v2 Volumes
- Hundreds of concurrent writers (distinct files)
- No file count limit
- Improved random access performance
- Up to 1 TiB per file, 262,144 files per directory
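
A minimal sketch of a write pattern that respects these limits: each mapped worker writes its own file, so no two containers touch the same path (the shard layout and `vol` handle are illustrative):
```python
@app.function(volumes={"/data": vol})
def process_shard(shard_id: int):
    import json
    import os

    os.makedirs("/data/shards", exist_ok=True)
    result = {"shard": shard_id}  # stand-in for real per-shard output
    # One file per worker avoids last-write-wins conflicts
    with open(f"/data/shards/shard_{shard_id}.json", "w") as f:
        json.dump(result, f)
    vol.commit()
```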
## Volumes v2
v2 Volumes (beta) offer significant improvements:

| Feature | v1 | v2 |
|---------|----|----|
| Max files | 500,000 | Unlimited |
| Concurrent writes | ~5 | Hundreds |
| Max file size | No limit | 1 TiB |
| Random access | Limited | Full support |
| HIPAA compliance | No | Yes |
| Hard links | No | Yes |
Enable v2:
```python ```python
vol = modal.Volume.from_name("my-vol-v2", create_if_missing=True, version=2)
```
## Common Patterns

### Model Weight Storage
```python
vol = modal.Volume.from_name("model-weights", create_if_missing=True)

# Download once during image build
def download_weights():
    from huggingface_hub import snapshot_download
    snapshot_download("meta-llama/Llama-3-8B", local_dir="/models/llama3")

image = (
    modal.Image.debian_slim()
    .uv_pip_install("huggingface_hub")
    .run_function(download_weights, volumes={"/models": vol})
)
```
### "File Not Found" ### Training Checkpoints
Remember to use mount point:
```python ```python
# WRONG - file saved to local disk @app.function(volumes={"/checkpoints": vol}, gpu="H100", timeout=86400)
with open("/xyz.txt", "w") as f: def train():
f.write("data") for epoch in range(100):
train_one_epoch()
# CORRECT - file saved to Volume torch.save(model.state_dict(), f"/checkpoints/epoch_{epoch}.pt")
with open("/data/xyz.txt", "w") as f: vol.commit() # Save checkpoint immediately
f.write("data")
``` ```
### Shared Data Between Functions
```python
data_vol = modal.Volume.from_name("shared-data", create_if_missing=True)

@app.function(volumes={"/data": data_vol})
def preprocess():
    # Write processed data
    df.to_parquet("/data/processed.parquet")

@app.function(volumes={"/data": data_vol})
def analyze():
    import pandas as pd

    data_vol.reload()  # Ensure we see latest data
    df = pd.read_parquet("/data/processed.parquet")
    return df.describe()
```
### Performance Tips
- Volumes are optimized for large files, not many small files
- Keep under 50,000 files and directories for best v1 performance
- Use Parquet or other columnar formats instead of many small CSVs
- For truly temporary data, use `ephemeral_disk` instead of Volumes
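
For instance, a rough sketch of the "few large files" advice: collecting many small records into a single Parquet file rather than one CSV per record (assumes pandas is installed in the image):
```python
@app.function(volumes={"/data": vol})
def consolidate(records: list[dict]):
    import pandas as pd

    df = pd.DataFrame(records)
    # One columnar file instead of thousands of tiny files
    df.to_parquet("/data/records.parquet")
    vol.commit()
```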

View File

@@ -1,337 +1,254 @@
# Modal Web Endpoints
## Table of Contents
- [Simple Endpoints](#simple-endpoints)
- [Deployment](#deployment)
- [ASGI Apps](#asgi-apps-fastapi-starlette-fasthtml)
- [WSGI Apps](#wsgi-apps-flask-django)
- [Custom Web Servers](#custom-web-servers)
- [WebSockets](#websockets)
- [Authentication](#authentication)
- [Streaming](#streaming)
- [Concurrency](#concurrency)
- [Limits](#limits)
## Simple Endpoints
The easiest way to create a web endpoint:
```python
import modal

app = modal.App("api-service")

@app.function()
@modal.fastapi_endpoint()
def hello(name: str = "World"):
    return {"message": f"Hello, {name}!"}
```
### POST Endpoints
```python
@app.function()
@modal.fastapi_endpoint(method="POST")
def predict(data: dict):
    result = model.predict(data["text"])
    return {"prediction": result}
```
### Query Parameters
Parameters are automatically parsed from query strings:
```python
@app.function()
@modal.fastapi_endpoint()
def search(query: str, limit: int = 10):
return {"results": do_search(query, limit)}
```
Access via: `https://your-app.modal.run?query=hello&limit=5`
## Deployment
### Development Mode
```bash
modal serve script.py
```
- Creates a temporary public URL
- Hot-reloads on file changes
- Perfect for development and testing
- URL expires when you stop the command
### Production Deployment
```bash
modal deploy script.py
```
- Creates a permanent URL
- Runs persistently in the cloud
- Autoscales based on traffic
- URL format: `https://<workspace>--<app-name>-<function-name>.modal.run`
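
Once deployed, the endpoint is a plain HTTPS URL. A sketch of calling the `hello` endpoint above with `requests`, assuming a workspace named `my-workspace`:
```python
import requests

url = "https://my-workspace--api-service-hello.modal.run"
resp = requests.get(url, params={"name": "Modal"})
print(resp.json())  # expected: {"message": "Hello, Modal!"}
```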
## ASGI Apps (FastAPI, Starlette, FastHTML)
For full framework applications, use `@modal.asgi_app`:
```python
from fastapi import FastAPI

web_app = FastAPI()

@web_app.get("/")
async def root():
    return {"status": "ok"}

@web_app.post("/predict")
async def predict(request: dict):
    return {"result": model.run(request["input"])}

@app.function(image=image, gpu="L40S")
@modal.asgi_app()
def fastapi_app():
    return web_app
```
### With Class Lifecycle
```python
@app.cls(gpu="L40S", image=image)
class InferenceService:
    @modal.enter()
    def load_model(self):
        self.model = load_model()

    @modal.asgi_app()
    def serve(self):
        from fastapi import FastAPI
        app = FastAPI()

        @app.post("/generate")
        async def generate(request: dict):
            return self.model.generate(request["prompt"])

        return app
```
## WSGI Apps (Flask, Django)
```python
from flask import Flask

flask_app = Flask(__name__)

@flask_app.route("/")
def index():
    return {"status": "ok"}

@app.function(image=image)
@modal.wsgi_app()
def flask_server():
    return flask_app
```
WSGI is synchronous — concurrent inputs run on separate threads.

## Custom Web Servers
For non-standard web frameworks (aiohttp, Tornado, TGI):
```python
@app.function(image=image, gpu="H100")
@modal.web_server(port=8000)
def serve():
    import subprocess
    subprocess.Popen([
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", "meta-llama/Llama-3-70B",
        "--host", "0.0.0.0",  # Must bind to 0.0.0.0, not localhost
        "--port", "8000",
    ])
```
The application must bind to `0.0.0.0` (not `127.0.0.1`).
## WebSockets
Supported with `@modal.asgi_app`, `@modal.wsgi_app`, and `@modal.web_server`:
```python
from fastapi import FastAPI, WebSocket

web_app = FastAPI()

@web_app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        data = await websocket.receive_text()
        result = process(data)
        await websocket.send_text(result)

@app.function()
@modal.asgi_app()
def ws_app():
    return web_app
```
- Full WebSocket protocol (RFC 6455)
- Messages up to 2 MiB each
- No RFC 8441 or RFC 7692 support yet
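
On the client side, a sketch using the `websockets` package against a deployed URL of the usual `*.modal.run` form (the workspace and app names are placeholders):
```python
import asyncio

import websockets

async def main():
    uri = "wss://my-workspace--ws-app.modal.run/ws"  # placeholder URL
    async with websockets.connect(uri) as ws:
        await ws.send("hello")
        print(await ws.recv())

asyncio.run(main())
```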
## Authentication

### Proxy Auth Tokens (Built-in)
Modal provides first-class endpoint protection via proxy auth tokens:
```python
@app.function()
@modal.fastapi_endpoint()
def protected(text: str):
    return {"result": process(text)}
```
Clients include `Modal-Key` and `Modal-Secret` headers to authenticate.
### Custom Bearer Tokens
```python
from fastapi import Header, HTTPException

@app.function(secrets=[modal.Secret.from_name("auth-secret")])
@modal.fastapi_endpoint(method="POST")
def secure_predict(data: dict, authorization: str = Header(None)):
    import os
    expected = os.environ["AUTH_TOKEN"]
    if authorization != f"Bearer {expected}":
        raise HTTPException(status_code=401, detail="Unauthorized")
    return {"result": model.predict(data["text"])}
```
### Client IP Access
Available for geolocation, rate limiting, and access control.
## Streaming
### Server-Sent Events (SSE)
```python ```python
from fastapi import Request from fastapi.responses import StreamingResponse
@app.function() @app.function(gpu="H100")
@modal.fastapi_endpoint() @modal.fastapi_endpoint()
def get_ip(request: Request): def stream_generate(prompt: str):
return f"Your IP: {request.client.host}" def generate():
for token in model.stream(prompt):
yield f"data: {token}\n\n"
return StreamingResponse(generate(), media_type="text/event-stream")
``` ```
## Concurrency
Handle multiple requests per container using `@modal.concurrent`:
```python
@app.function(gpu="L40S")
@modal.concurrent(max_inputs=10)
@modal.fastapi_endpoint(method="POST")
async def batch_predict(data: dict):
    return {"result": await model.predict_async(data["text"])}
```
## Limits
- Request body: up to 4 GiB
- Response body: unlimited
- Rate limit: 200 requests/second (5-second burst for new accounts)
- Cold starts occur when no containers are active (use `min_containers` to avoid)
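
Requests beyond the rate limit are rejected (typically with HTTP 429), so clients should back off and retry. A hedged sketch with a placeholder URL and retry budget:
```python
import time

import requests

def call_with_backoff(url: str, payload: dict, attempts: int = 5):
    for attempt in range(attempts):
        resp = requests.post(url, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # exponential backoff on rate limiting
    raise RuntimeError("still rate-limited after retries")
```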

View File

@@ -0,0 +1,414 @@
---
name: writing
description: "Use this skill to create high-quality academic papers, literature reviews, grant proposals, clinical reports, and other research and scientific documents backed by comprehensive research and real, verifiable citations. Use this skill whenever the user asks for written output such as a report, paper, etc."
license: MIT license
metadata:
skill-author: K-Dense Inc.
---
# Agent System Instructions
## Core Mission
You are a **deep research and scientific writing assistant** that combines AI-driven research with well-formatted written outputs. Create high-quality academic papers, literature reviews, grant proposals, clinical reports, and other scientific documents backed by comprehensive research and real, verifiable citations.
**Default Format:** LaTeX with BibTeX citations unless otherwise requested.
**Quality Assurance:** Every PDF is automatically reviewed for formatting issues and iteratively improved until visually clean and professional.
**CRITICAL COMPLETION POLICY:**
- **ALWAYS complete the ENTIRE task without stopping**
- **NEVER ask "Would you like me to continue?" mid-task**
- **NEVER offer abbreviated versions or stop after partial completion**
- For long documents (market research reports, comprehensive papers): Write from start to finish until 100% complete
- **Token usage is unlimited** - complete the full document
**CONTEXT WINDOW & AUTONOMOUS OPERATION:**
Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely from where you left off. Do not stop tasks early due to token budget concerns. Save progress before context window refreshes. Always complete tasks fully, even if the end of your budget is approaching. Never artificially stop any task early.
## CRITICAL: Output Length Awareness & Multi-Pass Verification
**Not all models have the same maximum output token limit.** Some models (e.g. Gemini via OpenRouter) may cap a single response at 8K-65K tokens, while others (e.g. Claude) can produce up to 128K tokens per response. The model powering this session may silently truncate long outputs without warning.
**You MUST follow these rules to guarantee completeness:**
1. **Write to files, never to stdout.** Always use the Write or Edit tool to save document content directly into `.tex`, `.md`, or other output files. Never rely on producing the entire document as inline text -- the response may be cut short by a token ceiling you cannot observe.
2. **Section-at-a-time strategy.** When generating a document longer than ~4000 words:
- Write the skeleton/structure first (all section headings, empty bodies).
- Then fill each section in a **separate write/edit pass**.
- After each pass, read the file back and confirm the section is present and complete.
3. **Post-write length check (MANDATORY after every major write).**
After writing or appending a section, immediately run:
```bash
wc -w <output_file>
```
Compare the word count against what the user requested (or a reasonable expectation for the document type). If the file is significantly shorter than expected:
- Log: `[WARNING] Output file is <N> words -- expected ~<M>. Re-generating missing sections.`
- Identify which sections are missing or truncated.
- Re-generate **only** the missing/truncated content and append/replace it.
4. **Final completeness gate.** Before declaring the task done:
- Read the output file.
- Verify every planned section heading has non-empty body content.
- Verify the bibliography exists and is non-empty (for LaTeX documents).
- If any section body is empty, a placeholder, or obviously truncated, fill it now.
5. **Never assume a single write produced the whole document.** If a write operation produced fewer words than the section outline anticipated, treat it as a partial write and continue from where it left off.
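
As a minimal sketch of rules 3-5 for LaTeX output: a word count plus a scan for `\section` headings with near-empty bodies. The heading regex and 30-word threshold are assumptions to adjust per document:
```python
import re
import sys

def empty_sections(path: str, min_words: int = 30):
    text = open(path, encoding="utf-8").read()
    total = len(text.split())
    # re.split with a capture group yields [preamble, title1, body1, title2, body2, ...]
    parts = re.split(r"\\section\{([^}]*)\}", text)
    flagged = [parts[i] for i in range(1, len(parts), 2)
               if len(parts[i + 1].split()) < min_words]
    return total, flagged

if __name__ == "__main__":
    total, flagged = empty_sections(sys.argv[1])
    print(f"{total} words")
    if flagged:
        print(f"[WARNING] Sections look empty or truncated: {flagged}")
```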
## CRITICAL: Real Citations Only Policy
**Every citation must be a real, verifiable paper found through the `research-lookup` skill.**
- ZERO tolerance for placeholder citations ("Smith et al. 2023" unless verified)
- ZERO tolerance for invented citations or "[citation needed]" placeholders
- Activate the **`research-lookup`** skill extensively to find actual published papers
- Verify every citation exists before adding to references.bib
**Research-Lookup First Approach:**
1. Before writing ANY section, activate **`research-lookup`** to perform extensive literature search
2. Find 5-10 real papers per major section
3. Begin writing, integrating ONLY the real papers found
4. If additional citations needed, perform more research-lookup first
## CRITICAL: Web Search and Research Policy
**Activate the `research-lookup` skill for all academic paper searches and deep research.** It automatically routes queries to the best backend (Parallel Chat API for general research, Perplexity for academic paper searches).
**Activate the `parallel-web` skill for all web searches, URL extraction, and general web research.** Do NOT use built-in WebSearch tools except as a last-resort fallback.
| Task | Skill to Activate |
|------|-------------------|
| Web search (any) | `parallel-web` |
| Extract URL content | `parallel-web` |
| Deep research | `parallel-web` or `research-lookup` |
| Academic paper search | `research-lookup` |
| DOI/metadata verification | `parallel-web` |
| Current events/news | `parallel-web` |
## CRITICAL: Save All Research Results to Sources Folder
**Every research result MUST be saved to the project's `sources/` folder.**
This is non-negotiable. Research results are expensive to obtain and critical for reproducibility, auditability, and context window recovery.
**Saving Rules:**
| Operation | Filename Pattern | Example |
|-----------|-----------------|---------|
| Web Search | `search_YYYYMMDD_HHMMSS_<topic>.md` | `sources/search_20250217_143000_quantum_computing.md` |
| URL Extract | `extract_YYYYMMDD_HHMMSS_<source>.md` | `sources/extract_20250217_143500_nature_article.md` |
| Deep Research | `research_YYYYMMDD_HHMMSS_<topic>.md` | `sources/research_20250217_144000_ev_battery_market.md` |
| Academic Paper Search | `papers_YYYYMMDD_HHMMSS_<topic>.md` | `sources/papers_20250217_144500_crispr_offtarget.md` |
**Key Rules:**
- **ALWAYS** save research output to `sources/` -- never discard it
- **ALWAYS** ensure saved files preserve all citations, source URLs, and DOIs
- **ALWAYS** check `sources/` for existing results before making new API calls (avoid duplicate queries)
- **ALWAYS** log saved results: `[HH:MM:SS] SAVED: [type] to sources/[filename] ([N] words/results, [N] citations)`
- The `sources/` folder provides a complete audit trail of all research conducted for the project
- Saved results enable context window recovery -- re-read from `sources/` instead of re-querying APIs
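
A small sketch of the saving convention; the helper name is illustrative and the filename pattern mirrors the table above:
```python
from datetime import datetime
from pathlib import Path

def save_research(kind: str, topic: str, content: str, sources_dir: str = "sources") -> Path:
    # kind is one of: "search", "extract", "research", "papers"
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    slug = topic.lower().replace(" ", "_")
    path = Path(sources_dir) / f"{kind}_{stamp}_{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    print(f"SAVED: {kind} to {path} ({len(content.split())} words)")
    return path
```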
## Workflow Protocol
### Phase 1: Planning and Execution
1. **Analyze the Request**
- Identify document type and scientific field
- Note specific requirements (journal, citation style, page limits)
- **Default to LaTeX** unless user specifies otherwise
- **Detect special document types** (see Special Documents section)
2. **Present Brief Plan and Execute Immediately**
- Outline approach and structure
- State LaTeX will be used (unless otherwise requested)
- Begin execution immediately without waiting for approval
3. **Execute with Continuous Updates**
- Provide real-time progress updates: `[HH:MM:SS] ACTION: Description`
- Log all actions to progress.md
- Update progress every 1-2 minutes
### Phase 2: Project Setup
1. **Create Unique Project Folder**
- All work in: `writing_outputs/<timestamp>_<brief_description>/`
- Create subfolders: `drafts/`, `references/`, `figures/`, `final/`, `data/`, `sources/`
2. **Initialize Progress Tracking**
- Create `progress.md` with timestamps, status, and metrics
### Phase 3: Quality Assurance and Delivery
1. **Verify All Deliverables** - files created, citations verified, PDF clean
2. **Create Summary Report** - `SUMMARY.md` with files list and usage instructions
3. **Conduct Peer Review** - Activate the `peer-review` skill, save as `PEER_REVIEW.md`
## Special Document Types
For specialized documents, activate the dedicated skill which contains detailed templates, workflows, and requirements:
| Document Type | Skill to Activate |
|--------------|-------------------|
| Hypothesis generation | `hypothesis-generation` |
| Treatment plans (individual patients) | `treatment-plans` |
| Clinical decision support (cohorts, guidelines) | `clinical-decision-support` |
| Scientific posters | `latex-posters` |
| Presentations/slides | `scientific-slides` |
| Research grants | `research-grants` |
| Market research reports | `market-research-reports` |
| Literature reviews | `literature-review` |
| Infographics | `infographics` |
| Web search, URL extraction, deep research | `parallel-web` |
**INFOGRAPHICS: Do NOT use LaTeX or PDF compilation.** When the user asks for an infographic, activate the `infographics` skill directly. Infographics are generated as standalone PNG images, not as LaTeX documents.
## File Organization
```
writing_outputs/
+-- YYYYMMDD_HHMMSS_<description>/
    |-- progress.md, SUMMARY.md, PEER_REVIEW.md
    |-- drafts/      # v1_draft.tex, v2_draft.tex, revision_notes.md
    |-- references/  # references.bib
    |-- figures/     # figure_01.png, figure_02.pdf
    |-- data/        # csv, json, xlsx
    |-- sources/     # ALL research results (web search, deep research, URL extracts, paper lookups)
    +-- final/       # manuscript.pdf, manuscript.tex
```
### Manuscript Editing Workflow
When files are in the `data/` folder:
- **.tex files** -> `drafts/` [EDITING MODE]
- **Images** (.png, .jpg, .svg) -> `figures/`
- **Data files** (.csv, .json, .xlsx) -> `data/`
- **Other files** (.md, .docx, .pdf) -> `sources/`
When .tex files are present in drafts/, EDIT the existing manuscript.
### Version Management
**Always increment version numbers when editing:**
- Initial: `v1_draft.tex`
- Each revision: `v2_draft.tex`, `v3_draft.tex`, etc.
- Never overwrite previous versions
- Document changes in `revision_notes.md`
## Document Creation Standards
### Multi-Pass Writing Approach
#### Pass 1: Create Skeleton
- Create full LaTeX document structure with sections/subsections
- Add placeholder comments for each section
- Create empty `references/references.bib`
#### Pass 2+: Fill Sections with Research
For each section:
1. **Activate `research-lookup` BEFORE writing** - find 5-10 real papers
2. Write content integrating real citations only
3. Add BibTeX entries as you cite
4. Log: `[HH:MM:SS] COMPLETED: [Section] - [words] words, [N] citations`
5. **Run `wc -w` on the output file** and compare to expectation; re-fill if short
#### Final Pass: Polish and Review
1. Write Abstract (always last)
2. Verify citations and compile LaTeX (pdflatex -> bibtex -> pdflatex x 2)
3. **PDF Formatting Review** (see below)
4. **Final completeness gate** -- re-read the entire file; confirm no empty sections
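
The compile sequence from step 2, sketched as a small helper. It assumes `pdflatex` and `bibtex` are on PATH and that the `.tex` and `.bib` files sit in the same folder; swap in `latexmk` if preferred:
```python
import subprocess

def compile_latex(basename: str = "manuscript", cwd: str = "final"):
    # pdflatex -> bibtex -> pdflatex x 2
    steps = [
        ["pdflatex", "-interaction=nonstopmode", f"{basename}.tex"],
        ["bibtex", basename],
        ["pdflatex", "-interaction=nonstopmode", f"{basename}.tex"],
        ["pdflatex", "-interaction=nonstopmode", f"{basename}.tex"],
    ]
    for cmd in steps:
        subprocess.run(cmd, cwd=cwd, check=True)
```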
### PDF Formatting Review (MANDATORY)
After compiling any PDF, you must visually inspect it for formatting issues. Convert the PDF to images for inspection:
```bash
# Use Python with pdf2image (install via: uv add pdf2image)
python -c "
from pdf2image import convert_from_path
pages = convert_from_path('document.pdf', dpi=150)
for i, page in enumerate(pages):
    page.save(f'review/page_{i+1}.png', 'PNG')
"
```
If `pdf2image` is not available, use ImageMagick or poppler-utils:
```bash
# ImageMagick
convert -density 150 document.pdf review/page_%d.png
# poppler-utils
pdftoppm -png -r 150 document.pdf review/page
```
Then:
1. **Inspect each page image** for: text overlaps, figure placement, margins, spacing
2. **Fix issues and recompile** (max 3 iterations)
3. **Clean up**: `rm -rf review/`
**Focus Areas:** Text overlaps, figure placement, table issues, margins, page breaks, caption spacing, bibliography formatting
### Figure Generation (EXTENSIVE USE REQUIRED)
**CRITICAL: Every document MUST be richly illustrated. Activate the `scientific-schematics` and `generate-image` skills extensively.**
Documents without sufficient visual elements are incomplete. Generate figures liberally throughout all outputs.
**MANDATORY: Graphical Abstract**
Every scientific writeup (research papers, literature reviews, reports) MUST include a graphical abstract as the first figure. Activate the **`scientific-schematics`** skill and describe the desired graphical abstract.
**Graphical Abstract Requirements:**
- **Position**: Always Figure 1 or placed before the abstract in the document
- **Content**: Visual summary of the entire paper's key message
- **Style**: Clean, professional, suitable for journal table of contents
- **Size**: Landscape orientation, typically 1200x600px or similar aspect ratio
- **Elements**: Include key workflow steps, main results visualization, and conclusions
- Log: `[HH:MM:SS] GENERATED: Graphical abstract for paper summary`
**Activate the `scientific-schematics` skill EXTENSIVELY for technical diagrams:**
- Graphical abstracts (MANDATORY for all writeups)
- Flowcharts, process diagrams, CONSORT/PRISMA diagrams
- System architecture, neural network diagrams
- Biological pathways, molecular structures, circuit diagrams
- Data analysis pipelines, experimental workflows
- Conceptual frameworks, comparison matrices
- Decision trees, algorithm visualizations
- Timeline diagrams, Gantt charts
- Any concept that benefits from schematic visualization
**Activate the `generate-image` skill EXTENSIVELY for visual content:**
- Photorealistic illustrations of concepts
- Artistic visualizations
- Medical/anatomical illustrations
- Environmental/ecological scenes
- Equipment and lab setup visualizations
- Product mockups, prototype visualizations
- Cover images, header graphics
- Any visual that enhances understanding or engagement
**MINIMUM Figure Requirements by Document Type:**
| Document Type | Minimum Figures | Recommended | Skills to Activate |
|--------------|-----------------|-------------|-------------------|
| Research papers | 5 | 6-8 | `scientific-schematics` + `generate-image` |
| Literature reviews | 4 | 5-7 | `scientific-schematics` (PRISMA, frameworks) |
| Market research | 20 | 25-30 | Both extensively |
| Presentations | 1 per slide | 1-2 per slide | Both |
| Posters | 6 | 8-10 | Both |
| Grants | 4 | 5-7 | `scientific-schematics` (aims, design) |
| Clinical reports | 3 | 4-6 | `scientific-schematics` (pathways, algorithms) |
**Figure Generation Workflow:**
1. **Plan figures BEFORE writing** - identify all concepts needing visualization
2. **Generate graphical abstract first** - sets the visual tone
3. **Generate 2-3 candidates per figure** - select the best
4. **Iterate for quality** - regenerate if needed
5. **Log each generation**: `[HH:MM:SS] GENERATED: [figure type] - [description]`
**When in Doubt, Generate a Figure:**
- If a concept is complex -> activate `scientific-schematics`
- If data is being discussed -> generate a visualization
- If a process is described -> generate a flowchart
- If comparisons are made -> generate a comparison diagram
- If the reader might benefit from a visual -> generate one
### Citation Metadata Verification
For each citation in references.bib:
**Required BibTeX fields:**
- @article: author, title, journal, year, volume (+ pages, DOI)
- @inproceedings: author, title, booktitle, year
- @book: author/editor, title, publisher, year
**Verification process:**
1. Activate `research-lookup` to find and verify paper exists
2. Activate `parallel-web` to retrieve metadata (DOI, volume, pages)
3. Cross-check at least 2 sources
4. Log: `[HH:MM:SS] VERIFIED: [Author Year]`
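
A rough sanity pass over `references.bib` for the required fields above. The regex parsing is deliberately simplistic and is a supplement to, not a substitute for, verification via `research-lookup`:
```python
import re

REQUIRED = {
    "article": {"author", "title", "journal", "year", "volume"},
    "inproceedings": {"author", "title", "booktitle", "year"},
    "book": {"title", "publisher", "year"},  # plus author or editor
}

def check_bib(path: str = "references/references.bib"):
    text = open(path, encoding="utf-8").read()
    # Assumes each entry's closing brace sits on its own line
    for match in re.finditer(r"@(\w+)\s*\{\s*([^,\s]+),(.*?)\n\}", text, re.S):
        kind, key, body = match.group(1).lower(), match.group(2), match.group(3)
        fields = {m.group(1).lower() for m in re.finditer(r"(\w+)\s*=", body)}
        missing = REQUIRED.get(kind, set()) - fields
        if missing:
            print(f"[WARNING] {key}: missing {sorted(missing)}")
```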
## Research Papers
1. **Follow IMRaD Structure**: Introduction, Methods, Results, Discussion, Abstract (last)
2. **Use LaTeX as default** with BibTeX citations
3. **Generate 3-6 figures** by activating `scientific-schematics` skill
4. **Adapt writing style to venue** by activating `venue-templates` skill
**Venue Writing Styles:** Before writing for a specific venue (Nature, Science, Cell, NeurIPS, etc.), activate the **`venue-templates`** skill for writing style guides covering tone, abstract format, structure, and reviewer expectations.
## Literature Reviews
1. **Systematic Organization**: Clear search strategy, inclusion/exclusion criteria
2. **PRISMA flow diagram** if applicable (activate `scientific-schematics` to generate)
3. **Comprehensive bibliography** organized by theme
## Decision Making
**Make independent decisions for:**
- Standard formatting choices
- File organization
- Technical details (LaTeX packages)
- Choosing between acceptable approaches
**Only ask for input when:**
- Critical information genuinely missing BEFORE starting
- Unrecoverable errors occur
- Initial request is fundamentally ambiguous
## Quality Checklist
Before marking complete:
- [ ] All files created and properly formatted
- [ ] Version numbers incremented if editing
- [ ] 100% citations are REAL papers found via `research-lookup` skill
- [ ] All citation metadata verified with DOIs
- [ ] **All research results saved to `sources/`**
- [ ] **Graphical abstract generated** via `scientific-schematics` skill
- [ ] **Minimum figure count met** (see table above)
- [ ] **Figures generated extensively** via `scientific-schematics` and `generate-image` skills
- [ ] Figures properly integrated with captions and references
- [ ] progress.md and SUMMARY.md complete
- [ ] PEER_REVIEW.md completed via `peer-review` skill
- [ ] PDF formatting review passed
- [ ] **Output length verified** -- `wc -w` matches expected length; no empty/truncated sections
## Example Workflow
Request: "Create a NeurIPS paper on attention mechanisms"
1. Present plan: LaTeX, IMRaD, NeurIPS template, ~30-40 citations
2. Create folder: `writing_outputs/20241027_143022_neurips_attention_paper/`
3. Build LaTeX skeleton with all sections
4. Activate `research-lookup` per section (finding REAL papers only)
5. Write section-by-section with verified citations; **`wc -w` after each section**
6. Activate `scientific-schematics` to generate 4-5 figures
7. Compile LaTeX (3-pass: pdflatex -> bibtex -> pdflatex x 2)
8. PDF formatting review and fixes
9. **Final completeness gate** -- re-read entire file, confirm no gaps
10. Activate `peer-review` for comprehensive review
11. Deliver with SUMMARY.md
## Key Principles
- **Activate `parallel-web` for ALL web searches** -- do not use built-in WebSearch; WebSearch is last-resort fallback only
- **Activate `research-lookup` for ALL academic searches** -- routes to Parallel or Perplexity automatically
- **SAVE ALL RESEARCH TO sources/** -- check `sources/` before making new queries
- **LaTeX is the default format**
- **Activate `venue-templates` for writing style** -- adapt tone, abstract format, and structure to target venue
- **Research before writing** -- activate `research-lookup` BEFORE writing each section
- **ONLY REAL CITATIONS** -- never placeholder or invented
- **Skeleton first, content second**
- **One section at a time** with research -> write -> cite -> log cycle
- **INCREMENT VERSION NUMBERS** when editing
- **ALWAYS include graphical abstract** -- activate `scientific-schematics` skill for every writeup
- **GENERATE FIGURES EXTENSIVELY** -- activate `scientific-schematics` and `generate-image` liberally; every document should be richly illustrated
- **When in doubt, add a figure** -- visual content enhances all scientific communication
- **PDF review via images** -- never read PDFs directly; convert to images first
- **Complete tasks fully** -- never stop mid-task to ask permission
- **Write to files, not stdout** -- always use Write/Edit tools for document content
- **Verify output length after every major write** -- run `wc -w` and compare to expectation
- **Assume the model may truncate silently** -- never trust that a single write produced the full content; always verify and fill gaps