# Modal Resource Configuration
## CPU
### Requesting CPU
```python
import modal

app = modal.App()

@app.function(cpu=4.0)
def compute():
    ...
```
- Values are **physical cores**, not vCPUs
- Default: 0.125 cores
- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request (see the sketch below)
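A quick way to confirm this from inside a container (a minimal sketch; only the env var names come from the list above, the function itself is illustrative):
```python
import os

@app.function(cpu=4.0)
def show_thread_env():
    # Modal sizes these to the CPU request so BLAS/OpenMP libraries
    # don't oversubscribe the requested cores
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        print(var, os.environ.get(var))
```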
### CPU Limits
- Default soft limit: 16 physical cores above the CPU request
- Pass a `(request, limit)` tuple to set an explicit cap and avoid noisy-neighbor effects:
```python
@app.function(cpu=(4.0, 8.0))  # request 4 cores, allow bursting up to 8
def bounded_compute():
    ...
```
## Memory
### Requesting Memory
```python
@app.function(memory=16384)  # 16 GiB in MiB
def large_data():
    ...
```
- Value in **MiB** (mebibytes)
- Default: 128 MiB
### Memory Limits
Pass a `(request, limit)` tuple to set a hard cap; containers that exceed the limit are OOM-killed:
```python
@app.function(memory=(8192, 8192))  # request 8 GiB and hard-limit at 8 GiB
def bounded_memory():
    ...
```
This keeps a runaway memory leak from accruing charges indefinitely.
## Ephemeral Disk
For temporary storage within a container's lifetime:
```python
@app.function(ephemeral_disk=102400)  # 100 GiB in MiB
def process_dataset():
    # Temporary files at /tmp or anywhere in the container filesystem
    ...
```
- Value in **MiB**
- Default: 512 GiB quota per container
- Maximum: 3,145,728 MiB (3 TiB)
- Data is lost when the container shuts down
- Use Volumes for persistent storage

Larger disk requests increase the billed memory request at a 20:1 ratio: the 100 GiB request above adds 102,400 MiB / 20 = 5,120 MiB (5 GiB) to the memory request used for billing.
## Timeout
```python
@app.function(timeout=3600)  # 1 hour in seconds
def long_running():
    ...
```
- Default: 300 seconds (5 minutes)
- Maximum: 86,400 seconds (24 hours)
- The function call is killed when the timeout expires (see the caller-side sketch below)
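On the caller side, an expired timeout surfaces as an exception that can be caught and handled (a minimal sketch using the `long_running` function above; the retry strategy is illustrative):
```python
import modal

@app.local_entrypoint()
def main():
    try:
        long_running.remote()
    except modal.exception.FunctionTimeoutError:
        # the call exceeded timeout=3600 and was killed server-side
        print("timed out; raise timeout= or split the work into smaller inputs")
```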
## Billing
You are charged based on **whichever is higher**: your resource request or actual usage.

| Resource | Billing Basis |
|----------|--------------|
| CPU | max(requested, used) |
| Memory | max(requested, used) |
| GPU | Time GPU is allocated |
| Disk | Increases memory billing at 20:1 ratio |
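To make the rule concrete, here is a toy restatement of the table's `max(requested, used)` basis for CPU and memory (an illustration only, not Modal's actual metering code):
```python
def billed(requested: float, used: float) -> float:
    # CPU and memory bill on whichever is higher: the request or actual usage
    return max(requested, used)

billed(4.0, 2.3)     # -> 4.0 cores: usage below the request, you pay the request
billed(8192, 12000)  # -> 12000 MiB: usage above the request, you pay for usage
```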
### Cost Optimization Tips
- Request only what you need
- Use appropriate GPU tiers (e.g., an L40S instead of an H100 for lighter inference workloads)
- Set `scaledown_window` to minimize idle time
- Use `min_containers=0` when cold starts are acceptable
- Batch inputs with `.map()` instead of individual `.remote()` calls (see the sketch below)
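A sketch combining several of these knobs (the app name and function body are illustrative; `scaledown_window` and `min_containers` are the parameters named above):
```python
import modal

app = modal.App("cost-demo")  # illustrative app name

@app.function(
    cpu=1.0,
    scaledown_window=60,  # scale idle containers down after 60 s
    min_containers=0,     # scale to zero when idle; cold starts are acceptable
)
def score(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    # .map() fans out over inputs on a shared container pool,
    # instead of paying a scheduling round-trip per .remote() call
    print(sum(score.map(range(100))))
```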
## Complete Example
```python
@app.function(
    cpu=8.0,                # 8 physical cores
    memory=32768,           # 32 GiB
    gpu="L40S",             # L40S GPU
    ephemeral_disk=204800,  # 200 GiB temp disk
    timeout=7200,           # 2 hours
    max_containers=50,
    min_containers=1,
)
def full_pipeline(data_path: str):
    ...
```