# Modal Resource Configuration
## CPU
### Requesting CPU
```python
import modal

app = modal.App()

@app.function(cpu=4.0)
def compute():
    ...
```
- Values are **physical cores**, not vCPUs
- Default: 0.125 cores
- Modal auto-sets `OPENBLAS_NUM_THREADS`, `OMP_NUM_THREADS`, `MKL_NUM_THREADS` based on your CPU request (see the sketch below)
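A quick way to confirm this from inside a container (a minimal sketch; only the env var names come from the list above, the function itself is illustrative):
```python
import os

@app.function(cpu=4.0)
def show_thread_env():
    # Modal sizes these to the CPU request so BLAS/OpenMP libraries
    # don't oversubscribe the requested cores
    for var in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        print(var, os.environ.get(var))
```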
### CPU Limits
- Default soft limit: 16 physical cores above the CPU request
- Pass a `(request, limit)` tuple to set an explicit cap and avoid noisy-neighbor effects:
```python
@app.function(cpu=(4.0, 8.0))  # request 4 cores, allow bursting up to 8
def bounded_compute():
    ...
```
## Memory
### Requesting Memory
```python
@app.function(memory=16384)  # 16 GiB in MiB
def large_data():
    ...
```
- Value in **MiB** (mebibytes)
- Default: 128 MiB
### Memory Limits
Pass a `(request, limit)` tuple to set a hard cap; containers that exceed the limit are OOM-killed:
```python
@app.function(memory=(8192, 8192))  # request 8 GiB and hard-limit at 8 GiB
def bounded_memory():
    ...
```
This keeps a runaway memory leak from accruing charges indefinitely.
## Ephemeral Disk
For temporary storage within a container's lifetime:
```python
@app.function(ephemeral_disk=102400)  # 100 GiB in MiB
def process_dataset():
    # Temporary files at /tmp or anywhere in the container filesystem
    ...
```
- Value in **MiB**
- Default: 512 GiB quota per container
- Maximum: 3,145,728 MiB (3 TiB)
- Data is lost when the container shuts down
- Use Volumes for persistent storage

Larger disk requests increase the billed memory request at a 20:1 ratio: the 100 GiB request above adds 102,400 MiB / 20 = 5,120 MiB (5 GiB) to the memory request used for billing.
## Timeout
```python
@app.function(timeout=3600)  # 1 hour in seconds
def long_running():
    ...
```
- Default: 300 seconds (5 minutes)
- Maximum: 86,400 seconds (24 hours)
- The function call is killed when the timeout expires (see the caller-side sketch below)
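On the caller side, an expired timeout surfaces as an exception that can be caught and handled (a minimal sketch using the `long_running` function above; the retry strategy is illustrative):
```python
import modal

@app.local_entrypoint()
def main():
    try:
        long_running.remote()
    except modal.exception.FunctionTimeoutError:
        # the call exceeded timeout=3600 and was killed server-side
        print("timed out; raise timeout= or split the work into smaller inputs")
```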
## Billing
You are charged based on **whichever is higher**: your resource request or actual usage.

| Resource | Billing Basis |
|----------|--------------|
| CPU | max(requested, used) |
| Memory | max(requested, used) |
| GPU | Time GPU is allocated |
| Disk | Increases memory billing at 20:1 ratio |
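To make the rule concrete, here is a toy restatement of the table's `max(requested, used)` basis for CPU and memory (an illustration only, not Modal's actual metering code):
```python
def billed(requested: float, used: float) -> float:
    # CPU and memory bill on whichever is higher: the request or actual usage
    return max(requested, used)

billed(4.0, 2.3)     # -> 4.0 cores: usage below the request, you pay the request
billed(8192, 12000)  # -> 12000 MiB: usage above the request, you pay for usage
```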
### Cost Optimization Tips
- Request only what you need
- Use appropriate GPU tiers (e.g., an L40S instead of an H100 for lighter inference workloads)
- Set `scaledown_window` to minimize idle time
- Use `min_containers=0` when cold starts are acceptable
- Batch inputs with `.map()` instead of individual `.remote()` calls (see the sketch below)
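A sketch combining several of these knobs (the app name and function body are illustrative; `scaledown_window` and `min_containers` are the parameters named above):
```python
import modal

app = modal.App("cost-demo")  # illustrative app name

@app.function(
    cpu=1.0,
    scaledown_window=60,  # scale idle containers down after 60 s
    min_containers=0,     # scale to zero when idle; cold starts are acceptable
)
def score(x: int) -> int:
    return x * x

@app.local_entrypoint()
def main():
    # .map() fans out over inputs on a shared container pool,
    # instead of paying a scheduling round-trip per .remote() call
    print(sum(score.map(range(100))))
```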
## Complete Example
```python
@app.function(
    cpu=8.0,                # 8 physical cores
    memory=32768,           # 32 GiB
    gpu="L40S",             # L40S GPU
    ephemeral_disk=204800,  # 200 GiB temp disk
    timeout=7200,           # 2 hours
    max_containers=50,
    min_containers=1,
)
def full_pipeline(data_path: str):
    ...
```