mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-27 07:09:27 +08:00
---
name: pufferlib
description: High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
license: MIT license
metadata:
    skill-author: K-Dense Inc.
---

# PufferLib - High-Performance Reinforcement Learning

## Overview

PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It reaches training throughput of millions of steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and integrates with Gymnasium, PettingZoo, and specialized RL frameworks.

## When to Use This Skill

Use this skill when:
- **Training RL agents** with PPO on any environment (single or multi-agent)
- **Creating custom environments** using the PufferEnv API
- **Optimizing performance** for parallel environment simulation (vectorization)
- **Integrating existing environments** from Gymnasium, PettingZoo, Atari, Procgen, etc.
- **Developing policies** with CNN, LSTM, or custom architectures
- **Scaling RL** to millions of steps per second for faster experimentation
- **Multi-agent RL** with native multi-agent environment support

## Core Capabilities

### 1. High-Performance Training (PuffeRL)

PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.

**Quick start training:**
```bash
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py
```

**Python training loop:**
```python
import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer
# `my_policy` is a torch.nn.Module you define (see Policy Development below)
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)

# Training loop
num_iterations = 1000  # Placeholder; set to your training budget
for iteration in range(num_iterations):
    trainer.evaluate()      # Collect rollouts
    trainer.train()         # Train on batch
    trainer.mean_and_log()  # Log results
```

**For comprehensive training guidance**, read `references/training.md` for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resuming training
- Performance optimization tips
- Curriculum learning patterns

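To make the training numbers above concrete, the sketch below works through the arithmetic that links `batch_size`, `num_envs`, and throughput. It assumes a batch is filled by stepping every environment the same number of times, which may not match how PuffeRL segments rollouts internally; the helper names `rollout_horizon` and `seconds_per_batch` are hypothetical and not part of PufferLib.

```python
def rollout_horizon(batch_size: int, num_envs: int) -> int:
    """Steps each environment contributes to one batch (assumes an even split)."""
    if batch_size % num_envs != 0:
        raise ValueError("batch_size must be a multiple of num_envs")
    return batch_size // num_envs


def seconds_per_batch(batch_size: int, steps_per_second: float) -> float:
    """Wall-clock time needed to collect one batch at a given throughput."""
    return batch_size / steps_per_second


# Values taken from the training example: 256 environments, batch of 32768 steps.
horizon = rollout_horizon(batch_size=32768, num_envs=256)
collect_time = seconds_per_batch(batch_size=32768, steps_per_second=1_000_000)

print(horizon)                # 128
print(f"{collect_time:.4f}")  # 0.0328
```

Under these assumptions, each environment contributes 128 steps per batch, and at 1M steps/second a batch takes roughly 33 ms to collect, so per-batch overhead in the trainer matters as much as raw simulation speed.
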
### 2. Environment Development (PufferEnv)

Create custom high-performance environments with the PufferEnv API.

**Basic environment structure:**
```python
import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)

        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done.
        # _get_observation, _compute_reward, and _is_done are
        # task-specific helpers that you implement.
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}

        return obs, reward, done, info
```

**Use the template script:** `scripts/env_template.py` provides complete single-agent and multi-agent environment templates with examples of:
- Different observation space types (vector, image, dict)
- Action space variations (discrete, continuous, multi-discrete)
- Multi-agent environment structure
- Testing utilities

**For complete environment development**, read `references/environments.md` for:
- PufferEnv API details and in-place operation patterns
- Observation and action space definitions
- Multi-agent environment creation
- Ocean suite (20+ pre-built environments)
- Performance optimization (Python to C workflow)
- Environment wrappers and best practices
- Debugging and validation techniques

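The in-place operation pattern mentioned above can be illustrated without PufferLib at all. The sketch below is a hypothetical `CounterEnvs` class, written with plain NumPy, that allocates its observation, reward, and done arrays once and overwrites them on every step. It is not the PufferEnv API; it only shows why in-place writes avoid per-step allocations.

```python
import numpy as np


class CounterEnvs:
    """Hypothetical batch of counters that writes results into preallocated arrays."""

    def __init__(self, num_envs: int, limit: int = 3):
        self.limit = limit
        self.counts = np.zeros(num_envs, dtype=np.int64)
        # Allocated once and reused on every step.
        self.observations = np.zeros((num_envs, 1), dtype=np.float32)
        self.rewards = np.zeros(num_envs, dtype=np.float32)
        self.dones = np.zeros(num_envs, dtype=bool)

    def step(self, actions: np.ndarray) -> None:
        self.counts += actions                        # in-place update
        np.greater_equal(self.counts, self.limit, out=self.dones)
        self.rewards[:] = self.dones                  # 1.0 where the limit was reached
        self.counts[self.dones] = 0                   # auto-reset finished envs
        self.observations[:, 0] = self.counts         # overwrite, no new array


envs = CounterEnvs(num_envs=4)
obs_buffer = envs.observations          # keep a reference to the buffer
for _ in range(3):
    envs.step(np.array([1, 0, 1, 1]))

print(envs.observations is obs_buffer)  # True: same memory throughout
print(envs.rewards)                     # [1. 0. 1. 1.]
```

Because the arrays never change identity, a caller can hold a reference to them once and read fresh data after each step.
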
### 3. Vectorization and Performance

Achieve maximum throughput with optimized parallel simulation.

**Vectorization setup:**
```python
import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
```

**Key optimizations:**
- Shared memory buffers for zero-copy observation passing
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker

**For vectorization optimization**, read `references/vectorization.md` for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting

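As a rough illustration of the shared-memory idea listed under the key optimizations, the sketch below uses Python's standard `multiprocessing.shared_memory` module with NumPy views. It is not PufferLib's implementation: both handles live in one process for brevity, whereas a real worker would attach from a separate process, and the sizes `NUM_ENVS` and `OBS_DIM` are arbitrary.

```python
import numpy as np
from multiprocessing import shared_memory

NUM_ENVS, OBS_DIM = 4, 3
shape, dtype = (NUM_ENVS, OBS_DIM), np.float32
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

# The "trainer" side allocates one block holding every environment's observation.
block = shared_memory.SharedMemory(create=True, size=nbytes)
trainer_view = np.ndarray(shape, dtype=dtype, buffer=block.buf)
trainer_view[:] = 0.0

# A "worker" attaches to the same block by name (normally from another process).
attached = shared_memory.SharedMemory(name=block.name)
worker_view = np.ndarray(shape, dtype=dtype, buffer=attached.buf)
worker_view[2] = [1.0, 2.0, 3.0]   # write env 2's observation in place

# The trainer sees the write without any pickling, pipe, or queue traffic.
row = trainer_view[2].tolist()
print(row)

# Drop the array views before closing, then free the block.
del trainer_view, worker_view
attached.close()
block.close()
block.unlink()
```

The observation data is never serialized or sent; only the block's name has to be communicated once, which is what makes this layout attractive for high step rates.
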
### 4. Policy Development

Build policies as standard PyTorch modules with optional utilities.

**Basic policy structure:**
```python
import numpy as np
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()

        # Derive layer sizes from the spaces
        # (assumes flat vector observations and a discrete action space)
        obs_dim = int(np.prod(observation_space.shape))
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
```

**For complete policy development**, read `references/policies.md` for:
- CNN policies for image observations
- Recurrent policies with optimized LSTM (3x faster inference)
- Multi-input policies for complex observations
- Continuous action policies
- Multi-agent policies (shared vs independent parameters)
- Advanced architectures (attention, residual)
- Observation normalization and gradient clipping
- Policy debugging and testing

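The actor head in the policy example returns raw logits, one per discrete action. As a minimal, framework-free sketch of how logits become an action, the code below applies a numerically stable softmax and samples from the result. The helpers `softmax` and `sample_action` are hypothetical names for illustration; in practice a PyTorch distribution would normally do this step.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Draw one action index according to the softmax probabilities."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]


# Equal logits give a uniform policy over the four actions.
print(softmax([0.0, 0.0, 0.0, 0.0]))  # [0.25, 0.25, 0.25, 0.25]

# Unequal logits shift probability toward the largest one.
probs = softmax([2.0, 0.5, -1.0, 0.0])
action = sample_action([2.0, 0.5, -1.0, 0.0], random.Random(0))
print(probs.index(max(probs)), action in range(4))
```

This is also why a small initialization scale on the actor head (such as `std=0.01` in the example) is a common choice: near-zero logits produce a near-uniform policy, so early exploration is not biased toward any action.
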
### 5. Environment Integration

Seamlessly integrate environments from popular RL frameworks.

**Gymnasium integration:**
```python
import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
```

**PettingZoo multi-agent:**
```python
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
```

**Supported frameworks:**
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...

**For integration details**, read `references/integration.md` for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat)
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging

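Of the custom wrappers listed above, action repeat is the simplest to sketch. The example below is a plain-Python illustration, not PufferLib or Gymnasium code: `CountdownEnv` is a hypothetical toy environment, and both classes follow the four-value `step()` return used in the PufferEnv example earlier in this document.

```python
class CountdownEnv:
    """Hypothetical toy env: every step pays reward 1.0; the episode ends after 5 steps."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return self.t, 1.0, done, {}


class ActionRepeat:
    """Repeat each action `repeat` times and sum the rewards."""

    def __init__(self, env, repeat: int):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:  # stop early so a finished episode is never stepped
                break
        return obs, total_reward, done, info


env = ActionRepeat(CountdownEnv(), repeat=3)
env.reset()
print(env.step(0))  # (3, 3.0, False, {})
print(env.step(0))  # (5, 2.0, True, {})
```

The early `break` is the detail that matters most: without it, the wrapper would keep stepping an environment whose episode has already ended.
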
## Quick Start Workflow

### For Training Existing Environments

1. Choose environment from Ocean suite or compatible framework
2. Use `scripts/train_template.py` as starting point
3. Configure hyperparameters for your task
4. Run training with CLI or Python script
5. Monitor with Weights & Biases or Neptune
6. Refer to `references/training.md` for optimization

### For Creating Custom Environments

1. Start with `scripts/env_template.py`
2. Define observation and action spaces
3. Implement `reset()` and `step()` methods
4. Test environment locally
5. Vectorize with `pufferlib.emulate()` or `make()`
6. Refer to `references/environments.md` for advanced patterns
7. Optimize with `references/vectorization.md` if needed

### For Policy Development

1. Choose architecture based on observations:
   - Vector observations → MLP policy
   - Image observations → CNN policy
   - Sequential tasks → LSTM policy
   - Complex observations → Multi-input policy
2. Use `layer_init` for proper weight initialization
3. Follow patterns in `references/policies.md`
4. Test with environment before full training

### For Performance Optimization

1. Profile current throughput (steps per second)
2. Check vectorization configuration (num_envs, num_workers)
3. Optimize environment code (in-place ops, numpy vectorization)
4. Consider C implementation for critical paths
5. Use `references/vectorization.md` for systematic optimization

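Step 1 of the performance workflow, profiling throughput, needs nothing more than a timer. The sketch below is one possible harness using only the standard library; `measure_sps` and `dummy_step` are hypothetical names, and `dummy_step` stands in for a real batched `env.step(actions)` call.

```python
import time


def measure_sps(step_fn, num_envs: int, num_steps: int) -> float:
    """Return environment steps per second for a batched step function."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Each call advances every environment once, so count num_envs steps per call.
    return num_envs * num_steps / elapsed


def dummy_step():
    """Stand-in workload; replace with a call that steps your vectorized env."""
    sum(range(100))


sps = measure_sps(dummy_step, num_envs=256, num_steps=1_000)
print(f"{sps:,.0f} steps/second")
```

Measuring the environment alone first, and then again with the policy and training in the loop, shows which side is the bottleneck before any optimization work starts.
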
## Resources

### scripts/

**train_template.py** - Complete training script template with:
- Environment creation and configuration
- Policy initialization
- Logger integration (WandB, Neptune)
- Training loop with checkpointing
- Command-line argument parsing
- Multi-GPU distributed training setup

**env_template.py** - Environment implementation templates:
- Single-agent PufferEnv example (grid world)
- Multi-agent PufferEnv example (cooperative navigation)
- Multiple observation/action space patterns
- Testing utilities

### references/

**training.md** - Comprehensive training guide:
- Training workflow and CLI options
- Hyperparameter configuration
- Distributed training (multi-GPU, multi-node)
- Monitoring and logging
- Checkpointing
- Protein hyperparameter tuning
- Performance optimization
- Common training patterns
- Troubleshooting

**environments.md** - Environment development guide:
- PufferEnv API and characteristics
- Observation and action spaces
- Multi-agent environments
- Ocean suite environments
- Custom environment development workflow
- Python to C optimization path
- Third-party environment integration
- Wrappers and best practices
- Debugging

**vectorization.md** - Vectorization optimization:
- Architecture and key optimizations
- Vectorization modes (serial, multiprocessing, async)
- Worker and batch configuration
- Shared memory and zero-copy patterns
- Advanced vectorization (hierarchical, custom)
- Multi-agent vectorization
- Performance monitoring and profiling
- Troubleshooting and best practices

**policies.md** - Policy architecture guide:
- Basic policy structure
- CNN policies for images
- LSTM policies with optimization
- Multi-input policies
- Continuous action policies
- Multi-agent policies
- Advanced architectures (attention, residual)
- Observation processing and unflattening
- Initialization and normalization
- Debugging and testing

**integration.md** - Framework integration guide:
- Gymnasium integration
- PettingZoo integration (parallel and AEC)
- Third-party environments (Procgen, NetHack, Minigrid, etc.)
- Custom wrappers (observation, reward, frame stacking, etc.)
- Space conversion and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Debugging integration

## Tips for Success

1. **Start simple**: Begin with Ocean environments or Gymnasium integration before creating custom environments

2. **Profile early**: Measure steps per second from the start to identify bottlenecks

3. **Use templates**: `scripts/train_template.py` and `scripts/env_template.py` provide solid starting points

4. **Read references as needed**: Each reference file is self-contained and focused on a specific capability

5. **Optimize progressively**: Start with Python, profile, then optimize critical paths with C if needed

6. **Leverage vectorization**: PufferLib's vectorization is key to achieving high throughput

7. **Monitor training**: Use WandB or Neptune to track experiments and identify issues early

8. **Test environments**: Validate environment logic before scaling up training

9. **Check existing environments**: Ocean suite provides 20+ pre-built environments

10. **Use proper initialization**: Always use `layer_init` from `pufferlib.pytorch` for policies

## Common Use Cases

### Training on Standard Benchmarks
```python
import pufferlib

# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
```

### Multi-Agent Learning
```python
import pufferlib
from pufferlib import PuffeRL

# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
# (`create_policy` is a factory function you define)
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
```

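One common way to run a single shared policy over many agents is to fold the agent axis into the batch axis, so the network sees one row per agent. The NumPy sketch below illustrates only that reshaping step, under assumed sizes (20 agents per environment, 8 observation features, 3 actions); it says nothing about how PufferLib lays out multi-agent data internally.

```python
import numpy as np

# Assumed sizes for illustration only.
num_envs, num_agents, obs_dim = 128, 20, 8
observations = np.zeros((num_envs, num_agents, obs_dim), dtype=np.float32)

# Fold the agent axis into the batch axis: the policy sees one row per agent.
flat_obs = observations.reshape(num_envs * num_agents, obs_dim)

# Stand-in for the shared policy's output: one discrete action per row.
flat_actions = np.arange(num_envs * num_agents) % 3

# Restore the (env, agent) layout before handing actions back to the environments.
actions = flat_actions.reshape(num_envs, num_agents)

print(flat_obs.shape, actions.shape)            # (2560, 8) (128, 20)
print(np.shares_memory(flat_obs, observations)) # True: the reshape is a view, not a copy
```

Because the reshape is a view on contiguous data, sharing parameters across agents adds no copying cost; the policy simply runs on a batch that is `num_agents` times larger.
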
### Custom Task Development
```python
import pufferlib
from pufferlib import PufferEnv, PuffeRL

# Create custom environment
class MyTask(PufferEnv):
    ...  # implement environment

# Vectorize and train (`my_policy` is a torch.nn.Module you define)
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
```

### High-Performance Optimization
```python
import pufferlib

# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,       # Large batch
    num_workers=16,      # Many workers
    envs_per_worker=64   # 1024 envs / 16 workers
)
```

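The three numbers in the configuration above are linked: 1024 environments spread over 16 workers gives 64 per worker. The sketch below makes that partitioning explicit with a hypothetical `worker_slices` helper that assigns each worker a contiguous range of environment indices. It assumes an even split and is independent of PufferLib.

```python
def worker_slices(num_envs: int, num_workers: int) -> list[range]:
    """Assign each worker a contiguous, equally sized block of environment indices."""
    if num_envs % num_workers != 0:
        raise ValueError("num_envs must be divisible by num_workers")
    per_worker = num_envs // num_workers
    return [range(w * per_worker, (w + 1) * per_worker) for w in range(num_workers)]


slices = worker_slices(num_envs=1024, num_workers=16)
print(len(slices))     # 16
print(len(slices[0]))  # 64
print(slices[1])       # range(64, 128)
```

Contiguous blocks mean each worker can write its results into one slice of a shared observation array, which fits the shared-memory layout described in the vectorization section.
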
## Installation

```bash
uv pip install pufferlib
```

## Documentation

- Official docs: https://puffer.ai/docs.html
- GitHub: https://github.com/PufferAI/PufferLib
- Discord: Community support available