<!-- mirror of https://github.com/K-Dense-AI/claude-scientific-skills.git, synced 2026-03-28 07:33:45 +08:00 -->
---
name: transformers
description: "Hugging Face Transformers. Load BERT, GPT, T5, ViT, CLIP, Llama models, fine-tune, text generation, classification, NER, pipelines, LoRA, for NLP/vision/audio tasks."
---

# Transformers

## Overview

Transformers is Hugging Face's flagship library, providing unified access to over 1 million pretrained models for machine learning across text, vision, audio, and multimodal domains. The library serves as a standardized model-definition framework compatible with PyTorch, TensorFlow, and JAX, emphasizing ease of use through three core components:

- **Pipeline**: Simple, optimized inference API for common tasks
- **AutoClasses**: Automatic model/tokenizer selection from pretrained checkpoints
- **Trainer**: Full-featured training loop with distributed training, mixed precision, and optimization

The library prioritizes accessibility: pretrained models reduce computational cost and carbon footprint, and compatibility spans the major training frameworks (PyTorch Lightning, DeepSpeed, vLLM, etc.).
## Quick Start with Pipelines

Use pipelines for simple, efficient inference without managing models, tokenizers, or preprocessing manually. Pipelines abstract that complexity into a single function call.

### Basic Pipeline Usage

```python
from transformers import pipeline

# Text classification
classifier = pipeline("text-classification")
result = classifier("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")
generator("The secret to baking a good cake is", max_length=50)

# Question answering
qa = pipeline("question-answering")
qa(question="What is extractive QA?", context="Extractive QA is...")

# Image classification
img_classifier = pipeline("image-classification")
img_classifier("path/to/image.jpg")

# Automatic speech recognition
transcriber = pipeline("automatic-speech-recognition")
transcriber("audio_file.mp3")
```

### Available Pipeline Tasks

**NLP Tasks:**
- `text-classification`, `token-classification`, `question-answering`
- `fill-mask`, `summarization`, `translation`
- `text-generation`, `conversational`
- `zero-shot-classification`, `sentiment-analysis`

**Vision Tasks:**
- `image-classification`, `image-segmentation`, `object-detection`
- `depth-estimation`, `image-to-image`, `zero-shot-image-classification`

**Audio Tasks:**
- `automatic-speech-recognition`, `audio-classification`
- `text-to-audio`, `zero-shot-audio-classification`

**Multimodal Tasks:**
- `visual-question-answering`, `document-question-answering`
- `image-to-text`, `zero-shot-object-detection`

### Pipeline Best Practices

**Device Management:**
```python
from transformers import pipeline, infer_device

device = infer_device()  # Auto-detect best device
pipe = pipeline("text-generation", model="...", device=device)
```

**Batch Processing:**
```python
# Process multiple inputs efficiently
results = classifier(["Text 1", "Text 2", "Text 3"])

# Use KeyDataset for large datasets
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset

dataset = load_dataset("imdb", split="test")
for result in pipe(KeyDataset(dataset, "text")):
    print(result)
```

**Memory Optimization:**
```python
import torch

# Use half-precision for faster inference
pipe = pipeline(
    "text-generation", model="...",
    torch_dtype=torch.float16, device="cuda"
)
```

## Core Components

### AutoClasses for Model Loading

AutoClasses automatically select the correct architecture based on pretrained checkpoints.

```python
from transformers import (
    AutoModel, AutoTokenizer, AutoConfig,
    AutoModelForCausalLM, AutoModelForSequenceClassification
)

# Load any model by checkpoint name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Task-specific model classes
causal_lm = AutoModelForCausalLM.from_pretrained("gpt2")
classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3
)

# Load with device and dtype optimization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",   # Automatically distribute across devices
    torch_dtype="auto"   # Use optimal dtype
)
```

**Key Parameters:**
- `device_map="auto"`: Optimal device allocation (CPU/GPU/multi-GPU)
- `torch_dtype`: Control precision (`torch.float16`, `torch.bfloat16`, `"auto"`)
- `trust_remote_code`: Enable custom model code (use cautiously)
- `use_fast`: Enable Rust-backed fast tokenizers (default `True`)

### Tokenization

Tokenizers convert text to model-compatible tensor inputs.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Basic tokenization
tokens = tokenizer.tokenize("Hello, how are you?")
# ['hello', ',', 'how', 'are', 'you', '?']

# Encoding (text → token IDs)
encoded = tokenizer("Hello, how are you?", return_tensors="pt")
# {'input_ids': tensor([[...]]), 'attention_mask': tensor([[...]])}

# Batch encoding with padding and truncation
batch = tokenizer(
    ["Short text", "This is a much longer text..."],
    padding=True,     # Pad to longest in batch
    truncation=True,  # Truncate to model's max length
    max_length=512,
    return_tensors="pt"
)

# Decoding (token IDs → text)
text = tokenizer.decode(encoded['input_ids'][0])
```
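
What `padding=True` produces can be sketched in plain Python. `pad_batch` is a hypothetical helper, not part of the library; real tokenizers also handle truncation, special tokens, and tensor conversion:

```python
def pad_batch(sequences, pad_id=0):
    """Pad variable-length token-id lists to the longest sequence,
    mirroring the input_ids/attention_mask pair the tokenizer returns."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 7592, 102], [101, 2023, 2003, 1037, 102]])
# batch["input_ids"][0] == [101, 7592, 102, 0, 0]
```

The zeros in `attention_mask` tell the model which positions are padding to be ignored.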

**Special Tokens:**
```python
# Access special tokens
tokenizer.pad_token   # Padding token
tokenizer.cls_token   # Classification token
tokenizer.sep_token   # Separator token
tokenizer.mask_token  # Mask token (for MLM)

# Add custom tokens
tokenizer.add_tokens(["[CUSTOM]"])
tokenizer.add_special_tokens({'additional_special_tokens': ['[NEW]']})

# Resize model embeddings to match the new vocabulary
model.resize_token_embeddings(len(tokenizer))
```

### Image Processors

For vision tasks, use image processors instead of tokenizers.

```python
from PIL import Image
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Process a single image
image = Image.open("path/to/image.jpg")
inputs = processor(image, return_tensors="pt")
# Returns: {'pixel_values': tensor([[...]])}

# Batch processing
images = [Image.open(f"img{i}.jpg") for i in range(3)]
inputs = processor(images, return_tensors="pt")
```

### Processors for Multimodal Models

Multimodal models use processors that combine image and text processing.

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base")

# Process image + text caption
inputs = processor(
    images=image,
    text="A description of the image",
    return_tensors="pt",
    padding=True
)
```

## Model Inference

### Basic Inference Pattern

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Tokenize input
inputs = tokenizer("The future of AI is", return_tensors="pt")

# Generate (for causal LM)
outputs = model.generate(**inputs, max_length=50)
text = tokenizer.decode(outputs[0])

# Or get model outputs directly
outputs = model(**inputs)
logits = outputs.logits  # Shape: (batch_size, seq_len, vocab_size)
```

### Text Generation Strategies

For generative models, control generation behavior with parameters:

```python
# Greedy decoding (default)
output = model.generate(**inputs, max_length=50)

# Beam search (multiple hypotheses)
output = model.generate(
    **inputs,
    max_length=50,
    num_beams=5,       # Keep top 5 beams
    early_stopping=True
)

# Sampling with temperature
output = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    temperature=0.7,   # Lower = more focused, higher = more random
    top_k=50,          # Sample from top 50 tokens
    top_p=0.95         # Nucleus sampling
)

# Streaming generation
from transformers import TextStreamer

streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=streamer, max_length=100)
```

**Generation Parameters:**
- `max_length` / `max_new_tokens`: Control output length
- `num_beams`: Beam search width (1 = greedy)
- `temperature`: Randomness (0.7-1.0 typical)
- `top_k`: Sample from the top k tokens
- `top_p`: Nucleus sampling threshold
- `repetition_penalty`: Discourage repetition (>1.0)

Refer to `references/generation_strategies.md` for detailed information on choosing appropriate strategies.
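
The sampling knobs above can be illustrated numerically. This toy function (pure Python, not the library's implementation) applies temperature scaling to logits, restricts the candidate pool with top-k and top-p, and then draws one token id:

```python
import math
import random

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.95, rng=None):
    """Toy sampler: temperature rescales logits, top_k/top_p shrink the
    candidate pool, then a token id is drawn from the surviving tokens."""
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    probs = [math.exp(s - peak) for s in scaled]  # stable softmax
    total = sum(probs)
    probs = [p / total for p in probs]
    # top-k: keep only the k most likely token ids
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])[:top_k]
    # top-p: keep the smallest prefix of those covering probability mass p
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

token = sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=2, top_p=0.95)
```

Lower temperature sharpens the distribution toward the top token; tighter `top_k`/`top_p` values cut the long tail of unlikely tokens entirely.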

## Training and Fine-Tuning

### Training Workflow Overview

1. **Load dataset** → 2. **Preprocess** → 3. **Configure training** → 4. **Train** → 5. **Evaluate** → 6. **Save/Share**

### Text Classification Example

```python
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer, DataCollatorWithPadding
)
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("imdb")

# 2. Preprocess
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 3. Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1}
)

# 4. Configure training
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)

# 5. Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()

# 6. Evaluate and save
metrics = trainer.evaluate()
trainer.save_model("./my-finetuned-model")
trainer.push_to_hub()  # Share to the Hugging Face Hub
```
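
By default, `trainer.evaluate()` reports only the loss; pass a `compute_metrics` function to add task metrics. A minimal accuracy implementation, sketched here with plain lists (Trainer actually hands you NumPy arrays, and the `evaluate` library is the usual choice for real metrics):

```python
def compute_metrics(eval_pred):
    """Accuracy from raw logits; pass as Trainer(compute_metrics=...).
    eval_pred is a (predictions, label_ids) pair."""
    logits, labels = eval_pred
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]  # per-row argmax
    correct = sum(p == l for p, l in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

metrics = compute_metrics(([[0.1, 0.9], [0.8, 0.2]], [1, 1]))
# {'accuracy': 0.5}
```

The returned dictionary keys appear in evaluation logs and can be referenced by `metric_for_best_model`.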

### Vision Task Fine-Tuning

```python
from transformers import (
    AutoImageProcessor, AutoModelForImageClassification,
    TrainingArguments, Trainer
)
from datasets import load_dataset

# Load dataset
dataset = load_dataset("food101", split="train[:5000]")

# Image preprocessing
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

def transform(examples):
    examples["pixel_values"] = [
        processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in examples["image"]
    ]
    return examples

dataset = dataset.with_transform(transform)

# Load model
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=101,               # 101 food categories
    ignore_mismatched_sizes=True  # Replace the original classification head
)

# Training (same pattern as text)
training_args = TrainingArguments(
    output_dir="./vit-food101",
    remove_unused_columns=False,  # Keep image data
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=processor,
)

trainer.train()
```

### Sequence-to-Sequence Tasks

For tasks like summarization and translation, use Seq2SeqTrainer:

```python
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments, Seq2SeqTrainer,
    DataCollatorForSeq2Seq
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Assumes a dataset with "text" and "summary" columns
def preprocess(examples):
    # Prefix input for T5
    inputs = ["summarize: " + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)

    # Tokenize targets
    labels = tokenizer(
        text_target=examples["summary"],
        max_length=128,
        truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess, batched=True)

training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-summarization",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,  # Important for seq2seq
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model),
)

trainer.train()
```
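
`DataCollatorForSeq2Seq` pads the label sequences with `-100`, the index that the cross-entropy loss ignores, so padded target positions never contribute to training. The effect can be sketched directly (illustrative code, not the library's loss function):

```python
import math

IGNORE_INDEX = -100  # label value skipped by the loss

def masked_nll(log_probs, labels):
    """Mean negative log-likelihood over the non-ignored positions only."""
    terms = [-position[label] for position, label in zip(log_probs, labels)
             if label != IGNORE_INDEX]
    return sum(terms) / len(terms)

# Two real target tokens plus one padded position
log_probs = [[math.log(0.5), math.log(0.5)],
             [math.log(0.9), math.log(0.1)],
             [math.log(0.5), math.log(0.5)]]
loss = masked_nll(log_probs, [0, 0, IGNORE_INDEX])  # padding excluded from the mean
```

Without the `-100` mask, the averaged loss would be diluted by padding positions that the model should never be penalized on.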

### Important TrainingArguments

```python
TrainingArguments(
    # Essential
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,

    # Evaluation
    eval_strategy="epoch",       # or "steps"
    eval_steps=500,              # if eval_strategy="steps"

    # Checkpointing
    save_strategy="epoch",
    save_steps=500,              # if save_strategy="steps"
    save_total_limit=2,          # Keep only the 2 most recent checkpoints
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",

    # Optimization
    gradient_accumulation_steps=4,
    warmup_steps=500,
    weight_decay=0.01,
    max_grad_norm=1.0,

    # Mixed precision (choose one, not both)
    bf16=True,                   # Ampere+ GPUs (preferred)
    # fp16=True,                 # Older NVIDIA GPUs

    # Logging
    logging_steps=100,
    report_to="tensorboard",     # or "wandb", "mlflow"

    # Memory optimization
    gradient_checkpointing=True,
    optim="adamw_torch",         # or "adafactor" for lower memory

    # Distributed training
    ddp_find_unused_parameters=False,
)
```

Refer to `references/training_guide.md` for comprehensive training patterns and optimization strategies.

## Performance Optimization

### Model Quantization

Reduce memory footprint while largely maintaining accuracy:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)

# 4-bit quantization (even smaller)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto"
)
```

**Quantization Methods:**
- **bitsandbytes**: 4/8-bit on-the-fly quantization, supports PEFT fine-tuning
- **GPTQ**: 2/3/4/8-bit, requires calibration data, very fast inference
- **AWQ**: 4-bit activation-aware, balanced speed/accuracy

Refer to `references/quantization.md` for detailed comparison and usage patterns.
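
The core idea behind 8-bit loading can be sketched with absmax quantization in a few lines (a toy sketch only; bitsandbytes actually quantizes per block with outlier handling):

```python
def quantize_int8(weights):
    """Absmax quantization: map floats onto [-127, 127] integers plus one scale."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

quantized, scale = quantize_int8([0.4, -0.9, 0.05])
approx = dequantize(quantized, scale)  # close to, but not exactly, the originals
```

Each weight now costs 1 byte instead of 2 or 4, at the price of small rounding errors that the dequantized values reveal.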

### Training Optimization

```python
# Gradient accumulation (simulate a larger batch)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # Effective batch = 4 * 8 = 32
)

# Gradient checkpointing (less memory, slower)
training_args = TrainingArguments(
    gradient_checkpointing=True,
)

# Mixed precision training
training_args = TrainingArguments(
    bf16=True,  # or fp16=True on older GPUs
)

# Memory-efficient optimizer
training_args = TrainingArguments(
    optim="adafactor",  # Lower memory than AdamW
)
```

**Key Strategies:**
- **Batch sizes**: Use powers of 2 (8, 16, 32, 64, 128)
- **Gradient accumulation**: Enables larger effective batch sizes
- **Gradient checkpointing**: Reduces memory ~60%, increases time ~20%
- **Mixed precision**: bf16 for Ampere+ GPUs, fp16 for older
- **torch.compile**: Optimize the model graph (PyTorch 2.0+)
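
Why gradient accumulation reproduces a larger batch: summing per-sample gradients over several micro-batches before a single optimizer step yields the same averaged gradient as one large batch. A toy check with a one-parameter squared-error model (illustrative only):

```python
def grad_mse(w, x, y):
    """d/dw of (w*x - y)**2 for one sample."""
    return 2 * (w * x - y) * x

samples = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.0), (1.5, 2.5)]
w = 0.1

# One large batch: average the gradient over all samples at once
full_batch = sum(grad_mse(w, x, y) for x, y in samples) / len(samples)

# Accumulation: two micro-batches of two, gradients summed, then averaged
accumulated = 0.0
for micro_batch in (samples[:2], samples[2:]):
    accumulated += sum(grad_mse(w, x, y) for x, y in micro_batch)
accumulated /= len(samples)
# full_batch and accumulated agree, so the optimizer sees the same update
```

The only caveat in practice is that batch-dependent layers (e.g. batch normalization) see the micro-batch statistics, not the full-batch ones.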

## Advanced Features

### Custom Training Loop

For maximum control, bypass Trainer (assumes `model` and `tokenized_dataset` from the previous sections):

```python
import torch
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import get_scheduler

device = "cuda" if torch.cuda.is_available() else "cpu"
num_epochs = 3

# Prepare data
train_dataloader = DataLoader(tokenized_dataset, batch_size=8, shuffle=True)

# Set up optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=len(train_dataloader) * num_epochs
)

# Training loop
model.to(device)
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}

        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```
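
The `"linear"` schedule above ramps the learning rate up from zero during warmup and then decays it linearly back to zero. Its shape can be reproduced in a few lines (a sketch of the formula, not the library code):

```python
def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given step for linear warmup + linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)                      # ramp up
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)       # decay to zero

lrs = [linear_lr(s, base_lr=5e-5, warmup_steps=2, total_steps=10) for s in range(11)]
# starts at 0, peaks at base_lr right after warmup, ends at 0
```

Warmup avoids large, noisy updates while the optimizer state is still cold; the decay shrinks step sizes as training converges.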

### Parameter-Efficient Fine-Tuning (PEFT)

Use the PEFT library with Transformers for efficient fine-tuning:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Configure LoRA
lora_config = LoraConfig(
    r=16,                                 # Low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply to model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(model, lora_config)

# Now train as usual - only the LoRA parameters are updated
trainer = Trainer(model=model, ...)
trainer.train()
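
What LoRA does mathematically: the frozen weight `W` is augmented with a trainable low-rank product scaled by `lora_alpha / r`. A pure-Python sketch with toy dimensions (PEFT applies this inside each targeted projection layer):

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha=32, r=2):
    """y = W @ x + (lora_alpha / r) * B @ (A @ x); only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (lora_alpha / r) * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weight
A = [[0.1, 0.0], [0.0, 0.1]]   # r x d_in down-projection
B = [[0.0, 0.0], [0.0, 0.0]]   # d_out x r up-projection, initialized to zero
y = lora_forward(W, A, B, [1.0, 2.0])
```

Initializing `B` to zero is why a freshly added LoRA adapter leaves the model's behavior unchanged before any training happens.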

### Chat Templates

Apply chat templates for instruction-tuned models:

```python
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
]

# Format according to the model's chat template
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
inputs = tokenizer(formatted, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0])
```
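
Chat templates exist because each model family wraps messages in different control tokens. As a rough illustration, here is how a ChatML-style template renders a conversation (a format used by some models; Llama 2 uses a different `[INST]`-based format, which is exactly why you should rely on `apply_chat_template` instead of formatting by hand):

```python
def chatml_format(messages, add_generation_prompt=True):
    """Roughly how a ChatML-style template renders a conversation."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # cue the model to answer
    return text

prompt = chatml_format([{"role": "user", "content": "What is machine learning?"}])
```

`add_generation_prompt=True` appends the opening of an assistant turn, which is what makes the model continue with an answer rather than another user message.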

### Multi-GPU Training

```python
# Automatic with Trainer - no code changes needed
# Just run with: accelerate launch train.py

# Or configure PyTorch DDP explicitly
training_args = TrainingArguments(
    output_dir="./results",
    ddp_find_unused_parameters=False,
    # ... other args
)

# For larger models, use FSDP
training_args = TrainingArguments(
    output_dir="./results",
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "fsdp_transformer_layer_cls_to_wrap": ["BertLayer"],
    },
)
```

## Task-Specific Patterns

### Question Answering (Extractive)

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What is extractive QA?",
    context="Extractive QA extracts the answer from the given context..."
)
# {'answer': 'extracts the answer from the given context', 'score': 0.97, ...}
```

### Named Entity Recognition

```python
# aggregation_strategy="simple" merges sub-tokens into whole entities
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple"
)

result = ner("My name is John and I live in New York")
# [{'entity_group': 'PER', 'word': 'John', ...},
#  {'entity_group': 'LOC', 'word': 'New York', ...}]
```

### Image Captioning

```python
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

image = Image.open("image.jpg")

inputs = processor(images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
caption = processor.batch_decode(outputs, skip_special_tokens=True)[0]
```

### Speech Recognition

```python
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)

result = transcriber("audio.mp3")
# {'text': 'This is the transcribed text...'}

# With timestamps
result = transcriber("audio.mp3", return_timestamps=True)
```

## Common Patterns and Best Practices

### Saving and Loading Models

```python
# Save model and tokenizer
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")

# Load later
model = AutoModel.from_pretrained("./my-model")
tokenizer = AutoTokenizer.from_pretrained("./my-model")

# Push to the Hugging Face Hub
model.push_to_hub("username/my-model")
tokenizer.push_to_hub("username/my-model")

# Load from the Hub
model = AutoModel.from_pretrained("username/my-model")
```

### Error Handling

```python
from transformers import AutoModel
import torch

try:
    model = AutoModel.from_pretrained("model-name")
except OSError:
    print("Model not found - check internet connection or model name")
except torch.cuda.OutOfMemoryError:
    print("GPU memory exceeded - try quantization or a smaller batch size")
```

### Device Management

```python
import torch

# Check device availability
device = "cuda" if torch.cuda.is_available() else "cpu"

# Move model to device
model = model.to(device)

# Or use device_map for automatic distribution
model = AutoModel.from_pretrained("model-name", device_map="auto")

# Move inputs to the same device
inputs = tokenizer(text, return_tensors="pt").to(device)
```

### Memory Management

```python
import torch

# Clear the CUDA cache
torch.cuda.empty_cache()

# Disable gradient tracking for inference
with torch.no_grad():
    outputs = model(**inputs)

# Delete unused models
del model
torch.cuda.empty_cache()
```

## Resources

This skill includes comprehensive reference documentation and example scripts:

### scripts/

- `quick_inference.py`: Ready-to-use script for running inference with pipelines
- `fine_tune_classifier.py`: Complete example for fine-tuning a text classifier
- `generate_text.py`: Text generation with various strategies

Execute scripts directly or read them as implementation templates.

### references/

- `api_reference.md`: Comprehensive API documentation for key classes
- `training_guide.md`: Detailed training patterns, optimization, and troubleshooting
- `generation_strategies.md`: In-depth guide to text generation methods
- `quantization.md`: Model quantization techniques comparison and usage
- `task_patterns.md`: Quick reference for common task implementations

Load reference files when you need detailed information on specific topics. References contain extensive examples, parameter explanations, and best practices.

## Troubleshooting

**Import errors:**
```bash
pip install transformers
pip install accelerate     # For device_map="auto"
pip install bitsandbytes   # For quantization
```

**CUDA out of memory:**
- Reduce batch size
- Enable gradient checkpointing
- Use gradient accumulation
- Try quantization (8-bit or 4-bit)
- Use a smaller model variant

**Slow training:**
- Enable mixed precision (fp16/bf16)
- Increase batch size (if memory allows)
- Use torch.compile (PyTorch 2.0+)
- Check that data loading isn't the bottleneck

**Poor generation quality:**
- Adjust temperature (lower = more focused)
- Try different decoding strategies (beam search vs. sampling)
- Increase max_length if outputs are cut off
- Use repetition_penalty to reduce repetition

For task-specific guidance, consult the appropriate reference file in the `references/` directory.