mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-28 07:33:45 +08:00
---
name: transformers
description: Comprehensive toolkit for working with the Hugging Face Transformers library for state-of-the-art machine learning across NLP, computer vision, audio, and multimodal tasks. Use this skill when working with pretrained models, fine-tuning transformers, implementing text generation, image classification, speech recognition, or any task involving transformer architectures like BERT, GPT, T5, Vision Transformers, CLIP, or Whisper.
---

# Transformers
## Overview

Transformers is Hugging Face's flagship library, providing unified access to over 1 million pretrained models for machine learning across text, vision, audio, and multimodal domains. The library serves as a standardized model-definition framework compatible with PyTorch, TensorFlow, and JAX, emphasizing ease of use through three core components:

- **Pipeline**: Simple, optimized inference API for common tasks
- **AutoClasses**: Automatic model/tokenizer selection from pretrained checkpoints
- **Trainer**: Full-featured training loop with distributed training, mixed precision, and optimization

The library prioritizes accessibility: pretrained models reduce computational costs and carbon footprint, and the library remains compatible with the broader training and inference ecosystem (PyTorch Lightning, DeepSpeed, vLLM, etc.).
## Quick Start with Pipelines

Use pipelines for simple, efficient inference without managing models, tokenizers, or preprocessing manually. Pipelines abstract complexity into a single function call.

### Basic Pipeline Usage

```python
from transformers import pipeline

# Text classification
classifier = pipeline("text-classification")
result = classifier("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")
generator("The secret to baking a good cake is", max_length=50)

# Question answering
qa = pipeline("question-answering")
qa(question="What is extractive QA?", context="Extractive QA is...")

# Image classification
img_classifier = pipeline("image-classification")
img_classifier("path/to/image.jpg")

# Automatic speech recognition
transcriber = pipeline("automatic-speech-recognition")
transcriber("audio_file.mp3")
```
### Available Pipeline Tasks

**NLP Tasks:**
- `text-classification`, `token-classification`, `question-answering`
- `fill-mask`, `summarization`, `translation`
- `text-generation`, `conversational`
- `zero-shot-classification`, `sentiment-analysis`

**Vision Tasks:**
- `image-classification`, `image-segmentation`, `object-detection`
- `depth-estimation`, `image-to-image`, `zero-shot-image-classification`

**Audio Tasks:**
- `automatic-speech-recognition`, `audio-classification`
- `text-to-audio`, `zero-shot-audio-classification`

**Multimodal Tasks:**
- `visual-question-answering`, `document-question-answering`
- `image-to-text`, `zero-shot-object-detection`
### Pipeline Best Practices

**Device Management:**
```python
from transformers import pipeline, infer_device

device = infer_device()  # Auto-detect best device
pipe = pipeline("text-generation", model="...", device=device)
```

**Batch Processing:**
```python
# Process multiple inputs efficiently
results = classifier(["Text 1", "Text 2", "Text 3"])

# Use KeyDataset for large datasets
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset

dataset = load_dataset("imdb", split="test")
for result in pipe(KeyDataset(dataset, "text")):
    print(result)
```

**Memory Optimization:**
```python
import torch

# Use half-precision for faster inference
pipe = pipeline("text-generation", model="...",
                torch_dtype=torch.float16, device="cuda")
```
## Core Components

### AutoClasses for Model Loading

AutoClasses automatically select the correct architecture based on pretrained checkpoints.

```python
from transformers import (
    AutoModel, AutoTokenizer, AutoConfig,
    AutoModelForCausalLM, AutoModelForSequenceClassification
)

# Load any model by checkpoint name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Task-specific model classes
causal_lm = AutoModelForCausalLM.from_pretrained("gpt2")
classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3
)

# Load with device and dtype optimization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",   # Automatically distribute across devices
    torch_dtype="auto"   # Use optimal dtype
)
```

**Key Parameters:**
- `device_map="auto"`: Optimal device allocation (CPU/GPU/multi-GPU)
- `torch_dtype`: Control precision (torch.float16, torch.bfloat16, "auto")
- `trust_remote_code`: Enable custom model code (use cautiously)
- `use_fast`: Enable Rust-backed fast tokenizers (default True)
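Conceptually, an Auto class reads the `model_type` field from a checkpoint's `config.json` and dispatches to the matching architecture class. The sketch below illustrates that registry pattern with hypothetical stub classes — it is not the library's actual internals:

```python
# Hypothetical sketch of Auto-class dispatch: map a config's
# model_type string to a registered architecture class.
class BertModelStub:
    pass

class GPT2ModelStub:
    pass

MODEL_REGISTRY = {
    "bert": BertModelStub,
    "gpt2": GPT2ModelStub,
}

def auto_model_for(config: dict):
    """Pick the architecture class matching config['model_type']."""
    model_type = config["model_type"]
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")

cls = auto_model_for({"model_type": "bert"})
# cls is BertModelStub
```

This is why the same `from_pretrained` call works for any architecture: the checkpoint itself tells the library which class to build.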
### Tokenization

Tokenizers convert text to model-compatible tensor inputs.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Basic tokenization
tokens = tokenizer.tokenize("Hello, how are you?")
# ['hello', ',', 'how', 'are', 'you', '?']

# Encoding (text → token IDs)
encoded = tokenizer("Hello, how are you?", return_tensors="pt")
# {'input_ids': tensor([[...]]), 'attention_mask': tensor([[...]])}

# Batch encoding with padding and truncation
batch = tokenizer(
    ["Short text", "This is a much longer text..."],
    padding=True,       # Pad to longest in batch
    truncation=True,    # Truncate to model's max length
    max_length=512,
    return_tensors="pt"
)

# Decoding (token IDs → text)
text = tokenizer.decode(encoded['input_ids'][0])
```
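Conceptually, `padding=True` pads every sequence in the batch to the longest one and records which positions hold real tokens in the attention mask. A minimal pure-Python sketch of that behavior (the `pad_id=0` default here is an assumption; the real pad id is tokenizer-specific):

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence and build attention
    masks, mimicking tokenizer(..., padding=True) conceptually."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 2023, 102], [101, 102]])
# batch["input_ids"] → [[101, 2023, 102], [101, 102, 0]]
# batch["attention_mask"] → [[1, 1, 1], [1, 1, 0]]
```

The zeros in the attention mask tell the model to ignore the padded positions.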
**Special Tokens:**
```python
# Access special tokens
tokenizer.pad_token   # Padding token
tokenizer.cls_token   # Classification token
tokenizer.sep_token   # Separator token
tokenizer.mask_token  # Mask token (for MLM)

# Add custom tokens
tokenizer.add_tokens(["[CUSTOM]"])
tokenizer.add_special_tokens({'additional_special_tokens': ['[NEW]']})

# Resize model embeddings to match new vocabulary
model.resize_token_embeddings(len(tokenizer))
```
### Image Processors

For vision tasks, use image processors instead of tokenizers.

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Process single image
from PIL import Image
image = Image.open("path/to/image.jpg")
inputs = processor(image, return_tensors="pt")
# Returns: {'pixel_values': tensor([[...]])}

# Batch processing
images = [Image.open(f"img{i}.jpg") for i in range(3)]
inputs = processor(images, return_tensors="pt")
```
### Processors for Multimodal Models

Multimodal models use processors that combine image and text processing.

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base")

# Process image + text caption
inputs = processor(
    images=image,
    text="A description of the image",
    return_tensors="pt",
    padding=True
)
```
## Model Inference

### Basic Inference Pattern

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Tokenize input
inputs = tokenizer("The future of AI is", return_tensors="pt")

# Generate (for causal LM)
outputs = model.generate(**inputs, max_length=50)
text = tokenizer.decode(outputs[0])

# Or get model outputs directly
outputs = model(**inputs)
logits = outputs.logits  # Shape: (batch_size, seq_len, vocab_size)
```
### Text Generation Strategies

For generative models, control generation behavior with parameters:

```python
# Greedy decoding (default)
output = model.generate(**inputs, max_length=50)

# Beam search (multiple hypotheses)
output = model.generate(
    **inputs,
    max_length=50,
    num_beams=5,         # Keep top 5 beams
    early_stopping=True
)

# Sampling with temperature
output = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    temperature=0.7,     # Lower = more focused, higher = more random
    top_k=50,            # Sample from top 50 tokens
    top_p=0.95           # Nucleus sampling
)

# Streaming generation
from transformers import TextStreamer

streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=streamer, max_length=100)
```

**Generation Parameters:**
- `max_length` / `max_new_tokens`: Control output length
- `num_beams`: Beam search width (1 = greedy)
- `temperature`: Randomness (0.7-1.0 typical)
- `top_k`: Sample from top k tokens
- `top_p`: Nucleus sampling threshold
- `repetition_penalty`: Discourage repetition (>1.0)

Refer to `references/generation_strategies.md` for detailed information on choosing appropriate strategies.
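To build intuition for how `temperature` and `top_k` reshape the next-token distribution, here is a small self-contained sketch in plain Python (a conceptual illustration, not the library's sampling code):

```python
import math

def sample_distribution(logits, temperature=1.0, top_k=None):
    """Turn raw logits into next-token probabilities, applying
    temperature scaling and optional top-k filtering."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the top_k highest logits; mask the rest out
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= cutoff else float("-inf") for l in scaled]
    # Numerically stable softmax
    peak = max(scaled)
    exps = [math.exp(l - peak) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
sharp = sample_distribution(logits, temperature=0.5)  # more peaked
flat = sample_distribution(logits, temperature=2.0)   # closer to uniform
# max(sharp) > max(flat): lower temperature concentrates probability
```

Low temperature sharpens the distribution toward greedy decoding; `top_k` zeroes out everything outside the k most likely tokens before sampling.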
## Training and Fine-Tuning

### Training Workflow Overview

1. **Load dataset** → 2. **Preprocess** → 3. **Configure training** → 4. **Train** → 5. **Evaluate** → 6. **Save/Share**

### Text Classification Example
```python
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer, DataCollatorWithPadding
)
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("imdb")

# 2. Preprocess
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 3. Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1}
)

# 4. Configure training
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)

# 5. Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()

# 6. Evaluate and save
metrics = trainer.evaluate()
trainer.save_model("./my-finetuned-model")
trainer.push_to_hub()  # Share to Hugging Face Hub
```
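The Trainer above optimizes and reports only the loss. To track accuracy during evaluation, a `compute_metrics` function can be passed via `Trainer(..., compute_metrics=compute_metrics)`. A minimal sketch using NumPy:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair, the shape Trainer
    passes to compute_metrics during evaluation."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Toy example: predictions are classes 1, 0, 1 vs. labels 1, 0, 0
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])
metrics = compute_metrics((logits, labels))
# 2 of 3 correct → {'accuracy': 0.666...}
```

Any metric returned in this dictionary can then drive `metric_for_best_model` in `TrainingArguments`.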
### Vision Task Fine-Tuning

```python
from transformers import (
    AutoImageProcessor, AutoModelForImageClassification,
    TrainingArguments, Trainer
)
from datasets import load_dataset

# Load dataset
dataset = load_dataset("food101", split="train[:5000]")

# Image preprocessing
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

def transform(examples):
    examples["pixel_values"] = [
        processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in examples["image"]
    ]
    return examples

dataset = dataset.with_transform(transform)

# Load model
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=101,  # 101 food categories
    ignore_mismatched_sizes=True
)

# Training (similar pattern to text)
training_args = TrainingArguments(
    output_dir="./vit-food101",
    remove_unused_columns=False,  # Keep image data
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=processor,
)

trainer.train()
```
### Sequence-to-Sequence Tasks

For tasks like summarization and translation, use `Seq2SeqTrainer`:

```python
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments, Seq2SeqTrainer,
    DataCollatorForSeq2Seq
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(examples):
    # Prefix input for T5
    inputs = ["summarize: " + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)

    # Tokenize targets
    labels = tokenizer(
        examples["summary"],
        max_length=128,
        truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess, batched=True)

training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-summarization",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,  # Important for seq2seq
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
)

trainer.train()
```
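`DataCollatorForSeq2Seq` pads the label sequences with -100, the index PyTorch's cross-entropy loss ignores by default, so padded label positions do not contribute to the loss. A conceptual sketch of that padding step:

```python
def pad_labels(label_batches, pad_value=-100):
    """Pad label id lists to equal length with -100 so the loss
    function skips padded positions (ignore_index=-100 in PyTorch)."""
    max_len = max(len(labels) for labels in label_batches)
    return [labels + [pad_value] * (max_len - len(labels))
            for labels in label_batches]

padded = pad_labels([[5, 6, 7], [5, 6]])
# [[5, 6, 7], [5, 6, -100]]
```

This is why labels are padded differently from inputs: inputs use the tokenizer's pad token, labels use the loss function's ignore index.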
### Important TrainingArguments

```python
TrainingArguments(
    # Essential
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,

    # Evaluation
    eval_strategy="epoch",             # or "steps"
    eval_steps=500,                    # if eval_strategy="steps"

    # Checkpointing
    save_strategy="epoch",
    save_steps=500,
    save_total_limit=2,                # Keep only 2 best checkpoints
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",

    # Optimization
    gradient_accumulation_steps=4,
    warmup_steps=500,
    weight_decay=0.01,
    max_grad_norm=1.0,

    # Mixed Precision (choose one, not both)
    fp16=True,                         # For older NVIDIA GPUs
    # bf16=True,                       # For Ampere+ GPUs (preferred when supported)

    # Logging
    logging_steps=100,
    report_to="tensorboard",           # or "wandb", "mlflow"

    # Memory Optimization
    gradient_checkpointing=True,
    optim="adamw_torch",               # or "adafactor" for memory

    # Distributed Training
    ddp_find_unused_parameters=False,
)
```

Refer to `references/training_guide.md` for comprehensive training patterns and optimization strategies.
## Performance Optimization

### Model Quantization

Reduce memory footprint while maintaining accuracy:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_8bit=True,
    device_map="auto"
)

# 4-bit quantization (even smaller)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto"
)
```

**Quantization Methods:**
- **Bitsandbytes**: 4/8-bit on-the-fly quantization, supports PEFT fine-tuning
- **GPTQ**: 2/3/4/8-bit, requires calibration, very fast inference
- **AWQ**: 4-bit activation-aware, balanced speed/accuracy

Refer to `references/quantization.md` for detailed comparison and usage patterns.
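As a back-of-envelope check on what quantization buys, weight memory is roughly parameter count times bytes per parameter (this ignores activations, the KV cache, and per-block quantization overhead):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# A 7B-parameter model at different precisions:
fp16 = weight_memory_gb(7e9, 16)  # 14.0 GB
int8 = weight_memory_gb(7e9, 8)   # 7.0 GB
int4 = weight_memory_gb(7e9, 4)   # 3.5 GB
```

So 4-bit quantization brings a 7B model's weights from ~14 GB down to ~3.5 GB, which is what makes such models fit on consumer GPUs.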
### Training Optimization

```python
# Gradient accumulation (simulate larger batch)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # Effective batch = 4 * 8 = 32
)

# Gradient checkpointing (reduce memory, slower)
training_args = TrainingArguments(
    gradient_checkpointing=True,
)

# Mixed precision training
training_args = TrainingArguments(
    bf16=True,  # or fp16=True
)

# Efficient optimizer
training_args = TrainingArguments(
    optim="adafactor",  # Lower memory than AdamW
)
```

**Key Strategies:**
- **Batch sizes**: Use powers of 2 (8, 16, 32, 64, 128)
- **Gradient accumulation**: Enables larger effective batch sizes
- **Gradient checkpointing**: Reduces memory ~60%, increases time ~20%
- **Mixed precision**: bf16 for Ampere+ GPUs, fp16 for older
- **torch.compile**: Optimize model graph (PyTorch 2.0+)
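Gradient accumulation multiplies out with per-device batch size and device count; the effective (optimizer-step) batch size in the snippet above can be computed as:

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Effective optimizer-step batch size under gradient accumulation."""
    return per_device_batch * accumulation_steps * num_devices

# Matches the comment above: 4 * 8 = 32 on a single GPU
size = effective_batch_size(4, 8)      # 32
multi = effective_batch_size(4, 8, 4)  # 128 across 4 GPUs
```

When scaling the effective batch size this way, the learning rate often needs to be retuned as well.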
## Advanced Features

### Custom Training Loop

For maximum control, bypass Trainer:

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import get_scheduler

num_epochs = 3
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Prepare data
train_dataloader = DataLoader(tokenized_dataset, batch_size=8, shuffle=True)

# Setup optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=len(train_dataloader) * num_epochs
)

# Training loop
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}

        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```
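The `"linear"` schedule warms the learning rate up from 0 and then decays it linearly back to 0 over training. A pure-Python sketch of that shape (a conceptual illustration, not the library's implementation):

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given step under linear warmup + linear decay."""
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Decay: ramp from base_lr back down to 0
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# With 100 warmup steps out of 1000 total at base_lr=5e-5:
lr_start = linear_schedule_lr(0, 5e-5, 100, 1000)   # 0.0
lr_peak = linear_schedule_lr(100, 5e-5, 100, 1000)  # 5e-05 (peak)
lr_end = linear_schedule_lr(1000, 5e-5, 100, 1000)  # 0.0
```

This is why `scheduler.step()` is called once per batch in the loop above: the schedule is defined over optimizer steps, not epochs.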
### Parameter-Efficient Fine-Tuning (PEFT)

Use the PEFT library with transformers for efficient fine-tuning:

```python
from peft import LoraConfig, get_peft_model

# Configure LoRA
lora_config = LoraConfig(
    r=16,                                 # Low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply to model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(model, lora_config)

# Now train as usual - only LoRA parameters train
trainer = Trainer(model=model, ...)
trainer.train()
```
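A quick way to see why LoRA is cheap: each adapted weight matrix of shape (d_out, d_in) gains two small factors, A of shape (r, d_in) and B of shape (d_out, r), so the trainable parameter count grows with `r` rather than with the full matrix size. The 4096-dimensional projection below is an illustrative assumption, not a specific model's shape:

```python
def lora_params_per_matrix(d_in, d_out, r):
    """Trainable parameters LoRA adds to one weight matrix:
    A is (r, d_in) and B is (d_out, r)."""
    return r * d_in + d_out * r

# Illustrative 4096x4096 attention projection with r=16:
full = 4096 * 4096                             # 16,777,216 frozen params
lora = lora_params_per_matrix(4096, 4096, 16)  # 131,072 trainable params
# LoRA trains well under 1% of this matrix's parameters
```

This is why LoRA checkpoints are megabytes rather than gigabytes: only the small A and B factors are saved.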
### Chat Templates

Apply chat templates for instruction-tuned models:

```python
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
]

# Format according to model's chat template
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
inputs = tokenizer(formatted, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0])
```
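Each model family defines its own template, which is why `apply_chat_template` matters: the same messages render differently per model. For Llama-2 chat models the rendered string looks roughly like the sketch below (illustrative, not byte-exact — always prefer `apply_chat_template` for the model's real format):

```python
def llama2_style_prompt(system, user):
    """Roughly the Llama-2 chat layout; illustrative only — use
    tokenizer.apply_chat_template for the exact format."""
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = llama2_style_prompt(
    "You are a helpful assistant.",
    "What is machine learning?",
)
# prompt wraps the system message in <<SYS>> tags and the turn in [INST]
```

Sending plain text without the expected template to an instruction-tuned model typically degrades its responses noticeably.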
### Multi-GPU Training

```python
# Automatic with Trainer - no code changes needed
# Just run with: accelerate launch train.py

# Or use PyTorch DDP explicitly
training_args = TrainingArguments(
    output_dir="./results",
    ddp_find_unused_parameters=False,
    # ... other args
)

# For larger models, use FSDP
training_args = TrainingArguments(
    output_dir="./results",
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "fsdp_transformer_layer_cls_to_wrap": ["BertLayer"],
    },
)
```
## Task-Specific Patterns

### Question Answering (Extractive)

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What is extractive QA?",
    context="Extractive QA extracts the answer from the given context..."
)
# {'answer': 'extracts the answer from the given context', 'score': 0.97, ...}
```

### Named Entity Recognition

```python
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple"  # Merge subword tokens into full entities
)

result = ner("My name is John and I live in New York")
# [{'entity_group': 'PER', 'word': 'John', ...}, {'entity_group': 'LOC', 'word': 'New York', ...}]
```

### Image Captioning

```python
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

from PIL import Image
image = Image.open("image.jpg")

inputs = processor(images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
caption = processor.batch_decode(outputs, skip_special_tokens=True)[0]
```

### Speech Recognition

```python
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)

result = transcriber("audio.mp3")
# {'text': 'This is the transcribed text...'}

# With timestamps
result = transcriber("audio.mp3", return_timestamps=True)
```
## Common Patterns and Best Practices

### Saving and Loading Models

```python
# Save entire model
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")

# Load later
model = AutoModel.from_pretrained("./my-model")
tokenizer = AutoTokenizer.from_pretrained("./my-model")

# Push to Hugging Face Hub
model.push_to_hub("username/my-model")
tokenizer.push_to_hub("username/my-model")

# Load from Hub
model = AutoModel.from_pretrained("username/my-model")
```

### Error Handling

```python
from transformers import AutoModel
import torch

try:
    model = AutoModel.from_pretrained("model-name")
except OSError:
    print("Model not found - check internet connection or model name")
except torch.cuda.OutOfMemoryError:
    print("GPU memory exceeded - try quantization or smaller batch size")
```
### Device Management

```python
import torch

# Check device availability
device = "cuda" if torch.cuda.is_available() else "cpu"

# Move model to device
model = model.to(device)

# Or use device_map for automatic distribution
model = AutoModel.from_pretrained("model-name", device_map="auto")

# For inputs
inputs = tokenizer(text, return_tensors="pt").to(device)
```

### Memory Management

```python
import torch

# Clear CUDA cache
torch.cuda.empty_cache()

# Use context manager for inference
with torch.no_grad():
    outputs = model(**inputs)

# Delete unused models
del model
torch.cuda.empty_cache()
```
## Resources

This skill includes comprehensive reference documentation and example scripts:

### scripts/

- `quick_inference.py`: Ready-to-use script for running inference with pipelines
- `fine_tune_classifier.py`: Complete example for fine-tuning a text classifier
- `generate_text.py`: Text generation with various strategies

Execute scripts directly or read them as implementation templates.

### references/

- `api_reference.md`: Comprehensive API documentation for key classes
- `training_guide.md`: Detailed training patterns, optimization, and troubleshooting
- `generation_strategies.md`: In-depth guide to text generation methods
- `quantization.md`: Model quantization techniques comparison and usage
- `task_patterns.md`: Quick reference for common task implementations

Load reference files when you need detailed information on specific topics. References contain extensive examples, parameter explanations, and best practices.
## Troubleshooting

**Import errors:**
```bash
pip install transformers
pip install accelerate    # For device_map="auto"
pip install bitsandbytes  # For quantization
```

**CUDA out of memory:**
- Reduce batch size
- Enable gradient checkpointing
- Use gradient accumulation
- Try quantization (8-bit or 4-bit)
- Use a smaller model variant

**Slow training:**
- Enable mixed precision (fp16/bf16)
- Increase batch size (if memory allows)
- Use torch.compile (PyTorch 2.0+)
- Check that data loading isn't the bottleneck

**Poor generation quality:**
- Adjust temperature (lower = more focused)
- Try different decoding strategies (beam search vs. sampling)
- Increase max_length if outputs are cut off
- Use repetition_penalty to reduce repetition

For task-specific guidance, consult the appropriate reference file in the `references/` directory.