mirror of
https://github.com/K-Dense-AI/claude-scientific-skills.git
synced 2026-03-28 07:33:45 +08:00
---
name: transformers
description: Comprehensive toolkit for working with the Hugging Face Transformers library for state-of-the-art machine learning across NLP, computer vision, audio, and multimodal tasks. Use this skill when working with pretrained models, fine-tuning transformers, implementing text generation, image classification, speech recognition, or any task involving transformer architectures like BERT, GPT, T5, Vision Transformers, CLIP, or Whisper.
---

# Transformers
## Overview

Transformers is Hugging Face's flagship library, providing unified access to over 1 million pretrained models for machine learning across text, vision, audio, and multimodal domains. The library serves as a standardized model-definition framework compatible with PyTorch, TensorFlow, and JAX, emphasizing ease of use through three core components:

- **Pipeline**: Simple, optimized inference API for common tasks
- **AutoClasses**: Automatic model/tokenizer selection from pretrained checkpoints
- **Trainer**: Full-featured training loop with distributed training, mixed precision, and optimization

The library prioritizes accessibility: pretrained models reduce computational costs and carbon footprint, and the library remains compatible with the broader training and inference ecosystem (PyTorch Lightning, DeepSpeed, vLLM, etc.).
## Quick Start with Pipelines

Use pipelines for simple, efficient inference without managing models, tokenizers, or preprocessing manually. Pipelines abstract complexity into a single function call.

### Basic Pipeline Usage

```python
from transformers import pipeline

# Text classification
classifier = pipeline("text-classification")
result = classifier("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Text generation
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")
generator("The secret to baking a good cake is", max_length=50)

# Question answering
qa = pipeline("question-answering")
qa(question="What is extractive QA?", context="Extractive QA is...")

# Image classification
img_classifier = pipeline("image-classification")
img_classifier("path/to/image.jpg")

# Automatic speech recognition
transcriber = pipeline("automatic-speech-recognition")
transcriber("audio_file.mp3")
```
### Available Pipeline Tasks

**NLP Tasks:**
- `text-classification`, `token-classification`, `question-answering`
- `fill-mask`, `summarization`, `translation`
- `text-generation`, `conversational`
- `zero-shot-classification`, `sentiment-analysis`

**Vision Tasks:**
- `image-classification`, `image-segmentation`, `object-detection`
- `depth-estimation`, `image-to-image`, `zero-shot-image-classification`

**Audio Tasks:**
- `automatic-speech-recognition`, `audio-classification`
- `text-to-audio`, `zero-shot-audio-classification`

**Multimodal Tasks:**
- `visual-question-answering`, `document-question-answering`
- `image-to-text`, `zero-shot-object-detection`
### Pipeline Best Practices

**Device Management:**
```python
from transformers import pipeline, infer_device

device = infer_device()  # Auto-detect best device
pipe = pipeline("text-generation", model="...", device=device)
```

**Batch Processing:**
```python
# Process multiple inputs efficiently
results = classifier(["Text 1", "Text 2", "Text 3"])

# Use KeyDataset for large datasets
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset

dataset = load_dataset("imdb", split="test")
for result in pipe(KeyDataset(dataset, "text")):
    print(result)
```

**Memory Optimization:**
```python
import torch

# Use half-precision for faster inference
pipe = pipeline("text-generation", model="...",
                torch_dtype=torch.float16, device="cuda")
```
## Core Components

### AutoClasses for Model Loading

AutoClasses automatically select the correct architecture based on pretrained checkpoints.

```python
from transformers import (
    AutoModel, AutoTokenizer, AutoConfig,
    AutoModelForCausalLM, AutoModelForSequenceClassification
)

# Load any model by checkpoint name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Task-specific model classes
causal_lm = AutoModelForCausalLM.from_pretrained("gpt2")
classifier = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3
)

# Load with device and dtype optimization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",   # Automatically distribute across devices
    torch_dtype="auto"   # Use optimal dtype
)
```

**Key Parameters:**
- `device_map="auto"`: Optimal device allocation (CPU/GPU/multi-GPU)
- `torch_dtype`: Control precision (torch.float16, torch.bfloat16, "auto")
- `trust_remote_code`: Enable custom model code (use cautiously)
- `use_fast`: Enable Rust-backed fast tokenizers (default True)
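Conceptually, an Auto class reads the `model_type` field from a checkpoint's `config.json` and dispatches to the matching architecture class. The sketch below illustrates that registry pattern with hypothetical stub classes — it is not the library's actual internals:

```python
# Hypothetical sketch of Auto-class dispatch: map a config's
# model_type string to a registered architecture class.
class BertModelStub:
    pass

class GPT2ModelStub:
    pass

MODEL_REGISTRY = {
    "bert": BertModelStub,
    "gpt2": GPT2ModelStub,
}

def auto_model_for(config: dict):
    """Pick the architecture class matching config['model_type']."""
    model_type = config["model_type"]
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")

cls = auto_model_for({"model_type": "bert"})
# cls is BertModelStub
```

This is why the same `from_pretrained` call works for any architecture: the checkpoint itself tells the library which class to build.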
### Tokenization

Tokenizers convert text to model-compatible tensor inputs.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Basic tokenization
tokens = tokenizer.tokenize("Hello, how are you?")
# ['hello', ',', 'how', 'are', 'you', '?']

# Encoding (text → token IDs)
encoded = tokenizer("Hello, how are you?", return_tensors="pt")
# {'input_ids': tensor([[...]]), 'attention_mask': tensor([[...]])}

# Batch encoding with padding and truncation
batch = tokenizer(
    ["Short text", "This is a much longer text..."],
    padding=True,       # Pad to longest in batch
    truncation=True,    # Truncate to model's max length
    max_length=512,
    return_tensors="pt"
)

# Decoding (token IDs → text)
text = tokenizer.decode(encoded['input_ids'][0])
```
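Conceptually, `padding=True` pads every sequence in the batch to the longest one and records which positions hold real tokens in the attention mask. A minimal pure-Python sketch of that behavior (the `pad_id=0` default here is an assumption; the real pad id is tokenizer-specific):

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence and build attention
    masks, mimicking tokenizer(..., padding=True) conceptually."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[101, 2023, 102], [101, 102]])
# batch["input_ids"] → [[101, 2023, 102], [101, 102, 0]]
# batch["attention_mask"] → [[1, 1, 1], [1, 1, 0]]
```

The zeros in the attention mask tell the model to ignore the padded positions.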
**Special Tokens:**
```python
# Access special tokens
tokenizer.pad_token   # Padding token
tokenizer.cls_token   # Classification token
tokenizer.sep_token   # Separator token
tokenizer.mask_token  # Mask token (for MLM)

# Add custom tokens
tokenizer.add_tokens(["[CUSTOM]"])
tokenizer.add_special_tokens({'additional_special_tokens': ['[NEW]']})

# Resize model embeddings to match new vocabulary
model.resize_token_embeddings(len(tokenizer))
```
### Image Processors

For vision tasks, use image processors instead of tokenizers.

```python
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Process single image
from PIL import Image
image = Image.open("path/to/image.jpg")
inputs = processor(image, return_tensors="pt")
# Returns: {'pixel_values': tensor([[...]])}

# Batch processing
images = [Image.open(f"img{i}.jpg") for i in range(3)]
inputs = processor(images, return_tensors="pt")
```
### Processors for Multimodal Models

Multimodal models use processors that combine image and text processing.

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/git-base")

# Process image + text caption
inputs = processor(
    images=image,
    text="A description of the image",
    return_tensors="pt",
    padding=True
)
```
## Model Inference

### Basic Inference Pattern

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Tokenize input
inputs = tokenizer("The future of AI is", return_tensors="pt")

# Generate (for causal LM)
outputs = model.generate(**inputs, max_length=50)
text = tokenizer.decode(outputs[0])

# Or get model outputs directly
outputs = model(**inputs)
logits = outputs.logits  # Shape: (batch_size, seq_len, vocab_size)
```
### Text Generation Strategies

For generative models, control generation behavior with parameters:

```python
# Greedy decoding (default)
output = model.generate(**inputs, max_length=50)

# Beam search (multiple hypotheses)
output = model.generate(
    **inputs,
    max_length=50,
    num_beams=5,         # Keep top 5 beams
    early_stopping=True
)

# Sampling with temperature
output = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    temperature=0.7,     # Lower = more focused, higher = more random
    top_k=50,            # Sample from top 50 tokens
    top_p=0.95           # Nucleus sampling
)

# Streaming generation
from transformers import TextStreamer

streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=streamer, max_length=100)
```

**Generation Parameters:**
- `max_length` / `max_new_tokens`: Control output length
- `num_beams`: Beam search width (1 = greedy)
- `temperature`: Randomness (0.7-1.0 typical)
- `top_k`: Sample from top k tokens
- `top_p`: Nucleus sampling threshold
- `repetition_penalty`: Discourage repetition (>1.0)

Refer to `references/generation_strategies.md` for detailed information on choosing appropriate strategies.
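To build intuition for how `temperature` and `top_k` reshape the next-token distribution, here is a small self-contained sketch in plain Python (a conceptual illustration, not the library's sampling code):

```python
import math

def sample_distribution(logits, temperature=1.0, top_k=None):
    """Turn raw logits into next-token probabilities, applying
    temperature scaling and optional top-k filtering."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the top_k highest logits; mask the rest out
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= cutoff else float("-inf") for l in scaled]
    # Numerically stable softmax
    peak = max(scaled)
    exps = [math.exp(l - peak) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, -1.0]
sharp = sample_distribution(logits, temperature=0.5)  # more peaked
flat = sample_distribution(logits, temperature=2.0)   # closer to uniform
# max(sharp) > max(flat): lower temperature concentrates probability
```

Low temperature sharpens the distribution toward greedy decoding; `top_k` zeroes out everything outside the k most likely tokens before sampling.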
## Training and Fine-Tuning

### Training Workflow Overview

1. **Load dataset** → 2. **Preprocess** → 3. **Configure training** → 4. **Train** → 5. **Evaluate** → 6. **Save/Share**

### Text Classification Example
```python
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    TrainingArguments, Trainer, DataCollatorWithPadding
)
from datasets import load_dataset

# 1. Load dataset
dataset = load_dataset("imdb")

# 2. Preprocess
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized = dataset.map(preprocess, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# 3. Load model
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},
    label2id={"negative": 0, "positive": 1}
)

# 4. Configure training
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,
)

# 5. Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

trainer.train()

# 6. Evaluate and save
metrics = trainer.evaluate()
trainer.save_model("./my-finetuned-model")
trainer.push_to_hub()  # Share to Hugging Face Hub
```
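The Trainer above optimizes and reports only the loss. To track accuracy during evaluation, a `compute_metrics` function can be passed via `Trainer(..., compute_metrics=compute_metrics)`. A minimal sketch using NumPy:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair, the shape Trainer
    passes to compute_metrics during evaluation."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

# Toy example: predictions are classes 1, 0, 1 vs. labels 1, 0, 0
logits = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = np.array([1, 0, 0])
metrics = compute_metrics((logits, labels))
# 2 of 3 correct → {'accuracy': 0.666...}
```

Any metric returned in this dictionary can then drive `metric_for_best_model` in `TrainingArguments`.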
### Vision Task Fine-Tuning

```python
from transformers import (
    AutoImageProcessor, AutoModelForImageClassification,
    TrainingArguments, Trainer
)
from datasets import load_dataset

# Load dataset
dataset = load_dataset("food101", split="train[:5000]")

# Image preprocessing
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

def transform(examples):
    examples["pixel_values"] = [
        processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in examples["image"]
    ]
    return examples

dataset = dataset.with_transform(transform)

# Load model
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=101,  # 101 food categories
    ignore_mismatched_sizes=True
)

# Training (similar pattern to text)
training_args = TrainingArguments(
    output_dir="./vit-food101",
    remove_unused_columns=False,  # Keep image data
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=processor,
)

trainer.train()
```
### Sequence-to-Sequence Tasks

For tasks like summarization and translation, use `Seq2SeqTrainer`:

```python
from transformers import (
    AutoTokenizer, AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments, Seq2SeqTrainer,
    DataCollatorForSeq2Seq
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

def preprocess(examples):
    # Prefix input for T5
    inputs = ["summarize: " + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)

    # Tokenize targets
    labels = tokenizer(
        examples["summary"],
        max_length=128,
        truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_dataset = dataset.map(preprocess, batched=True)

training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-summarization",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    predict_with_generate=True,  # Important for seq2seq
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
)

trainer.train()
```
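`DataCollatorForSeq2Seq` pads the label sequences with -100, the index PyTorch's cross-entropy loss ignores by default, so padded label positions do not contribute to the loss. A conceptual sketch of that padding step:

```python
def pad_labels(label_batches, pad_value=-100):
    """Pad label id lists to equal length with -100 so the loss
    function skips padded positions (ignore_index=-100 in PyTorch)."""
    max_len = max(len(labels) for labels in label_batches)
    return [labels + [pad_value] * (max_len - len(labels))
            for labels in label_batches]

padded = pad_labels([[5, 6, 7], [5, 6]])
# [[5, 6, 7], [5, 6, -100]]
```

This is why labels are padded differently from inputs: inputs use the tokenizer's pad token, labels use the loss function's ignore index.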
### Important TrainingArguments

```python
TrainingArguments(
    # Essential
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,

    # Evaluation
    eval_strategy="epoch",             # or "steps"
    eval_steps=500,                    # if eval_strategy="steps"

    # Checkpointing
    save_strategy="epoch",
    save_steps=500,
    save_total_limit=2,                # Keep only 2 best checkpoints
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",

    # Optimization
    gradient_accumulation_steps=4,
    warmup_steps=500,
    weight_decay=0.01,
    max_grad_norm=1.0,

    # Mixed Precision (choose one, not both)
    fp16=True,                         # For older NVIDIA GPUs
    # bf16=True,                       # For Ampere+ GPUs (preferred when supported)

    # Logging
    logging_steps=100,
    report_to="tensorboard",           # or "wandb", "mlflow"

    # Memory Optimization
    gradient_checkpointing=True,
    optim="adamw_torch",               # or "adafactor" for memory

    # Distributed Training
    ddp_find_unused_parameters=False,
)
```

Refer to `references/training_guide.md` for comprehensive training patterns and optimization strategies.
## Performance Optimization

### Model Quantization

Reduce memory footprint while maintaining accuracy:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    load_in_8bit=True,
    device_map="auto"
)

# 4-bit quantization (even smaller)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto"
)
```

**Quantization Methods:**
- **Bitsandbytes**: 4/8-bit on-the-fly quantization, supports PEFT fine-tuning
- **GPTQ**: 2/3/4/8-bit, requires calibration, very fast inference
- **AWQ**: 4-bit activation-aware, balanced speed/accuracy

Refer to `references/quantization.md` for detailed comparison and usage patterns.
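As a back-of-envelope check on what quantization buys, weight memory is roughly parameter count times bytes per parameter (this ignores activations, the KV cache, and per-block quantization overhead):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Rough weight-only memory estimate in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# A 7B-parameter model at different precisions:
fp16 = weight_memory_gb(7e9, 16)  # 14.0 GB
int8 = weight_memory_gb(7e9, 8)   # 7.0 GB
int4 = weight_memory_gb(7e9, 4)   # 3.5 GB
```

So 4-bit quantization brings a 7B model's weights from ~14 GB down to ~3.5 GB, which is what makes such models fit on consumer GPUs.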
### Training Optimization

```python
# Gradient accumulation (simulate larger batch)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # Effective batch = 4 * 8 = 32
)

# Gradient checkpointing (reduce memory, slower)
training_args = TrainingArguments(
    gradient_checkpointing=True,
)

# Mixed precision training
training_args = TrainingArguments(
    bf16=True,  # or fp16=True
)

# Efficient optimizer
training_args = TrainingArguments(
    optim="adafactor",  # Lower memory than AdamW
)
```

**Key Strategies:**
- **Batch sizes**: Use powers of 2 (8, 16, 32, 64, 128)
- **Gradient accumulation**: Enables larger effective batch sizes
- **Gradient checkpointing**: Reduces memory ~60%, increases time ~20%
- **Mixed precision**: bf16 for Ampere+ GPUs, fp16 for older
- **torch.compile**: Optimize model graph (PyTorch 2.0+)
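Gradient accumulation multiplies out with per-device batch size and device count; the effective (optimizer-step) batch size in the snippet above can be computed as:

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Effective optimizer-step batch size under gradient accumulation."""
    return per_device_batch * accumulation_steps * num_devices

# Matches the comment above: 4 * 8 = 32 on a single GPU
size = effective_batch_size(4, 8)      # 32
multi = effective_batch_size(4, 8, 4)  # 128 across 4 GPUs
```

When scaling the effective batch size this way, the learning rate often needs to be retuned as well.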
## Advanced Features

### Custom Training Loop

For maximum control, bypass Trainer:

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import get_scheduler

num_epochs = 3
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Prepare data
train_dataloader = DataLoader(tokenized_dataset, batch_size=8, shuffle=True)

# Setup optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=len(train_dataloader) * num_epochs
)

# Training loop
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}

        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()

        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```
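The `"linear"` schedule warms the learning rate up from 0 and then decays it linearly back to 0 over training. A pure-Python sketch of that shape (a conceptual illustration, not the library's implementation):

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at a given step under linear warmup + linear decay."""
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr
        return base_lr * step / warmup_steps
    # Decay: ramp from base_lr back down to 0
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# With 100 warmup steps out of 1000 total at base_lr=5e-5:
lr_start = linear_schedule_lr(0, 5e-5, 100, 1000)   # 0.0
lr_peak = linear_schedule_lr(100, 5e-5, 100, 1000)  # 5e-05 (peak)
lr_end = linear_schedule_lr(1000, 5e-5, 100, 1000)  # 0.0
```

This is why `scheduler.step()` is called once per batch in the loop above: the schedule is defined over optimizer steps, not epochs.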
### Parameter-Efficient Fine-Tuning (PEFT)

Use the PEFT library with transformers for efficient fine-tuning:

```python
from peft import LoraConfig, get_peft_model

# Configure LoRA
lora_config = LoraConfig(
    r=16,                                 # Low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply to model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(model, lora_config)

# Now train as usual - only LoRA parameters train
trainer = Trainer(model=model, ...)
trainer.train()
```
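A quick way to see why LoRA is cheap: each adapted weight matrix of shape (d_out, d_in) gains two small factors, A of shape (r, d_in) and B of shape (d_out, r), so the trainable parameter count grows with `r` rather than with the full matrix size. The 4096-dimensional projection below is an illustrative assumption, not a specific model's shape:

```python
def lora_params_per_matrix(d_in, d_out, r):
    """Trainable parameters LoRA adds to one weight matrix:
    A is (r, d_in) and B is (d_out, r)."""
    return r * d_in + d_out * r

# Illustrative 4096x4096 attention projection with r=16:
full = 4096 * 4096                             # 16,777,216 frozen params
lora = lora_params_per_matrix(4096, 4096, 16)  # 131,072 trainable params
# LoRA trains well under 1% of this matrix's parameters
```

This is why LoRA checkpoints are megabytes rather than gigabytes: only the small A and B factors are saved.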
### Chat Templates

Apply chat templates for instruction-tuned models:

```python
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
]

# Format according to model's chat template
formatted = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and generate
inputs = tokenizer(formatted, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0])
```
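Each model family defines its own template, which is why `apply_chat_template` matters: the same messages render differently per model. For Llama-2 chat models the rendered string looks roughly like the sketch below (illustrative, not byte-exact — always prefer `apply_chat_template` for the model's real format):

```python
def llama2_style_prompt(system, user):
    """Roughly the Llama-2 chat layout; illustrative only — use
    tokenizer.apply_chat_template for the exact format."""
    return (
        f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
        f"{user} [/INST]"
    )

prompt = llama2_style_prompt(
    "You are a helpful assistant.",
    "What is machine learning?",
)
# prompt wraps the system message in <<SYS>> tags and the turn in [INST]
```

Sending plain text without the expected template to an instruction-tuned model typically degrades its responses noticeably.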
### Multi-GPU Training

```python
# Automatic with Trainer - no code changes needed
# Just run with: accelerate launch train.py

# Or use PyTorch DDP explicitly
training_args = TrainingArguments(
    output_dir="./results",
    ddp_find_unused_parameters=False,
    # ... other args
)

# For larger models, use FSDP
training_args = TrainingArguments(
    output_dir="./results",
    fsdp="full_shard auto_wrap",
    fsdp_config={
        "fsdp_transformer_layer_cls_to_wrap": ["BertLayer"],
    },
)
```
## Task-Specific Patterns

### Question Answering (Extractive)

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

result = qa(
    question="What is extractive QA?",
    context="Extractive QA extracts the answer from the given context..."
)
# {'answer': 'extracts the answer from the given context', 'score': 0.97, ...}
```

### Named Entity Recognition

```python
ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple"  # Merge subword tokens into full entities
)

result = ner("My name is John and I live in New York")
# [{'entity_group': 'PER', 'word': 'John', ...}, {'entity_group': 'LOC', 'word': 'New York', ...}]
```

### Image Captioning

```python
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")

from PIL import Image
image = Image.open("image.jpg")

inputs = processor(images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
caption = processor.batch_decode(outputs, skip_special_tokens=True)[0]
```

### Speech Recognition

```python
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)

result = transcriber("audio.mp3")
# {'text': 'This is the transcribed text...'}

# With timestamps
result = transcriber("audio.mp3", return_timestamps=True)
```
## Common Patterns and Best Practices

### Saving and Loading Models

```python
# Save entire model
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")

# Load later
model = AutoModel.from_pretrained("./my-model")
tokenizer = AutoTokenizer.from_pretrained("./my-model")

# Push to Hugging Face Hub
model.push_to_hub("username/my-model")
tokenizer.push_to_hub("username/my-model")

# Load from Hub
model = AutoModel.from_pretrained("username/my-model")
```

### Error Handling

```python
from transformers import AutoModel
import torch

try:
    model = AutoModel.from_pretrained("model-name")
except OSError:
    print("Model not found - check internet connection or model name")
except torch.cuda.OutOfMemoryError:
    print("GPU memory exceeded - try quantization or smaller batch size")
```
### Device Management

```python
import torch

# Check device availability
device = "cuda" if torch.cuda.is_available() else "cpu"

# Move model to device
model = model.to(device)

# Or use device_map for automatic distribution
model = AutoModel.from_pretrained("model-name", device_map="auto")

# For inputs
inputs = tokenizer(text, return_tensors="pt").to(device)
```

### Memory Management

```python
import torch

# Clear CUDA cache
torch.cuda.empty_cache()

# Use context manager for inference
with torch.no_grad():
    outputs = model(**inputs)

# Delete unused models
del model
torch.cuda.empty_cache()
```
## Resources

This skill includes comprehensive reference documentation and example scripts:

### scripts/

- `quick_inference.py`: Ready-to-use script for running inference with pipelines
- `fine_tune_classifier.py`: Complete example for fine-tuning a text classifier
- `generate_text.py`: Text generation with various strategies

Execute scripts directly or read them as implementation templates.

### references/

- `api_reference.md`: Comprehensive API documentation for key classes
- `training_guide.md`: Detailed training patterns, optimization, and troubleshooting
- `generation_strategies.md`: In-depth guide to text generation methods
- `quantization.md`: Model quantization techniques comparison and usage
- `task_patterns.md`: Quick reference for common task implementations

Load reference files when you need detailed information on specific topics. References contain extensive examples, parameter explanations, and best practices.
## Troubleshooting

**Import errors:**
```bash
pip install transformers
pip install accelerate    # For device_map="auto"
pip install bitsandbytes  # For quantization
```

**CUDA out of memory:**
- Reduce batch size
- Enable gradient checkpointing
- Use gradient accumulation
- Try quantization (8-bit or 4-bit)
- Use a smaller model variant

**Slow training:**
- Enable mixed precision (fp16/bf16)
- Increase batch size (if memory allows)
- Use torch.compile (PyTorch 2.0+)
- Check that data loading isn't the bottleneck

**Poor generation quality:**
- Adjust temperature (lower = more focused)
- Try different decoding strategies (beam search vs. sampling)
- Increase max_length if outputs are cut off
- Use repetition_penalty to reduce repetition

For task-specific guidance, consult the appropriate reference file in the `references/` directory.