| name | description |
|---|---|
| transformers | Essential toolkit for the Hugging Face Transformers library enabling state-of-the-art machine learning across natural language processing, computer vision, audio processing, and multimodal applications. Use this skill for: loading and using pretrained transformer models (BERT, GPT, T5, RoBERTa, DistilBERT, BART, ViT, CLIP, Whisper, Llama, Mistral), implementing text generation and completion, fine-tuning models for custom tasks, text classification and sentiment analysis, question answering and reading comprehension, named entity recognition and token classification, text summarization and translation, image classification and object detection, speech recognition and audio processing, multimodal tasks combining text and images, parameter-efficient fine-tuning with LoRA and adapters, model quantization and optimization, training custom transformer models, implementing chat interfaces and conversational AI, working with tokenizers and text preprocessing, handling model inference and deployment, managing GPU memory and device allocation, implementing custom training loops, using pipelines for quick inference, working with the Hugging Face Hub for model sharing, and any machine learning task involving transformer architectures or attention mechanisms. |
Transformers
Overview
Transformers is Hugging Face's flagship library providing unified access to over 1 million pretrained models for machine learning across text, vision, audio, and multimodal domains. The library serves as a standardized model-definition framework compatible with PyTorch, TensorFlow, and JAX, emphasizing ease of use through three core components:
- Pipeline: Simple, optimized inference API for common tasks
- AutoClasses: Automatic model/tokenizer selection from pretrained checkpoints
- Trainer: Full-featured training loop with distributed training, mixed precision, and optimization
The library prioritizes accessibility: pretrained models reduce computational costs and carbon footprint, and the standardized model definitions remain compatible with the wider training and inference ecosystem (PyTorch Lightning, DeepSpeed, vLLM, etc.).
Quick Start with Pipelines
Use pipelines for simple, efficient inference without managing models, tokenizers, or preprocessing manually. Pipelines abstract complexity into a single function call.
Basic Pipeline Usage
from transformers import pipeline
# Text classification
classifier = pipeline("text-classification")
result = classifier("This restaurant is awesome")
# [{'label': 'POSITIVE', 'score': 0.9998}]
# Text generation
generator = pipeline("text-generation", model="meta-llama/Llama-2-7b-hf")
generator("The secret to baking a good cake is", max_length=50)
# Question answering
qa = pipeline("question-answering")
qa(question="What is extractive QA?", context="Extractive QA is...")
# Image classification
img_classifier = pipeline("image-classification")
img_classifier("path/to/image.jpg")
# Automatic speech recognition
transcriber = pipeline("automatic-speech-recognition")
transcriber("audio_file.mp3")
Available Pipeline Tasks
NLP Tasks:
text-classification, token-classification, question-answering, fill-mask, summarization, translation, text-generation, conversational, zero-shot-classification, sentiment-analysis
Vision Tasks:
image-classification, image-segmentation, object-detection, depth-estimation, image-to-image, zero-shot-image-classification
Audio Tasks:
automatic-speech-recognition, audio-classification, text-to-audio, zero-shot-audio-classification
Multimodal Tasks:
visual-question-answering, document-question-answering, image-to-text, zero-shot-object-detection
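All of these tasks share the same call pattern. A minimal sketch for zero-shot classification, letting the pipeline download its default checkpoint (the candidate labels below are only illustrative):
from transformers import pipeline
# Zero-shot classification: score arbitrary labels without fine-tuning
classifier = pipeline("zero-shot-classification")
result = classifier(
    "The new GPU drivers cut inference latency in half",
    candidate_labels=["technology", "sports", "politics"],  # illustrative labels
)
# {'sequence': ..., 'labels': ['technology', ...], 'scores': [...]}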
Pipeline Best Practices
Device Management:
from transformers import pipeline, infer_device
device = infer_device() # Auto-detect best device
pipe = pipeline("text-generation", model="...", device=device)
Batch Processing:
# Process multiple inputs efficiently
results = classifier(["Text 1", "Text 2", "Text 3"])
# Use KeyDataset for large datasets
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset
dataset = load_dataset("imdb", split="test")
for result in pipe(KeyDataset(dataset, "text")):
print(result)
Memory Optimization:
# Use half-precision for faster inference
import torch
pipe = pipeline("text-generation", model="...",
                torch_dtype=torch.float16, device="cuda")
Core Components
AutoClasses for Model Loading
AutoClasses automatically select the correct architecture based on pretrained checkpoints.
from transformers import (
AutoModel, AutoTokenizer, AutoConfig,
AutoModelForCausalLM, AutoModelForSequenceClassification
)
# Load any model by checkpoint name
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
# Task-specific model classes
causal_lm = AutoModelForCausalLM.from_pretrained("gpt2")
classifier = AutoModelForSequenceClassification.from_pretrained(
"bert-base-uncased",
num_labels=3
)
# Load with device and dtype optimization
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
device_map="auto", # Automatically distribute across devices
torch_dtype="auto" # Use optimal dtype
)
Key Parameters:
- device_map="auto": Optimal device allocation (CPU/GPU/multi-GPU)
- torch_dtype: Control precision (torch.float16, torch.bfloat16, "auto")
- trust_remote_code: Enable custom model code (use cautiously)
- use_fast: Enable Rust-backed fast tokenizers (default True)
Tokenization
Tokenizers convert text to model-compatible tensor inputs.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Basic tokenization
tokens = tokenizer.tokenize("Hello, how are you?")
# ['hello', ',', 'how', 'are', 'you', '?']
# Encoding (text → token IDs)
encoded = tokenizer("Hello, how are you?", return_tensors="pt")
# {'input_ids': tensor([[...]]), 'attention_mask': tensor([[...]])}
# Batch encoding with padding and truncation
batch = tokenizer(
["Short text", "This is a much longer text..."],
padding=True, # Pad to longest in batch
truncation=True, # Truncate to model's max length
max_length=512,
return_tensors="pt"
)
# Decoding (token IDs → text)
text = tokenizer.decode(encoded['input_ids'][0])
Special Tokens:
# Access special tokens
tokenizer.pad_token # Padding token
tokenizer.cls_token # Classification token
tokenizer.sep_token # Separator token
tokenizer.mask_token # Mask token (for MLM)
# Add custom tokens
tokenizer.add_tokens(["[CUSTOM]"])
tokenizer.add_special_tokens({'additional_special_tokens': ['[NEW]']})
# Resize model embeddings to match new vocabulary
model.resize_token_embeddings(len(tokenizer))
Image Processors
For vision tasks, use image processors instead of tokenizers.
from transformers import AutoImageProcessor
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
# Process single image
from PIL import Image
image = Image.open("path/to/image.jpg")
inputs = processor(image, return_tensors="pt")
# Returns: {'pixel_values': tensor([[...]])}
# Batch processing
images = [Image.open(f"img{i}.jpg") for i in range(3)]
inputs = processor(images, return_tensors="pt")
Processors for Multimodal Models
Multimodal models use processors that combine image and text processing.
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("microsoft/git-base")
# Process image + text caption
inputs = processor(
images=image,
text="A description of the image",
return_tensors="pt",
padding=True
)
Model Inference
Basic Inference Pattern
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Tokenize input
inputs = tokenizer("The future of AI is", return_tensors="pt")
# Generate (for causal LM)
outputs = model.generate(**inputs, max_length=50)
text = tokenizer.decode(outputs[0])
# Or get model outputs directly
outputs = model(**inputs)
logits = outputs.logits # Shape: (batch_size, seq_len, vocab_size)
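The logits can also be inspected directly. A small sketch reading the single most likely next token, reusing the model, tokenizer, and inputs from the snippet above:
import torch
with torch.no_grad():
    outputs = model(**inputs)
# The logits at the final position score the token that would follow the prompt
next_token_id = int(outputs.logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))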
Text Generation Strategies
For generative models, control generation behavior with parameters:
# Greedy decoding (default)
output = model.generate(inputs, max_length=50)
# Beam search (multiple hypotheses)
output = model.generate(
inputs,
max_length=50,
num_beams=5, # Keep top 5 beams
early_stopping=True
)
# Sampling with temperature
output = model.generate(
inputs,
max_length=50,
do_sample=True,
temperature=0.7, # Lower = more focused, higher = more random
top_k=50, # Sample from top 50 tokens
top_p=0.95 # Nucleus sampling
)
# Streaming generation
from transformers import TextStreamer
streamer = TextStreamer(tokenizer)
model.generate(**inputs, streamer=streamer, max_length=100)
Generation Parameters:
- max_length / max_new_tokens: Control output length
- num_beams: Beam search width (1 = greedy)
- temperature: Randomness (0.7-1.0 typical)
- top_k: Sample from top k tokens
- top_p: Nucleus sampling threshold
- repetition_penalty: Discourage repetition (>1.0)
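These settings can also be bundled into a reusable GenerationConfig instead of being passed ad hoc; a minimal sketch (the specific values are only illustrative):
from transformers import GenerationConfig
gen_config = GenerationConfig(
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.1,
)
output = model.generate(**inputs, generation_config=gen_config)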
Refer to references/generation_strategies.md for detailed information on choosing appropriate strategies.
Training and Fine-Tuning
Training Workflow Overview
1. Load dataset → 2. Preprocess → 3. Configure training → 4. Train → 5. Evaluate → 6. Save/Share
Text Classification Example
from transformers import (
AutoTokenizer, AutoModelForSequenceClassification,
TrainingArguments, Trainer, DataCollatorWithPadding
)
from datasets import load_dataset
# 1. Load dataset
dataset = load_dataset("imdb")
# 2. Preprocess
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
def preprocess(examples):
return tokenizer(examples["text"], truncation=True)
tokenized = dataset.map(preprocess, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
# 3. Load model
model = AutoModelForSequenceClassification.from_pretrained(
"bert-base-uncased",
num_labels=2,
id2label={0: "negative", 1: "positive"},
label2id={"negative": 0, "positive": 1}
)
# 4. Configure training
training_args = TrainingArguments(
output_dir="./results",
learning_rate=2e-5,
per_device_train_batch_size=16,
per_device_eval_batch_size=16,
num_train_epochs=3,
weight_decay=0.01,
eval_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
push_to_hub=False,
)
# 5. Train
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized["train"],
eval_dataset=tokenized["test"],
tokenizer=tokenizer,
data_collator=data_collator,
)
trainer.train()
# 6. Evaluate and save
metrics = trainer.evaluate()
trainer.save_model("./my-finetuned-model")
trainer.push_to_hub() # Share to Hugging Face Hub
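Once saved, the fine-tuned checkpoint can be served back through a pipeline for inference; a quick sketch assuming the local path used above:
from transformers import pipeline
clf = pipeline("text-classification", model="./my-finetuned-model")
clf("The plot was predictable but the acting saved it")
# [{'label': ..., 'score': ...}]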
Vision Task Fine-Tuning
from transformers import (
AutoImageProcessor, AutoModelForImageClassification,
TrainingArguments, Trainer
)
from datasets import load_dataset
# Load dataset
dataset = load_dataset("food101", split="train[:5000]")
# Image preprocessing
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
def transform(examples):
examples["pixel_values"] = [
processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
for img in examples["image"]
]
return examples
dataset = dataset.with_transform(transform)
# Load model
model = AutoModelForImageClassification.from_pretrained(
"google/vit-base-patch16-224",
num_labels=101, # 101 food categories
ignore_mismatched_sizes=True
)
# Training (similar pattern to text)
training_args = TrainingArguments(
output_dir="./vit-food101",
remove_unused_columns=False, # Keep image data
eval_strategy="epoch",
save_strategy="epoch",
learning_rate=5e-5,
per_device_train_batch_size=32,
num_train_epochs=3,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset,
tokenizer=processor,
)
trainer.train()
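Depending on library versions, the default collator may not stack the transformed images and integer labels correctly; a minimal collate function you can pass as data_collator to the Trainer above (assumes the dataset keeps its label column named "label"):
import torch

def collate_fn(batch):
    # Stack preprocessed images and gather integer class labels
    return {
        "pixel_values": torch.stack([example["pixel_values"] for example in batch]),
        "labels": torch.tensor([example["label"] for example in batch]),
    }

# Pass data_collator=collate_fn when constructing the Trainer above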
Sequence-to-Sequence Tasks
For tasks like summarization and translation, use Seq2SeqTrainer:
from transformers import (
AutoTokenizer, AutoModelForSeq2SeqLM,
Seq2SeqTrainingArguments, Seq2SeqTrainer,
DataCollatorForSeq2Seq
)
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
def preprocess(examples):
# Prefix input for T5
inputs = ["summarize: " + doc for doc in examples["text"]]
model_inputs = tokenizer(inputs, max_length=1024, truncation=True)
# Tokenize targets
labels = tokenizer(
examples["summary"],
max_length=128,
truncation=True
)
model_inputs["labels"] = labels["input_ids"]
return model_inputs
tokenized_dataset = dataset.map(preprocess, batched=True)
training_args = Seq2SeqTrainingArguments(
output_dir="./t5-summarization",
eval_strategy="epoch",
learning_rate=2e-5,
per_device_train_batch_size=8,
num_train_epochs=3,
predict_with_generate=True, # Important for seq2seq
)
trainer = Seq2SeqTrainer(
model=model,
args=training_args,
train_dataset=tokenized_dataset["train"],
eval_dataset=tokenized_dataset["test"],
tokenizer=tokenizer,
data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
)
trainer.train()
Important TrainingArguments
TrainingArguments(
# Essential
output_dir="./results",
num_train_epochs=3,
per_device_train_batch_size=8,
learning_rate=2e-5,
# Evaluation
eval_strategy="epoch", # or "steps"
eval_steps=500, # if eval_strategy="steps"
# Checkpointing
save_strategy="epoch",
save_steps=500,
save_total_limit=2, # Keep only the 2 most recent checkpoints
load_best_model_at_end=True,
metric_for_best_model="accuracy",
# Optimization
gradient_accumulation_steps=4,
warmup_steps=500,
weight_decay=0.01,
max_grad_norm=1.0,
# Mixed Precision
fp16=True, # For older Nvidia GPUs (enable either fp16 or bf16, not both)
# bf16=True, # Preferred on Ampere+ GPUs
# Logging
logging_steps=100,
report_to="tensorboard", # or "wandb", "mlflow"
# Memory Optimization
gradient_checkpointing=True,
optim="adamw_torch", # or "adafactor" for memory
# Distributed Training
ddp_find_unused_parameters=False,
)
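When metric_for_best_model is set, the Trainer needs a compute_metrics function to produce that metric; a minimal sketch, assuming the evaluate library is installed:
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a tuple of (logits, labels)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)

# trainer = Trainer(..., compute_metrics=compute_metrics)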
Refer to references/training_guide.md for comprehensive training patterns and optimization strategies.
Performance Optimization
Model Quantization
Reduce memory footprint while maintaining accuracy:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# 8-bit quantization
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
quantization_config=BitsAndBytesConfig(load_in_8bit=True),
device_map="auto"
)
# 4-bit quantization (even smaller)
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
)
model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf",
quantization_config=bnb_config,
device_map="auto"
)
Quantization Methods:
- Bitsandbytes: 4/8-bit on-the-fly quantization, supports PEFT fine-tuning
- GPTQ: 2/3/4/8-bit, requires calibration, very fast inference
- AWQ: 4-bit activation-aware, balanced speed/accuracy
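Pre-quantized GPTQ and AWQ checkpoints from the Hub typically load through the same from_pretrained call, since the quantization config ships with the model. A sketch, assuming the optimum/auto-gptq (or autoawq) backend is installed; the repository name is only an example:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example repo name -- substitute any GPTQ- or AWQ-quantized checkpoint
model_id = "TheBloke/Llama-2-7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)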
Refer to references/quantization.md for detailed comparison and usage patterns.
Training Optimization
# Gradient accumulation (simulate larger batch)
training_args = TrainingArguments(
per_device_train_batch_size=4,
gradient_accumulation_steps=8, # Effective batch = 4 * 8 = 32
)
# Gradient checkpointing (reduce memory, slower)
training_args = TrainingArguments(
gradient_checkpointing=True,
)
# Mixed precision training
training_args = TrainingArguments(
bf16=True, # or fp16=True
)
# Efficient optimizer
training_args = TrainingArguments(
optim="adafactor", # Lower memory than AdamW
)
Key Strategies:
- Batch sizes: Use powers of 2 (8, 16, 32, 64, 128)
- Gradient accumulation: Enables larger effective batch sizes
- Gradient checkpointing: Reduces memory ~60%, increases time ~20%
- Mixed precision: bf16 for Ampere+ GPUs, fp16 for older
- torch.compile: Optimize model graph (PyTorch 2.0+)
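For the torch.compile strategy mentioned above, a minimal sketch (requires PyTorch 2.0+; recent transformers versions also expose a torch_compile flag on TrainingArguments):
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Compile the forward pass for inference or a custom training loop
compiled_model = torch.compile(model)

# Or let the Trainer handle compilation (recent versions)
training_args = TrainingArguments(output_dir="./results", torch_compile=True)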
Advanced Features
Custom Training Loop
For maximum control, bypass Trainer:
import torch
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import get_scheduler
# Prepare data and training settings
train_dataloader = DataLoader(tokenized_dataset, batch_size=8, shuffle=True)
num_epochs = 3
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Setup optimizer and scheduler
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_scheduler(
"linear",
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=len(train_dataloader) * num_epochs
)
# Training loop
model.train()
for epoch in range(num_epochs):
for batch in train_dataloader:
batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
Parameter-Efficient Fine-Tuning (PEFT)
Use PEFT library with transformers for efficient fine-tuning:
from peft import LoraConfig, get_peft_model
# Configure LoRA
lora_config = LoraConfig(
r=16, # Low-rank dimension
lora_alpha=32,
target_modules=["q_proj", "v_proj"], # Which layers to adapt
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
# Apply to model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(model, lora_config)
# Now train as usual - only LoRA parameters train
trainer = Trainer(model=model, ...)
trainer.train()
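To check how small the trainable subset is, and to fold the adapters back into the base weights after training, PEFT models provide helpers such as these (the output directory name is just an example):
# Report trainable vs. total parameter counts (typically <1% trainable with LoRA)
model.print_trainable_parameters()

# After training, merge the LoRA weights into the base model for deployment
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./llama2-lora-merged")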
Chat Templates
Apply chat templates for instruction-tuned models:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", device_map="auto")
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is machine learning?"},
]
# Format according to model's chat template
formatted = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
# Tokenize and generate
inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=200)
response = tokenizer.decode(outputs[0])
Multi-GPU Training
# Automatic with Trainer - no code changes needed
# Just run with: accelerate launch train.py
# Or use PyTorch DDP explicitly
training_args = TrainingArguments(
output_dir="./results",
ddp_find_unused_parameters=False,
# ... other args
)
# For larger models, use FSDP
training_args = TrainingArguments(
output_dir="./results",
fsdp="full_shard auto_wrap",
fsdp_config={
"fsdp_transformer_layer_cls_to_wrap": ["BertLayer"],
},
)
Task-Specific Patterns
Question Answering (Extractive)
from transformers import pipeline
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
question="What is extractive QA?",
context="Extractive QA extracts the answer from the given context..."
)
# {'answer': 'extracts the answer from the given context', 'score': 0.97, ...}
Named Entity Recognition
ner = pipeline("token-classification", model="dslim/bert-base-NER")
result = ner("My name is John and I live in New York")
# [{'entity': 'B-PER', 'word': 'John', ...}, {'entity': 'B-LOC', 'word': 'New York', ...}]
Image Captioning
from transformers import AutoProcessor, AutoModelForCausalLM
processor = AutoProcessor.from_pretrained("microsoft/git-base")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base")
from PIL import Image
image = Image.open("image.jpg")
inputs = processor(images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
caption = processor.batch_decode(outputs, skip_special_tokens=True)[0]
Speech Recognition
transcriber = pipeline(
"automatic-speech-recognition",
model="openai/whisper-base"
)
result = transcriber("audio.mp3")
# {'text': 'This is the transcribed text...'}
# With timestamps
result = transcriber("audio.mp3", return_timestamps=True)
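For long recordings, the ASR pipeline can transcribe in fixed-size chunks; a sketch using the chunk_length_s argument (30 seconds is a common window for Whisper, and the filename is illustrative):
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",
    chunk_length_s=30,  # split long audio into 30-second windows
)
result = transcriber("long_audio.mp3", return_timestamps=True)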
Common Patterns and Best Practices
Saving and Loading Models
# Save entire model
model.save_pretrained("./my-model")
tokenizer.save_pretrained("./my-model")
# Load later
model = AutoModel.from_pretrained("./my-model")
tokenizer = AutoTokenizer.from_pretrained("./my-model")
# Push to Hugging Face Hub
model.push_to_hub("username/my-model")
tokenizer.push_to_hub("username/my-model")
# Load from Hub
model = AutoModel.from_pretrained("username/my-model")
Error Handling
from transformers import AutoModel
import torch
try:
model = AutoModel.from_pretrained("model-name")
except OSError:
print("Model not found - check internet connection or model name")
except torch.cuda.OutOfMemoryError:
print("GPU memory exceeded - try quantization or smaller batch size")
Device Management
import torch
# Check device availability
device = "cuda" if torch.cuda.is_available() else "cpu"
# Move model to device
model = model.to(device)
# Or use device_map for automatic distribution
model = AutoModel.from_pretrained("model-name", device_map="auto")
# For inputs
inputs = tokenizer(text, return_tensors="pt").to(device)
Memory Management
import torch
# Clear CUDA cache
torch.cuda.empty_cache()
# Use context manager for inference
with torch.no_grad():
outputs = model(**inputs)
# Delete unused models
del model
torch.cuda.empty_cache()
Resources
This skill includes comprehensive reference documentation and example scripts:
scripts/
- quick_inference.py: Ready-to-use script for running inference with pipelines
- fine_tune_classifier.py: Complete example for fine-tuning a text classifier
- generate_text.py: Text generation with various strategies
Execute scripts directly or read them as implementation templates.
references/
- api_reference.md: Comprehensive API documentation for key classes
- training_guide.md: Detailed training patterns, optimization, and troubleshooting
- generation_strategies.md: In-depth guide to text generation methods
- quantization.md: Model quantization techniques comparison and usage
- task_patterns.md: Quick reference for common task implementations
Load reference files when you need detailed information on specific topics. References contain extensive examples, parameter explanations, and best practices.
Troubleshooting
Import errors:
pip install transformers
pip install accelerate # For device_map="auto"
pip install bitsandbytes # For quantization
CUDA out of memory:
- Reduce batch size
- Enable gradient checkpointing
- Use gradient accumulation
- Try quantization (8-bit or 4-bit)
- Use smaller model variant
Slow training:
- Enable mixed precision (fp16/bf16)
- Increase batch size (if memory allows)
- Use torch.compile (PyTorch 2.0+)
- Check that data loading isn't the bottleneck (e.g., raise dataloader_num_workers)
Poor generation quality:
- Adjust temperature (lower = more focused)
- Try different decoding strategies (beam search vs sampling)
- Increase max_length if outputs cut off
- Use repetition_penalty to reduce repetition
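A sketch combining these fixes, assuming a model and tokenized inputs as in the generation examples above (the values are starting points, not tuned settings):
output = model.generate(
    **inputs,
    max_new_tokens=200,      # raise if outputs are cut off
    do_sample=True,
    temperature=0.7,         # lower for more focused text
    top_p=0.9,
    repetition_penalty=1.2,  # >1.0 discourages repetition loops
)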
For task-specific guidance, consult the appropriate reference file in the references/ directory.