claude-scientific-skills/scientific-packages/transformers/references/pipelines.md
2025-10-21 10:30:38 -07:00


Transformers Pipelines

Pipelines provide a simple and optimized interface for inference across many machine learning tasks. They abstract away the complexity of tokenization, model invocation, and post-processing.

Usage Pattern

from transformers import pipeline

# Basic usage
classifier = pipeline("text-classification")
result = classifier("This movie was amazing!")

# With specific model
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("This movie was amazing!")

Natural Language Processing Pipelines

Text Classification

classifier = pipeline("text-classification")
classifier("I love this product!")
# [{'label': 'POSITIVE', 'score': 0.9998}]

Zero-Shot Classification

classifier = pipeline("zero-shot-classification")
classifier("This is about climate change", candidate_labels=["politics", "science", "sports"])

Token Classification (NER)

ner = pipeline("token-classification")
ner("My name is Sarah and I work at Microsoft in Seattle")

Question Answering

qa = pipeline("question-answering")
qa(question="What is the capital?", context="The capital of France is Paris.")

Text Generation

generator = pipeline("text-generation")
generator("Once upon a time", max_length=50)

Text2Text Generation

generator = pipeline("text2text-generation", model="t5-base")
generator("translate English to French: Hello")

Summarization

summarizer = pipeline("summarization")
summarizer("Long article text here...", max_length=130, min_length=30)

Translation

translator = pipeline("translation_en_to_fr")
translator("Hello, how are you?")

Fill Mask

unmasker = pipeline("fill-mask", model="bert-base-uncased")
unmasker("Paris is the [MASK] of France.")
# The mask token depends on the model: [MASK] for BERT, <mask> for RoBERTa

Feature Extraction

extractor = pipeline("feature-extraction")
embeddings = extractor("This is a sentence")
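The feature-extraction pipeline returns per-token embeddings as a nested list shaped [batch, tokens, hidden_size]; a common next step is to mean-pool over the token axis to get one sentence vector. A minimal sketch with NumPy, using a toy stand-in for the pipeline output (real hidden sizes are typically 384-1024):

```python
import numpy as np

# Toy stand-in for a feature-extraction result: 1 input, 4 tokens, 3-dim hidden
token_embeddings = [[[0.1, 0.2, 0.3],
                     [0.3, 0.2, 0.1],
                     [0.5, 0.5, 0.5],
                     [0.1, 0.1, 0.1]]]

arr = np.array(token_embeddings)          # shape (1, 4, 3)
sentence_embedding = arr.mean(axis=1)[0]  # mean-pool over the token axis
print(sentence_embedding.shape)           # (3,)
```

The same pooling applies to the real `embeddings` output above once wrapped in `np.array(...)`.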

Document Question Answering

doc_qa = pipeline("document-question-answering")
doc_qa(image="document.png", question="What is the invoice number?")

Table Question Answering

table_qa = pipeline("table-question-answering")
table = {"Employees": ["120"], "Revenue": ["$4M"]}  # dict of columns or a pandas DataFrame
table_qa(table=table, query="How many employees?")

Computer Vision Pipelines

Image Classification

classifier = pipeline("image-classification")
classifier("cat.jpg")

Zero-Shot Image Classification

classifier = pipeline("zero-shot-image-classification")
classifier("cat.jpg", candidate_labels=["cat", "dog", "bird"])

Object Detection

detector = pipeline("object-detection")
detector("street.jpg")

Image Segmentation

segmenter = pipeline("image-segmentation")
segmenter("image.jpg")

Image-to-Image

img2img = pipeline("image-to-image", model="caidas/swin2SR-classical-sr-x2-64")
img2img("input.jpg")

Depth Estimation

depth = pipeline("depth-estimation")
depth("image.jpg")

Video Classification

classifier = pipeline("video-classification")
classifier("video.mp4")

Keypoint Matching

matcher = pipeline("keypoint-matching")
matcher(image1="img1.jpg", image2="img2.jpg")

Audio Pipelines

Automatic Speech Recognition

asr = pipeline("automatic-speech-recognition")
asr("audio.wav")

Audio Classification

classifier = pipeline("audio-classification")
classifier("audio.wav")

Zero-Shot Audio Classification

classifier = pipeline("zero-shot-audio-classification")
classifier("audio.wav", candidate_labels=["speech", "music", "noise"])

Text-to-Audio/Text-to-Speech

synthesizer = pipeline("text-to-audio")
audio = synthesizer("Hello, how are you today?")
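Text-to-audio pipelines typically return a dict holding a waveform array and its sampling rate; the key names below are assumptions based on common checkpoints. A sketch of saving the result as a WAV file with the standard library, using a silent stand-in waveform instead of a real synthesizer call:

```python
import wave

import numpy as np

# Stand-in for the synthesizer output (key names "audio" and
# "sampling_rate" are assumptions; check your model's output)
audio = {"audio": np.zeros(16000, dtype=np.float32), "sampling_rate": 16000}

# Convert float32 samples in [-1, 1] to 16-bit PCM and write a mono WAV file
pcm = (np.clip(audio["audio"], -1.0, 1.0) * 32767).astype(np.int16)
with wave.open("speech.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)  # 2 bytes per sample = 16-bit
    f.setframerate(audio["sampling_rate"])
    f.writeframes(pcm.tobytes())
```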

Multimodal Pipelines

Image-to-Text (Image Captioning)

captioner = pipeline("image-to-text")
captioner("image.jpg")

Visual Question Answering

vqa = pipeline("visual-question-answering")
vqa(image="image.jpg", question="What color is the car?")

Image-Text-to-Text (VLMs)

vlm = pipeline("image-text-to-text")
vlm(images="image.jpg", text="Describe this image in detail")

Zero-Shot Object Detection

detector = pipeline("zero-shot-object-detection")
detector("image.jpg", candidate_labels=["car", "person", "tree"])

Pipeline Configuration

Common Parameters

  • model: Specify model identifier or path
  • device: Set device (0 for GPU, -1 for CPU, or "cuda:0")
  • batch_size: Process multiple inputs at once
  • torch_dtype: Set precision (torch.float16, torch.bfloat16)
# GPU with half precision (requires torch)
import torch

pipe = pipeline("text-generation", model="gpt2", device=0, torch_dtype=torch.float16)

# Batch processing
pipe(["text 1", "text 2", "text 3"], batch_size=8)

Task-Specific Parameters

Each pipeline accepts task-specific parameters in the call:

# Text generation
generator("prompt", max_length=100, do_sample=True, temperature=0.7, top_p=0.9, num_return_sequences=3)

# Summarization
summarizer("text", max_length=130, min_length=30, do_sample=False)

# Translation
translator("text", max_length=512, num_beams=4)

Best Practices

  1. Reuse pipelines: Create once, use multiple times for efficiency
  2. Batch processing: Use batches for multiple inputs to maximize throughput
  3. GPU acceleration: Set device=0 for GPU when available
  4. Model selection: Choose task-specific models for best results
  5. Memory management: Use torch_dtype=torch.float16 for large models
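Practices 3 and 5 can be sketched as a small helper that resolves the device and dtype keyword arguments once, so every pipeline in an application picks them up consistently. The helper name is illustrative, and the availability flag is a plain parameter here (in real code it would come from `torch.cuda.is_available()`) to keep the sketch dependency-free:

```python
def pipeline_kwargs(cuda_available, half_precision=True):
    """Resolve device/dtype kwargs for pipeline() (illustrative helper).

    cuda_available would normally be torch.cuda.is_available();
    it is a plain flag here to keep the sketch dependency-free.
    """
    kwargs = {"device": 0 if cuda_available else -1}
    if cuda_available and half_precision:
        # In real code pass torch.float16; recent transformers versions
        # also accept the string form
        kwargs["torch_dtype"] = "float16"
    return kwargs

print(pipeline_kwargs(True))   # {'device': 0, 'torch_dtype': 'float16'}
print(pipeline_kwargs(False))  # {'device': -1}
```

A pipeline created once with `pipeline("text-generation", model="gpt2", **pipeline_kwargs(torch.cuda.is_available()))` can then be reused across calls, covering practice 1 as well.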