Improve the Hugging Face transformers skill

2026-03-28 07:33:45 +08:00 · 2025-11-03 16:44:15 -08:00
parent 86d8878eeb
commit c56fa43747
12 changed files with 2041 additions and 2705 deletions
--- a/scientific-packages/transformers/references/pipelines.md
+++ b/scientific-packages/transformers/references/pipelines.md
@@ -1,234 +1,335 @@
-# Transformers Pipelines
+# Pipeline API Reference

-Pipelines provide a simple and optimized interface for inference across many machine learning tasks. They abstract away the complexity of tokenization, model invocation, and post-processing.
+## Overview

-## Usage Pattern
+Pipelines provide the simplest way to use pre-trained models for inference. They abstract away tokenization, model loading, and post-processing, offering a unified interface for dozens of tasks.
+
+## Basic Usage
+
+Create a pipeline by specifying a task:

 ```python
 from transformers import pipeline

-# Basic usage
-classifier = pipeline("text-classification")
-result = classifier("This movie was amazing!")
-
-# With specific model
-classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
-result = classifier("This movie was amazing!")
+# Auto-select default model for task
+pipe = pipeline("text-classification")
+result = pipe("This is great!")
 ```

-## Natural Language Processing Pipelines
+Or specify a model:

-### Text Classification
+```python
+pipe = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
+```
+
+## Supported Tasks
+
+### Natural Language Processing
+
+**text-generation**: Generate text continuations
+```python
+generator = pipeline("text-generation", model="gpt2")
+output = generator("Once upon a time", max_length=50, num_return_sequences=2)
+```
+
+**text-classification**: Classify text into categories
 ```python
 classifier = pipeline("text-classification")
-classifier("I love this product!")
-# [{'label': 'POSITIVE', 'score': 0.9998}]
+result = classifier("I love this product!")  # Returns label and score
 ```

-### Zero-Shot Classification
+**token-classification**: Label individual tokens (NER, POS tagging)
 ```python
-classifier = pipeline("zero-shot-classification")
-classifier("This is about climate change", candidate_labels=["politics", "science", "sports"])
+ner = pipeline("token-classification", model="dslim/bert-base-NER")
+entities = ner("Hugging Face is based in New York City")
 ```

-### Token Classification (NER)
-```python
-ner = pipeline("token-classification")
-ner("My name is Sarah and I work at Microsoft in Seattle")
-```
-
-### Question Answering
+**question-answering**: Extract answers from context
 ```python
 qa = pipeline("question-answering")
-qa(question="What is the capital?", context="The capital of France is Paris.")
+result = qa(question="What is the capital?", context="Paris is the capital of France.")
 ```

-### Text Generation
+**fill-mask**: Predict masked tokens
 ```python
-generator = pipeline("text-generation")
-generator("Once upon a time", max_length=50)
+unmasker = pipeline("fill-mask", model="bert-base-uncased")
+result = unmasker("Paris is the [MASK] of France")
 ```

-### Text2Text Generation
+**summarization**: Summarize long texts
 ```python
-generator = pipeline("text2text-generation", model="t5-base")
-generator("translate English to French: Hello")
+summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
+summary = summarizer("Long article text...", max_length=130, min_length=30)
 ```

-### Summarization
+**translation**: Translate between languages
 ```python
-summarizer = pipeline("summarization")
-summarizer("Long article text here...", max_length=130, min_length=30)
+translator = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
+result = translator("Hello, how are you?")
 ```

-### Translation
+**zero-shot-classification**: Classify without training data
 ```python
-translator = pipeline("translation_en_to_fr")
-translator("Hello, how are you?")
+classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
+result = classifier(
+    "This is a course about Python programming",
+    candidate_labels=["education", "politics", "business"]
+)
 ```

-### Fill Mask
+**sentiment-analysis**: Alias for text-classification focused on sentiment
 ```python
-unmasker = pipeline("fill-mask")
-unmasker("Paris is the [MASK] of France.")
+sentiment = pipeline("sentiment-analysis")
+result = sentiment("This product exceeded my expectations!")
 ```

-### Feature Extraction
+### Computer Vision
+
+**image-classification**: Classify images
 ```python
-extractor = pipeline("feature-extraction")
-embeddings = extractor("This is a sentence")
+classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
+result = classifier("path/to/image.jpg")
+# Or use PIL Image or URL
+from PIL import Image
+result = classifier(Image.open("image.jpg"))
 ```

-### Document Question Answering
+**object-detection**: Detect objects in images
 ```python
-doc_qa = pipeline("document-question-answering")
-doc_qa(image="document.png", question="What is the invoice number?")
+detector = pipeline("object-detection", model="facebook/detr-resnet-50")
+results = detector("image.jpg")  # Returns bounding boxes and labels
 ```

-### Table Question Answering
+**image-segmentation**: Segment images
 ```python
-table_qa = pipeline("table-question-answering")
-table_qa(table=data, query="How many employees?")
+segmenter = pipeline("image-segmentation", model="facebook/detr-resnet-50-panoptic")
+segments = segmenter("image.jpg")
 ```

-## Computer Vision Pipelines
-
-### Image Classification
+**depth-estimation**: Estimate depth from images
 ```python
-classifier = pipeline("image-classification")
-classifier("cat.jpg")
+depth = pipeline("depth-estimation", model="Intel/dpt-large")
+result = depth("image.jpg")
 ```

-### Zero-Shot Image Classification
+**zero-shot-image-classification**: Classify images without training
 ```python
-classifier = pipeline("zero-shot-image-classification")
-classifier("cat.jpg", candidate_labels=["cat", "dog", "bird"])
+classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
+result = classifier("image.jpg", candidate_labels=["cat", "dog", "bird"])
 ```

-### Object Detection
+### Audio
+
+**automatic-speech-recognition**: Transcribe speech
 ```python
-detector = pipeline("object-detection")
-detector("street.jpg")
+asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")
+text = asr("audio.mp3")
 ```

-### Image Segmentation
+**audio-classification**: Classify audio
 ```python
-segmenter = pipeline("image-segmentation")
-segmenter("image.jpg")
+classifier = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")
+result = classifier("audio.wav")
 ```

-### Image-to-Image
+**text-to-speech**: Generate speech from text (with specific models)
 ```python
-img2img = pipeline("image-to-image", model="lllyasviel/sd-controlnet-canny")
-img2img("input.jpg")
+tts = pipeline("text-to-speech", model="microsoft/speecht5_tts")
+audio = tts("Hello, this is a test")
 ```

-### Depth Estimation
+### Multimodal
+
+**visual-question-answering**: Answer questions about images
 ```python
-depth = pipeline("depth-estimation")
-depth("image.jpg")
+vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
+result = vqa(image="image.jpg", question="What color is the car?")
 ```

-### Video Classification
+**document-question-answering**: Answer questions about documents
 ```python
-classifier = pipeline("video-classification")
-classifier("video.mp4")
+doc_qa = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
+result = doc_qa(image="document.png", question="What is the invoice number?")
 ```

-### Keypoint Matching
+**image-to-text**: Generate captions for images
 ```python
-matcher = pipeline("keypoint-matching")
-matcher(image1="img1.jpg", image2="img2.jpg")
+captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
+caption = captioner("image.jpg")
 ```

-## Audio Pipelines
-
-### Automatic Speech Recognition
-```python
-asr = pipeline("automatic-speech-recognition")
-asr("audio.wav")
-```
-
-### Audio Classification
-```python
-classifier = pipeline("audio-classification")
-classifier("audio.wav")
-```
-
-### Zero-Shot Audio Classification
-```python
-classifier = pipeline("zero-shot-audio-classification")
-classifier("audio.wav", candidate_labels=["speech", "music", "noise"])
-```
-
-### Text-to-Audio/Text-to-Speech
-```python
-synthesizer = pipeline("text-to-audio")
-audio = synthesizer("Hello, how are you today?")
-```
-
-## Multimodal Pipelines
-
-### Image-to-Text (Image Captioning)
-```python
-captioner = pipeline("image-to-text")
-captioner("image.jpg")
-```
-
-### Visual Question Answering
-```python
-vqa = pipeline("visual-question-answering")
-vqa(image="image.jpg", question="What color is the car?")
-```
-
-### Image-Text-to-Text (VLMs)
-```python
-vlm = pipeline("image-text-to-text")
-vlm(images="image.jpg", text="Describe this image in detail")
-```
-
-### Zero-Shot Object Detection
-```python
-detector = pipeline("zero-shot-object-detection")
-detector("image.jpg", candidate_labels=["car", "person", "tree"])
-```
-
-## Pipeline Configuration
+## Pipeline Parameters

 ### Common Parameters

- `model`: Specify model identifier or path
- `device`: Set device (0 for GPU, -1 for CPU, or "cuda:0")
- `batch_size`: Process multiple inputs at once
- `torch_dtype`: Set precision (torch.float16, torch.bfloat16)
-
+**model**: Model identifier or path
 ```python
-# GPU with half precision
-pipe = pipeline("text-generation", model="gpt2", device=0, torch_dtype=torch.float16)
-
-# Batch processing
-pipe(["text 1", "text 2", "text 3"], batch_size=8)
+pipe = pipeline("task", model="model-id")
 ```

-### Task-Specific Parameters
+**device**: Device to run on (-1 for CPU, a GPU index such as 0, or a string such as "cuda:0")
+```python
+pipe = pipeline("task", device=0)  # Use first GPU
+```

-Each pipeline accepts task-specific parameters in the call:
+**device_map**: Automatic device allocation for large models
+```python
+pipe = pipeline("task", model="large-model", device_map="auto")
+```
+
+**torch_dtype**: Model precision; lower precision reduces memory (recent releases may also accept this as `dtype`)
+```python
+import torch
+pipe = pipeline("task", torch_dtype=torch.float16)
+```
+
+**batch_size**: Process multiple inputs at once
+```python
+pipe = pipeline("task", batch_size=8)
+results = pipe(["text1", "text2", "text3"])
+```
+
+**framework**: Choose PyTorch or TensorFlow
+```python
+pipe = pipeline("task", framework="pt")  # or "tf"
+```
+
+## Batch Processing
+
+Process multiple inputs efficiently:

 ```python
-# Text generation
-generator("prompt", max_length=100, temperature=0.7, top_p=0.9, num_return_sequences=3)
+classifier = pipeline("text-classification")
+texts = ["Great product!", "Terrible experience", "Just okay"]
+results = classifier(texts)
+```

-# Summarization
-summarizer("text", max_length=130, min_length=30, do_sample=False)
+For large datasets, use generators or KeyDataset:

-# Translation
-translator("text", max_length=512, num_beams=4)
+```python
+from transformers.pipelines.pt_utils import KeyDataset
+import datasets
+
+dataset = datasets.load_dataset("dataset-name", split="test")
+pipe = pipeline("task", device=0)
+
+for output in pipe(KeyDataset(dataset, "text")):
+    print(output)
+```
+
+## Performance Optimization
+
+### GPU Acceleration
+
+Always specify device for GPU usage:
+```python
+pipe = pipeline("task", device=0)
+```
+
+### Mixed Precision
+
+Use float16 on supported GPUs to roughly halve model memory and often speed up inference (the gain depends on model and hardware):
+```python
+import torch
+pipe = pipeline("task", torch_dtype=torch.float16, device=0)
+```
+
+### Batching Guidelines
+
+- **CPU**: Usually skip batching
+- **GPU with variable lengths**: Padding to the longest input wastes compute and may reduce efficiency
+- **GPU with similar lengths**: Significant speedup
+- **Real-time applications**: Skip batching (increases latency)
+
+```python
+# Good for throughput
+pipe = pipeline("task", batch_size=32, device=0)
+results = pipe(list_of_texts)
+```
+
+### Streaming Output
+
+For text generation, stream tokens as they're generated:
+
+```python
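+from transformers import AutoTokenizer, TextStreamer
+
+# TextStreamer needs the tokenizer to decode tokens as they arrive
+tokenizer = AutoTokenizer.from_pretrained("gpt2")
+generator = pipeline("text-generation", model="gpt2", tokenizer=tokenizer)
+generator("The future of AI", max_length=100, streamer=TextStreamer(tokenizer))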
+```
+
+## Custom Pipeline Configuration
+
+Specify tokenizer and model separately:
+
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+tokenizer = AutoTokenizer.from_pretrained("model-id")
+model = AutoModelForSequenceClassification.from_pretrained("model-id")
+pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
+```
+
+Use custom pipeline classes:
+
+```python
+from transformers import TextClassificationPipeline
+
+class CustomPipeline(TextClassificationPipeline):
+    def postprocess(self, model_outputs, **kwargs):
+        # Custom post-processing
+        return super().postprocess(model_outputs, **kwargs)
+
+pipe = pipeline("text-classification", model="model-id", pipeline_class=CustomPipeline)
+```
+
+## Input Formats
+
+Pipelines accept various input types:
+
+**Text tasks**: Strings or lists of strings
+```python
+pipe("single text")
+pipe(["text1", "text2"])
+```
+
+**Image tasks**: URLs, file paths, PIL Images, or numpy arrays
+```python
+pipe("https://example.com/image.jpg")
+pipe("local/path/image.png")
+pipe(Image.open("image.jpg"))  # requires: from PIL import Image
+pipe(numpy_array)
+```
+
+**Audio tasks**: File paths, numpy arrays, or raw waveforms
+```python
+pipe("audio.mp3")
+pipe(audio_array)
+```
+
+## Error Handling
+
+Handle common issues:
+
+```python
+try:
+    result = pipe(input_data)
+except Exception as e:
+    if "CUDA out of memory" in str(e):
+        # Fall back to CPU (or reduce batch size) and retry
+        pipe = pipeline("task", device=-1)
+        result = pipe(input_data)
+    elif "does not appear to have a file named" in str(e):
+        # Model files not found; usually raised while loading the model
+        print("Check model identifier")
+    else:
+        raise
 ```

 ## Best Practices

-1. **Reuse pipelines**: Create once, use multiple times for efficiency
-2. **Batch processing**: Use batches for multiple inputs to maximize throughput
-3. **GPU acceleration**: Set `device=0` for GPU when available
-4. **Model selection**: Choose task-specific models for best results
-5. **Memory management**: Use `torch_dtype=torch.float16` for large models
+1. **Use pipelines for prototyping**: Fast iteration without boilerplate
+2. **Specify models explicitly**: Default models may change
+3. **Enable GPU when available**: Significant speedup
+4. **Use batching for throughput**: When processing many inputs
+5. **Consider memory usage**: Use float16 or smaller models for large batches
+6. **Cache models locally**: Avoid repeated downloads
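
Note on the batching guidelines added in this diff: one way to reach the "GPU with similar lengths" case is to sort inputs by length before batching. The sketch below only illustrates that idea and is not part of the commit or of the transformers API. `length_bucketed_batches` and `restore_order` are hypothetical helper names, and character count is assumed to be a rough stand-in for token count.

```python
from typing import Iterator, List, Sequence, Tuple


def length_bucketed_batches(
    texts: Sequence[str], batch_size: int
) -> Iterator[Tuple[List[int], List[str]]]:
    """Yield (original_indices, batch) pairs that group similar-length texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Sort indices by character length (assumed proxy for token count).
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    for start in range(0, len(order), batch_size):
        indices = order[start:start + batch_size]
        yield indices, [texts[i] for i in indices]


def restore_order(
    indices: Sequence[int], outputs: Sequence[object], total: int
) -> List[object]:
    """Place per-item outputs back at their original positions."""
    restored: List[object] = [None] * total
    for i, out in zip(indices, outputs):
        restored[i] = out
    return restored


texts = ["Just okay", "Great product!", "Terrible experience, would not buy again", "Bad"]
batches = list(length_bucketed_batches(texts, batch_size=2))
```

Each yielded batch could then be passed to a pipeline call such as `pipe(batch)`, with `restore_order` mapping the results back to the input order. Whether this helps in practice depends on the model and on how unevenly the input lengths are distributed.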