Transformers Library Quickstart
Hugging Face's transformers library is the most popular open-source library for working with pre-trained language models. It provides a unified API for thousands of models across NLP, computer vision, audio, and multimodal tasks.
Installation
```bash
pip install transformers torch datasets accelerate
```
The library supports PyTorch, TensorFlow, and JAX backends. We'll use PyTorch throughout this module.
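A quick way to confirm the installation succeeded is to import both libraries and print their versions (any recent versions will do; the exact numbers on your machine will differ):

```python
# Verify the installation by importing the libraries and printing versions
import transformers
import torch

print(f"transformers: {transformers.__version__}")
print(f"torch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
```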
The Transformers Philosophy
The Pipeline API
The pipeline() function is the simplest way to use pre-trained models. It handles tokenization, model inference, and post-processing in a single call.
Sentiment Analysis
```python
from transformers import pipeline

# Create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")

# Single prediction
result = classifier("I love learning about AI!")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

# Batch prediction
results = classifier([
    "This movie was terrible.",
    "The food was absolutely delicious!",
    "I'm not sure how I feel about this."
])
for r in results:
    print(f"{r['label']}: {r['score']:.4f}")
```
Text Summarization
```python
summarizer = pipeline("summarization")

article = """
Hugging Face has become the central hub for machine learning models.
Founded in 2016, the company initially built a chatbot app before
pivoting to become the GitHub of machine learning. Their Transformers
library supports over 200,000 models and is used by thousands of
organizations. The platform hosts models, datasets, and Spaces
for demo applications.
"""

summary = summarizer(article, max_length=50, min_length=20)
print(summary[0]['summary_text'])
```
Named Entity Recognition (NER)
```python
ner = pipeline("ner", aggregation_strategy="simple")

text = "Elon Musk founded SpaceX in Hawthorne, California."
entities = ner(text)
for entity in entities:
    print(f"{entity['word']}: {entity['entity_group']} ({entity['score']:.3f})")

# Elon Musk: PER (0.998)
# SpaceX: ORG (0.995)
# Hawthorne: LOC (0.993)
# California: LOC (0.997)
```
Question Answering
```python
qa = pipeline("question-answering")

context = """
The transformer architecture was introduced in the 2017 paper
'Attention Is All You Need' by Vaswani et al. It replaced recurrent
layers with self-attention mechanisms, enabling massive parallelization
and leading to models like BERT and GPT.
"""

answer = qa(
    question="Who introduced the transformer architecture?",
    context=context
)
print(f"Answer: {answer['answer']} (score: {answer['score']:.3f})")
# Answer: Vaswani et al (score: 0.892)
```
Zero-Shot Classification
```python
zero_shot = pipeline("zero-shot-classification")

result = zero_shot(
    "I just got promoted to senior engineer!",
    candidate_labels=["career", "health", "sports", "technology"]
)
print(f"Labels: {result['labels']}")
print(f"Scores: {[f'{s:.3f}' for s in result['scores']]}")
# Labels: ['career', 'technology', 'sports', 'health']
# Scores: ['0.891', '0.067', '0.024', '0.018']
```
Translation
```python
translator = pipeline("translation_en_to_fr")
result = translator("Machine learning is transforming every industry.")
print(result[0]['translation_text'])
# L'apprentissage automatique transforme chaque industrie.
```
Text Generation
```python
generator = pipeline("text-generation", model="gpt2")

output = generator(
    "The future of artificial intelligence",
    max_new_tokens=50,
    num_return_sequences=1,
    temperature=0.7
)
print(output[0]['generated_text'])
```
Specifying Models
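Every pipeline task has a default checkpoint, but for reproducibility you should usually pin an explicit one with the `model` argument. A minimal sketch (the checkpoint below is the library's standard SST-2 sentiment model; any compatible Hub checkpoint works the same way):

```python
from transformers import pipeline

# Pass model= to pin an explicit checkpoint instead of the task default
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)
print(classifier("Pinning the checkpoint keeps results reproducible."))
```

Defaults can change between library releases, so production code should always name its model explicitly.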
AutoModel and AutoTokenizer
When you need more control than the pipeline provides, use AutoModel and AutoTokenizer directly. This is the standard approach for production code.
The Three Auto Classes
```python
from transformers import AutoTokenizer, AutoModel, AutoConfig

model_name = "bert-base-uncased"

# Load just the config (no weights downloaded)
config = AutoConfig.from_pretrained(model_name)
print(f"Hidden size: {config.hidden_size}")        # 768
print(f"Num layers: {config.num_hidden_layers}")   # 12
print(f"Num heads: {config.num_attention_heads}")  # 12

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the model
model = AutoModel.from_pretrained(model_name)
```
Manual Inference
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize input
text = "I absolutely love this product!"
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

print(f"Input IDs shape: {inputs['input_ids'].shape}")
print(f"Attention mask shape: {inputs['attention_mask'].shape}")
print(f"Tokens: {tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])}")

# Run inference (no gradient computation needed)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Process logits
logits = outputs.logits
probabilities = torch.softmax(logits, dim=-1)
predicted_class = torch.argmax(probabilities, dim=-1).item()

labels = model.config.id2label
print(f"Prediction: {labels[predicted_class]}")
print(f"Confidence: {probabilities[0][predicted_class]:.4f}")
```
AutoModel Variants
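Each task has its own `AutoModelFor...` class that loads the same base encoder with a task-specific head on top. A sketch of the common variants (the class names are the library's; the base checkpoint is illustrative, and its classification head will be randomly initialized until fine-tuned):

```python
from transformers import (
    AutoModel,                           # bare encoder, returns hidden states
    AutoModelForSequenceClassification,  # text classification head
    AutoModelForTokenClassification,     # NER / token tagging head
    AutoModelForQuestionAnswering,       # span-extraction QA head
    AutoModelForCausalLM,                # autoregressive generation (GPT-style)
    AutoModelForSeq2SeqLM,               # encoder-decoder (summarization, translation)
)

# Same checkpoint, different head: the Auto class decides what sits on top
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
print(type(model).__name__)
```

Picking the wrong variant is a common source of confusing shape or attribute errors, so match the `AutoModelFor...` class to the task before anything else.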
Batch Processing
For efficiency, always batch your inputs when processing multiple texts:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

texts = [
    "This is fantastic!",
    "Terrible experience.",
    "Pretty average, nothing special.",
    "Best purchase I've ever made!",
    "Would not recommend to anyone."
]

# Tokenize as a batch - padding ensures uniform length
inputs = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,      # Pad to longest in batch
    truncation=True,   # Truncate if over max length
    max_length=128
)

with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
predictions = torch.argmax(probs, dim=-1)

for text, pred, prob in zip(texts, predictions, probs):
    label = model.config.id2label[pred.item()]
    confidence = prob[pred.item()].item()
    print(f"[{label} {confidence:.2f}] {text}")
```
Device Management
Move models and inputs to GPU for faster inference:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

# Inputs must also be on the same device
text = "Great movie!"
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)

# For pipelines, use the device argument
classifier = pipeline(
    "sentiment-analysis",
    device=0  # GPU index, or -1 for CPU
)
```