Fine-Tuning with the Trainer API
Fine-tuning adapts a pre-trained model to your specific task and data. The Hugging Face Trainer class provides a high-level API that handles the entire training loop, including gradient accumulation, mixed precision, distributed training, and evaluation.
Why Fine-Tune?
Pre-trained models are generalists. Fine-tuning makes them specialists:
| Approach | Pros | Cons |
|---|---|---|
| Zero-shot | No training needed | Lower accuracy for specific tasks |
| Few-shot (prompting) | No training, quick | Limited by context window |
| Fine-tuning | Best accuracy for your domain | Requires labeled data and compute |
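To make the contrast concrete, here is a rough sketch of the first and last rows of the table in code. The zero-shot example uses the zero-shot-classification pipeline with facebook/bart-large-mnli (the pipeline's usual default model); the fine-tuned example assumes the ./my-model checkpoint saved at the end of this guide:
from transformers import pipeline
# Zero-shot: a general NLI model scores arbitrary candidate labels, no training required
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(zero_shot("The plot dragged and the acting was wooden.", candidate_labels=["positive", "negative"]))
# Fine-tuned: a model specialized on your labeled data (./my-model is produced later in this guide)
sentiment = pipeline("text-classification", model="./my-model")
print(sentiment("The plot dragged and the acting was wooden."))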
Setting Up
pip install transformers datasets evaluate accelerate
The Fine-Tuning Pipeline
Dataset Preparation with datasets Library
The datasets library provides efficient, memory-mapped data loading:
from datasets import load_dataset
# Load a dataset from the Hub
dataset = load_dataset("imdb")
print(dataset)
DatasetDict({
train: Dataset({features: ['text', 'label'], num_rows: 25000}),
test: Dataset({features: ['text', 'label'], num_rows: 25000}),
unsupervised: Dataset({features: ['text', 'label'], num_rows: 50000})
})
Inspect a sample
print(dataset["train"][0])
{'text': 'I rented I Am Curious...', 'label': 0}
Create a smaller subset for faster training
small_train = dataset["train"].shuffle(seed=42).select(range(2000))
small_test = dataset["test"].shuffle(seed=42).select(range(500))
Tokenizing the Dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def tokenize_function(examples):
"""Tokenize a batch of examples."""
return tokenizer(
examples["text"],
truncation=True,
max_length=256,
# No padding here; the data collator introduced below pads each batch dynamically
)
Apply tokenization to entire dataset (batched for speed)
tokenized_train = small_train.map(tokenize_function, batched=True)
tokenized_test = small_test.map(tokenize_function, batched=True)
# Set format for PyTorch
tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "label"])
tokenized_test.set_format("torch", columns=["input_ids", "attention_mask", "label"])
Why batched=True?
With batched=True, map passes batches of examples (1,000 by default) to tokenize_function instead of single examples, so the fast Rust-backed tokenizer can process many texts in one call. For tokenization this is typically several times faster than mapping example by example.
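You can check the difference yourself; a rough sketch (timings depend on your hardware, and load_from_cache_file=False forces re-tokenization instead of reusing the datasets cache):
import time
start = time.time()
small_train.map(tokenize_function, batched=False, load_from_cache_file=False)
print(f"batched=False: {time.time() - start:.1f}s")
start = time.time()
small_train.map(tokenize_function, batched=True, load_from_cache_file=False)
print(f"batched=True:  {time.time() - start:.1f}s")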
Training with Trainer
TrainingArguments
from transformers import TrainingArguments
training_args = TrainingArguments(
# Output and logging
output_dir="./results",
logging_dir="./logs",
logging_steps=50,
# Training hyperparameters
num_train_epochs=3,
per_device_train_batch_size=16,
per_device_eval_batch_size=32,
learning_rate=2e-5,
weight_decay=0.01,
warmup_steps=100,
# Evaluation
evaluation_strategy="epoch", # Evaluate after each epoch
save_strategy="epoch", # Save checkpoint each epoch
load_best_model_at_end=True, # Load best model when done
metric_for_best_model="f1", # Use F1 to determine best
# Optimization
fp16=True, # Mixed precision (if GPU supports it)
gradient_accumulation_steps=2, # Simulate larger batch size
# Hub integration
push_to_hub=False,
report_to="tensorboard", # or "wandb"
)
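One detail worth calling out: with gradient_accumulation_steps=2, gradients from two consecutive batches of 16 are accumulated before each optimizer step, so the effective batch size is 16 × 2 = 32 per device (multiply again by the number of GPUs if you train on several).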
Defining Metrics
import evaluate
import numpy as np
# Load metrics
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")

def compute_metrics(eval_pred):
"""Compute metrics for evaluation."""
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
return {
"accuracy": accuracy_metric.compute(
predictions=predictions, references=labels
)["accuracy"],
"f1": f1_metric.compute(
predictions=predictions, references=labels
)["f1"],
"precision": precision_metric.compute(
predictions=predictions, references=labels
)["precision"],
"recall": recall_metric.compute(
predictions=predictions, references=labels
)["recall"],
}
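One caveat: the f1, precision, and recall metrics in evaluate default to binary averaging, which is fine here because IMDB has two labels. For a multi-class task you would pass an averaging strategy explicitly, for example:
# Hypothetical multi-class variant: average per-class F1 scores
f1_metric.compute(predictions=predictions, references=labels, average="macro")["f1"]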
Creating and Running the Trainer
from transformers import (
AutoModelForSequenceClassification,
Trainer,
DataCollatorWithPadding,
)
# Load model with classification head
model = AutoModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased",
num_labels=2,
id2label={0: "NEGATIVE", 1: "POSITIVE"},
label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
# Data collator handles dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
# Create Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_train,
eval_dataset=tokenized_test,
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
# Train!
train_result = trainer.train()
print(f"Training loss: {train_result.training_loss:.4f}")
print(f"Training time: {train_result.metrics['train_runtime']:.1f}s")Evaluate
eval_results = trainer.evaluate()
print(f"Eval accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Eval F1: {eval_results['eval_f1']:.4f}")
What Does Trainer Handle Automatically?
Quite a lot: creating the AdamW optimizer and learning-rate scheduler, moving batches to the right device, mixed-precision casting and loss scaling when fp16=True, gradient accumulation and clipping, the epoch and step loops, periodic evaluation, checkpointing, logging, and multi-GPU or distributed training via accelerate. The sketch below shows a heavily simplified version of what you would otherwise write by hand.
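A rough manual PyTorch loop, for intuition only: single device, no scheduler, no mixed precision, no checkpointing, reusing the objects defined earlier in this guide:
import torch
from torch.utils.data import DataLoader
from torch.optim import AdamW

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# DataCollatorWithPadding pads each batch and renames "label" to "labels"
train_loader = DataLoader(tokenized_train, batch_size=16, shuffle=True, collate_fn=data_collator)
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

model.train()
for epoch in range(3):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)   # the model returns the loss when labels are provided
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()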
Training Checkpoints
Trainer automatically saves checkpoints during training:
results/
  checkpoint-500/
    config.json
    model.safetensors
    optimizer.pt
    scheduler.pt
    training_args.bin
    trainer_state.json
  checkpoint-1000/
    ...
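Each checkpoint stores the model plus optimizer and scheduler state, so they add up quickly. If disk space matters, you can cap how many are kept; a sketch of the relevant TrainingArguments options:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,   # keep only the two most recent checkpoints
    load_best_model_at_end=True,   # the best checkpoint is retained even when the limit is hit
)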
Resuming Training
# Resume from a checkpoint
trainer.train(resume_from_checkpoint="./results/checkpoint-500")
# Or auto-detect the latest checkpoint
trainer.train(resume_from_checkpoint=True)
Pushing to the Hub
# Push the fine-tuned model
trainer.push_to_hub(
commit_message="Fine-tuned DistilBERT on IMDB",
tags=["text-classification", "sentiment-analysis"],
)
# Or save locally first, then push
trainer.save_model("./my-model")
tokenizer.save_pretrained("./my-model")
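Once pushed, the model can be loaded back from the Hub by name. A quick sketch (your-username/my-model is a placeholder; the actual repo id depends on your account and on output_dir or hub_model_id):
from transformers import pipeline
# Placeholder repo id; substitute the one reported by push_to_hub
classifier = pipeline("text-classification", model="your-username/my-model")
print(classifier("A surprisingly moving film."))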
Logging and Monitoring
TensorBoard
tensorboard --logdir ./logs
Weights & Biases
# Set report_to="wandb" in TrainingArguments
Then run:
import wandb
wandb.init(project="my-fine-tuning")

training_args = TrainingArguments(
...
report_to="wandb",
)