Fine-Tuning with the Trainer API

Trainer class, dataset preparation, training loops, evaluation, and logging

Fine-tuning adapts a pre-trained model to your specific task and data. The Hugging Face Trainer class provides a high-level API that handles the entire training loop, including gradient accumulation, mixed precision, distributed training, and evaluation.

Why Fine-Tune?

Pre-trained models are generalists. Fine-tuning makes them specialists:

Approach               Pros                             Cons
Zero-shot              No training needed               Lower accuracy for specific tasks
Few-shot (prompting)   No training, quick               Limited by context window
Fine-tuning            Best accuracy for your domain    Requires labeled data and compute
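For comparison, the zero-shot row needs no training at all. A minimal sketch using the zero-shot-classification pipeline (the facebook/bart-large-mnli checkpoint and the example sentence are illustrative choices, not requirements):

from transformers import pipeline

# Zero-shot: no fine-tuning, the model scores candidate labels directly
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "I rented this movie and could not stop watching it.",
    candidate_labels=["positive", "negative"],
)
print(result["labels"][0], result["scores"][0])   # highest-scoring label first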

Setting Up

pip install transformers datasets evaluate accelerate

The Fine-Tuning Pipeline

1. Load a pre-trained model and tokenizer
2. Prepare your dataset (load, tokenize, format)
3. Define training arguments (learning rate, epochs, batch size)
4. Define evaluation metrics
5. Create a Trainer and call trainer.train()
6. Evaluate and push to Hub

Dataset Preparation with datasets Library

The datasets library provides efficient, memory-mapped data loading:

from datasets import load_dataset

# Load a dataset from the Hub
dataset = load_dataset("imdb")
print(dataset)

DatasetDict({
    train: Dataset({features: ['text', 'label'], num_rows: 25000}),
    test: Dataset({features: ['text', 'label'], num_rows: 25000}),
    unsupervised: Dataset({features: ['text', 'label'], num_rows: 50000})
})

# Inspect a sample
print(dataset["train"][0])

{'text': 'I rented I Am Curious...', 'label': 0}

# Create a smaller subset for faster training
small_train = dataset["train"].shuffle(seed=42).select(range(2000))
small_test = dataset["test"].shuffle(seed=42).select(range(500))

Tokenizing the Dataset

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize_function(examples):
    """Tokenize a batch of examples."""
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=256,
    )

# Apply tokenization to the entire dataset (batched for speed)
tokenized_train = small_train.map(tokenize_function, batched=True)
tokenized_test = small_test.map(tokenize_function, batched=True)

# Set format for PyTorch
tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "label"])
tokenized_test.set_format("torch", columns=["input_ids", "attention_mask", "label"])

Why batched=True?

Using batched=True in dataset.map() processes multiple examples at once, which is significantly faster than one-by-one processing. The tokenizer is optimized for batch operations, making this 10-100x faster for large datasets.
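As a sketch of the difference, the same tokenize_function works in both modes because the tokenizer accepts either a single string or a list of strings (batch_size=1000 below is simply the default made explicit):

# Non-batched: tokenize_function receives one example at a time
tokenized_slow = small_train.map(tokenize_function)

# Batched: tokenize_function receives up to batch_size examples at once
tokenized_fast = small_train.map(tokenize_function, batched=True, batch_size=1000)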

Training with Trainer

TrainingArguments

from transformers import TrainingArguments

training_args = TrainingArguments(
    # Output and logging
    output_dir="./results",
    logging_dir="./logs",
    logging_steps=50,

    # Training hyperparameters
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_steps=100,

    # Evaluation
    evaluation_strategy="epoch",     # Evaluate after each epoch
    save_strategy="epoch",           # Save a checkpoint each epoch
    load_best_model_at_end=True,     # Load the best model when done
    metric_for_best_model="f1",      # Use F1 to determine the best checkpoint

    # Optimization
    fp16=True,                       # Mixed precision (if the GPU supports it)
    gradient_accumulation_steps=2,   # Simulate a larger batch size

    # Hub integration
    push_to_hub=False,
    report_to="tensorboard",         # or "wandb"
)
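To read the optimization settings together: gradient accumulation multiplies the per-device batch size, so with per_device_train_batch_size=16 and gradient_accumulation_steps=2 the effective batch size is 16 × 2 = 32 per device (times the number of devices when training on several GPUs).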

Defining Metrics

import evaluate
import numpy as np

# Load metrics
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")

def compute_metrics(eval_pred):
    """Compute metrics for evaluation."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)

    return {
        "accuracy": accuracy_metric.compute(
            predictions=predictions, references=labels
        )["accuracy"],
        "f1": f1_metric.compute(
            predictions=predictions, references=labels
        )["f1"],
        "precision": precision_metric.compute(
            predictions=predictions, references=labels
        )["precision"],
        "recall": recall_metric.compute(
            predictions=predictions, references=labels
        )["recall"],
    }

Creating and Running the Trainer

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    DataCollatorWithPadding,
)

# Load model with classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Data collator handles dynamic padding: each batch is padded to its longest
# sequence, which makes the padding="max_length" in tokenize_function optional
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Create the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# Train!
train_result = trainer.train()
print(f"Training loss: {train_result.training_loss:.4f}")
print(f"Training time: {train_result.metrics['train_runtime']:.1f}s")

# Evaluate
eval_results = trainer.evaluate()
print(f"Eval accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Eval F1: {eval_results['eval_f1']:.4f}")
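The trained model can also be run over a whole dataset with trainer.predict, which returns raw logits alongside metrics. A short sketch (the [:10] slice is just for display):

predictions = trainer.predict(tokenized_test)
predicted_labels = np.argmax(predictions.predictions, axis=-1)
print(predicted_labels[:10])   # first ten predicted class ids
print(predictions.metrics)     # metrics computed on this dataset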

What Does Trainer Handle Automatically?

The Trainer manages: the training loop, gradient computation and optimization, learning rate scheduling, mixed precision (FP16/BF16), gradient accumulation, distributed training across GPUs, checkpoint saving and loading, evaluation loops, logging to TensorBoard/W&B, and early stopping (via a callback, as sketched below).
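Early stopping in particular is opt-in: you pass an EarlyStoppingCallback when constructing the Trainer, and it relies on load_best_model_at_end=True and metric_for_best_model being set in TrainingArguments. A minimal sketch, with the patience value of 3 as an illustrative choice:

from transformers import EarlyStoppingCallback

trainer = Trainer(
    model=model,
    args=training_args,      # must set load_best_model_at_end=True and metric_for_best_model
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # stop after 3 evaluations without improvement
)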

Training Checkpoints

Trainer automatically saves checkpoints during training:

results/
  checkpoint-500/
    config.json
    model.safetensors
    optimizer.pt
    scheduler.pt
    training_args.bin
    trainer_state.json
  checkpoint-1000/
    ...
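Because each checkpoint directory contains the model config and weights, it can be loaded directly with from_pretrained for inspection or inference (the checkpoint-500 path below assumes the layout above):

from transformers import AutoModelForSequenceClassification

checkpoint_model = AutoModelForSequenceClassification.from_pretrained("./results/checkpoint-500")

Resuming the full training state (optimizer, scheduler, step counter) is handled separately, as shown next.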

Resuming Training

# Resume from a checkpoint
trainer.train(resume_from_checkpoint="./results/checkpoint-500")

# Or auto-detect the latest checkpoint
trainer.train(resume_from_checkpoint=True)

Pushing to the Hub

# Push the fine-tuned model
trainer.push_to_hub(
    commit_message="Fine-tuned DistilBERT on IMDB",
    tags=["text-classification", "sentiment-analysis"],
)

# Or save locally first, then push
trainer.save_model("./my-model")
tokenizer.save_pretrained("./my-model")
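The saved directory can then be reloaded for inference; a brief sketch using the text-classification pipeline (the example sentence is arbitrary):

from transformers import pipeline

classifier = pipeline("text-classification", model="./my-model", tokenizer="./my-model")
print(classifier("This movie was a pleasant surprise."))   # a list of {'label': ..., 'score': ...} dicts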

Logging and Monitoring

TensorBoard

tensorboard --logdir ./logs

Weights & Biases

# Set report_to="wandb" in TrainingArguments

Then run:

import wandb
wandb.init(project="my-fine-tuning")

training_args = TrainingArguments(
    ...
    report_to="wandb",
)