Fine-Tuning with the Trainer API
Fine-tuning adapts a pre-trained model to your specific task and data. The Hugging Face Trainer class provides a high-level API that handles the entire training loop, including gradient accumulation, mixed precision, distributed training, and evaluation.
Why Fine-Tune?
Pre-trained models are generalists. Fine-tuning makes them specialists:
| Approach | Pros | Cons |
|---|---|---|
| Zero-shot | No training needed | Lower accuracy for specific tasks |
| Few-shot (prompting) | No training, quick | Limited by context window |
| Fine-tuning | Best accuracy for your domain | Requires labeled data and compute |
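To make the contrast concrete, here is a rough sketch of the first and last rows of the table in code. The zero-shot example uses the zero-shot-classification pipeline with facebook/bart-large-mnli (the pipeline's usual default model); the fine-tuned example assumes the ./my-model checkpoint saved at the end of this guide:
from transformers import pipeline
# Zero-shot: a general NLI model scores arbitrary candidate labels, no training required
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
print(zero_shot("The plot dragged and the acting was wooden.", candidate_labels=["positive", "negative"]))
# Fine-tuned: a model specialized on your labeled data (./my-model is produced later in this guide)
sentiment = pipeline("text-classification", model="./my-model")
print(sentiment("The plot dragged and the acting was wooden."))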
Setting Up
pip install transformers datasets evaluate accelerate
The Fine-Tuning Pipeline
Dataset Preparation with datasets Library
The datasets library provides efficient, memory-mapped data loading:
from datasets import load_dataset
# Load a dataset from the Hub
dataset = load_dataset("imdb")
print(dataset)
DatasetDict({
train: Dataset({features: ['text', 'label'], num_rows: 25000}),
test: Dataset({features: ['text', 'label'], num_rows: 25000}),
unsupervised: Dataset({features: ['text', 'label'], num_rows: 50000})
})
Inspect a sample
print(dataset["train"][0])
{'text': 'I rented I Am Curious...', 'label': 0}
Create a smaller subset for faster training
small_train = dataset["train"].shuffle(seed=42).select(range(2000))
small_test = dataset["test"].shuffle(seed=42).select(range(500))
Tokenizing the Dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
def tokenize_function(examples):
"""Tokenize a batch of examples."""
return tokenizer(
examples["text"],
truncation=True,
max_length=256,
# No padding here; the data collator introduced below pads each batch dynamically
)
Apply tokenization to entire dataset (batched for speed)
tokenized_train = small_train.map(tokenize_function, batched=True)
tokenized_test = small_test.map(tokenize_function, batched=True)
# Set format for PyTorch
tokenized_train.set_format("torch", columns=["input_ids", "attention_mask", "label"])
tokenized_test.set_format("torch", columns=["input_ids", "attention_mask", "label"])
Why batched=True?
With batched=True, map passes batches of examples (1,000 by default) to tokenize_function instead of single examples, so the fast Rust-backed tokenizer can process many texts in one call. For tokenization this is typically several times faster than mapping example by example.
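You can check the difference yourself; a rough sketch (timings depend on your hardware, and load_from_cache_file=False forces re-tokenization instead of reusing the datasets cache):
import time
start = time.time()
small_train.map(tokenize_function, batched=False, load_from_cache_file=False)
print(f"batched=False: {time.time() - start:.1f}s")
start = time.time()
small_train.map(tokenize_function, batched=True, load_from_cache_file=False)
print(f"batched=True:  {time.time() - start:.1f}s")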
Training with Trainer
TrainingArguments
from transformers import TrainingArguments
training_args = TrainingArguments(
# Output and logging
output_dir="./results",
logging_dir="./logs",
logging_steps=50,
# Training hyperparameters
num_train_epochs=3,
per_device_train_batch_size=16,
per_device_eval_batch_size=32,
learning_rate=2e-5,
weight_decay=0.01,
warmup_steps=100,
# Evaluation
evaluation_strategy="epoch", # Evaluate after each epoch
save_strategy="epoch", # Save checkpoint each epoch
load_best_model_at_end=True, # Load best model when done
metric_for_best_model="f1", # Use F1 to determine best
# Optimization
fp16=True, # Mixed precision (if GPU supports it)
gradient_accumulation_steps=2, # Simulate larger batch size
# Hub integration
push_to_hub=False,
report_to="tensorboard", # or "wandb"
)
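One detail worth calling out: with gradient_accumulation_steps=2, gradients from two consecutive batches of 16 are accumulated before each optimizer step, so the effective batch size is 16 × 2 = 32 per device (multiply again by the number of GPUs if you train on several).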
Defining Metrics
import evaluate
import numpy as np
# Load metrics
accuracy_metric = evaluate.load("accuracy")
f1_metric = evaluate.load("f1")
precision_metric = evaluate.load("precision")
recall_metric = evaluate.load("recall")

def compute_metrics(eval_pred):
"""Compute metrics for evaluation."""
logits, labels = eval_pred
predictions = np.argmax(logits, axis=-1)
return {
"accuracy": accuracy_metric.compute(
predictions=predictions, references=labels
)["accuracy"],
"f1": f1_metric.compute(
predictions=predictions, references=labels
)["f1"],
"precision": precision_metric.compute(
predictions=predictions, references=labels
)["precision"],
"recall": recall_metric.compute(
predictions=predictions, references=labels
)["recall"],
}
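One caveat: the f1, precision, and recall metrics in evaluate default to binary averaging, which is fine here because IMDB has two labels. For a multi-class task you would pass an averaging strategy explicitly, for example:
# Hypothetical multi-class variant: average per-class F1 scores
f1_metric.compute(predictions=predictions, references=labels, average="macro")["f1"]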
Creating and Running the Trainer
from transformers import (
AutoModelForSequenceClassification,
Trainer,
DataCollatorWithPadding,
)
# Load model with classification head
model = AutoModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased",
num_labels=2,
id2label={0: "NEGATIVE", 1: "POSITIVE"},
label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
# Data collator handles dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
# Create Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_train,
eval_dataset=tokenized_test,
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
# Train!
train_result = trainer.train()
print(f"Training loss: {train_result.training_loss:.4f}")
print(f"Training time: {train_result.metrics['train_runtime']:.1f}s")Evaluate
eval_results = trainer.evaluate()
print(f"Eval accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Eval F1: {eval_results['eval_f1']:.4f}")
What Does Trainer Handle Automatically?
Quite a lot: creating the AdamW optimizer and learning-rate scheduler, moving batches to the right device, mixed-precision casting and loss scaling when fp16=True, gradient accumulation and clipping, the epoch and step loops, periodic evaluation, checkpointing, logging, and multi-GPU or distributed training via accelerate. The sketch below shows a heavily simplified version of what you would otherwise write by hand.
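A rough manual PyTorch loop, for intuition only: single device, no scheduler, no mixed precision, no checkpointing, reusing the objects defined earlier in this guide:
import torch
from torch.utils.data import DataLoader
from torch.optim import AdamW

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# DataCollatorWithPadding pads each batch and renames "label" to "labels"
train_loader = DataLoader(tokenized_train, batch_size=16, shuffle=True, collate_fn=data_collator)
optimizer = AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)

model.train()
for epoch in range(3):
    for batch in train_loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)   # the model returns the loss when labels are provided
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()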
Training Checkpoints
Trainer automatically saves checkpoints during training:
results/
  checkpoint-500/
    config.json
    model.safetensors
    optimizer.pt
    scheduler.pt
    training_args.bin
    trainer_state.json
  checkpoint-1000/
    ...
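Each checkpoint stores the model plus optimizer and scheduler state, so they add up quickly. If disk space matters, you can cap how many are kept; a sketch of the relevant TrainingArguments options:
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,   # keep only the two most recent checkpoints
    load_best_model_at_end=True,   # the best checkpoint is retained even when the limit is hit
)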
Resuming Training
# Resume from a checkpoint
trainer.train(resume_from_checkpoint="./results/checkpoint-500")
# Or auto-detect the latest checkpoint
trainer.train(resume_from_checkpoint=True)
Pushing to the Hub
# Push the fine-tuned model
trainer.push_to_hub(
commit_message="Fine-tuned DistilBERT on IMDB",
tags=["text-classification", "sentiment-analysis"],
)
# Or save locally first, then push
trainer.save_model("./my-model")
tokenizer.save_pretrained("./my-model")
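Once pushed, the model can be loaded back from the Hub by name. A quick sketch (your-username/my-model is a placeholder; the actual repo id depends on your account and on output_dir or hub_model_id):
from transformers import pipeline
# Placeholder repo id; substitute the one reported by push_to_hub
classifier = pipeline("text-classification", model="your-username/my-model")
print(classifier("A surprisingly moving film."))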
Logging and Monitoring
TensorBoard
tensorboard --logdir ./logs
Weights & Biases
# Set report_to="wandb" in TrainingArguments
Then run:
import wandb
wandb.init(project="my-fine-tuning")

training_args = TrainingArguments(
...
report_to="wandb",
)