Text Generation & Summarization
So far we've focused on understanding text — classification, entity extraction, parsing. Now we tackle the other side of NLP: generating text. This includes tasks such as machine translation, summarization, and open-ended text generation (continuing a prompt).
All of these tasks share a common framework: given an input sequence, produce an output sequence. This is the sequence-to-sequence (seq2seq) paradigm.
Sequence-to-Sequence Architecture
The classic seq2seq model has two components:
1. Encoder — reads the input sequence and compresses it into a fixed-size context vector
2. Decoder — generates the output sequence one token at a time, conditioned on the context
Think of it like a human translator: you read the entire French sentence (encoder), form an understanding in your mind (context vector), then produce the English translation word by word (decoder).
import tensorflow as tf
from tensorflow.keras import layers, models

# --- Simple Seq2Seq with LSTM ---
# Encoder
encoder_inputs = layers.Input(shape=(None,), name="encoder_input")
encoder_embedding = layers.Embedding(input_dim=10000, output_dim=256)(encoder_inputs)
encoder_lstm = layers.LSTM(256, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)
# state_h, state_c = the "context" passed to the decoder

# Decoder
decoder_inputs = layers.Input(shape=(None,), name="decoder_input")
decoder_embedding = layers.Embedding(input_dim=10000, output_dim=256)(decoder_inputs)
decoder_lstm = layers.LSTM(256, return_sequences=True, return_state=True)
# Initialize decoder with encoder's final hidden state
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=[state_h, state_c])
decoder_dense = layers.Dense(10000, activation="softmax")
output = decoder_dense(decoder_outputs)

model = models.Model([encoder_inputs, decoder_inputs], output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()

# The encoder compresses input into (state_h, state_c)
# The decoder uses that state to generate output token by token
The Bottleneck Problem
Compressing an arbitrarily long input into a single fixed-size vector is this design's weakness: the longer the input, the more information the context vector must discard, and translation quality degrades on long sentences. Attention was introduced precisely to remove this bottleneck.
Attention in Seq2Seq
Attention in the seq2seq context works as follows:
1. The encoder produces a hidden state for every input token (not just the final one)
2. At each decoder step, compute attention scores between the current decoder state and all encoder states
3. Use these scores to create a weighted sum (context vector) of encoder states
4. Concatenate this context with the decoder state to make the prediction
This means the decoder can "look back" at the input, focusing on different parts at each step. When translating "Le chat noir" to "The black cat", the decoder focuses on "chat" when generating "cat" and on "noir" when generating "black".
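The four steps above can be sketched in a few lines of NumPy using dot-product attention. The state vectors below are made-up toy values for illustration, not outputs of a trained model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup: 3 encoder hidden states (one per input token), hidden size 4
encoder_states = np.array([
    [1.0, 0.0, 0.0, 0.0],   # "Le"
    [0.0, 1.0, 0.0, 0.0],   # "chat"
    [0.0, 0.0, 1.0, 0.0],   # "noir"
])
decoder_state = np.array([0.1, 0.9, 0.0, 0.0])  # decoder about to emit "cat"

# Step 2: attention scores between decoder state and every encoder state
scores = encoder_states @ decoder_state          # shape (3,)
# Normalize scores into attention weights
weights = softmax(scores)
# Step 3: weighted sum of encoder states = context vector
context = weights @ encoder_states               # shape (4,)

print("attention weights:", weights.round(3))
# The weight on "chat" is highest, so the context vector leans toward that token;
# step 4 would concatenate `context` with `decoder_state` before the output layer.
```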
Decoding Strategies: Greedy vs. Beam Search
Once we have a model that produces probability distributions over the vocabulary, how do we actually generate text?
Greedy Decoding
At each step, pick the single most probable token. Fast, but often produces suboptimal sequences.
Step 1: P("The") = 0.6, P("A") = 0.3 → pick "The"
Step 2: P("cat") = 0.5, P("dog") = 0.4 → pick "cat"
Result: "The cat"
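Greedy decoding amounts to taking an argmax at each step. A minimal sketch with the same toy probabilities:

```python
# Per-step token distributions from the example above (toy values)
step_probs = [
    {"The": 0.6, "A": 0.3, "One": 0.1},
    {"cat": 0.5, "dog": 0.4, "mat": 0.1},
]

# Greedy decoding: pick the single most probable token at each step
tokens = [max(dist, key=dist.get) for dist in step_probs]
print(" ".join(tokens))  # The cat
```

In a real model each step's distribution depends on the tokens chosen so far; the distributions are fixed here for brevity.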
Beam Search
Keep track of the top-k (beam width) most probable sequences at each step. Explores more possibilities.
Beam width = 2:
Step 1: Keep ["The" (0.6), "A" (0.3)]
Step 2: Expand both:
"The cat" (0.6 × 0.5 = 0.30)
"The dog" (0.6 × 0.4 = 0.24)
"A small" (0.3 × 0.7 = 0.21)
"A big" (0.3 × 0.2 = 0.06)
Keep top 2: ["The cat" (0.30), "The dog" (0.24)]
Beam search finds better overall sequences because a locally suboptimal choice (like "A" instead of "The") might lead to a globally better sequence.
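The walkthrough above can be reproduced with a toy beam search. The `next_probs` function hardcodes the example's hypothetical probabilities; a real decoder would get them from the model, conditioned on the prefix:

```python
def beam_search(next_probs, steps, beam_width=2):
    """Toy beam search: next_probs(prefix) returns a token -> probability dict."""
    beams = [((), 1.0)]  # (sequence, cumulative probability)
    for _ in range(steps):
        candidates = [
            (seq + (tok,), score * p)
            for seq, score in beams
            for tok, p in next_probs(seq).items()
        ]
        # Keep only the top-k sequences by cumulative probability
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Probabilities from the worked example above (hypothetical values)
def next_probs(prefix):
    if prefix == ():
        return {"The": 0.6, "A": 0.3}
    if prefix == ("The",):
        return {"cat": 0.5, "dog": 0.4}
    return {"small": 0.7, "big": 0.2}  # after "A"

for seq, score in beam_search(next_probs, steps=2):
    print(" ".join(seq), f"({score:.2f})")
# The cat (0.30)
# The dog (0.24)
```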
from transformers import pipeline, set_seed

# --- Text generation with different strategies ---
generator = pipeline("text-generation", model="gpt2")
set_seed(42)

prompt = "Artificial intelligence will"

# Greedy decoding
greedy = generator(prompt, max_length=30, do_sample=False)
print("Greedy:", greedy[0]["generated_text"])

# Beam search (num_beams > 1)
beam = generator(prompt, max_length=30, num_beams=5, do_sample=False)
print("Beam:  ", beam[0]["generated_text"])

# Sampling with temperature (more creative)
sampled = generator(prompt, max_length=30, do_sample=True, temperature=0.7, top_p=0.9)
print("Sample:", sampled[0]["generated_text"])

# Lower temperature = more focused/deterministic
# Higher temperature = more random/creative
# top_p (nucleus sampling) = only sample from tokens whose cumulative probability ≤ p
Summarization: Extractive vs. Abstractive
There are two fundamentally different approaches to summarization:
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Extractive | Select the most important existing sentences | Always grammatically correct, faithful to source | Can be choppy, limited to source vocabulary |
| Abstractive | Generate new text that captures the key ideas | More natural, can rephrase and compress | May hallucinate or distort facts |
from transformers import pipeline
import numpy as np

# --- Extractive Summarization (simple TF-IDF approach) ---
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extractive_summarize(text, num_sentences=3):
    """Select the most representative sentences using TF-IDF."""
    sentences = text.split(". ")
    sentences = [s.strip() for s in sentences if len(s.strip()) > 10]

    if len(sentences) <= num_sentences:
        return ". ".join(sentences)

    # Compute TF-IDF for each sentence
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(sentences)

    # Score each sentence by similarity to the overall document
    # (np.asarray needed: sparse .mean() returns np.matrix, which sklearn rejects)
    doc_vector = np.asarray(tfidf_matrix.mean(axis=0))
    scores = cosine_similarity(tfidf_matrix, doc_vector).flatten()

    # Select top sentences (maintain original order)
    top_indices = sorted(np.argsort(scores)[-num_sentences:])
    summary = ". ".join([sentences[i] for i in top_indices])
    return summary

# --- Abstractive Summarization (Hugging Face) ---
abstractive = pipeline("summarization", model="facebook/bart-large-cnn")

article = """
The global semiconductor shortage that began in 2020 has had far-reaching
consequences across multiple industries. Automakers were among the hardest
hit, with major manufacturers like Toyota, Ford, and Volkswagen forced to
cut production by millions of vehicles. The shortage was triggered by a
perfect storm of factors: pandemic-driven factory shutdowns, a surge in
demand for consumer electronics as people worked from home, and the
inherently long lead times required to build new chip fabrication plants.
Governments responded with massive investment programs. The US passed the
CHIPS Act, allocating $52 billion for domestic semiconductor manufacturing.
The European Union announced a similar European Chips Act worth 43 billion
euros. These investments aim to reduce dependence on Asian manufacturers,
particularly Taiwan's TSMC, which produces over 50% of the world's advanced
chips. Industry analysts expect the shortage to fully resolve by 2025, but
the geopolitical implications of semiconductor supply chain concentration
will persist for decades.
"""

# Extractive
print("=== Extractive Summary ===")
print(extractive_summarize(article, num_sentences=2))

# Abstractive
print("\n=== Abstractive Summary ===")
result = abstractive(article, max_length=80, min_length=30)
print(result[0]["summary_text"])
Evaluation Metrics for Generated Text
How do we measure the quality of generated text? Several metrics exist, each with different strengths:
BLEU (Bilingual Evaluation Understudy)
Measures n-gram precision: the fraction of the candidate's n-grams that also appear in the reference(s), multiplied by a brevity penalty that punishes overly short outputs.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
Measures n-gram recall: how much of the reference text is covered by the candidate. ROUGE-L scores the longest common subsequence instead of fixed n-grams.
BERTScore
Matches candidate and reference tokens by the similarity of their contextual BERT embeddings, so semantically equivalent paraphrases score highly even with little word overlap.
| Metric | Focus | Best For | Limitation |
|---|---|---|---|
| BLEU | Precision (n-gram) | Translation | Penalizes valid paraphrases |
| ROUGE | Recall (n-gram) | Summarization | Doesn't capture meaning |
| BERTScore | Semantic similarity | Any generation | Computationally expensive |
BLEU for Translation, ROUGE for Summarization
# --- BLEU Score ---
from nltk.translate.bleu_score import sentence_bleu

reference = [["the", "cat", "sat", "on", "the", "mat"]]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# BLEU with different n-gram weights
bleu_1 = sentence_bleu(reference, candidate, weights=(1, 0, 0, 0))  # Unigrams only
bleu_2 = sentence_bleu(reference, candidate, weights=(0.5, 0.5, 0, 0))  # Up to bigrams
bleu_4 = sentence_bleu(reference, candidate)  # Default: 1-4 grams equally weighted

print(f"BLEU-1: {bleu_1:.4f}")
print(f"BLEU-2: {bleu_2:.4f}")
print(f"BLEU-4: {bleu_4:.4f}")

# --- ROUGE Score ---
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference_text = "The cat sat on the mat near the window"
generated_text = "A cat was sitting on the mat"

scores = scorer.score(reference_text, generated_text)
for metric, score in scores.items():
    print(f"{metric}: Precision={score.precision:.4f}, Recall={score.recall:.4f}, F1={score.fmeasure:.4f}")

# --- BERTScore ---
# pip install bert-score
from bert_score import score as bert_score

references = ["The cat sat on the mat"]
candidates = ["A feline rested on the rug"]  # Paraphrase!

P, R, F1 = bert_score(candidates, references, lang="en")
print(f"\nBERTScore — P: {P.mean():.4f}, R: {R.mean():.4f}, F1: {F1.mean():.4f}")
# BERTScore will give high similarity despite different words (captures semantics)
Translation Pipelines
Modern translation is straightforward with pre-trained models:
from transformers import pipeline

# Translation with Hugging Face
translator_en_fr = pipeline("translation_en_to_fr", model="Helsinki-NLP/opus-mt-en-fr")
translator_en_de = pipeline("translation_en_to_de", model="Helsinki-NLP/opus-mt-en-de")

text = "Machine learning is transforming how we process natural language."

fr_result = translator_en_fr(text)
de_result = translator_en_de(text)

print(f"English: {text}")
print(f"French: {fr_result[0]['translation_text']}")
print(f"German: {de_result[0]['translation_text']}")

# --- Evaluate translation quality ---
from nltk.translate.bleu_score import sentence_bleu

# If we have reference translations
reference_fr = "L'apprentissage automatique transforme notre façon de traiter le langage naturel".split()
generated_fr = fr_result[0]["translation_text"].split()

bleu = sentence_bleu([reference_fr], generated_fr)
print(f"\nBLEU score: {bleu:.4f}")
Limitations of Automatic Metrics
N-gram metrics reward surface overlap, not meaning: a fluent, accurate paraphrase can score near zero, while an awkward sentence that happens to reuse reference words can score well. Because automatic scores correlate only loosely with human judgments, human evaluation remains the gold standard for generation quality.
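As a quick demonstration of this blindness to wording, unigram BLEU on a hand-written paraphrase (toy sentences for illustration):

```python
from nltk.translate.bleu_score import sentence_bleu

reference = ["the", "cat", "sat", "on", "the", "mat"]
paraphrase = ["a", "feline", "rested", "on", "the", "rug"]  # same meaning, new words

# Unigram BLEU: only 2 of 6 tokens ("on", "the") match the reference
bleu = sentence_bleu([reference], paraphrase, weights=(1, 0, 0, 0))
print(f"BLEU-1 for a valid paraphrase: {bleu:.2f}")  # 0.33
```

BLEU gives this perfectly acceptable paraphrase the same score it would give any sentence that reused two reference words, which is why metric scores should be sanity-checked by actually reading the outputs.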
Practical Text Generation with Hugging Face
Here's a comprehensive example showing different generation tasks:
from transformers import pipeline

# --- Summarization ---
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = """
Scientists have discovered a new species of deep-sea fish in the Mariana
Trench. The fish, named Pseudoliparis swirei, was found at a depth of
8,178 meters, making it the deepest-living fish ever recorded. The
discovery was made using autonomous underwater vehicles equipped with
cameras and traps. The fish has a translucent body and lacks scales,
adaptations that help it survive the extreme pressure at such depths.
Researchers believe studying this species could provide insights into
how life adapts to extreme environments.
"""

summary = summarizer(article, max_length=60, min_length=20)
print("Summary:", summary[0]["summary_text"])

# --- Question Answering ---
qa = pipeline("question-answering")
result = qa(question="At what depth was the fish found?", context=article)
print(f"\nAnswer: {result['answer']} (confidence: {result['score']:.4f})")

# --- Text Generation (completion) ---
generator = pipeline("text-generation", model="gpt2")
prompt = "The future of artificial intelligence depends on"
# do_sample=True is required for temperature to apply and for multiple
# distinct return sequences
output = generator(prompt, max_length=50, num_return_sequences=2,
                   do_sample=True, temperature=0.8)
print("\nGenerated continuations:")
for i, seq in enumerate(output):
    print(f"  {i+1}. {seq['generated_text']}")