# Object Detection
Object detection goes beyond classification by identifying what objects are in an image and where they are located, using bounding boxes.
## Problem Formulation
Each detection consists of a bounding box (commonly the corner coordinates $(x_1, y_1, x_2, y_2)$), a class label, and a confidence score.
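A single detection can be represented as a small record; the field names here are illustrative, not taken from any particular library:

```python
from typing import NamedTuple, Tuple

class Detection(NamedTuple):
    """One detected object: its class, the model's confidence, and its location."""
    class_name: str                           # predicted class label
    confidence: float                         # score in [0, 1]
    box: Tuple[float, float, float, float]    # (x1, y1, x2, y2) corner coordinates

d = Detection("dog", 0.91, (48.0, 120.0, 203.0, 310.0))
print(f"{d.class_name} ({d.confidence:.2f}) at {d.box}")
```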
### Intersection over Union (IoU)
IoU measures the overlap between a predicted bounding box and a ground truth box:
$$\text{IoU} = \frac{\text{Area of Intersection}}{\text{Area of Union}}$$
## Two-Stage Detectors
Two-stage detectors first propose regions that might contain objects, then classify and refine those regions.
### R-CNN (2014)
1. Use Selective Search to propose ~2000 region candidates
2. Warp each region to a fixed size
3. Pass each region through a CNN to extract features
4. Classify with SVMs + refine boxes with regression

### Fast R-CNN (2015)
1. Pass the entire image through a CNN once to get a feature map
2. Project region proposals onto the feature map
3. Use RoI Pooling to extract fixed-size features for each proposal
4. Classify + regress in a single network

### Faster R-CNN (2016)
1. Replace Selective Search with a Region Proposal Network (RPN)
2. RPN shares the CNN backbone, producing proposals nearly for free
3. RoI Pooling + classification + regression as in Fast R-CNN

### Anchor Boxes
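Anchors are a fixed set of reference boxes tiled over the feature map at several scales and aspect ratios; the RPN predicts objectness scores and coordinate offsets relative to them. A minimal sketch of anchor generation at a single feature-map location (the scale and ratio values are illustrative):

```python
import itertools

def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes centered at (cx, cy).

    Each anchor has area scale**2; ratio = height / width.
    Returns boxes as (x1, y1, x2, y2).
    """
    anchors = []
    for scale, ratio in itertools.product(scales, ratios):
        w = scale / ratio ** 0.5
        h = scale * ratio ** 0.5
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# 3 scales x 3 ratios = 9 anchors per location, as in Faster R-CNN
print(len(make_anchors(16, 16)))  # 9
```

In a real RPN these anchors are replicated at every feature-map cell, and the network predicts a per-anchor objectness score plus box offsets.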
## One-Stage Detectors
One-stage detectors skip the region proposal step and predict boxes and classes directly from the feature map in a single pass. They are typically faster but historically less accurate (though this gap has largely closed).
### SSD (Single Shot MultiBox Detector, 2016)

Makes predictions from several feature maps at different resolutions, using a fixed set of default boxes at each location, so that objects of different sizes are detected at appropriate scales.
### YOLO Family
You Only Look Once, the most popular family of real-time object detectors.
#### YOLO v1 (2016)

Frames detection as a single regression problem: the image is divided into an S×S grid, and each cell directly predicts bounding boxes and class probabilities in one forward pass.
#### YOLO v2/v3 (2017-2018)

Added anchor boxes, batch normalization, and multi-scale training; v3 predicts at three scales using the deeper Darknet-53 backbone.
#### YOLO v5 / v8 (Ultralytics)

PyTorch implementations from Ultralytics with a streamlined training and deployment API; v8 moves to an anchor-free detection head.
#### YOLO v11 / YOLO-World (2024+)

Continued speed and accuracy refinements; YOLO-World extends the family to open-vocabulary detection, localizing objects described by arbitrary text prompts.
## Non-Maximum Suppression (NMS)
Detectors produce many overlapping boxes for the same object. NMS filters them:

1. Sort all detections by confidence score
2. Take the highest-scoring box, add it to the final results
3. Remove all remaining boxes with IoU > threshold (e.g., 0.5) with the selected box
4. Repeat until no boxes remain
## Evaluation Metrics
### Precision and Recall (per class)

Predictions are matched to ground-truth boxes of the same class at a chosen IoU threshold: precision is the fraction of predictions that are correct (TP / (TP + FP)), and recall is the fraction of ground-truth objects that are found (TP / (TP + FN)).
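The matching can be sketched as follows: predictions are processed in order of confidence, and each ground-truth box may be claimed by at most one prediction (a self-contained sketch with its own inline IoU helper):

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(preds, gts, iou_thresh=0.5):
    """preds: list of (score, box); gts: list of ground-truth boxes.
    Greedy matching, highest-confidence predictions first."""
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_iou = None, iou_thresh
        for gi, gt in enumerate(gts):
            if gi not in matched and iou(box, gt) >= best_iou:
                best, best_iou = gi, iou(box, gt)
        if best is not None:
            matched.add(best)   # each GT box counts at most once
            tp += 1
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall

preds = [(0.9, (100, 100, 200, 200)), (0.8, (300, 300, 400, 400)),
         (0.7, (500, 100, 600, 200))]          # third box matches nothing
gts = [(110, 105, 205, 210), (305, 295, 405, 395)]
print(precision_recall(preds, gts))  # precision = 2/3, recall = 1.0
```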
### Average Precision (AP)
The area under the precision-recall curve for a single class, at a given IoU threshold.

### Mean Average Precision (mAP)

The mean of AP over all classes. COCO-style mAP additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.
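AP for one class can be computed from the detections ranked by confidence; mAP then averages the result over classes. A sketch using all-point interpolation:

```python
def average_precision(is_tp, num_gt):
    """is_tp: per-detection True/False flags, sorted by confidence descending.
    num_gt: number of ground-truth boxes for this class."""
    tp_cum = 0
    precisions, recalls = [], []
    for i, hit in enumerate(is_tp, start=1):
        tp_cum += hit
        precisions.append(tp_cum / i)
        recalls.append(tp_cum / num_gt)
    # All-point interpolation: make precision monotonically non-increasing
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Integrate precision over the recall steps
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)
        prev_r = r
    return ap

# 5 detections for one class, 3 ground-truth boxes
print(average_precision([True, False, True, True, False], num_gt=3))  # ~0.833
```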
```python
# ==============================================================
# Using Ultralytics YOLOv8 for object detection
# pip install ultralytics
# ==============================================================
from ultralytics import YOLO
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

# Load a pretrained YOLOv8 model
model = YOLO("yolov8n.pt")  # nano model — fast and lightweight

# Run inference on an image
results = model("https://ultralytics.com/images/bus.jpg")

# Parse results
result = results[0]
boxes = result.boxes

print(f"Detected {len(boxes)} objects:\n")
for box in boxes:
    cls_id = int(box.cls[0])
    cls_name = result.names[cls_id]
    confidence = float(box.conf[0])
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    print(f"  {cls_name}: {confidence:.2f} at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")

# Visualize results
fig, ax = plt.subplots(1, figsize=(12, 8))
img = Image.open(result.path)
ax.imshow(img)

colors = plt.cm.Set3(np.linspace(0, 1, len(result.names)))
for box in boxes:
    cls_id = int(box.cls[0])
    conf = float(box.conf[0])
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    color = colors[cls_id % len(colors)]
    rect = patches.Rectangle(
        (x1, y1), x2 - x1, y2 - y1,
        linewidth=2, edgecolor=color, facecolor="none"
    )
    ax.add_patch(rect)
    ax.text(x1, y1 - 5, f"{result.names[cls_id]} {conf:.2f}",
            color="white", fontsize=10,
            bbox=dict(boxstyle="round,pad=0.2", facecolor=color, alpha=0.8))

ax.axis("off")
plt.tight_layout()
plt.show()
```

```python
# ==============================================================
# Calculating IoU from scratch
# ==============================================================
def calculate_iou(box1, box2):
    """
    Calculate IoU between two boxes in [x1, y1, x2, y2] format.
    """
    # Intersection coordinates
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])

    # Intersection area (0 if no overlap)
    intersection = max(0, x2 - x1) * max(0, y2 - y1)

    # Union area
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection

    return intersection / union if union > 0 else 0.0

# Example
pred_box = [100, 100, 200, 200]
gt_box = [120, 110, 210, 210]
iou = calculate_iou(pred_box, gt_box)
print(f"IoU: {iou:.3f}")  # ~0.610

# ==============================================================
# Non-Maximum Suppression from scratch
# ==============================================================
def nms(boxes, scores, iou_threshold=0.5):
    """
    Apply Non-Maximum Suppression.
    boxes: list of [x1, y1, x2, y2]
    scores: list of confidence scores
    Returns: indices of kept boxes
    """
    indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []

    while indices:
        current = indices.pop(0)
        keep.append(current)
        indices = [
            i for i in indices
            if calculate_iou(boxes[current], boxes[i]) < iou_threshold
        ]

    return keep

# Example: multiple overlapping detections of the same object
boxes = [
    [100, 100, 200, 200],  # High confidence
    [105, 95, 205, 195],   # Overlapping, lower confidence
    [110, 100, 210, 200],  # Overlapping, even lower
    [300, 300, 400, 400],  # Different object
]
scores = [0.95, 0.88, 0.75, 0.92]

kept = nms(boxes, scores, iou_threshold=0.5)
print(f"Kept box indices: {kept}")  # [0, 3] — one per cluster