
NumPy: The Foundation of ML in Python

Master NumPy arrays, reshaping, indexing, broadcasting, and vectorized operations

NumPy (Numerical Python) is the backbone of nearly every ML library in Python. TensorFlow, PyTorch, scikit-learn — they all rely on NumPy arrays under the hood. If you want to do ML, you must be fluent in NumPy.

Why NumPy?

  • Speed: Operations run in optimized C/Fortran, not slow Python loops
  • Memory: Stores data in contiguous memory blocks (cache-friendly)
  • Ecosystem: The universal data exchange format for ML libraries
  • Broadcasting: Powerful rules for combining arrays of different shapes
Creating Arrays

NumPy arrays (ndarray) are the fundamental data structure. Here are the most common ways to create them:

python
import numpy as np

# From Python lists
a = np.array([1, 2, 3, 4, 5])
print(a)          # [1 2 3 4 5]
print(a.dtype)    # int64
print(a.shape)    # (5,)

# 2D array (matrix)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix.shape)  # (2, 3) — 2 rows, 3 columns

# Common creation functions
zeros = np.zeros((3, 4))         # 3x4 matrix of zeros
ones = np.ones((2, 5))           # 2x5 matrix of ones
full = np.full((3, 3), 7)        # 3x3 matrix filled with 7
eye = np.eye(4)                  # 4x4 identity matrix
rand = np.random.randn(3, 4)     # 3x4 matrix of random normal values
arange = np.arange(0, 10, 2)     # [0 2 4 6 8]
linspace = np.linspace(0, 1, 5)  # [0.  0.25 0.5 0.75 1. ]

Reshaping Arrays

Reshaping is one of the most critical skills in ML. You'll constantly reshape data to match what models expect.

python
import numpy as np

a = np.arange(12)
print(a)         # [ 0  1  2  3  4  5  6  7  8  9 10 11]
print(a.shape)   # (12,)

# Reshape to 3 rows x 4 columns
b = a.reshape(3, 4)
print(b)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Using -1 lets NumPy infer the dimension
c = a.reshape(2, -1)   # 2 rows, NumPy figures out 6 columns
print(c.shape)         # (2, 6)

d = a.reshape(-1, 3)   # NumPy figures out 4 rows, 3 columns
print(d.shape)         # (4, 3)

# Flatten back to 1D
flat = b.flatten()     # Always returns a copy
raveled = b.ravel()    # Returns a view when possible (more memory efficient)

# Add a dimension (critical for ML)
x = np.array([1, 2, 3])          # shape: (3,)
row_vec = x[np.newaxis, :]       # shape: (1, 3) — row vector
col_vec = x[:, np.newaxis]       # shape: (3, 1) — column vector
# Equivalent: x.reshape(1, -1) and x.reshape(-1, 1)

Image Tensors

In computer vision, images are represented as NumPy arrays. Understanding their shape is essential.

python
import numpy as np

# A single RGB image: (height, width, channels)
image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
print(image.shape)   # (224, 224, 3)
print(image.dtype)   # uint8 (values 0–255)

# A batch of images: (batch_size, height, width, channels)
batch = np.random.randint(0, 256, size=(32, 224, 224, 3), dtype=np.uint8)
print(batch.shape)   # (32, 224, 224, 3)

# Access the 5th image in the batch
fifth_image = batch[4]           # shape: (224, 224, 3)

# Get the red channel of the first image
red_channel = batch[0, :, :, 0]  # shape: (224, 224)

# Normalize pixel values to [0, 1] for neural networks
normalized = batch.astype(np.float32) / 255.0
print(normalized.dtype)  # float32
print(normalized.max())  # 1.0
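
One more shape detail worth knowing: some frameworks (PyTorch, for example) expect channels-first images, (batch, channels, height, width), rather than the channels-last layout shown above. A minimal sketch of the conversion, reusing the same batch shape:

```python
import numpy as np

# Channels-last batch, as above: (batch, height, width, channels)
batch = np.random.randint(0, 256, size=(32, 224, 224, 3), dtype=np.uint8)

# NHWC -> NCHW: reorder axes so channels (axis 3) come right after batch (axis 0)
channels_first = np.transpose(batch, (0, 3, 1, 2))
print(channels_first.shape)  # (32, 3, 224, 224)

# np.moveaxis expresses the same reordering by naming source and destination axes
also_channels_first = np.moveaxis(batch, -1, 1)
print(also_channels_first.shape)  # (32, 3, 224, 224)
```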

Indexing and Slicing

NumPy provides powerful ways to access and modify array elements.

python
import numpy as np

a = np.array([[10, 20, 30, 40],
              [50, 60, 70, 80],
              [90, 100, 110, 120]])

# Basic indexing (row, column)
print(a[0, 1])     # 20 — first row, second column
print(a[2, -1])    # 120 — last row, last column

# Slicing: a[row_start:row_end, col_start:col_end]
print(a[0:2, 1:3])
# [[20 30]
#  [60 70]]

# All rows, specific columns
print(a[:, 0])     # [10 50 90] — first column
print(a[:, -1])    # [ 40  80 120] — last column

# Boolean indexing (filtering)
mask = a > 50
print(mask)
# [[False False False False]
#  [False  True  True  True]
#  [ True  True  True  True]]
print(a[mask])     # [ 60  70  80  90 100 110 120]

# Fancy indexing (index with arrays)
rows = np.array([0, 2])
cols = np.array([1, 3])
print(a[rows, cols])  # [ 20 120] — elements at (0,1) and (2,3)

# Filtering values with a boolean condition
scores = np.array([85, 42, 91, 67, 55, 99])
passing = scores[scores >= 60]
print(passing)  # [85 91 67 99]

Broadcasting

Broadcasting is NumPy's way of performing arithmetic on arrays of different shapes. Instead of copying data, NumPy virtually "stretches" smaller arrays to match larger ones.

The rules:

  1. If arrays have different numbers of dimensions, the smaller one is padded with 1s on the left.
  2. Arrays with size 1 in a dimension act as if they had the size of the largest array in that dimension.
  3. If sizes disagree and neither is 1, you get an error.

Example: adding a (3, 4) matrix and a (4,) vector works because the vector is broadcast across all rows. This is how we can subtract the mean from every row, normalize every column, or add a bias to every sample — without writing a single loop.

python
import numpy as np

# Scalar broadcast: operates on every element
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a * 10)
# [[10 20 30]
#  [40 50 60]]

# Column broadcast: each row's mean subtracted from that row
row_means = a.mean(axis=1, keepdims=True)  # shape (2, 1)
centered = a - row_means

# Common ML pattern: normalize features (columns)
data = np.random.randn(100, 5)  # 100 samples, 5 features
mean = data.mean(axis=0)        # shape (5,) — mean of each feature
std = data.std(axis=0)          # shape (5,) — std of each feature
normalized = (data - mean) / std  # broadcasting! shape stays (100, 5)

# Outer product via broadcasting
x = np.array([1, 2, 3])[:, np.newaxis]     # shape (3, 1)
y = np.array([10, 20, 30])[np.newaxis, :]  # shape (1, 3)
outer = x * y  # shape (3, 3)
print(outer)
# [[10 20 30]
#  [20 40 60]
#  [30 60 90]]
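
The three rules can also be checked without building any data: `np.broadcast_shapes` (available in NumPy 1.20+) applies exactly these rules to shape tuples. A small sketch:

```python
import numpy as np

# Rule 1: (4,) is padded on the left to (1, 4), then stretched to (3, 4)
print(np.broadcast_shapes((3, 4), (4,)))    # (3, 4)

# Rule 2: size-1 dimensions stretch to match the other array
print(np.broadcast_shapes((3, 1), (1, 4)))  # (3, 4)

# Rule 3: mismatched sizes where neither is 1 raise an error
try:
    np.broadcast_shapes((3, 4), (3,))  # trailing dims 4 vs 3, incompatible
except ValueError as err:
    print("incompatible shapes:", err)
```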

Vectorization: Why NumPy is Fast

The #1 rule of NumPy: avoid Python loops. Use vectorized operations instead. The difference is dramatic.

python
import numpy as np
import time

size = 1_000_000
a = np.random.randn(size)
b = np.random.randn(size)

# --- SLOW: Python loop ---
start = time.time()
result_loop = []
for i in range(size):
    result_loop.append(a[i] + b[i])
loop_time = time.time() - start
print(f"Python loop: {loop_time:.4f} seconds")

# --- FAST: Vectorized NumPy ---
start = time.time()
result_vec = a + b
vec_time = time.time() - start
print(f"NumPy vectorized: {vec_time:.6f} seconds")

print(f"Speedup: {loop_time / vec_time:.0f}x faster!")
# Typical output:
# Python loop: 0.2500 seconds
# NumPy vectorized: 0.001200 seconds
# Speedup: 208x faster!

Vectorization Mindset

Whenever you find yourself writing a for-loop over array elements, stop and ask: "Can I express this as a NumPy operation?" Common replacements:

  • `for x in arr: total += x` -> `arr.sum()`
  • `for i: result[i] = a[i] * b[i]` -> `a * b`
  • `for row in matrix: row / row.sum()` -> `matrix / matrix.sum(axis=1, keepdims=True)`

This mindset is essential because the same pattern applies to TensorFlow and PyTorch.
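
Each replacement above can be checked side by side; a quick sketch with small illustrative arrays:

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0])
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
matrix = np.array([[1.0, 3.0],
                   [2.0, 2.0]])

# Loop accumulation vs arr.sum()
total = 0.0
for x in arr:
    total += x
print(total == arr.sum())  # True

# Elementwise loop vs a * b
result = np.empty_like(a)
for i in range(len(a)):
    result[i] = a[i] * b[i]
print(np.array_equal(result, a * b))  # True

# Row-wise normalization loop vs one broadcast expression
normalized = matrix / matrix.sum(axis=1, keepdims=True)
print(normalized.sum(axis=1))  # [1. 1.] (every row now sums to 1)
```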

Essential Operations for ML

Here are the NumPy operations you'll reach for constantly in ML work:

python
import numpy as np

data = np.random.randn(5, 3)

# Aggregation along axes
print(data.sum(axis=0))    # sum each column — shape (3,)
print(data.sum(axis=1))    # sum each row — shape (5,)
print(data.mean(axis=0))   # mean of each feature
print(data.std(axis=0))    # std of each feature

# Matrix operations
A = np.random.randn(3, 4)
B = np.random.randn(4, 2)
C = A @ B                  # matrix multiply — shape (3, 2)
# Equivalent: np.dot(A, B) or np.matmul(A, B)

# Transpose
print(A.T.shape)           # (4, 3)

# Stacking arrays
x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])
vertical = np.vstack([x1, x2])    # shape (2, 3)
horizontal = np.hstack([x1, x2])  # shape (6,)

# Argmax / Argmin (critical for classification)
predictions = np.array([0.1, 0.7, 0.2])
predicted_class = np.argmax(predictions)  # 1
print(predicted_class)

# Where (conditional selection)
scores = np.array([85, 42, 91, 67])
result = np.where(scores >= 60, "pass", "fail")
print(result)  # ['pass' 'fail' 'pass' 'pass']