Tensors & Automatic Differentiation
PyTorch is built around tensors — multi-dimensional arrays similar to NumPy's ndarrays, but with two superpowers:
1. GPU acceleration — tensors can live on CUDA-capable GPUs for massively parallel computation.
2. Automatic differentiation — PyTorch tracks every operation on tensors and can compute gradients automatically.
These two features make PyTorch the backbone of modern deep learning research and production systems.
Creating Tensors
There are many ways to create tensors in PyTorch:
```python
import torch

# From Python lists
a = torch.tensor([1, 2, 3])
b = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Common factory functions
zeros = torch.zeros(3, 4)           # 3x4 matrix of zeros
ones = torch.ones(2, 3, 5)          # 2x3x5 tensor of ones
rand = torch.rand(4, 4)             # uniform random [0, 1)
randn = torch.randn(4, 4)           # standard normal distribution
arange = torch.arange(0, 10, 2)     # [0, 2, 4, 6, 8]
linspace = torch.linspace(0, 1, 5)  # [0.0, 0.25, 0.5, 0.75, 1.0]
eye = torch.eye(3)                  # 3x3 identity matrix

# From NumPy (shared memory — changes to one affect the other!)
import numpy as np
np_array = np.array([1.0, 2.0, 3.0])
from_numpy = torch.from_numpy(np_array)

# Specifying dtype
x = torch.tensor([1, 2, 3], dtype=torch.float32)
y = torch.zeros(3, dtype=torch.int64)

print(f"Shape: {b.shape}, Dtype: {b.dtype}, Device: {b.device}")
# Shape: torch.Size([2, 2]), Dtype: torch.float32, Device: cpu
```
Tensor Operations
PyTorch tensors support a rich set of operations for reshaping, slicing, and computation:
```python
import torch

x = torch.arange(12, dtype=torch.float32)

# --- Reshaping ---
a = x.reshape(3, 4)   # Reshape to 3x4 (may copy)
b = x.view(3, 4)      # Reshape to 3x4 (requires contiguous memory)
c = x.reshape(2, -1)  # -1 means "infer this dimension" -> 2x6

# --- Squeeze / Unsqueeze ---
t = torch.zeros(1, 3, 1, 4)
print(t.shape)             # torch.Size([1, 3, 1, 4])
print(t.squeeze().shape)   # torch.Size([3, 4]) — removes all size-1 dims
print(t.squeeze(0).shape)  # torch.Size([3, 1, 4]) — removes dim 0 only

u = torch.zeros(3, 4)
print(u.unsqueeze(0).shape)   # torch.Size([1, 3, 4]) — add dim at position 0
print(u.unsqueeze(-1).shape)  # torch.Size([3, 4, 1]) — add dim at end

# --- Indexing and Slicing (NumPy-style) ---
m = torch.arange(12).reshape(3, 4)
print(m[0])       # First row: tensor([0, 1, 2, 3])
print(m[:, 1])    # Second column: tensor([1, 5, 9])
print(m[1:, :2])  # Rows 1+, first 2 cols: tensor([[4, 5], [8, 9]])

# --- Transpose and Permute ---
t = torch.randn(2, 3, 4)
print(t.transpose(0, 2).shape)  # torch.Size([4, 3, 2])
print(t.permute(2, 0, 1).shape) # torch.Size([4, 2, 3])

# --- Concatenation and Stacking ---
a = torch.ones(2, 3)
b = torch.zeros(2, 3)
cat = torch.cat([a, b], dim=0)      # Shape: [4, 3]
stack = torch.stack([a, b], dim=0)  # Shape: [2, 2, 3]

# --- Math operations ---
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])
print(x + y)  # Element-wise addition
print(x * y)  # Element-wise multiplication
print(x @ y)  # Dot product: tensor(32.)
print(torch.matmul(x.unsqueeze(0), y.unsqueeze(1)))  # Matrix multiply: tensor([[32.]])
```
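The practical difference between .view() and .reshape() shows up with non-contiguous tensors, e.g. after a transpose. A small sketch:

```python
import torch

x = torch.arange(6).reshape(2, 3)
y = x.t()                 # transpose makes y non-contiguous
print(y.is_contiguous())  # False

# .view() requires contiguous memory and raises here
try:
    y.view(6)
except RuntimeError:
    print("view failed on non-contiguous tensor")

# .reshape() falls back to copying when a zero-copy view is impossible
z = y.reshape(6)
print(z)  # tensor([0, 3, 1, 4, 2, 5])

# .contiguous() makes an explicit contiguous copy, after which .view() works
print(y.contiguous().view(6))  # tensor([0, 3, 1, 4, 2, 5])
```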
GPU Operations
PyTorch makes it easy to move computations to GPU:
```python
import torch

# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Move tensors to GPU
x = torch.randn(1000, 1000)
x_gpu = x.to(device)  # Move to GPU (or stay on CPU)
# x_gpu = x.cuda()    # Equivalent, but raises an error if no GPU is available

# Create directly on GPU
y = torch.randn(1000, 1000, device=device)

# Operations between tensors must be on the same device!
z = x_gpu @ y  # Both on same device — works
# z = x @ x_gpu  # ERROR on GPU: can't mix CPU and GPU tensors

# Move back to CPU (e.g., for NumPy conversion)
result = z.cpu().numpy()

# For Apple Silicon Macs:
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x_mps = x.to(mps_device)
```
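A common way to avoid device-mismatch errors in practice is a small helper that moves an entire batch (including nested lists and dicts) to one device. A sketch; `to_device` is our own name, not a PyTorch API:

```python
import torch

def to_device(obj, device):
    # Recursively move tensors, or containers of tensors, to a device.
    if torch.is_tensor(obj):
        return obj.to(device)
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_device(o, device) for o in obj)
    if isinstance(obj, dict):
        return {k: to_device(v, device) for k, v in obj.items()}
    return obj  # leave non-tensor leaves (ints, strings, ...) alone

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = {"inputs": torch.randn(4, 3), "targets": torch.tensor([0, 1, 2, 0])}
batch = to_device(batch, device)
print(batch["inputs"].device)  # everything now lives on one device
```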
Automatic Differentiation (Autograd)
This is the magic that makes training neural networks possible. PyTorch builds a computational graph dynamically as you perform operations, then uses it to compute gradients via backpropagation.
```python
import torch

# --- Basic autograd ---
# requires_grad=True tells PyTorch to track operations for gradient computation
x = torch.tensor(3.0, requires_grad=True)

# Forward pass: compute y = x^2 + 2x + 1
y = x**2 + 2*x + 1

# Backward pass: compute dy/dx
y.backward()

# The gradient is stored in x.grad
print(x.grad)  # tensor(8.0) because dy/dx = 2x + 2 = 2(3) + 2 = 8

# --- With vectors ---
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # Need a scalar for .backward()
y.backward()
print(x.grad)  # tensor([2., 4., 6.]) — dy/dx_i = 2*x_i

# --- Gradient accumulation (important!) ---
x = torch.tensor(2.0, requires_grad=True)

y1 = x ** 2
y1.backward()
print(x.grad)  # tensor(4.0)

y2 = x ** 3
y2.backward()
print(x.grad)  # tensor(16.0) — gradients ACCUMULATE! 4 + 12 = 16

# Always zero gradients before a new computation
x.grad.zero_()
y3 = x ** 3
y3.backward()
print(x.grad)  # tensor(12.0) — now correct
```
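Putting backward() and grad-zeroing together, here is a hand-rolled gradient descent on f(w) = (w - 5)^2, a minimal sketch of the loop that optimizers like torch.optim.SGD automate:

```python
import torch

# Minimize f(w) = (w - 5)^2; the unique minimum is at w = 5.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for _ in range(100):
    loss = (w - 5.0) ** 2
    loss.backward()           # df/dw = 2(w - 5), stored in w.grad
    with torch.no_grad():     # the update itself must not be tracked
        w -= lr * w.grad
    w.grad.zero_()            # otherwise gradients would accumulate

print(round(w.item(), 4))  # 5.0 — converged to the minimum
```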
Detach and No-Grad
Sometimes you need to stop gradient tracking:
```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2

# .detach() creates a new tensor that shares data but has no gradient history
z = y.detach()
print(z.requires_grad)  # False
# z is a "view" of y's data, but gradient won't flow through it

# torch.no_grad() context manager — disables gradient computation entirely
# Use during inference for speed and memory savings
x = torch.randn(1000, 1000, requires_grad=True)
with torch.no_grad():
    y = x @ x.T  # No computational graph built
    print(y.requires_grad)  # False

# Common pattern: evaluation / inference
model = ...  # some nn.Module
model.eval()
with torch.no_grad():
    predictions = model(test_data)

# torch.inference_mode() — even faster than no_grad (PyTorch 1.9+)
with torch.inference_mode():
    predictions = model(test_data)
```
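One concrete use of .detach() is as a stop-gradient: treating part of a computation as a constant so no gradient flows through it. A small sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Without detach: y = x * x, so dy/dx = 2x = 4
y = x * x
y.backward()
print(x.grad)  # tensor(4.)

# With detach: the first factor is a constant c = x.detach() = 2,
# so y = c * x and dy/dx = c = 2 — gradient flows through x only once
x.grad.zero_()
y = x.detach() * x
y.backward()
print(x.grad)  # tensor(2.)
```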