Privacy & Security in AI
AI systems are uniquely vulnerable to privacy and security threats. Unlike traditional software, ML models can memorize training data, be reverse-engineered to reveal private information, and be manipulated through adversarial inputs. This lesson covers the techniques for building privacy-preserving and secure AI systems.
The Privacy Paradox of ML
Useful models need large amounts of (often personal) data, yet the better a model fits that data, the more it risks memorizing and exposing it. The techniques below are different ways of resolving this tension: extracting aggregate patterns while limiting what can be learned about any individual.
Differential Privacy
Differential privacy (DP) provides a mathematical guarantee that the output of a computation does not reveal whether any individual's data was included in the input.
The Core Idea
A mechanism M is epsilon-differentially private if for any two datasets D and D' that differ by one record, and any set of outputs S:
P[M(D) in S] <= e^epsilon * P[M(D') in S]
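This inequality can be checked numerically for the Laplace mechanism used below: for two adjacent query results that differ by at most the sensitivity, the ratio of output densities never exceeds e^epsilon. A minimal sketch, assuming sensitivity 1 and adjacent results that differ by exactly that amount:

```python
import numpy as np

# Laplace mechanism: output density for true value mu is
# f(x) = exp(-|x - mu| / b) / (2b), with scale b = sensitivity / epsilon
def laplace_pdf(x, mu, scale):
    return np.exp(-np.abs(x - mu) / scale) / (2 * scale)

epsilon = 1.0
sensitivity = 1.0      # adjacent datasets change the query by at most 1
scale = sensitivity / epsilon

mu_d = 10.0            # query result on dataset D
mu_d_prime = 11.0      # query result on adjacent dataset D'

xs = np.linspace(0, 20, 10001)
ratio = laplace_pdf(xs, mu_d, scale) / laplace_pdf(xs, mu_d_prime, scale)

# The DP inequality bounds this ratio by e^epsilon everywhere
print(f"max density ratio: {ratio.max():.4f}")
print(f"e^epsilon bound:   {np.exp(epsilon):.4f}")
```

The maximum ratio is attained on the side of the axis away from both means, where the two absolute-value terms differ by exactly the sensitivity.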
Understanding Epsilon
The privacy budget (epsilon) controls the privacy-utility trade-off:
| Epsilon | Privacy Level | Utility |
|---|---|---|
| 0.1 | Very strong privacy | Lower accuracy |
| 1.0 | Strong privacy | Good accuracy |
| 10.0 | Weak privacy | High accuracy |
| infinity | No privacy | Maximum accuracy |
Common Mechanisms
1. Laplace Mechanism: Adds noise drawn from a Laplace distribution to numeric query results
2. Gaussian Mechanism: Adds Gaussian noise (requires the relaxed "approximate DP" guarantee)
3. Exponential Mechanism: For categorical outputs, selects results with probability proportional to their quality score
4. Randomized Response: For surveys, each respondent flips a coin to decide whether to answer truthfully
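Randomized response is simple enough to sketch end to end. The coin probabilities below are the classic fair-coin variant, which satisfies epsilon = ln 3; the survey rate and sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(true_answer: bool) -> bool:
    """Classic randomized response with a fair coin (epsilon = ln 3).

    Heads: answer truthfully. Tails: flip again and report that
    second coin, regardless of the truth.
    """
    if rng.random() < 0.5:
        return true_answer
    return rng.random() < 0.5

# Simulate a sensitive yes/no survey where 30% truly answer "yes"
n = 100_000
truth = rng.random(n) < 0.30
reported = np.array([randomized_response(t) for t in truth])

# De-bias the aggregate: P(report yes) = 1/4 + p/2, so p = 2 * observed - 1/2
estimate = 2 * reported.mean() - 0.5
print(f"True rate: 0.300, estimated rate: {estimate:.3f}")
```

No individual answer can be trusted, yet the population rate is recovered accurately; this is the privacy-utility trade-off in its simplest form.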
```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float) -> float:
    """Add Laplace noise for differential privacy.

    Args:
        true_value: The actual query result
        sensitivity: Max change in output when one record changes
        epsilon: Privacy budget (lower = more private)

    Returns:
        Noisy result satisfying epsilon-differential privacy
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(0, scale)
    return true_value + noise

def gaussian_mechanism(true_value: float, sensitivity: float,
                       epsilon: float, delta: float = 1e-5) -> float:
    """Add Gaussian noise for (epsilon, delta)-differential privacy."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = np.random.normal(0, sigma)
    return true_value + noise

# Example: Computing average salary with differential privacy
np.random.seed(42)
salaries = np.array([50000, 65000, 72000, 48000, 95000,
                     61000, 58000, 83000, 71000, 55000])

true_mean = np.mean(salaries)
# Sensitivity of the mean, assuming salaries lie in a known range.
# (Deriving the range from the data itself, as done here for brevity,
# would leak information in practice; bounds must be fixed a priori.)
sensitivity = (max(salaries) - min(salaries)) / len(salaries)

print(f"True mean salary: ${true_mean:,.0f}")
print(f"Sensitivity: ${sensitivity:,.0f}")
print()

# Compare different epsilon values
for eps in [0.1, 1.0, 5.0, 10.0]:
    noisy_results = [
        laplace_mechanism(true_mean, sensitivity, eps)
        for _ in range(1000)
    ]
    avg_noisy = np.mean(noisy_results)
    std_noisy = np.std(noisy_results)
    print(f"epsilon={eps:5.1f}: "
          f"mean=${avg_noisy:>10,.0f}, "
          f"std=${std_noisy:>10,.0f}")
```

Federated Learning
Federated learning trains ML models across multiple decentralized devices or organizations without sharing raw data. Each participant trains on their local data and shares only model updates (gradients).
How It Works
1. Central server sends the current model to all participants
2. Each participant trains the model on their local data
3. Participants send only model updates (gradients) back to the server
4. Server aggregates updates (e.g., FedAvg) to produce an improved global model
5. Repeat until convergence
Key Benefits
Raw data never leaves the participant's device or organization, which limits exposure of sensitive information and eases compliance with regulations such as GDPR and HIPAA. It also makes it possible to learn from data that cannot practically or legally be centralized, such as hospital records or keyboard input on phones.
Challenges
Client data is typically non-IID, which can slow or destabilize convergence; communication is expensive relative to local computation; clients may drop out mid-round; and the shared updates themselves can still leak information about local data, which is why federated learning is often combined with differential privacy or secure aggregation.
```python
import numpy as np

def federated_averaging(client_models, client_sizes):
    """Federated Averaging (FedAvg) algorithm.

    Aggregates model parameters from multiple clients,
    weighted by the number of samples each client has.

    Args:
        client_models: List of model weight arrays (one per client)
        client_sizes: List of dataset sizes (one per client)

    Returns:
        Aggregated global model weights
    """
    total_samples = sum(client_sizes)
    # Weighted average of all client models
    global_model = np.zeros_like(client_models[0])
    for model, size in zip(client_models, client_sizes):
        weight = size / total_samples
        global_model += weight * model
    return global_model

def simulate_federated_learning(num_clients=5, num_rounds=10):
    """Simulate federated learning for a simple linear model."""
    np.random.seed(42)

    # True model: y = 3x + 2 + noise
    true_weights = np.array([3.0, 2.0])  # [slope, intercept]

    # Each client has different data (non-IID simulation)
    client_data = []
    client_sizes = []
    for i in range(num_clients):
        n = np.random.randint(50, 200)
        x = np.random.uniform(i * 2, i * 2 + 5, n)  # Different ranges
        y = 3 * x + 2 + np.random.normal(0, 1, n)
        client_data.append((x, y))
        client_sizes.append(n)

    # Initialize global model
    global_weights = np.array([0.0, 0.0])
    lr = 0.01

    print("Federated Learning Simulation")
    print(f"True weights: {true_weights}")
    print(f"Clients: {num_clients}, Rounds: {num_rounds}")
    print(f"Client sizes: {client_sizes}\n")

    for round_num in range(num_rounds):
        client_models = []

        for i in range(num_clients):
            x, y = client_data[i]
            # Local training: a few gradient-descent steps on this client's data
            local_weights = global_weights.copy()
            for _ in range(5):  # 5 local steps
                preds = local_weights[0] * x + local_weights[1]
                errors = preds - y
                grad_w = 2 * np.mean(errors * x)
                grad_b = 2 * np.mean(errors)
                local_weights[0] -= lr * grad_w
                local_weights[1] -= lr * grad_b
            client_models.append(local_weights)

        # Aggregate with FedAvg
        global_weights = federated_averaging(
            client_models, client_sizes
        )

        if (round_num + 1) % 2 == 0:
            print(f"Round {round_num + 1:2d}: "
                  f"weights = [{global_weights[0]:.3f}, "
                  f"{global_weights[1]:.3f}]")

    print(f"\nFinal: [{global_weights[0]:.3f}, {global_weights[1]:.3f}]")
    print(f"Target: [{true_weights[0]:.3f}, {true_weights[1]:.3f}]")

simulate_federated_learning()
```

Attacks on ML Models
Model Inversion Attacks
An attacker with access to a model's predictions can reconstruct approximations of the training data. For example, given a facial recognition model, an attacker can generate approximate images of faces in the training set.
Membership Inference Attacks
An attacker determines whether a specific data point was in the training set. The key insight: models tend to be more confident on data they were trained on.
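This insight can be demonstrated with a deliberately memorizing model: a loss-threshold attack flags a point as a member when the model's per-example loss on it is suspiciously low. A toy sketch, where a 1-nearest-neighbour regressor (which memorizes its training set exactly) stands in for any overfit model:

```python
import numpy as np

rng = np.random.default_rng(1)

# A model that memorizes: 1-nearest-neighbour regression on its training set
x_train = rng.uniform(-1, 1, 50)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, 50)

def predict(x):
    """Return the training label of the nearest training point."""
    idx = np.argmin(np.abs(x_train[:, None] - np.asarray(x)[None, :]), axis=0)
    return y_train[idx]

# Fresh points from the same distribution (non-members)
x_out = rng.uniform(-1, 1, 50)
y_out = np.sin(3 * x_out) + rng.normal(0, 0.1, 50)

loss_members = (predict(x_train) - y_train) ** 2      # exactly 0: memorized
loss_nonmembers = (predict(x_out) - y_out) ** 2       # nonzero: label noise

# Threshold attack: flag "member" when the per-example loss is near zero
threshold = 1e-6
tpr = np.mean(loss_members < threshold)    # members correctly flagged
fpr = np.mean(loss_nonmembers < threshold) # non-members wrongly flagged
print(f"Attack TPR: {tpr:.2f}, FPR: {fpr:.2f}")
```

The larger the gap between member and non-member loss (i.e., the more the model overfits), the better this attack works, which is why regularization and differential privacy are effective defenses.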
Adversarial Examples
Carefully crafted inputs that look normal to humans but cause the model to make incorrect predictions. A tiny perturbation to an image can flip a classifier's prediction with high confidence.
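For a linear model, the fast gradient sign method (FGSM) reduces to a one-line perturbation: step each input feature by epsilon in the direction that increases the loss. A sketch with made-up weights standing in for a trained classifier:

```python
import numpy as np

# A toy linear classifier (stand-in for a trained model): predict sign(w.x + b)
w = np.array([0.9, -0.4, 0.7, 0.2, -0.6])
b = 0.1

def predict(x):
    return 1 if w @ x + b > 0 else -1

# For the logistic loss on a linear model, the gradient of the loss with
# respect to the input points along -y * w, so the FGSM step is:
#   x_adv = x - epsilon * y * sign(w)
x = np.array([0.5, 0.2, 0.3, -0.1, 0.4])   # correctly classified: score > 0
y_true = 1
epsilon = 0.3                               # small, bounded perturbation

x_adv = x - epsilon * y_true * np.sign(w)

print(f"clean score: {w @ x + b:+.3f} -> prediction {predict(x)}")
print(f"adv   score: {w @ x_adv + b:+.3f} -> prediction {predict(x_adv)}")
print(f"max perturbation per feature: {np.max(np.abs(x_adv - x)):.2f}")
```

Each feature moves by at most 0.3, yet the prediction flips because the per-feature changes all push the score in the same direction; in high-dimensional inputs like images the same effect is achieved with perturbations invisible to humans.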
Data Poisoning
An attacker injects malicious data into the training set to manipulate the model's behavior. This can create backdoors — e.g., a stop sign with a small sticker is classified as a speed limit sign.
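A label-flipping sketch makes the mechanics concrete: injecting mislabeled points drags a nearest-centroid classifier's decision boundary into the victim region. The cluster positions and poison counts below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Clean training data: two well-separated clusters
X = np.vstack([rng.normal(+2, 1, (100, 2)),   # class +1 around (+2, +2)
               rng.normal(-2, 1, (100, 2))])  # class -1 around (-2, -2)
y = np.array([1] * 100 + [-1] * 100)

def centroid_classifier(X, y):
    """Nearest-centroid: predict the class with the closer class mean."""
    c_pos = X[y == 1].mean(axis=0)
    c_neg = X[y == -1].mean(axis=0)
    return lambda x: 1 if np.linalg.norm(x - c_pos) < np.linalg.norm(x - c_neg) else -1

clean = centroid_classifier(X, y)

# Poisoning: inject points labeled +1 deep in negative territory,
# dragging the positive centroid across the boundary
X_poison = np.vstack([X, np.full((60, 2), -8.0)])
y_poison = np.concatenate([y, np.ones(60, dtype=int)])
poisoned = centroid_classifier(X_poison, y_poison)

target = np.array([-1.5, -1.5])   # clearly inside the negative cluster
print(f"clean model:    {clean(target):+d}")
print(f"poisoned model: {poisoned(target):+d}")
```

Robust aggregation defenses work by making the learned statistics (here, the centroids) insensitive to a bounded fraction of such outliers.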
Defenses
| Attack | Defense |
|---|---|
| Model inversion | Differential privacy, output perturbation, limiting prediction confidence |
| Membership inference | Differential privacy, regularization, limiting query access |
| Adversarial examples | Adversarial training, input preprocessing, certified defenses |
| Data poisoning | Data validation, anomaly detection, robust aggregation |
Secure Aggregation
In federated learning, secure aggregation ensures the server can compute the aggregate of client updates without seeing any individual client's update. This is achieved through cryptographic protocols (homomorphic encryption, secret sharing).
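The core trick can be sketched with pairwise additive masks: each pair of clients i < j agrees on a random mask that i adds and j subtracts, so every mask cancels in the server's sum. (Real protocols, such as Bonawitz et al.'s, derive the masks from key agreement and handle client dropout; this sketch assumes all clients stay online.)

```python
import numpy as np

rng = np.random.default_rng(4)

# Each client's private update (e.g., a gradient vector)
updates = {i: rng.normal(0, 1, 4) for i in range(3)}

# Pairwise masking: clients i < j share a random mask m_ij;
# client i adds it, client j subtracts it, so masks cancel in the sum
masked = {i: u.copy() for i, u in updates.items()}
clients = sorted(updates)
for a in range(len(clients)):
    for b in range(a + 1, len(clients)):
        i, j = clients[a], clients[b]
        mask = rng.normal(0, 100, 4)   # large mask: masked updates look random
        masked[i] += mask
        masked[j] -= mask

# The server only ever sees the masked updates...
server_view = np.vstack([masked[i] for i in clients])
# ...but their sum equals the true aggregate
aggregate = server_view.sum(axis=0)
true_sum = sum(updates.values())

print("aggregate:", np.round(aggregate, 6))
print("true sum: ", np.round(true_sum, 6))
```

Each individual masked update is statistically dominated by its masks and reveals essentially nothing, yet the aggregate is exact; cryptographic versions replace the trusted randomness with key exchange.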
Data Anonymization Techniques
| Technique | Description | Limitation |
|---|---|---|
| k-anonymity | Each record is indistinguishable from k-1 others | Vulnerable to attribute disclosure |
| l-diversity | Each group has at least l distinct sensitive values | Doesn't prevent probabilistic inference |
| t-closeness | Distribution of sensitive values in each group is close to overall distribution | Computationally expensive |
| Synthetic data | Generate artificial data that preserves statistical properties | May not capture rare patterns |
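The first two properties in the table can be checked mechanically: group records by their quasi-identifiers and inspect each group's size and sensitive-value diversity. A toy sketch with hypothetical records:

```python
from collections import Counter

# Toy records: (age bracket, zip prefix) are the quasi-identifiers;
# the last field is the sensitive value
records = [
    ("30-40", "941", "diabetes"),
    ("30-40", "941", "healthy"),
    ("30-40", "941", "healthy"),
    ("40-50", "100", "cancer"),
    ("40-50", "100", "healthy"),
]

def k_anonymity(records, quasi_idx):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    groups = Counter(tuple(r[i] for i in quasi_idx) for r in records)
    return min(groups.values())

def l_diversity(records, quasi_idx, sensitive_idx):
    """Smallest number of distinct sensitive values within any group."""
    by_group = {}
    for r in records:
        key = tuple(r[i] for i in quasi_idx)
        by_group.setdefault(key, set()).add(r[sensitive_idx])
    return min(len(v) for v in by_group.values())

print("k-anonymity:", k_anonymity(records, quasi_idx=(0, 1)))  # -> 2
print("l-diversity:", l_diversity(records, (0, 1), 2))         # -> 2
```

In practice, achieving a target k or l requires generalizing or suppressing quasi-identifier values until every group passes the check, which is exactly where these techniques lose utility.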