Linear & Logistic Regression

Understand regression fundamentals, regularization techniques, and classification with logistic regression

Linear models are the workhorses of machine learning. They are fast, interpretable, and surprisingly powerful. Understanding them deeply gives you a foundation for understanding all other models.

Linear Regression

Linear regression models the relationship between features and a continuous target as a weighted sum:

y = w1*x1 + w2*x2 + ... + wn*xn + b

Where:

  • w (weights/coefficients) determine the importance of each feature
  • b (bias/intercept) is the baseline prediction when all features are zero
  • The goal is to find w and b that minimize the error

Ordinary Least Squares (OLS)

The most common approach minimizes the Mean Squared Error (MSE):

MSE = (1/n) * sum((y_pred - y_actual)^2)

This is called Ordinary Least Squares because it minimizes the sum of squared residuals. There are two ways to solve it:

1. Normal Equation: Closed-form solution (fast for small datasets)
2. Gradient Descent: Iterative optimization (scales to large datasets)
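As a sketch of option 1, the normal equation theta = (X^T X)^(-1) X^T y can be applied directly with NumPy. The toy data and variable names below are illustrative, not part of the lesson:

```python
import numpy as np

# Toy data: y = 3*x + 2 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=100)

# Append a column of ones so the intercept b is learned as one more weight
X_b = np.column_stack([X, np.ones(len(X))])

# Normal equation: theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
w, b = theta
print(f"w = {w:.3f}, b = {b:.3f}")  # close to 3 and 2
```

Because this inverts an n_features x n_features matrix, it is exact but scales poorly as the feature count grows, which is why large problems use gradient descent instead.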

The Cost Function

The cost function (also called loss function or objective function) measures how wrong your model is. For linear regression, we use MSE. Training = finding the parameters that minimize this cost. Every ML algorithm has a cost function it optimizes.
```python
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Generate synthetic regression data
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Coefficients: {model.coef_}")
print(f"Intercept:    {model.intercept_:.4f}")
print(f"MSE:          {mse:.4f}")
print(f"R2 Score:     {r2:.4f}")
```
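Option 2 above, gradient descent, can also be sketched by hand. The loop below is illustrative only (scikit-learn's LinearRegression uses a least-squares solver, not gradient descent); it repeatedly steps w and b along the negative gradient of the MSE:

```python
import numpy as np

# Toy data with known weights, so convergence is easy to check
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 4.0 + rng.normal(0, 0.1, size=200)

w = np.zeros(3)
b = 0.0
lr = 0.1  # learning rate

for _ in range(500):
    err = X @ w + b - y                # residuals
    grad_w = 2 / len(y) * (X.T @ err)  # dMSE/dw
    grad_b = 2 / len(y) * err.sum()    # dMSE/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w = {w.round(3)}, b = {b:.3f}")  # close to [1.5, -2.0, 0.5] and 4.0
```

The same update rule scales to datasets far too large for the normal equation, because each step only needs one pass over the data (or a mini-batch of it).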

Regularization: Ridge, Lasso, and ElasticNet

Plain linear regression can overfit when you have many features or correlated features. Regularization adds a penalty term to the cost function that discourages large coefficients.

Ridge Regression (L2 Regularization)

Adds the sum of squared coefficients to the cost:

Cost = MSE + alpha * sum(w^2)

  • Shrinks coefficients toward zero but never exactly to zero
  • Good when you have many correlated features
  • All features are kept in the model

Lasso Regression (L1 Regularization)

Adds the sum of absolute coefficients to the cost:

Cost = MSE + alpha * sum(|w|)

  • Can shrink coefficients exactly to zero (feature selection!)
  • Good when you suspect only a few features matter
  • Automatically removes irrelevant features

ElasticNet (L1 + L2 Combined)

Cost = MSE + alpha * (l1_ratio * sum(|w|) + (1 - l1_ratio) * sum(w^2))

  • Combines benefits of both Ridge and Lasso
  • The l1_ratio parameter controls the mix (0 = Ridge, 1 = Lasso)

L1 vs L2 Regularization

L1 (Lasso) produces sparse models by driving some weights to exactly zero, effectively performing feature selection. L2 (Ridge) distributes the penalty evenly across all features, shrinking them toward zero but keeping them all. Use L1 when you want automatic feature selection; use L2 when all features might be relevant.
```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
import numpy as np

# Compare regularization methods
# (reuses X_train/X_test from the linear regression example above)
models = {
    "Linear": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

print(f"{'Model':<18} {'R2':>8} {'Non-zero coefs':>16}")
print("-" * 44)
for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)
    non_zero = np.sum(np.abs(model.coef_) > 1e-6)
    print(f"{name:<18} {r2:>8.4f} {non_zero:>16}")
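One way to see Lasso's feature-selection behaviour is to sweep alpha and count the surviving coefficients. This small experiment generates its own synthetic data (an illustrative setup, not the lesson's dataset), where only 5 of 20 features actually matter:

```python
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
import numpy as np

# 20 features, but only 5 actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5, random_state=0)

counts = []
for alpha in [0.01, 1.0, 10.0, 100.0]:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    kept = int(np.sum(np.abs(lasso.coef_) > 1e-6))
    counts.append(kept)
    print(f"alpha={alpha:>6}: {kept} non-zero coefficients")
```

As alpha grows, more coefficients are driven to exactly zero; with a large enough penalty, only the genuinely informative features survive.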

Logistic Regression

Despite its name, logistic regression is a classification algorithm. It uses the sigmoid function to map any real number to a probability between 0 and 1:

sigma(z) = 1 / (1 + e^(-z))

Where z = w*x + b (just like linear regression). The sigmoid "squashes" the output:

  • Large positive z -> probability near 1
  • Large negative z -> probability near 0
  • z = 0 -> probability = 0.5 (decision boundary)

The model predicts class 1 if the probability > 0.5, and class 0 otherwise.
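The squashing behaviour is easy to verify numerically; a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Large |z| saturates toward 0 or 1; z = 0 sits exactly on the boundary
for z in [-10, -1, 0, 1, 10]:
    print(f"z = {z:>3} -> sigmoid(z) = {sigmoid(z):.5f}")
```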

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train logistic regression
model = LogisticRegression(max_iter=5000, random_state=42)
model.fit(X_train, y_train)

# Predict probabilities and classes
y_prob = model.predict_proba(X_test)[:5]  # First 5 probabilities
y_pred = model.predict(X_test)

print("First 5 predicted probabilities [class 0, class 1]:")
for i, probs in enumerate(y_prob):
    print(f"  Sample {i}: [{probs[0]:.4f}, {probs[1]:.4f}] -> class {y_pred[i]}")

print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"\n{classification_report(y_test, y_pred)}")
```

Multi-Class Classification

For more than 2 classes, logistic regression extends via:

  • One-vs-Rest (OvR): Train one binary classifier per class. Each classifier answers "Is it this class or not?" The class with the highest confidence wins.
  • Multinomial / Softmax: Directly model probabilities across all classes. The softmax function generalizes the sigmoid to multiple classes:

P(class_k) = e^(z_k) / sum(e^(z_j) for all j)

Softmax ensures all class probabilities sum to 1.
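A minimal NumPy sketch of the softmax formula (the max-subtraction step is a standard numerical-stability trick, not part of the formula above; it cancels out in the ratio):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability (doesn't change the result)
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw class scores z_k
probs = softmax(scores)
print(probs.round(4))
print(probs.sum())  # sums to 1 (up to floating-point rounding)
```

Higher scores get higher probabilities, and the ordering of the scores is always preserved.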

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Multi-class with softmax (multinomial)
model = LogisticRegression(
    multi_class="multinomial",
    solver="lbfgs",
    max_iter=200,
    random_state=42
)
model.fit(X_train, y_train)

# Predict probabilities for all 3 classes
sample_probs = model.predict_proba(X_test[:3])
class_names = load_iris().target_names

print("Predicted probabilities:")
for i, probs in enumerate(sample_probs):
    print(f"  Sample {i}: {dict(zip(class_names, probs.round(4)))}")

print(f"\nAccuracy: {model.score(X_test, y_test):.4f}")
```

When to Use Linear vs Logistic Regression

Use Linear Regression when your target is continuous (prices, temperatures, scores). Use Logistic Regression when your target is categorical (spam/not spam, species, diagnosis). A common beginner mistake is using linear regression for classification -- it can produce predictions outside [0,1] and doesn't model probabilities correctly.
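The mistake is easy to demonstrate: fit both models to the 0/1 breast-cancer labels and compare the output ranges (a quick illustrative check):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)  # y contains only 0s and 1s

# Linear regression treats the labels as continuous numbers
lin_out = LinearRegression().fit(X, y).predict(X)
print(f"Linear regression outputs:  min={lin_out.min():.3f}, max={lin_out.max():.3f}")

# Logistic regression outputs genuine probabilities in [0, 1]
log_out = LogisticRegression(max_iter=5000).fit(X, y).predict_proba(X)[:, 1]
print(f"Logistic regression probas: min={log_out.min():.3f}, max={log_out.max():.3f}")
```

The linear model's outputs spill outside [0, 1], so they cannot be read as probabilities, while the logistic model's outputs stay bounded by construction.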