Module 6: Cross-Validation & Regularization

What this module covers

This module tackles two central challenges in machine learning: underfitting and overfitting. Both are framed in terms of bias (error from a model that is too simple and misses important patterns) and variance (error from a model that is too complex and memorizes noise in the training data).
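To make the contrast concrete, here is a minimal sketch (not taken from the lecture notebooks) that fits polynomials of increasing degree to noisy sine data and compares training error with cross-validated error. The dataset, the degrees 1, 4, and 15, and the variable names are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy data: a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)

errors = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv = cross_validate(
        model, X, y, cv=5,
        scoring="neg_mean_squared_error",
        return_train_score=True,
    )
    # Scores are negated MSE, so flip the sign to get errors.
    train_mse = -cv["train_score"].mean()
    val_mse = -cv["test_score"].mean()
    errors[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  validation MSE={val_mse:.3f}")
```

A high training error suggests high bias (underfitting), while a validation error well above the training error suggests high variance (overfitting).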

You’ll learn how cross-validation (CV) gives a more reliable estimate of model performance by repeatedly splitting the data into training and validation folds, which helps reveal when a model is underfitting or overfitting. You’ll then learn regularization, which adds a penalty term to the cost function to discourage overly large coefficients:

  • Ridge (L2) shrinks all coefficients smoothly toward zero, but rarely to exactly zero
  • Lasso (L1) can set some coefficients exactly to zero, performing automatic feature selection
  • Elastic Net combines the L1 and L2 penalties
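
The three penalties can be compared side by side with a short sketch. It assumes scikit-learn and a synthetic regression dataset in which only 5 of 20 features carry signal. The `alpha` and `l1_ratio` values are arbitrary illustrative choices, not recommended settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption for illustration): 20 features, 5 informative.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated R^2 for each regularized model.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    # Refit on all data to inspect the learned coefficients.
    model.fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))
    results[name] = (scores.mean(), n_zero)
    print(f"{name:12s} mean R^2={scores.mean():.3f}  zeroed coefficients={n_zero}")
```

Counting the coefficients that are exactly zero shows the difference between the penalties: Ridge typically keeps every feature, while Lasso tends to drop the uninformative ones.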

Materials

This module has two lecture notebooks (covering distinct but related topics) and one practice exercise.

Slides: Regularization (pdf)

Lecture Part 1: Bias & Variance — understanding underfitting, overfitting, and how to diagnose them with learning curves.

Lecture Part 2: Cross-Validation & Regularization — k-fold CV, Ridge, Lasso, Elastic Net, and hyperparameter tuning.

Practice: CV, Lasso, Ridge & Pipelines — implement logistic regression with polynomial features inside a pipeline, explore cross-validation, add regularization, and perform grid search with GridSearchCV.
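
As a rough preview of the kind of workflow the practice exercise describes, the sketch below chains polynomial features, scaling, and logistic regression in a pipeline, then tunes it with GridSearchCV. The dataset (`make_moons`), the step names, and the parameter grid are assumptions made for illustration and may differ from the actual exercise.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical two-class toy dataset with a nonlinear boundary.
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    # LogisticRegression applies L2 regularization by default;
    # C is the inverse of the regularization strength.
    ("clf", LogisticRegression(max_iter=5000)),
])

# Step-name prefixes ("poly__", "clf__") route each value to the right step.
param_grid = {
    "poly__degree": [1, 2, 3],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

Putting the preprocessing steps inside the pipeline means they are refit on each training fold, so no information from the validation fold leaks into the model.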

Prerequisites

Module 5 — data preprocessing and scikit-learn pipelines.


Next module: Module 7: Artificial Neural Networks — the stepping stone to deep learning.