Module 6: Cross-Validation & Regularization

What this module covers

This module tackles two of the most important challenges in machine learning: underfitting and overfitting. Underfitting is a symptom of high bias (the model is too simple and misses important patterns), while overfitting is a symptom of high variance (the model is too complex and memorizes noise in the training data).
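As a rough illustration of the two failure modes, the sketch below fits polynomials of increasing degree to a small synthetic dataset and compares training and validation error. The noisy sine data, the chosen degrees, and the 50/50 split are assumptions made for this example, not part of the course materials.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): a noisy sine curve on [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.3, size=60)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Maps polynomial degree -> (training MSE, validation MSE).
results = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    results[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  validation MSE={val_mse:.3f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The validation error is what shows whether the extra flexibility helped. A model that is poor on both sets suggests high bias; a large gap between training and validation error suggests high variance. Exact numbers depend on the random data.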

You’ll learn how cross-validation (CV) gives a more reliable estimate of model performance by repeatedly splitting the data into training and validation folds, which helps detect when a model is underfitting or overfitting. You’ll then learn regularization, which adds a penalty term to the cost function to discourage overly large coefficients:

Ridge (L2) regularization shrinks all coefficients smoothly toward zero without setting them exactly to zero.
Lasso (L1) regularization can shrink some coefficients exactly to zero, performing automatic feature selection.
Elastic Net combines the L1 and L2 penalties.
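The three penalties can be compared with a minimal scikit-learn sketch that scores each model with 5-fold cross-validation and counts how many coefficients end up exactly zero. The synthetic dataset, the `alpha=1.0` penalty strength, and `l1_ratio=0.5` are illustrative assumptions; in practice these hyperparameters are tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): 20 features, only 5 of which carry signal.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=10.0, random_state=0
)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_r2 = {}   # mean cross-validated R^2 per model
n_zero = {}  # number of coefficients that are exactly zero per model

for name, estimator in models.items():
    # Scaling inside the pipeline keeps each fold's statistics separate
    # and puts all features on the same footing for the penalty.
    pipe = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="r2")
    cv_r2[name] = scores.mean()

    # Refit on all data to inspect the learned coefficients.
    pipe.fit(X, y)
    n_zero[name] = int(np.sum(pipe[-1].coef_ == 0))
    print(f"{name:12s} mean R^2={cv_r2[name]:.3f}  zero coefficients={n_zero[name]}")
```

Placing the scaler inside the pipeline matters because the penalty depends on coefficient size, which in turn depends on feature scale. Fitting the scaler per fold also avoids leaking validation data into training.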

Materials

This module has a slide deck, two lecture notebooks (covering distinct but related topics), and one practice exercise.

Slides: Regularization (pdf)

Lecture Part 1: Bias & Variance — understanding underfitting, overfitting, and how to diagnose them with learning curves.

Lecture Part 2: Cross-Validation & Regularization — k-fold CV, Ridge, Lasso, Elastic Net, and hyperparameter tuning.

Practice: CV, Lasso, Ridge & Pipelines — implement logistic regression with polynomial features inside a pipeline, explore cross-validation, add regularization, and perform grid search with GridSearchCV.
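To give a sense of the overall shape of such a workflow, here is a minimal sketch of a pipeline tuned with GridSearchCV. It is not the exercise solution: the `make_moons` dataset, the step names (`poly`, `scale`, `clf`), and the grid values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic two-class data (assumed) that is not linearly separable.
X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    # LogisticRegression applies an L2 penalty by default; C is the
    # inverse of the regularization strength (smaller C = stronger penalty).
    ("clf", LogisticRegression(max_iter=5000)),
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "poly__degree": [1, 2, 3],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

Because the whole pipeline is passed to GridSearchCV, the polynomial expansion and scaling are refit on each training fold, so model complexity (degree) and regularization strength (C) are tuned together without leakage.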

Prerequisites

Module 5 — data preprocessing and scikit-learn pipelines.

Next module: Module 7: Artificial Neural Networks — the stepping stone to deep learning.