Labs Overview

All 12 labs at a glance

The labs progress from “I just got a Unity account” to “I can run a reproducible job on a GPU node.” Each lab takes roughly 1 hour of focused work and produces a verifiable artifact (a config file, a Slurm log, a seff report, a screenshot, an environment.yml, etc.).

You should generally do the labs in order — later ones assume earlier setup is in place.

Module 1 — Connection & environment

Lab 01 — SSH config and multiplexing: Generate an SSH key, copy it to ASC and Unity, write a ~/.ssh/config with connection multiplexing, and demonstrate that ssh unity works with only one Duo prompt per ten-minute window. Deliverable: working SSH config + screenshot/log showing two consecutive logins with only one Duo tap.
Lab 02 — VS Code Remote-SSH + an AI coding assistant: Install VS Code, set up Remote-SSH timeouts, connect to Unity, and install one of three AI coding assistants — Claude Code, GitHub Copilot, or Gemini Code Assist — on the remote side. Have your chosen assistant write and run a small “hello, cluster” script. Deliverable: screenshot of VS Code connected to Unity with your chosen assistant’s extension active, plus the generated hello_cluster.py.
Lab 03 — Shell environment foundations: Inspect your .bash_profile and .bashrc, add umask 002, install 5 useful aliases, find your Unix groups and Slurm partitions. Deliverable: annotated .bashrc diff + output of groups and sacctmgr show association user=$USER.
Lab 04 — Persistent sessions: tmux + livenode: Practice basic tmux (new / detach / attach / kill), then install the livenode() function and demonstrate full reconnection after a disconnect. Deliverable: transcript showing a tmux session that survived a forced disconnect, with a Python process still running inside.
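For orientation, the "hello, cluster" script from Lab 02 might look something like the sketch below. This is only an illustration, not the script your assistant will produce: the function name cluster_greeting and the exact fields printed are assumptions. SLURM_JOB_ID is the environment variable Slurm sets inside a job, so it will be missing on a login node.

```python
"""hello_cluster.py: a hypothetical minimal "hello, cluster" script."""
import os
import platform
import socket


def cluster_greeting() -> str:
    """Build a one-line report of where this process is running."""
    host = socket.gethostname()
    user = os.environ.get("USER", "unknown")
    # Slurm sets SLURM_JOB_ID inside a job; on a login node it is unset.
    job_id = os.environ.get("SLURM_JOB_ID", "none")
    return (
        f"hello, cluster: user={user} host={host} "
        f"python={platform.python_version()} slurm_job={job_id}"
    )


if __name__ == "__main__":
    print(cluster_greeting())
```

Running it once on the login node and once inside an interactive job is a quick way to confirm that the hostname and the job ID change as expected.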

Module 2 — Python environments & notebooks

Lab 05 — Mamba and your first project env: Install Miniforge3, create a project env, install scientific packages, export environment.yml, register the env as a Jupyter kernel. Deliverable: environment.yml + output of mamba list.
Lab 06 — The pip-conda trap and how to recover: Deliberately break a mamba env by pip install-ing a package that re-installs NumPy from PyPI; diagnose what happened; recover by recreating the env from scratch; then redo the install correctly with pip install --no-deps. Deliverable: mini incident report + the fixed env.
Lab 07 — Jupyter on the cluster: OnDemand vs. SSH tunnel: Launch Jupyter via Unity OnDemand, switch to JupyterLab via the /lab URL trick. Then do the headless approach (sinteractive → jupyter notebook --no-browser → ssh -L ... mynode) and compare. Deliverable: screenshots of both approaches plus a notebook with a small plot rendered from cluster data.
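One way to diagnose the pip-conda trap in Lab 06 is to ask each installed package which tool installed it. The sketch below reads the INSTALLER file from the package's metadata using the standard-library importlib.metadata module. The helper name installer_of is hypothetical. Treat the result as a heuristic: pip records "pip", and conda-built packages typically record "conda", but the channel column of mamba list remains the more reliable check.

```python
from importlib import metadata
from typing import Optional


def installer_of(package: str) -> Optional[str]:
    """Return the tool recorded as having installed `package`, or None."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # Installers may record their name in an INSTALLER metadata file.
    text = dist.read_text("INSTALLER")
    return text.strip() if text else None


if __name__ == "__main__":
    # Package names here are examples; check whatever your env depends on.
    for name in ("numpy", "scipy", "pandas"):
        who = installer_of(name) or "unknown or not installed"
        print(f"{name}: {who}")
```

If numpy reports pip in an env where you installed it with mamba, something pulled in a PyPI build on top of the conda one, which is the situation the lab asks you to reproduce and then recover from.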

Module 3 — Slurm: submitting jobs

Lab 08 — Your first Slurm batch job: Write a minimal myjob.slurm that runs a Python script, submit, monitor with squeue, find output, run seff on the completed job. Deliverable: Slurm script + completion log + seff output.
Lab 09 — Right-sizing: the headline lab: Take a provided single-threaded scikit-learn job that needs ~30 GB of memory and run it on a node with 96 GB and 48 CPUs. Submit it with a deliberate over-allocation (the whole node), examine seff’s efficiency report, then resubmit right-sized. Deliverable: two side-by-side seff outputs + brief reflection on backfill scheduling.
Lab 10 — Measuring memory + the diagnostic wrapper: Instrument a provided Python script with psutil checkpoints, run it under /usr/bin/time -v, and build a personal batch-script template based on the handbook’s diagnostic wrapper. Deliverable: instrumented Python script + your diagnostic.slurm template + log from a real run.
Lab 11 — Job arrays for many independent tasks: Given 20 provided input files, write a Slurm array job (with a concurrent-tasks cap) that processes each in parallel. Verify all 20 outputs were produced. Deliverable: the array job script + the 20 outputs + seff for one task.
Lab 12 — GPU jobs (or alternative: hyperparameter sweep): Track A (GPU-access students): run a small PyTorch MNIST training on a Unity GPU node with the GPU-utilization logger; identify whether you have a dataloader bottleneck. Track B (no GPU access): run a CPU-side hyperparameter sweep as a job array. Deliverable: training log + nvidia-smi log (Track A) or sweep summary (Track B).
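The job arrays in Lab 11 and Lab 12 Track B both rely on each task working out which piece of work is its own. A minimal sketch of that mapping is shown below, assuming the inputs live in a directory called inputs/ (a hypothetical name) and the array is zero-indexed. SLURM_ARRAY_TASK_ID is the variable Slurm sets for each array task. A directive such as #SBATCH --array=0-19%5 would pair with it, where the %5 concurrency cap is only an example value.

```python
import os
from pathlib import Path
from typing import Sequence


def input_for_task(files: Sequence[str], task_id: int) -> str:
    """Map an array task index to one file, using a stable sorted order."""
    ordered = sorted(files)  # every task must see the same ordering
    if not 0 <= task_id < len(ordered):
        raise IndexError(f"task {task_id} has no input ({len(ordered)} files)")
    return ordered[task_id]


def main() -> None:
    input_dir = Path("inputs")  # hypothetical location of the provided files
    # Fall back to task 0 so the script can be tried outside Slurm.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    if not input_dir.is_dir():
        print(f"no {input_dir}/ directory here; nothing to do")
        return
    names = [p.name for p in input_dir.iterdir() if p.is_file()]
    if task_id >= len(names):
        print(f"task {task_id} has no input; only {len(names)} files found")
        return
    print(f"task {task_id} -> {input_dir / input_for_task(names, task_id)}")


if __name__ == "__main__":
    main()
```

Sorting is what makes the mapping reproducible. Note that the sort is lexicographic, so zero-padded file names (sample_01, sample_02, ...) keep the order intuitive.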

Capstone (weeks 13–15)

Capstone — Reproducible HPC repo: Pick a real problem from your research (or choose from suggested defaults) and produce a complete reproducible repo: environment.yml, Slurm scripts that apply the right-sizing methodology, a README explaining how to recreate the results, and the actual results (figures, model, dataset, etc.) from an end-to-end run on Unity or OSC. Deliverable: public GitHub repo URL.
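The right-sizing methodology the capstone relies on comes down to two ratios of the kind seff reports: CPU time used over CPU time allocated, and peak memory over memory requested. The sketch below works them through with the Lab 09 numbers (a single-threaded job needing ~30 GB on a 48-CPU, 96 GB node). The one-hour wall time and the 36 GB right-sized request are hypothetical values chosen for illustration, and the function names are not part of any Slurm tool.

```python
def cpu_efficiency(cpu_seconds: float, cores: int, wall_seconds: float) -> float:
    """CPU time actually used, as a fraction of cores * wall time allocated."""
    return cpu_seconds / (cores * wall_seconds)


def memory_efficiency(peak_gb: float, requested_gb: float) -> float:
    """Peak memory used, as a fraction of the memory requested."""
    return peak_gb / requested_gb


def report(label: str, cores: int, requested_gb: float,
           wall_seconds: float = 3600.0, peak_gb: float = 30.0) -> str:
    # A single-threaded job keeps one core busy, so CPU time ~= wall time.
    cpu = cpu_efficiency(wall_seconds, cores, wall_seconds)
    mem = memory_efficiency(peak_gb, requested_gb)
    return f"{label}: CPU {cpu:.2%}, memory {mem:.2%}"


if __name__ == "__main__":
    print(report("whole node", cores=48, requested_gb=96.0))
    # prints "whole node: CPU 2.08%, memory 31.25%"
    print(report("right-sized", cores=1, requested_gb=36.0))
    # prints "right-sized: CPU 100.00%, memory 83.33%"
```

The whole-node request leaves 47 cores and about two thirds of the memory idle for the length of the run, which is the waste the capstone's Slurm scripts are expected to avoid.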