Labs Overview
All 12 labs at a glance
The labs progress from “I just got a Unity account” to “I can run a reproducible job on a GPU node.” Each lab takes roughly 1 hour of focused work and produces a verifiable artifact (a config file, a Slurm log, a seff report, a screenshot, an environment.yml, etc.).
You should generally do the labs in order — later ones assume earlier setup is in place.
Module 1 — Connection & environment
- Lab 01 — SSH config and multiplexing
  Generate an SSH key, copy it to ASC and Unity, write a `~/.ssh/config` with connection multiplexing, and demonstrate that `ssh unity` works with only one Duo prompt per ten-minute window. Deliverable: working SSH config + screenshot/log showing two consecutive logins with only one Duo tap.
- Lab 02 — VS Code Remote-SSH + an AI coding assistant
  Install VS Code, set up Remote-SSH timeouts, connect to Unity, and install one of three AI coding assistants (Claude Code, GitHub Copilot, or Gemini Code Assist) on the remote side. Have your chosen assistant write and run a small “hello, cluster” script. Deliverable: screenshot of VS Code connected to Unity with your chosen assistant’s extension active, plus the generated `hello_cluster.py`.
- Lab 03 — Shell environment foundations
  Inspect your `.bash_profile` and `.bashrc`, add `umask 002`, add five useful aliases, and find your Unix groups and Slurm partitions. Deliverable: annotated `.bashrc` diff + output of `groups` and `sacctmgr show association user=$USER`.
- Lab 04 — Persistent sessions: tmux + livenode
  Practice basic `tmux` (new / detach / attach / kill), then install the `livenode()` function and demonstrate full reconnection after a disconnect. Deliverable: transcript showing a tmux session that survived a forced disconnect, with a Python process still running inside.
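As a rough idea of what the Lab 02 deliverable might look like, here is a minimal sketch of a “hello, cluster” script. The function name `cluster_greeting` and the exact fields printed are assumptions for illustration, not the lab's required output; the script your assistant generates will likely differ.

```python
"""hello_cluster.py: report where this script is running (illustrative sketch)."""
import os
import platform
import socket


def cluster_greeting() -> str:
    """Build a one-line summary of the host this process is running on."""
    host = socket.gethostname()
    # Slurm sets SLURM_JOB_ID inside a job; on a login node it is absent.
    job_id = os.environ.get("SLURM_JOB_ID", "none")
    # os.cpu_count() reports CPUs on the machine, not the Slurm allocation.
    cpus = os.cpu_count()
    return (
        f"hello, cluster: host={host} "
        f"python={platform.python_version()} "
        f"cpus_on_host={cpus} slurm_job={job_id}"
    )


if __name__ == "__main__":
    print(cluster_greeting())
```

Running it once on a login node and once inside an interactive job is a quick way to confirm that the remote side of VS Code is really executing code on the cluster.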
Module 2 — Python environments & notebooks
- Lab 05 — Mamba and your first project env
  Install Miniforge3, create a project env, install scientific packages, export `environment.yml`, and register the env as a Jupyter kernel. Deliverable: `environment.yml` + output of `mamba list`.
- Lab 06 — The pip-conda trap and how to recover
  Deliberately break a mamba env by running `pip install` on a package that reinstalls NumPy from PyPI, diagnose what happened, recover by recreating the env, then redo the install correctly with `pip install --no-deps`. Deliverable: mini incident report + the fixed env.
- Lab 07 — Jupyter on the cluster: OnDemand vs. SSH tunnel
  Launch Jupyter via Unity OnDemand, then switch to JupyterLab via the `/lab` URL trick. Then do the headless approach (`sinteractive` → `jupyter notebook --no-browser` → `ssh -L ... mynode`) and compare. Deliverable: screenshots of both approaches plus a notebook with a small plot rendered from cluster data.
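For the diagnosis step in Lab 06, one possible check is to ask Python which tool installed a package. The sketch below reads the `INSTALLER` record from the package metadata via the standard-library `importlib.metadata`. The helper name `installer_of` is hypothetical, and the interpretation is an assumption: pip writes `pip` into this file and conda-built packages typically record `conda`, but some packages ship no `INSTALLER` file at all, so treat the result as a hint rather than proof.

```python
from importlib import metadata
from typing import Optional


def installer_of(package: str) -> Optional[str]:
    """Return the recorded installer for a package, or None if not installed."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the INSTALLER file is missing.
    text = dist.read_text("INSTALLER")
    return text.strip() if text else "unknown"


if __name__ == "__main__":
    # Package names here are examples; check whichever ones your env depends on.
    for name in ("numpy", "scipy", "pandas"):
        print(f"{name}: {installer_of(name)}")
```

Comparing this output before and after the deliberate breakage, alongside `mamba list`, gives concrete evidence for the incident report.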
Module 3 — Slurm: submitting jobs
- Lab 08 — Your first Slurm batch job
  Write a minimal `myjob.slurm` that runs a Python script, submit it, monitor it with `squeue`, find the output, and run `seff` on the completed job. Deliverable: Slurm script + completion log + `seff` output.
- Lab 09 — Right-sizing: the headline lab
  Take a provided single-threaded scikit-learn job that needs ~30 GB on a 96 GB/48-CPU node. Submit it with a deliberate over-allocation (whole node), examine `seff`’s efficiency report, then resubmit right-sized. Deliverable: two side-by-side `seff` outputs + brief reflection on backfill scheduling.
- Lab 10 — Measuring memory + the diagnostic wrapper
  Instrument a provided Python script with `psutil` checkpoints, run it under `/usr/bin/time -v`, and build a personal batch-script template based on the handbook’s diagnostic wrapper. Deliverable: instrumented Python script + your `diagnostic.slurm` template + log from a real run.
- Lab 11 — Job arrays for many independent tasks
  Given 20 provided input files, write a Slurm array job (with a concurrent-tasks cap) that processes the files in parallel, one per array task. Verify all 20 outputs were produced. Deliverable: the array job script + the 20 outputs + `seff` for one task.
- Lab 12 — GPU jobs (or alternative: hyperparameter sweep)
  Track A (GPU-access students): run a small PyTorch MNIST training on a Unity GPU node with the GPU-utilization logger; identify whether you have a dataloader bottleneck. Track B (no GPU access): run a CPU-side hyperparameter sweep as a job array. Deliverable: training log + `nvidia-smi` log (Track A) or sweep summary (Track B).
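The job-array pattern in Lab 11 (and the Track B sweep in Lab 12) usually comes down to mapping the array index onto one unit of work. The sketch below shows one way the Python side could do that, assuming the batch script requests something like `#SBATCH --array=0-19%4`, where `%4` caps concurrent tasks. The cap of 4, the `inputs/` directory, the `.txt` extension, and the helper name `pick_input` are all hypothetical choices for illustration.

```python
import os
from pathlib import Path
from typing import Sequence


def pick_input(files: Sequence[Path], task_id: int) -> Path:
    """Return the input file assigned to a 0-based array task id."""
    # Sort so every task sees the same ordering regardless of listing order.
    ordered = sorted(files)
    if not 0 <= task_id < len(ordered):
        raise IndexError(f"task id {task_id} outside 0..{len(ordered) - 1}")
    return ordered[task_id]


if __name__ == "__main__":
    # Slurm exports SLURM_ARRAY_TASK_ID for each array task; default to 0
    # so the script can also be tested outside a job.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    inputs_dir = Path("inputs")  # hypothetical location of the provided files
    if inputs_dir.is_dir():
        files = list(inputs_dir.glob("*.txt"))
        if files:
            print(f"task {task_id} -> {pick_input(files, task_id)}")
```

Sorting before indexing matters: directory listings are not guaranteed to come back in the same order on every node, and an unstable order could send two tasks to the same file.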
Capstone (weeks 13–15)
- Capstone — Reproducible HPC repo
  Pick a real problem from your research (or choose from suggested defaults) and produce a complete reproducible repo: `environment.yml`, Slurm scripts using the right-sizing methodology, a README explaining how to recreate the results, and actual results (figures, model, dataset, etc.) running end-to-end on Unity or OSC. Deliverable: public GitHub repo URL.
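To make the right-sizing methodology concrete, the arithmetic behind the Lab 09 scenario can be sketched as follows. The node size (48 CPUs, 96 GB) and the ~30 GB peak come from the lab description. Everything else is an assumption made for illustration: that the single-threaded job keeps exactly one core busy, that efficiency is simply used resources divided by requested resources, and that the right-sized request adds roughly 20% memory headroom (36 GB). Real `seff` numbers will differ somewhat.

```python
def cpu_efficiency(busy_cores: float, allocated_cores: int) -> float:
    """Fraction of allocated CPU capacity in use, assuming a constant load."""
    return busy_cores / allocated_cores


def memory_efficiency(peak_gb: float, requested_gb: float) -> float:
    """Fraction of the requested memory that the job actually touched."""
    return peak_gb / requested_gb


PEAK_GB = 30.0    # approximate peak memory from the Lab 09 description
BUSY_CORES = 1.0  # assumption: single-threaded job keeps one core fully busy

requests = {
    "whole node": {"cpus": 48, "mem_gb": 96.0},
    # Hypothetical right-sized request: one core, 30 GB plus ~20% headroom.
    "right-sized": {"cpus": 1, "mem_gb": 36.0},
}

for label, req in requests.items():
    cpu = cpu_efficiency(BUSY_CORES, req["cpus"])
    mem = memory_efficiency(PEAK_GB, req["mem_gb"])
    print(f"{label:>11}: CPU {cpu:.1%}, memory {mem:.1%}")
```

Under these assumptions the whole-node request uses about 2% of its CPUs and under a third of its memory, while the right-sized request uses its one core fully and most of its memory. The other 47 cores are what the scheduler could have handed to someone else's job.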