BuckAI HPC Practicum — Syllabus
A 1-credit-hour, self-guided, asynchronous practicum in real-world HPC, taught entirely online through this site and the companion BuckAI HPC Handbook.
1. Course description
A growing number of grad students in the natural sciences need to run code on a shared compute cluster at some point in their PhD, typically once their work involves deep learning, large datasets (satellite imagery, climate model output, video, genomics), parameter sweeps, or simulations that outgrow a laptop. If you’re not sure whether that includes you, read Is this course for you? first: many students finish a PhD without ever touching a cluster, and that’s also a valid path.
For students who do end up needing HPC, the journey from “I have an account” to “I can submit a reproducible, right-sized job that uses cluster resources responsibly” is rarely covered in any class. This practicum fills that gap.
Over a semester, you will work through 12 weekly hands-on labs plus a 3-week capstone project, building up from your first SSH connection to a complete reproducible research repo running on Unity or OSC. Each lab is structured to take roughly one hour of focused work (though expect 30–90 minutes depending on prior background and how hard you get bitten by Slurm’s queue gods).
2. Learning outcomes
By the end of this practicum you will be able to:
- Connect to OSU’s Unity HPC cluster securely from your laptop via SSH, with multiplexing so Duo only prompts once.
- Develop on the cluster using VS Code Remote-SSH plus an AI coding assistant of your choice (Claude Code, GitHub Copilot, or Gemini Code Assist).
- Configure your shell environment (`.bashrc`, aliases, `PATH`, `umask 002` for group collaboration) for productive long-term use.
- Maintain persistent sessions that survive disconnects using `tmux` and the `livenode` pattern.
- Build and share reproducible Python environments with mamba, understanding the failure modes of mixing pip with conda.
- Run Jupyter notebooks on Unity GPUs/CPUs, viewed in your laptop’s browser via either OnDemand or an SSH tunnel.
- Submit, monitor, and debug Slurm batch jobs, including array jobs and GPU jobs.
- Right-size your resource requests (memory, CPUs, walltime) based on real measurements — not guesses — so your jobs run sooner and don’t block other users.
- Diagnose resource problems using `seff`, `nvidia-smi`, `/usr/bin/time -v`, and `psutil` instrumentation.
- Package a small research project as a reproducible repo with `environment.yml`, `conda-lock.yml`, Slurm scripts, and a README.
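To make the right-sizing outcome concrete, here is a minimal Python sketch of turning measurements into a resource request. It is an illustration only, not code from the labs or the handbook: the function names `suggest_request` and `sbatch_lines` are hypothetical, and the 20% headroom is an assumed rule of thumb rather than a course recommendation. The inputs stand in for numbers you would read from a tool such as `seff` or `/usr/bin/time -v` after a test run.

```python
import math


def suggest_request(peak_rss_mb, elapsed_s, headroom_pct=20):
    """Pad measured peak memory and runtime by an assumed headroom.

    Returns (memory in whole GB, walltime as 'HH:MM:SS').
    The 20% default is an illustrative assumption, not a course rule.
    """
    factor = 100 + headroom_pct
    # Round memory up to a whole gigabyte, with a floor of 1 GB.
    mem_gb = max(1, math.ceil(peak_rss_mb * factor / 100 / 1024))
    # Round the padded runtime up to a whole second.
    padded_s = math.ceil(elapsed_s * factor / 100)
    hours, rem = divmod(padded_s, 3600)
    minutes, seconds = divmod(rem, 60)
    return mem_gb, f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def sbatch_lines(mem_gb, walltime, cpus=1):
    """Format the request as #SBATCH directives for a job script."""
    return [
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={walltime}",
    ]


if __name__ == "__main__":
    # Hypothetical measurements: 3100 MB peak memory, 25 minutes elapsed.
    mem_gb, walltime = suggest_request(peak_rss_mb=3100, elapsed_s=1500)
    print("\n".join(sbatch_lines(mem_gb, walltime, cpus=2)))
```

With the hypothetical measurements above, the sketch suggests 4 GB and a 30-minute walltime. The point is that the request is derived from a measured run instead of a guess.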
3. Audience and prerequisites
- Primary audience: Grad students in Earth Sciences and related departments in the OSU College of Arts and Sciences.
- Welcome: Postdocs, advanced undergrads, anyone else at OSU who wants to learn HPC the practical way.
- Prerequisites:
  - Comfort with Python (loops, functions, virtual environments).
  - Basic Unix terminal skills (`cd`, `ls`, `cat`, editing files with `nano` or `vim`).
  - A laptop (macOS preferred for the SSH parts of the handbook, but Linux works analogously; Windows users should use WSL2).
  - An OSU email account.
- NOT required: Prior HPC experience. Cluster knowledge. C/C++ or Fortran. Linux sysadmin background.
4. Structure: 15 weeks at a glance
| Week | Topic | Lab | Reading |
|---|---|---|---|
| 1 | SSH and connection multiplexing | Lab 01 | Handbook: SSH Setup, SSH Keys |
| 2 | VS Code Remote-SSH + an AI coding assistant | Lab 02 | Handbook: VS Code Remote-SSH |
| 3 | Shell environment foundations | Lab 03 | Handbook: Shell Environment §1–4 |
| 4 | Persistent sessions: tmux + livenode | Lab 04 | Handbook: Persistent Sessions |
| 5 | Mamba and your first project env | Lab 05 | Handbook: Python Environments §1–5 |
| 6 | The pip-conda trap and how to recover | Lab 06 | Handbook: Python Environments §6 |
| 7 | Jupyter on the cluster: OnDemand vs. SSH tunnel | Lab 07 | Handbook: Jupyter & TensorBoard |
| 8 | Your first Slurm batch job | Lab 08 | Handbook: Slurm Basics |
| 9 | Right-sizing — the headline lab | Lab 09 | Handbook: Slurm Best Practices §1–4 |
| 10 | Measuring memory + the diagnostic wrapper | Lab 10 | Handbook: Slurm Best Practices §5–9 |
| 11 | Job arrays — many independent tasks | Lab 11 | Handbook: CPU Templates §4 |
| 12 | GPU jobs (or alternative: hyperparam sweep) | Lab 12 | Handbook: GPU Templates |
| 13 | Capstone: scope, design, set up repo | Capstone | — |
| 14 | Capstone: implement, run, iterate | Capstone | — |
| 15 | Capstone: finalize, document, share | Capstone | — |
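As a rough preview of the idea behind weeks 11 and 12 (job arrays and hyperparameter sweeps), the sketch below shows one common way an array task can pick its own parameters. It is not taken from the labs: the function `params_for_task` and the example grid are hypothetical. The only Slurm-specific piece is the `SLURM_ARRAY_TASK_ID` environment variable, which Slurm sets for each task of an array job.

```python
import itertools
import os


def params_for_task(task_id, grid):
    """Map an array task index to one combination from a parameter grid.

    Keys are sorted so every task sees the combinations in the same order.
    """
    keys = sorted(grid)
    combos = list(itertools.product(*(grid[k] for k in keys)))
    if not 0 <= task_id < len(combos):
        raise IndexError(f"task_id {task_id} outside 0..{len(combos) - 1}")
    return dict(zip(keys, combos[task_id]))


if __name__ == "__main__":
    # Hypothetical sweep: 2 learning rates x 3 batch sizes = 6 tasks,
    # which would correspond to an array with indices 0 through 5.
    grid = {"learning_rate": [1e-3, 1e-4], "batch_size": [32, 64, 128]}
    # Fall back to task 0 when running outside Slurm (e.g. on a laptop).
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(params_for_task(task_id, grid))
```

Because each task derives its parameters from its index alone, the tasks stay independent and any single one can be rerun without the others.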
5. How each lab is structured
Every lab follows the same fixed template so you always know where to find what:
- Reading — which handbook chapter(s) to read first
- Learning objectives — 3–4 concrete things you’ll be able to do after
- Setup / Prerequisites — what should already be working from earlier labs
- Tasks — numbered, concrete steps
- Deliverables — exact artifacts to produce (file contents, screenshots, command outputs)
- Self-check — how to know you got it right
- Common issues — troubleshooting pointers for the predictable failure modes
- Time estimate — what to expect; budget more if HPC is your first rodeo
- Extensions — optional deeper challenges for the curious or already-experienced
6. Expectations and self-assessment
This is a self-guided practicum. There is no instructor watching over your shoulder, and the labs are designed so you can verify your own success against the deliverables.
A good rhythm:
- Pick a fixed time slot each week — Tuesday after lab meeting, Saturday morning, whenever works for you. Consistency matters more than total hours.
- Read the assigned handbook chapter first, then do the lab. The labs assume you’ve done the reading.
- When something doesn’t work (and it will), read the error message, check the “Common issues” section of the lab, search the handbook, then ask a peer or your PI/advisor. Cluster admins are also reachable for genuinely cluster-side problems.
- Keep your work in a personal git repo from week 1. By week 15 you’ll have a portfolio of your HPC journey.
7. Getting accounts
You’ll need at least one of these to do the labs:
- OSU Unity (ASC): Most labs target Unity. Request an account through your department or asctech.osu.edu. Accounts are free for OSU students and take 1–3 business days to set up.
- OSC (Pitzer / Cardinal / Owens): Optional but useful for cross-cluster comparison. Free academic accounts via my.osc.edu.
You also need a working laptop with:
- macOS (most handbook examples) or Linux (analogous); Windows users: use WSL2 for the SSH parts.
- VS Code (free, code.visualstudio.com)
- An AI coding assistant account. You’ll pick one of these in Lab 2 and use it for the rest of the course:
  - Claude Code (claude.ai): free tier works
  - GitHub Copilot: free for verified students through GitHub Education
  - Gemini Code Assist: through your OSU Google Workspace / `buckeyemail.osu.edu` account
8. Academic integrity and AI use
This course explicitly expects you to use AI coding assistants — installing one (your choice among Claude Code, GitHub Copilot, or Gemini Code Assist) is a deliverable in Lab 2. Using AI to debug, explain errors, suggest commands, or write Slurm scripts is encouraged.
What we ask of you in return:
- Understand what the AI writes before submitting it as your work. If the AI gives you a `#SBATCH --gpus=4` line, you should be able to explain why 4 GPUs versus 1.
- Verify that AI-generated code actually does what you think. Run it. Read the output. Don’t trust blindly.
- Cite when an AI suggested a non-obvious solution — a one-line comment in your code or a note in your README is enough.
The capstone project must be your own scientific question and your own design decisions, even if AI helped you implement them.
9. Acknowledgments
Built at the BuckAI Observatory at OSU. Reading material is from the BuckAI HPC Handbook. Course content is open-source; feedback and pull requests welcome.