BuckAI HPC Handbook
What this handbook is
A practical, opinionated reference for using OSU's Unity HPC cluster, written for BuckAI Observatory students, postdocs, and collaborators. It takes you from "I just got my HPC account and the SSH prompt is rejecting me" to "I'm training models on GPU nodes from VS Code on my laptop, with my AI coding assistant (Claude Code / Copilot / Gemini) helping me debug, while sharing a reproducible mamba environment with my labmates."
It's opinionated about workflows that work well in practice, and it warns about the ones that look fine but break in subtle ways.
Browse by topic
SSH
Connect securely from your laptop to Unity through the ASC jumphost, with one Duo prompt per ten-minute window instead of one per connection.
- SSH Setup: config, multiplexing, keepalives
- SSH Keys: concepts and best practices
- VS Code Remote-SSH + AI Coding Assistants: the recommended dev environment, with options for Claude Code, GitHub Copilot, and Gemini Code Assist
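As a rough illustration of the multiplexing idea above, the sketch below writes an example OpenSSH client config to a scratch file instead of touching your real ~/.ssh/config. The host aliases (asc-jump, unity), both example.edu hostnames, and the scratch file name are assumptions made up for illustration; yourname.## and buckai_key are the handbook's own placeholders. ControlPersist 10m keeps the shared master connection alive for ten minutes after the last session closes, which is what lets later connections skip a fresh Duo prompt.

```shell
# Sketch only: writes an example config to a scratch file for review.
# Aliases and hostnames are assumptions; take the real ones from the SSH Setup page.
sketch="${TMPDIR:-/tmp}/ssh_config.sketch"

cat > "$sketch" <<'EOF'
# Jumphost (placeholder hostname, not confirmed by this page)
Host asc-jump
    HostName jump.example.edu
    User yourname.##
    IdentityFile ~/.ssh/buckai_key

# Unity login node, reached through the jumphost
Host unity
    HostName unity.example.edu
    User yourname.##
    IdentityFile ~/.ssh/buckai_key
    ProxyJump asc-jump

# Shared settings: reuse one authenticated connection and send keepalives
Host asc-jump unity
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h-%p
    ControlPersist 10m
    ServerAliveInterval 60
    ServerAliveCountMax 3
EOF

echo "Wrote $sketch"
```

After replacing the placeholders, the relevant stanzas could be merged into ~/.ssh/config by hand, at which point ssh unity would open (or reuse) the shared connection.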
HPC fundamentals
The shell, environments, and patterns that turn a raw cluster account into a productive setup.
- Shell Environment: .bashrc, aliases, PATH, umask, groups
- Persistent Sessions: tmux, nohup, and the livenode pattern
- Jupyter & TensorBoard: OnDemand and SSH-tunnel workflows
- Python Environments: mamba, conda, pip, and the pip trap
Slurm
Submit jobs, request the right resources, and avoid blocking the cluster for others.
- Slurm Basics: sbatch, squeue, #SBATCH directives
- Best Practices: measuring memory and CPU needs; the diagnostics wrapper
- CPU Templates: drop-in scripts for the common cases
- GPU Templates: single-GPU, multi-GPU, hyperparameter sweeps
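To make the Slurm Basics entry a little more concrete, here is a minimal sketch of a batch script. The job name, the resource numbers, and the output file pattern are illustrative assumptions rather than site defaults, and batch stands in for whatever partition your group uses. To bash the #SBATCH lines are ordinary comments, so the script also runs outside Slurm, where the SLURM_* variables fall back to the defaults shown.

```shell
#!/bin/bash
# Minimal batch-script sketch. Values are illustrative assumptions, not site defaults.
#SBATCH --job-name=hello-unity
# Replace "batch" with your group's partition.
#SBATCH --partition=batch
# Walltime limit, HH:MM:SS.
#SBATCH --time=00:10:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
# %x expands to the job name, %j to the job ID.
#SBATCH --output=%x-%j.out

# Slurm sets these inside a job; the fallbacks let the script run elsewhere too.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Report where the job landed and what it was given.
summary="job=${job_id} host=$(uname -n) cpus=${cpus}"
echo "$summary"
```

Saved under a name such as hello.sh (a hypothetical file name), it would be submitted with sbatch hello.sh and watched with squeue -u $USER.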
Where to start
If you've just been given an OSU HPC account and want a sensible reading order, work through these in sequence:
- SSH Setup: write a working ~/.ssh/config with connection multiplexing so you only Duo-tap once.
- SSH Keys: the concepts behind why your setup is secure, and how to manage keys long-term.
- VS Code Remote-SSH + AI Coding Assistants: set up the editor and AI assistant (Claude Code, Copilot, or Gemini) that you'll spend most of your time in.
- Shell Environment: make .bashrc work for you (aliases, PATH, umask 002 for group collaboration, your Unix groups and Slurm partitions).
- Persistent Sessions: keep work alive across disconnects with tmux and the livenode function.
- Python Environments: install mamba, create per-project envs, and avoid the nightmare of mixing pip with conda the wrong way.
- Jupyter & TensorBoard: run interactive notebooks and live training dashboards in your laptop's browser while compute happens on Unity.
- Slurm Basics: how to submit unattended jobs with sbatch.
- Slurm Best Practices: the most important Slurm skill, right-sizing memory, CPU, and walltime requests so your jobs run sooner and don't block others.
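The umask 002 setting mentioned in the Shell Environment step is easiest to understand by comparison. The sketch below creates one file under the common default mask 022 and one under 002, both inside a throwaway directory. The directory and file names are made up for the demo, and stat -c assumes GNU coreutils, as found on typical Linux clusters.

```shell
# Demo in a throwaway directory; nothing here touches your real files.
demo_dir="$(mktemp -d)"

# Subshells keep each umask change from leaking into the current shell.
( umask 022; touch "$demo_dir/private_default.txt" )
( umask 002; touch "$demo_dir/group_shared.txt" )

# Read back the octal permission bits (GNU stat).
mode_022="$(stat -c '%a' "$demo_dir/private_default.txt")"
mode_002="$(stat -c '%a' "$demo_dir/group_shared.txt")"
echo "umask 022 -> $mode_022"
echo "umask 002 -> $mode_002"

rm -r "$demo_dir"
```

With 022 the file comes out as 644 (group can read only); with 002 it comes out as 664 (group can read and write), which is what lets labmates in the same Unix group edit shared files.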
Returning to look something up? Use the search box at the top of the sidebar; it indexes every page.
Conventions used in this handbook
- Copy-paste code: Code blocks are ready to paste, with placeholders you replace:
  - yourname.## → your OSU username (e.g. smith.123)
  - <group> → your Slurm partition name (often batch) or your Unix group name
  - <username> → your OSU username again, as it appears in paths
  - mynode → a real compute-node hostname (on Unity these follow the pattern uXXX: u101, u250, u500, etc.)
  - buckai_key → the name we use for your SSH private key
- Emoji legend:
  - ✅: a recommended practice or thing to do
  - ❌: a problem symptom or common mistake
  - ✔: a verification step ("you should see this")
  - ⚠: a caution or non-obvious gotcha
About
This handbook is maintained by the BuckAI Observatory at Ohio State University.
- Source: github.com/buckai-observatory/buckai-hpc-handbook
- Corrections, additions, and pull requests welcome: open an issue or PR.
- Built with Quarto.