Lab 10 — Measuring memory + the diagnostic wrapper
Goal
Stop guessing how much memory and CPU your code needs — measure it. By the end of this lab you’ll have:
- A Python script instrumented with psutil so it self-reports memory at every phase
- A Slurm script wrapped in /usr/bin/time -v that reports peak resident memory
- A personal diagnostic.slurm template you’ll reuse for every batch job for the rest of your research career
This lab puts a measurement loop around what you did in Lab 09 — instead of guessing-then-correcting, you’ll measure once and right-size on the first try.
Reading
- Handbook: Slurm Best Practices §5–9: the five measurement techniques, the diagnostic wrapper template, and the seff post-mortem loop.
Budget ~25 minutes for the reading.
Learning objectives
- Find the “Maximum resident set size” line in /usr/bin/time -v output and convert it to GB.
- Instrument Python code with psutil to log memory usage at specific phases.
- Build a reusable diagnostic.slurm template that includes vital-signs logging, thread-count exports, and a /usr/bin/time wrapper.
- Use the resulting log to right-size future submissions on the first try.
Setup / prerequisites
- Labs 01–09 complete. In particular, the eslab env has psutil installed (it was in the Lab 5 install list).
Tasks
1. Set up the lab directory (3 min)
cd ~/hpc_practicum
mkdir -p lab10 lab10/logs ~/templates
cd lab10
The ~/templates/ directory will hold your reusable diagnostic.slurm, a template you’ll copy for new projects from now on.
2. Write a memory-instrumented Python script (15 min)
Save as mem_intensive.py:
"""
mem_intensive.py — a script with three phases of distinctly different memory footprints.
Phase 1: load data (medium memory)
Phase 2: do something memory-hungry (peak memory)
Phase 3: write output (back to small memory)
"""
import gc
import os
import time
import psutil
import numpy as np


def memreport(label, t0):
    """Print current RSS for the running process."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    elapsed = time.time() - t0
    print(f"[{label:25s}] RSS: {rss_gb:6.2f} GB at t={elapsed:6.1f}s", flush=True)


def main():
    t0 = time.time()
    print(f"Started at {time.strftime('%Y-%m-%d %H:%M:%S')}")
    memreport("startup", t0)
    # ─── Phase 1: load some "data" ────────────────────────────
    print("\nPhase 1: allocating a few moderately large arrays...")
    # Each result is ~640 MB. randn builds a float64 array first (~1.28 GB),
    # which is freed once astype has copied it to float32.
    A = np.random.randn(5_000_000, 32).astype(np.float32)
    B = np.random.randn(5_000_000, 32).astype(np.float32)
    memreport("after Phase 1 load", t0)
    # ─── Phase 2: deliberately memory-hungry ──────────────────
    print("\nPhase 2: computing a large pairwise distance-like matrix...")
    # Fill a (chunk, chunk) float32 result matrix (~100 MB) one row at a time
    chunk = 5000
    distances = np.empty((chunk, chunk), dtype=np.float32)
    for i in range(chunk):
        diff = A[i:i+1, :] - B[:chunk, :]  # (chunk, 32)
        distances[i, :] = np.linalg.norm(diff, axis=1)
        if i == chunk // 4:
            memreport("Phase 2 mid", t0)
    memreport("after Phase 2 compute", t0)
    # ─── Phase 3: write output (release big arrays) ───────────
    print("\nPhase 3: writing output, releasing memory...")
    np.save("distances.npy", distances)
    del A, B, distances
    gc.collect()
    memreport("after Phase 3 cleanup", t0)
    print(f"\nFinished at {time.strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Total walltime: {time.time()-t0:.1f}s")


if __name__ == "__main__":
    main()
This script’s memory profile is intentionally asymmetric: the highest checkpoint readings come in Phase 2. Most real research scripts have a similar profile, with a peak somewhere in the middle, not at the end. That’s why a single memory reading taken after the script finishes doesn’t tell you the peak.
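The size comments in the script follow from "number of elements times bytes per element". As a rough sketch of that arithmetic (array_bytes and ITEMSIZE are illustrative names, not part of the lab script): np.random.randn returns float64, so each array briefly exists at twice its final size before astype copies it down to float32.

```python
import math

# Bytes per element for the two dtypes that appear in mem_intensive.py.
ITEMSIZE = {"float32": 4, "float64": 8}


def array_bytes(shape, dtype):
    """Size of a dense array: number of elements times bytes per element."""
    return math.prod(shape) * ITEMSIZE[dtype]


shape = (5_000_000, 32)
final_f32 = array_bytes(shape, "float32")       # what stays alive as A or B
transient_f64 = array_bytes(shape, "float64")   # randn's result, freed after astype
distances = array_bytes((5000, 5000), "float32")

print(f"A or B (float32):    {final_f32 / 1e9:.2f} GB")
print(f"float64 transient:   {transient_f64 / 1e9:.2f} GB")
print(f"distances (float32): {distances / 1e9:.2f} GB")
# While B is being built, A plus both copies of B are alive at once.
print(f"transient during B:  {(2 * final_f32 + transient_f64) / 1e9:.2f} GB")
```

The last figure is an estimate of array data only; the interpreter and libraries add their own baseline on top.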
3. Run it interactively first (10 min)
Use an sinteractive (or livenode) session for this — you don’t want to be debugging on a login node, but you also don’t need a batch job yet.
# In a livenode or sinteractive session:
sinteractive -p batch --cpus-per-task=2 --mem=8G --time=01:00:00
mamba activate eslab
cd ~/hpc_practicum/lab10
python mem_intensive.py
Read the output. Note:
- Approximate peak RSS (it’s the highest of the [after Phase 2 compute] or [Phase 2 mid] readouts)
- Total walltime
Now run it again wrapped in /usr/bin/time -v:
/usr/bin/time -v python mem_intensive.py 2> time_output.txt
head -25 time_output.txt
Scroll to find the line that says “Maximum resident set size (kbytes):”. Convert to GB:
Maximum resident set size (kbytes): 1234567
→ 1234567 / 1024 / 1024 ≈ 1.18 GB
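This should be at least as large as the highest RSS you saw from the psutil checkpoints. Expect it to be noticeably higher here: np.random.randn builds each array in float64 before astype copies it to float32, and that short-lived float64 array falls between checkpoints.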
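If you would rather not do the conversion by hand, the lookup and arithmetic can be scripted. The following is a minimal sketch, assuming the "kbytes" that /usr/bin/time -v reports are units of 1024 bytes (as on Linux); the function names are illustrative.

```python
import re


def peak_rss_kib(time_v_text):
    """Return the 'Maximum resident set size' value from /usr/bin/time -v output."""
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", time_v_text)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1))


def kib_to_gib(kib):
    """Binary gigabytes: the /1024/1024 conversion used in this lab."""
    return kib / 1024 / 1024


def kib_to_gb(kib):
    """Decimal gigabytes: the same scale as the /1e9 used by memreport."""
    return kib * 1024 / 1e9


sample = "\tMaximum resident set size (kbytes): 1234567\n"
kib = peak_rss_kib(sample)
print(f"{kib} kbytes = {kib_to_gib(kib):.2f} GiB = {kib_to_gb(kib):.2f} GB")
```

To use it on a real run, read the file first, for example peak_rss_kib(open("time_output.txt").read()). The two conversions differ by about 7%, which matters little once you add headroom.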
4. Build your diagnostic.slurm template (15 min)
Save as ~/templates/diagnostic.slurm (a reusable template you’ll copy for future projects). This is your “every batch script” wrapper:
#!/bin/bash
#SBATCH --job-name=CHANGE_ME
#SBATCH --partition=batch
#SBATCH --time=CHANGE_ME # e.g. 02:00:00
#SBATCH --cpus-per-task=CHANGE_ME # e.g. 1, 4, 8
#SBATCH --mem=CHANGE_ME # e.g. 8G, 32G — measured + ~30% headroom
#SBATCH --output=logs/%x-%j.out
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=CHANGE_ME@osu.edu
set -euo pipefail
# ───────────────────────────────────────────────────────────
# Tell numerical libraries how many threads they may use.
# Without this, NumPy/BLAS/MKL grab every CPU on the node.
# ───────────────────────────────────────────────────────────
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export OPENBLAS_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export NUMEXPR_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
# ───────────────────────────────────────────────────────────
# Vital signs — recorded at the top of every job's log.
# ───────────────────────────────────────────────────────────
echo "==================== JOB INFO ===================="
echo "Job ID: $SLURM_JOB_ID"
echo "Job name: $SLURM_JOB_NAME"
echo "Partition: $SLURM_JOB_PARTITION"
echo "Node: $(hostname)"
echo "CPUs: $SLURM_CPUS_PER_TASK"
echo "Memory (MB): ${SLURM_MEM_PER_NODE:-${SLURM_MEM_PER_CPU:-unset}}"
echo "GPUs: ${SLURM_GPUS:-${SLURM_JOB_GPUS:-none}}"
echo "Working dir: $(pwd)"
echo "Started: $(date)"
echo "=================================================="
# ───────────────────────────────────────────────────────────
# Environment activation
# ───────────────────────────────────────────────────────────
source ~/miniforge3/etc/profile.d/conda.sh
mamba activate eslab
echo "Python: $(which python)"
python -c "import sys; print(f'Python version: {sys.version.split()[0]}')"
# ───────────────────────────────────────────────────────────
# (Optional) GPU usage logger in the background
# ───────────────────────────────────────────────────────────
if command -v nvidia-smi &> /dev/null; then
    (
        while sleep 30; do
            echo "--- $(date '+%H:%M:%S') ---"
            nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader
        done
    ) > "logs/gpu-${SLURM_JOB_ID}.log" 2>&1 &
    GPU_LOGGER_PID=$!
fi
# ───────────────────────────────────────────────────────────
# ACTUAL WORK — replace this line for each new project.
# /usr/bin/time -v captures the peak RSS for right-sizing later.
# ───────────────────────────────────────────────────────────
echo "===================== WORK ======================="
/usr/bin/time -v python mem_intensive.py
echo "=================================================="
# Clean up background logger
if [ -n "${GPU_LOGGER_PID:-}" ]; then
    kill "$GPU_LOGGER_PID" 2>/dev/null || true
fi
echo "Finished: $(date)"
echo ""
echo "Run 'seff $SLURM_JOB_ID' after the job's epilog completes"
echo "for a one-page efficiency report (CPU% and Memory%)."
This is your drop-in template. For each new project, copy it to that project’s directory and replace the five CHANGE_ME placeholders and the python line at the bottom.
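The thread-count exports in the template only help if they actually reach your Python process. A small check you could drop at the top of a script is sketched below; thread_budget and thread_report are hypothetical helper names, and the variable list simply mirrors the four exports in the template.

```python
import os

# The four variables exported by the diagnostic.slurm template.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def thread_budget(env=os.environ):
    """CPUs Slurm granted to this task, defaulting to 1 outside a job."""
    return int(env.get("SLURM_CPUS_PER_TASK", "1"))


def thread_report(env=os.environ):
    """Map each thread-count variable to its value, or 'unset'."""
    return {name: env.get(name, "unset") for name in THREAD_VARS}


if __name__ == "__main__":
    print(f"CPU budget: {thread_budget()}")
    for name, value in thread_report().items():
        print(f"{name}={value}")
```

Inside a job launched from the template, every variable should print the same number as the CPU budget. Seeing "unset" suggests the script was run outside the wrapper.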
5. Use the template for a real submission (10 min)
cp ~/templates/diagnostic.slurm ~/hpc_practicum/lab10/mem.slurm
cd ~/hpc_practicum/lab10
Edit mem.slurm:
- --job-name=lab10_mem
- --time=00:30:00 (more than enough for this script)
- --cpus-per-task=1 (the script is mostly single-threaded NumPy)
- --mem=4G (the peak you measured interactively, plus a safety margin)
- --mail-user=yourname@osu.edu
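The template’s comment suggests "measured + ~30% headroom" for --mem. One way to turn a measured peak into a request is sketched here, assuming you round up to whole gigabytes; suggest_mem_gb is an illustrative name, and the 30% default is just the template’s rule of thumb.

```python
import math


def suggest_mem_gb(peak_kib, headroom=0.30):
    """Round peak RSS plus headroom up to a whole number of GiB for --mem."""
    peak_gib = peak_kib / 1024 / 1024
    return max(1, math.ceil(peak_gib * (1 + headroom)))


peak_kib = 1234567  # illustrative value from the conversion example, not a real measurement
print(f"--mem={suggest_mem_gb(peak_kib)}G")
```

Rounding up to a whole gigabyte already adds some slack of its own, so a larger margin such as the 4G used in this lab is a deliberate choice to be conservative on a first run.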
Submit:
sbatch mem.slurm
When it finishes, examine the log:
cat logs/lab10_mem-<jobid>.out | head -40 # vital-signs block
grep -A 30 "Command being timed" logs/lab10_mem-<jobid>.out | head -35
seff <jobid>
In the /usr/bin/time -v output, find:
- Maximum resident set size (kbytes): convert to GB
- Elapsed (wall clock) time: actual runtime
In seff:
- Memory Efficiency: should be reasonable (>25%, ideally >50%) if you chose --mem=4G correctly
6. (Optional) Add real-time GPU tracking (5 min — skip if you don’t have GPU access)
The template already includes the GPU-usage logger (Task 4). If you run a GPU job, you’ll find logs/gpu-<jobid>.log with nvidia-smi samples every 30 seconds. You’ll use this in Lab 12.
Deliverables
Save to lab10/ in your personal repo:
- lab10/mem_intensive.py: the instrumented script from Task 2.
- lab10/diagnostic.slurm: a copy of your template (the one in ~/templates/diagnostic.slurm). Redact any real email if you have --mail-user.
- lab10/mem.slurm: the project-specific version you used in Task 5.
- lab10/job_log.txt: the full log from your real Slurm submission. Should include:
  - Vital-signs block at the top
  - All five psutil checkpoints
  - The /usr/bin/time -v output block (with peak RSS in kbytes)
- lab10/right_sized.md: a short writeup:
  - What was the peak RSS reported by /usr/bin/time -v? (Convert to GB.)
  - What was Memory Efficiency from seff?
  - If you ran this 100 times in production, what --mem would you settle on, and why?
Common issues
❌ /usr/bin/time -v not found — only the bash builtin
On Unity it should be there as /usr/bin/time. If not, install GNU time into your environment:
mamba install -n eslab time
Then call it by its full path inside the environment (for example $CONDA_PREFIX/bin/time -v). Don’t fall back to the bash builtin time, which doesn’t have -v.
❌ psutil not installed
mamba activate eslab
mamba install psutil
If you skipped psutil in Lab 5, add it now.
❌ The /usr/bin/time -v output goes to stderr and mixes with my Python output
That’s normal — /usr/bin/time -v writes to stderr. Slurm captures both stdout and stderr into your --output= file, so it all ends up in the log. To split them in a more advanced setup, use --output= and --error= to separate files.
❌ psutil peak RSS is much smaller than /usr/bin/time -v peak RSS
That can happen if your code allocates and quickly releases memory between the psutil checkpoints. /usr/bin/time -v captures the all-time peak; psutil only captures what was alive when you called it. To improve, add more checkpoints, or sample psutil periodically in a background thread.
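The same high-water mark that /usr/bin/time -v reports is also available from inside the process through the standard library, which makes the gap easy to see. This is a sketch, assuming Linux, where ru_maxrss is in kilobytes (macOS reports bytes); peak_rss_gb is an illustrative helper name.

```python
import resource
import sys


def peak_rss_gb():
    """High-water-mark RSS of this process so far, in GB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    scale = 1 if sys.platform == "darwin" else 1024
    return peak * scale / 1e9


before = peak_rss_gb()
buf = b"x" * 50_000_000   # allocate and fill ~50 MB...
del buf                   # ...then release it before any checkpoint could see it
after = peak_rss_gb()
print(f"peak before: {before:.3f} GB, peak after release: {after:.3f} GB")
```

The peak stays raised after the buffer is gone, whereas a current-RSS checkpoint taken at this point would typically have dropped back down. Printing peak_rss_gb() next to each memreport line is a cheap way to spot transients between checkpoints.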
❌ Memory Efficiency from seff is 99% but the job didn’t OOM
Slurm rounds reservations to discrete sizes. If you asked for 4G but the actual node-side reservation was 4096 MiB and your peak was 4080 MiB, you’re at 99% — fine, no kill. If you’d asked for 5G you’d be at ~80% — pick the level you want.
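The percentages in this example are just the peak divided by the request. A short sketch of that arithmetic, using the numbers from the paragraph above (memory_efficiency is an illustrative name, not a Slurm function):

```python
def memory_efficiency(peak_mib, requested_mib):
    """Peak memory as a percentage of the requested memory."""
    return 100 * peak_mib / requested_mib


peak = 4080  # MiB, the example peak from the text
for label, requested in (("4G", 4 * 1024), ("5G", 5 * 1024)):
    print(f"--mem={label}: {memory_efficiency(peak, requested):.1f}%")
```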
Time estimate
- Reading: ~25 min
- Tasks: ~50 min (mostly running things and reading their output)
- Deliverables: ~15 min
Total: ~1.5 hours
Extensions (optional)
Sample psutil periodically in a background thread
For better continuous tracking, set up a sampler:
import os
import threading
import time
import psutil

def memlog(interval=5):
    proc = psutil.Process(os.getpid())
    while True:
        rss_gb = proc.memory_info().rss / 1e9
        print(f"[bg-sampler] RSS: {rss_gb:.2f} GB at {time.strftime('%H:%M:%S')}", flush=True)
        time.sleep(interval)

threading.Thread(target=memlog, daemon=True).start()
This logs memory every 5 seconds throughout the run, without needing to scatter manual checkpoints.
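If you only care about the highest reading rather than a full trace, a variant of the sampler can keep a running maximum and stop cleanly when the work is done. This is one possible sketch, not part of the lab’s required code: PeakSampler and its read_rss parameter are hypothetical names, and the default reader assumes psutil is installed.

```python
import threading


def _psutil_rss():
    """Default reader: current RSS in bytes via psutil (imported lazily)."""
    import os
    import psutil
    return psutil.Process(os.getpid()).memory_info().rss


class PeakSampler:
    """Background thread that remembers the largest RSS reading it has seen."""

    def __init__(self, read_rss=_psutil_rss, interval=5.0):
        self.read_rss = read_rss
        self.interval = interval
        self.peak_bytes = 0
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self.peak_bytes = max(self.peak_bytes, self.read_rss())
            # wait() returns True as soon as the stop event is set.
            if self._stop_event.wait(self.interval):
                break

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop_event.set()
        self._thread.join()
        # One last reading so that short runs still record something.
        self.peak_bytes = max(self.peak_bytes, self.read_rss())
        return False
```

A typical use would be `with PeakSampler(interval=1.0) as sampler: main()`, followed by printing `sampler.peak_bytes / 1e9`. Like any sampler it can still miss spikes shorter than the interval, so treat the result as a lower bound on the true peak.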
Try memray for fancier profiling
memray is a more powerful memory profiler. Install with mamba install -c conda-forge memray, then:
memray run --output mem.bin mem_intensive.py
memray flamegraph mem.bin
Produces an HTML flamegraph showing where memory accumulates by call site.
Use sstat for live tracking during a running job
While a Slurm job is in R state:
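sstat -j <jobid>.batch --format=JobID,MaxRSS,AveRSS,MaxVMSize
Returns the max resident set size recorded so far for the running batch step (the .batch suffix selects it; a bare job ID only reports steps launched with srun). Useful for catching a job that’s about to OOM before it dies.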
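To compare a MaxRSS reading against your request programmatically, the value has to be turned into a number first. The sketch below assumes the field looks like "1205380K" or "3.9G" with binary (1024-based) suffixes, which is an assumption about your site’s sstat output worth checking; both function names are illustrative.

```python
# Assumed suffix meanings (binary multiples); verify against your own sstat output.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def slurm_mem_to_bytes(value):
    """Convert strings such as '1205380K' or '1.5G' to bytes."""
    value = value.strip()
    if not value:
        raise ValueError("empty memory field")
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(float(value[:-1]) * UNITS[suffix])
    return int(float(value))  # no suffix: assume plain bytes


def near_limit(max_rss, requested, threshold=0.9):
    """True when MaxRSS has reached the given fraction of the request."""
    return slurm_mem_to_bytes(max_rss) >= threshold * slurm_mem_to_bytes(requested)


print(near_limit("1205380K", "4G"))
print(near_limit("3900M", "4G"))
```

The 0.9 threshold is arbitrary; pick whatever margin gives you time to react before the job is killed.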
What’s next?
You can now measure resources. Lab 11 — Job arrays for many independent tasks scales the same techniques to processing many input files in parallel via a single submission.