[TUHS] Re: VM over-commit (and the OOM killers)

Lawrence Stewart

28 Feb 2025

3:57 p.m.

I’m probably a lost soul on this issue, but swap space is just a way to turn program bugs into performance problems. In HPC one says “real programs need real memory”. At SiCortex we ran 972-node cluster machines without any swap space (4 or 8 GB per node) and it worked fine. Of course we didn’t have any disks either, so we made a virtue of necessity. It is perfectly true that the OOM killer was feared and hated, but only because it couldn’t identify the actual bad apple. I realize this attitude only works when you (pretty much) dedicate a node to running a single program at a time, but that is how most HPC systems of the time worked. -L
