[TUHS] signals and blocked in I/O

Larry McVoy lm at mcvoy.com
Sat Dec 2 09:09:34 AEST 2017


On Fri, Dec 01, 2017 at 11:03:02PM +0000, Ralph Corderoy wrote:
> Hi Larry,
> 
> > > So OOM code kills a (random) process in hopes of freeing up some
> > > pages but if this process is stuck in diskIO, nothing can be freed
> > > and everything grinds to a halt.
> >
> > Yep, exactly.
> 
> Is that because the pages have been dirty for so long they've reached
> the VM-writeback timeout even though there's no pressure to use them for
> something else?  Or has that been lengthened because you don't fear
> power loss wiping volatile RAM?

I'm tinkering with the pageout daemon, so I'm trying to apply memory
pressure.  I have ten processes of 25GB each (25GB malloced), and they
just walk their memory over and over.  This is on a machine with 256GB of
main memory (2-socket Haswell, 28 CPUs, 28 1TB SSDs, on loan from Netflix).

It's the old "10 pounds of shit in a 5 pound bag" problem, same old stuff,
just a bigger bag.
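A rough sketch of the kind of load described above, for anyone wanting
to reproduce it.  This is an assumption, not the actual test program
(which isn't in the post): the hypothetical helpers walk_memory(), hog()
and spawn_hogs() fork N children, each mallocs a buffer and then writes
one byte per page, pass after pass, so every page stays dirty and
recently used.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: touch one byte per page of buf, `passes` times.
 * Each write dirties the page, so the pager must write it back before
 * it can reuse the frame.  Returns a checksum so the loop isn't
 * optimized away. */
unsigned long walk_memory(unsigned char *buf, size_t len, size_t page,
                          int passes)
{
    unsigned long sum = 0;
    for (int p = 0; p < passes; p++) {
        for (size_t off = 0; off < len; off += page) {
            buf[off]++;
            sum += buf[off];
        }
    }
    return sum;
}

/* Hypothetical: one memory hog.  malloc len bytes, zero them (which
 * faults every page in), then walk the buffer.  -1 if malloc fails. */
int hog(size_t len, int passes, unsigned long *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;            /* fallback guess if sysconf fails */
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 0, len);
    *out = walk_memory(buf, len, (size_t)page, passes);
    free(buf);
    return 0;
}

/* Hypothetical: fork nproc hogs and reap them all.  Returns the number
 * of children that failed (including any killed by a signal, e.g. the
 * OOM killer), or -1 if a fork failed (started children are still
 * reaped). */
int spawn_hogs(int nproc, size_t len, int passes)
{
    int fork_failed = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0) {
            unsigned long sum;
            _exit(hog(len, passes, &sum) == 0 ? 0 : 1);
        }
    }
    int failures = 0, status;
    for (;;) {
        pid_t w = wait(&status);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, keep reaping */
            break;              /* ECHILD: no children left */
        }
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return fork_failed ? -1 : failures;
}
```

On a 64-bit box like the one above, something like
spawn_hogs(10, 25UL << 30, 1000000) would approximate the load: 250GB of
dirty pages against 256GB of RAM, which is exactly the over-full bag.
Touching one byte per page is the cheapest way to keep every page hot; it
says nothing about how the kernel chooses victims.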

The problem is that OOM can't kill the processes that are causing the
problem: they are stuck in disk wait.  That's why I started asking why
you can't kill a process that's in the middle of I/O.
