On Fri, Dec 01, 2017 at 03:09:34PM -0800, Larry McVoy wrote:
> It's the old "10 pounds of shit in a 5 pound bag" problem, same old stuff,
> just a bigger bag.
>
> The problem is that OOM can't kill the processes that are the problem,
> they are stuck in disk wait. That's why I started asking why can't you
> kill a process that's in the middle of I/O.

You may need to solve the problem much earlier, by write throttling
the process which is generating so many dirty pages in the first
place. At one point Linux would press-gang the badly behaved process
which was generating lots of dirty pages into helping to deactivate
and clean pages; it doesn't do this any more, but stopping badly
behaved processes until the writeback daemons can catch up is
certainly kinder than OOM-killing them.
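
As a rough user-space illustration of the idea (not what the kernel does
internally), a writer can throttle itself by blocking in fsync() once it
has produced a set amount of un-synced data. The throttled_write helper
and its dirty_limit parameter are made-up names for this sketch:

```python
import os
import tempfile


def throttled_write(fd, chunks, dirty_limit):
    """Write chunks to fd, blocking in fsync() whenever the amount of
    data written since the last sync reaches dirty_limit bytes.

    Hypothetical helper for illustration only. Returns the number of
    fsync() calls made.
    """
    dirty = 0   # bytes written since the last fsync
    syncs = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while len(view) > 0:
            n = os.write(fd, view)   # may be a short write
            view = view[n:]
            dirty += n
        if dirty >= dirty_limit:
            os.fsync(fd)             # the writer stalls here until flushed
            dirty = 0
            syncs += 1
    if dirty:
        os.fsync(fd)                 # flush whatever is left over
        syncs += 1
    return syncs


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    try:
        print(throttled_write(fd, [b"x" * 1024] * 10, 4096))
    finally:
        os.close(fd)
        os.remove(path)
```

With ten 1 KiB chunks and a 4 KiB limit, the writer pauses to flush after
chunks 4 and 8 and once more at the end, so it never has much more than
one limit's worth of its own dirty data outstanding.
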
Are you using ZFS? It does have a write throttling knob, apparently.
- Ted