On Fri, 01 Dec 2017 16:48:50 -0800 Larry McVoy <lm(a)mcvoy.com> wrote:
Larry McVoy writes:
On Fri, Dec 01, 2017 at 03:42:15PM -0800, Bakul Shah
wrote:
On Fri, 01 Dec 2017 15:09:34 -0800 Larry McVoy
<lm(a)mcvoy.com> wrote:
Larry McVoy writes:
> On Fri, Dec 01, 2017 at 11:03:02PM +0000, Ralph Corderoy wrote:
> > Hi Larry,
> >
> > > > So OOM code kills a (random) process in hopes of freeing up some
> > > > pages but if this process is stuck in diskIO, nothing can be freed
> > > > and everything grinds to a halt.
> > >
> > > Yep, exactly.
> >
> > Is that because the pages have been dirty for so long they've reached
> > the VM-writeback timeout even though there's no pressure to use them fo
r
> > something else? Or has that been
lengthened because you don't fear
> > power loss wiping volatile RAM?
>
> I'm tinkering with the pageout daemon so I'm trying to apply memory
> pressure. I have 10 25GB processes (25GB malloced) and the processes jus
t
walk the
memory over and over. This is on a 256GB main memory machine
(2 socket haswell, 28 cpus, 28 1TB SSDs, on loan from Netflix).
How many times do processes walk their memory before this condition
occurs?
Until free memory goes to ~0. That's the point, I'm trying to
improve things when there is too much pressure on memory.
You said 10x25GB but you have 256GB. So there is still
6GB left...
So what may be happening is that a process
references a page,
it page faults, the kernel finds its phys page has been paged
out, so it looks for a free page and once a free page is
found, the process will block on page in. Or if there is no
free page, it has to wait until some other dirty page is paged
out (but this would be a different wait queue). As more and
more processes do this, the system runs out of all free pages.
Yeah.
Can you find out how many processes are waiting
under what
conditions, how long they wait and how these queue lengths are
changing over time?
So I have 10 processes, they all run until the system starts to
thrash, then they are all in wait mode for memory but there isn't
any (and there is no swap configured).
The fundamental problem is that they are sleeping waiting for memory to
be freed. They are NOT in I/O mode, there is no DMA happening, this is
main memory, it is not backed by swap, there is no swap. So they are
sleeping waiting for the pageout daemon to free some memory. It's not
going to free their memory because there is no place to stash (no swap).
So it's trying to free other memory.
This confuses me. Before I make more false assumptions,
can you show the code?
The real question is where did they go to sleep and
why did they sleep
without PCATCH on? If I can find that place where they are trying to
alloc a page and failed and they go to sleep there, I could either
Can you use kgdb to find out where they sleep?
a) commit seppuku because we are out of memory and
I'm part of the problem
b) go into a sleep / wakeup / check signals loop
I am reminded by you all that we ask the process to do it to itself but
there does seem to be a way to sleep and respect signals, the tty stuff
does that. So if I can find this place, determine that I'm just asking
for memory, not I/O, and sleep with PCATCH on then I might be golden.
Won't kgdb tell you? Or you can insert printfs. You should also print
something in your program as it walks a few pages and see if this happens
at almost the same time (when all pages are used up).
tty code probably assumes memory shortfall is a short term issue
(which is likely). Not the case with your test (unless I misguessed).
Where "golden" means I can kill the process
and the OOM thread could do
it for me.
Thoughts?
Now I am starting to think this happens as soon as all the
phys pages are used up. But looking at your program would
help.
[removd cc: TUHS]