When I was working for a big chip-testing company, a STDIO vs. MMAP
problem came up. The tester was at its heart a SPARC VME system running
Solaris. The tester read in 'patterns' from disk, and it literally took
hours to read in all the test patterns. At the scale of the large chip
vendors, every minute you can't test because the tester is booting,
etc., means dollars are lost.
We wrote a bunch of macros that replaced the STDIO file I/O calls with
the equivalent mmap calls. It turns out STDIO does a lot of prefetching
and assumes that you're going to read a file linearly from beginning to
end, whereas we wanted to jump around a lot in the pattern files.
Pattern loading went from 4 hours to 30 minutes. Our customer was ecstatic.
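
For readers who want to see the shape of that kind of change, here is a
minimal sketch in C. It is not the macros we actually wrote; the names
map_file(), record_at() and unmap_file() are hypothetical and exist only
for illustration. The idea is that the whole pattern file is mapped once,
and every "seek and read" becomes plain pointer arithmetic, so the kernel
pages in only the parts that are actually touched.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type: a read-only mapping of one whole file. */
struct mapped_file {
    const unsigned char *base;  /* start of the mapping */
    size_t len;                 /* file size in bytes */
};

/* Map `path` read-only. Returns 0 on success, -1 on any failure
 * (including an empty file, which cannot be mapped). */
int map_file(const char *path, struct mapped_file *mf)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && mf != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    mf->base = p;
    mf->len = (size_t)st.st_size;
    return 0;
}

/* Stand-in for fseek() + fread(): return a pointer to `len` bytes at
 * offset `off`, or NULL if that range runs past the end of the file. */
const unsigned char *record_at(const struct mapped_file *mf,
                               size_t off, size_t len)
{
    if (off > mf->len || len > mf->len - off)
        return NULL;
    return mf->base + off;
}

/* Release the mapping and clear the descriptor struct. */
void unmap_file(struct mapped_file *mf)
{
    munmap((void *)mf->base, mf->len);
    mf->base = NULL;
    mf->len = 0;
}
```

A caller that used to fseek() to an offset and fread() a record would
instead call record_at() and use the returned pointer directly; no data
is copied into a stdio buffer along the way.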
Joe McGuckin
ViaNet Communications
joe(a)via.net
650-207-0372 cell
650-213-1302 office
650-969-2124 fax
On Feb 5, 2021, at 5:18 PM, John Gilmore <gnu(a)toad.com> wrote:
On Thu, Feb 04, 2021 at 09:17:54PM -0800, Bakul Shah wrote:
> Write(2)ing to a mapped page sounds pretty dodgy. Likely to get you
> in trouble in any case. Similarly read(2)ing.
Uh, no. You misunderstand completely.
The purpose of the kernel is to provide a reliable interface to system
facilities, that lets processes NOT DEPEND on what other processes are
doing.
The decision about whether Tool X uses mmap() versus read() to access a
file, or mmap() versus write() to change one, is a decision that DOES
NOT DEPEND on what Tool Y is doing. Tools X and Y may have been written
by different groups in different decades. Tool X may have been written
to use stdio, which used read(). Three years later, stdio got rewritten
to use mmap() for speed, but that's invisible to the author of Tool X.
And maybe an end user in 2025 decides to use both Tool X and Tool Y on
the same file. So only much later will any malign interactions between
read/write and mmap actually be noticed by end users. And the fix is
not to create new dependencies between Tool X, stdio, and Tool Y. It is
to fix the kernel so they do not depend on each other!
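
The property being argued for can be checked directly. The following is
a minimal sketch in C, not code from any of the tools mentioned; the
function name mmap_rw_coherent() is made up for illustration. It changes
a file through write(2) and looks for the change through a MAP_SHARED
mapping, then changes it through the mapping and looks for the change
through read(2). On a kernel with a unified page cache both views are
expected to agree, but treat the result as an observation about the
system under test rather than a guarantee.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if write(2)/read(2) and a shared mapping see each other's
 * changes, 0 if they disagree, -1 if a system call failed.
 * The scratch file at `path` is created, then removed again. */
int mmap_rw_coherent(const char *path)
{
    char buf[4];
    char *map;
    int via_map, via_read, result = -1;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "aaaa", 4) != 4)
        goto out;
    map = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* "Tool Y" changes the file with write(2); look through the mapping. */
    if (lseek(fd, 0, SEEK_SET) != 0 || write(fd, "bbbb", 4) != 4)
        goto unmap;
    via_map = (memcmp(map, "bbbb", 4) == 0);

    /* "Tool X" changes the file through the mapping; look with read(2). */
    memcpy(map, "cccc", 4);
    if (lseek(fd, 0, SEEK_SET) != 0 || read(fd, buf, 4) != 4)
        goto unmap;
    via_read = (memcmp(buf, "cccc", 4) == 0);

    result = via_map && via_read;
unmap:
    munmap(map, 4);
out:
    close(fd);
    unlink(path);
    return result;
}
```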
Here is a real-life example from my own experience.
There is a long-standing bug in the Linux kernel, in which the inotify()
system call simply doesn't work on nested file systems. This caused a
long-standing bug in Ubuntu, which I reported in 2012 here:
https://bugs.launchpad.net/ubuntu/+source/rpcbind/+bug/977847
The symptom was that after booting from a LiveCD image, "apt-get
install" for system services (in my case an NFS client package) wouldn't
work. Turned out the system startup scripts used inotify() to notice
and start newly installed system services. The root cause was that
inotify failed because the root file system was an "overlayfs" that
overlaid a RAMdisk on top of the read-only LiveCD file system. The
people who implemented "overlayfs" didn't think inotify() was important,
or they thought it would be too much work to make it actually meet its
specs, so they just made it ignore changes to the files in the overlaid
file system. So the startup daemon's inotify() would never report the
creation of new files about the new services, because those files were
in the overlaying RAM disk, and so it would not start them and the user
would notice the error.
The underlying overlayfs bug was reported in 2011 here:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/882147
As far as I know it has never been fixed. (The bug report was
closed in 2019 for one of the usual bogus reasons.)
The problem came because real tools (like systemd, or the tail command)
actually started using inotify, assuming that as a well documented
kernel interface, it would actually meet its specs. And because a
completely unrelated other real tool (like the LiveCD installer)
actually started using overlayfs, assuming that as a well documented
kernel interface, it too would actually meet its specs. And then one
day somebody tried to use both those tools together and they failed.
That's why telling people "Don't use mmap() on the same file that you
use read() on" is an invalid attitude for a Real Kernel Maintainer.
Props to Larry McVoy for caring about this. Boos to the Linux
maintainers of overlayfs who didn't give a shit.
John