On Thu, Feb 04, 2021 at 09:17:54PM -0800, Bakul Shah wrote:
On Feb 4, 2021, at 4:33 PM, Larry McVoy <lm(a)mcvoy.com> wrote:
Ignoring the page cache and making their own cache has big problems.
You can mmap() ZFS files and doing so means that when a page is referenced
it is copied from the ZFS cache to the page cache. That creates a
coherency problem: I can write via the mapping and I can write via
write(2), and now you have two copies of the data that don't match.
That's pretty much OS no-no #1.
Using write(2) on a page that is also mapped sounds pretty dodgy and is
likely to get you in trouble in any case. The same goes for read(2).
The entire point of the SunOS 4.0 VM system was that the page you
saw via mmap(2) is the exact same page you saw via read(2). It's
the page cache, it has page sized chunks of memory that cache
file,offset pairs.
There is one, and only one, copy of the truth. Doesn't matter how
you get at it, there is only one "it".
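To make the (file, offset) idea concrete, here is a minimal sketch in C of a page cache keyed that way. All names (struct page, page_get, PG_SIZE, NBUCKETS) are hypothetical and not taken from SunOS. A real kernel would also need locking, reading from disk, dirty tracking, and eviction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PG_SIZE  4096              /* illustrative page size */
#define NBUCKETS 64                /* illustrative hash table size */

/* Hypothetical cache entry: one page of one file, named by (file, offset). */
struct page {
    const void *file;              /* stands in for a vnode/inode pointer */
    uint64_t offset;               /* page-aligned byte offset in the file */
    unsigned char data[PG_SIZE];   /* the single in-memory copy */
    struct page *next;             /* hash-chain link */
};

static struct page *buckets[NBUCKETS];

static size_t page_hash(const void *file, uint64_t offset)
{
    return (size_t)((((uintptr_t)file >> 4) ^ (offset / PG_SIZE)) % NBUCKETS);
}

/* Return the cached page covering (file, offset), creating it on first use.
 * A read(2) path and an mmap(2) fault path would both come through here,
 * so both end up with the same struct page. */
struct page *page_get(const void *file, uint64_t offset)
{
    assert(file != NULL);
    offset -= offset % PG_SIZE;    /* round down to the page boundary */
    size_t h = page_hash(file, offset);

    for (struct page *p = buckets[h]; p != NULL; p = p->next)
        if (p->file == file && p->offset == offset)
            return p;

    struct page *p = calloc(1, sizeof *p);  /* a kernel would fill this from disk */
    if (p == NULL)
        return NULL;
    p->file = file;
    p->offset = offset;
    p->next = buckets[h];
    buckets[h] = p;
    return p;
}
```

Because every access path goes through the same lookup, two requests for offsets 100 and 4000 of the same file get the same struct page back, not two copies.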
ZFS broke that contract and that was a step backwards in terms of
OS design.
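One rough way to check the single-copy contract from user space is to write through one path and read through the other. This is a sketch assuming a POSIX system and a writable /tmp; page_cache_is_coherent is a hypothetical helper, not an existing API. Passing it only shows that the two views agree. It says nothing about how many copies of the data exist in memory.

```c
#define _XOPEN_SOURCE 700          /* expose mkstemp, pread, pwrite, ftruncate */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a file's write(2) and mmap(2) views agree,
 * 0 if they diverge, -1 if the test file could not be set up.
 * Assumes a writable /tmp. */
int page_cache_is_coherent(void)
{
    static const char via_write[] = "written with pwrite";
    static const char via_map[] = "stored through the mapping";
    const off_t map_off = 64;      /* second test region, same page */
    char buf[sizeof via_map];

    char path[] = "/tmp/coherentXXXXXX";
    int fd = mkstemp(path);        /* opened O_RDWR, as MAP_SHARED writes need */
    if (fd < 0)
        return -1;
    unlink(path);                  /* file disappears once fd is closed */

    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0 || ftruncate(fd, (off_t)pagesz) != 0) {
        close(fd);
        return -1;
    }
    assert((size_t)pagesz >= (size_t)map_off + sizeof via_map);

    char *map = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int ok = 1;
    /* write(2) path first, then look through the mapping */
    if (pwrite(fd, via_write, sizeof via_write, 0) != (ssize_t)sizeof via_write
        || memcmp(map, via_write, sizeof via_write) != 0)
        ok = 0;

    /* store through the mapping first, then look with read(2) */
    memcpy(map + map_off, via_map, sizeof via_map);
    if (pread(fd, buf, sizeof buf, map_off) != (ssize_t)sizeof buf
        || memcmp(buf, via_map, sizeof buf) != 0)
        ok = 0;

    munmap(map, (size_t)pagesz);
    close(fd);
    return ok;
}
```

On a system with a unified page cache, both checks hit the same page, so the function should return 1.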
Let me repeat a part of my response you cut out:
And you can keep track of mapped pages and read/write from them if
necessary even if you have a separate cache for any compressed pages.
In essence you pass the ownership of a page's data from a compressed
page cache to the mapped page. Just like in processor cache coherence
algorithms there is one source of truth: the current owner of a cached
unit (line or page or whatever). In other words, the page you see via mmap(2)
will be the exact same page you see via read(2). Not having actually
tried this, I may have missed corner cases and practical considerations
complicating things, but *conceptually* this doesn't seem hard.
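Here is a toy sketch of the ownership-passing idea, with loud assumptions: a single page, plain memcpy standing in for compression and decompression, no locking, and hypothetical names (struct cpage, cpage_map, cpage_unmap, cpage_read, cpage_write). The point is only that read and write always go to whichever copy currently owns the bytes, so the stale copy is never consulted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CPAGE_SIZE 16              /* tiny page, for illustration only */

/* Hypothetical: which copy currently holds the authoritative bytes. */
enum owner { OWNER_COMPRESSED, OWNER_MAPPED };

struct cpage {
    enum owner owner;
    unsigned char stored[CPAGE_SIZE]; /* stand-in for the compressed copy;
                                         stale while OWNER_MAPPED */
    unsigned char *mapped;            /* frame handed to mmap(2) users, or NULL */
};

/* Invariant: a mapped frame exists exactly when it is the owner. */
static void cpage_check(const struct cpage *p)
{
    assert((p->owner == OWNER_MAPPED) == (p->mapped != NULL));
}

void cpage_init(struct cpage *p)
{
    p->owner = OWNER_COMPRESSED;
    memset(p->stored, 0, CPAGE_SIZE);
    p->mapped = NULL;
}

/* Pass ownership to a mapping: "decompress" into a frame and use it from now on. */
unsigned char *cpage_map(struct cpage *p)
{
    cpage_check(p);
    if (p->owner == OWNER_COMPRESSED) {
        unsigned char *frame = malloc(CPAGE_SIZE);
        if (frame == NULL)
            return NULL;
        memcpy(frame, p->stored, CPAGE_SIZE);   /* stand-in for decompression */
        p->mapped = frame;
        p->owner = OWNER_MAPPED;
    }
    return p->mapped;
}

/* Take ownership back (e.g. on munmap or reclaim): "recompress", free the frame. */
void cpage_unmap(struct cpage *p)
{
    cpage_check(p);
    if (p->owner == OWNER_MAPPED) {
        memcpy(p->stored, p->mapped, CPAGE_SIZE); /* stand-in for compression */
        free(p->mapped);
        p->mapped = NULL;
        p->owner = OWNER_COMPRESSED;
    }
}

/* The read(2)/write(2) paths always use the current owner. */
static unsigned char *cpage_current(struct cpage *p)
{
    cpage_check(p);
    return p->owner == OWNER_MAPPED ? p->mapped : p->stored;
}

void cpage_read(struct cpage *p, size_t off, void *buf, size_t len)
{
    assert(off <= CPAGE_SIZE && len <= CPAGE_SIZE - off);
    memcpy(buf, cpage_current(p) + off, len);
}

void cpage_write(struct cpage *p, size_t off, const void *buf, size_t len)
{
    assert(off <= CPAGE_SIZE && len <= CPAGE_SIZE - off);
    memcpy(cpage_current(p) + off, buf, len);
}
```

A real compressed cache could not write into compressed bytes in place; it would have to decompress, modify, and recompress, or first hand ownership to an uncompressed frame as cpage_map does here.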
Warner mentions not using ZFS because of its double copying. Maybe something
like the above could be a step toward integrating the caches?
As Ron says, I too would like to hear what the authors of ZFS have to say....