On Sep 28, 2017, at 7:07 AM, Larry McVoy <lm(a)mcvoy.com> wrote:
On Thu, Sep 28, 2017 at 07:49:17AM -0600, arnold(a)skeeve.com wrote:
Kevin Bowling <kevin.bowling(a)kev009.com> wrote:
I guess alternatively, what was interesting or neat, about RFS, if
anything? And what was bad?
Good: Stateful implementation, remote devices worked.
I'd argue that stateful is really hard to get right when machines panic
or reboot. Maybe you can do it on the client but how does one save all
that state on the server when the server crashes?
NFS seems simple in hindsight but like a lot of things, getting to that
simple wasn't chance, it was designed to be stateless because nobody
had a way to save the state in any reasonable way.
I have some first hand experience with this.... in 1984.
Valid Logic Systems Inc, an early VLSI design vendor, hired me
as a contractor to fix bugs in this funky ethernet driver they
had (from Lucasfilm, IIRC) that did some remote file
operations. I proposed that instead I do a "proper" networked
file system and to my amazement they agreed to let me build a
prototype.
I first built an RPC layer (ethertype 1600 -- see RFC 1700!)
and then EFS (extended FS) that allowed access to remote
files. Being a one man team I punted on generality. Just
hand-built a separate marshal/unmarshal function for each
remote procedure. No mounts. Every node's FS was visible to
everyone else (subject to Unix permissions). /net/ path prefix
was for remote files.
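
To illustrate what "a separate marshal/unmarshal function for each
remote procedure" can look like, here is a minimal sketch in Python.
It is not the original EFS code: the procedure number, the field
layout and the function names are all assumptions made up for the
example.

```python
import struct

# Hypothetical wire layout for a single remote procedure, "read":
#   procedure id (u16), file handle (u32), offset (u32), count (u16),
# all in network byte order. None of this comes from the original EFS.
PROC_READ = 1
_READ_REQ = struct.Struct("!HIIH")


def marshal_read(handle, offset, count):
    """Pack the arguments of a remote read into bytes."""
    return _READ_REQ.pack(PROC_READ, handle, offset, count)


def unmarshal_read(buf):
    """Unpack a remote read request; reject any other procedure."""
    proc, handle, offset, count = _READ_REQ.unpack(buf)
    if proc != PROC_READ:
        raise ValueError("not a read request")
    return handle, offset, count
```

Every further procedure would need its own hand-written pair like
this, which is the price of punting on generality: nothing is
generated from an interface description.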
All this took about 2-3 months. Performance was decent for a
1984 era workstation. Encouraged by the progress I suggested
we add in missing functionality such as the ability to chdir
to a remote dir etc. Yes, state! And complications!
On bootup every node advertised its presence & a "generation"
number (incremented by 1 from the last gen) so that other
nodes could drop old outstanding state -- not unlike a disk
dying but still messy to clean things up. Next I had to make
the scheduling priority for remote operations interruptible.
People didn't like "cd /net/foo" hanging indefinitely! unlink
and mv were a problem (machine A wouldn't know if machine B
did this). rm was easy to fix -- just add a refcount for every
remote machine with an open. mv not so. I don't think I ever
solved this. Local FS read/write are atomic so I tried very
hard to make the remote read/writes atomic as well. This can
get interesting in the presence of a node crashing....
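
The generation number and the per-machine refcount described above
fit together roughly as in the following sketch. It is only an
illustration under assumptions, not the code that was written at
Valid: the class, its method names and the "reclaimed" list are
hypothetical, and a real kernel would free the file where the sketch
merely records the path.

```python
class PeerState:
    """State one node keeps about its peers (all names hypothetical)."""

    def __init__(self):
        self.generation = {}   # peer -> last generation number seen
        self.opens = {}        # path -> {peer: number of opens}
        self.unlinked = set()  # paths removed while remote opens remain
        self.reclaimed = []    # stand-in for actually freeing the file

    def on_boot_announce(self, peer, gen):
        """Return True if the announcement was new and state was dropped."""
        last = self.generation.get(peer)
        if last is not None and gen <= last:
            return False                   # stale or duplicate, ignore
        self.generation[peer] = gen
        for path in list(self.opens):      # peer rebooted: forget its opens
            self.opens[path].pop(peer, None)
            self._maybe_reclaim(path)
        return True

    def remote_open(self, peer, path):
        refs = self.opens.setdefault(path, {})
        refs[peer] = refs.get(peer, 0) + 1

    def remote_close(self, peer, path):
        refs = self.opens.get(path, {})
        if refs.get(peer, 0) > 0:
            refs[peer] -= 1
            if refs[peer] == 0:
                del refs[peer]
        self._maybe_reclaim(path)

    def unlink(self, path):
        """Return True if reclaimed now, False if deferred."""
        if self.opens.get(path):
            self.unlinked.add(path)        # some remote machine has it open
            return False
        self.opens.pop(path, None)
        self.reclaimed.append(path)
        return True

    def _maybe_reclaim(self, path):
        if path in self.opens and not self.opens[path]:
            del self.opens[path]
            if path in self.unlinked:
                self.unlinked.discard(path)
                self.reclaimed.append(path)
```

If peer "b" opens a file, the file is unlinked locally, and "b" then
crashes, the file stays around until "b" announces a higher
generation; at that point its opens are dropped and the file is
reclaimed. A rename is not covered by this bookkeeping.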
At about this time, Sun gave a presentation on NFS to Valid.
I suspect Valid also realized that doing this properly was
much bigger than a one-man project. Result: they terminated
the project. It was a fun project while it lasted. That this
much got done was thanks to a lot of invaluable help
from my friend Jamie Markevitch (also a contractor @ Valid
at that point).
At the time I thought all of these stateful problems were
solvable given more time but now I am not so sure. But as
a result of that belief I never really liked NFS. I felt
they took the easy way out.