After a day and an evening of fighting with modern hardware,
the modern tangle that passes for UNIX nowadays, and modern
e-merchandising, I am too lazy to go look up the details.
But as I remember it, two syncs was indeed probably enough.
I believe that when sync(2) returned, all unflushed I/O had
been queued to the device driver, but not necessarily finished,
so the second sync was just a time-filling no-op. If all the
disks were in view, it probably sufficed just to watch them
until all the lights (little incandescent bulbs in those days,
not LEDs) had stopped blinking.
I usually typed sync three or four times myself. It gave me
a comfortable feeling (the opposite of a syncing feeling, I
suppose). I still occasionally type `sync' to the shell as
a sort of comfort word while thinking about what I'm going
to do next. Old habits die hard.
(sync; sync; sync)
Norman Wilson
Toronto ON
> From: Doug McIlroy <doug(a)cs.dartmouth.edu>
> Process A spawns process B, which reads stdin with buffering. B gets
> all it deserves from stdin and exits. What's left in the buffer,
> intehded for A, is lost.
Ah. Got it.
The problem is not with buffering as a generic approach, the problem is that
you're trying to use a buffering package intended for simple,
straight-forward situations in one which doesn't fall into that category! :-)
Clearly, either B has to i) be able to put back data which was not for it
('ungets' as a system call), or ii) not read the data that's not for it - but
that may be incompatible with the concept of buffering the input (depending
on the syntax, and thus the ability to predict the approaching of the data B
wants, the only way to avoid the need for ungetc() might be to read a byte at
a time).
If B and its upstream (U) are written together, that could be another way to
deal with it: if U knows where B's syntatical boundaries are, it can give it
advance warning, and B could then use a non-trivial buffering package to do
the right thing. E.g. if U emits 'records' with a header giving the record
length X, B could tell its buffering package 'don't read ahead more than X
bytes until I tell you to go ahead with the next record'.
Of course, that's not a general solution; it only works with prepared U's.
Really, the only general, efficient way to deal with that situation that I can
see is to add 'ungets' to the operating system...
Noel
>> From: Doug McIlroy <doug(a)cs.dartmouth.edu>
>> The spec below isn't hard: just hook two buffer chains together and
>> twiddle a couple of file desciptors.
> In thinking about how to implement it, I was thinking that if there was
> any buffered data in an output pipe, that the process doing the
> splice() would wait (inside the splice() system call) on all the
> buffered data being read by the down-stream process.
> ...
> As a side-benefit, if one adopted that line, one wouldn't have to deal
> with the case (in the middle of the chain) of a pipe-pipe splice with u
> buffered data in both pipes (where one would have to copy the data
> across); instead one could just use the exact same code for both cases
So a couple of days ago I suffered a Big Hack Attack and actually wrote the
code for splice() (for V6, of course :-).
It took me a day or so to get 'mostly' running. (I got tripped up by pointer
arithmetic issues in a number of places, because V6 declares just about
_everything_ to be "int *", so e.g. "ip + 1" doesn't produce the right value
for sleep() if ip is declared to be "struct inode *", which is what I did
automatically.)
My code only had one real bug so far (I forgot to mark the user's channels as
closed, which resulted in their file entries getting sub-zero usage counts
when the middle (departing) process exited).
However, now I have run across a real problem: I was just copying the system
file table entry for the middle process' input channel over to the entry for
the downstream's input (so further reads on its part would read the channel
the middle process used to be reading). Copying the data from one entry to
another meant I didn't have to go chase down file table pointers in the other
process' U structure, etc.
Alas, this simple approach doesn't work.
Using the approach I outlined (where the middle channel waits for the
downstream pipe to be empty, so it can discard it and do the splice by
copying the file table entries) doesn't work, because the downstream process
is in the middle of a read call (waiting for more data to be put in the
pipe), and it has already computed a pointer to the pipe's inode, and it's
looping waiting for that inode to have data.
So now I have to regroup and figure out how to deal with that. My most likely
approach is to copy the inode data across (so I don't have to go mess with the
downstream process to get it to go look at another inode), but i) I want to
think about it a bit first, and ii) I have to check that it won't screw
anything else up if I move the inode data to another slot.
Noel
> From: Mark Longridge <cubexyz(a)gmail.com>
> I was wondering if there might be a better way to do a shutdown on
> early unix.
Not really; I don't seem to recall our having one on the MIT V6 machine.
(We did add a 'reboot' system call so we could reboot the machine without
having to take the elevator up to the machine room [the console was on our
floor, and the reboot() call just jumped into the hardware bootstrap], but in
the source it doesn't even bother to do an update(). Well, I should't say
that: I only have the source for the kernel, which doesn't; I don't at the
moment have access to the source for the rest of the system - although I do
have some full dump tapes, once I can work out how to read them. Anyway, so
maybe the user command for rebooting the system did a sync() first.)
I suppose you could set the switch register to 173030 and send a 'kill -1 1',
which IIRC kills of all shells except the one on the console, but somehow
I doubt you're running multi-user anyway... :-)
Noel
>> the cp command seems different from all other versions, I'm not sure I
>> understand it so I used the mv command instead which worked as expected.
>
> I'm intrigued; in what way is it different?
It seems that one must first cp a file to another file then do a mv to
actually put it into a different directory:
e.g. while in /usr/src
as ctr0.s
cp a.out ctr0.o
mv ctr0.o /usr/lib
...rather than trying to just "cp a.out /usr/lib/ctr0.o"
Mark
Yes, an evil necessary to get things going.
The very definition of original sin.
Doug
Larry McVoy wrote:
>>>> For stdio, of course, one would need fsplice(3), which must flush the
>>>> in-process buffers--penance for stdio's original sin of said buffering.
>>> Err, why is buffering data in the process a sin? (Or was this just a
>>> humourous aside?)
>> Process A spawns process B, which reads stdin with buffering. B gets
>> all it deserves from stdin and exits. What's left in the buffer,
>> intehded for A, is lost. Sinful.
> It really depends on what you want. That buffering is a big win for
> some use cases. Even on today's processors reading a byte at a time via
> read(2) is costly. Like 5000x more costly on the laptop I'm typing on:
> Err, why is buffering data in the process a sin? (Or was this just a
humourous aside?)
Process A spawns process B, which reads stdin with buffering. B gets
all it deserves from stdin and exits. What's left in the buffer,
intehded for A, is lost. Sinful.
> From: Doug McIlroy <doug(a)cs.dartmouth.edu>
> The spec below isn't hard: just hook two buffer chains together and
> twiddle a couple of file desciptors.
How amusing! I was about to send a message with almost the exact same
description - it even had the exact same syntax for the splice() call! A
couple of points from my thoughts which were not covered in your message:
In thinking about how to implement it, I was thinking that if there was any
buffered data in an output pipe, that the process doing the splice() would
wait (inside the splice() system call) on all the buffered data being read by
the down-stream process.
The main point of this is for the case where the up-stream is the head of the
chain (i.e. it's reading from a file), where one more or less has to wait,
because one will want to set the down-streams' file descriptor to point to
the file - but one can't really do that until all the buffered data was
consumed (else it will be lost - one can't exactly put it into the file :-).
As a side-benefit, if one adopted that line, one wouldn't have to deal with
the case (in the middle of the chain) of a pipe-pipe splice with buffered
data in both pipes (where one would have to copy the data across); instead
one could just use the exact same code for both cases, and in that case the
wait would be until the down-stream pipe can simply be discarded.
One thing I couldn't decide is what to do if the upstream is a pipe with
buffered data, and the downstream is a file - does one discard the buffered
data, write it to the file, abort the system call so the calling process can
deal with the buffered data, or what? Perhaps there could be a flag argument
to control the behaviour in such cases.
Speaking of which, I'm not sure I quite grokked this:
> If file descriptor fd0 is associated with a pipe and fd1 is not, then
> fd1 is updated to reflect the effect of buffered data for fd0, and the
> pipe's other descriptor is replaced with a duplicate of fd1.
But what happens to the data? Is it written to the file? (That's the
implication, but it's not stated directly.)
> The same statement holds when "fd0" is exchanged with "fd1" and "write"
> is exchanged with "read".
Ditto - what happens to the data? One can't simply stuff it into the input
file? I think the 'wait in the system call until it drains' approach is
better.
Also, it seemed to me that the right thing to do was to bash the entry in the
system-wide file table (i.e. not the specific pointers in the u area). That
would automatically pick up any children.
Finally, there are 'potential' security issues (I say 'potential' because I'm
not sure they're really problems). For instance, suppose that an end process
(i.e. reading/writing a file) has access to that file (e.g. because it
executed a SUID program), but its neighbour process does not. If the end
process wants to go away, should the neighbour process be allowed access to
the file? A 'simple' implementation would do so (since IIRC file permissions
are only checked at open time, not read/write time).
I don't pretend that this is a complete list of issues - just what I managed
to think up while considering the new call.
> For stdio, of course, one would need fsplice(3), which must flush the
> in-process buffers--penance for stdio's original sin of said buffering.
Err, why is buffering data in the process a sin? (Or was this just a
humourous aside?)
Noel
Larry wrote in separate emails
> If you really think that this could be done I'd suggest trying to
> write the man page for the call.
> I already claimed splice(2) back in 1998; the Linux guys did
> implement part of it ...
I began to write the following spec without knowing that Linux had
appropriated the name "splice" for a capability that was in DTSS
over 40 years ago under a more accurate name, "copy". The spec
below isn't hard: just hook two buffer chains together and twiddle
a couple of file desciptors. For stdio, of course, one would need
fsplice(3), which must flush the in-process buffers--penance for
stdio's original sin of said buffering.
Incidentally, the question is not abstract. I have code that takes
quadratic time because it grows a pipeline of length proportional
to the input, though only a bounded number of the processes are
usefully active at any one time; the rest are cats. Splicing out
the cats would make it linear. Linear approaches that don't need
splice are not nearly as clean.
Doug
SPLICE(2)
SYNOPSIS
int splice(int fd0, int fd1);
DESCRIPTION
Splice connects the source for a reading file descriptor fd0
directly to the destination for a writing file descriptor fd1
and closes both fd0 and fd1. Either the source or the destination
must be another process (via a pipe). Data buffered for fd0 at
the time of splicing follows such data for fd1. If both source
and destination are processes, they become connected by a pipe. If
the source (destination) is a process, the file descriptor
in that process becomes write-only (read-only).
If file descriptor fd0 is associated with a pipe and fd1 is not,
then fd1 is updated to reflect the effect of buffered data for fd0,
and the pipe's other descriptor is replaced with a duplicate of fd1.
The same statement holds when "fd0" is exchanged with "fd1" and
"write" is exchanged with "read".
Splice's effect on any file descriptor propagates to shared file
descriptors in all processes.
NOTES
One file must be a pipe lest the spliced data stream have no
controlling process. It might seem that a socket would suffice,
ceding control to a remote system; but that would allow the
uncontrolled connection file-socket-socket-file.
The provision about a file descriptor becoming either write-only or
read-only sidesteps complications due to read-write file descriptors.
> From: Dave Horsfall <dave(a)horsfall.org>
> crt0.s -> C Run Time (support). It jiggers the stack pointer in some
> obscure manner
It's the initial startup; it sets up the arguments into the canonical C form,
and then calls main(). (It does not do the initial stack frame, a canonical
call to CSV from inside main() will do that.) Here are the exact details:
On an exec(), once the exec() returns, the arguments are available at the
very top of memory: the arguments themselves are at the top, as a sequence of
zero-terminated byte strings. Below them is an array of word pointers to the
arguments, with a -1 in the last entry. (I.e. if there are N arguments, the
array of pointers has N+1 entries, with the last being -1.) Below that is a
word containing the size of that array (i.e. N+1).
The Stack Pointer register points to that count word; all other registers
(including the PC) are cleared.
All CRT0.s does is move that argument count word down one location on the
stack, adjust the SP to point to it, and put a pointer to the argument
pointer table in the now-free word (between the argument count, and the first
element of the argument pointer table). Hence the canonical C main() argument
list of:
int argc;
int **argv;
If/when main() returns, it takes the return value (passed in r0) and calls
exit() with it. (If using the stdio library, that exit() flushes the buffers
and closes all open files.) Should _that_ return, it does a 'sys exit'.
There are two variant forms: fcrt0.s arranges for the floating point
emulation to be loaded, and hooked up; mcrt0.s (much more complicated)
arranges for process monitoring to be done.
Noel