> the downstream process is in the middle of a read call (waiting for
> more data to be put in the pipe), and it has already computed a pointer
> to the pipe's inode, and it's looping waiting for that inode to have
> data.
> So now I have to regroup and figure out how to deal with that. My most
> likely approach is to copy the inode data across
So I've had a good look at the pipe code, and it turns out that the simple
hack won't work, for two reasons.
First, the pipe on the _other_ side of the middle process is _also_ probably
in the middle of a write call, and so you can't snarf its inode out from
underneath it. (This whole problem reminds me of 'musical chairs' - I just
want the music to stop so everything will go quiet so I can move things
around! :-)
Second, if the process that wants to close down and do a splice is either the
start or end process, its neighbour is going to go from having a pipe to
having a plain file - and the pipe code knows the inode for a pipe has two
users, etc.
So I think it would be necessary to make non-trivial adjustments to the pipe
and file reading/writing code to make this work; either i) some sort of flag
bit to say 'you've been spliced, take appropriate action' which the pipe code
would have to check on being woken up, and then back out to let the main file
reading/writing code take another crack at it, or ii) perhaps some sort of
non-local goto to forcefully back out the call to readp()/writep(), back to
the start of the read/write sequence.
(Simply terminating the read/write call will not work, I think, because that
will often, AFAICT, return with 0 bytes transferred, which will look like an
EOF, etc; so the I/O will have to be restarted.)
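
To make option i) concrete, here is a minimal user-space sketch, not kernel code. Everything in it is hypothetical: the names pipe_sketch, readp_sketch, read_sketch and IO_RESTART are mine, and it only models the control flow. The pipe-level routine checks a 'spliced' flag after being woken and backs out with a distinct restart code (rather than 0, which would look like EOF), and the outer read code then retries against the new object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "back out and retry" code; deliberately not 0, since a
 * 0-byte return would look like EOF to the caller. */
enum { IO_RESTART = -2 };

/* Hypothetical stand-in for a pipe's state; not the V6 inode layout. */
struct pipe_sketch {
    int spliced;                      /* set by the (imagined) splice() */
    int nbytes;                       /* bytes currently available */
    struct pipe_sketch *replacement;  /* where the reader should look next */
};

/* Analogue of readp() at the point where it has just been woken up. */
int readp_sketch(struct pipe_sketch *p)
{
    assert(p != NULL);
    if (p->spliced)
        return IO_RESTART;            /* let the main read code try again */
    int n = p->nbytes;                /* otherwise consume what is there */
    p->nbytes = 0;
    return n;
}

/* Analogue of the main read path: re-resolve the target and restart. */
int read_sketch(struct pipe_sketch *p)
{
    int n;
    while ((n = readp_sketch(p)) == IO_RESTART) {
        assert(p->replacement != NULL);
        p = p->replacement;
    }
    return n;
}
```

The point of the sketch is only that the restart has to happen above the pipe code, since that is where the choice between pipe and plain file gets made.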
I'm not sure I want to do the work to make this actually work - it's not
clear if anyone is really that interested? And it's not something that I'm
interested in having for my own use.
Anyway, none of this is in any way a problem with the fundamental service
model - it's purely kernel implementation issues.
Noel
Ok, this is cheating a bit but I was wondering if I could possibly
compile my unix v6 version of unirubik which has working file IO and
run it under unix v5.
At first I couldn't figure out how to send a binary from unix v6 to
unix v5 but I did some experimenting and found:
tp m1r unirubik
which would output unirubik to mag tape #1 and
tp m1x unirubik
which would input unirubik from mag tape #1.
I don't know what cc does exactly but I thought "well if it compiles
to PDP-11 machine code and it's statically linked it could work". And
it actually does work!
I still want to try to get unirubik to compile under Unix v5 cc but
it's interesting that a program that uses iolib functions can work
under unix v5.
Mark
> From: Norman Wilson <norman(a)oclsc.org>
> I believe that when sync(2) returned, all unflushed I/O had been queued
> to the device driver, but not necessarily finished
Yes. I have just looked at update() (the internal version of 'sync') again,
and it does three things: writes out super-blocks, any modified inodes, and
(finally) any cached disk blocks (in that order).
In all three cases, the code calls (either directly or indirectly) bwrite(),
the exact operation of which (wait for completion, or merely schedule the
operation) on any given buffer depends on the flag bits on that buffer.
In at least one of the cases (the third), it sets the 'ASYNC' bit on the buffer,
i.e. it doesn't wait for the I/O to complete, merely schedules it. For the
first two, though, it looks like it probably waits.
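
As a rough illustration of the flag-dependent behaviour, here is a small user-space sketch. It is not the kernel's bwrite(); the struct, the flag values and the name bwrite_sketch are all made up for the example. It only shows the shape of the decision: with the async bit set the write is merely queued, and without it the caller waits for completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real kernel's names and values differ. */
#define SK_ASYNC 01   /* caller does not want to wait */
#define SK_DONE  02   /* I/O on this buffer has completed */

struct buf_sketch {
    int flags;
};

/* Returns 1 if the caller waited for completion, 0 if merely queued. */
int bwrite_sketch(struct buf_sketch *bp)
{
    assert(bp != NULL);
    int async = bp->flags & SK_ASYNC;  /* sample the flag before starting */
    bp->flags &= ~SK_DONE;             /* I/O is now outstanding */
    /* A real kernel would hand the buffer to the device driver here. */
    if (async)
        return 0;                      /* scheduled only; may still be in flight */
    bp->flags |= SK_DONE;              /* stand-in for sleeping until the I/O ends */
    return 1;
}
```

So after an async write returns, the buffer can still be in flight, which is why a sync that has returned does not by itself mean the disks are quiet.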
> so the second sync was just a time-filling no-op. If all the disks were
> in view, it probably sufficed just to watch them until all the lights
> ... had stopped blinking.
Yes. If the system is single-user, and you say 'sync', if you wait a bit for
the I/O to complete, any later syncs won't actually do anything.
I don't know of any programmatic way to make sure that all the disk I/O has
completed (although obviously one could be written); even the 'unmount' call
doesn't check to make sure all the I/O is completed (it just calls update()).
Watching the lights was as good as anything.
> I usually typed sync three or four times myself.
I usually just type it once, wait a moment, and then halt the machine. I've
never experienced disk corruption from so doing.
With modern ginormous disk caches, you might have to wait more than a moment,
but we're talking older machines here...
Noel
After a day and an evening of fighting with modern hardware,
the modern tangle that passes for UNIX nowadays, and modern
e-merchandising, I am too lazy to go look up the details.
But as I remember it, two syncs was indeed probably enough.
I believe that when sync(2) returned, all unflushed I/O had
been queued to the device driver, but not necessarily finished,
so the second sync was just a time-filling no-op. If all the
disks were in view, it probably sufficed just to watch them
until all the lights (little incandescent bulbs in those days,
not LEDs) had stopped blinking.
I usually typed sync three or four times myself. It gave me
a comfortable feeling (the opposite of a syncing feeling, I
suppose). I still occasionally type `sync' to the shell as
a sort of comfort word while thinking about what I'm going
to do next. Old habits die hard.
(sync; sync; sync)
Norman Wilson
Toronto ON
> From: Doug McIlroy <doug(a)cs.dartmouth.edu>
> Process A spawns process B, which reads stdin with buffering. B gets
> all it deserves from stdin and exits. What's left in the buffer,
> intended for A, is lost.
Ah. Got it.
The problem is not with buffering as a generic approach, the problem is that
you're trying to use a buffering package intended for simple,
straight-forward situations in one which doesn't fall into that category! :-)
Clearly, either B has to i) be able to put back data which was not for it
('ungets' as a system call), or ii) not read the data that's not for it - but
that may be incompatible with the concept of buffering the input (depending
on the syntax, and thus the ability to predict the approaching of the data B
wants, the only way to avoid the need for ungetc() might be to read a byte at
a time).
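
A sketch of option ii) in its bluntest form, assuming a POSIX-style read() and a newline as the boundary (both assumptions of mine, as is the name read_line_bytewise): B reads one byte per call and stops at the boundary, so anything after it stays in the kernel for A.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Read up to and including '\n', one byte per read() call, so that
 * nothing beyond the newline is ever consumed from fd.
 * Returns the number of bytes stored, or -1 on a read error. */
ssize_t read_line_bytewise(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t got = 0;
    while (got < cap) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0)
            return -1;        /* read error */
        if (n == 0)
            break;            /* EOF before a newline */
        buf[got++] = c;
        if (c == '\n')
            break;            /* boundary reached; stop here */
    }
    return (ssize_t)got;
}
```

This is correct but costs one system call per byte, which is exactly the cost that buffering was meant to avoid.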
If B and its upstream (U) are written together, that could be another way to
deal with it: if U knows where B's syntactical boundaries are, it can give it
advance warning, and B could then use a non-trivial buffering package to do
the right thing. E.g. if U emits 'records' with a header giving the record
length X, B could tell its buffering package 'don't read ahead more than X
bytes until I tell you to go ahead with the next record'.
Of course, that's not a general solution; it only works with prepared U's.
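
A minimal sketch of the prepared-U idea, assuming POSIX read() and a made-up framing of one length byte followed by that many data bytes (the framing and the names read_exactly and read_one_record are hypothetical). B reads the header, then reads exactly X bytes and no more:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly len bytes unless EOF or an error intervenes. */
static ssize_t read_exactly(int fd, unsigned char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n < 0)
            return -1;        /* read error */
        if (n == 0)
            break;            /* EOF: return what we have */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Hypothetical framing: one length byte X, then X data bytes.
 * Returns the record length, 0 on EOF at a record boundary, or -1 on
 * error or if the record does not fit in buf. */
ssize_t read_one_record(int fd, unsigned char *buf, size_t cap)
{
    assert(buf != NULL);
    unsigned char hdr;
    ssize_t n = read_exactly(fd, &hdr, 1);
    if (n != 1)
        return n < 0 ? -1 : 0;
    if (hdr > cap)
        return -1;
    return read_exactly(fd, buf, hdr);  /* never reads past this record */
}
```

Because each call stops at the record boundary, whatever follows the last record B wants is still in the pipe for the next reader.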
Really, the only general, efficient way to deal with that situation that I can
see is to add 'ungets' to the operating system...
Noel
>> From: Doug McIlroy <doug(a)cs.dartmouth.edu>
>> The spec below isn't hard: just hook two buffer chains together and
>> twiddle a couple of file descriptors.
> In thinking about how to implement it, I was thinking that if there was
> any buffered data in an output pipe, that the process doing the
> splice() would wait (inside the splice() system call) on all the
> buffered data being read by the down-stream process.
> ...
> As a side-benefit, if one adopted that line, one wouldn't have to deal
> with the case (in the middle of the chain) of a pipe-pipe splice with
> buffered data in both pipes (where one would have to copy the data
> across); instead one could just use the exact same code for both cases
So a couple of days ago I suffered a Big Hack Attack and actually wrote the
code for splice() (for V6, of course :-).
It took me a day or so to get 'mostly' running. (I got tripped up by pointer
arithmetic issues in a number of places, because V6 declares just about
_everything_ to be "int *", so e.g. "ip + 1" doesn't produce the right value
for sleep() if ip is declared to be "struct inode *", which is what I did
automatically.)
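
For anyone who hasn't been bitten by this, a small sketch of the effect in modern C. The struct here is a hypothetical stand-in, not the real V6 inode, and the function names are mine. The same "+ 1" advances by one int when the pointer is an "int *", but by a whole structure when it is a struct pointer, so the two declarations yield different addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel inode; the real layout differs. */
struct inode_sketch {
    int i_flag;
    int i_count;
    int i_dev;
    int i_number;
};

/* Bytes covered by "p + 1" when the pointer is treated as "int *". */
size_t step_as_int_ptr(struct inode_sketch *ip)
{
    assert(ip != NULL);
    int *p = (int *)ip;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Bytes covered by "ip + 1" when it is declared "struct inode_sketch *". */
size_t step_as_struct_ptr(struct inode_sketch *ip)
{
    assert(ip != NULL);
    return (size_t)((char *)(ip + 1) - (char *)ip);
}
```

The first returns sizeof(int) and the second sizeof(struct inode_sketch), so code expecting the "int *" value for a sleep()/wakeup() address gets a different address once the declaration changes.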
My code only had one real bug so far (I forgot to mark the user's channels as
closed, which resulted in their file entries getting sub-zero usage counts
when the middle (departing) process exited).
However, now I have run across a real problem: I was just copying the system
file table entry for the middle process' input channel over to the entry for
the downstream's input (so further reads on its part would read the channel
the middle process used to be reading). Copying the data from one entry to
another meant I didn't have to go chase down file table pointers in the other
process' U structure, etc.
Alas, this simple approach doesn't work.
Using the approach I outlined (where the middle channel waits for the
downstream pipe to be empty, so it can discard it and do the splice by
copying the file table entries) doesn't work, because the downstream process
is in the middle of a read call (waiting for more data to be put in the
pipe), and it has already computed a pointer to the pipe's inode, and it's
looping waiting for that inode to have data.
So now I have to regroup and figure out how to deal with that. My most likely
approach is to copy the inode data across (so I don't have to go mess with the
downstream process to get it to go look at another inode), but i) I want to
think about it a bit first, and ii) I have to check that it won't screw
anything else up if I move the inode data to another slot.
Noel
> From: Mark Longridge <cubexyz(a)gmail.com>
> I was wondering if there might be a better way to do a shutdown on
> early unix.
Not really; I don't seem to recall our having one on the MIT V6 machine.
(We did add a 'reboot' system call so we could reboot the machine without
having to take the elevator up to the machine room [the console was on our
floor, and the reboot() call just jumped into the hardware bootstrap], but in
the source it doesn't even bother to do an update(). Well, I shouldn't say
that: I only have the source for the kernel, which doesn't; I don't at the
moment have access to the source for the rest of the system - although I do
have some full dump tapes, once I can work out how to read them. Anyway, so
maybe the user command for rebooting the system did a sync() first.)
I suppose you could set the switch register to 173030 and send a 'kill -1 1',
which IIRC kills off all shells except the one on the console, but somehow
I doubt you're running multi-user anyway... :-)
Noel
>> the cp command seems different from all other versions, I'm not sure I
>> understand it so I used the mv command instead which worked as expected.
>
> I'm intrigued; in what way is it different?
It seems that one must first cp a file to another file then do a mv to
actually put it into a different directory:
e.g. while in /usr/src
as ctr0.s
cp a.out ctr0.o
mv ctr0.o /usr/lib
...rather than trying to just "cp a.out /usr/lib/ctr0.o"
Mark
Yes, an evil necessary to get things going.
The very definition of original sin.
Doug
Larry McVoy wrote:
>>>> For stdio, of course, one would need fsplice(3), which must flush the
>>>> in-process buffers--penance for stdio's original sin of said buffering.
>>> Err, why is buffering data in the process a sin? (Or was this just a
>>> humorous aside?)
>> Process A spawns process B, which reads stdin with buffering. B gets
>> all it deserves from stdin and exits. What's left in the buffer,
>> intended for A, is lost. Sinful.
> It really depends on what you want. That buffering is a big win for
> some use cases. Even on today's processors reading a byte at a time via
> read(2) is costly. Like 5000x more costly on the laptop I'm typing on:
> Err, why is buffering data in the process a sin? (Or was this just a
> humorous aside?)
Process A spawns process B, which reads stdin with buffering. B gets
all it deserves from stdin and exits. What's left in the buffer,
intended for A, is lost. Sinful.