> From: Doug McIlroy <doug(a)cs.dartmouth.edu>
> The spec below isn't hard: just hook two buffer chains together and
> twiddle a couple of file descriptors.
How amusing! I was about to send a message with almost the exact same
description - it even had the exact same syntax for the splice() call! A
couple of points from my thoughts which were not covered in your message:
In thinking about how to implement it, I was thinking that if there was any
buffered data in an output pipe, that the process doing the splice() would
wait (inside the splice() system call) on all the buffered data being read by
the down-stream process.
The main point of this is for the case where the up-stream is the head of the
chain (i.e. it's reading from a file), where one more or less has to wait,
because one will want to set the down-stream's file descriptor to point to
the file - but one can't really do that until all the buffered data was
consumed (else it will be lost - one can't exactly put it into the file :-).
As a side-benefit, if one adopted that line, one wouldn't have to deal with
the case (in the middle of the chain) of a pipe-pipe splice with buffered
data in both pipes (where one would have to copy the data across); instead
one could just use the exact same code for both cases, and in that case the
wait would be until the down-stream pipe can simply be discarded.
One thing I couldn't decide is what to do if the upstream is a pipe with
buffered data, and the downstream is a file - does one discard the buffered
data, write it to the file, abort the system call so the calling process can
deal with the buffered data, or what? Perhaps there could be a flag argument
to control the behaviour in such cases.
Speaking of which, I'm not sure I quite grokked this:
> If file descriptor fd0 is associated with a pipe and fd1 is not, then
> fd1 is updated to reflect the effect of buffered data for fd0, and the
> pipe's other descriptor is replaced with a duplicate of fd1.
But what happens to the data? Is it written to the file? (That's the
implication, but it's not stated directly.)
> The same statement holds when "fd0" is exchanged with "fd1" and "write"
> is exchanged with "read".
Ditto - what happens to the data? One can't simply stuff it into the input
file. I think the 'wait in the system call until it drains' approach is
better.
Also, it seemed to me that the right thing to do was to bash the entry in the
system-wide file table (i.e. not the specific pointers in the u area). That
would automatically pick up any children.
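
To make the file-table point concrete, here is a minimal sketch in C of a toy
model, not kernel code. All the names (struct file, struct proc, u_ofile,
NSLOTS, share_fd(), retarget_entry(), inode_of()) are made up for
illustration, loosely after early Unix naming, and the table entry is reduced
to the one field that matters here. The assumption is simply that each
process holds pointers into a shared table, so changing the entry changes
what every sharer sees:

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 15   /* illustrative number of per-process descriptor slots */

struct inode { int number; };                 /* stand-in for a file or a pipe */
struct file  { struct inode *f_inode; int f_count; };  /* one system-wide entry */
struct proc  { struct file *u_ofile[NSLOTS]; };        /* pointers into the table */

/* fork-like sharing: the child's slot points at the parent's table entry */
void share_fd(struct proc *child, const struct proc *parent, int fd)
{
    assert(fd >= 0 && fd < NSLOTS);
    child->u_ofile[fd] = parent->u_ofile[fd];
    if (child->u_ofile[fd] != NULL)
        child->u_ofile[fd]->f_count++;
}

/* Change the table entry itself, not any one process's pointer to it. */
void retarget_entry(struct file *fp, struct inode *target)
{
    assert(fp != NULL && target != NULL);
    fp->f_inode = target;
}

/* What a given descriptor in a given process currently refers to. */
struct inode *inode_of(const struct proc *p, int fd)
{
    assert(fd >= 0 && fd < NSLOTS && p->u_ofile[fd] != NULL);
    return p->u_ofile[fd]->f_inode;
}
```

After share_fd(), a single retarget_entry() call on the parent's entry is
visible through the child's descriptor as well, with no walk over the
per-process pointers.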
Finally, there are 'potential' security issues (I say 'potential' because I'm
not sure they're really problems). For instance, suppose that an end process
(i.e. reading/writing a file) has access to that file (e.g. because it
executed a SUID program), but its neighbour process does not. If the end
process wants to go away, should the neighbour process be allowed access to
the file? A 'simple' implementation would do so (since IIRC file permissions
are only checked at open time, not read/write time).
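
The 'checked at open time' behaviour can be seen on a modern POSIX system
with a small sketch like the one below. This is only an illustration under
stated assumptions: it uses today's POSIX calls (mkstemp, fchmod), not
anything from the early systems under discussion, and the function name
read_after_chmod() and the /tmp path template are made up. It uses a single
process, but a descriptor handed to a neighbour process would behave the same
way:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write msg to a fresh temporary file, remove every permission bit from
 * the file, then read the data back through the descriptor that was
 * opened earlier.  Returns the number of bytes read, or -1 on failure.
 */
ssize_t read_after_chmod(const char *msg, char *out, size_t outlen)
{
    char path[] = "/tmp/spliceperm-XXXXXX";   /* hypothetical name template */
    ssize_t n = -1;
    int fd;

    assert(msg != NULL && out != NULL);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd, msg, len) == (ssize_t)len
        && fchmod(fd, 0) == 0             /* mode 000: a new open() would be checked */
        && lseek(fd, 0, SEEK_SET) == 0)
        n = read(fd, out, outlen);        /* the existing descriptor still works */

    close(fd);
    unlink(path);
    return n;
}
```

The read succeeds because the access check happened when the descriptor was
created; nothing re-checks the mode bits on each read or write.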
I don't pretend that this is a complete list of issues - just what I managed
to think up while considering the new call.
> For stdio, of course, one would need fsplice(3), which must flush the
> in-process buffers--penance for stdio's original sin of said buffering.
Err, why is buffering data in the process a sin? (Or was this just a
humorous aside?)
Noel
Larry wrote in separate emails
> If you really think that this could be done I'd suggest trying to
> write the man page for the call.
> I already claimed splice(2) back in 1998; the Linux guys did
> implement part of it ...
I began to write the following spec without knowing that Linux had
appropriated the name "splice" for a capability that was in DTSS
over 40 years ago under a more accurate name, "copy". The spec
below isn't hard: just hook two buffer chains together and twiddle
a couple of file descriptors. For stdio, of course, one would need
fsplice(3), which must flush the in-process buffers--penance for
stdio's original sin of said buffering.
Incidentally, the question is not abstract. I have code that takes
quadratic time because it grows a pipeline of length proportional
to the input, though only a bounded number of the processes are
usefully active at any one time; the rest are cats. Splicing out
the cats would make it linear. Linear approaches that don't need
splice are not nearly as clean.
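
The quadratic-versus-linear claim can be put as a rough cost model. The
sketch below rests on assumptions that are not in the message: that the
pipeline gains one stage per input item, so the i-th item is copied through
i stages, and that with splicing every item passes through at most a fixed
number of active stages. The function names are made up for the
illustration:

```c
#include <assert.h>

/* Without splicing: item i (counting from 1) is copied through i stages. */
unsigned long copies_unspliced(unsigned long n)
{
    return n * (n + 1) / 2;          /* 1 + 2 + ... + n */
}

/* With the cats spliced out: each item crosses at most 'active' stages. */
unsigned long copies_spliced(unsigned long n, unsigned long active)
{
    assert(active >= 1);
    return n * active;
}
```

Under those assumptions, 1000 items cost 500500 copies unspliced but only
3000 with three active stages. (Both functions ignore overflow for very
large n.)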
Doug
SPLICE(2)

SYNOPSIS
       int splice(int fd0, int fd1);

DESCRIPTION
       Splice connects the source for a reading file descriptor fd0
       directly to the destination for a writing file descriptor fd1
       and closes both fd0 and fd1. Either the source or the destination
       must be another process (via a pipe). Data buffered for fd0 at
       the time of splicing follows such data for fd1. If both source
       and destination are processes, they become connected by a pipe. If
       the source (destination) is a process, the file descriptor
       in that process becomes write-only (read-only).

       If file descriptor fd0 is associated with a pipe and fd1 is not,
       then fd1 is updated to reflect the effect of buffered data for fd0,
       and the pipe's other descriptor is replaced with a duplicate of fd1.
       The same statement holds when "fd0" is exchanged with "fd1" and
       "write" is exchanged with "read".

       Splice's effect on any file descriptor propagates to shared file
       descriptors in all processes.

NOTES
       One file must be a pipe lest the spliced data stream have no
       controlling process. It might seem that a socket would suffice,
       ceding control to a remote system; but that would allow the
       uncontrolled connection file-socket-socket-file.

       The provision about a file descriptor becoming either write-only or
       read-only sidesteps complications due to read-write file descriptors.
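
The ordering rule in the DESCRIPTION ("Data buffered for fd0 at the time of
splicing follows such data for fd1") can be illustrated with a toy in-memory
model. This is only a sketch under stated assumptions: struct pipebuf,
PIPE_CAP and splice_buffers() are made-up names, each pipe is modelled as one
flat buffer, and the "no room" case simply fails rather than blocking as a
real implementation might:

```c
#include <assert.h>
#include <string.h>

#define PIPE_CAP 4096                /* illustrative pipe capacity */

struct pipebuf {
    char   data[PIPE_CAP];           /* bytes written but not yet read */
    size_t len;
};

/*
 * Merge the data buffered behind fd0 (the upstream pipe) into the buffer
 * behind fd1 (the downstream pipe), keeping fd1's older data first.
 * Returns 0 on success, -1 if the merged data would not fit.
 */
int splice_buffers(struct pipebuf *fd0_buf, struct pipebuf *fd1_buf)
{
    assert(fd0_buf != NULL && fd1_buf != NULL && fd0_buf != fd1_buf);
    if (fd0_buf->len > PIPE_CAP - fd1_buf->len)
        return -1;
    memcpy(fd1_buf->data + fd1_buf->len, fd0_buf->data, fd0_buf->len);
    fd1_buf->len += fd0_buf->len;
    fd0_buf->len = 0;                /* the upstream pipe is now empty */
    return 0;
}
```

With "ab" already buffered for fd1 and "cd" buffered for fd0, the downstream
reader sees "abcd", which is the order it would have seen had the splicing
process kept copying.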
> From: Dave Horsfall <dave(a)horsfall.org>
> crt0.s -> C Run Time (support). It jiggers the stack pointer in some
> obscure manner
It's the initial startup; it sets up the arguments into the canonical C form,
and then calls main(). (It does not do the initial stack frame; a canonical
call to CSV from inside main() will do that.) Here are the exact details:
On an exec(), once the exec() returns, the arguments are available at the
very top of memory: the arguments themselves are at the top, as a sequence of
zero-terminated byte strings. Below them is an array of word pointers to the
arguments, with a -1 in the last entry. (I.e. if there are N arguments, the
array of pointers has N+1 entries, with the last being -1.) Below that is a
word containing the size of that array (i.e. N+1).
The Stack Pointer register points to that count word; all other registers
(including the PC) are cleared.
All CRT0.s does is move that argument count word down one location on the
stack, adjust the SP to point to it, and put a pointer to the argument
pointer table in the now-free word (between the argument count, and the first
element of the argument pointer table). Hence the canonical C main() argument
list of:
    int argc;
    char **argv;
If/when main() returns, it takes the return value (passed in r0) and calls
exit() with it. (If using the stdio library, that exit() flushes the buffers
and closes all open files.) Should _that_ return, it does a 'sys exit'.
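
The shuffle described above can be mimicked in C on a toy stack. This is a
sketch of the idea only, not a transcription of crt0.s: the stack is modelled
as an array of words in which a higher index means a higher address, and the
names word and crt0_shuffle() are made up. The count word is taken as given,
whatever value exec() left there:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t word;               /* one stack word, wide enough for a pointer */

/*
 * On entry stack[sp] is the count word and stack[sp + 1] onward is the
 * argument pointer table.  Move the count down one word and use the freed
 * word to point at the table, giving the layout main() expects:
 *     stack[sp - 1] = argc, stack[sp] = argv.
 * Returns the new stack pointer index.
 */
size_t crt0_shuffle(word *stack, size_t sp)
{
    assert(stack != NULL && sp >= 1);
    stack[sp - 1] = stack[sp];             /* count word moves down */
    stack[sp] = (word)&stack[sp + 1];      /* freed word now points at the table */
    return sp - 1;
}
```

After the call, the word at the new stack pointer is the count and the word
above it is the address of the first table entry, which is exactly the pair
of arguments main() receives.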
There are two variant forms: fcrt0.s arranges for the floating point
emulation to be loaded, and hooked up; mcrt0.s (much more complicated)
arranges for process monitoring to be done.
Noel
Hi folks,
Yes I have managed to compile Hello World on v1/v2.
The cp command seems different from all other versions; I'm not sure I
understand it, so I used the mv command instead, which worked as
expected.
I had to "as crt0.s" and put crt0.o in /usr/lib and then it compiled
without issue.
Is the kernel in /etc? I saw a core file in /etc that looked like it
would be about the right size. No unix file in the root directory
which surprised me.
At least I know what crt0.s does now. I guess a port of unirubik to
v1/v2 is in the cards (maybe).
Mark
Hi folks,
I'm interested in comparing notes with C programmers who have written
programs for Unix v5, v6 and v7.
Also I'm interested to know if there's anything similar to the scanf
function for unix v5. Stdio and iolib I know well enough to do file IO
but v5 predates iolib.
Back in 1988 I tried to write a universal rubik's cube program which I
called unirubik and after discovering TUHS I tried to backport it to
v7 (which was easy) and v6 (which was a bit harder) and now I'm trying
to backport it to v5. The v5 version doesn't have any
file IO capability as yet. Here are a few links to the various
versions:
http://www.maxhost.org/other/unirubik.c.v7
http://www.maxhost.org/other/unirubik.c.v6
http://www.maxhost.org/other/unirubik.c.v5
Also I've compiled the file utility from v6 in v5 and it seemed to
work fine. Once I got /dev/mt0 working for unix v5 (thanks to Warren's
help) I transferred the binary for the paging utility pg into it. This
version of pg I believe was from 1BSD.
I did some experimenting with math functions which can be seen here:
http://www.maxhost.org/other/math1.c
This will compile on unix v5.
My initial impression of Unix v5 was that it was a primitive and
almost unusable version of Unix but now that I understand it a bit
better it seems a fairly complete system. I'm a bit foggy on what the
memory limits are with v5 and v6. Unix v7 seems to run under simh
emulating a PDP-11/70 with 2 megabytes of RAM (any more than that and
the kernel panics).
Also I'd be interested in seeing the source code for Ken Thompson's
APL interpreter for Unix v5. I know it does exist as it is referenced
in the Unix v5 manual. The earliest version I could find was dated Oct
1976 and I've written some notes on it here:
http://apl.maxhost.org/getting-apl-11-1976-to-work.txt
Ok, that's about it for now. Is there any chance of going further back
to v4, v3, v2 etc?
Mark
Here's the e-mail that I sent on to Mark in the hope that it would
give him enough information to get his 5th Edition kernel working
with a tape device. He has also now joined the list. Welcome aboard, Mark.
Warren
----- Forwarded message from Warren Toomey <wkt(a)tuhs.org> -----
On Thu, Jul 10, 2014 at 05:56:04PM -0400, Mark Longridge wrote:
> There was no m40.s in v5 so I substituted mch.s for m40.s and that
> seemed to create a kernel and it booted but I can't access /dev/mt0.
Mark, glad to hear you were able to rebuild the kernel. I've never tried
on 5th Edition. Just reading through the 6th Edition docs, it says this:
-----
Next you must put in all of the special files in the
directory /dev using mknod‐VIII. Print the configuration
file c.c created above. This is the major device switch of
each device class (block and character). There is one line
for each device configured in your system and a null line
for place holding for those devices not configured. The
block special devices are put in first by executing the following
generic command for each disk or tape drive. (Note
that some of these files already exist in the directory
/dev. Examine each file with ls‐I with −l flag to see if
the file should be removed.)
     /etc/mknod /dev/NAME b MAJOR MINOR
The NAME is selected from the following list:
     c.c    NAME    device
     rf     rf0     RS fixed head disk
     tc     tap0    TU56 DECtape
     rk     rk0     RK03 RK05 moving head disk
     tm     mt0     TU10 TU16 magtape
     rp     rp0     RP moving head disk
     hs     hs0     RS03 RS04 fixed head disk
     hp     hp0     RP04 moving head disk
The major device number is selected by counting the line
number (from zero) of the device’s entry in the block configuration
table. Thus the first entry in the table bdevsw
would be major device zero.
The minor device is the drive number, unit number or
partition as described under each device in section IV. The
last digit of the name (all given as 0 in the table above)
should reflect the minor device number. For tapes where the
unit is dial selectable, a special file may be made for each
possible selection.
The same goes for the character devices. Here the
names are arbitrary except that devices meant to be used for
teletype access should be named /dev/ttyX, where X is any
character. The files tty8 (console), mem, kmem, null are
already correctly configured.
The disk and magtape drivers provide a ‘raw’ interface
to the device which provides direct transmission between the
user’s core and the device and allows reading or writing
large records. The raw device counts as a character device,
and should have the name of the corresponding standard block
special file with ‘r’ prepended. Thus the raw magtape files
would be called /dev/rmtX.
When all the special files have been created, care
should be taken to change the access modes (chmod‐I) on
these files to appropriate values.
-----
Looking at the c.c generated, it has:
    int (*bdevsw[])()
    {
        &nulldev, &nulldev, &rkstrategy, &rktab,
        &tmopen, &tmclose, &tmstrategy, &tmtab, /* 1 */
        &nulldev, &tcclose, &tcstrategy, &tctab,
        0
    };
    int (*cdevsw[])()
    {
        &klopen, &klclose, &klread, &klwrite, &klsgtty,
        &nulldev, &nulldev, &mmread, &mmwrite, &nodev,
        &nulldev, &nulldev, &rkread, &rkwrite, &nodev,
        &tmopen, &tmclose, &tmread, &tmwrite, &nodev, /* 3 */
        &dcopen, &dcclose, &dcread, &dcwrite, &dcsgtty,
        &lpopen, &lpclose, &nodev, &lpwrite, &nodev,
        0
    };
Following on from the docs, you should be able to make the /dev/mt0
device file by doing:
    /etc/mknod /dev/mt0 b 1 0
And possibly also:
    /etc/mknod /dev/rmt0 c 3 0
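
For anyone wanting to double-check the major numbers, the "count the rows
from zero" rule can be written out as a small C sketch. The two row lists are
my reading of the c.c shown above, one driver prefix per row in the order the
rows appear; the names bdev_rows, cdev_rows and major_of() are made up for
the illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Driver prefix of each row, in the order the rows appear in c.c. */
static const char *const bdev_rows[] = { "rk", "tm", "tc" };
static const char *const cdev_rows[] = { "kl", "mm", "rk", "tm", "dc", "lp" };

/* Major number = zero-based row index; -1 if the driver is not configured. */
int major_of(const char *const rows[], size_t nrows, const char *prefix)
{
    assert(rows != NULL && prefix != NULL);
    for (size_t i = 0; i < nrows; i++)
        if (strcmp(rows[i], prefix) == 0)
            return (int)i;
    return -1;
}
```

Looking up "tm" gives 1 in the block table and 3 in the character table,
matching the /* 1 */ and /* 3 */ comments in the generated file.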
Cheers,
Warren
All, just received this from a fellow who isn't on the TUHS mail list (yet).
I've answered him about using mknod (after reading the 6e docs: we don't have
5e docs). I thought I'd forward the e-mail here as a record of an attempt to
rebuild the 5e kernel.
Cheers, Warren
----- Forwarded message from Mark -----
I hope you don't mind me asking you about compiling the unix v5
kernel. I haven't been able to find any documentation for it.
I tried this:
    ./mkconf
    rk
    tm
    tc
    dc
    lp
    ctrl-d
    # as mch.s
    # mv a.out mch.o
    # cc -c c.c
    # as l.s
    # ld -x a.out mch.o c.o ../lib1 ../lib2
There was no m40.s in v5 so I substituted mch.s for m40.s and that
seemed to create a kernel and it booted but I can't access /dev/mt0.
Any pointers are appreciated. Thanks for all your work on early unix,
I thought it was very interesting.
Mark
----- End forwarded message -----
PS: I see I have over-generalized the problem. Doug's original message said "a
process could excise itself from a pipeline". So presumably the initiation
would come from process2 itself, and it would know when it had no
internally-buffered data.
So now we're back to the issue of 'either we need a system call to merge two
pipes into one, or the process has to hang around and turn itself into a cat'.
Noel
> From: Larry McVoy <lm(a)mcvoy.com>
> Making what you are talking about work is gonna be a mess of buffer
> management and it's going to be hard to design system calls that would
> work and still give you reasonable semantics on the pipe. Consider
> calls that want to know if there is data in the pipe
Oh, I didn't say it would work well, and cleanly! :-) I mean, taking one
element in an existing, operating, chain, and blowing it away, is almost
bound to cause problems.
My previous note was merely to say that the file descriptor/pipe
re-arrangement involved might be easier done with a system call - in fact, now
that I think about it, as someone has already sort of pointed out, without a
system call to merge the two pipes into one, you have to keep the middle
process around, and have it turn into a 'cat'.
Thinking out loud for a moment, though, along the lines you suggest....
Here's one problem - suppose process2 has read some data, but not yet
processed it and output it towards process3, when you go to do the splice.
How would anything outside the process (be it the OS, or the command
interpreter or whatever is initiating the splice) even detect that, much less
retrieve the data?
Even using a heuristic such as 'wait for process2 to try and read data, at
which point we can assume that it no longer has any internally buffered data,
and it's OK to do the splice' fails, because process2 may have decided it
didn't have a complete semantic unit in hand (e.g. a complete line), and
decided to go back and get the rest of the unit before outputting the
complete, processed semantic unit (i.e. including data it had previously
buffered internally).
And suppose the reads _never_ happen to coincide with the semantic units
being output; i.e. process2 will _always_ have some buffered data inside it,
until the whole chain starts to shut down with EOFs from the first stage?
In short, maybe this problem isn't solvable in the general case. In which
case I guess we're back to your "Every utility that you put in a pipeline
would have to be reworked".
Stages would have to have some way to say 'I am not now holding any buffered
data', and only when that state was true could they be spliced out. Or there
could be some signal defined which means 'go into "not holding any buffered
data" state'. At which point my proposed splice() system call might be some
use... :-)
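
As a rough illustration of what a 'not holding any buffered data' state could
look like, here is a sketch in C of a toy line-oriented stage. Everything in
it is an assumption made for the example: the names struct stage, HOLD_CAP,
stage_feed() and stage_is_quiescent() are made up, the semantic unit is taken
to be a newline-terminated line, and overflow is handled only by assert():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HOLD_CAP 256                 /* illustrative limit on one held line */

struct stage {
    char   held[HOLD_CAP];           /* partial line read but not yet output */
    size_t nheld;
};

/*
 * Consume n input bytes.  Each completed line is copied to out; an
 * unfinished line stays inside the stage.  Returns the bytes written to out.
 */
size_t stage_feed(struct stage *s, const char *in, size_t n,
                  char *out, size_t outcap)
{
    size_t nout = 0;

    assert(s != NULL && in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        assert(s->nheld < HOLD_CAP);
        s->held[s->nheld++] = in[i];
        if (in[i] == '\n') {                     /* semantic unit complete */
            assert(nout + s->nheld <= outcap);
            memcpy(out + nout, s->held, s->nheld);
            nout += s->nheld;
            s->nheld = 0;
        }
    }
    return nout;
}

/* The stage may be spliced out only when it holds no buffered data. */
int stage_is_quiescent(const struct stage *s)
{
    return s->nheld == 0;
}
```

Feeding it "one\ntw" emits "one\n" and leaves "tw" inside, so the stage is
not quiescent even though it is about to read again; only after the rest of
the line arrives does stage_is_quiescent() become true.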
Noel
> From: Larry McVoy <lm(a)mcvoy.com>
> Every utility that you put in a pipeline would have to be reworked to
> pass file descriptors around
Unless the whole operation is supported in the OS directly:
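    if (((pipe1 = process1->stdout) == process2->stdin) &&
        ((pipe2 = process2->stdout) == process3->stdin)) {
            /* carry over any data still buffered in pipe1 */
            prepend_buffer_contents(pipe1, pipe2);
            /* process1 now writes straight into process3's input pipe */
            process1->stdout = process2->stdout;
            kill_pipe(pipe1);
    }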
to be invoked from the chain's parent (e.g. shell).
(The code would probably want to do something with process2's stdin and
stdout, like close them; I wouldn't have the call kill process2 directly, that
could be left to the parent, except in the rare cases where it might have some
use for the spliced-out process.)
Noel