On 2016-03-18 03:00, Warren Toomey <wkt(a)tuhs.org> wrote:
> It's a bit off-topic, but what were non-Unix filesystems like around 1969-1970?
> The PDP-7 filesystem has i-nodes (file metadata) and filenames separate
> from the i-nodes. This allows hard links and thus a non-tree structured
> filesystem.
>
> This has always struck me to be one of the most important features of
> the Unix filesystem: names separated from the rest of the file metadata,
> and arbitrary hard links so that there is no preferred filename.
>
> Were these features in other contemporaneous filesystems?
I don't know exactly how contemporary ODS-1 is. It's the file system
used on RSX-11, and I think it should atleast trace back to around 1972,
but I can't say more for sure.
Anyway, ODS-1, just like the Unix file system, have the directory hold
just the filename and a file identifier (pretty much the same as inode
number). There are of course some differences in details, but I would
say it is very similar to how Unix works. ODS-1 do not have reference
counting, but instead allows dangling directory entries that do not
point to valid files. Instead ODS-1 have a generation counter for each
file identifier, so that when it is reused, links to the old file will
not accidentally refer to the new file.
I would think that something like Multics had something similar, but I
have no idea about that one...
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt(a)softjar.se || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol
> Were these features [arbitrary hard links] in other contemporaneous filesystems?
Multics had arbitary "links", which were distinguished from "branches"--the
actual file. Links and branches coexisted in directories. Unix was consciously
derived from this model, but simplified so that only links have names
and branches live elsewhere (the inode list). The architecture is
described in http://www.multicians.org/fjcc4.html. This paper is
dates from 1964 or 1965; it was certainly authoritative in 1969.
I don't know whether it evolved further.
Christopher Strachey and Joe Stoy independently conceived a model
isomorphic to Unix for OS 6 at Oxford. The two systems were
essentially contemporaneous.
I am not aware of other systems that allowed multiple names for
one file, though clearly the scent was in the air at the time.
doug
> From: Johnny Billquist
> I would think that something like Multics had something similar
No, as far as I know, Multics was always 'old-style' - the directory contained
not just the name of the file, but also all the file's meta-data. Multics had
only symbolic links, I'm pretty sure.
To answer the original question, I can't think offhand of another system that
separated naming and file meta-data before Unix did it. I've always assumed
that that was one of Unix' novel ideas.
Noel
$ pdp7 unixv0.simh # Run the PDP-7 Unix kernel on SimH
PDP-7 simulator V4.0-0 Beta
sim> c # Start execution
login: ken
password: ken
@ date
Thu Jan 01 1970 00:00:05
@ ls -l
00004 drw-- 01 777 00050 dd
00035 drwr- 01 012 00060 .
00003 drw-- 01 777 00270 system
00036 lrwr- 01 777 00305 date
00037 lrwr- 01 777 00441 ls
@
inum perms lnks uid size file
Root was user-id -1, but the octal print routine sees as an unsigned
int and prints it truncated to three octal digits, 777.
So, the kernel boots and runs.
Cheers, Warren
(Posted to both The Unix Heritage Society and the TZ mailing list)
I've been off-and-on reading the "live minus thirty years" old usenet
feed at olduse.net, and noticed something that may be of interest to
both of these groups: The original mod.sources posting of the (as far as
I can tell) earliest available version of Arthur David Olson's timezone
handling code, in 1986.
https://groups.google.com/d/msg/mod.sources/gcolqTxTt9w/04ZtaYCxLvcJ
For the files present in both, it matches revision 7441f6b6 from the git
repository, except for SCCS IDs vs %W%.
https://github.com/eggert/tz/tree/7441f6b6705782743f40b9fc40cdcc80a498fda5
The git repository contains a file ialloc.c that is not present in the
release.
Probable renamed files - These appear in the git repository under their
new
names, but had the older names in the release.
New: localtime.c newctime.3 zdump.c zic.8 zic.c
Old: tzcomp.8 tzcomp.c tzdump.c settz.c settz.3
Files in the release but not this version of the git repository:
mkdir.c strchr.c: These never appear, though they're referenced in
Makefile edits.
pacificnew: doesn't appear until SCCS version 8.1 in revision aaf2a927
dated July 2006.
years.sh: Appears as SCCS 7.1 yearistype.sh, dated March 1992.
According to Ken, the inspiration for ++ and -- came
from the PDP-7 hardware, not from previous languages.
The PDP-7 supported only ++i, where i had to be one
of only a few memory addresses. "For symmetry", Ken
says, he put all four operations in B. (And he did
not use the hardware autoincrement.)
Doug
All,
Is there a good source of information about the Unix v6 filesystem
outside of the source code itself? Also, is there a source for the
history of the early Unix filesystems from v6 onward?
Thanks,
Will
> From: Brantley Coile
> But B's ++ and -- operators seem to be unique.
B seems to be like UNIX itself in this regard: a carefully selected set of
old ideas, along with a few new ones of equal value.
Noel
>> https://www.bell-labs.com/usr/dmr/www/kbman.html
>> https://www.bell-labs.com/usr/dmr/www/bintro.html
> Yup, there certainly were different versions of B.
Yes, kbman covers only one of the two implementations that
cohabited the PDP-11. The other was the same language, with
software paging, so it could have a larger data space.
Various aspects of the language were borrowed from PL/I,
BCPL and Algol 68. ++ and -- were novel operators. The
reversal of Algol's assignment operators (e.g. -=
became =-) was eventually repealed in C.
doug
e
So the PDP-7 code from Norman has a B interpreter. I know the history:
BCPL -> B -> NB -> C, but I don't recall seeing a decent decription of
the B language. Does anybody know of such a document? We'll need something
like this so we can use the interpreter once we get it working :-)
Cheers, Warren
I grabbed a copy as well. A quick grep showed something I had forgotten:
I ran a Source redistribution service.
David
>
> On Tue, 2 Feb 2016, Warren Toomey wrote:
>
>> This is temporarily at http://www.tuhs.org/Usenet/Usenet.tar.bz2 if
>> anybody else would like to grab it.
>
> Suitably grabbed :-) I know, I must finish my mirror some day (got a few
> health issues right now)...
>
At www.skeeve.com/Usenet.tar.bz2 is a copy of UUNET's archives of the
various USENET source newsgroups. I created this file on September 2 2004.
I made it for myself, since it was clear that uu.net would disappear
sometime soon...
It's 139 Meg - Warren maybe you can put it into the archives and everyone
else can get it from there? I think the person who hosts www.skeeve.com
has some monthly limits on data transfer and I don't want him blown
out of the water.
Thanks,
Arnold
> I'm still trying to get my around about how a program such as "egrep"
> which handles complex patterns can be faster than one that doesn't
> Is there a simple explanation, involving small words?
First, the big-word explanation. Grep is implemented as a nondeterministic
finite automaton, egrep as a deterministic finite automaton. Theory folk
abbreviate these terms to "NFA" and "DFA", but these are probably not
what you had in mind as "small words".
The way grep works is to keep a record of possible partial parsings
as it scans the subject text; it doesn't know which (if any) will
ultimately be the right one, hence "nondeterministic". For example,
suppose grep seeks pattern '^a.*bbbc' in text 'a.*bbbbbbc'. When
it has read past 3 b's, the possible partial parses are 'a.*', 'a.*b',
'a.*bb' and 'a.*bbb'. If it then reads another b, it splits the
first partial parse into 'a.*' and 'a.*b', updates the next two
to 'a.*bb' and 'a.*bbb', and kills off the fourth. If instead it
reads a c, recognition is successful; if anything else, all partials
are killed and recognition fails.
Egrep, by preprocessing the expression, produces separate code for
each of several possible states: "no b's", "1 b", 2 b's" and "3 bs".
When it's in state "1 b", for example, it switches on the next
character into "2 b's" or fails, depending on whether the
character is b or not--very fast. Grep, on the other hand, has to
update all the live parses.
So if egrep is so fast, why do we have grep? One reason is that
grep only needs to keep a list of possible progress points
in the expression. This list can't be longer than the expression.
In egrep, though, each state is a subset of progress points.
The number of possible subsets is exponential in the length
of the expression, so the recognition machine that egrep constructs
before attempting the parse can explode--perhaps overflowing
memory, or perhaps taking so long in construction that grep
can finish its slow parse before egrep even begins its fast parse.
To revert to the words of theory folks, grep takes time O(m*n) and
space O(m) worst case, while egrep takes time and space O(2^m+n).
(2^m is an overestimate, but you get the idea.)
That's the short story. In real life egrep overcomes the exponential
by lazily constructing the machine--not generating a state until it
is encountered in the parse, so no more than n states get constructed.
It's a complex program, though, for the already fancy preprocessing
must be interleaved with the parsing.
Egrep is a tour de force of classical computer science, and pays
off big on scanning big files. Still, there's a lot to say for
the simple elegance of grep (and the theoretical simplicity
of nondeterministic automata). On small jobs it can often win.
And it is guaranteed to work in O(m) memory, while egrep may need
O(n).
-------------------------------------------------
Ken Thompson invented the grep algorithm and published it in CACM.
A pointy-headed reviewer for Computing Reviews scoffed: why would
anybody want to use the method when a DFA could do the recognition
in time O(n)? Of course the reviewer overlooked the potentially
exponential costs of constructing a DFA.
Some years later Al Aho wrote the more complicated egrep in the
expectation that bad exponential cases wouldn't happen in everyday life.
But one did. This job took 30 seconds' preprocessing to prepare
for a fraction of a second's parsing. Chagrined, Al conceived
the lazy-evaluation trick to overcome the exponential and
achieved O(n) run time, albeit with a big linear constant.
In regard to the "short history of grep", I have always thought
my request that Ken liberate regular expressions from ed caused
him to write grep. I asked him one afternoon, but I can't remember
whether I asked in person or by email. Anyway, next morning I
got an email message directing me to grep. If Ken took it
from his hip pocket, I was unaware. I'll have to ask him.
Doug
A friend just sent me a pointer to this site, which appears not
to have been mentioned on this list before:
PDP 11/70 Emulator
http://skn.noip.me/pdp11/pdp11.html
The site lists these working guest O/Ses:
RL0 BSD 2.9
RL1 RSX 11M v3.2
RL2 RSTS/E v7.0
RL3 XXDP
RK0 Unix V5
RK1 RT11 v4.0
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
All, I've got the original PDP-7 cat working in my PDP-7 user-mode
simulator called "a7out". The cat.s source code and the a7out program are
in the Github repository in src/cmds and tools, respectively.
The repository is at https://github.com/DoctorWkt/pdp7-unix
The mailing list is at http://minnie.tuhs.org/pipermail/pdp7-unix/
I'm attaching the a.out assembly output as the as7 assembler in the
repository currently is not quite ready for cat.s.
To run the original cat, create a text file, e.g. echo hello > file1
$ ./a7out a.out file1
hello <- the output of the PDP-7 machine code
Cheers, Warren
P.S Thanks again to Norman for getting us the scans.
> Norman Wilson is going to try and get us some higher quality scans which
will help a great deal in deciphering some of the hard to read characters.
A second scan, high or low quality, is a tremendous help. Diffing them
is a really good way to spot trouble.
Doug
On Thu, Feb 25, 2016 at 01:43:03PM -0500, Robert Swierczek wrote:
> Do you know if anybody has taken up the challenge of transcribing and
> simulating the PDP-7 Unix source code you have uncovered in your
> post http://minnie.tuhs.org/pipermail/tuhs/2016-February/006622.html
> If not, I would love to get started on it as a project.
Hi Robert, yes there is a project underway to type it all in and bring it
up on SimH and hopefully on a real PDP-7. I've set up a mailing list for
the project, so let me know if you would like to join: I'll add you.
The repository is at https://github.com/DoctorWkt/pdp7-unix. I've started
on the S1 section (in scans/), and I've also started work on an assembler
and a user-mode simulator (in tools/)
Norman Wilson is going to try and get us some higher quality scans which
will help a great deal in deciphering some of the hard to read characters.
Cheers, Warren
> From: Random832
> They're 24 bits, aren't they?
Not according to the source:
typedef long daddr_t;
daddr_t s_fsize; /* size in blocks of entire volume */
short s_nfree; /* number of addresses in s_free */
daddr_t s_free[NICFREE];/* free block list */
(from param.h and filsys.h respectively).
> From: Ron Natalie
> The V6 block numbers were 24 bits.
Maybe you're thinking of the byte number within the file? The file length
was stored in an word plus a byte in the inode in V6:
char i_size0;
char *i_size1;
but the block number in the device was a word:
int s_fsize; /* size in blocks of entire volume */
int s_nfree; /* number of in core free blocks (0-100) */
int s_free[100]; /* in core free blocks */
"Use the source, Luke!"
Noel
> From: Will Senn
> Is there a good source of information about the Unix v6 filesystem
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/man/man5/fs.5
> Also, is there a source for the history of the early Unix filesystems
> from v6 onward?
I don't know of one (although there is that article on the 4.2 filesystem),
but would love to hear of one.
I gather that V7 is basically V6 except the block numbers are 32 bits, not 16.
Noel
On Sun, Feb 21, 2016 at 09:21:14PM -0600, Will Senn wrote:
> Thanks for the link. The tools look useful. But, they appear to be extract from tape rather than create tape utils? I am away from a computer but will try them out later to make sure.
No, my bad. I thought they would make tapes, but I read the Readme files
and it doesn't look so. You could modify the mkfs.c tool that I wrote at
https://github.com/DoctorWkt/unix-jun72/blob/master/tools/mkfs.c
to write V6 filesystems. It shouldn't be too hard.
Cheers, Warren
Sometime back before the turn of the century, I remember
writing up a summary of the evolution of the UNIX file
system, starting with the earliest system I could find
information for (possibly the PDP-7 system) and running
through the printed manuals as things changed, up to
the Seventh Edition.
I think I've found it; I'll look it over and try to put
it somewhere on the web in the next day or two.
Norman Wilson
Toronto ON
> From: Will Senn
> Thanks for the link.
Sure. It's worth reading the entire V6 manual if you're going to be doing a
lot with it - lots of goodies hidden in places like that. Also the two BSTJ
Unix issues. (I think they are available online, now.)
> Supposing I created a byte faithful representation of a V6 filesystem
> on my mac, would I then be able to load the file in simh as an RK05 and
> mount and access its files and directories from a V6 instance?
That's really a SIMH question, and I don't use SIMH; I use Ersatz11. That is
certainly how Ersatz11 works; I just FTP'd the RK05 distro images over, set
them up as the files that 'implemented' various RK05 drives, and (modulo a
few teething Ersatz11 configuration issues) away it went.
Noel