> everyone should write their first compiler in Pascal for a
> simple language and no cheating using YACC. You need to write the whole
> thing if you want to understand how parsing really works.
Yacc certainly makes it easier to write parsers for big grammars, but
it's far from cheating. You need to know a lot more about parsing to use
Yacc than you need to roll your own.
Hand parsing of a tiny grammar is almost a necessary step on the way to
understanding Yacc. But I think hand-building the whole parser for a
compiler is unnecessary torture--like doing trigonometry with log tables.
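
For readers who have never done it, hand parsing a tiny grammar might look
something like the sketch below, written in C rather than Pascal. The grammar
and the names evaluate, parse_expr, parse_term and parse_factor are made up
for illustration, and error handling is deliberately left out.

```c
#include <ctype.h>

/* Hypothetical grammar, for illustration only:
 *   expr   := term   { ('+' | '-') term }
 *   term   := factor { ('*' | '/') factor }
 *   factor := NUMBER | '(' expr ')'
 */
static const char *cursor;          /* next unread character of the input */

static long parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
}

static long parse_factor(void)
{
    long value = 0;

    skip_spaces();
    if (*cursor == '(') {
        cursor++;                   /* consume '(' */
        value = parse_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;               /* consume ')'; a missing one is ignored here */
        return value;
    }
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

static long parse_term(void)
{
    long value = parse_factor();

    for (;;) {
        skip_spaces();
        if (*cursor == '*') {
            cursor++;
            value *= parse_factor();
        } else if (*cursor == '/') {
            long divisor;

            cursor++;
            divisor = parse_factor();
            value = divisor ? value / divisor : 0;  /* sketch: no error report */
        } else {
            return value;
        }
    }
}

static long parse_expr(void)
{
    long value = parse_term();

    for (;;) {
        skip_spaces();
        if (*cursor == '+') {
            cursor++;
            value += parse_term();
        } else if (*cursor == '-') {
            cursor++;
            value -= parse_term();
        } else {
            return value;
        }
    }
}

/* Evaluate an arithmetic expression held in a NUL-terminated string. */
long evaluate(const char *text)
{
    cursor = text;
    return parse_expr();
}
```

Each nonterminal becomes one function, which is the correspondence a Yacc
grammar makes explicit; a call such as evaluate("(1 + 2) * 3") walks the
three functions in precedence order.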
Doug
Found out today that we lost George Coulouris about a month ago. He was at QMC (then QMW, now Queen Mary, University of London) in CompSci and was an old Unix hand (but not only).
Obituary from his PhD student (who wrote a Unix editor called “ded”):
https://www.theguardian.com/education/2025/jan/19/george-coulouris-obituary
Someone I know is seeking the original version of an internal Bell Labs
memo from 1974 titled "Webster's Second on the Head of a Pin" by Morris and
Thompson. The topic appears to be related to improving the speed of lookups
or search. It's cited in a few papers as "Unpublished Technical Memo, Bell
Laboratories, Murray Hill, NJ 1974." All I can find online is citations.
Any leads appreciated!
--
Royce
As I mentioned in the discussion about C, it's easy to look back with
a modern perspective and cast aspersions on C. But this got me
thinking, what would possible alternatives have been? In the context
of the very late 1960s heading into the early 70s, and given the
constraints of the PDP-7 and early PDP-11s, what languages would one
consider for implementing a system like early Unix? Dennis's history
paper mentioned a very short-lived effort at Fortran, and I asked
about that a few years ago, but no one really remembered much about
it; I gather this was an experiment that lasted a few days or weeks
and was quickly abandoned. But what else?
My short list included PL/1, Algol/W, Fortran, and Pascal. Fortran was
already mentioned. I don't think PL/1 (or PL/I) could have fit on
those machines. Pascal was really targeted towards teaching and would
have required pretty extensive work to be usable. The big question
mark in my mind is Algol/W; how well known was it at the time? Was any
consideration for it made?
Obviously, the decision to go with BCPL (as the basis for B, which
begat C) was made and the rest is history. But I'm really curious
about how, in the research culture at the time, information about new
programming languages made its way through the community.
- Dan C.
On Mar 10, 2025, at 7:26 PM, John Levine <johnl(a)taugh.com> wrote:
>
> In my 1971 compiler course at Yale, Alan Perlis made us try to write a compiler
> that translated a subset of APL into Basic. He suggested we write it in APL,
> which was a terrible idea, so I wrote it in Trac, for which I happened to have
> written my own interpreter.
>
> I think my compiler was the only one that worked, and it was pretty clever,
> turning the APL array expressions into structures with array boundaries and
> example expressions, with no array temporaries. It only generated the loops to
> evaluate the expressions when storing into another array.
>
> Someone got a PhD in 1978 for a similar compiling technique but in 1971 I was a
> 17 year old twerp so what did I know?
>
> R's,
> John
Pretty impressive for a 17yo!
Isn’t APL syntax rather context sensitive[1]? Neither yacc nor
an RD parser would’ve helped! Unless the subset was limited to a
context free subset.
Tim Budd in his 1978 work made quite a few changes to APL to
ease compilation and used yacc. [I have the book somewhere....]
[1] I do not recall if Iverson's original APL had a context sensitive
grammar but modern APLs do.
Given an expression ‘x y z’, its parse depends on the types of
x, y & z. Example: y(x,z) if y is a dyadic verb, x & z are values,
x(y(z)) if x & y are monadic verbs, z a value etc.
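
A minimal sketch of that dependence, in C: the same three-token string groups
differently once the kinds of the names are known. The enum, the function
name parse_xyz and the returned strings are hypothetical, and only the two
cases from the example above are covered.

```c
#include <stddef.h>

/* What a name is bound to at the time the expression is parsed. */
enum kind { VALUE, MONADIC_VERB, DYADIC_VERB };

/* Describe how the token string "x y z" groups, given the kind of each
 * name.  Returns NULL for combinations this sketch does not cover. */
const char *parse_xyz(enum kind x, enum kind y, enum kind z)
{
    if (x == VALUE && y == DYADIC_VERB && z == VALUE)
        return "y(x,z)";            /* y applied to both neighbours */
    if (x == MONADIC_VERB && y == MONADIC_VERB && z == VALUE)
        return "x(y(z))";           /* right-to-left monadic application */
    return NULL;
}
```

Because the answer needs the bindings of x, y and z, a context-free grammar
alone cannot choose between the two parses.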
I assume people have seen this?
https://github.com/ryomuk/TangNanoDCJ11MEM/tree/main
It's capable of running Unix v1 & some limited amount of v6 among other
things. The FPGA in question the Tang Nano 20k is sub 30GBP delivered from
AliExpress.
Kind of neat to combine a real processor with a simple FPGA implementation
of the hardware.
Ken,
Was smalgol also known as BC Algol, as described here:
https://www.softwarepreservation.org/projects/ALGOL/algol60impl/#BC_ALGOL
> On Mar 9, 2025, at 12:06 PM, Ken Thompson <kenbob(a)gmail.com> wrote:
>
> how about smalgol?
>
> it was an algol-like language with just int and float types.
> i dont know its history, but it came out of berkeley near
> when Niklaus Wirth was there. it compiled for the ibm 7094
> in normal batch processing fashion. i converted it to a jit
> into memory in order to skip the loading phase. i used
> it for a lot of my fun-work. (1965-66)
>
> mainframe time, then, was a big factor in the computing process.
> smalgol could compile, load, and run in about 1 cpu-second.
>
> smalgol was all ibm-cards, but it was on my mind through
> the bcpl to b to nb phases. i would use the modern word
> "influencer."
Paul McJones
Adding to Brian's remarks.
Both PL/I, which had been adopted by Multics, and BCPL/B were very
familiar. PL/I, even gutted as it had been for Multics, was much too big
to contemplate. BCPL's integration of subscripting and pointers was nice,
as was its closeness to the machine. Typelessness was a drawback: how would
one integrate floating point or characters? Another was the global vector,
like Fortran COMMON.
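
The integration of subscripting and pointers survives in C, where v[i] is
defined as *(v + i), much as BCPL's v!i is !(v+i). A small sketch; the
function names are made up for illustration.

```c
#include <stddef.h>

/* Sum a vector using subscript notation. */
int sum_indexed(const int *v, size_t n)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same loop with the subscript written out as pointer arithmetic. */
int sum_pointer(const int *v, size_t n)
{
    int total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += *(v + i);
    return total;
}
```

The two functions compute the same thing; the subscript is only notation for
the address arithmetic.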
Algol W was known (to me, at least) only via its publication in CACM. I
don't recall it having been considered. Because Algol W had more concepts
than BCPL, was not as closely matched to machine-level coding, and (I
believe) was equally lacking in separate-compilation facilities, I suspect
it would not have made the cut.
Doug
I asked BWK if he had any thoughts about possible alternative
languages. Here is his response, forwarded by permission.
Arnold
> Date: Sun, 9 Mar 2025 08:27:57 -0400 (EDT)
> From: Brian Kernighan <bwk(a)cs.princeton.edu>
> To: arnold(a)skeeve.com
> cc: crossd(a)gmail.com, Brian Kernighan <bwk(a)cs.princeton.edu>
> Subject: Re: An interesting history question
>
> Dan raises an interesting question. I don't have a good answer,
> but there are possibilities.
>
> Typeless languages like BCPL were in the air; Bliss, from CMU in
> 1970, was a significant example, used mostly on the PDP-10 but it
> could run on a PDP-11. It was definitely a contender for doing
> systems work.
>
> I used MAD in the summer of 1966 at MIT and remembered it as being
> much nicer than Fortran, though when I looked at a description a
> while ago, it wasn't clear what the attraction was.
>
> Bell Labs (Doug McIlroy and Bob Morris, mostly) made a PL/I subset
> called EPL that was at least compilable and a lot easier to manage
> than the full language. I don't know whether that would have
> worked, but it would seem that Ken didn't think so, since he went
> off on his own direction. Doug would know more; he sent me some
> corrective info a month ago, on the errata page here:
>
> https://www.cs.princeton.edu/~bwk/memoir.html
>
> Fortran would have needed major work to handle non-numeric data.
> I wrote a text formatter in it by hacking with the Logical*1 type;
> that let me handle one character at a time by basically lying,
> though I've long since forgotten the details.
>
> Pascal was hopeless, as I have described elsewhere, though
> variants that repaired some of the type system might have worked.
>
> The US military used Jovial; it sounds like it's still sort of in
> use, since it handles the avionics in a lot of planes. It looks
> like a direct descendant of Algol 58.
>
> I never used Algol/W, but of all the options, it seems like it
> might have been the strongest contender.
>
> Xerox PARC had Mesa, but my dim memory is that it was big and
> complicated, which is the opposite of what was needed at the time.
> It also came along too late, mid to late 1970s. It did influence
> Java and Modula-2, says Wikipedia.
>
> HOPL 1 includes papers on other languages of the time, most of
> which would not have worked, and/or have died by now. There's a
> lot of history, and I have no idea how to get on top of it all.
> But still interesting to look at and speculate about.
>
> Brian
>
>
> On Sat, 8 Mar 2025, arnold(a)skeeve.com wrote:
>
> > Hi Brian.
> >
> > Any thoughts on this?
> >
> > (cc-ing Dan, the original poster)
> >
> > Thanks,
> >
> > Arnold
> >
> >> From: Dan Cross <crossd(a)gmail.com>
> >> Date: Sat, 8 Mar 2025 22:46:58 -0500
> >> To: TUHS <tuhs(a)tuhs.org>
> >> Subject: [TUHS] What would early alternatives to C have been?
> >
>
> From: Larry McVoy
> Not once did I think about packing, the structs somehow just worked on
> the machines I was working on. Maybe the TCP/IP guys knew about spacing
> in the structs.
Not really! Of the first 6 TCP/IP implementations:
https://gunkies.org/wiki/TCP_and_IP_bake_offs
only 1 was in C - and it was a relatively late one. The earliest ones were
mostly in assembler (PDP-10 and PDP-11).
Noel
> From: Phil Budne
> BUT, the basic TCP and IP protocols seem to have been created with a
> general care that two byte fields should be aligned at multiples of two
> bytes
Yes, because dealing with a 16-bit field that spans two PDP-11 16-bit words
is a pain (especially because the PDP-11 does not have a 'load byte into
register _without_ extending the sign bit into the high half' instruction).
Do realize that in addition to the early TCP implementation, the _first_ TCP
router (at that stage, TCP and IP were not separate protocols) was also a
PDP-11 (albeit programmed in BCPL, not MACRO-11).
I remember the extension being a real PITA. To load an un-aligned 16-bit
quantity into R0, one would have had to do something like (assuming a pointer
to the un-aligned 16-bit quantity was in R1):
	MOVB (R1)+, R0
	SWAB R0
	BIC #0377, R0
	BISB (R1)+, R0
There may have been a better way to do it, but that's the best I can come up
with now; I recall we had to do something like that.
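
For comparison, a rough C equivalent of that sequence, assuming the field is
in network (big-endian) byte order and that the pointer should advance past
it as (R1)+ does. The name load_be16 is made up for illustration.

```c
#include <stdint.h>

/* Load a possibly unaligned big-endian 16-bit field one byte at a time and
 * advance the caller's pointer past it. */
uint16_t load_be16(const uint8_t **cursor)
{
    const uint8_t *p = *cursor;

    /* First byte goes to the high half (MOVB + SWAB); the bytes are
     * unsigned here, so no sign-extension cleanup (BIC) is needed.
     * The second byte is OR-ed into the low half (BISB). */
    uint16_t value = (uint16_t)((p[0] << 8) | p[1]);

    *cursor = p + 2;
    return value;
}
```

Working on unsigned bytes avoids the sign-extension problem entirely, at the
cost of never using a word-wide load.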
Yes, the 16-bit fields were 16-bit word aligned.
Noel
The code in the repo is for the FPGA; the processor that is strapped to the
FPGA, well, it runs the real code.
It's like the 'minimig' Amiga emulator platform, a real processor, and FPGA
to do all the IO heavy lifting.
So it's not 100% FPGA but you are executing code on a real processor so you
aren't exactly full emulation either. And it doesn't cost a fortune,
assuming you can find one of these ancient microprocessors.
-----Original Message-----
From: emanuel stiebler
To: Jason Stevens; 'tuhs(a)tuhs.org'
Sent: 3/5/25 2:50 PM
Subject: Re: [TUHS] DCJ-11 processor with 20k FPGA
On 2025-03-01 07:11, Jason Stevens via TUHS wrote:
> I assume people have seen this?
>
> https://github.com/ryomuk/TangNanoDCJ11MEM/tree/main
>
>
> It's capable of running Unix v1 & some limited amount of v6 among other
> things. The FPGA in question the Tang Nano 20k is sub 30GBP delivered from
> AliExpress.
>
> Kind of neat to combine a real processor with a simple FPGA implementation
> of the hardware.
I just had a look at it, but he doesn't show the code, which runs on the
TangNano?
> From: Rob Pike
> The notion that the struct layout must correspond to the hardware
> device's layout is about as non-portable as thinking can get.
I'm confused; I thought device register layout is inherently about as
non-portable a thing as one could have, generally.
(Exceptions: 1) the device is basically a single chip, so interfaces on two
machines might be essentially identical, if they use the same chip; 2) someone
made a 68K card that plugged into a QBUS, so drivers on a PDP-11 and that 68K
could be identical.)
Or did you mean that one could somehow disassociate the struct layout and the
details of the device (assuming it has addressable registers, as became
common)? How (I'm missing it)?
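
One possible reading, not something stated in the thread: keep the register
layout in named offsets and go through accessor functions, so no compiler
struct layout is involved. The sketch below assumes a hypothetical device
with little-endian 16-bit registers, and uses byte-wide accesses only so it
can run against an ordinary array; real hardware often demands a single
access of the correct width.

```c
#include <stdint.h>

/* Hypothetical register map: byte offsets from the device base address. */
enum {
    REG_STATUS = 0,
    REG_COUNT  = 2,
    REG_ADDR   = 4,
    REG_SPACE  = 6      /* total size of the register block */
};

/* Read a 16-bit register, assembling it from its two bytes. */
uint16_t reg_read16(const volatile uint8_t *base, unsigned offset)
{
    return (uint16_t)(base[offset] | (base[offset + 1] << 8));
}

/* Write a 16-bit register, low byte first. */
void reg_write16(volatile uint8_t *base, unsigned offset, uint16_t value)
{
    base[offset]     = (uint8_t)(value & 0xff);
    base[offset + 1] = (uint8_t)(value >> 8);
}
```

The offsets are still specific to the device, but they no longer depend on
how a particular compiler pads or orders struct members.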
Noel
> From: "G. Branden Robinson"
> C was a language for people who wanted to get that crap out of the way
> so they could think about binary representations.
Huh? Sometimes C gets in the way of doing that; see below.
> From: Dan Cross
> They did indicate that alignment makes sharing _binary_ data between
> VAX and PDP-11 harder, but that's true of the representation of other
> aspects of product types as well.
Alignment is just one aspect of low-level binary representation; there's also
length (in bits), which is critically important in several problem domains;
device registers have already been mentioned, but more below.
> From: Peter Yardley
> Problems I have with C as a systems language is there is no certainty
> about representation of a structure in memory and hence when it is
> written to disk.
That's yet another one.
The area I'm thinking of (and which I saw a lot of) is code to implement
network protocols (and I'm fairly astounded that nobody else has mentioned
this yet). One has to have _absolute_ control over how the bits are laid out
in the packet (which of course might wind up in any one of countless other
machine types) - which generally means how they are laid out in memory.
The whole concept of C declarations is not rich enough to really deal with
this properly. For each field in the header, one absolutely needs to be able
to separately specify the syntax (e.g. size in bits) and semantics (unsigned
integer, etc).
And if you want the code to be portable, so that one set of sources will
compile to working code on several different architectures, it gets worse.
Device registers, already mentioned, often only have to run on one type of
machine, but having protocol implementations run on a number of different
machine types is really common.
I came up with a way to do this, without _any_ #ifdefs (which I don't like,
for a reason I won't get into) in most source files. Dealing with byte order
issues was similarly dealt with (one can't deal with it just in types, really,
without making the type specification, and the language, somewhat
complicated).
I know later C's got better about richer variable semantics and syntax
selection than the circa 1985 ones I was working with, but I don't think it
was ever made completely simple and orthogonal (e.g.
'signed/unsigned/boolean/etc char/short/long/quad/word/etc') as it should
have been.
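
For illustration only (this is not the scheme alluded to above, which the
message does not describe): one common way to get exact bit layout without
#ifdefs is to treat the packet as an array of bytes and pack and unpack each
field explicitly. The 4-byte header below is hypothetical, loosely modelled
on the start of an IP header.

```c
#include <stdint.h>

/* Hypothetical wire format, 4 bytes, big-endian:
 *   byte 0: version (high 4 bits), header length (low 4 bits)
 *   byte 1: flags (8 bits)
 *   bytes 2-3: total length (16 bits, unsigned)
 */
enum { DEMO_HEADER_BYTES = 4 };

/* In-memory form; the host representation of these members is irrelevant. */
struct demo_header {
    unsigned version;
    unsigned header_len;
    unsigned flags;
    unsigned total_len;
};

/* Lay the fields out bit by bit in wire order. */
void demo_header_pack(const struct demo_header *h,
                      uint8_t out[DEMO_HEADER_BYTES])
{
    out[0] = (uint8_t)(((h->version & 0x0f) << 4) | (h->header_len & 0x0f));
    out[1] = (uint8_t)(h->flags & 0xff);
    out[2] = (uint8_t)((h->total_len >> 8) & 0xff);
    out[3] = (uint8_t)(h->total_len & 0xff);
}

/* Recover the fields from wire bytes, whatever the host byte order. */
void demo_header_unpack(const uint8_t in[DEMO_HEADER_BYTES],
                        struct demo_header *h)
{
    h->version    = in[0] >> 4;
    h->header_len = in[0] & 0x0f;
    h->flags      = in[1];
    h->total_len  = ((unsigned)in[2] << 8) | in[3];
}
```

Size and signedness are spelled out per field in the pack and unpack code
rather than in the declarations, which is exactly the information C
declarations of that era could not carry.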
Noel
Given that anything that obeys the ABI and has assembler entries to the kernel
can request services, it seems to me it would be possible to stand up a
user-land without C being present. Have any UNIXen ever done this after the
advent of C?
- Matt G.
> Everything that can possibly be represented in a machine
> register with all bits clear shows up as an integral zero literal.
> '\0' == 0 == nullptr == struct { } == union { }
Well, some things.
0.0f and other floating-point zero constants are represented
by all-zero words (of various sizes) and are not integral constants.
NULL does not "show up as an integral zero literal".
0==NULL is true only because 0 can be converted to NULL.
Getting really lawyerly, one can cook up any number of
bizarre "things that can possibly be represented" by an
all-zero word, for example (char[4]){0,0,0,0}, and have
no representation as an integral constant.
Only 3 of the 5 examples fit the description of possibly being
represented by an all-zero word.
struct{} and union{} are gnu extensions with size zero. Even
if you accept them as C, they have no machine representation
and cannot be cast to int.
The null pointer makes the list only thanks to the weasel-word
"possibly". Although 0 can be cast to the null pointer, the
result of casting a null pointer to int depends on its unspecified
machine representation. Zero, of course, is a good choice
because it's easy to test for, and is easy to omit from virtual
address spaces.
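
A small sketch for checking some of these representations on a given
machine. The function names are made up; the float result assumes IEEE 754
arithmetic, and the null-pointer result is whatever the implementation
chose, so it is reported rather than assumed.

```c
#include <stddef.h>

/* Return 1 if every byte of the object is zero, else 0. */
int all_bytes_zero(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    size_t i;

    for (i = 0; i < size; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}

/* 0.0f: an all-zero word on IEEE 754 machines, yet not an integral constant. */
int float_zero_is_all_zero(void)
{
    float f = 0.0f;

    return all_bytes_zero(&f, sizeof f);
}

/* (char[4]){0,0,0,0}: an all-zero word with no integral-constant spelling. */
int char_array_is_all_zero(void)
{
    char a[4] = {0, 0, 0, 0};

    return all_bytes_zero(a, sizeof a);
}

/* The null pointer: its representation is up to the implementation. */
int null_pointer_is_all_zero(void)
{
    void *p = NULL;

    return all_bytes_zero(&p, sizeof p);
}
```

On most current systems all three return 1, but only the char array is
guaranteed to by the language.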
Doug
Hello to anyone interested in this silliness. I am just recently trying
to reacquaint myself with this OS, which I had a decent passing knowledge of
at one time. Not any real OS-level or driver coding, but I was at least decently
acquainted. Now on with the preliminaries ...
Any good software items to update on this ol' thing that would give me a better
chance of completing this task are greatly welcome.
I have followed this article, which is (IMO) a copy from several places, though
all of them are using an AXP emulator.
<https://gist.github.com/jamesy0ung/eeac82997ebeae92873d1f2844a14ac3>
I am using (I'll admit) a REAL AlphaStation 200 (4/100) with 384MB main memory &
three disks, all U160s (2x4G + 1x72G). The OS is installed on the 72G (now) & has a
/home dir for users rather than the default location. See the info & error during
make of gcc below. Those numbers for the allocation & total have been exactly the
same across many iterations of attempts in that exact file.
# sizer -v
HP Tru64 UNIX V5.1B (Rev. 2650); Sun Feb 23 19:43:32 AKST 2025
It is at patch level 008,
and I had successfully compiled & installed all the prerequisites shown in the
article mentioned above.
It seems the OS doesn't know how to access the swap properly.
root@as200:/home/buildnfs/gcc-4.4.7# env PATH=/usr/local/bin:/sbin:/usr/sbin:/usr/bin:/usr/ccs/bin:/usr/bin/X11:/usr/dt/bin:~/bin:. make
... many lines snipped ...
/home/buildnfs/gcc-4.4.7/host-alpha-dec-osf5.1b/prev-gcc/xgcc
-B/home/buildnfs/gcc-4.4.7/host-alpha-dec-osf5.1b/prev-gcc/
-B/usr/local/alpha-dec-osf5.1b/bin/ -c -g -$
cc1: out of memory allocating 135816 bytes after a total of 796519376 bytes
make[3]: *** [fold-const.o] Error 1
make[3]: Leaving directory `/home/buildnfs/gcc-4.4.7/host-alpha-dec-osf5.1b/gcc'
make[2]: *** [all-stage2-gcc] Error 2
make[2]: Leaving directory `/home/buildnfs/gcc-4.4.7'
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory `/home/buildnfs/gcc-4.4.7'
make: *** [all] Error 2
# swapon -s
Swap partition /dev/disk/dsk2g:
Allocated space: 249774 pages (1.91GB)
In-use space: 1520 pages ( 0%)
Free space: 248254 pages ( 99%)
Swap partition /dev/disk/dsk1b:
Allocated space: 49152 pages (384MB)
In-use space: 1630 pages ( 3%)
Free space: 47522 pages ( 96%)
Swap partition /dev/disk/dsk0b:
Allocated space: 49152 pages (384MB)
In-use space: 1618 pages ( 3%)
Free space: 47534 pages ( 96%)
Total swap allocation:
Allocated space: 348078 pages (2.66GB)
In-use space: 4768 pages ( 1%)
Available space: 343310 pages ( 98%)
# hwmgr show scsi
SCSI DEVICE DEVICE DRIVER NUM DEVICE FIRST
HWID: DEVICEID HOSTNAME TYPE SUBTYPE OWNER PATH FILE VALID PATH
-------------------------------------------------------------------------
42: 0 as200 cdrom none 0 1 cdrom0 [0/4/0]
43: 1 as200 disk none 2 1 dsk0 [1/0/0]
44: 2 as200 disk none 2 1 dsk1 [1/1/0]
45: 3 as200 disk none 2 1 dsk2 [1/2/0]
# hwmgr -view dev
HWID: Device Name Mfg Model Location
------------------------------------------------------------------------------
3: /dev/dmapi/dmapi
4: /dev/scp_scsi
5: /dev/kevm
29: /dev/disk/floppy0c 3.5in floppy fdi0-unit-0
42: /dev/disk/cdrom0c TOSHIBA DVD-ROM SD-M1401 bus-0-targ-4-lun-0
43: /dev/disk/dsk0c IBM DDRS-34560D bus-1-targ-0-lun-0
44: /dev/disk/dsk1c COMPAQ BD07286224 bus-1-targ-1-lun-0
45: /dev/disk/dsk2c COMPAQ ST34371W bus-1-targ-2-lun-0
46: /dev/random
47: /dev/urandom
Tia , JimL
--
+---------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network & System Engineer | 3237 Holden Road | Give me Linux |
| jiml(a)system-techniques.com | Fairbanks, AK. 99709 | only on AXP |
+---------------------------------------------------------------------+
Yufeng,
> I've recently brought the "prestruct-c" compiler back to "life"
Great archeology! It seems you've unearthed a snapshot from the brief
period when Dennis was struggling to reconcile byte addressing with BCPL
pointers--the seminal innovation of C. In characteristic Unix fashion, he
was trying out his ideas as they developed.
I had forgotten that product types were under con-struct-ion at the same
time. That really was a big bang.
Doug
Hi again,
I've recently brought the "prestruct-c" compiler back to "life" (https://github.com/TheBrokenPipe/C-Compiler-Dec72) and thought it might be worth documenting here. One thing I have to say first - it's barely working and probably never worked to begin with.
There were some efforts in the distant past to revive this compiler; however, the compiled compiler never worked. The reasons are as follows:
- The compiled executable is too big (exceeds 32K, making pointers effectively negative). This triggers a bug in the liba I/O routines.
- The compiler assumes an origin of 0 and writes temp data at the NULL pointer. Without an MMU, this kills the interrupt vectors and possibly the kernel on the 11/20.
- The compiler is missing ALL code/tables written in assembly language. This is pretty fatal, and internal changes rendered files from the last1120c compiler incompatible.
- Calling convention changes make the s2/last1120c libc library incompatible.
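
The liba bug itself is not described here, so the following is only a sketch
of the general hazard in the first point: on a 16-bit machine, any address at
or above 32K has its top bit set and reads as negative when treated as a
signed word, which breaks signed comparisons. All names are hypothetical.

```c
#include <stdint.h>

/* Interpret a 16-bit address the way a signed 16-bit compare would. */
int32_t as_signed_word(uint16_t address)
{
    return address >= 0x8000u ? (int32_t)address - 0x10000
                              : (int32_t)address;
}

/* Buggy bounds check: compares the two addresses as signed words. */
int below_signed(uint16_t address, uint16_t limit)
{
    return as_signed_word(address) < as_signed_word(limit);
}

/* Correct bounds check: compares them as unsigned addresses. */
int below_unsigned(uint16_t address, uint16_t limit)
{
    return address < limit;
}
```

With a buffer at 0x7000 and a limit at 0x9000, the unsigned check says the
address is below the limit while the signed check says it is not.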
I'm a big fan of the C programming language, and the reason I was so insistent on getting this compiler to work is that it has a funny struct syntax not seen in any other C compiler. Structs are defined like:
struct name (
    type field;
    ...
);
... with round brackets (parentheses) instead of curly braces.
Another notable thing introduced in this compiler is that certain things are no longer lvalues. In the past (B and last1120c), functions, labels, and arrays were lvalues, meaning they could be assigned. For instance, this code:
func1() { return (1); }
func2() { return (2); }
main() {
    printf("func1() = %d\n", func1());
    printf("func2() = %d\n", func2());
    printf("func1 = func2\n");
    func1 = func2;
    printf("func1() = %d\n", func1());
}
produces the output:
func1() = 1
func2() = 2
func1 = func2
func1() = 2
This code:
main() {
    second = first;
    goto second;
first:
    printf("first\n");
second:
    printf("second\n");
}
produces the output:
first
second
And this code:
main(argc, argv) {
    int arr1[10];
    int arr2[10];
    arr1[0] = 5;
    arr2[0] = 8;
    printf("arr1[0] = %d\n", arr1[0]);
    printf("arr2[0] = %d\n", arr2[0]);
    printf("arr1 = arr2\n");
    arr1 = arr2;
    printf("arr1[0] = %d\n", arr1[0]);
}
produces the output:
arr1[0] = 5
arr2[0] = 8
arr1 = arr2
arr1[0] = 8
Now, the rules of the game have changed with the prestruct-c compiler, and these are no longer lvalues. I don't know why this change was made, but if I had to guess, speed was probably the biggest driving factor, with security also playing a role. Anyhow, they're no longer lvalues, so there's now one less level of indirection involving functions and labels. This means the codegen tables from last1120c have to be modified to suit this compiler change.
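
A modern C approximation of the old behaviour, assuming each function name
acted like a variable holding the address of the code: the extra level of
indirection has to be written out as a function pointer. The names ending in
_body and demo_rebind are made up for illustration.

```c
/* The actual code of the two functions. */
static int func1_body(void) { return 1; }
static int func2_body(void) { return 2; }

/* Assignable cells holding a code address, standing in for the old
 * function names. */
int (*func1)(void) = func1_body;
int (*func2)(void) = func2_body;

/* Call func1, rebind it to func2's code, call it again.
 * Returns the two results packed as before * 10 + after. */
int demo_rebind(void)
{
    int before, after;

    before = func1();       /* call goes through the cell */
    func1 = func2;          /* legal because func1 is a variable here */
    after = func1();
    return before * 10 + after;
}
```

Every call through such a cell costs one more memory reference than a direct
call, which is the indirection the newer compiler drops.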
However, even with it generating the correct code, there is still one fatal problem - the libc. The libc from s2/last1120c was designed for the older compiler and therefore has one extra layer of indirection for functions. Luckily, the source code of the libc is available on the last1120c tape, and it wasn't too much work to remove the indirection manually.
Okay, what else? Well, this compiler also seems to be the first to introduce the "modern" pointer syntax. Before this compiler, pointers were declared using the same syntax as arrays/vectors, like "char name[];". This compiler introduced the "modern" syntax of "char *name;". No big deal, right? Well, the compiler itself was written using the old syntax, meaning it cannot compile itself. I think this indicates that this compiler (or the new syntax) was so unstable that the production compiler still used the old syntax.
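
A trace of the old syntax survives in modern C: in a parameter list, "char
name[]" is still accepted and is adjusted to "char *name". A short sketch
with made-up function names:

```c
#include <stddef.h>

/* Old-style spelling: the parameter is declared like an array, but the
 * compiler treats it as a pointer to char. */
size_t length_array_style(char name[])
{
    size_t n = 0;

    while (name[n] != '\0')
        n++;
    return n;
}

/* The "modern" spelling of exactly the same parameter type. */
size_t length_pointer_style(char *name)
{
    size_t n = 0;

    while (*name++ != '\0')
        n++;
    return n;
}
```

Both functions have the same type, so either can be called with the same
argument.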
With everything carefully put back into place, I managed to get this to work:
struct foo (
    char x;
    int y;
    char *z;
);
main(argc, argv)
char **argv;
{
    struct foo bruh;
    bruh.x = 'C';
    bruh.y = 123;
    bruh.z = "test";
    printf("x = '%c', y = %d, z = \"%s\"\n", bruh.x, bruh.y, bruh.z);
}
However, if I rename the variable "bruh" to something like "bar", it throws the error "Unimplemented pointer conversion". I have no clue why. I've also never gotten struct pointers to work - it always complains about "Illegal structure ref" when I try to use "->". It also seems to accept "." on structure pointers (and does not actually dereference the pointer when accessing the members), so something is probably very wrong with referencing and dereferencing.
Anyway, there are plenty of other issues with the compiler; for example, code involving pointers and switch statements may not compile correctly. I'm not sure if the issues are caused by my poor reconstruction of the assembly tables, or if the compiler itself never worked properly in the first place (or both). Either way, I've managed to get it to spit out a correct hello world, as well as the struct test above, so I think I've fulfilled my goal of seeing this compiler "work".
The code, build instructions and pre-built binaries are here:
https://github.com/TheBrokenPipe/C-Compiler-Dec72
Ideally, it runs under a PDP-11/45 environment with 0 as the origin and generates code for the PDP-11/45. However, I made it target the 11/20 since I couldn't get the 11/45 toolchain to work, and I haven't implemented 11/45 instructions in my simulator yet. If anyone wants to pick up the baton and get it working for the 11/45 or fix my bugs, be my guest!
Sincerely,
Yufeng
> From: Yufeng Gao
> The s1 kernel is, to date, the earliest machine-readable UNIX kernel,
> sitting between V1 and V2.
It will be interesting to see what it reveals, as it's in the UNIX 'dark age'
between V1 and V4. Working from hints and clues in the extant 'UNIX
Programmer's Manual: Second Edition', I had tried to figure out how V2
differed from V1:
https://gunkies.org/wiki/UNIX_Second_Edition
but I was mostly interested in 'big picture' issues (like how a process's
address space was laid out), not details like 'the foo() call was added', or
'how exec() differs'. (If someone _does_ create lists of the calls in V1 and
V2, and their details, and compares them, that _will_ be of value, don't get
me wrong; I was mostly just trying to work out how the mysterious KS11
worked.)
> It's somewhat picky about the environment. So far, aap's PDP-11/20
> emulator .. is the only one capable of booting the kernel. SIMH and
> Ersatz-11 both hang before reaching the login prompt.
It would be very interesting to know what fails. By 'hang', do you mean
'ceases making progress', or 'halts'?
If the former, since I've almost always had good experiences with Ersatz-11,
my _guess_ would be a problem with the RF11 emulation. (The RF11 was a very
early, and small, disk, so I wouldn't be surprised if there hasn't been a
lot of software run on those emulators that uses it, to flush out bugs. It's
also kind of an odd duck; it's word-oriented, not block-oriented.) So, for
instance, a 'lost' disk interrupt would produce this symptom. Are there any
RF11 diagnostics online? That would be the thing I would start with.
And I guess this system doesn't include the KS11; a pity, code that uses it
would allow re-creation of the programming manual (the way the:
https://gunkies.org/wiki/ANTS/ISI_IMP_Interface
programming instructions were re-created).
> From: Angelo Papenhoff
> So the next step would be to restore the assembly source? :)
Having only the binary to work from (to start with) is not optimal; those
early versions of UNIX ran on a number of very different hardware
configurations (e.g. with or without the KS11), with conditional assembly to
handle different configurations. Having only the dis-assembled code for _this_
configuration would obviously leave the code for the others missing.
Still, having _this_ source _would_ be useful; e.g. the 'hang-up' problem
above; the easiest way to debug that would be to put 'print' statements in the
code, where a disk operation was started, and completes. If it's 'losing' a
disk interrupt completion, that will show right up. (Been there, done that, on
the RK11 hardware emulator Bridgham and I built, when UNIX wouldn't boot, just
hung.) Although I suppose one could put break-points there. Trying to debug it
any other way would be painful beyond belief.
Noel
Hi everyone,
I've cobbled together a crude Teletype Model 37 emulator that generates PDF files (https://github.com/TheBrokenPipe/Teletype-37-PDF). It produces sane-looking PDFs for most (all?) of the early UNIX ROFF/NROFF documents.
The biggest advantage of this over something like "roff $1 | enscript -c -f Courier12 -l -M Letter --margins=67:-9:0:-9 -p $1.ps -s -0.05" is it supports half (forward/reverse) line feeds which enscript does not. Early ROFF stuff like the UNIX manuals and memos made extensive use of subscripts (and superscripts), making them rather painful to typeset.
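
A rough sketch of what tracking half line feeds involves, in C. It assumes
the escape sequences listed in the col(1) manual (ESC 7 reverse line feed,
ESC 8 half reverse line feed, ESC 9 half forward line feed) and is not taken
from the emulator's source; the function name is made up.

```c
#include <stddef.h>

#define ESC 0x1b

/* Walk a byte stream and return the final vertical position, counted in
 * half-line units from the starting line.  Positive means further down. */
long vertical_position(const char *data, size_t length)
{
    long half_lines = 0;
    size_t i;

    for (i = 0; i < length; i++) {
        if (data[i] == '\n') {
            half_lines += 2;                 /* full forward line feed */
        } else if (data[i] == ESC && i + 1 < length) {
            switch (data[i + 1]) {
            case '7': half_lines -= 2; i++; break;  /* reverse line feed */
            case '8': half_lines -= 1; i++; break;  /* half reverse (superscript) */
            case '9': half_lines += 1; i++; break;  /* half forward (subscript) */
            default:  break;                 /* other escapes not handled here */
            }
        }
    }
    return half_lines;
}
```

A superscript is then a half reverse feed, the raised text, and a half
forward feed, which leaves the position where it started.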
As an experiment, I re-set Ken Thompson's "Users' Reference to B" memo from early 1972 (https://github.com/TheBrokenPipe/kbman-reset). I picked this one because it contains a BNF-like description of the grammar as well as fractions in code comments, both of which make extensive use of sub/superscripts. I went to the extent of overlaying the re-set pages on top of the originals to make sure everything lined up.
I'd really appreciate it if someone could review my work on the B manual. If everything looks good, I may tackle other documents, starting with low-hanging fruit like the "V0" manual and potentially moving on to re-setting the V1 and V2 manuals in the future (building on aap's work).
Sincerely,
Yufeng
Hi everyone,
First-time poster here. Near the end of last year, I did some forensic analysis on the DMR tapes (https://www.tuhs.org/Archive/Applications/Dennis_Tapes) and had some fun playing around with them. Warren forwarded a few of my emails to this list at the end of last year and the beginning of this year, but it was never my intention for him to be my messenger, so I'm posting here myself now.
Here's an update on my work with the s1/s2 tapes - I've managed to get a working system out of them. The s1 tape is a UNIX INIT DECtape containing the kernel, while s2 includes most of the distribution files.
The s1 kernel is, to date, the earliest machine-readable UNIX kernel, sitting between V1 and V2. It differs from the unix-jun72 kernel in the following ways:
- It supports both V1 and V2 a.outs out of the box, whereas the unmodified unix-jun72 kernel supports only V1.
- The core size has been increased to 16 KiB (8K words), while the unmodified unix-jun72 kernel has an 8 KiB (4K word) user core.
On the other hand, its syscall table matches that of V1 and the unix-jun72 kernel, lacking all V2 syscalls. Since it aligns with V1 in terms of syscalls, has the V2 core size and can run V2 binaries, I consider it a "V2 beta".
login: root
root
# ls -la
total 42
41 sdrwrw 7 root 80 Jan 1 00:02:02 .
41 sdrwrw 7 root 80 Jan 1 00:02:02 ..
43 sdrwrw 2 root 620 Jan 1 00:01:30 bin
147 l-rwrw 1 root 16448 Jan 1 00:33:51 core
42 sdrwrw 2 root 250 Jan 1 00:01:51 dev
49 sdrwrw 2 root 110 Jan 1 00:01:55 etc
54 sdrwrw 2 root 50 Jan 1 00:00:52 tmp
55 sdrwrw 7 root 80 Jan 1 00:00:52 usr
# ls -la usr
total 8
55 sdrwrw 7 root 80 Jan 1 00:00:52 .
41 sdrwrw 7 root 80 Jan 1 00:02:02 ..
56 sdrwrw 2 28 60 Jan 1 00:02:22 fort
57 sdrwrw 2 jack 50 Jan 1 00:02:39 jack
58 sdrwrw 2 6 30 Jan 1 00:02:36 ken
59 sdrwrw 2 root 120 Jan 1 00:00:52 lib
60 sdrwrw 2 sys 50 Jan 1 00:02:45 sys
142 s-rwrw 1 jack 54 Jan 1 00:52:29 x
# ed
a
main() printf("hello world!\n");
.
w hello.c
33
q
# cc hello.c
I
II
# ls -l a.out
total 3
153 sxrwrw 1 root 1328 Jan 1 00:02:12 a.out
# a.out
hello world!
#
It's somewhat picky about the environment. So far, aap's PDP-11/20 emulator (https://github.com/aap/pdp11) is the only one capable of booting the kernel. SIMH and Ersatz-11 both hang before reaching the login prompt. This makes installation from the s1/s2 tapes difficult, as aap's emulator does not support the TC11. The intended installation process involves booting from s1 and restoring files from s2.
I extracted the files from the s1 tape and placed them on an empty RF disk, then installed the unix-jun72 kernel. After booting from the RF under SIMH, I extracted the remaining files from s2. Finally, I replaced the unix-jun72 kernel with the s1 kernel using a hex editor, resulting in an RF disk image containing only files from s1/s2. This RF image is bootable under aap's emulator but not SIMH.
The RF disk image can be downloaded from here (https://github.com/TheBrokenPipe/Research-UNIX-V2-Beta).
Direct link - https://github.com/TheBrokenPipe/Research-UNIX-V2-Beta/raw/refs/heads/main/…
Interestingly, its init(7) program does not mount the RK to /usr, suggesting that /usr was stored on the RF.
Sincerely,
Yufeng
Tom Van Vleck just posted to the multicians mailing list that he is
doing an update to the Unix page at multicians.org and is soliciting
feedback. I figure some folks here may have useful suggestions.
His draft is here: https://multicians.org/unix2.html
Comments directly to Tom, I suppose, but if interested parties would
rather discuss here I'd be happy to summarize and send to him as well.
- Dan C.