TUHS

tuhs@tuhs.org

29 participants
6557 discussions

by Doug McIlroy

> 1. What's the provenance of regex in unix (when did it appear, in what form, etc)? > 2. What are the 'best' implementations throughout unix (keep it pre1980s)? > 3. What are some of the milestones along the way (major changes, forks, disagreements)? The editor ed was in Unix from day 1. For the necessarily tiny implementation, Ken discarded various features from the ancestral qed. Among the casualties was alternation in regular expressions. It has never fully returned. Ken's original paper described a method for simulating all paths of a nondeterministic finite automaton in parallel, although he didn't describe it in these exact terms. This meant he had to keep track of up to n possible states, where n is the number of terminal symbols in the regular expression. "Computing Reviews" published a scathing critique of the paper: everyone knows a deterministic automaton can recognize regular expressions with one state transition per input character; what a waste of time to have to keep track of multiple states! What the review missed was that the size of the DFA can be exponential in n. For one-shot use, as in an editor, it can take far longer to construct the DFA than to run it. This lesson came home with a vengeance when Al Aho wrote egrep, which implemented full regular expressions as DFA's. I happened to be writing calendar(1) at the same time, and used egrep to search calendar files for dates in rather free formats for today and all days through the next working day. Here's an example (egrep interprets newline as "|"): (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*1)([^0123456789]|$) (^|[ (,;])((\* *)0*1)([^0123456789]|$) (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*2)([^0123456789]|$) (^|[ (,;])((\* *)0*2)([^0123456789]|$) (^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*3)([^0123456789]|$) (^|[ (,;])((\* *)0*3)([^0123456789]|$) Much to Al's chagrin, this regular expression took the better part of a minute to compile into a DFA, which would then run in microseconds. The trouble was that the DFA was enormously bigger than the input--only a tiny fraction of the machine's states would be visited; the rest were useless. That led him to the brilliant idea of constructing the machine on the fly, creating only the states that were pertinent to the input at hand. That innovation made the DFA again competitive with an NFA. Doug

5 years

Unix and SW Releases (was V7 et al from Will)

by Clem Cole

This topic is still primarily UNIX but is getting near the edge of COFF, so I'll CC there if people want to follow up. As I mentioned to Will, during the time Research was doing the work/put out their 'editions', the 'releases' were a bit more ephemeral - really a set of bits (binary and hopefully matching source, but maybe not always) that become a point in time. With 4th (and I think 5th) Editions it was a state of disk pack when the bits were copies, but by 6th edition, as Noel points out, there was a 'master tape' that the first site at an institution received upon executing of a signed license, so the people at each institution (MIT, Purdue, CMU, Harvard) passed those bits around inside. But what is more, is what Noel pointed out, we all passed source code and binaries between each other, so DNA was fairly mixed up [sorry Larry - it really was 'Open Source' between the licensees]. Sadly, it means some things that actually were sourced at one location and one system, is credited sometimes credited from some other place the >>wide<< release was in USG or BSD [think Jim Kulp's Job control, which ended up in the kernel and csh(1) as part in 4BSD, our recent discussions on the list about more/pg/less, the different networking changes from all of MIT/UofI/Rand, Goble's FS fixes to make the thing more crash resilient, the early Harvard ar changes - *a.k.a.* newar(1) which became ar(1), CMU fsck, e*tc*.]. Eventually, the AT&T Unix Support Group (USG) was stood up in Summit, as I understand it, originally for the Operating Companies as they wanted to use UNIX (but not for the licenses, originally). Steve Johnson moved from Research over there and can tell you many more of the specifics. Eventually (*i.e.* post-Judge Green), distribution to the world moved from MH's Research and the Patent Licensing teams to USG and AT&T North Carolina business folks. That said, when the distribution of UNIX moved to USG in Summit, things started to a bit more formal. But there were still differences inside, as we have tried to unravel. PWB/TS and eventually System x. FWIW, BSD went through the same thing. The first BSD's are really the binary state of the world on the Cory 11/70, later 'Ernie.' By the time CSRG gets stood up because their official job (like USG) is to support Unix for DARPA, Sam and company are acting a bit more like traditional SW firms with alpha/beta releases and a more formal build process. Note that 2.X never really went through that, so we are all witnessing the wonderful efforts to try to rebuild early 2.X BSD, and see that the ephemeral nature of the bits has become more obvious. As a side story ... the fact is that even for professional SW houses, it was not as pure as it should be. To be honest, knowing the players and processes involved, I highly doubt DEC could rebuild early editions of VMS, particularly since the 'source control' system was a physical flag in Cutler's office. The fact is that the problem of which bits were used to make what other bits was widespread enough throughout the industry that in the mid-late 80s when Masscomp won the bid to build the system that Nasa used to control the space shuttle post-Challenger, a clause of the contract was that we have put an archive of the bits running on the build machine ('Yeti'), a copy of the prints and even microcode/PAL versions so that Ford Aerospace (the prime contractor) could rebuild the exact system we used to build the binaries for them if we went bankrupt. I actually, had a duplicate of that Yeti as my home system ('Xorn') in my basement when I made some money for a couple of years as a contract/on-call person for them every time the shuttle flew. Anyway - the point is that documentation and actual bits being 100% in sync is nothing new. Companies work hard to try to keep it together, but different projects work at different speeds. In fact, the 'train release' model is what is usually what people fall into. You schedule a release of some piece of SW and anything that goes with it, has to be on the train or it must wait for the next one. So developers and marketing people in firms argue what gets to be the 'engine' [hint often its HW releases which are a terrible idea, but that's a topic for COFF].

5 years

Re: [TUHS] Debugging help sought

by jnc＠mercury.lcs.mit.edu

> From: Warner Losh > 8 I think was the limit. IIRC, you could use longer names than that (in C), but external references only used the first 7 (in C - C symbols had a leading '_' tacked on; I used to know why), 8 (in assembler). > Could that cause this error? Seems unlikely - see below. > The error comes from lookloc. This is called for external type > relocations. It searches the local symbol table for something that > matches the relocation entry. This error happens when it can't find > it... Someone who actually looked at the source: https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/ld.c instead of just guessing. Give that man a star! I spent a while looking at the code, trying to figure out i) how it works, and ii) what's going wrong with that message, but I don't have a definitive answer. The code is not super well commented, so one has to actually understand what it's doing! :-) It seems to my initial perusal that it maintains two symbol tables, one for globals (which accumulates as each file is processed), and one for locals (which is discarded/reset for each file). As Werner mentioned, the message appears when a local symbol referenced in the relocation information in the current file can't be found (in the local symbol table). It's not, I think, simply due to too many local symbols in an input file - there seems to be a check for that as it's reading the input file symbol table: if (lp >= &local[NSYMPR]) error(1, "Local symbol overflow"); *lp++ = symno; *lp++ = sp; although of course there could be a bug which breaks this check. It seems to me that this is an 'impossible' error, one which can only happen due to i) a bug in the loader (a fencepost error, or something), or ii) an error in the input a.out file. I don't want to spend more time on it, since I'm not sure if you've managed to bypass the problem. If not, let me know, and we'll track it down. (This may involve you addding some printf's so we have more info about the details.) Noel

5 years

Debugging help sought

by Adam Thornton

I finally munged lbforth.c (https://gist.github.com/lbruder/10007431) into compiling cleanly on mostly-stock v7 with the system compiler (lbforth itself does fine on 211BSD, but it needs a little help to build in a real K&R environment). Which would be nice, except that when it gets to the linker.... $ cc -o 4th forth.c ld:forth.o: Local symbol botch WTF? How do I begin to debug this? Adam

5 years

Re: [TUHS] v7, adb, and fcreat

by jnc＠mercury.lcs.mit.edu

> From: Will Senn > it finally clicked that it is just one (of many) bit buckets out there > with the moniker v6. ... I am coming from a world where OS version > floppy/cd/dvd images are copies of a single master ... These tape things > could be snapshots of the systems they originate from at very different > times and with different software/sources etc. Well, sort of. Do remember that everyone with V6 had to have a license, which at that point you could _only_ get from Western Electric. So every _institution_ (which is not the same as every _machine_) had had to have had dealings with them. However, once your institution was 'in the club', stuff just got passed around. E.g. the BBN V6 system with TCP/IP: https://www.tuhs.org/cgi-bin/utree.pl?file=BBN-V6 I got that by driving over to BBN, and talking to Jack Haverty, and he gave us a tape (or maybe a pack, I don't recall exactly). But we had a V6 license, so we could do that. But my particular machine, it worked just the way you described: we got our V6 from the other V6 machine in the Tech Sq building (RTS/DSSR), including not only Bell post-V6 'leakage' like adb, but their local hacks (e.g. their TTY driver, and the ttymod() system call to get to its extended features; the ability to suspend selected applications; yadda, yadda). We never saw a V6 tape. Noel

5 years

Re: [TUHS] v7, adb, and fcreat

by jnc＠mercury.lcs.mit.edu

> From: Will Senn > I don't think adb was in v6, where the fcreat function and buf struct > are used... Were Maranzano and Bourne using some kind of hybrid 6+ system? In addition to the point about skew between the released and internal development, it's worth remembering just how long it was between the V6 and V7 releases, and how much ground was covered technically during that period. A lot of that stuff leaked out: we've talked about the upgraded 'Typesetter C' (and compilers), which a lot of people had, and the V6+ system at MIT (partially sort of PWB1) had both 'adb' and the stdio library. The latter also made its way to lots of places; in my 'improved V6 page': http://www.chiappa.net/~jnc/tech/ImprovingV6.html it talks about finding the standard I/O stuff in several later V6 repositories, including a UNSW tape. But it and typsetter C, also on the Shoppa pack, were clearly quite widespread. Noel

5 years

v7, adb, and fcreat

by Will Senn

I've done research on this, but I'm confused and would appreciate some help to understand what's going on. In the 7th edition manual, vol 2, there's an ADB tutorial (pp. 323-336). In the tutorial, the authors, Maranzano and Bourne, walk the reader through a debugging session. The first example is predicated on a buffer overflow bug and the code includes: struct buf { int fildes; int nleft; char *nextp; char buff[512]; }bb; struct buf *obuf; ... if((fcreat(argv[1],obuf)) < 0){ ... Well, this isn't v7 code. As discussed in the v7 manual vol 1 (p. VII): Standard I/O. The old fopen, getc, putc complex and the old –lp package are both dead, and even getchar has changed. All have been replaced by the clean, highly efficient, stdio(3) package. The first things to know are that getchar(3) returns the integer EOF (–1), which is not a possible byte value, on end of file, that 518-byte buffers are out, and that there is a defined FILE data type. The buffers are out, fcreat is gone, etc. So, what's up with this? I don't think adb was in v6, where the fcreat function and buf struct are used... Were Maranzano and Bourne using some kind of hybrid 6+ system? Thanks, Will -- GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF

5 years

2.11BSD Status Update

by Warner Losh

OK. I'm plowing through a lot of issues with the putative 2.11BSD reconstructions I've done to date. I keep finding things dated too new to be right. And it turns out that a few patches "snuck in" when the patch 80 catch up was done. I've outlined the ones I've found so far at https://bsdimp.blogspot.com/2020/08/missing-211bsd-patches.html but I'm sure there's at least one more. There was much ambiguity over /usr/new and /usr/local that lead to some of these, but others look like they were in the master tree, but never formally published that have all the hallmarks of legit bug fixes... I've also detailed the issues in going backwards. 2.11BSDpl195 had a different .o format than 2.11BSDpl0. And to make matters worse, its assembler couldn't assemble the assembler from the initial release, so I had to get creative (using apout, thanks to all who contributed to that!). I've also blogged about how to walk back a binary change when the old programs no longer build on the new system. I think I got lucky that it was possible at all :). https://bsdimp.blogspot.com/2020/08/bootstrapping-211bsd-no-patches-from.ht… has the blow by blow. There are a lot of steps to building even a normal system... Let alone walking through the minefield of errors that you need to do when stepping back... And neither of these even begin to get into the issues with the build system itself requiring workarounds for that... But anyway, I keep making "ur2.11BSD" tapes, installing them and fixing the issues I find... While much information was destroyed in the process, there's a surprising amount of redundancy that one can use to 'test' putative tapes. Warner P.S. ur2.11BSD is from urFOO in linguisting, meaning the original FOO that's been lost and which some amount of reconstruction / speculation is offered about it. Still looking for a good name for the reconstructed 2.11BSD release....

5 years

Re: [TUHS] Regular Expressions

by Rudi Blom

Probably same as others do, when I'm implementing some 'trace' messages in a new program or one just 'under investigation' I try to make sure the message has a nice format to be able to process a few megabyte of logfile easily. Cheers, uncle rubl

5 years

Re: [TUHS] Dennis Ritchie's Dissertation

by norman＠oclsc.org

John Gilmore: Yes -- but [Bell Labs'] administration was anything but egalitarian or meritocratic. I know someone who had immense trouble getting inside the door at the Labs because "all" he had was a bachelor's degree. Let their character be judged by how they treated a stranger. Sign me proud to have succeeded in life with no degrees at all, ==== That was where local management came in. I have no degrees at all. I haven't been nearly as successful in many ways as John, but I was recruited and hired by 1127. That I had no degree meant I was initially hired as a `special technical assistant' rather than a `member of technical staff,' but my department head and director and executive director (the last was the legendary Vic Vyssotsky) worked tirelessly on my behalf, without my pushing them at all, to get me upgraded, and succeeded after I'd been there about a year. It was only later that I realized just how much work they'd done on my behalf. The upgrade gave me a big raise in pay, but I was young enough and nerdy enough not to notice. Within the 1127 culture there was no perceptible difference; it was very much an egalitarian culture. I felt respected as an equal from the start (really from the day and a half I spent interviewing there). Not every part of the Labs, let alone AT&T, was like that, especially outside of the Research area. I didn't realize it initially but that was one of the ways I benefited from the success of UNIX (that 1127's and 112's management could push past such bureaucratic barriers). After all, Ken never had more than an MS. Norman Wilson Toronto ON

5 years

Dennis Ritchie's Dissertation

by Kirk McKusick

The Computer History Museum has an interesting blog post about Dennis Ritchie's lost dissertation: https://computerhistory.org/blog/discovering-dennis-ritchies-lost-dissertat… Interesting fact is that Dennis never received his PhD because he failed to provide a bound copy of his dissertation to the Harvard library. Kirk McKusick

5 years

Re: [TUHS] Dennis Ritchie's dissertation

by Doug McIlroy

> The use of honorifics was subtly discouraged at the Labs. I never saw a policy statement, but nobody I knew used "Dr" (except those in the medical department) With the sole exception of the president's office, secretaries were instructed not to say "Dr so-and-so's office" when they picked up an unanswered phone call. (When that happened you could be sure that the party you were calling was genuinely unavailable. Part of the AT&T ethos--now abandoned--was that everybody, right up to the president, answered their own phones.) Doug

5 years

vi in v7

by Will Senn

On another front. I know I've asked this before in v6, and possibly related to v7, but I can't find the notes anywhere. vi doesn't come with v7. So, has anybody put it on v7 in simh? I saw a thread sometime back where vi on v7 wasn't the main topic, where Warren? I think it was, said he'd done it and it was "easy." I don't suppose there are any notes laying around telling how this might be accomplished? I do see vi in 2bsd.tar, I don't suppose there is a 'how to install 2bsd on v7" note around either? Thanks, Will -- GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF

5 years

BSDI git repo?

by Steve Mynott

Is there full bsdi git repo anywhere? I've vague recollections parts were merged into FreeBSD in the early 2000s so I assume it was open sourced? There is a tarball of bsdi 2 on venus wetware but that's the best I can do with searching. -- Steve Mynott <steve.mynott(a)gmail.com> cv25519/ECF8B611205B447E091246AF959E3D6197190DD5

5 years

Re: [TUHS] Regular Expressions

by jnc＠mercury.lcs.mit.edu

> From: Will Senn > I don't really understand how they work. > ... > maybe the way to understand the unix regex lies in a careful > investigation into how it is implemented I'm not sure what I did, but it wasn't the latter, since I have no idea how they are done! I just mentally break the regex search string up into substrings (I use them most in Epsilon, which has syntax to do substrings of search strings, which helps a lot); past that, I think it's just using them and getting used to them. > an IBM 7094 (whatever that is) IBM's last 36-bit scientific mainframe before the System/360's. CTSS (which DMR held out as the ancestor of Unix) ran on 7094's. Noel

5 years

Re: [TUHS] Dennis Ritchie's Dissertation

by Doug McIlroy

> "My graduate school experience convinced me that I was not smart enough to > be an expert in the theory of algorithms and also that I liked procedural > languages better than functional ones." > > Amen to that. Me too, I tried functional languages and my head hurt. C > seems so natural to me. Dennis made quite a generalization from a sample of one--Lisp, the only functional language that existed when he was in grad school. I'm sure he'd agree today that functional languages shine for spplications rooted in algebraic domains. I immodestly point to www.cs.dartmouth.edu/~doug/powser.html, which has nothing to do with Unix, but certainly would have appealed to Dennis. Doug

5 years

Re: [TUHS] Will pdp 11/04 run unix?

by jnc＠mercury.lcs.mit.edu

> From: Angelo Papenhoff > I believe 11/20 UNIX also needs the EAE. Some applications might have used it (the story about the KS11 bug with the KW11-A confirms they did use it on that machine), but I found no trace of use of it in a quick scan of the entire Version 1 source (the one which is extant). Also, the first file in the OS source: https://www.tuhs.org/cgi-bin/utree.pl?file=V1/u0.s lists the addresses of all device registers, and the KE11-A isn't there. If the KE11 is needed to run some application on the -11/04, there are KE11-B's (program compatible, but a single hex card) available, ISTR. For emulation, something (SIMH?) supports it, since the TV -11 on ITS (now running in emulation,I'm pretty sure) uses it. Noel

5 years

Re: [TUHS] Will pdp 11/04 run unix?

by jnc＠mercury.lcs.mit.edu

> From: Will Senn > can I run Unix on a PDP 11/04? No, it doesn't have memory management, so not any of the well-known 'stock' versions (V5/V6/etc). Two choices, though: - If you get the V1 that ran on an -11/20 (which is mostly compatible with the /04 and /05), it should run on an /04. (Not sure what you'd use for mass storage, on a physical /04, though.) I'm not sure when they dropped the /20 - I think V4 n(at the latest)? But V2 and V3 are lost. - There's a 'Unix' for the LSI-11, and with minor changes (the LSI-11 isn't 100% compatible with other MMU-less 11's, but the changes are minor, e.g. MOS, written in MACRO-11, was conditionalized to run on both the LSI-11 and the -11/20) it should run on an /04. Noel

5 years

Re: [TUHS] Will pdp 11/04 run unix?

by jnc＠mercury.lcs.mit.edu

> From: Clem Cole > And if you could find a KS-11 MMU that Ken and Dennis had for the 11/20 > ... we can't even find documentation about it (Ken's surviving code is > the best doc we have). Where is that code? The Version 1 at TUHS appears to pre-date it. It would be great to have a look at it, we might be able to partially document the KS11 using it. (Ken had only vague memories of the KS11.) > From: Ronald Natalie > There's always MiniUnix. Ah; I didn't realize that was something different from LSX (the LSI-11 system). Noel

5 years

Will pdp 11/04 run unix?

by Will Senn

The question is can I run Unix on a PDP 11/04? I've dug around and it's unclear to me, so I'm asking y'all. Will -- GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF

5 years

v7 tar and 2bsd.tar

by Will Senn

I'm trying to get 2bsd.tar extracted into v7. Does anyone have any recollection how to do this. Here's what I'm seeing: in simh: simh> att tm0 -V -F TAR 2bsd.tar Tape Image '2bsd.tar' scanned as TAR format. contains 4935680 bytes of tape data (482 records, 1 tapemarks) c and in v7: tar xv0 tar: bin/ - cannot create directory checksum error # ls bin What gives - it says it can't create the dir, but it does, it's there... Thanks, Will -- GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF

5 years

Re: [TUHS] Troff to ps

by Doug McIlroy

> There was quite some communication between Peter Nilson (npn, known > for picasso) and bwk. In the interest of accuracy npn's full name is Nils-Peter Nelson. He honchoed the Bell Labs Cray and originated <string.h>. Doug

5 years

Re: [TUHS] Troff to ps (Will Senn)

by Steve Simon

Hi folks, I used thack to typeset my dissertation on v7 circa 1988. It converts C/A/T codes to postscript. i have no idea if it will cope with eqn or pic but it was enough for nicely formatted text as i remember. this is the patch/bugfix with a link to the original package. https://www.tuhs.org/Usenet/comp.sources.misc/1989-July/001073.html -Steve

5 years

Re: [TUHS] Troff to ps

by norman＠oclsc.org

Nemu Nusquam: When was dpost born? ===== CSTR 97, A Typesetter-Independent TROFF by Brian W Kernighan was issued in 1981 and revised the next year. So that's the earliest possible date. I vaguely remember the existence of Postscript support in general, including at least one Apple Laserwriter kicking around somewhere, starting at some point during my time at 1127 in the latter 1980s. There was even a Postscript display engine that ran on 5620 terminals under mux, though it wasn't normally used for troff previewing because the troff-specific proofer was faster (mainly, I think, it didn't send nearly as much data down the serial line to the terminal). My personal snapshot of V10, and the TUHS archive copy, include dpost; see src/cmd/postscript/dpost. Everything in the postscript directory came from USG, who had packaged everything troff into a separately-licensed Documenter's Workbench package. That may have made us exclude it from the officially-distributed V8 tape and V9 snapshots. In any case, the only V9 snapshot I know of offhand (which is in Warren's archive) has no dpost. Both my copy of V10 and the TUHS copy show dpost's source files with dates in 1991, but it was certainly there earlier if I used it in New Jersey (I left in mid-1990). Dpost is documented in man8/postscript.8; my copy of that file is dated October 1989. Digging around in documents available on the web, I found a bundle of DWB 2.0 docs: http://www.bitsavers.org/pdf/att/unix/Documentors_Workbench_1989/UNIX_Syste… It's a scanned-image PDF so I can't search it by machine, but it includes such things as listings of the source-code directory and manifests of various binary distributions, and dpost doesn't appear anywhere I can see. As the URL implies, the docs seem to be dated 1989. So maybe dpost wasn't part of the product until DWB 3.0; but maybe we in Research got an early copy of the postscript stuff (I think bwk was in regular communication with the USG-troff folks), perhaps in 1989. I confess I've lost track of the original question that spawned this thread, but if it is whether dpost is easily back-ported to PDP-11 UNIX, I don't think that's likely without a good bit of work. It would very likely require a post-1980-type C compiler, since it was written in the late 1980s. It might or might not fit on a PDP-11; I don't remember whether USG's system still officially ran there by the late 1980s. Norman Wilson Toronto ON

5 years

Re: [TUHS] V6 Console IO

by jnc＠mercury.lcs.mit.edu

> From: Paul Riley > I'm struggling however with how C processes the IO. It seems that when I > type at the console, my typing is immediately echoed to my terminal > window. ... nothing appears on the terminal until I press enter, when > the system displays the whole line of input ... How > can I suppress the original C/Unix echo, and get my output to appear > immediately? This is not a C issue; it's the Unix I/O system (and specifically, terminal I/O). Normally, Unix terminal input is done line-at-a-time: i.e. the read() call to the OS (whether for 1 character, or a large number) doesn't return until an enire line has been typed, and [Retrurn] has been hit; then the entire line is available. While it's being buffered by the OS, echoing is done, and rubout processing is also performed. One can suppress all this; there's a mode call 'raw' (the normal mode is sometime labelled 'cooked') which suppresses all that, and just gives one the characters actually typed, as they are typed. The stty() system call can be used to turn this on. See the V6 tty(IV) manual entry for more. stty() is in stty(II). Noel

5 years

Jump to page:

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

1996

1995

1994

1993

1992

1991

1990

TUHS