> 1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
> 2. What are the 'best' implementations throughout unix (keep it pre-1980s)?
> 3. What are some of the milestones along the way (major changes, forks, disagreements)?
The editor ed was in Unix from day 1. For the necessarily tiny
implementation, Ken discarded various features
from the ancestral qed. Among the casualties was alternation
in regular expressions. It has never fully returned.
Ken's original paper described a method for simulating all paths
of a nondeterministic finite automaton in parallel, although he
didn't describe it in these exact terms. This meant he had to
keep track of up to n possible states, where n is the number of
terminal symbols in the regular expression.
"Computing Reviews" published a scathing critique of the paper:
everyone knows a deterministic automaton can recognize regular
expressions with one state transition per input character; what
a waste of time to have to keep track of multiple states! What the
review missed was that the size of the DFA can be exponential in n.
For one-shot use, as in an editor, it can take far longer to construct
the DFA than to run it.
This lesson came home with a vengeance when Al Aho wrote egrep,
which implemented full regular expressions as DFA's. I happened
to be writing calendar(1) at the same time, and used egrep to
search calendar files for dates in rather free formats for today
and all days through the next working day. Here's an example
(egrep interprets newline as "|"):
(^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*1)([^0123456789]|$)
(^|[ (,;])((\* *)0*1)([^0123456789]|$)
(^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*2)([^0123456789]|$)
(^|[ (,;])((\* *)0*2)([^0123456789]|$)
(^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*3)([^0123456789]|$)
(^|[ (,;])((\* *)0*3)([^0123456789]|$)
Much to Al's chagrin, this regular expression took the better
part of a minute to compile into a DFA, which would then run in
microseconds. The trouble was that the DFA was enormously
bigger than the input--only a tiny fraction of the machine's
states would be visited; the rest were useless. That led
him to the brilliant idea of constructing the machine on
the fly, creating only the states that were pertinent to
the input at hand. That innovation made the DFA again
competitive with an NFA.
Doug
This topic is still primarily UNIX but is getting near the edge of COFF, so
I'll CC there if people want to follow up.
As I mentioned to Will, during the time Research was doing the work and
putting out their 'editions', the 'releases' were a bit more ephemeral -
really a set of bits (binary and, hopefully, matching source, but maybe not
always) that captured a point in time. With the 4th (and I think 5th)
Editions it was the state of a disk pack when the bits were copied, but by
the 6th edition, as Noel points out, there was a 'master tape' that the
first site at an institution received upon execution of a signed license,
and the people at each institution (MIT, Purdue, CMU, Harvard) then passed
those bits around internally.
What is more, as Noel pointed out, we all passed source code and binaries
between each other, so the DNA was fairly mixed up [sorry Larry - it really
was 'Open Source' between the licensees]. Sadly, it means that some things
which actually originated at one location, on one system, are sometimes
credited to some other place whose >>wide<< release was in USG or BSD
[think Jim Kulp's job control, which ended up in the kernel and csh(1) as
part of 4BSD; our recent discussions on the list about more/pg/less; the
different networking changes from MIT/UofI/Rand; Goble's FS fixes to make
the thing more crash resilient; the early Harvard ar changes - *a.k.a.*
newar(1), which became ar(1); CMU's fsck; *etc.*].
Eventually, the AT&T Unix Support Group (USG) was stood up in Summit, as I
understand it, originally for the Operating Companies as they wanted to use
UNIX (but not for the licenses, originally). Steve Johnson moved from
Research over there and can tell you many more of the specifics.
Eventually (*i.e.* post-Judge Green), distribution to the world moved from
MH's Research and the Patent Licensing teams to USG and AT&T North Carolina
business folks.
That said, when the distribution of UNIX moved to USG in Summit, things
started to get a bit more formal. But there were still differences inside, as we have
tried to unravel. PWB/TS and eventually System x. FWIW, BSD went
through the same thing. The first BSD's are really the binary state of the
world on the Cory 11/70, later 'Ernie.' By the time CSRG gets stood
up because their official job (like USG) is to support Unix for DARPA, Sam
and company are acting a bit more like traditional SW firms with alpha/beta
releases and a more formal build process. Note that 2.X never really
went through that, so we are all witnessing the wonderful efforts to try to
rebuild early 2.X BSD, and see that the ephemeral nature of the bits has
become more obvious.
As a side story ... the fact is that even for professional SW houses, it
was not as pure as it should have been. To be honest, knowing the players and
processes involved, I highly doubt DEC could rebuild early editions of VMS,
particularly since the 'source control' system was a physical flag in
Cutler's office.
The fact is that the problem of which bits were used to make what other
bits was widespread enough throughout the industry that in the mid-to-late
80s, when Masscomp won the bid to build the system that NASA used to control
the space shuttle post-Challenger, a clause of the contract was that we put
an archive of the bits running on the build machine ('Yeti'), a copy of
the prints, and even microcode/PAL versions, so that Ford Aerospace (the
prime contractor) could rebuild the exact system we used to build the
binaries for them if we went bankrupt. I actually had a duplicate of
Yeti as my home system ('Xorn') in my basement when I made some money for a
couple of years as a contract/on-call person for them every time the
shuttle flew.
Anyway - the point is that documentation and actual bits not being 100% in
sync is nothing new. Companies work hard to try to keep it together, but
different projects work at different speeds. In fact, the 'train release'
model is what people usually fall into. You schedule a release of
some piece of SW, and anything that goes with it has to be on the train or
it must wait for the next one. So developers and marketing people in firms
argue over what gets to be the 'engine' [hint: often it's HW releases, which
are a terrible idea, but that's a topic for COFF].
> From: Warner Losh
> 8 I think was the limit.
IIRC, you could use longer names than that (in C), but external references
only used the first 7 characters (in C - C symbols had a leading '_' tacked
on; I used to know why), or 8 (in assembler).
> Could that cause this error?
Seems unlikely - see below.
> The error comes from lookloc. This is called for external type
> relocations. It searches the local symbol table for something that
> matches the relocation entry. This error happens when it can't find
> it...
Someone who actually looked at the source:
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/ld.c
instead of just guessing. Give that man a star!
I spent a while looking at the code, trying to figure out i) how it works, and
ii) what's going wrong with that message, but I don't have a definitive
answer. The code is not super well commented, so one has to actually
understand what it's doing! :-)
It seems to my initial perusal that it maintains two symbol tables, one for
globals (which accumulates as each file is processed), and one for locals
(which is discarded/reset for each file). As Warner mentioned, the message
appears when a local symbol referenced in the relocation information in the
current file can't be found (in the local symbol table).
It's not, I think, simply due to too many local symbols in an input file -
there seems to be a check for that as it's reading the input file symbol
table:
    if (lp >= &local[NSYMPR])
        error(1, "Local symbol overflow");
    *lp++ = symno;
    *lp++ = sp;
although of course there could be a bug which breaks this check. It seems to
me that this is an 'impossible' error, one which can only happen due to i) a
bug in the loader (a fencepost error, or something), or ii) an error in the
input a.out file.
I don't want to spend more time on it, since I'm not sure if you've managed to
bypass the problem. If not, let me know, and we'll track it down. (This may
involve you adding some printf's so we have more info about the details.)
Noel
I finally munged lbforth.c (https://gist.github.com/lbruder/10007431) into
compiling cleanly on mostly-stock v7 with the system compiler (lbforth
itself does fine on 211BSD, but it needs a little help to build in a real
K&R environment).
Which would be nice, except that when it gets to the linker....
$ cc -o 4th forth.c
ld:forth.o: Local symbol botch
WTF?
How do I begin to debug this?
Adam
> From: Will Senn
> it finally clicked that it is just one (of many) bit buckets out there
> with the moniker v6. ... I am coming from a world where OS version
> floppy/cd/dvd images are copies of a single master ... These tape things
> could be snapshots of the systems they originate from at very different
> times and with different software/sources etc.
Well, sort of. Do remember that everyone with V6 had to have a license, which
at that point you could _only_ get from Western Electric. So every
_institution_ (which is not the same as every _machine_) had to have had
dealings with them. However, once your institution was 'in the club', stuff
just got passed around.
E.g. the BBN V6 system with TCP/IP:
https://www.tuhs.org/cgi-bin/utree.pl?file=BBN-V6
I got that by driving over to BBN, and talking to Jack Haverty, and he gave us
a tape (or maybe a pack, I don't recall exactly). But we had a V6 license, so
we could do that.
But my particular machine, it worked just the way you described: we got our V6
from the other V6 machine in the Tech Sq building (RTS/DSSR), including not
only Bell post-V6 'leakage' like adb, but their local hacks (e.g. their TTY
driver, and the ttymod() system call to get to its extended features; the
ability to suspend selected applications; yadda, yadda). We never saw a V6
tape.
Noel
> From: Will Senn
> I don't think adb was in v6, where the fcreat function and buf struct
> are used... Were Maranzano and Bourne using some kind of hybrid 6+ system?
In addition to the point about skew between the released and internal development systems,
it's worth remembering just how long it was between the V6 and V7 releases, and
how much ground was covered technically during that period.
A lot of that stuff leaked out: we've talked about the upgraded 'Typesetter C'
(and compilers), which a lot of people had, and the V6+ system at MIT
(partially sort of PWB1) had both 'adb' and the stdio library. The latter also
made its way to lots of places; in my 'improved V6 page':
http://www.chiappa.net/~jnc/tech/ImprovingV6.html
it talks about finding the standard I/O stuff in several later V6 repositories,
including a UNSW tape. But it and typesetter C, also on the Shoppa pack, were
clearly quite widespread.
Noel
I've done research on this, but I'm confused and would appreciate some
help to understand what's going on. In the 7th edition manual, vol 2,
there's an ADB tutorial (pp. 323-336). In the tutorial, the authors,
Maranzano and Bourne, walk the reader through a debugging session. The
first example is predicated on a buffer overflow bug and the code includes:
struct buf {
    int fildes;
    int nleft;
    char *nextp;
    char buff[512];
} bb;

struct buf *obuf;
...
if ((fcreat(argv[1], obuf)) < 0) {
...
Well, this isn't v7 code. As discussed in the v7 manual vol 1 (p. VII):
Standard I/O. The old fopen, getc, putc complex and the old -lp package
are both dead, and even getchar has changed. All have been replaced by
the clean, highly efficient, stdio(3) package. The first things to know
are that getchar(3) returns the integer EOF (-1), which is not a
possible byte value, on end of file, that 518-byte buffers are out, and
that there is a defined FILE data type.
The buffers are out, fcreat is gone, etc. So, what's up with this? I
don't think adb was in v6, where the fcreat function and buf struct are
used... Were Maranzano and Bourne using some kind of hybrid 6+ system?
Thanks,
Will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF
OK. I'm plowing through a lot of issues with the putative 2.11BSD
reconstructions I've done to date. I keep finding things dated too new to
be right.
And it turns out that a few patches "snuck in" when the patch 80 catch-up
was done. I've outlined the ones I've found so far at
https://bsdimp.blogspot.com/2020/08/missing-211bsd-patches.html but I'm
sure there's at least one more. There was much ambiguity over /usr/new and
/usr/local that led to some of these, but others look like they were in
the master tree but never formally published, and they have all the
hallmarks of legit bug fixes...
I've also detailed the issues in going backwards. 2.11BSDpl195 had a
different .o format than 2.11BSDpl0. And to make matters worse, its
assembler couldn't assemble the assembler from the initial release, so I
had to get creative (using apout, thanks to all who contributed to that!).
I've also blogged about how to walk back a binary change when the old
programs no longer build on the new system. I think I got lucky that it was
possible at all :).
https://bsdimp.blogspot.com/2020/08/bootstrapping-211bsd-no-patches-from.ht…
has the blow by blow. There are a lot of steps to building even a normal
system... Let alone walking through the minefield of errors that you need
to do when stepping back...
And neither of these even begins to get into the issues with the build
system itself requiring workarounds for that...
But anyway, I keep making "ur2.11BSD" tapes, installing them and fixing the
issues I find... While much information was destroyed in the process,
there's a surprising amount of redundancy that one can use to 'test'
putative tapes.
Warner
P.S. ur2.11BSD is from urFOO in linguistics, meaning the original FOO
that's been lost and about which some amount of reconstruction / speculation
is offered. Still looking for a good name for the reconstructed
2.11BSD release....
Probably the same as others do: when I'm implementing some 'trace'
messages in a new program, or one just 'under investigation', I try to
make sure the message has a nice format, so as to be able to process a few
megabytes of logfile easily.
Cheers,
uncle rubl
John Gilmore:
Yes -- but [Bell Labs'] administration was anything but egalitarian or
meritocratic. I know someone who had immense trouble getting inside the
door at the Labs because "all" he had was a bachelor's degree. Let
their character be judged by how they treated a stranger.
Sign me proud to have succeeded in life with no degrees at all,
====
That was where local management came in.
I have no degrees at all. I haven't been nearly as
successful in many ways as John, but I was recruited
and hired by 1127. That I had no degree meant I was
initially hired as a `special technical assistant'
rather than a `member of technical staff,' but my
department head and director and executive director
(the last was the legendary Vic Vyssotsky) worked
tirelessly on my behalf, without my pushing them at
all, to get me upgraded, and succeeded after I'd been
there about a year. It was only later that I realized
just how much work they'd done on my behalf.
The upgrade gave me a big raise in pay, but I was
young enough and nerdy enough not to notice.
Within the 1127 culture there was no perceptible
difference; it was very much an egalitarian culture.
I felt respected as an equal from the start (really
from the day and a half I spent interviewing there).
Not every part of the Labs, let alone AT&T, was like
that, especially outside of the Research area. I
didn't realize it initially but that was one of the
ways I benefited from the success of UNIX (that 1127's
and 112's management could push past such bureaucratic
barriers).
After all, Ken never had more than an MS.
Norman Wilson
Toronto ON