Why would anyone be interested in an old regex package that never was
a part of any Unix distro?
The driving force was Posix, whose regex spec was quite inscrutable. Could
there be a reference implementation? It was easy to fool every
implementation I could get my hands on, including Gnu's over-the-top
9000-line implementation.
But as I got into it, I got fascinated by regexes per se. In making a
recognizer, there's a tradeoff between construction time and execution
time. Linear execution can be achieved, but at a potentially exponential
cost in construction time (and space). Backreferencing takes the regex
languages out of the class of regular languages.
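
The construction-versus-execution tradeoff can be seen on a textbook example (not taken from the package): the language (a|b)*a(a|b)^(n-1), "the n-th letter from the end is a". Its NFA has n+1 states, but the equivalent DFA needs 2^n. The sketch below is only an illustration; the names step, nfa_match and dfa_states are made up for it. It simulates the NFA directly (cheap to build, work per character proportional to n) and counts the states that subset construction would produce (one transition per character afterwards, exponential to build).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <unordered_set>

// NFA for (a|b)*a(a|b)^(n-1). State 0 loops on both letters and may also
// move to state 1 on 'a'; states 1..n-1 advance on either letter; state n
// accepts. A set of NFA states is a bitmask, so n must stay below 63.
std::uint64_t step(std::uint64_t set, char c, unsigned n) {
    std::uint64_t next = 0;
    for (unsigned s = 0; s < n; ++s) {          // state n has no outgoing moves
        if (!(set & (std::uint64_t{1} << s))) continue;
        if (s == 0) {
            next |= 1;                           // stay in state 0
            if (c == 'a') next |= 2;             // guess this 'a' is the one
        } else {
            next |= std::uint64_t{1} << (s + 1); // advance along the tail
        }
    }
    return next;
}

// Direct NFA simulation: no construction cost, O(n) work per character.
bool nfa_match(const std::string& text, unsigned n) {
    assert(n >= 1 && n < 63);
    std::uint64_t set = 1;                       // start in {0}
    for (char c : text) set = step(set, c, n);
    return (set >> n) & 1;
}

// Subset construction: count the DFA states reachable from {0}.
std::size_t dfa_states(unsigned n) {
    assert(n >= 1 && n < 63);
    const char letters[] = {'a', 'b'};
    std::unordered_set<std::uint64_t> seen{1};
    std::queue<std::uint64_t> work;
    work.push(1);
    while (!work.empty()) {
        std::uint64_t cur = work.front();
        work.pop();
        for (char c : letters) {
            std::uint64_t nxt = step(cur, c, n);
            if (seen.insert(nxt).second) work.push(nxt);
        }
    }
    return seen.size();
}
```

Calling dfa_states(n) for growing n shows the count doubling with each step, while nfa_match never builds more than one state set at a time.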
Recalling that regular languages are closed under intersection and
negation, I wondered about how to implement new regex operators, &
and -. I came up with a scheme for this optional non-Posix feature that
involved layering continuation-passing over more traditional methods. And
while I was at it, I broke out smaller sublanguages for special treatment
(as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
in which the only operation is catenation.
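
For the catenation-only case, a minimal Knuth-Morris-Pratt sketch follows. It is a generic textbook version, not the code from the package, and the function names build_failure and kmp_find are hypothetical. The failure table is built once in time linear in the pattern, after which the text is scanned without ever backing up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// fail[i] is the length of the longest proper prefix of pat[0..i]
// that is also a suffix of pat[0..i].
std::vector<std::size_t> build_failure(const std::string& pat) {
    std::vector<std::size_t> fail(pat.size(), 0);
    std::size_t k = 0;                       // length of the current border
    for (std::size_t i = 1; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Returns the index of the first occurrence of pat in text,
// or std::string::npos if there is none.
std::size_t kmp_find(const std::string& text, const std::string& pat) {
    if (pat.empty()) return 0;
    const std::vector<std::size_t> fail = build_failure(pat);
    std::size_t k = 0;                       // pattern characters matched so far
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == pat.size()) return i + 1 - k;
    }
    return std::string::npos;
}
```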
And finally, having followed the development of C++ from its infancy,
I wanted to try out its new template facility, so there's a bit of
that in the package, too. Arnold has discovered that not only has C++
evolved, but also that without the discipline of -Wall to force clean
code, I was rather cavalier about casting, both explicitly and implicitly.
The only real customer the code ever had was the AST project, which
translated it to C. After the C++ had sat idle for a half-dozen years, I
thought to revive it in Linux, but found it riddled with incompatibilities
with that new environment and gave up. Arnold deserves a citation for
bravery in pushing that through 15 years further on.
Doug
[ I've always posted these to TUHS with no objections, so I have no idea
whether COFF would be a better forum; feel free to spank me (I might
even enjoy it!) ]
We lost Per Brinch Hansen, a computer scientist, on this day in 2007. He
specialised in operating systems and concurrent programming, and wrote the
classic book "Operating System Principles" which was published in six
languages for decades. He also wrote another book "The Architecture of
Concurrent Programs" which demonstrated an entire operating system written
in Concurrent Pascal (much like the Lions' books on Unix).
-- Dave
My DuckDuckGo-fu appears to be on the blink (and I refuse to use Google
out of privacy concerns); is there a PDF/PS/groff somewhere? I don't use
fancy-wanky markup languages.
Thanks.
-- Dave
> Thank you for the info - I will certainly look at the USENIX tapes.
>
> I will try to port the C compiler to amd64 - while preserving as much of
> the original code as I can. But not sure if this is even feasible.
>
> Thanks and Regards
> Dibyendu
If that is your goal, you might want to start with the version included with 2.11BSD. It is essentially the same as the version from V7, but with 15 more years of bug fixes. I used that source to port V6 Unix to the TI990 architecture back in 2014/2015 and the good thing about it is that it still compiles with a modern gcc.
For your project, I think you would be able to use the first pass ‘c0’ almost unchanged. The second pass ‘c1’ would need major restructuring. It mainly builds a tree for each expression and then performs various transformations, many of which are PDP11-specific (but also portable ones, like handling of constant expressions). It then covers the tree with code fragments selected from a library. This library (‘optable’) would need a full rewrite as well. The last pass ‘c2’ is the optimiser and is also highly PDP11-specific. It reads the assembler output of ‘c1’ function by function, building an instruction list. It then performs some portable optimisations (eliminating unnecessary jumps, etc.) and also more PDP11-specific optimisations (the most complex being removing redundant register loads - the concept of which would be reusable).
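
As an illustration of the portable kind of optimisation mentioned above, here is a small sketch of one peephole rule: dropping an unconditional jump whose target label follows it immediately. The Insn record and the function name drop_jumps_to_next are invented for this example (and it is written in C++ rather than the compiler's own C); the actual ‘c2’ data structures and rules are not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instruction record, for illustration only.
struct Insn {
    enum Kind { Label, Jump, Other } kind;
    std::string text;   // label name, jump target, or the raw instruction
};

// Remove every unconditional jump whose target is one of the labels
// that directly follow it; such a jump has no effect.
std::vector<Insn> drop_jumps_to_next(const std::vector<Insn>& in) {
    std::vector<Insn> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bool redundant = false;
        if (in[i].kind == Insn::Jump) {
            // Scan the run of labels immediately after the jump.
            for (std::size_t j = i + 1;
                 j < in.size() && in[j].kind == Insn::Label; ++j) {
                if (in[j].text == in[i].text) { redundant = true; break; }
            }
        }
        if (!redundant) out.push_back(in[i]);
    }
    return out;
}
```

A real optimiser would repeat rules like this until nothing changes, since removing one jump can expose another.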
There are about 12,000 lines of code and as a rough guess I would say that some 40% needs rewriting. A new code fragment library would probably be some 2 to 3 thousand lines.
I recall reading about a project to revive the Ritchie C compiler one or two years ago, but a quick web search came up dry. Anybody else remember reading that?
Hi All.
I have (mostly) revived Doug McIlroy's C++ regular expression parsing
library. I gratefully acknowledge and thank him for allowing me to
publish the code and for his help in finding all the bits and pieces.
It's available at https://github.com/arnoldrobbins/mcilroy-regex .
The main things I've done are to gather all the bits and pieces, rename files
to have a .cpp extension, and get everything to compile using current g++
and standard make.
I'm at the point where I could use some help. The various tests
do not all run successfully.
1. make retest - a number of tests fail
2. ./testgrep.sh - a number of tests fail
3. ./testsed.sh - tests fail with core dumps
Looking briefly, some of the code in sed plays C games, casting various
things around to pointers of different types and dereferencing them;
these things tend to cause trouble in C++.
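
For anyone looking at that kind of code, here is a generic sketch of the hazard and one common way around it. It is not taken from the sed sources; the helper names load and store are hypothetical. Reading an object out of a byte buffer through a cast pointer can violate alignment and aliasing rules, whereas copying the bytes with std::memcpy is well defined for trivially copyable types.

```cpp
#include <cstring>
#include <type_traits>

// Risky pattern (shown as a comment only):
//   int v = *reinterpret_cast<const int*>(buf + 1);   // misaligned, aliasing

// Read a T stored at an arbitrary byte offset by copying its bytes.
template <typename T>
T load(const unsigned char* p) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "load requires a trivially copyable type");
    T v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Write a T at an arbitrary byte offset by copying its bytes.
template <typename T>
void store(unsigned char* p, const T& v) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "store requires a trivially copyable type");
    std::memcpy(p, &v, sizeof v);
}
```

Compilers typically turn these small fixed-size copies into plain loads and stores, so the safer form usually costs nothing at run time.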
I'm hopeful that more eyes on this code will help it come back to life
more quickly. Any and all help will be appreciated.
Thanks,
Arnold
P.S. Let's not start a flame war about C vs. C++ etc. etc. If you can
help, please just dive in. Otherwise, just go, "wow, neat work" and
move on to something else. :-) Thanks.
On 15/07/2018, Warren Toomey <wkt(a)tuhs.org> wrote (in part):
> Also:
> https://www.youtube.com/watch?v=NTfOnGZUZDk
>
> Where GREP Came From - Computerphile (with Brian Kernighan)
I was intrigued by BWK's comment that "ed" was never spoken as "ed"
by "those in the know", which leads me to wonder how things were
spoken. Here is a little list of how I pronounce things [with others'
versions in brackets]. Others will no doubt be aghast.
ls - "list" sometimes "l s";
rm - "remove";
chmod - "change mode" [but I have heard "ch-mode"]
ar - "archive" [others have said "arrr"]
N.
Hi
I am interested in finding out if the last C compiler code (not the
earliest versions which I know are available) written by Dennis Ritchie
is available somewhere. I assume that the C compiler in V7 code was
written by him?
Thanks and Regards
Dibyendu
On 7/18/18, Doug McIlroy <doug(a)cs.dartmouth.edu> wrote:
>
> The famous exception is grep, which became a verb.
I think the similarity to "grab" and "grope" helped.
> "grep for" and "grep out".
"grep for" I'm familiar with. What does it mean to "grep out"?
-Paul W.
Arnold was clearly on the Unix Room wavelength. All those two-letter
commands were spelled out in conversation, even m-v. The pronunciation
of rmdir was hybrid: r-m-dir. But when one talked about an action--not
a command per se--verbs would be used: move or copy a file, list
a directory. The famous exception is grep, which became a verb. There
was no snappy ready-made verb that covered all the aspects of its use:
search for mentions in one file, find files that mention, look for
patterns, filter data, check for malformed data, ... The verb had
two idiomatic variants, "grep for" and "grep out".
Doug