On Sun, May 17, 2020 at 01:53:08AM +0200, Steffen Nurpmeso wrote:
Tony Finch wrote in
<alpine.DEB.2.20.2005142316170.3374(a)grey.csi.cam.ac.uk>:
|Larry McVoy <lm(a)mcvoy.com> wrote:
|>
|> It's got some perl goodness, regexps are part of the syntax, ....
|
|I got into Unix after perl and I've used it a lot. Back in the 1990s I saw
|Henry Spencer's joke that perl was the Swiss Army Chainsaw of Unix, as a
|riff on lex being its Swiss Army Knife. I came to appreciate lex
|regrettably late: lex makes it remarkably easy to chew through a huge pile
|of text and feed the pieces to some library code written in C. I've been
|using re2c recently (http://re2c.org/) which is differently weird than
|lex, though it still uses YY in all its variable names. It's remarkable
|how much newer lexer/parser generators can't escape from the user
|interface of lex/yacc. Another YY example: http://www.hwaci.com/sw/lemon/
P.S.: i really hate automated lexers. I never ever got used to
using them. For learning i once tried to use flex/bison, but
i failed really hard. I like that blood, sweat and tears thing,
and using a lexer seems so shattered, all the pieces. And i find
them really hard to read.
They are not bad if you are good at it. One of my guys has a PhD in
compilers and he's good at it.
They are not good at performance. BitKeeper has an extensive printf-like
(sort of, different syntax) language that can be used to customize
log output. Rob originally did all that in flex/bison but the performance
started to hurt so he rewrote it all:
/*
* This is a recursive-descent parser that implements the following
* grammar for dspecs (where [[...]] indicates an optional clause
* and {{...}} indicates 0 or more repetitions of):
*
* <stmt_list> -> {{ <stmt> }}
* <stmt> -> $if(<expr>){<stmt_list>}[[$else{<stmt_list>}]]
* -> $unless(<expr>){<stmt_list>}[[$else{<stmt_list>}]]
* -> $each(:ID:){<stmt_list>}
* -> ${<num>=<stmt_list>}
* -> <atom>
* <expr> -> <expr2> {{ <logop> <expr2> }}
* <expr2> -> <str> <relop> <str>
* -> <str>
* -> (<expr>)
* -> !<expr2>
* <str> -> {{ <atom> }}
* <atom> -> char
* -> escaped_char
* -> :ID:
* -> (:ID:)
* -> $<num>
* <logop> -> " && " | " || "
* <relop> -> "=" | "!=" | "=~"
* -> " -eq " | " -ne " | " -gt " | " -ge " | " -lt " | " -le "
*
* This grammar is ambiguous due to (:ID:) looking like a
* parenthesized sub-expression. The code tries to parse (:ID:) first
* as an $each variable, then as a regular :ID:, then as regular text.
*
* Note that this is broken: $if((:MERGE:)){:REV:}
*
* The following procedures can be thought of as implementing an
* attribute grammar where the output parameters are synthesized
* attributes which hold the expression values and the next token
* of lookahead in some cases. It has been written for speed.
*
* NOTE: out==0 means evaluate but throw away.
*
* Written by Rob Netzer <rob(a)bolabs.com> with some hacking
* by wscott & lm.
*/
That stuff screams perf wise.
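
For anyone who has not written one by hand, here is a minimal sketch of the
recursive-descent shape that comment describes. It is not the BitKeeper code,
and every name in it (Cursor, eat, str, expr2, expr, eval_expr) is made up for
illustration. It handles only the <expr>/<expr2> part of the grammar, and it
assumes a much simpler <str>: a run of alphanumeric characters, with only the
"=" and "!=" relops and no :ID: expansion. A non-empty string counts as true.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical cursor over the input; not BitKeeper's data structure. */
typedef struct { const char *p; } Cursor;

static int expr(Cursor *c);

/* If the input at the cursor starts with tok, consume it and return 1. */
static int eat(Cursor *c, const char *tok)
{
	size_t n = strlen(tok);

	if (strncmp(c->p, tok, n) != 0) return 0;
	c->p += n;
	return 1;
}

/* <str>, simplified (an assumption): a possibly empty alphanumeric run. */
static size_t str(Cursor *c, const char **start)
{
	*start = c->p;
	while (isalnum((unsigned char)*c->p)) c->p++;
	return (size_t)(c->p - *start);
}

/* <expr2> -> !<expr2> | (<expr>) | <str> <relop> <str> | <str> */
static int expr2(Cursor *c)
{
	const char *a, *b;
	size_t alen, blen;
	int v;

	if (eat(c, "!")) return !expr2(c);
	if (eat(c, "(")) {
		v = expr(c);
		eat(c, ")");	/* a missing ")" is tolerated in this sketch */
		return v;
	}
	alen = str(c, &a);
	if (eat(c, "!=")) {
		blen = str(c, &b);
		return !(alen == blen && memcmp(a, b, alen) == 0);
	}
	if (eat(c, "=")) {
		blen = str(c, &b);
		return alen == blen && memcmp(a, b, alen) == 0;
	}
	return alen > 0;	/* bare <str>: true when non-empty */
}

/* <expr> -> <expr2> {{ <logop> <expr2> }}, evaluated left to right. */
static int expr(Cursor *c)
{
	int r, v = expr2(c);

	for (;;) {
		/* Always parse the right side so the input is consumed. */
		if (eat(c, " && ")) { r = expr2(c); v = v && r; }
		else if (eat(c, " || ")) { r = expr2(c); v = v || r; }
		else return v;
	}
}

/* Returns 1 or 0 for the expression value, -1 on trailing garbage. */
int eval_expr(const char *s)
{
	Cursor c = { s };
	int v = expr(&c);

	return *c.p == '\0' ? v : -1;
}
```

Each grammar rule becomes one function and the return value plays the role of
the synthesized attribute, so there are no tables and no token structs between
the input bytes and the result. Calling eval_expr("a!=a || (b=b && !c=d)")
walks the string once and returns 1.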