At 2021-02-21T20:34:55-0600, Will Senn wrote:
> All,
> So, we've been talking low-level design for a while. I thought I would
> ask a fundamental question. In days of old, we built small
> single-purpose utilities and used pipes to pipeline the data and
> transformations. Even back in the day, it seemed that there was
> tension to add yet another option to every utility. Today, as I was
> marveling at groff's abilities with regard to printing my man pages
> directly to my printer in 2021, I read the groff(1) page (example here:
> https://linux.die.net/man/1/groff).
A more up-to-date copy is available at the Linux man-pages site:
https://man7.org/linux/man-pages/man1/groff.1.html
> What struck me (the wrong way) was the second paragraph of the
> description:
>     The groff program allows to control the whole groff system by command
>     line options. This is a great simplification in comparison to the
>     classical case (which uses pipes only).
What strikes _me_ about the above is the awful Denglish in it. I fixed
this back in 2017, and the correction shipped as part of groff 1.22.4 in
December 2018.
> Here is the current plethora of options:
>     groff [-abcegilpstzCEGNRSUVXZ] [-d cs] [-f fam] [-F dir] [-I dir] [-L arg]
>           [-m name] [-M dir] [-n num] [-o list] [-P arg] [-r cn] [-T dev] [-w name]
>           [-W name] [file ...]
> Now, I appreciate groff, don't get me wrong, but my sensibilities were
> offended by the idea that a kazillion options was in any way simpler
> than pipelining single-purpose utilities. What say you? Is this the
> perfected logical extension of the unix pioneers' work, or have we
> gone horribly off the trail?
I'd say it's neither; it reflects (1) the limitations of the Unix
filter model, or at least of the linear topology of Unix pipelines[1],
and (2) an arbitrary set of sequencing rules determined by convention
and common practice.
Consider first the question of which *roff preprocessor languages
should be embeddable in another preprocessor's language. Should you be
able to embed equations in tables? What about tables inside equations
(not too insane an idea: consider matrix literals)? Nothing in the Unix
filter model implies an answer, but an ordering decision must be made:
the preprocessor that runs first has to pass the other's input through
untouched, so the order fixes which language can be embedded in which.
V7 Unix tbl(1)'s man page[3] took a moderately strong position on
preprocessor ordering, based on more practical concerns (system load on
shared machines, I suppose):
When it is used with
.I eqn
or
.I neqn
the
.I tbl
command should be first, to minimize the volume
of data passed through
pipes.
Another factor is ergonomics. As the number of preprocessors grows, the
number of potential orderings of a document processing pipeline grows
combinatorially. Here's the chunk of the groff front-end program that
determines the ordering of the pipeline it constructs for the user:
// grap, chem, and ideal must come before pic;
// tbl must come before eqn
const int PRECONV_INDEX = 0;
const int SOELIM_INDEX = PRECONV_INDEX + 1;
const int REFER_INDEX = SOELIM_INDEX + 1;
const int GRAP_INDEX = REFER_INDEX + 1;
const int CHEM_INDEX = GRAP_INDEX + 1;
const int IDEAL_INDEX = CHEM_INDEX + 1;
const int PIC_INDEX = IDEAL_INDEX + 1;
const int TBL_INDEX = PIC_INDEX + 1;
const int GRN_INDEX = TBL_INDEX + 1;
const int EQN_INDEX = GRN_INDEX + 1;
const int TROFF_INDEX = EQN_INDEX + 1;
const int POST_INDEX = TROFF_INDEX + 1;
const int SPOOL_INDEX = POST_INDEX + 1;
Sure, you could have a piece of paper with the above ordering taped to
the wall near your terminal, but why? Isn't it better to have a tool to
keep track of these arbitrary complexities instead?
groff, as a front-end and pipeline manager, is much smaller than the
actual formatter. According to sloccount, it's 1,195 lines to troff's
23,023 (measurements taken on groff Git HEAD, where I spend much of my
time).
If you need to alter or truncate the pipeline, to debug an input
document or resequence the processing order, you can, and groff supplies
the -V flag, which prints the pipeline it would run instead of running
it, to help you do so.
A traditionalist need never type the groff command if it offends one's
sensibilities (and an objection on those grounds would be a welcome
change from people grousing about copyleft). All the pieces of the
pipeline are still there and can be invoked directly.
For an alternative approach to *roff document interpretation and
rendering, albeit in a limited domain, see the mandoc project[4]. It
interprets the man(7) and mdoc(7) macro languages, a subset of *roff,
and tbl(1)'s mini-language with, as I understand it, a single parser.
Regards,
Branden
[1] Tom Duff noted this a long time ago in his paper presenting the rc
    shell[2]; see §9.
[2] https://archive.org/details/rc-shell/page/n2/mode/1up
[3] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/tbl.1
[4] https://mandoc.bsd.lv/