At 2024-01-07T21:10:38-0800, Mychaela Falconia wrote:
G. Branden Robinson <g.branden.robinson(a)gmail.com> wrote:
This sort of broad, nonspecific, reflexive derogation of groff (or
GNU generally) is unproductive and frequently indicative of ignorance.
I don't have enough spoons to engage in political fights any more, so
I'll just focus on technical aspects.
That may be a wise choice. A good supplement would be, when expressing
a negative opinion of GNU or any software project to which people
contribute their volunteer labor, to briefly state your grounds for not
using it. "I just can't go along with the copyleft thing" or "I refuse
to use anything written in C++" might or might not strike people as
rational, but such frankness places the responsibility for starting an
argument squarely on _their_ shoulders.
Any issues people have with groff's implementation quality should be
submitted to its bug tracker. (One can do so anonymously, or create an
account to be emailed when subsequent activity happens.) There are
plenty of defects demanding repair and features needing implementation.
I wish there were fewer. I do what I can.
https://savannah.gnu.org/bugs/?group=groff
But if you are going for pixel-perfect reproduction of documents
that used fonts you don't have, you're going to need to recreate the
fonts somehow--perfectly (at least for the glyphs that a given
document uses).
The problem you are describing is one which I am *not* actively
working on presently. I am _contemplating_ this problem, but not
actively working on it. In my current stage of 4.3BSD document set
reprinting, I am willing to accept that hyphenations, line breaks and
page breaks will be different from the original because of slightly
different font metrics, and accept the use of only fi and fl ligatures
(in running text, outside of explicit demonstrations) because Adobe's
version dropped ff, ffi and ffl. (In places where original troff docs
explicitly demonstrate the use of all 5 ligatures, I have a hack that
pulls the missing ligs from a different, not-really-matching font.)
I am willing to accept this imperfection because it is fundamentally
no different from what UCB/Usenix themselves did in 1986: they took
Bell Labs docs that were originally written for CAT and troffed them
on their APS-5 ditroff setup - but those two typesetters also had
slight diffs in their font metrics, causing line and page breaks to
move around!
Right. I think this is a reasonable place to erect a threshold of
"fidelity" in document rendering, for two reasons: (1) when you don't
have control over the fonts in use, it's likely the best you can do
anyway, and (2) as a document author you might want to leave yourself
room to change your mind about the typeface you use, particularly for
running text (which will have the greatest impact on the locations of
line and page breaks for most documents).
That I was able to get the breaks in "Typesetting Mathematics" almost
all the same as the published version even though the Times I used was
certainly not the C/A/T's was due to a combination of (a) good fortune
and (b) the power of binary search when selecting values for the LL and
PO registers.
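
The binary-search idea can be sketched in a few lines of Python. This
is a toy and not groff: fill_lines() is a hypothetical greedy filler
standing in for a real formatter run, the widths are invented, and the
search is over a single made-up "line length" rather than the actual LL
and PO registers. It relies on the fact that a greedy filler never needs
more lines when the line gets longer.

```python
def fill_lines(word_widths, space_width, line_length):
    """Greedy fill: return the number of lines the words occupy."""
    lines = 1
    used = 0
    for w in word_widths:
        if used == 0:
            used = w                      # first word on a line always goes in
        elif used + space_width + w <= line_length:
            used += space_width + w       # word fits after a space
        else:
            lines += 1                    # break before this word
            used = w
    return lines


def smallest_length_for(word_widths, space_width, target_lines, lo, hi):
    """Binary-search the smallest line length in [lo, hi] that yields at
    most target_lines lines; return None if even hi is too short."""
    if fill_lines(word_widths, space_width, hi) > target_lines:
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if fill_lines(word_widths, space_width, mid) <= target_lines:
            hi = mid                      # mid works; try something shorter
        else:
            lo = mid + 1                  # mid is too short
    return lo


# Invented word widths, in arbitrary units.
words = [10, 20, 30, 10, 20]
best = smallest_length_for(words, 5, target_lines=2, lo=30, hi=200)
```

With these invented widths, the shortest line length that still gives
two lines is 70; at 69 the text spills onto a third line.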
OTOH I am very willing to entertain, as an intellectual exercise, what
would it take to produce a new font set that would *truly* replicate
the CAT font set at Bell Labs. The spacing widths of the original
fonts (the key determinant of where breaks will land) are known, right
here:
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/troff/tab3.c
Right. Nowadays we call these (and other measurements besides width)
the "font metrics".
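
A rough sketch of why a width table alone pins down where breaks land,
under stated assumptions: the table entries are taken to be in 36ths of
an em (the unit CSTR #54 uses for the .ss request; that tab3.c uses the
same unit is an assumption here), and the C/A/T's horizontal resolution
is 432 units per inch. The widths in SAMPLE_WIDTHS are invented for the
example and are not copied from tab3.c.

```python
from fractions import Fraction

UNITS_PER_INCH = 432   # C/A/T horizontal basic units per inch
POINTS_PER_INCH = 72
EM_DIVISIONS = 36      # assumption: widths are given in 36ths of an em

# Invented widths for illustration only; NOT taken from tab3.c.
SAMPLE_WIDTHS = {"t": 10, "r": 12, "o": 18, "f": 12, " ": 12}


def scaled_width(width36, point_size):
    """Width in basic units of a glyph whose table entry is width36."""
    em_units = Fraction(point_size * UNITS_PER_INCH, POINTS_PER_INCH)
    return em_units * width36 / EM_DIVISIONS


def string_width(text, point_size, widths=SAMPLE_WIDTHS):
    """Sum of scaled glyph widths; raises KeyError for unknown glyphs."""
    return sum((scaled_width(widths[ch], point_size) for ch in text),
               Fraction(0))


ten_point = string_width("troff", 10)
```

Under these assumptions an entry of 12 is 12 units wide at 6 points and
20 units wide at 10 points. A formatter that sums such widths against
the line length needs nothing else from the font to decide its breaks,
which is why matching the metrics matters more than matching the shapes.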
Back in 2004 in one afternoon I threw together a quick-hack program
that takes the output of original troff (CAT binary codes) and prints
it in PostScript, using standard Adobe fonts. The character
positioning is that of original troff, but because the actual font
characters don't perfectly match these metrics, the result is not
pretty - but the non-pretty result does show *exactly* where every
line and page break lands per original intent!
Nice! A tool I'd like to get added to groff someday is a modern
"cat2dit". It's come up on these mailing lists before; apparently Adobe
had a proprietary one back in the 1980s, and, as I recall, polymath
wizard Henry Spencer wrote one but it's long since become a relic. John
Gardner wrote yet another but it's in JavaScript so not maximally
convenient for a Unix command line grognard.
But best of all would be a "cat2dit" in Seventh Edition Unix-compatible
C, because that would be super convenient for running on a PDP-11 under
SIMH using Ossanna troff. The output would be easy to export because
the device-independent troff output format is plain text (and not too
strict about whitespace), and SIMH of course runs in a terminal window
so it's easy to copy and paste. This would make it much easier to use
Ossanna troff as a regression test bed for groff (or other modern
formatters).
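
To show how little the output side of such a tool needs, here is a
sketch in Python (for brevity; it is not the Seventh Edition C version
wished for above). The function name to_ditroff() and the glyph tuples
are hypothetical, and decoding the C/A/T byte stream is left out
entirely. The commands emitted follow the device-independent output
language documented in groff_out(5), with a resolution of 72000 and a
type size of 10000 chosen to match groff's "ps" device; a real converter
would also handle multiple pages, font changes, and special characters.

```python
def to_ditroff(glyphs, device="ps", res=72000, font_name="TR", size=10000):
    """Render (h, v, char) tuples, positioned in device units, as
    device-independent troff output text.

    Simplified sketch: one page, one font, one type size, and only
    single-character glyphs (the 'c' command).
    """
    out = [
        f"x T {device}",          # device name
        f"x res {res} 1 1",       # resolution, min. horizontal/vertical motion
        "x init",
        "p1",                     # begin page 1
        f"x font 1 {font_name}",  # mount font at position 1
        "f1",                     # select font position 1
        f"s{size}",               # type size (10000 is 10 points on "ps")
    ]
    for h, v, ch in glyphs:
        out.append(f"H{h}")       # absolute horizontal position
        out.append(f"V{v}")       # absolute vertical position
        out.append(f"c{ch}")      # print one character there
    out += ["x trailer", "x stop"]
    return "\n".join(out) + "\n"


# Two glyphs at made-up positions, one inch from the left edge.
sample = to_ditroff([(72000, 12000, "H"), (79220, 12000, "i")])
```

Because the result is just lines of text, it survives a copy and paste
out of a terminal window without any special handling.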
So what would it take to do such a re-creation properly? My feeling
is that the task would require hiring a professional typeface designer
to produce a modified version of Times font family: modify the fonts
to produce good visual results (change actual characters as needed) to
fit the prescribed, unchangeable metrics as in spacing widths. And
design all 5 f-ligatures while at it.
Another approach would be to obtain the C/A/T font plates and describe
them numerically. Since the only means of scaling was via an optical
lens (from 6 to 36 points), we can conclude that they weren't "hinted"
as digital fonts often are. Since those plates are presumably nearly
all in landfills these days I suppose the same could be accomplished
with sufficiently high-resolution scans of the copy of CSTR #54 in the
Seventh Edition Unix manual (because it depicts all possible glyphs).
And of course if a person wants a gratuitous thing to put on their
résumé/CV, they could obtain a large number of Times roman faces from a
variety of foundries, render a huge volume of text using them in every
possible combination and at a large number of sizes, and then use those
renderings to train an LLM to generate an "archetypal" Times face for
rendering C/A/T-produced documents. They then unleash it on the world
and wait for the lawsuits to roll in, which should get them enough
notoriety to land a day job at someplace where the buzzword "AI" excites
hard-charging middle managers.
I have no slightest idea how much it would cost to hire a professional
typeface designer to do what I just described, hence I have no idea
whether or not it is something that the hobbyist community could
potentially afford, even collectively. But it is an interesting idea
to ponder nonetheless - which is where I leave it for now.
Hobbyist font designers do exist. Some may lurk on one or both of these
lists. I would ask them if it's more or less a solved problem already.
There is a third problem, whose resolution is in progress, when
producing PDF output from this document; slanted Greek symbols are
present but "not quite right". This is because unlike PostScript,
PDF font repertoires generally don't provide a "slanted symbol"
face.
Can you please elaborate? I personally hate PDF with a passion, but I
concede that in order to make my documents readable by people other
than me, I have to rcp my .ps file from the 4.3BSD machine to a
semi-modern-ish (Slackware) Linux box and run ps2pdf on the file.
Doug McIlroy still does this.[1]
But what "slanted symbol" font are you talking about that exists in
PostScript but not in PDF? The only PostScript fonts whose existence
I take as a given (as opposed to downloading the font explicitly) are
the standard 14: 4 Times family fonts, 4 Helvetica family fonts, 4
Courier family fonts, Symbol and ZapfDingbats. Which of these 14 is
missing in PDF, and how does "standard" ps2pdf (Ghostscript) handle
it?
Sorry, I elided too much from my response on this point.
I should not have implied that "slanted symbol" is a standard PostScript
font; it is not, per my copy of the _PostScript Language Reference
Manual_ (3e) [see Appendix E].
"Slanted symbol", a.k.a. "SS", is a supplemental face in groff...of
old provenance--it goes back to groff 1.06 (September 1992) at least. It
exists to solve a problem that can be observed when you compare two
documents already referenced above.
1. Adobe's _PostScript Language Reference Manual_, p. 794. Table E.13,
"Symbol Encoding Vector"
2. CSTR #54 "Nroff/Troff User's Manual" (1976), p. 226*. Table I,
"Font Style Examples"
* using the page numbering in the HRW reprint of Volume 2 recently
discussed on TUHS
You will quickly observe that the C/A/T's "Special Mathematical Font",
bearing the pellucid name "S" in the Ossanna/Thompson naming convention
popular at Bell Labs, renders all its lowercase Greek letters in italic
form. PostScript's Symbol font does not.
A problem for any post-C/A/T typesetting is how to get upright versions
of lowercase Greek letters. AT&T troff was engineered around the
assumption that the lowercase Greek letters typically used for
mathematical and scientific typesetting are slanted/italic rather than
upright. This assumption is baked into the semantics of special
character names *a, *b, *g, and so forth. (Except when using nroff, of
course, where one "naturally" expects upright glyphs instead, just like
the good old Greek box on the Teletype Model 37.) The eqn preprocessor
furthermore--and consequently--assumes it doesn't need to do anything
special for these special characters to show up in italics (making its
rendering to terminals inconsistent with troff output).
If you couldn't guess, I plan to change this in groff. It won't break
eqn documents because what I "take away" in the semantics of the special
characters (an implied font style, which doesn't belong there), I will
"put back" via updated eqn character definitions, so people who say
    sin ( 2 theta ) ~ = ~ 2 ~ sin theta cos theta
will continue to get what they expect. eqn users who bust down to *roff
special characters to get Greek will, unfortunately, need to adapt. But
GNU eqn has features to support doing so with minimal pain.[2]
I have read that modern standards of mathematical typography mandate
that constants, like every non-mathematician's favorite, π, should be
set upright, not italicized as people of my generation (and I guess
older ones) are accustomed to seeing it. The idea is that only
_variables_ get italics. But I cannot speak further to this point, as
it's well out of my wheelhouse. If it's true, I hope the increased
flexibility I plan for groff and its eqn will make life easier for those
who typeset math.
https://savannah.gnu.org/bugs/index.php?64231
https://savannah.gnu.org/bugs/index.php?64232
gropdf(1) has not to date supported a slanted symbol font. But it needs
to for the reasons explored on the groff list last June in a lengthy
thread, the relevant portion of which starts here.
https://lists.gnu.org/archive/html/groff/2023-06/msg00088.html
I also wanted my troff to run under 4.3BSD, using only K&R C, which I
reason would probably be impossible with groff. (I recall reading
somewhere that groff is written in C++ - so it is completely out of
consideration for something that needs to run under 4.3BSD.)
Probably, unless someone wants to resurrect cfront...
C is not my favorite programming language, and C++ even less so. In a
better universe, by my lights, James Clark would have written groff in
Ada. I acknowledge that a lot of people would characterize such a
universe as a variety of Hell.
My software is written BY a pirate (me) FOR other pirates. If you are
not a pirate, my sw is not for you.
Arrrrrr. I believe I take your meaning. Piracy is an occupational
hazard of rentierism.
Regards,
Branden
[1] https://lists.gnu.org/archive/html/groff/2023-08/msg00028.html
[2] See eqn(1), subsection "Spacing and typeface".