I don't know if it's an AST, but I think pandoc (https://pandoc.org/MANUAL.html)
comes close to being that practical tool. I use it to translate HTML to Markdown,
which I now prefer to OrgMode.
=*+[]* Marty McGowan +1 908 230-3739
VP of Membership, MIT Club of Princeton
<https://alumcommunity.mit.edu/topics/23427/memberships>
On Sat, Feb 17, 2024, at 17:52, Douglas McIlroy wrote:
To expand on Branden's observation that translating from one member of the roff
family to another is hard, I note that the final
output usually presents a text in a shape that has been fine-tuned for appearance. In
grammatic terms it might best be presented in transformational terms a la Chomsky: a basic
text with a fairly simple grammar tweaked by pretty-printing transforms.
Translation involves parsing input into an AST according to one grammar and unparsing to
generate output according to another. Chomsky's work uses transformational grammars
primarily for generation. I'm not aware of any implementation of the inverse: parsing
according to a transformational grammar. Certainly no practical tools exist for doing so.
Unfortunately, one doesn't consciously write roff according to the model I have
outlined. This means that parsing it is more like parsing a natural language than a
strictly defined programming language. So, the absence of formal tools is exacerbated.
Roff scripts, like everyday English, are written according to an intuitive--and
occasionally ad hoc--grammar that varies both with authors and with time. And seventy
years of hard work has not yet fully automated the parsing of English.
Doug
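
As a rough illustration of the translation model described in the quoted message
(parse the input into an AST according to one grammar, then unparse it according to
another), here is a minimal Python sketch. It is not how pandoc or any roff tool
works. The node classes and the functions parse_roff and unparse_markdown are
hypothetical names made up for this example, and the input grammar is a toy subset
of man-style roff (.SH, .B, and plain text lines) assumed purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical AST node types for a toy document model.
@dataclass
class Heading:
    text: str


@dataclass
class Bold:
    text: str


@dataclass
class Text:
    text: str


Node = Union[Heading, Bold, Text]


def parse_roff(source: str) -> List[Node]:
    """Parse a toy subset of man-style roff (.SH, .B, plain text) into an AST."""
    nodes: List[Node] = []
    for line in source.splitlines():
        if line.startswith(".SH "):
            nodes.append(Heading(line[4:].strip()))
        elif line.startswith(".B "):
            nodes.append(Bold(line[3:].strip()))
        elif line.startswith("."):
            # Any other request (e.g. .ft or .sp) is dropped in this sketch.
            continue
        elif line.strip():
            nodes.append(Text(line.strip()))
    return nodes


def unparse_markdown(nodes: List[Node]) -> str:
    """Generate Markdown from the AST: headings become blocks, the rest flows inline."""
    blocks: List[str] = []
    paragraph: List[str] = []

    def flush() -> None:
        # Close the paragraph being accumulated, if any.
        if paragraph:
            blocks.append(" ".join(paragraph))
            paragraph.clear()

    for node in nodes:
        if isinstance(node, Heading):
            flush()
            blocks.append("# " + node.text)
        elif isinstance(node, Bold):
            paragraph.append("**" + node.text + "**")
        else:
            paragraph.append(node.text)
    flush()
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = ".SH NAME\npandoc\n.B converts\nmarkup formats.\n"
    print(unparse_markdown(parse_roff(sample)), end="")
```

Note that the sketch simply discards any request it does not recognize, such as .ft
or .sp. Those fine-tuning requests are exactly the "pretty-printing transforms" that
the quoted message says make real roff hard to parse, so a toy like this only works
on input that was already written to a simple grammar.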