Hi folks,
At 2024-05-20T07:14:07-0600, arnold(a)skeeve.com wrote:
Douglas McIlroy <douglas.mcilroy(a)dartmouth.edu>
wrote:
I'm surprised by nonchalance about bad
inputs evoking bad program
behavior. That attitude may have been excusable 50 years ago. By
now, though, we have seen so much malicious exploitation of open
avenues of "undefined behavior" that we can no longer ignore bugs
that "can't happen when using the tool correctly". Mature software
should not brook incorrect usage.
It's not nonchalance, not at all!
The current behavior is to die on the first syntax error, instead of
trying to be "helpful" by continuing to try to parse the program in
the hope of reporting other errors.
[...]
The crashes came because errors cascaded. I
don't see a reason to
spend valuable, *personal* time on adding defenses *where they aren't
needed*.
A steel door on your bedroom closet does no good if your front door is
made of balsa wood. My change was to stop the badness at the front
door.
I commend attention to the LangSec movement,
which advocates for
rigorously enforced separation between legal and illegal inputs.
Illegal input, in gawk, as far as I know, should always cause a syntax
error report and an immediate exit.
If it doesn't, that is a bug, and I'll be happy to try to fix it.
I hope that clarifies things.
For grins, and for a data point from elsewhere in GNU-land, GNU troff is
pretty robust to this sort of thing. Much as I might like to boast of
having improved it in this area, it appears to have already come with
iron long johns courtesy of James Clark and/or Werner Lemberg. I threw
troff its own ELF executable as a crude fuzz test some years ago, and I
don't recall needing to fix anything except unhelpfully vague diagnostic
messages (a phenomenon I am predisposed to observe anyway).
I did notice today that in one case we were spewing back out unprintable
characters (newlines, character codes > 127) _in_ one (but only one) of
the diagnostic messages, and while that's ugly, it's not an obvious
exploitation vector to me.
Nevertheless I decided to fix it and it will be in my next push.
So here's the mess you get when feeding GNU troff to itself. No GNU
troff since before 1.22.3 core dumps on this sort of unprepossessing
input.
$ ./build/test-groff -Ww -z /usr/bin/troff 2>&1 | sed 's/:[0-9]\+:/:/' |
sort | uniq -c
17 troff:/usr/bin/troff: error: a backspace character is not allowed in an escape
sequence parameter
10 troff:/usr/bin/troff: error: a space character is not allowed in an escape
sequence parameter
1 troff:/usr/bin/troff: error: a space is not allowed as a starting delimiter
1 troff:/usr/bin/troff: error: a special character is not allowed in an identifier
1 troff:/usr/bin/troff: error: character '-' is not allowed as a starting
delimiter
1 troff:/usr/bin/troff: error: invalid argument ')' to output suppression
escape sequence
1 troff:/usr/bin/troff: error: invalid argument 'c' to output suppression
escape sequence
1 troff:/usr/bin/troff: error: invalid argument 'l' to output suppression
escape sequence
1 troff:/usr/bin/troff: error: invalid argument 'm' to output suppression
escape sequence
1 troff:/usr/bin/troff: error: invalid positional argument number ','
3 troff:/usr/bin/troff: error: invalid positional argument number '<'
3 troff:/usr/bin/troff: error: invalid positional argument number 'D'
1 troff:/usr/bin/troff: error: invalid positional argument number 'E'
10 troff:/usr/bin/troff: error: invalid positional argument number 'H'
1 troff:/usr/bin/troff: error: invalid positional argument number 'Hi'
1 troff:/usr/bin/troff: error: invalid positional argument number 'I'
1 troff:/usr/bin/troff: error: invalid positional argument number 'I9'
1 troff:/usr/bin/troff: error: invalid positional argument number 'L'
1 troff:/usr/bin/troff: error: invalid positional argument number 'LD'
2 troff:/usr/bin/troff: error: invalid positional argument number 'LL'
5 troff:/usr/bin/troff: error: invalid positional argument number 'LT'
1 troff:/usr/bin/troff: error: invalid positional argument number 'M'
4 troff:/usr/bin/troff: error: invalid positional argument number 'P'
5 troff:/usr/bin/troff: error: invalid positional argument number 'X'
1 troff:/usr/bin/troff: error: invalid positional argument number 'dH'
1 troff:/usr/bin/troff: error: invalid positional argument number 'h'
1 troff:/usr/bin/troff: error: invalid positional argument number 'l'
1 troff:/usr/bin/troff: error: invalid positional argument number 'p'
1 troff:/usr/bin/troff: error: invalid positional argument number 'x'
3 troff:/usr/bin/troff: error: invalid positional argument number '|'
35 troff:/usr/bin/troff: error: invalid positional argument number (unprintable)
3 troff:/usr/bin/troff: error: unterminated transparent embedding escape sequence
The second to last (and most frequent) message in the list above is the
"new" one. Here's the diff.
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 8d828a01e..596ecf6f9 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -4556,10 +4556,21 @@ static void interpolate_arg(symbol nm)
}
else {
const char *p;
- for (p = s; *p && csdigit(*p); p++)
- ;
- if (*p)
- copy_mode_error("invalid positional argument number '%1'", s);
+ bool is_valid = true;
+ bool is_printable = true;
+ for (p = s; *p != 0 /* nullptr */; p++) {
+ if (!csdigit(*p))
+ is_valid = false;
+ if (!csprint(*p))
+ is_printable = false;
+ }
+ if (!is_valid) {
+ const char msg[] = "invalid positional argument number";
+ if (is_printable)
+ copy_mode_error("%1 '%2'", msg, s);
+ else
+ copy_mode_error("%1 (unprintable)", msg);
+ }
else
input_stack::push(input_stack::get_arg(atoi(s)));
}
GNU troff may have started out with an easier task in this area than an
AWK or a shell had; its syntax is not block-structured in the same way,
so parser state recovery is easier, and it's _inherently_ a filter.
The only fruitful fuzz attack on groff I can recall was upon indexed
bibliographic database files, which are a binary format. This went
unresolved for several years[1] but I fixed it for groff 1.23.0.
https://bugs.debian.org/716109
Regards,
Branden
[1] I think I understand the low triage priority. Few groff users use
the refer(1) preprocessor, and of those who do, even fewer find
modern systems so poorly performant at text scanning that they
desire the services of indxbib(1) to speed lookup of bibliographic
entries.