At Wed, 27 May 2020 16:00:57 -0500, Nevin Liber <nliber(a)gmail.com> wrote:
Subject: Re: [TUHS] History of popularity of C
> On Wed, May 27, 2020 at 2:50 PM Greg A. Woods <woods(a)robohack.ca> wrote:
> > A big part of the problem is that the C Standard mandates compilation
> > will and must succeed (and allows this success to be totally silent too)
> > even if the code contains instances of undefined behaviour.
>
> No it does not.
>
> To quote C11:
>
>     undefined behavior
>     behavior, upon use of a nonportable or erroneous program construct or of
>     erroneous data, for which this International Standard imposes no
>     requirements
Sorry, I concede. Yes, "no requirements". In C99 at least.
Sadly most compilers, including GCC and Clang/LLVM, will at best warn
(and warnings are only treated as errors by the most macho|wise); and
compilers only do that now because they've been getting flak from
developers whenever the optimizer does something unexpected.
> Much UB cannot be detected at compile time.  Much UB is too expensive to
> detect at run time.
Indeed. At best you can get a warning, or optional runtime code to
abort the program.
Now this isn't a problem when "undefined behaviour" becomes
"implementation-defined behaviour" for a given implementation.
However that's obviously not portable, except for the trivial cases
where the common compilers for a given type of platform all do the same
things.
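The textbook illustration of the difference (nothing exotic, just the
standard's own categories): right-shifting a negative signed value is
implementation-defined, so each compiler must pick a result and document
it, whereas signed overflow is undefined, so the optimizer is allowed to
assume it simply never happens:

    int shift_example(void)
    {
        int a = -8;
        return a >> 1;      /* implementation-defined: -4 on the usual
                               arithmetic-shift machines, but documented
                               one way or another */
    }

    int overflow_example(int c)
    {
        return c + 1;       /* undefined if c == INT_MAX: the optimizer
                               may assume the result is always > c */
    }
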
The real problems though arise when the optimizer takes advantage of
these rules regardless of what the un-optimized code will do on any
given platform and architecture.
The Linux kernel example I've referred to involved dereferencing a
pointer to do an assignment in a local variable definition, then a few
lines later testing if the pointer was NULL before using the local
variable. Unoptimised the code will dereference a NULL pointer and load
junk from location zero into the variable (because it's kernel code),
then the NULL test will trigger and all will be good. The optimizer
rips out the NULL check because "obviously" the programmer has assumed
the pointer is always a valid non-NULL pointer since they've explicitly
dereferenced it before checking it and they wouldn't want to waste even
a single jump-on-zero instruction checking it again. (It's also quite
possible the code was written "correctly" at first, then someone mushed
all the variable initialisations up onto their definitions.)
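Boiled down, the shape of that code was something like this (hypothetical
names, not the actual kernel source):

    struct dev { int flags; };

    int get_flags(struct dev *dev)
    {
        int flags = dev->flags;     /* dereference folded into the definition */

        if (dev == NULL)            /* the optimizer reasons: dev was already
                                       dereferenced above, so it "cannot" be
                                       NULL here -- check deleted */
            return -1;

        return flags;
    }
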
In any case there's now a GCC option: -fno-delete-null-pointer-checks
(to go along with -fno-strict-aliasing and -fno-strict-overflow, and
-fno-strict-enums, all of which MUST be used, and sometimes
-fno-strict-volatile-bitfields too, on all legacy code that you don't
want to break)
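-fno-strict-overflow is in that list for exactly the same reason: a
perfectly sensible-looking wrap-around guard can be silently discarded
because signed overflow is undefined.  A minimal sketch:

    int next_index(int x)
    {
        if (x + 1 < x)      /* intended overflow guard; with strict overflow
                               the compiler may assume x + 1 > x for signed x
                               and delete it -- -fno-strict-overflow (or
                               -fwrapv) keeps it */
            return 0;
        return x + 1;
    }
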
It's even worse when you have to write bare-metal code that must
explicitly dereference a NULL pointer (a not-so-real example: you want
to use location zero in the CPU zero-page (e.g. on a 6502 or 6800, or
PDP-8, etc.) as a pointer) -- it is now impossible to do that in strict
Standard C even though trivially it "should just work" despite the silly
rules. As far as I can tell it always did just work in "plain old" C.
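What that kind of code wants to say is trivial to write, and any sane
compiler for such a machine will emit the obvious load, but to strict
Standard C the dereference is UB because that pointer is, by definition,
a null pointer.  A sketch (hypothetical names):

    /* point at byte zero of the zero page -- address 0 */
    static volatile unsigned char *const zp0 = (volatile unsigned char *)0;

    unsigned char read_zp0(void)
    {
        return *zp0;    /* exactly what the hardware wants, but undefined
                           behaviour as far as strict Standard C goes */
    }
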
The crazy thing about modern optimizers is that they're way more
persistent and often somewhat more clever than your average programmer.
They follow all the paths. They apply all the rules at every turn.
> Take strlen(const char* s) for example.  s must be a valid pointer that
> points to a '\0'-terminated string.  How would you detect that at compile
> time?  How would you set up your run time to detect that and error out?
My premise is that you shouldn't try to detect this problem, AND in any
case where the optimizer might be able to prove the pointed-at object
isn't a valid string it should not, and must not, abuse that knowledge
to rip out code or cause other even worse misbehaviour.
I.e. this should not be "undefined", but rather "implementation defined
and without any recourse to allowing optimizer abuses".
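To be concrete about how the abuse plays out today (hypothetical helper,
but it's the same pattern as the kernel example): once the optimizer has
seen strlen(p) it may treat p as provably non-NULL and throw away a later
test of it.

    #include <string.h>

    size_t careful_len(const char *p)
    {
        size_t n = strlen(p);   /* the Standard says p must be a valid
                                   string here, so... */

        if (p == NULL)          /* ...this check may be treated as dead
                                   code and removed */
            return 0;

        return n;
    }
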
--
Greg A. Woods <gwoods(a)acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods(a)robohack.ca>
Planix, Inc. <woods(a)planix.com> Avoncote Farms <woods(a)avoncote.ca>