Just responding to random things that I noticed:
You don't need special syntax for tail calls. Elimination should be done
transparently whenever a call is the last thing that gets executed. Special
syntax will merely allow confused people to use it in the wrong place and
get even more confused.
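A minimal sketch of what I mean (a toy fact(), purely for illustration):
the recursive call is already in tail position, so an optimizing compiler
can reuse the frame today with no new syntax; the only thing missing is a
guarantee from the standard.

    #include <stdio.h>

    /* Tail-recursive factorial: the recursive call is the last thing
     * executed, so a compiler may turn it into a loop on its own.
     * Nothing in the standard requires that, which is the real gap;
     * new syntax wouldn't change where the call has to sit. */
    static unsigned long fact(unsigned long n, unsigned long acc)
    {
        if (n <= 1)
            return acc;
        return fact(n - 1, acc * n);    /* tail position */
    }

    int main(void)
    {
        printf("%lu\n", fact(10, 1));   /* 3628800 */
        return 0;
    }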
malloc(0) should return a unique pointer, so that after "T* a = malloc(0);
T* b = malloc(0);" we have "a != (T*)0 && a != b". Without this, malloc(0)
acts differently from malloc(n) for n > 0.
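Spelled out as a runnable check (whether it passes depends on your libc:
glibc, for one, hands back distinct pointers, while implementations that
return a null pointer for malloc(0) fail it):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Under the behavior I'm arguing for, malloc(0) acts like
         * malloc(n) for n > 0: a distinct, non-null pointer each time.
         * The standard currently allows either this or a null return. */
        void *a = malloc(0);
        void *b = malloc(0);

        printf("a=%p b=%p distinct-and-non-null=%s\n", a, b,
               (a != NULL && b != NULL && a != b) ? "yes" : "no");

        free(a);
        free(b);
        return 0;
    }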
Note that, except for arrays, function arguments and results are copied, so
copying a struct makes perfect sense. Passing arrays by reference may have
been due to residual Fortran influence! [Just guessing] Also note that this
one exception has been the cause of many problems.
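To make the exception concrete (toy types, just for illustration):

    #include <stdio.h>

    struct point { int x, y; };

    /* The struct parameter is a copy; changes stay local. */
    static void bump_struct(struct point p) { p.x = 99; }

    /* The "array" parameter decays to a pointer to the caller's first
     * element, so the change is visible outside -- the one exception. */
    static void bump_array(int a[2]) { a[0] = 99; }

    int main(void)
    {
        struct point pt = { 1, 2 };
        int arr[2] = { 1, 2 };

        bump_struct(pt);
        bump_array(arr);

        printf("pt.x=%d (unchanged)  arr[0]=%d (changed)\n", pt.x, arr[0]);
        return 0;
    }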
In any case, you have not argued convincingly why dynamic memory
allocation should be in the language (runtime) :-) And adding it wouldn't
have fixed any of the existing problems with the language.
Bakul
On Sep 28, 2024, at 9:58 AM, G. Branden Robinson wrote:
At 2024-09-28T09:34:14-0400, Douglas McIlroy wrote:
C's
refusal to specify dynamic memory allocation in the language
runtime (as opposed to, eventually, the standard library)
This complaint overlooks one tenet of C: every operation in what you
call "language runtime" takes O(1) time. Dynamic memory allocation
is not such an operation.
A fair point. Let me argue about it anyway. ;-)
I'd make three observations. First, K&R did _not_ tout this in their
book presenting ANSI C. I went back and checked the prefaces,
introduction, and the section presenting a simple malloc()/free()
implementation. The tenet you claim for the language is not explicitly
articulated and, if I squint really hard, I can only barely perceive
(imagine?) it deeply between the lines in some of the prefatory material
to which K&R mostly confine their efforts to promote the language. In
my view, a "tenet" is something more overt: the sort of thing U.S.
politicians try to get hung on the walls of every public school
classroom, like Henry Spencer's Ten Commandments of C[1] (which itself
does not mention this "core language has only O(1) features" principle).
Second, in reviewing K&R I was reminded that structure copying is part
of the language. ("There are no operations that manipulate an entire
array or string, although structures may be copied as a unit."[2])
Doesn't that break the tenet right there?
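To put a number on it (a contrived struct, purely for illustration): the
single "=" below copies every byte of the embedded array, and a compiler
will typically emit a block transfer or a memcpy() call for it.

    #include <stdio.h>
    #include <string.h>

    struct buf {
        char data[4096];    /* copied along with the rest of the struct */
        size_t len;
    };

    int main(void)
    {
        struct buf a, b;

        memset(&a, 0, sizeof a);
        strcpy(a.data, "hello");
        a.len = 5;

        b = a;    /* one core-language operator, O(sizeof(struct buf)) work */

        printf("%s (%zu bytes per assignment)\n", b.data, sizeof a);
        return 0;
    }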
Third, and following on from the foregoing, your point reminded me of my
youth programming non-pipelined machines with no caches. You could set
your watch by (just about) any instruction in the set--and often did,
because we penurious microcomputer guys often lacked hardware real-time
clocks, too. That is to say, for a while, every instruction had an
exactly predictable and constant cycle count. (The _value_ of that
constant might depend on the addressing mode, because that would have
consequences on memory fetches, but the principle stood.) When the Z80
extended the 8080's instruction set, they ate from the Tree of Knowledge
with block-copy instructions like LDIR and LDDR, and all of a sudden you
had instructions with O(n) cycle counts. But as a rule, programmers
seemed to welcome this instead of recognizing it as knowing sin, because
you generally knew worst-case how many bytes you'd be copying and could take
that into account. (The worst worst case was a mere 64kB!)
Further, Z80 home computers in low-end configurations (that is, no disk
drives) often did a shocking thing: they ran with all interrupts masked.
All the time. The one non-maskable interrupt was RESET, after which you
weren't going to be resuming execution of your program anyway. Not from
the same instruction, at least. As I recall the TRS-80 Model I/III/4
didn't even have logic on the motherboard to decode the Z80's "interrupt
mode 2", which was vectored, I think. Even in the "high-end"
configurations of these tiny machines, you got a whopping ONE interrupt
to play with ("IM 1").
Later, when the Hitachi 6309 smuggled similar block-transfer decadence
into its extensions to the Motorola 6809 (to the excitement of us
semi-consciously Unix-adjacent OS-9 users), they faced a starker problem,
because the 6809 didn't wall off interrupts in the same way the 8080 and
Z80 did. They therefore presented the programmer with the novelty of the
restartable instruction, and a new generation of programmers became
acquainted with the hard lessons time-sharing minicomputer people were
familiar with.
My point in this digression is that, in my opinion, it's tough to hold
fast to the O(1) tenet you claim for C's core language and to another at
the same time: the language's identity as a "portable assembly
language". Unless every programmer has control over the compiler--and
they don't--you can't predict when the compiler will emit an O(n) block
transfer instruction. You'll just have to look at the disassembly.
_Maybe_ you can retain purity by...never copying structs. I don't think
lint or any other tool ever checked for this. Your advocacy of this
tenet is the first time I've heard it presented.
If you were to suggest to me that most of the time I've spent in my life
arguing with C advocates was with rotten exemplars of the species and
therefore was time wasted, I would concede the point.
There's just so danged _many_ of them...
Your hobbyhorse awakened one of mine.
malloc was in v7, before the C standard was written. The standard
spinelessly buckled to allow malloc(0) to return 0, as some
implementations gratuitously did.
What was the alternative? There was no such thing as an exception, and
if a pointer was an int and an int was as wide as a machine address,
there'd be no way to indicate failure in-band, either.
If the choice was that or another instance of atoi()'s wincingly awful
"does this 0 represent an error or successful conversion of a zero
input?" land mine, ANSI might have made the right choice.
I can't imagine that any program ever
actually wanted the feature. Now
it's one more undefined behavior that lurks in thousands of programs.
Hoare admitted to only one billion-dollar mistake. No one dares count
how many to write in C's ledger. This was predicted, wasn't it?
Everyone loved C because it was fast: it was performant, because it
never met a runtime check it didn't eschew--recall again Kernighan
punking Pascal on this exact point--and it was quick for the programmer
to write because it never met a _compile_-time check it didn't eschew.
C was born as a language for wizards who never made mistakes.
The problem is that, like James Madison's fictive government of angels,
such entities don't exist. The staff of the CSRC itself may have been
overwhelmingly populated with frank, modest, and self-deprecating
people--and I'll emphasize here that I'm aware of no accounts that this
is anything but true--but C unfortunately played a part in stoking a
culture of pretension among software developers. "C is a language in
which wizards program. I program in C. Therefore I'm a wizard." is how
the syllogism (spot the fallacy) went. I don't know who does more
damage--the people who believe their own BS, or the ones who know
they're scamming their colleagues.
There are two arguments for malloc(0). Most
importantly, it caters for
a limiting case for aggregates generated at runtime--an instance of
Kernighan's Law, "Do nothing gracefully". It also provides a way to
create a distinctive pointer to impart some meta-information, e.g.
"TBD" or "end of subgroup", distinct from the null pointer, which
merely denotes absence.
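The first argument I can follow; I take it to mean the usual degenerate-input
case, something like this (a made-up read_samples(), purely for
illustration), where n may legitimately be zero and a usable pointer from
malloc(0) spares the caller a special case:

    #include <stdio.h>
    #include <stdlib.h>

    /* Read n doubles into a freshly allocated array. If malloc(0)
     * returns a usable (freeable) pointer, n == 0 needs no special
     * case; if it returns NULL, "empty" and "out of memory" collide. */
    static double *read_samples(size_t n)
    {
        double *v = malloc(n * sizeof *v);
        if (v == NULL && n > 0)
            return NULL;                    /* real allocation failure */
        for (size_t i = 0; i < n; i++)
            if (scanf("%lf", &v[i]) != 1) {
                free(v);
                return NULL;
            }
        return v;
    }

    int main(void)
    {
        double *empty = read_samples(0);    /* the limiting case */
        printf("empty group: %p\n", (void *)empty);
        free(empty);                        /* free() copes either way */
        return 0;
    }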
I think I might be confused now. I've frequently seen arrays of structs
initialized from lists of literals ending in a series of "NULL"
structure members, in code that antedates or ignores C99's wonderful
feature of designated initializers for aggregate types.[3] How does
malloc(0) get this job done and what benefit does it bring?
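(For reference, the idiom I mean is the sentinel-terminated table, here with
C99 designated initializers and invented command names just for show:)

    #include <stdio.h>

    struct cmd {
        const char *name;
        int (*handler)(void);
    };

    static int do_help(void) { return 0; }
    static int do_quit(void) { return 1; }

    /* Sentinel-terminated table: the all-zero entry marks the end,
     * and C99 designated initializers name the fields explicitly. */
    static const struct cmd commands[] = {
        { .name = "help", .handler = do_help },
        { .name = "quit", .handler = do_quit },
        { 0 }               /* the "NULL structure member" terminator */
    };

    int main(void)
    {
        for (const struct cmd *c = commands; c->name != NULL; c++)
            printf("command: %s\n", c->name);
        return 0;
    }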
Last time I posted to TUHS I mentioned a proposal for explicit tail-call
elimination in C. I got the syntax wrong. The proposal was "return
goto;". The WG14 document number is N2920 and it's by Alex Gilding.
Good stuff.[4] I hope we see it in C2y.
Predictably, I must confess that I didn't make much headway on
Schiller's 1975 "secure kernel" paper. Maybe next time.
Regards,
Branden
[1] https://web.cs.dal.ca/~jamie/UWO/C/the10fromHenryS.html
I can easily imagine that the tenet held at _some_ point in C's
history. It's _got_ to be the reason that the language relegates
memset() and memcpy() to the standard library (or to the programmer's
own devising)! :-O
[2] Kernighan & Ritchie, _The C Programming Language_, 2nd edition, p. 2
Having thus admitted the camel's nose to the tent, K&R would have
done the world a major service by making memset(), or at least
bzero(), a language feature, the latter perhaps by having "= 0"
validly apply to an lvalue of non-primitive type. Okay,
_potentially_ a major service. You'd still need the self-regarding
wizard programmers to bother coding it, which they wouldn't in many
cases "because speed". Move fast, break stuff.
C++ screwed this up too, and stubbornly stuck by it for a long time.
https://cplusplus.github.io/CWG/issues/178.html
[3] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf