I wrote a letter to the ANSI C (1989) committee.
Please allow malloc(0).
Please allow zero-length arrays.
I got two letters back, one saying that malloc(0) is illegal because
zero-length arrays are illegal, and the other vice versa.
I fumed.
On Sun, Sep 29, 2024 at 8:08 AM Douglas McIlroy <
douglas.mcilroy(a)dartmouth.edu> wrote:
> I have to concede Branden's "gotcha". Struct copying is definitely
> not O(1).
>
> A real-life hazard of non-O(1) operations. Vic Vyssotsky put bzero (I
> forget what Vic called it) into Fortran II. Sometime later, he found that a
> percolation simulation was running annoyingly slowly. It took some study to
> discover that the real inner loop of the program was not the percolation,
> which touched only a small fraction of a 2D field. More time went into the
> innocuous bzero initializer. The fix was to ADD code to the inner loop to
> remember what entries had been touched, and initialize only those for the
> next round of the simulation.
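For illustration only, a minimal C sketch of the fix described above, with
invented names and sizes: record which cells a round touches and clear just
those afterward, so initialization costs what the round actually did rather
than O(field size).

    #include <stddef.h>

    #define N 1024                           /* field is N x N (made-up size)   */

    static double field[N][N];               /* the 2D field                    */
    static size_t touched[N * N];            /* flat indices touched this round */
    static size_t ntouched;

    static void touch(size_t i, size_t j, double v)
    {
        field[i][j] = v;
        touched[ntouched++] = i * N + j;     /* remember the cell for cleanup   */
    }

    static void clear_touched(void)
    {
        /* Cost is proportional to the cells actually touched, not to N * N. */
        while (ntouched > 0) {
            size_t k = touched[--ntouched];
            field[k / N][k % N] = 0.0;
        }
    }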
>
> > for a while, every instruction has [sic] an exactly predictable and
> constant cycle
> > count. ...[Then] all of a sudden you had instructions with O(n) cycle
> counts.
>
> O(n) cycle counts were nothing new. In the 1950s we had the IBM 1620
> with arbitrary-length arithmetic and the 709 with "convert" instructions
> whose cycle count went up to 256.
>
> >> spinelessly buckled to allow malloc(0) to return 0, as some
> >> implementations gratuitously did.
>
> > What was the alternative? There was no such thing as an exception, and
> > if a pointer was an int and an int was as wide as a machine address,
> > there'd be no way to indicate failure in-band, either.
>
> What makes you think allocating zero space should fail? If the size n of
> a set is determined at run time, why should one have to special-case its
> space allocation when n=0? Subsequent processing of the form for(i=0; i<n;
> i++) {...} will handle it gracefully with no special code. Malloc should do
> as it did in v7--return a non-null pointer different from any other active
> malloc pointer, as Bakul stated. If worse comes to worst[1] this can be
> done by padding up to the next feasible size. Regardless of how the pointer
> is created, any access via it would of course be out of bounds and hence
> wrong.
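A small sketch of that point (a hypothetical function, not from any of the
sources discussed): with a V7-style malloc the n = 0 case needs no special
handling at all; the extra "n > 0" test below exists only because ANSI lets
malloc(0) return a null pointer.

    #include <stdlib.h>

    /* Allocate and zero a set of n doubles; n may legitimately be 0. */
    double *make_set(size_t n)
    {
        double *p = malloc(n * sizeof *p);

        if (p == NULL && n > 0)            /* with a V7-style malloc, a plain */
            return NULL;                   /* p == NULL test would suffice    */

        for (size_t i = 0; i < n; i++)     /* runs zero times when n == 0     */
            p[i] = 0.0;
        return p;
    }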
>
> > How does malloc(0) get this job done and what benefit does it bring?
>
> If I understand the "job" (about initializing structure members)
> correctly, malloc(0) has no bearing on it. The benefit lies elsewhere.
>
> Apropos of tail calls, Rob Pike had a nice name for an explicit tail
> call, "become". It's certainly reasonable, though, to make compilers
> recognize tail calls implicitly.
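A minimal example of the implicit case: C makes no promise of tail-call
elimination, but an optimizing compiler is free to treat the call below as a
jump, which is what a "become"-style or "return goto" construct would make
explicit and guaranteed.

    /* The recursive call is the last thing gcd() does, so an optimizing
     * compiler may (but is not required to) compile it as a jump that
     * reuses the current stack frame. */
    unsigned long gcd(unsigned long a, unsigned long b)
    {
        if (b == 0)
            return a;
        return gcd(b, a % b);              /* call in tail position */
    }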
>
> [1] Worse didn't come to worst in the original malloc. It attached
> metadata to each block, so even blocks of size zero consumed some memory.
>
> Doug
>
> On Sat, Sep 28, 2024 at 1:59 PM Bakul Shah via TUHS <tuhs(a)tuhs.org>
> wrote:
>
>> Just responding to random things that I noticed:
>>
>> You don't need special syntax for tail-call. It should be done
>> transparently when a call is the last thing that gets executed. Special
>> syntax will merely allow confused people to use it in the wrong place and
>> get confused more.
>>
>> malloc(0) should return a unique ptr. So that "T* a = malloc(0); T* b =
>> malloc(0); a != (T*)0 && a != b". Without this, malloc(0) acts
>> differently from malloc(n) for n > 0.
>>
>> Note that except for arrays, function arguments & result are copied so
>> copying a struct makes perfect sense. Passing arrays by reference may have
>> been due to residual Fortran influence! [Just guessing] Also note: that one
>> exception has been the cause of many problems.
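A short sketch of that asymmetry, with made-up types: a struct parameter or
result is copied, while an array argument decays to a pointer, so the callee
works on the caller's storage.

    #include <stdio.h>

    struct point { int x, y; };

    static struct point bump(struct point p)  /* p is a copy              */
    {
        p.x += 1;                             /* touches only the copy    */
        return p;                             /* copied again on return   */
    }

    static void zap(int a[4])                 /* really "int *a": no copy */
    {
        a[0] = 0;                             /* touches the caller's array */
    }

    int main(void)
    {
        struct point p = {1, 2};
        int v[4] = {1, 2, 3, 4};

        bump(p);
        zap(v);
        printf("%d %d\n", p.x, v[0]);         /* prints "1 0" */
        return 0;
    }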
>>
>> In any case you have not argued convincingly about why dynamic memory
>> allocation should be in the language (runtime) :-) And adding that wouldn't
>> have fixed any of the existing problems with the language.
>>
>> Bakul
>>
>> > On Sep 28, 2024, at 9:58 AM, G. Branden Robinson <
>> g.branden.robinson(a)gmail.com> wrote:
>> >
>> > At 2024-09-28T09:34:14-0400, Douglas McIlroy wrote:
>> >>> C's refusal to specify dynamic memory allocation in the language
>> >>> runtime (as opposed to, eventually, the standard library)
>> >>
>> >> This complaint overlooks one tenet of C: every operation in what you
>> >> call "language runtime" takes O(1) time. Dynamic memory
allocation
>> >> is not such an operation.
>> >
>> > A fair point. Let me argue about it anyway. ;-)
>> >
>> > I'd make three observations. First, K&R did _not_ tout this in their
>> > book presenting ANSI C. I went back and checked the prefaces,
>> > introduction, and the section presenting a simple malloc()/free()
>> > implementation. The tenet you claim for the language is not explicitly
>> > articulated and, if I squint really hard, I can only barely perceive
>> > (imagine?) it deeply between the lines in some of the prefatory
>> > material to which K&R mostly confine their efforts to promote the
>> > language. In my view, a "tenet" is something more overt: the sort of
>> > thing U.S. politicians try to get hung on the walls of every public
>> > school classroom, like Henry Spencer's Ten Commandments of C[1]
>> > (which itself does not mention this "core language has only O(1)
>> > features" principle).
>> >
>> > Second, in reviewing K&R I was reminded that structure copying is part
>> > of the language. ("There are no operations that manipulate an entire
>> > array or string, although structures may be copied as a unit."[2])
>> > Doesn't that break the tenet right there?
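For concreteness, a tiny (contrived) example: structure assignment is a
single core-language operation, yet its cost grows with sizeof the struct,
and compilers routinely lower it to a block copy or a memcpy call.

    struct big { char payload[1 << 20]; };   /* a one-megabyte struct */

    void copy_big(struct big *dst, const struct big *src)
    {
        *dst = *src;   /* one assignment, but it moves 2^20 bytes: O(n), not O(1) */
    }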
>> >
>> > Third, and following on from the foregoing, your point reminded me of
>> > my youth programming non-pipelined machines with no caches. You could set
>> > your watch by (just about) any instruction in the set--and often did,
>> > because we penurious microcomputer guys often lacked hardware real-time
>> > clocks, too. That is to say, for a while, every instruction has an
>> > exactly predictable and constant cycle count. (The _value_ of that
>> > constant might depend on the addressing mode, because that would have
>> > consequences on memory fetches, but the principle stood.) When the Z80
>> > extended the 8080's instruction set, they ate from the Tree of
>> > Knowledge with block-copy instructions like LDIR and LDDR, and all of
>> > a sudden you had instructions with O(n) cycle counts. But as a rule,
>> > programmers seemed to welcome this instead of recognizing it as
>> > knowing sin, because you generally knew worst-case how many bytes
>> > you'd be copying and take that into account. (The worst worst case
>> > was a mere 64kB!)
>> >
>> > Further, Z80 home computers in low-end configurations (that is, no
>> > disk drives) often did a shocking thing: they ran with all interrupts
>> > masked. All the time. The one non-maskable interrupt was RESET, after
>> > which you weren't going to be resuming execution of your program
>> > anyway. Not from the same instruction, at least. As I recall the
>> > TRS-80 Model I/III/4 didn't even have logic on the motherboard to
>> > decode the Z80's "interrupt mode 2", which was vectored, I think.
>> > Even in the "high-end" configurations of these tiny machines, you got
>> > a whopping ONE interrupt to play with ("IM 1").
>> >
>> > Later, when the Hitachi 6309 smuggled similar block-transfer
>> > decadence into its extensions to the Motorola 6809 (to the excitement
>> > of us semi-consciously Unix-adjacent OS-9 users), they faced a
>> > starker problem, because the 6809 didn't wall off interrupts in the
>> > same way the 8080 and Z80 did. They therefore presented the
>> > programmer with the novelty of the restartable instruction, and a new
>> > generation of programmers became acquainted with the hard lessons
>> > time-sharing minicomputer people were familiar with.
>> >
>> > My point in this digression is that, in my opinion, it's tough to
>> > hold fast to the O(1) tenet you claim for C's core language and to
>> > another at the same time: the language's identity as a "portable
>> > assembly language". Unless every programmer has control over the
>> > compiler--and they don't--you can't predict when the compiler will
>> > emit an O(n) block transfer instruction. You'll just have to look at
>> > the disassembly.
>> >
>> > _Maybe_ you can retain purity by...never copying structs. I don't
>> > think lint or any other tool ever checked for this. Your advocacy of
>> > this tenet is the first time I've heard it presented.
>> >
>> > If you were to suggest to me that most of the time I've spent in my
>> > life arguing with C advocates was with rotten exemplars of the
>> > species and therefore was time wasted, I would concede the point.
>> >
>> > There's just so danged _many_ of them...
>> >
>> >> Your hobbyhorse awakened one of mine.
>> >>
>> >> malloc was in v7, before the C standard was written. The standard
>> >> spinelessly buckled to allow malloc(0) to return 0, as some
>> >> implementations gratuitously did.
>> >
>> > What was the alternative? There was no such thing as an exception, and
>> > if a pointer was an int and an int was as wide as a machine address,
>> > there'd be no way to indicate failure in-band, either.
>> >
>> > If the choice was that or another instance of atoi()'s wincingly awful
>> > "does this 0 represent an error or successful conversion of a zero
>> > input?" land mine, ANSI might have made the right choice.
>> >
>> >> I can't imagine that any program ever actually wanted the feature.
>> >> Now it's one more undefined behavior that lurks in thousands of
>> >> programs.
>> >
>> > Hoare admitted to only one billion-dollar mistake. No one dares count
>> > how many to write in C's ledger. This was predicted, wasn't it?
>> > Everyone loved C because it was fast: it was performant, because it
>> > never met a runtime check it didn't eschew--recall again Kernighan
>> > punking Pascal on this exact point--and it was quick for the programmer
>> > to write because it never met a _compile_-time check it didn't eschew.
>> > C was born as a language for wizards who never made mistakes.
>> >
>> > The problem is that, like James Madison's fictive government of
>> > angels, such entities don't exist. The staff of the CSRC itself may
>> > have been overwhelmingly populated with frank, modest, and
>> > self-deprecating people--and I'll emphasize here that I'm aware of no
>> > accounts that this is anything but true--but C unfortunately played a
>> > part in stoking a culture of pretension among software developers.
>> > "C is a language in which wizards program. I program in C. Therefore
>> > I'm a wizard." is how the syllogism (spot the fallacy) went. I don't
>> > know who does more damage--the people who believe their own BS, or
>> > the ones who know they're scamming their colleagues.
>> >
>> >> There are two arguments for malloc(0). Most importantly, it caters
>> >> for a limiting case for aggregates generated at runtime--an instance
>> >> of Kernighan's Law, "Do nothing gracefully". It also provides a way
>> >> to create a distinctive pointer to impart some meta-information,
>> >> e.g. "TBD" or "end of subgroup", distinct from the null pointer,
>> >> which merely denotes absence.
>> >
>> > I think I might be confused now. I've frequently seen arrays of
>> > structs initialized from lists of literals ending in a series of
>> > "NULL" structure members, in code that antedates or ignores C99's
>> > wonderful feature of designated initializers for aggregate types.[3]
>> > How does malloc(0) get this job done and what benefit does it bring?
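For concreteness, the idiom in question looks roughly like this (the table
and names are invented), a literal-initialized array of structs terminated
by an all-zero sentinel entry, written here with C99 designated
initializers:

    #include <stddef.h>

    struct command {
        const char *name;
        int       (*handler)(int, char **);
    };

    extern int do_help(int, char **);
    extern int do_quit(int, char **);

    static const struct command commands[] = {
        { .name = "help", .handler = do_help },
        { .name = "quit", .handler = do_quit },
        { .name = NULL }    /* sentinel: remaining members default to zero */
    };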
>> >
>> > Last time I posted to TUHS I mentioned a proposal for explicit
>> > tail-call elimination in C. I got the syntax wrong. The proposal was
>> > "return goto;". The WG14 document number is N2920 and it's by Alex
>> > Gilding. Good stuff.[4] I hope we see it in C2y.
>> >
>> > Predictably, I must confess that I didn't make much headway on
>> > Schiller's 1975 "secure kernel" paper. Maybe next time.
>> >
>> > Regards,
>> > Branden
>> >
>> > [1] https://web.cs.dal.ca/~jamie/UWO/C/the10fromHenryS.html
>> >
>> > I can easily imagine that the tenet held at _some_ point in C's
>> > history. It's _got_ to be the reason that the language relegates
>> > memset() and memcpy() to the standard library (or to the
>> > programmer's own devices)! :-O
>> >
>> > [2] Kernighan & Ritchie, _The C Programming Language_, 2nd edition,
>> > p. 2
>> >
>> > Having thus admitted the camel's nose to the tent, K&R would have
>> > done the world a major service by making memset(), or at least
>> > bzero(), a language feature, the latter perhaps by having "= 0"
>> > validly apply to an lvalue of non-primitive type. Okay,
>> > _potentially_ a major service. You'd still need the self-regarding
>> > wizard programmers to bother coding it, which they wouldn't in many
>> > cases "because speed". Move fast, break stuff.
>> >
>> > C++ screwed this up too, and stubbornly stuck by it for a long time.
>> >
>> >
>> > https://cplusplus.github.io/CWG/issues/178.html
>> >
>> > [3] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
>> > [4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf
>>
>>