I wrote a letter to the ANSI C (1989) committee.
Please allow malloc(0).
Please allow zero-length arrays.
I got two letters back, one saying that malloc(0) is illegal because
zero-length arrays are illegal, and the other vice versa.
I fumed.
On Sun, Sep 29, 2024 at 8:08 AM Douglas McIlroy <
douglas.mcilroy(a)dartmouth.edu> wrote:
> I have to concede Branden's "gotcha". Struct copying is definitely
> not O(1).
>
> A real-life hazard of non-O(1) operations. Vic Vyssotsky put bzero (I
> forget what Vic called it) into Fortran II. Sometime later, he found that a
> percolation simulation was running annoyingly slowly. It took some study to
> discover that the real inner loop of the program was not the percolation,
> which touched only a small fraction of a 2D field. More time went into the
> innocuous bzero initializer. The fix was to ADD code to the inner loop to
> remember what entries had been touched, and initialize only those for the
> next round of the simulation.
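For illustration only, a minimal C sketch of the fix described above, with
invented names and sizes: record which cells a round touches and clear just
those afterward, so initialization costs what the round actually did rather
than O(field size).

    #include <stddef.h>

    #define N 1024                           /* field is N x N (made-up size)   */

    static double field[N][N];               /* the 2D field                    */
    static size_t touched[N * N];            /* flat indices touched this round */
    static size_t ntouched;

    static void touch(size_t i, size_t j, double v)
    {
        field[i][j] = v;
        touched[ntouched++] = i * N + j;     /* remember the cell for cleanup   */
    }

    static void clear_touched(void)
    {
        /* Cost is proportional to the cells actually touched, not to N * N. */
        while (ntouched > 0) {
            size_t k = touched[--ntouched];
            field[k / N][k % N] = 0.0;
        }
    }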
>
> > for a while, every instruction has [sic] an exactly predictable and
> constant cycle
> > count. ...[Then] all of a sudden you had instructions with O(n) cycle
> counts.
>
> O(n) cycle counts were nothing new. In the 1950s we had the IBM 1620
> with arbitrary-length arithmetic and the 709 with "convert" instructions
> whose cycle count went up to 256.
>
> >> spinelessly buckled to allow malloc(0) to return 0, as some
> >> implementations gratuitously did.
>
> > What was the alternative? There was no such thing as an exception, and
> > if a pointer was an int and an int was as wide as a machine address,
> > there'd be no way to indicate failure in-band, either.
>
> What makes you think allocating zero space should fail? If the size n of
> a set is determined at run time, why should one have to special-case its
> space allocation when n=0? Subsequent processing of the form for(i=0; i<n;
> i++) {...} will handle it gracefully with no special code. Malloc should do
> as it did in v7--return a non-null pointer different from any other active
> malloc pointer, as Bakul stated. If worse comes to worst[1] this can be
> done by padding up to the next feasible size. Regardless of how the pointer
> is created, any access via it would of course be out of bounds and hence
> wrong.
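A small sketch of that point (a hypothetical function, not from any of the
sources discussed): with a V7-style malloc the n = 0 case needs no special
handling at all; the extra "n > 0" test below exists only because ANSI lets
malloc(0) return a null pointer.

    #include <stdlib.h>

    /* Allocate and zero a set of n doubles; n may legitimately be 0. */
    double *make_set(size_t n)
    {
        double *p = malloc(n * sizeof *p);

        if (p == NULL && n > 0)            /* with a V7-style malloc, a plain */
            return NULL;                   /* p == NULL test would suffice    */

        for (size_t i = 0; i < n; i++)     /* runs zero times when n == 0     */
            p[i] = 0.0;
        return p;
    }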
>
> > How does malloc(0) get this job done and what benefit does it bring?
>
> If I understand the "job" (about initializing structure members)
> correctly, malloc(0) has no bearing on it. The benefit lies elsewhere.
>
> Apropos of tail calls, Rob Pike had a nice name for an explicit tail
> call, "become". It's certainly reasonable, though, to make compilers
> recognize tail calls implicitly.
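A minimal example of the implicit case: C makes no promise of tail-call
elimination, but an optimizing compiler is free to treat the call below as a
jump, which is what a "become"-style or "return goto" construct would make
explicit and guaranteed.

    /* The recursive call is the last thing gcd() does, so an optimizing
     * compiler may (but is not required to) compile it as a jump that
     * reuses the current stack frame. */
    unsigned long gcd(unsigned long a, unsigned long b)
    {
        if (b == 0)
            return a;
        return gcd(b, a % b);              /* call in tail position */
    }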
>
> [1] Worse didn't come to worst in the original malloc. It attached
> metadata to each block, so even blocks of size zero consumed some memory.
>
> Doug
>
> On Sat, Sep 28, 2024 at 1:59 PM Bakul Shah via TUHS <tuhs(a)tuhs.org>
> wrote:
>
>> Just responding to random things that I noticed:
>>
>> You don't need special syntax for tail-call. It should be done
>> transparently when a call is the last thing that gets executed. Special
>> syntax will merely allow confused people to use it in the wrong place and
>> get confused more.
>>
>> malloc(0) should return a unique ptr. So that "T* a = malloc(0); T* b =
>> malloc(0); a != (T*)0 && a != b". Without this, malloc(0) acts
>> differently from malloc(n) for n > 0.
>>
>> Note that except for arrays, function arguments & result are copied so
>> copying a struct makes perfect sense. Passing arrays by reference may have
>> been due to residual Fortran influence! [Just guessing] Also note: that one
>> exception has been the cause of many problems.
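A short sketch of that asymmetry, with made-up types: a struct parameter or
result is copied, while an array argument decays to a pointer, so the callee
works on the caller's storage.

    #include <stdio.h>

    struct point { int x, y; };

    static struct point bump(struct point p)  /* p is a copy              */
    {
        p.x += 1;                             /* touches only the copy    */
        return p;                             /* copied again on return   */
    }

    static void zap(int a[4])                 /* really "int *a": no copy */
    {
        a[0] = 0;                             /* touches the caller's array */
    }

    int main(void)
    {
        struct point p = {1, 2};
        int v[4] = {1, 2, 3, 4};

        bump(p);
        zap(v);
        printf("%d %d\n", p.x, v[0]);         /* prints "1 0" */
        return 0;
    }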
>>
>> In any case you have not argued convincingly about why dynamic memory
>> allocation should be in the language (runtime) :-) And adding that wouldn't
>> have fixed any of the existing problems with the language.
>>
>> Bakul
>>
>> > On Sep 28, 2024, at 9:58 AM, G. Branden Robinson <
>> g.branden.robinson(a)gmail.com> wrote:
>> >
>> > At 2024-09-28T09:34:14-0400, Douglas McIlroy wrote:
>> >>> C's refusal to specify dynamic memory allocation in the language
>> >>> runtime (as opposed to, eventually, the standard library)
>> >>
>> >> This complaint overlooks one tenet of C: every operation in what you
>> >> call "language runtime" takes O(1) time. Dynamic memory
allocation
>> >> is not such an operation.
>> >
>> > A fair point. Let me argue about it anyway. ;-)
>> >
>> > I'd make three observations. First, K&R did _not_ tout this in their
>> > book presenting ANSI C. I went back and checked the prefaces,
>> > introduction, and the section presenting a simple malloc()/free()
>> > implementation. The tenet you claim for the language is not explicitly
>> > articulated and, if I squint really hard, I can only barely perceive
>> > (imagine?) it deeply between the lines in some of the prefatory
>> > material to which K&R mostly confine their efforts to promote the
>> > language. In my view, a "tenet" is something more overt: the sort of
>> > thing U.S. politicians try to get hung on the walls of every public
>> > school classroom, like Henry Spencer's Ten Commandments of C[1]
>> > (which itself does not mention this "core language has only O(1)
>> > features" principle).
>> >
>> > Second, in reviewing K&R I was reminded that structure copying is part
>> > of the language. ("There are no operations that manipulate an entire
>> > array or string, although structures may be copied as a unit."[2])
>> > Doesn't that break the tenet right there?
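For concreteness, a tiny (contrived) example: structure assignment is a
single core-language operation, yet its cost grows with sizeof the struct,
and compilers routinely lower it to a block copy or a memcpy call.

    struct big { char payload[1 << 20]; };   /* a one-megabyte struct */

    void copy_big(struct big *dst, const struct big *src)
    {
        *dst = *src;   /* one assignment, but it moves 2^20 bytes: O(n), not O(1) */
    }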
>> >
>> > Third, and following on from the foregoing, your point reminded me of
>> > my youth programming non-pipelined machines with no caches. You could set
>> > your watch by (just about) any instruction in the set--and often did,
>> > because we penurious microcomputer guys often lacked hardware real-time
>> > clocks, too. That is to say, for a while, every instruction has an
>> > exactly predictable and constant cycle count. (The _value_ of that
>> > constant might depend on the addressing mode, because that would have
>> > consequences on memory fetches, but the principle stood.) When the Z80
>> > extended the 8080's instruction set, they ate from the Tree of
>> > Knowledge with block-copy instructions like LDIR and LDDR, and all of
>> > a sudden you had instructions with O(n) cycle counts. But as a rule,
>> > programmers seemed to welcome this instead of recognizing it as
>> > knowing sin, because you generally knew worst-case how many bytes
>> > you'd be copying and take that into account. (The worst worst case
>> > was a mere 64kB!)
>> >
>> > Further, Z80 home computers in low-end configurations (that is, no
>> > disk drives) often did a shocking thing: they ran with all interrupts
>> > masked. All the time. The one non-maskable interrupt was RESET, after
>> > which you weren't going to be resuming execution of your program
>> > anyway. Not from the same instruction, at least. As I recall the
>> > TRS-80 Model I/III/4 didn't even have logic on the motherboard to
>> > decode the Z80's "interrupt mode 2", which was vectored, I think.
>> > Even in the "high-end" configurations of these tiny machines, you got
>> > a whopping ONE interrupt to play with ("IM 1").
>> >
>> > Later, when the Hitachi 6309 smuggled similar block-transfer
>> > decadence into its extensions to the Motorola 6809 (to the excitement
>> > of us semi-consciously Unix-adjacent OS-9 users), they faced a
>> > starker problem, because the 6809 didn't wall off interrupts in the
>> > same way the 8080 and Z80 did. They therefore presented the
>> > programmer with the novelty of the restartable instruction, and a new
>> > generation of programmers became acquainted with the hard lessons
>> > time-sharing minicomputer people were familiar with.
>> >
>> > My point in this digression is that, in my opinion, it's tough to
>> > hold fast to the O(1) tenet you claim for C's core language and to
>> > another at the same time: the language's identity as a "portable
>> > assembly language". Unless every programmer has control over the
>> > compiler--and they don't--you can't predict when the compiler will
>> > emit an O(n) block transfer instruction. You'll just have to look at
>> > the disassembly.
>> >
>> > _Maybe_ you can retain purity by...never copying structs. I don't
>> > think lint or any other tool ever checked for this. Your advocacy of
>> > this tenet is the first time I've heard it presented.
>> >
>> > If you were to suggest to me that most of the time I've spent in my
>> > life arguing with C advocates was with rotten exemplars of the
>> > species and therefore was time wasted, I would concede the point.
>> >
>> > There's just so danged _many_ of them...
>> >
>> >> Your hobbyhorse awakened one of mine.
>> >>
>> >> malloc was in v7, before the C standard was written. The standard
>> >> spinelessly buckled to allow malloc(0) to return 0, as some
>> >> implementations gratuitously did.
>> >
>> > What was the alternative? There was no such thing as an exception, and
>> > if a pointer was an int and an int was as wide as a machine address,
>> > there'd be no way to indicate failure in-band, either.
>> >
>> > If the choice was that or another instance of atoi()'s wincingly awful
>> > "does this 0 represent an error or successful conversion of a zero
>> > input?" land mine, ANSI might have made the right choice.
>> >
>> >> I can't imagine that any program ever actually wanted the feature.
>> >> Now it's one more undefined behavior that lurks in thousands of
>> >> programs.
>> >
>> > Hoare admitted to only one billion-dollar mistake. No one dares count
>> > how many to write in C's ledger. This was predicted, wasn't it?
>> > Everyone loved C because it was fast: it was performant, because it
>> > never met a runtime check it didn't eschew--recall again Kernighan
>> > punking Pascal on this exact point--and it was quick for the programmer
>> > to write because it never met a _compile_-time check it didn't eschew.
>> > C was born as a language for wizards who never made mistakes.
>> >
>> > The problem is that, like James Madison's fictive government of
>> > angels, such entities don't exist. The staff of the CSRC itself may
>> > have been overwhelmingly populated with frank, modest, and
>> > self-deprecating people--and I'll emphasize here that I'm aware of no
>> > accounts that this is anything but true--but C unfortunately played a
>> > part in stoking a culture of pretension among software developers.
>> > "C is a language in which wizards program. I program in C. Therefore
>> > I'm a wizard." is how the syllogism (spot the fallacy) went. I don't
>> > know who does more damage--the people who believe their own BS, or
>> > the ones who know they're scamming their colleagues.
>> >
>> >> There are two arguments for malloc(0). Most importantly, it caters
>> >> for a limiting case for aggregates generated at runtime--an instance
>> >> of Kernighan's Law, "Do nothing gracefully". It also provides a way
>> >> to create a distinctive pointer to impart some meta-information,
>> >> e.g. "TBD" or "end of subgroup", distinct from the null pointer,
>> >> which merely denotes absence.
>> >
>> > I think I might be confused now. I've frequently seen arrays of
>> > structs initialized from lists of literals ending in a series of
>> > "NULL" structure members, in code that antedates or ignores C99's
>> > wonderful feature of designated initializers for aggregate types.[3]
>> > How does malloc(0) get this job done and what benefit does it bring?
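For concreteness, the idiom in question looks roughly like this (the table
and names are invented), a literal-initialized array of structs terminated
by an all-zero sentinel entry, written here with C99 designated
initializers:

    #include <stddef.h>

    struct command {
        const char *name;
        int       (*handler)(int, char **);
    };

    extern int do_help(int, char **);
    extern int do_quit(int, char **);

    static const struct command commands[] = {
        { .name = "help", .handler = do_help },
        { .name = "quit", .handler = do_quit },
        { .name = NULL }    /* sentinel: remaining members default to zero */
    };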
>> >
>> > Last time I posted to TUHS I mentioned a proposal for explicit
>> > tail-call elimination in C. I got the syntax wrong. The proposal was
>> > "return goto;". The WG14 document number is N2920 and it's by Alex
>> > Gilding. Good stuff.[4] I hope we see it in C2y.
>> >
>> > Predictably, I must confess that I didn't make much headway on
>> > Schiller's 1975 "secure kernel" paper. Maybe next time.
>> >
>> > Regards,
>> > Branden
>> >
>> > [1] https://web.cs.dal.ca/~jamie/UWO/C/the10fromHenryS.html
>> >
>> > I can easily imagine that the tenet held at _some_ point in C's
>> > history. It's _got_ to be the reason that the language relegates
>> > memset() and memcpy() to the standard library (or to the
>> > programmer's own devices)! :-O
>> >
>> > [2] Kernighan & Ritchie, _The C Programming Language_, 2nd edition,
>> > p. 2
>> >
>> > Having thus admitted the camel's nose to the tent, K&R would have
>> > done the world a major service by making memset(), or at least
>> > bzero(), a language feature, the latter perhaps by having "= 0"
>> > validly apply to an lvalue of non-primitive type. Okay,
>> > _potentially_ a major service. You'd still need the self-regarding
>> > wizard programmers to bother coding it, which they wouldn't in many
>> > cases "because speed". Move fast, break stuff.
>> >
>> > C++ screwed this up too, and stubbornly stuck by it for a long time.
>> >
>> >
>> > https://cplusplus.github.io/CWG/issues/178.html
>> >
>> > [3] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html
>> > [4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2920.pdf
>>
>>