On Sat, Sep 21, 2024 at 01:07:11AM +1000, Dave Horsfall wrote:
On Fri, 20 Sep 2024, Paul Winalski wrote:
On Thu, Sep 19, 2024 at 7:52 PM Rich Salz
<rich.salz(a)gmail.com> wrote:
In my first C programming job I saw the source to V7 grep which
had a "foo[-2]" construct.
That sort of thing is very dangerous with modern compilers. Does K&R C
require that variables be allocated in the order that they are declared? If
not, you're playing with fire. To get decent performance out of modern
processors, the compiler must perform data placement to maximize cache
efficiency, and that practically guarantees that you can't rely on
out-of-bounds array references.
[...]
Unless I'm mistaken (quite possible at my age), the OP was referring to
the fact that in C, pointers and arrays are pretty much the same thing, i.e.
"foo[-2]" means "take the pointer 'foo' and go back two things" (whatever
a "thing" is).
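A minimal sketch of that equivalence, assuming the pointer is aimed into
the middle of a larger buffer so that stepping back two elements stays in
bounds. The names back_two, back_two_explicit, demo, buf and mid are made
up for illustration; this is not the V7 grep code.

```c
#include <stddef.h>

/* Hypothetical helper: subscripting a pointer with a negative index. */
int back_two(const int *p)
{
    return p[-2];
}

/* The same access spelled as explicit pointer arithmetic. */
int back_two_explicit(const int *p)
{
    return *(p - 2);
}

/* mid points at buf[3], so mid[-2] is buf[1]; still inside buf. */
int demo(void)
{
    int buf[5] = {10, 20, 30, 40, 50};
    int *mid = &buf[3];

    return back_two(mid);
}
```

This is only well defined because mid - 2 still lands inside buf. Stepping
back past the start of the array the pointer came from is not.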
Yes, but that was a stack variable. Let me see if I can say it more clearly.
foo()
{
	int a = 1, b = 2;
	int alias[5];
	alias[-2] = 0; // try and set a to 0.
}
In v7 days, the stack would look like
[stuff]
[2 bytes for a]
[2 bytes for b]
[2 bytes for the alias address, which I think points forward]
[10 bytes for alias contents]
I'm hazy on how the space for alias[] is allocated, so I made that up. It's
probably something like I said but Paul (or someone) will correct me.
When using a negative index for alias[], the coder is assuming that the stack
variables are placed in the order they were declared. Paul tried to explain
that _might_ be true but is not always true. Modern compilers will look to see
which variables are used the most in the function, and place them next to
each other so that if you have the cache line for one heavily used variable,
the other one is right there next to it. Like so:
int heavy1 = 1;
int rarely1 = 2;
int spacer[10];
int heavy2 = 3;
int rarely2 = 4;
The compiler might figure out that heavy{1,2} are used a lot and lay out the
stack like so:
[2 bytes (or 4 or 8 these days) for heavy1]
[bytes for heavy2]
[bytes for rarely1]
[bytes for spacer[10]]
[bytes for rarely2]
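If you want to see what your own compiler does, here is a rough sketch that
records where each of those locals ended up. The struct layout and the
functions probe_layout and print_layout are hypothetical names of mine.
Taking the address of a variable forces it into memory, which may itself
change the layout, so treat the result as a hint about one build and not
as a rule.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of where each local was placed. */
struct layout {
    uintptr_t heavy1, rarely1, spacer, heavy2, rarely2;
};

struct layout probe_layout(void)
{
    int heavy1 = 1;
    int rarely1 = 2;
    int spacer[10] = {0};
    int heavy2 = 3;
    int rarely2 = 4;
    struct layout l;

    /* Save the addresses as integers; the order among them is up to
       the compiler and is not promised by the language. */
    l.heavy1 = (uintptr_t)(void *)&heavy1;
    l.rarely1 = (uintptr_t)(void *)&rarely1;
    l.spacer = (uintptr_t)(void *)spacer;
    l.heavy2 = (uintptr_t)(void *)&heavy2;
    l.rarely2 = (uintptr_t)(void *)&rarely2;
    return l;
}

void print_layout(const struct layout *l)
{
    printf("heavy1  %#" PRIxPTR "\n", l->heavy1);
    printf("rarely1 %#" PRIxPTR "\n", l->rarely1);
    printf("spacer  %#" PRIxPTR "\n", l->spacer);
    printf("heavy2  %#" PRIxPTR "\n", l->heavy2);
    printf("rarely2 %#" PRIxPTR "\n", l->rarely2);
}
```

Call probe_layout() and hand the result to print_layout(), then rebuild at
a different optimization level and compare. The numbers may or may not
follow declaration order.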
Paul was saying that using a negative index in the array creates an alias,
or another name, for the scalar integer on the stack (his description made
me understand, for the first time in decades, why compiler writers hate
aliases and I get it now). Aliases mess hard with optimizers. Optimizers
may reorder the stack for better cache line usage and what you think
array[-2] means doesn't work any more unless the optimizer catches that
you made an alias and preserves it.
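For contrast, the one place where C does promise declaration order is
inside a struct: members get increasing addresses in the order they are
declared. A small sketch, with a made-up struct frame standing in for the
locals in the earlier example:

```c
#include <stddef.h>

/* Hypothetical stand-in for the locals a, b and alias[]. */
struct frame {
    int a;
    int b;
    int alias[5];
};

/* Returns 1 when the members sit in declared order, which the
   language guarantees for struct members (but not for locals). */
int members_in_declared_order(void)
{
    return offsetof(struct frame, a) < offsetof(struct frame, b)
        && offsetof(struct frame, b) < offsetof(struct frame, alias);
}
```

Even then, alias[-2] is still an out-of-bounds reference as far as the
compiler is concerned, so the safe move is to name the member (f.a)
instead of reaching backwards from the array.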
Paul, how did I do? I'm not a compiler guy, just had to learn enough to
walk the stack when the kernel panics.