Steve Johnson writes:
This is one of my pet peeves. "Random Access" memory is far from random when
you look at the time it takes to do the accesses. With modern memories,
accessing a column can be 20 to 40x slower than accessing a row. This is
particularly irritating in AI training, which reuses 4-d tensors in
transposed form, a very painful operation.
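
To see why the transposed case hurts, here is a rough sketch of how strides change when a 4-d tensor is transposed without moving any data. The shape and the axis permutation are made up for illustration, and row_major_strides is a hypothetical helper, not something from any particular library.

```python
def row_major_strides(shape):
    """Element strides for a tensor stored row-major (last axis contiguous)."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

# Hypothetical 4-d tensor shape, chosen only for illustration.
shape = (8, 3, 32, 32)
strides = row_major_strides(shape)

# Hypothetical transpose: move axis 1 to the end, leaving the data in place.
perm = (0, 2, 3, 1)
t_shape = [shape[p] for p in perm]
t_strides = [strides[p] for p in perm]

# Distance in elements between neighbours along the innermost axis.
print("original innermost stride:  ", strides[-1])
print("transposed innermost stride:", t_strides[-1])
```

In the original layout, neighbours along the innermost axis are adjacent in memory; after the transpose they are 1024 elements apart, so a walk along that axis touches a different stretch of memory at every step.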
In FORTRAN days, I once used a vector package in which you described a vector
by giving the first element, the second element, and a count. So you could
describe rows, columns, a matrix diagonal, and even rows and columns from back
to front. Fortran passed arguments by address, which made the whole thing easy
and fast.
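
A minimal sketch of that idea, as I understand it from the description above: the stride is inferred from the distance between the first and second elements. This is not the original package; strided_view is a hypothetical name, and flat-list indices stand in for the addresses Fortran would have passed.

```python
def strided_view(data, first, second, count):
    """Collect `count` elements of the flat list `data`, starting at index
    `first` and stepping by the stride implied by `second - first`."""
    stride = second - first
    return [data[first + i * stride] for i in range(count)]

# A 3x3 matrix stored row by row in one flat list.
n = 3
m = [1, 2, 3,
     4, 5, 6,
     7, 8, 9]

row0 = strided_view(m, 0, 1, n)              # stride 1: first row
col0 = strided_view(m, 0, n, n)              # stride n: first column
diagonal = strided_view(m, 0, n + 1, n)      # stride n + 1: main diagonal
row0_rev = strided_view(m, n - 1, n - 2, n)  # stride -1: first row, back to front
```

The same three arguments cover every case; only the distance between the first two elements changes.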
Steve
Remember the words of that great performance pioneer Jimmy Durante: ras-a-ma-cas.