Dan Cross <crossd(a)gmail.com> wrote:
> I guess it's potentially faster if you don't have to swab bytes
> between similar architectures?
The "potential" speedup is completely superficial, and the N^2
complexities make the code hard to test and hard to maintain. It's
better to do something simple and correct, and then you can leave it
untouched for a decade.
I implemented the byte-order handling code in the GNU BFD library back
in the early '90s. We picked up every integer as a series of unsigned
byte accesses and shifted them and added them, even when the byte order
of the data matched the byte order of the machine. Some machines can do
unaligned 4- or 8-byte fetches and some can't. The people who design
object file formats (or packet formats) don't always align things on the
boundaries that your hardware prefers. We wrote simple, easy-to-test
code that would and did run on ANY machine. We did the same for stores
as well as loads.
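
Here is roughly what one of those pick-up-bytes routines looked like,
as a minimal sketch (the names are illustrative, not the actual BFD
entry points):

#include <stdint.h>

/* Fetch a 32-bit big-endian field one byte at a time.  No alignment
   requirement on P, and no dependence on the host's byte order.  */
static uint32_t
get_be32 (const unsigned char *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
         | ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* The matching store: write V as four big-endian bytes.  */
static void
put_be32 (uint32_t v, unsigned char *p)
{
  p[0] = (v >> 24) & 0xff;
  p[1] = (v >> 16) & 0xff;
  p[2] = (v >> 8) & 0xff;
  p[3] = v & 0xff;
}

/* The little-endian pair is the same idea with the byte indices
   reversed; the 16- and 64-bit variants just use more bytes.  */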
Every data structure had two representations: the external one, defined
by a struct full of unsigned char arrays; and the internal one, in
native data formats. For each data format, we wrote a trivial routine
to convert the external format to the internal, and its inverse. These
called a series of the lower-level pick-up-bytes-in-a-particular-order
routines, one call per struct member. None of this was even inlined at
the time.
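
Concretely, the pattern was roughly this (a made-up three-field header
rather than any real format's layout, using the get_be32 sketched
above; the little-endian and swap-out variants are the mirror image):

/* External form: exactly the bytes in the file, nothing more.  */
struct external_hdr
{
  unsigned char magic[4];
  unsigned char text_size[4];
  unsigned char entry[4];
};

/* Internal form: whatever is convenient for the host.  */
struct internal_hdr
{
  uint32_t magic;
  uint32_t text_size;
  uint32_t entry;
};

/* Swap in: one get call per struct member.  The inverse routine makes
   one put_be32 call per member.  */
static void
hdr_swap_in (const struct external_hdr *ext, struct internal_hdr *in)
{
  in->magic = get_be32 (ext->magic);
  in->text_size = get_be32 (ext->text_size);
  in->entry = get_be32 (ext->entry);
}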
I never measured the overhead of these get- or put-routines as being
above 1% or 2% of the execution time of the whole program (e.g. the GNU
linker).
We had enough complexity to deal with already, because every vendor made
their own slightly different version of COFF or ELF or a.out object file
formats. Some of these were in different byte orders. Some truly
insane vendors had the object file HEADERS in one byte order and the
actual binaries in a different byte order! We made a library that could
read and write them all -- and manage their symbol tables -- and even
link them together from a variety of formats and write the resulting
binary in a different format. This was all part of making the GNU
compilers into cross-compilers, in which your "host" byte order and
object file format are completely orthogonal to your "target" byte order
and object file format. We then built test suites that built the same
test code on a dozen host systems and made sure that all the resulting
binaries for target system "X" were bit-for-bit identical. Building in
those capabilities, and that level of reliability, was much more
important than any 2% speedup.
Premature optimization is the root of much evil.
John