At 2019-10-18T19:20:35-0400, Arthur Krewat wrote:
I didn't have an 8087 floating point accelerator, so I wrote my
assembler example to use two 16-bit integer words, combined into a
32-bit value: 31 bits of integer plus a sign bit.
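In C terms, the two-word scheme would look something like this (a
sketch only; the exact word order and representation used in the
assembler version are assumptions here):

    #include <stdint.h>
    #include <stdio.h>

    /* Combine a high and a low 16-bit word into one signed 32-bit
     * value (two's complement assumed; the sign bit lives in the
     * high word, giving 31 bits of magnitude plus sign). */
    int32_t combine_words(uint16_t hi, uint16_t lo)
    {
        return (int32_t)(((uint32_t)hi << 16) | lo);
    }

    int main(void)
    {
        /* -2 is hi = 0xFFFF, lo = 0xFFFE */
        printf("%ld\n", (long)combine_words(0xFFFFu, 0xFFFEu));
        return 0;
    }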
Now mind you, the C version used real floating point, via a software
floating-point library with no hardware accelerator. At that point, I
realized C was the way to go. It had passed my experiment with flying
colors. The C compiler, I believe, was from Computer Innovations,
Copyright (c)1981,82,83,84,85
The reason this is similar to Ken's statement above: In the assembler
version, the cube would deform quite a bit before the run would
finish. A 31-bit integer didn't accurately reflect the result of the
math. Over time, that slight inaccuracy really added up. The accuracy
of the C version using floats was spot on. So while I basically
cheated for the assembler version, causing the deformation of the cube
over time, the C version was 100% accurate even though it was slower.
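The effect is easy to reproduce today. Here is a sketch (not the
original program; the 16.16 fixed-point format is an assumption)
that rotates a point a hundred thousand times in both scaled-integer
and double arithmetic and compares how far the radius drifts from 1:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    #define SCALE 65536  /* 16.16 fixed point (an assumed format) */

    int main(void)
    {
        double  c = cos(0.01), s = sin(0.01);
        int32_t ci = (int32_t)(c * SCALE), si = (int32_t)(s * SCALE);
        int32_t xi = SCALE, yi = 0;   /* fixed-point (1, 0) */
        double  xd = 1.0,  yd = 0.0;  /* double      (1, 0) */

        for (int i = 0; i < 100000; i++) {
            /* truncation here is the "cheat" that compounds */
            int32_t tx = (int32_t)(((int64_t)xi * ci
                                  - (int64_t)yi * si) / SCALE);
            yi = (int32_t)(((int64_t)xi * si
                          + (int64_t)yi * ci) / SCALE);
            xi = tx;

            double td = xd * c - yd * s;
            yd = xd * s + yd * c;
            xd = td;
        }
        /* The fixed-point radius shrinks noticeably; the double
         * radius stays at 1.000000 to printed precision. */
        printf("fixed-point radius: %f\n",
               sqrt((double)xi * xi + (double)yi * yi) / SCALE);
        printf("double radius:      %f\n", sqrt(xd * xd + yd * yd));
        return 0;
    }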
I wonder, is there something inherently different between PDP-11 (or
PDP-7) floats and Intel's that leads to the inaccuracy Ken mentions?
Was PDP-11 (or PDP-7) floating point that much different from
IEEE 754? It sounds to me like it could be a simple matter of
precision.
It takes 32 bits to store a single-precision floating-point value;
double precision requires 64. In IEEE 754, the double-precision
significand is 53 bits (52 stored plus the implicit leading 1), while
single precision gets only 24.
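You can ask the implementation directly; on an IEEE 754 machine this
prints 4, 8, 24, and 53:

    #include <float.h>
    #include <stdio.h>

    int main(void)
    {
        printf("sizeof(float)  = %zu\n", sizeof(float));
        printf("sizeof(double) = %zu\n", sizeof(double));
        printf("FLT_MANT_DIG   = %d\n", FLT_MANT_DIG);  /* significand bits */
        printf("DBL_MANT_DIG   = %d\n", DBL_MANT_DIG);
        return 0;
    }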
I can never remember the C type promotion rules without looking them
up, but C does promote floats to doubles in at least some
circumstances: K&R C carried out all floating-point arithmetic in
double, and even modern C promotes a float argument to double when it
is passed to a variadic function. The software floating-point library
you used could well have done the same, or perhaps it used doubles all
the way through internally. Either of these could have prevented
accumulated roundoff.
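A small sketch of the difference a double intermediate makes (the
printed digits assume IEEE 754 arithmetic):

    #include <stdio.h>

    int main(void)
    {
        float  f = 1.0f / 3.0f;  /* rounded to 24 significand bits */
        double d = 1.0  / 3.0;   /* rounded to 53 */

        /* %f takes a double, so the float argument is promoted here */
        printf("float:  %.17f\n", f);  /* 0.33333334326744080 */
        printf("double: %.17f\n", d);  /* 0.33333333333333331 */
        return 0;
    }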
I've heard, with a level of conviction somewhere between folklore and
formal demonstration[1], that for many practical numerical problems,
single-precision is just not quite good enough, but double-precision is
ample. Somewhere between 24 and 53 bits of significand, perhaps, there
is a sweet spot.
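One crude way to see the gap: sum 0.1 ten million times in each width
(a sketch; exact figures vary by platform and compiler flags):

    #include <stdio.h>

    int main(void)
    {
        float  fs = 0.0f;
        double ds = 0.0;

        for (long i = 0; i < 10000000L; i++) {
            fs += 0.1f;  /* roundoff compounds at 24 bits... */
            ds += 0.1;   /* ...and stays negligible at 53 */
        }
        /* The true sum is 1000000; the float result typically lands
         * near 1.09 million, the double within about 2e-4 of it. */
        printf("float sum:  %f\n", fs);
        printf("double sum: %f\n", ds);
        return 0;
    }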
The wisdom I've absorbed is, if you have to do floating-point, use
doubles, unless you can clearly and convincingly articulate why you
absolutely need more precision, or can get away with less. (For some
3D game-rendering applications, half-precision is adequate.)
A non-quantified "single-precision will be faster" declaration should be
understood to include a lot of "!!1!11" punctuation after it, and people
who make such declarations should be handled as delicately as any other
Gentoo user.
Regards,
Branden
[1] Example: Ben Klemens, _21st-Century C_, O'Reilly.