You have one heck of a complicated question!
There are additional complications with the Intel (x87) FPU when it comes to reproducible results. The FPU uses 80 bits internally for its calculations, so intermediate results carry more bits than the operands that went in. You may find this article of interest:
Intel FPU Precision.
Unlike the commonly quoted link in your question, this one is more like "What most computer scientists probably don't need to know about floating point"! However, some of it may be relevant to your question. I hope this "how the guts work" article helps at least with the part about what the hardware is doing.