It seems that both gcc and icc also have options specifically aimed at the problem:
GNU compilers default to observing parentheses in accordance with the language standards. The traditional K&R C behavior of permitting elision of parentheses according to algebraic rules is included in gcc -ffast-math and icc -fp-model fast.
Gfortran sets -fprotect-parens as a default even under -ffast-math, so that no commonly used options will violate parentheses:
http://gcc.gnu.org/onlinedocs/gfortran/Code-Gen-Options.html
Apparently this was the default for gcc only in the version which first introduced the option; the option was later withdrawn, so gcc (and icc) do not allow parentheses to control associativity under the usual aggressive optimizations.
Ifort has the option -assume protect_parens for this purpose. With the addition of the minus0 clause (-assume protect_parens,minus0) this may be enough for Fortran 95 standard compliance.
To enforce observance of parentheses (the effect of gcc -fprotect-parens) in icc without disabling other aggressive optimizations, the combination -fp-model source -ftz -fast-transcendentals -no-prec-div -no-prec-sqrt may be set. This may be a frequently needed setting for Intel® Xeon Phi™. On an Intel® Xeon host, of these options, only -fast-transcendentals has a consistently large effect on performance.
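For reference, the invocations described above might look like the following (a sketch assembled from the options named in this text; exact spellings and defaults should be checked against the documentation for the compiler version in use):

```shell
# gfortran: keep fast math but honor parentheses (the gfortran default)
gfortran -O3 -ffast-math -fprotect-parens -c kernel.f90

# icc: parentheses observed via -fp-model source, while re-enabling the
# other aggressive floating-point optimizations individually
icc -O3 -fp-model source -ftz -fast-transcendentals \
    -no-prec-div -no-prec-sqrt -c kernel.c

# ifort: protect parentheses (minus0 added for Fortran 95 compliance)
ifort -O3 -assume protect_parens,minus0 -c kernel.f90
```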
Among the well-known algorithms which break when the compiler elides parentheses is Kahan summation. Consistent elimination of parentheses reduces Kahan summation to a standard lower-accuracy vectorizable sum. Inconsistent K&R treatment (or inconsistently applied extra precision) produces incorrect results, not simply lower-accuracy ones.
Intel compilers don't recognize common sub-expressions unless they are parenthesized and an option is set to observe the parentheses, e.g.
a1 *= ((d1 * (e1 + f1) + e1 * f1) + c1 * ((e1 + f1) + d1))
      + b1 * (((e1 + f1) + d1) + c1);
b1 *= (d1 * (e1 + f1) + e1 * f1) + c1 * ((e1 + f1) + d1);
while other compilers don't need the extra parentheses as long as the common sub-expressions match during left-to-right evaluation. In this code fragment there are 5 repeated add instructions which may be eliminated, although the Intel compilers will replace some of them by equally time-consuming copy (move) instructions. The difference in performance may be quite large when fused multiply-add instructions are used.
Additional named assignments for local sub-expressions are undesirable, particularly in parallelized code, as they increase register pressure and may discourage the compiler from performing register optimization.
But when these behaviors came to be isn't clear. That said, the Kahan summation algorithm goes back to 1965, and compiler writers have been aware of it for a long time.
Of course, that says nothing about the various programmers who've had their hands on this code since.