(j3.2006) (SC22WG5.5366) [ukfortran] Nondeterminacy of reductions
N.M. Maclaren
nmm1
Wed Nov 12 07:36:10 EST 2014
On Nov 11 2014, Van Snyder wrote:
>
>Sylvain Collange et al remark in ...
>
>that parallel computations, especially reductions, are non-deterministic
>due to floating-point computations not being computationally
>associative.
Which has been well-known for at least half a century. Indeed, it is
best to regard ALL floating-point computations as non-deterministic,
because of the way that optimisation and different underlying algorithms
affect the result. That is the model used for traditional numerical
analysis, after all.
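
A minimal illustration of the non-associativity in question, assuming IEEE binary64 arithmetic (which is what Python floats use on common platforms). The variable names are only for this sketch:

```python
# Regrouping the same three addends changes the rounded binary64 result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # the grouping a serial left-to-right loop would use
right = a + (b + c)   # a grouping a parallel reduction might use instead

print(left == right)  # False
print(left, right)    # the two results differ in the last bit
```

Neither result is "the" correct one; each is a correctly rounded evaluation of a different expression tree.
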
>This method can accumulate an exact dot product as fast as data can be
>provided. With a super accumulator, the method is somewhat simpler than
>a floating-point fused-multiply-add. The size of the superaccumulator
>advocated therein is 536 bytes (4288 bits) for IEEE binary64 format.
>Contemporary processors have 16k of registers that could be organized
>into a super accumulator.
Which, inter alia, will increase the network and memory bandwidth
required by a factor of nearly 70. Or, in the case of IEEE 128-bit,
a factor of over 500.
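
The two factors can be checked with a rough calculation. This is a sketch under a stated assumption: the accumulator must hold every bit position that an exact product of two values can occupy, i.e. 2*emax + 2*|emin| + 2*precision bits. That sizing formula and the helper name are assumptions of this sketch, not taken from the post; only the 4288-bit figure comes from the quoted text.

```python
def product_range_bits(emax, precision):
    """Bit positions an exact product of two floats can occupy (assumed sizing)."""
    emin = 1 - emax  # IEEE 754 binary formats define emin = 1 - emax
    return 2 * emax + 2 * abs(emin) + 2 * precision

BINARY64 = (1023, 53)     # (emax, precision in bits)
BINARY128 = (16383, 113)

acc64_bits = 4288                                     # size given in the quoted text
ratio64 = acc64_bits / 64                             # accumulator size per 64-bit operand
spare64 = acc64_bits - product_range_bits(*BINARY64)  # bits beyond the product range
ratio128 = product_range_bits(*BINARY128) / 128       # same ratio for binary128, no spare bits

print(ratio64, spare64, ratio128)
```

Under that assumption the binary64 ratio is 67 ("nearly 70") and the binary128 ratio is a little under 514 ("over 500"), before any extra guard bits are added.
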
More importantly, this trick (and it is simply a trick) handles
ONLY reduction by summation. It can't be extended to multiplication,
let alone to more complicated reductions.
>The present descriptions of REDUCE and CO_REDUCE do not accommodate the
>use of EXACT_DOT_PRODUCT. If EXACT_SUM is parallel to SUM, it also
>cannot be used in those contexts. An alternative to EXACT_SUM that
>takes two scalar arguments that are independently either floating-point
>(of any kind) or complete, and produces a complete result, would allow
>it to be used in those contexts.
Yes, it can. All you need to do is the following:

    1. Write a call to EXACT_DOT_PRODUCT or EXACT_SUM.
    2. Expand the multiplication or number in that call to a suitable
       derived type or array.
    3. Call CO_REDUCE on that expanded form with a suitable operation.
    4. Reduce the result to normal precision and store it back.

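
The steps above can be sketched as follows. This is an illustration in Python rather than Fortran, and it makes several assumptions: `fractions.Fraction` stands in for the "suitable derived type or array" (a real implementation would use a fixed-size superaccumulator), nested lists stand in for the data held on each image, and `functools.reduce` stands in for CO_REDUCE. The function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
import math

def expand(x, y=1.0):
    """Expand a number (or the product x*y) to an exact representation."""
    return Fraction(x) * Fraction(y)

def combine(a, b):
    """The reduction operation: exact addition, so grouping cannot matter."""
    return a + b

def exact_sum_over_images(images):
    # Each "image" reduces its own data, then the partial results are reduced.
    partials = [reduce(combine, (expand(x) for x in img), Fraction(0))
                for img in images]
    total = reduce(combine, partials, Fraction(0))
    return float(total)  # reduce to normal precision: one rounding, at the end

data = [1e16, 1.0, -1e16, 1.0]
split_a = [[1e16, 1.0], [-1e16, 1.0]]   # one distribution of the data
split_b = [[1e16], [1.0, -1e16, 1.0]]   # a different distribution

result_a = exact_sum_over_images(split_a)
result_b = exact_sum_over_images(split_b)
print(result_a, result_b, math.fsum(data))
```

Because the combining operation is exact, the result does not depend on how the data are distributed; the cost is that every partial result travels in the expanded form, which is where the bandwidth objection comes from.
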
If it were regarded as desirable that the standard should include such
a facility, it would be FAR cleaner to have specific intrinsics, and/or
an optional argument to CO_SUM, because this is not a general facility.
Regards,
Nick.