(j3.2006) (SC22WG5.5366) [ukfortran] Nondeterminacy of reductions

N.M. Maclaren nmm1
Wed Nov 12 07:36:10 EST 2014


On Nov 11 2014, Van Snyder wrote:
>
>Sylvain Collange et al remark in ...
>
>that parallel computations, especially reductions, are non-deterministic
>due to floating-point computations not being computationally
>associative.

Which has been well known for at least half a century.  Indeed, it is
best to regard ALL floating-point computations as non-deterministic,
because of the way that optimisation and different underlying algorithms
affect the result.  That is the model used for traditional numerical
analysis, after all.
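
A minimal illustration of that point, sketched in Python rather than Fortran. The helper names (add_in_order, chunked_sum) and the sample values are assumptions made for the example, and explicit loops are used because some Python versions compensate inside the built-in sum():

```python
import math
from functools import reduce
from operator import add

# Regrouping the same three addends changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

def add_in_order(values):
    """Plain left-to-right floating-point summation."""
    return reduce(add, values, 0.0)

def chunked_sum(values, chunks):
    """Sum interleaved partial sums, roughly as a parallel reduction might."""
    partials = [add_in_order(values[i::chunks]) for i in range(chunks)]
    return add_in_order(partials)

values = [1.0e16, 1.0, -1.0e16, 1.0]
serial = add_in_order(values)    # one "processor"
split = chunked_sum(values, 2)   # two partial sums, then combine
exact = math.fsum(values)        # correctly rounded sum of the same data

print(left == right, serial, split, exact)
```

The serial and two-way results differ although both are legitimate evaluations of the same sum, which is why the evaluation order has to be treated as unspecified.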

>This method can accumulate an exact dot product as fast as data can be
>provided.  With a superaccumulator, the method is somewhat simpler than
>a floating-point fused-multiply-add.  The size of the superaccumulator
>advocated therein is 536 bytes (4288 bits) for IEEE binary64 format.
>Contemporary processors have 16k of registers that could be organized
>into a superaccumulator.

Which, inter alia, will increase the network and memory bandwidth
required by a factor of nearly 70.  Or, in the case of IEEE 128-bit,
a factor of over 500.
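
The arithmetic behind those factors can be sketched as follows. This is an estimate under stated assumptions: the accumulator is taken to be a fixed-point register wide enough for any product of two values, and the 92 guard bits are an assumption chosen because they reproduce the 4288-bit figure quoted above; the function name accumulator_bits is hypothetical.

```python
def accumulator_bits(precision, emax, guard_bits=92):
    """Estimate the width of a fixed-point accumulator for exact dot products.

    precision  -- significand bits, including the implicit leading bit
    emax       -- largest unbiased exponent of the format
    guard_bits -- extra high-order bits to absorb carries (an assumption)
    """
    emin = 1 - emax
    smallest_exp = emin - (precision - 1)    # exponent of the smallest subnormal
    # Products range from 2**(2*smallest_exp) up to just below 2**(2*(emax + 1)).
    return 2 * (emax + 1) - 2 * smallest_exp + guard_bits

bits64 = accumulator_bits(53, 1023)       # IEEE binary64
bits128 = accumulator_bits(113, 16383)    # IEEE binary128
factor64 = bits64 / 64                    # accumulator size relative to one value
factor128 = bits128 / 128

print(bits64, factor64, bits128, factor128)
```

Under these assumptions every partial result shipped between processors is that many times larger than a single value of the underlying format.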

More importantly, this trick (and it is simply a trick) handles
ONLY reduction by summation.  It can't be extended to multiplication,
let alone to more complicated reductions.

>The present descriptions of REDUCE and CO_REDUCE do not accommodate the
>use of EXACT_DOT_PRODUCT.  If EXACT_SUM is parallel to SUM, it also
>cannot be used in those contexts.  An alternative to EXACT_SUM that
>takes two scalar arguments that are independently either floating-point
>(of any kind) or complete, and produces a complete result, would allow
>it to be used in those contexts.

Yes, it can.  All you need to do is the following:

    Write a call to EXACT_DOT_PRODUCT or EXACT_SUM

    Expand the multiplication or number in that call to a suitable
    derived type or array

    Call CO_REDUCE on that expanded form with a suitable operation

    Reduce the result to normal precision and store it back
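
The steps above can be sketched as follows, in Python rather than Fortran. Exact rationals from the standard fractions module stand in for the expanded derived type, and functools.reduce stands in for CO_REDUCE; the names expand, combine and exact_dot are hypothetical and this is not a proposed interface.

```python
from fractions import Fraction
from functools import reduce
from itertools import permutations

def expand(x, y):
    """Expand one multiplication to an exact form (stand-in for a derived type)."""
    return Fraction(x) * Fraction(y)

def combine(a, b):
    """The reduction operation: exact addition, so the order cannot matter."""
    return a + b

def exact_dot(xs, ys, order=None):
    """Expand, reduce in the given order, then round once to normal precision."""
    terms = [expand(x, y) for x, y in zip(xs, ys)]
    if order is not None:
        terms = [terms[i] for i in order]    # simulate a different reduction order
    total = reduce(combine, terms, Fraction(0))
    return float(total)                      # the single rounding step

xs = [1.0e16, 1.0, -1.0e16, 1.0]
ys = [1.0, 1.0, 1.0, 1.0]

# Every ordering of the reduction gives the same rounded result.
results = {exact_dot(xs, ys, order) for order in permutations(range(len(xs)))}
print(results)
```

Because the combining operation is exact, the result is the same for every ordering, and rounding happens only once at the end.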

If it were regarded as desirable that the standard should include such
a facility, it would be FAR cleaner to have specific intrinsics, and/or
an optional argument to CO_SUM, because this is not a general facility.


Regards,
Nick.



