(j3.2006) (SC22WG5.5365) Nondeterminacy of reductions
Van Snyder
Van.Snyder
Tue Nov 11 14:29:32 EST 2014
Sylvain Collange et al remark in
"Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation
on Multi- and Many-Core Architectures"
https://www.researchgate.net/publication/267982095_Full-Speed_Deterministic_Bit-Accurate_Parallel_Floating-Point_Summation_on_Multi-_and_Many-Core_Architectures
that parallel computations, especially reductions, are non-deterministic
due to floating-point computations not being computationally
associative.
The method they advocate is an exact summation, using super accumulators
and arithmetic units in contemporary processors that are not generally
used in ordinary computations.
It is possible to compute exact sums and inner products using a method
due to Siegfried Rump, in e.g. "Ultimately fast accurate summation,"
which appeared in SIAM Journal on Scientific Computation 31(5) (2009)
3466-3502, but the algorithm is complicated and expensive.
Another alternative proposed by Collange et al is the use of a super
accumulator, as described by Kulisch in
"The Exact Dot Product as Basic Tool for Long Interval Arithmetic"
https://www.researchgate.net/publication/44886114_The_Exact_Dot_Product_as_Basic_Tool_for_Long_Interval_Arithmetic
This method can accumulate an exact dot product as fast as data can be
provided. With a super accumulator, the method is somewhat simpler than
a floating-point fused-multiply-add. The size of the superaccumulator
advocated therein is 536 bytes (4288 bits) for IEEE binary64 format.
Contemporary processors have 16k of registers that could be organized
into a super accumulator.
The exact dot product is sufficient for solving this problem, but an
exact sum would be simpler to use in many cases.
Independently of the article by Collange et al, several colleagues have
asked about exact dot product and exact sum in Fortran.
They could be provided by additional intrinsic functions, say
EXACT_DOT_PRODUCT and EXACT_SUM (and EXACT_DIFFERENCE).
The result should be a "complete format" result, as described in "The
Exact Dot Product...." This could be described in the standard as a
derived type with private components, provided by ISO_Fortran_ENV (or
another intrinsic module), or an intrinsic numeric type that represents
such a format could be provided.
The present descriptions of REDUCE and CO_REDUCE do not accomodate the
use of EXACT_DOT_PRODUCT. If EXACT_SUM is parallel to SUM, it also
cannot be used in those contexts. An alternative to EXACT_SUM that
takes two scalar arguments that are independently either floating-point
(of any kind) or complete, and produces a complete result, would allow
to use it in those contexts.
Whether a correctly-rounded dot product with a floating-point result is
produced by a method such as Rump's, or is computed using Kulisch's
method, and whether exact dot product and exact sum are provided by
hardware or software, should be invisible to a language standard.
More information about the J3
mailing list