[J3] (SC22WG5.6113) Two things from IFIP WG 2.5 meeting
Van Snyder
Van.Snyder at jpl.nasa.gov
Thu Jul 25 21:56:28 EDT 2019
Two things that ought to be of interest to the Fortran standard were
discussed at the recent meeting of IFIP Working Group 2.5 (Numerical
Software).
1. The dot product is an ubiquitous operation in mathematically-related
problems, not just in library modules. Floating-point arithmetic is not
associative. The dot product can be arbitrarily badly conditioned. The
numerical software community would like to have either an exact dot
product or a correctly-rounded dot product. Using a super-accumulator
as originally described by Kulisch, the exact dot product can be
computed as fast as operands can be fetched. IBM implemented the exact
dot product in hardware for the 370, in an option called ACRITH.
Kulisch and his students implemented the exact dot product on a PCI
card. Demmel and his students implemented the exact dot product on a
PCI card. Both Kulisch and Demmel observed that the PCI card computed
an exact dot product six times faster than a double-precision floating
point method using a Pentium.
Computing a correctly-rounded dot product is possible using conventional
floating point arithmetic, at a cost of up to six times the cost of the
obvious method. It is difficult for users to implement it because
processors sometimes do not respect parentheses. For example, a
compensated-summation loop might be written
sum = 0.0; comp = 0.0
do k = 1, 10
oldsum = sum
comp = comp + term(k)
sum = comp + oldsum
1 comp = ( oldsum - sum ) + comp
end do
sum = sum + comp ! The final compensated result
but a too-clever optimizer might realize that statement 1 is a clever
way to compute comm = 0, and therefore scrub out all appearances of
comp. (Siegfried Rump's correctly-rounded dot product algorithm uses
similar methods).
It would be useful to have an optional logical argument to DOT_PRODUCT
to request a correctly-rounded dot product. Under the covers, the
processor could compute this using either a Kulisch super accumulator,
or (e.g.) Siegfried Rump's method.
It would be useful to have an EXACT_DOT_PRODUCT intrinsic function that
produces a result of a type specified in ISO_Fortran_ENV. That type
should have type-bound operator(+) and operator(-) operations, that take
either two exact dot-product result objects, or one such object and a
floating-point object, producing an exact dot-product object. This
object would be a representation of a Kulisch super accumulator. It
should have a kind type parameter, allowed to have any of the values
allowed for type REAL. One might hope that eventually there would be
hardware support to compute the exact dot product. Software simulation
would be very slow.
2. It was observed (again) that parenthesization of matrix products has
a profound impact on performance. For example, in ABx, where A and B
are NxN matrices and x is an n-vector, (AB)x has N^3 complexity, while
A(Bx) has N^2 complexity.
Determining the optimal parenthesization is a simple dynamic-programming
problem, but producing a general procedure to handle an arbitrary number
of factors is non-trivial for user programs. It would not be difficult
for a processor to compute and execute the optimal parenthesization if
the MAMTUL intrinsic function allowed an arbitrary number of arguments.
One might hope that products involving the transpose of a matrix would
be noticed by a processor, and optimized so as not to produce the
transpose as a separate object. For example, MATMUL(TRANSPOSE(A),B)
ought not to create a representation of TRANSPOSE(A); rather, it ought
to do the multiplication differently. xGEMM (or xGEMV) can do this, so
if MATMUL simply uses xGEMM under the covers, there should be no
representation of TRANSPOSE(A). This should not be part of normative
text; rather it ought to be quality-of-implementation issue, perhaps
mentioned in a note, or Annex C.
Neither one of these seems to be a large project.
More information about the J3
mailing list