[J3] (SC22WG5.6121) Two things from IFIP WG 2.5 meeting
Bill Long
longb at cray.com
Mon Jul 29 09:29:50 EDT 2019
> On Jul 27, 2019, at 10:20 AM, Van Snyder via J3 <j3 at mailman.j3-fortran.org> wrote:
>
> On Sat, 2019-07-27 at 10:35 +0100, N.M. Maclaren wrote:
>
> Specification that a DOT_PRODUCT produce a correctly-rounded result does
> not depend upon the kind of the arguments. If a processor has a super
> accumulator that only works for binary64 (or binary32), it could use it
> for those precisions, and use a software method otherwise. The
> processor could detect at run time whether the CPU (or a coprocessor)
> provides a super accumulator, and use the appropriate method.
>
The focus on dot product seems odd for a couple of reasons. Most processors use IEEE floating point representations where the exponent range for 64-bit reals (used whenever accuracy is important) is ~10**300. The number of particles in the universe is ~10**80, so overflow seems very unlikely for any reasonably formed problem. Similarly for underflow, as tiny values are ~10**-300. The potential numerical problem I see is in the sum at the end for a very long vector, where the values are significantly different in size or have lots of cancellations because of alternating signs. There are a couple of techniques that can help using existing hardware. First, vectorize the operation, which results in parallel accumulation of partial sums that are combined at the end. (Note that this only happens at non-zero optimization levels.) Vectorization alone is sufficient for the vast majority of users. Second, for widely varying sizes of terms in the summation, resulting in truncation errors in the additions, you could sort the list and sum starting at the small end.
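A minimal sketch of the two techniques, written in Python only for brevity and not as anything a Fortran processor is required to do. The function names and the `lanes` parameter are made up for illustration; `lane_sum` merely imitates in scalar code what vectorized accumulation does, and `math.fsum` is used purely as a correctly rounded reference for comparison.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation in a single running total."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes=4):
    """Imitate vectorized accumulation: `lanes` independent partial sums,
    combined at the end. `lanes` is a hypothetical stand-in for the
    hardware vector length."""
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return naive_sum(partial)


def sorted_sum(values):
    """Sort by magnitude and sum starting at the small end, so small terms
    accumulate before they meet the large ones."""
    return naive_sum(sorted(values, key=abs))


if __name__ == "__main__":
    # One large term followed by many terms too small to change it one at a time.
    values = [1.0] + [1e-16] * 100_000
    print("naive :", repr(naive_sum(values)))
    print("lanes :", repr(lane_sum(values)))
    print("sorted:", repr(sorted_sum(values)))
    print("fsum  :", repr(math.fsum(values)))
```

With this particular input the single running total never moves off 1.0, the lane version keeps the small terms that land in lanes not holding the large value, and the sorted version comes out within rounding of the reference. Other inputs, particularly ones with heavy cancellation, can behave differently.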
The relevant question for WG5 is whether this issue is in the scope of the Fortran standard. For the small number of cases where these round-off issues matter to a program, the user might want to either write separate code, or use one of the professionally written libraries (NAG, IMSL, …) for the dot_product computation. Fortran specifies the result in mathematical terms. I think it is unwise to be specifying implementation details.
> I recall descriptions of neural network training failing to converge in
> even 100,000 iterations without correct linear algebra. Iterative
> refinement was tried, but made only a small dent, because of poor
> conditioning of the dot product. It finally worked when an accurate dot
> product was used. IIRC, this was described in a paper by Victor
> Pereyra, in connection with variable projection and separable nonlinear
> least-squares problems. I've asked correspondents for more examples.
The linear algebra needs in neural network training codes are beyond the skill set of most of the Python programmers who dominate the field. They typically just (ultimately) call the library routines supplied by the GPU vendors appropriate for the target hardware. If those routines are insufficient for some reason, comments to that effect should be directed to the vendors.
> At some time during the last thirty years, maybe more than once, I
> proposed a STRICT block. This was included in Ada 83, because it was in
> the requirements specifications from a very early stage. The vast
> majority of the meeting (or correspondents, I don't recall which), were
> against it.
Anton correctly explained that this is unnecessary.
>
> The SGI priorities -- get it out, get it fast, get it right -- permeated
> the Fortran committee even in 1986. You might recall that SGI sometimes
> got through the second step, but rarely the third.
And after more than one trip through bankruptcy, SGI no longer exists as a separate company.
Cheers,
Bill
Bill Long longb at cray.com
Principal Engineer, Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9143
Cray Inc./ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425