[J3] New reference BLAS NRM2 routines

Robert Corbett rpcorbett at att.net
Tue Oct 9 07:52:09 EDT 2018

I assume that the bad optimization
you refer to is the one described in
the final paragraph of Section 8.1
of the article.  I find the authors'
analysis of the problem to be
dubious.  The article does not
give enough information to allow
a reader to check the authors'
assertion that the Intel compiler
does not honor parentheses in
optimizations.  The Fortran code
is not present in the article.

The version of the Intel compiler
they are using (14.0.2) might be
using x87 arithmetic.  If so, the
compiled code might retain extra
precision across statements.
Extra precision will break any of
the compensated arithmetic
algorithms of which I am aware.
The option fp:precise eliminates
the extra precision, which would
explain why adding the option
fixes the code.

The authors might have correctly
analyzed the problem.  If they
based their analysis on examination
of the generated code, then they
probably are correct.  The bad
optimization they describe is
hard to implement.  If a compiler
could do that optimization, it would
probably do much more as well.

The question of whether compiled
code can retain extra precision
across statements goes back to
the first Fortran 90 interpretation.

Robert Corbett

> On Oct 8, 2018, at 1:20 PM, Van Snyder via J3 <j3 at mailman.j3-fortran.org> wrote:
> New IEEE-aware Fortran reference level-1 BLAS routines *NRM2 for the
> 2-norm of a vector are described in ACM Transactions on Mathematical
> Software 44, 3 (2018).  They produce results accurate to one ULP for
> vectors up to length 1.7e9 (IEEE single precision) and 9e15 (IEEE double
> precision).
> They use Kahan compensated summation, and therefore have a cost of about
> 3.5 compared to the original reference level-1 BLAS routine (Lawson et
> al 1979).  They return mathematically correct IEEE flags and values for
> non-finite input.  In particular, if the input contains at least one NaN
> and at least one Inf, the result is +Inf and normal status.
> Some processors had to be convinced not to do some oprimizations that
> the standard doesn't explicitly allow in order to work correctly.  These
> were inter-statement "optimizations" that produce mathematically
> equivalent results, but not computationally equivalent results.  The
> standard allows this only for optimizations within statements.
> For the reference and LAPACK versions, which do not use compensated
> summation, the article reports relative error in single precision for
> vectors of length 1e5, with random values in (0,1), was about 82 ULP.
> Van

More information about the J3 mailing list