[J3] Questions about sqrt -- mostly for vendors

Steven G. Kargl kargl at troutmask.apl.washington.edu
Tue Jan 29 21:05:23 EST 2019


On Tue, Jan 29, 2019 at 04:21:37PM -0800, Van Snyder wrote:
> 
> > gfortran will normally use sqrt(x) and a division for y = 1 / sqrt(x).
> > If, however, -ffast-math or -funsafe-math-optimization is specified
> > on the command line, gfortran will use rsqrt(x) and a Newton-Raphson
> > step to ensure precision.  This applies to REAL x.  For DOUBLE PRECISION
> > x, there is no rsqrt so gfortran uses sqrt and does the division.
> 
> Algorithms for rsqrt are faster than algorithms for sqrt because the
> Newton iteration doesn't require divides.
> 
> For inverse sqrt, it is almost certainly faster to use the
> single-precision rsqrt instruction and one (or maybe two) Newton
> iterations to get to double precision.  If sqrt is desired, it's almost
> certainly faster to compute rsqrt, and then multiply by X.  Don't
> divide.
> 

With x86, rsqrtss returns an at most 11-bit accurate result.  One
NR iteration gets you at most 22 bits.  Two NR iterations gets you
at most 44 bits (and the extra 2 bits needed for single precision).
For double precision with 53-bits of precision, you need 3 iterations.

It takes a bit of effort to find info of other instruction sets.
Some classes of PowerPC processors support a frsqrte instruction. 
This produces the reciprocal square root of a double precision
quantities.  The doc I found stated that the result "is correct
to a precision of one part in 32 of the reciprocal of the square
root of (FRB)".  FRB here would be an estimate of 1/x, and has a
stated precision of 1 in 256.  It's unclear what is meant in this
documentation about the estimated precisions, but I suspect one
needs 1, maybe 2, NR to get an accurate result.  

Multipling an estimate of the reciprocal sqrt by x will provide
an estimate for sqrt(x).  It would be interesting to see if this
meets the requirements of IEEE-754 especially if IEEE_ARITHMETIC
is used.

I am not saying the RSQRT should not be added to Fortran.  But,
I think it may be an over-simplification to assume that if a
cpu offers a reciprocal sqrt instruction that it will provide a
faster method for computing 1/sqrt(x).  Even if it is faster, 
it may be inaccurate estimate or not meet the requirements of
IEEE-754.  I tend to avoid getting the wrong answer faster, and
think the Fortran standard should as well. 

-- 
Steve


More information about the J3 mailing list