[J3] Questions about sqrt -- mostly for vendors
Steven G. Kargl
kargl at troutmask.apl.washington.edu
Tue Jan 29 21:05:23 EST 2019
On Tue, Jan 29, 2019 at 04:21:37PM -0800, Van Snyder wrote:
>
> > gfortran will normally use sqrt(x) and a division for y = 1 / sqrt(x).
> > If, however, -ffast-math or -funsafe-math-optimization is specified
> > on the command line, gfortran will use rsqrt(x) and a Newton-Raphson
> > step to ensure precision. This applies to REAL x. For DOUBLE PRECISION
> > x, there is no rsqrt so gfortran uses sqrt and does the division.
>
> Algorithms for rsqrt are faster than algorithms for sqrt because the
> Newton iteration doesn't require divides.
>
> For inverse sqrt, it is almost certainly faster to use the
> single-precision rsqrt instruction and one (or maybe two) Newton
> iterations to get to double precision. If sqrt is desired, it's almost
> certainly faster to compute rsqrt, and then multiply by X. Don't
> divide.
>
With x86, rsqrtss returns an at most 11-bit accurate result. One
NR iteration gets you at most 22 bits. Two NR iterations gets you
at most 44 bits (and the extra 2 bits needed for single precision).
For double precision with 53-bits of precision, you need 3 iterations.
It takes a bit of effort to find info of other instruction sets.
Some classes of PowerPC processors support a frsqrte instruction.
This produces the reciprocal square root of a double precision
quantities. The doc I found stated that the result "is correct
to a precision of one part in 32 of the reciprocal of the square
root of (FRB)". FRB here would be an estimate of 1/x, and has a
stated precision of 1 in 256. It's unclear what is meant in this
documentation about the estimated precisions, but I suspect one
needs 1, maybe 2, NR to get an accurate result.
Multipling an estimate of the reciprocal sqrt by x will provide
an estimate for sqrt(x). It would be interesting to see if this
meets the requirements of IEEE-754 especially if IEEE_ARITHMETIC
is used.
I am not saying the RSQRT should not be added to Fortran. But,
I think it may be an over-simplification to assume that if a
cpu offers a reciprocal sqrt instruction that it will provide a
faster method for computing 1/sqrt(x). Even if it is faster,
it may be inaccurate estimate or not meet the requirements of
IEEE-754. I tend to avoid getting the wrong answer faster, and
think the Fortran standard should as well.
--
Steve
More information about the J3
mailing list