[J3] (SC22WG5.6114) Kahan paper

Ondřej Čertík ondrej at certik.us
Fri Jul 26 00:44:51 EDT 2019


On Thu, Jul 25, 2019, at 8:47 PM, Van Snyder via J3 wrote:
> Attached paper by W. Kahan might be interesting.

I played with the example on the page 7 some time ago. Here is Fortran code that does it:

program kahan
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: s, os, comp
integer :: k
s = 0; os = -1; comp = 0; k = 0;
do while (s > os)
        k = k + 1; os = s; comp = comp + term(k)
        s = comp + os
        comp = (os-s)+comp
end do
print *, k, s, term(k)
s = s + (tail(k) + comp)
print *, s

contains

        real(dp) function term(k) result(r)
        integer, intent(in) :: k
        r = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
        end function

        real(dp) function tail(k) result(r)
        integer, intent(in) :: k
        r = 3465/(k+0.5_dp) + 3465/(k+1._dp)
        end function

end


When compiled and run without and with -ffast-math, it produces:

$ gfortran -O3 -march=native -funroll-loops kahan.f90 && time ./a.out
        61728404   9239.9998877340167        1.8187086585703921E-012
   9240.0000000000000

real    0m0.263s
user    0m0.260s
sys     0m0.000s
$ gfortran -O3 -march=native -ffast-math -funroll-loops kahan.f90 && time ./a.out
        87290410   9239.9999320850657        9.0949468492785197E-013
   9240.0000114752293

real    0m0.204s
user    0m0.200s
sys     0m0.000s



So with fast-math, the compiler applies the aggressive optimization and accuracy is lost. Precisely as the article states.


However, what the article is missing is the solution: we really want to be using fast-math and let the compiler rearrange and optimize out the code as much as it can. So we have to write the code in a way that will still work when such optimizations are applied. So we analyze the array that is being summed, and we notice that the first term is ~5000, the last one is ~1e-12. Summing such numbers greatly depends on the way they are summed. So we first sum the first few big numbers, say 600, and then we sum the rest. Here is the new code:


program kahan
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: sum_, sum1, sum2, t
integer :: k, k_split
sum1 = 0
k_split = 600
do k = 1, k_split
    sum1 = sum1 + term(k)
end do
sum2 = 0; k = k_split + 1; t = term(k)
do while (t > 1.8187e-12_dp)
    sum2 = sum2 + t
    k = k + 1; t = term(k)
end do
print *, k, sum1+sum2, t
sum_ = sum1 + sum2 + tail(k)
print *, sum_

contains

    real(dp) function term(k) result(r)
    integer, intent(in) :: k
    r = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
    end function

    real(dp) function tail(k) result(r)
    integer, intent(in) :: k
    r = 3465/(k+0.5_dp) + 3465/(k+1._dp)
    end function

end


When we run it, we obtain high accuracy results with and without -ffast-math:


$ gfortran -O3 -march=native -funroll-loops kahan.f90 && time ./a.out
    61728551   9239.9998877342841        1.8186999964570246E-012
   9240.0000000000000

real    0m0.126s
user    0m0.124s
sys     0m0.000s
$ gfortran -O3 -march=native -ffast-math -funroll-loops kahan.f90 && time ./a.out
    61728551   9239.9998877342823        1.8186999964570246E-012
   9239.9999999999982

real    0m0.126s
user    0m0.124s
sys     0m0.000s


Not only we have completely fixed the accuracy, we have also improved the speed, because the new code does not use Kahan summation, and rather vectorizes well. And works with fast-math. Although in this example fast-math does not seem to give much speedup, but the point is that it can be used safely.


Conclusion: one must always examine the arrays that are being summed and ensure that one does not sum big and small numbers together.

Ondrej


More information about the J3 mailing list