[J3] (SC22WG5.6114) Kahan paper
Ondřej Čertík
ondrej at certik.us
Fri Jul 26 00:44:51 EDT 2019
On Thu, Jul 25, 2019, at 8:47 PM, Van Snyder via J3 wrote:
> Attached paper by W. Kahan might be interesting.
I played with the example on the page 7 some time ago. Here is Fortran code that does it:
program kahan
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: s, os, comp
integer :: k
s = 0; os = -1; comp = 0; k = 0;
do while (s > os)
k = k + 1; os = s; comp = comp + term(k)
s = comp + os
comp = (os-s)+comp
end do
print *, k, s, term(k)
s = s + (tail(k) + comp)
print *, s
contains
real(dp) function term(k) result(r)
integer, intent(in) :: k
r = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
end function
real(dp) function tail(k) result(r)
integer, intent(in) :: k
r = 3465/(k+0.5_dp) + 3465/(k+1._dp)
end function
end
When compiled and run without and with -ffast-math, it produces:
$ gfortran -O3 -march=native -funroll-loops kahan.f90 && time ./a.out
61728404 9239.9998877340167 1.8187086585703921E-012
9240.0000000000000
real 0m0.263s
user 0m0.260s
sys 0m0.000s
$ gfortran -O3 -march=native -ffast-math -funroll-loops kahan.f90 && time ./a.out
87290410 9239.9999320850657 9.0949468492785197E-013
9240.0000114752293
real 0m0.204s
user 0m0.200s
sys 0m0.000s
So with fast-math, the compiler applies the aggressive optimization and accuracy is lost. Precisely as the article states.
However, what the article is missing is the solution: we really want to be using fast-math and let the compiler rearrange and optimize out the code as much as it can. So we have to write the code in a way that will still work when such optimizations are applied. So we analyze the array that is being summed, and we notice that the first term is ~5000, the last one is ~1e-12. Summing such numbers greatly depends on the way they are summed. So we first sum the first few big numbers, say 600, and then we sum the rest. Here is the new code:
program kahan
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: sum_, sum1, sum2, t
integer :: k, k_split
sum1 = 0
k_split = 600
do k = 1, k_split
sum1 = sum1 + term(k)
end do
sum2 = 0; k = k_split + 1; t = term(k)
do while (t > 1.8187e-12_dp)
sum2 = sum2 + t
k = k + 1; t = term(k)
end do
print *, k, sum1+sum2, t
sum_ = sum1 + sum2 + tail(k)
print *, sum_
contains
real(dp) function term(k) result(r)
integer, intent(in) :: k
r = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
end function
real(dp) function tail(k) result(r)
integer, intent(in) :: k
r = 3465/(k+0.5_dp) + 3465/(k+1._dp)
end function
end
When we run it, we obtain high accuracy results with and without -ffast-math:
$ gfortran -O3 -march=native -funroll-loops kahan.f90 && time ./a.out
61728551 9239.9998877342841 1.8186999964570246E-012
9240.0000000000000
real 0m0.126s
user 0m0.124s
sys 0m0.000s
$ gfortran -O3 -march=native -ffast-math -funroll-loops kahan.f90 && time ./a.out
61728551 9239.9998877342823 1.8186999964570246E-012
9239.9999999999982
real 0m0.126s
user 0m0.124s
sys 0m0.000s
Not only we have completely fixed the accuracy, we have also improved the speed, because the new code does not use Kahan summation, and rather vectorizes well. And works with fast-math. Although in this example fast-math does not seem to give much speedup, but the point is that it can be used safely.
Conclusion: one must always examine the arrays that are being summed and ensure that one does not sum big and small numbers together.
Ondrej
More information about the J3
mailing list