(j3.2006) IEEE modules and NORM2
Van Snyder
Thu Oct 9 13:51:36 EDT 2008
Dick Hanson sent me the following interesting result. It confirms the
usefulness of the IEEE modules, and points out that when vendors get
around to implementing NORM2 they shouldn't just use level 1 reference
BLAS.
Van
As an illustration for using the IEEE modules, I coded up DNRM2 of the
BLAS. The basic idea is simple: Do the easy loop first and then
check for exceptions and fix things up. For that I used Jim Blue's
1978 TOMS algorithm. It prompted the question about the static
constants.
The basic conclusions are that this IEEE version is always more
accurate, and that it pulls ahead of Hammarling's LAPACK version at
about n=40 and gets steadily faster relative to it as n grows, perhaps
by factors of 30 or more. This is on the IBM PowerStation.
So this is good news for the IEEE module supporters, especially John
Reid. The example he gave in M, R and C for the planar length is
always going to be a lot slower than simply scaling. This is because
there is overhead in the calls to get and set flags. For DNRM2 that
overhead time gets swamped out by compute time at about n=40.