(j3.2006) an intrinsic for SORT()

Van Snyder van.snyder
Tue Feb 6 14:59:52 EST 2018


On Tue, 2018-02-06 at 14:46 -0500, Steve Lionel wrote:
> On 2/6/2018 2:45 PM, Van Snyder wrote:
> >
> > If I write MATMUL(TRANSPOSE(A),B) will most processors create a temp for
> > TRANSPOSE(A), or will they do what DGEMM does and handle the TRANSPOSE
> > as a flag that changes how the multiply is done?
> >
> That depends on whether or not such usage is in a benchmark...

So I should be using DGEMM, just to be sure.  TRANSPOSE is only O(n^2),
seemingly small compared to the O(n^3) for the multiply, but its
constant factor is large because the temporary has to come from the
storage allocator.
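
A minimal sketch of the two forms being compared, assuming a reference
BLAS is available at link time (for example, gfortran demo.f90 -lblas).
The program name, array names, and sizes are made up for illustration.
Whether the intrinsic form creates a temporary is up to the processor;
the DGEMM call only passes a flag.

```fortran
program transpose_flag_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: k = 3, m = 2, n = 4   ! illustrative sizes
  real(dp) :: a(k,m), b(k,n)
  real(dp) :: c_intrinsic(m,n), c_blas(m,n)
  external :: dgemm   ! BLAS level 3 routine, resolved at link time

  call random_number(a)
  call random_number(b)

  ! Intrinsic form: the processor may or may not materialize
  ! TRANSPOSE(a) as a temporary before multiplying.
  c_intrinsic = matmul(transpose(a), b)

  ! BLAS form: TRANSA = 'T' asks DGEMM to treat a as its transpose.
  ! a is stored k-by-m, so its leading dimension is k.
  ! Computes c_blas = 1.0 * a^T * b + 0.0 * c_blas.
  call dgemm('T', 'N', m, n, k, 1.0_dp, a, k, b, k, 0.0_dp, c_blas, m)

  ! The two results should agree to within rounding.
  print *, 'max abs difference:', maxval(abs(c_intrinsic - c_blas))
end program transpose_flag_demo
```

The leading dimensions describe the arrays as stored, not as used, which
is why both a and b are passed with a leading dimension of k.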

In a language intended for science and engineering, not accounting,
adding a "transpose" flag for either argument of MATMUL would seem to
be a higher priority than a SORT intrinsic.  But, yes, I do sort stuff
in my
satellite data analysis codes.  It wasn't hard to find sort routines.
It isn't hard to write correct ones.  I wrote several for our math
library decades ago.  Who among us hasn't?  I chose insertion sort
because my arrays are almost sorted, the case where it runs in nearly
linear time.  A runtime algorithm selector would just add overhead.
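
For illustration, a minimal insertion sort sketch in Fortran.  This is
not the routine from the math library mentioned above; the module,
procedure, and variable names are hypothetical, and it handles only
default REAL arrays in ascending order.  Its cost is proportional to
the array size plus the number of out-of-order pairs, which is why it
suits input that is already almost sorted.

```fortran
module insertion_sort_mod
  implicit none
  private
  public :: insertion_sort
contains
  ! Sort x into ascending order, in place.
  subroutine insertion_sort(x)
    real, intent(inout) :: x(:)
    real :: key
    integer :: i, j
    do i = 2, size(x)
      key = x(i)
      j = i - 1
      ! Shift larger elements one place to the right.  The bound test
      ! and the comparison are kept separate because Fortran does not
      ! guarantee short-circuit evaluation of .and., so x(0) must
      ! never be referenced.
      do while (j >= 1)
        if (x(j) <= key) exit   ! already in order: stop immediately
        x(j+1) = x(j)
        j = j - 1
      end do
      x(j+1) = key
    end do
  end subroutine insertion_sort
end module insertion_sort_mod

program insertion_sort_demo
  use insertion_sort_mod, only: insertion_sort
  implicit none
  ! Almost sorted input: only one adjacent pair is out of order.
  real :: x(6) = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]
  call insertion_sort(x)
  print '(6f5.1)', x
end program insertion_sort_demo
```

On the sample input the inner loop does a single shift in total, so the
whole sort is one pass over the array.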

> Steve



