(j3.2006) What do typical processors do?
Van Snyder
Van.Snyder
Tue Jul 18 20:33:07 EDT 2017
On Tue, 2017-07-18 at 23:30 +0000, Bill Long wrote:
> It depends on how "non-GEMM" the expression is. If you had
>
> C = matmul ( A, transpose(B) )
>
> then the compiler can pretty easily pattern-match that to a call to the optimized CGEMM library routine. You basically get the GEMM performance with a much simpler expression in your code.
>
> > On Jul 18, 2017, at 6:00 PM, Van Snyder <Van.Snyder at jpl.nasa.gov> wrote:
> >
> > Should I expect a processor to optimize
> >
> > C = matmul ( A, conjg(transpose(B)) )
>
> This is not so obvious. But if you used 1 temp:
>
> D = conjg(transpose(B))
> C = matmul ( A, D )
>
> then the second line does get compiled as a CGEMM call.
CGEMM can be told to use the conjugate transpose of its first or second
argument. Maybe someday processors will recognize

  C = matmul ( A, conjg(transpose(B)) )

as a call to CGEMM, without needing any temps.

For now, an ugly direct call to CGEMM looks more attractive.

It might be helpful if the TRANSPOSE intrinsic had an optional CONJG
logical argument.
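
The hand-written alternative raised in the original question, a matmul with GEMM-style options, can be sketched in plain Fortran. This is a minimal sketch, not a library API: the module `gemm_opts` and the function `matmul_opt` are hypothetical names, and the `transb` argument ('N', 'T' or 'C') is modelled on CGEMM's TRANSB. Elements of B are read in place inside the loops, so no transposed or conjugated copy of B is formed; the result array is the only allocation. It is written for clarity, and an optimized BLAS will be much faster. With BLAS linked, the direct call for C = A*B**H would be `CALL CGEMM('N', 'C', M, N, K, (1.0,0.0), A, LDA, B, LDB, (0.0,0.0), C, LDC)`.

```fortran
! Hypothetical module (not a library API): MATMUL with a GEMM-style TRANSB flag.
module gemm_opts
  implicit none
  private
  public :: matmul_opt
contains
  ! Returns C = A * op(B), where op is selected like CGEMM's TRANSB:
  !   'N' -> B,   'T' -> transpose(B),   'C' -> conjg(transpose(B)).
  ! Elements of B are read in place; no transposed copy of B is formed.
  function matmul_opt(a, b, transb) result(c)
    complex, intent(in) :: a(:,:), b(:,:)
    character, intent(in) :: transb
    complex, allocatable :: c(:,:)
    integer :: j, k, m, n, kk

    if (index('NnTtCc', transb) == 0) error stop 'matmul_opt: transb must be N, T or C'

    m = size(a, 1)
    kk = size(a, 2)
    ! Determine the column count of op(B) and check conformance.
    if (transb == 'N' .or. transb == 'n') then
      n = size(b, 2)
      if (size(b, 1) /= kk) error stop 'matmul_opt: shape mismatch'
    else
      n = size(b, 1)
      if (size(b, 2) /= kk) error stop 'matmul_opt: shape mismatch'
    end if

    allocate(c(m, n))
    c = (0.0, 0.0)
    ! Column-oriented loops: column j of C accumulates columns of A.
    select case (transb)
    case ('N', 'n')
      do j = 1, n
        do k = 1, kk
          c(:, j) = c(:, j) + a(:, k) * b(k, j)
        end do
      end do
    case ('T', 't')
      do j = 1, n
        do k = 1, kk
          c(:, j) = c(:, j) + a(:, k) * b(j, k)
        end do
      end do
    case ('C', 'c')
      do j = 1, n
        do k = 1, kk
          c(:, j) = c(:, j) + a(:, k) * conjg(b(j, k))
        end do
      end do
    end select
  end function matmul_opt
end module gemm_opts
```

For example, `C = matmul_opt(A, B, 'C')` should agree with `matmul(A, conjg(transpose(B)))` up to rounding.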

My application is actually

  C = matmul ( A, conjg(transpose(A)) )

so I can use

  C = matmul ( conjg(A), transpose(A) )

Would a processor recognize that and turn it into a call to CGEMM
without creating temps?
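
One caution on the rewrite above: since conj(A*A**H) = conj(A)*A**T, `matmul(conjg(A), transpose(A))` is the complex conjugate of `matmul(A, conjg(transpose(A)))`. Because A*A**H is Hermitian, that conjugate is also its transpose, so the two forms agree only when the product is real. For this particular product the reference BLAS also offers CHERK, which forms C := alpha*A*A**H + beta*C and updates only one triangle of the Hermitian C. The sketch below uses the same symmetry in plain Fortran. It is not a call to CHERK, and the module `herm_product` and function `a_times_ah` are hypothetical names. It computes the lower triangle of A*A**H directly from A and fills the upper triangle by conjugation, so no transposed copy of A is formed. The row sections it reads are strided, so it illustrates the idea rather than competing with BLAS.

```fortran
! Hypothetical module: forms C = A * A**H without a transposed copy of A.
module herm_product
  implicit none
  private
  public :: a_times_ah
contains
  ! C(i,j) = sum_k A(i,k) * conjg(A(j,k)).  C is Hermitian, so only the
  ! lower triangle is computed and the upper triangle is filled by
  ! conjugate symmetry (the same idea CHERK exploits; this is not CHERK).
  function a_times_ah(a) result(c)
    complex, intent(in) :: a(:,:)
    complex, allocatable :: c(:,:)
    integer :: i, j, n

    n = size(a, 1)
    allocate(c(n, n))
    do j = 1, n
      ! Lower triangle (including the diagonal) of column j.
      do i = j, n
        c(i, j) = sum(a(i, :) * conjg(a(j, :)))
      end do
      ! The diagonal of a Hermitian matrix is real; drop rounding residue.
      c(j, j) = cmplx(real(c(j, j)), 0.0)
      ! Mirror into row j of the upper triangle.
      do i = j + 1, n
        c(j, i) = conjg(c(i, j))
      end do
    end do
  end function a_times_ah
end module herm_product
```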
>
> Cheers,
> Bill
>
> >
> > without making two or three temps, or should I write a matmul that has
> > options to do that, or use *GEMM?
> >
> >
> > _______________________________________________
> > J3 mailing list
> > J3 at mailman.j3-fortran.org
> > http://mailman.j3-fortran.org/mailman/listinfo/j3
>
> Bill Long longb at cray.com
> Principal Engineer, Fortran Technical Support & voice: 651-605-9024
> Bioinformatics Software Development fax: 651-605-9143
> Cray Inc./ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425
>
>