(j3.2006) What do typical processors do?
Bill Long
longb
Tue Jul 18 19:30:18 EDT 2017
It depends on how "non-gemm" the expression is. If you had
C = matmul ( A, transpose(B) )
then the compiler can pretty easily pattern match that to a call to the optimized CGEMM library routine. You basically get the GEMM performance with a much simpler expression in your code.
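To make the pattern concrete, here is a minimal sketch of what the compiler has to recognize: matmul(A, transpose(B)) computes C(i,j) = sum over l of A(i,l)*B(j,l), which is the operation GEMM performs when its TRANSB argument is 'T'. The program name, array sizes, and fill values are made up for illustration. Whether a given compiler actually substitutes a library call depends on the compiler and its options.

```fortran
program gemm_pattern_sketch
  implicit none
  integer, parameter :: m = 3, n = 4, k = 2   ! illustrative sizes only
  complex :: A(m,k), B(n,k), C(m,n), Cref(m,n)
  integer :: i, j, l

  ! Fill A and B with arbitrary but reproducible values.
  do l = 1, k
    do i = 1, m
      A(i,l) = cmplx(real(i), real(l))
    end do
    do j = 1, n
      B(j,l) = cmplx(real(j - l), real(2*j))
    end do
  end do

  ! The expression from the message above.
  C = matmul(A, transpose(B))

  ! Reference loops: C(i,j) = sum_l A(i,l) * B(j,l),
  ! i.e. the GEMM operation with TRANSA = 'N' and TRANSB = 'T'.
  Cref = (0.0, 0.0)
  do j = 1, n
    do l = 1, k
      do i = 1, m
        Cref(i,j) = Cref(i,j) + A(i,l) * B(j,l)
      end do
    end do
  end do

  if (maxval(abs(C - Cref)) > 1.0e-4) error stop "mismatch"
  print *, "matmul(A, transpose(B)) matches the TRANSB='T' loops"
end program gemm_pattern_sketch
```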
> On Jul 18, 2017, at 6:00 PM, Van Snyder <Van.Snyder at jpl.nasa.gov> wrote:
>
> Should I expect a processor to optimize
>
> C = matmul ( A, conjg(transpose(B)) )
This is not so obvious. But if you used one temp:
D = conjg(transpose(B))
C = matmul ( A, D )
then the second line does get compiled as a CGEMM call.
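As a sketch of the two spellings, the program below computes the product once with the explicit temporary D and once with the nested expression, and checks that the results agree. Names, sizes, and fill values are illustrative assumptions. The equivalent direct BLAS call uses TRANSB = 'C' and is shown only as a comment, since it needs a BLAS library at link time. Which spelling a particular compiler turns into that call is compiler-dependent.

```fortran
program conjg_transpose_sketch
  implicit none
  integer, parameter :: m = 3, n = 4, k = 2   ! illustrative sizes only
  complex :: A(m,k), B(n,k), D(k,n), C1(m,n), C2(m,n)
  integer :: i, j, l

  ! Fill A and B with arbitrary but reproducible values.
  do l = 1, k
    do i = 1, m
      A(i,l) = cmplx(real(i), real(l))
    end do
    do j = 1, n
      B(j,l) = cmplx(real(j - l), real(2*j))
    end do
  end do

  ! One-temp form: the second statement is a plain matmul of two arrays.
  D  = conjg(transpose(B))
  C1 = matmul(A, D)

  ! Nested form from the original question.
  C2 = matmul(A, conjg(transpose(B)))

  ! Equivalent explicit BLAS call (assumes default complex is the
  ! single-precision type CGEMM expects and that BLAS is linked):
  !   call cgemm('N', 'C', m, n, k, (1.0,0.0), A, m, B, n, (0.0,0.0), C1, m)

  if (maxval(abs(C1 - C2)) > 1.0e-4) error stop "mismatch"
  print *, "one-temp and nested forms agree"
end program conjg_transpose_sketch
```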
Cheers,
Bill
>
> without making two or three temps, or should I write a matmul that has
> options to do that, or use *GEMM?
Bill Long longb at cray.com
Principal Engineer, Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9143
Cray Inc./ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425