(j3.2006) What do typical processors do?

Bill Long longb
Tue Jul 18 19:30:18 EDT 2017


It depends on how "non-gemm" the expression is.  If you had

  C = matmul ( A, transpose(B) )

then the compiler can pretty easily pattern-match that to a call to the optimized CGEMM library routine.  You basically get GEMM performance with a much simpler expression in your code.
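
As a rough illustration of what such a pattern match amounts to, the sketch below writes out by hand the CGEMM call that corresponds to the matmul/transpose expression and checks it against the intrinsic result. The program name, array shapes, and tolerance are made up for the example, and it assumes default (single-precision) complex arrays and a BLAS library at link time (for instance -lblas). Whether a particular compiler actually makes this substitution depends on the compiler and its options.

```fortran
program gemm_transpose_sketch
  implicit none
  ! Illustrative sizes only: A is m-by-k, B is n-by-k, C is m-by-n.
  integer, parameter :: m = 3, n = 4, k = 2
  complex, parameter :: one = (1.0, 0.0), zero = (0.0, 0.0)
  complex :: A(m,k), B(n,k), C(m,n), C_ref(m,n)
  real    :: ar(m,k), ai(m,k), br(n,k), bi(n,k)
  external :: cgemm          ! from an external BLAS library (assumed available)

  ! Fill A and B with arbitrary complex values.
  call random_number(ar); call random_number(ai)
  call random_number(br); call random_number(bi)
  A = cmplx(ar, ai)
  B = cmplx(br, bi)

  ! The simple source-level expression.
  C_ref = matmul(A, transpose(B))

  ! The equivalent library call: C := one*A*B**T + zero*C.
  ! TRANSB = 'T' asks CGEMM to use the transpose of B, so no temp is needed.
  call cgemm('N', 'T', m, n, k, one, A, m, B, n, zero, C, m)

  if (maxval(abs(C - C_ref)) > 1.0e-5) error stop 'CGEMM and matmul disagree'
  print *, 'max abs difference:', maxval(abs(C - C_ref))
end program gemm_transpose_sketch
```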



> On Jul 18, 2017, at 6:00 PM, Van Snyder <Van.Snyder at jpl.nasa.gov> wrote:
> 
> Should I expect a processor to optimize
> 
>  C = matmul ( A, conjg(transpose(B)) )

This is not so obvious.  But if you used one temp:

  D = conjg(transpose(B))
  C = matmul ( A, D )

then the second line does get compiled as a CGEMM call. 
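
For the "or use *GEMM" alternative in the question, here is a minimal sketch, under the same assumptions as any hand-written BLAS call (default complex kind, a BLAS library at link time, made-up program name, shapes, and tolerance). CGEMM takes TRANSB = 'C' for the conjugate transpose, so an explicit call needs no temp at all. The program compares that call with the one-temp form above.

```fortran
program gemm_conjg_transpose_sketch
  implicit none
  ! Illustrative sizes only: A is m-by-k, B is n-by-k, D is k-by-n.
  integer, parameter :: m = 3, n = 4, k = 2
  complex, parameter :: one = (1.0, 0.0), zero = (0.0, 0.0)
  complex :: A(m,k), B(n,k), D(k,n), C(m,n), C_ref(m,n)
  real    :: ar(m,k), ai(m,k), br(n,k), bi(n,k)
  external :: cgemm          ! from an external BLAS library (assumed available)

  ! Fill A and B with arbitrary complex values.
  call random_number(ar); call random_number(ai)
  call random_number(br); call random_number(bi)
  A = cmplx(ar, ai)
  B = cmplx(br, bi)

  ! One-temp form: D holds the conjugate transpose of B.
  D = conjg(transpose(B))
  C_ref = matmul(A, D)

  ! Explicit call: C := one*A*B**H + zero*C.
  ! TRANSB = 'C' makes CGEMM apply the conjugate transpose itself.
  call cgemm('N', 'C', m, n, k, one, A, m, B, n, zero, C, m)

  if (maxval(abs(C - C_ref)) > 1.0e-5) error stop 'CGEMM and matmul disagree'
  print *, 'max abs difference:', maxval(abs(C - C_ref))
end program gemm_conjg_transpose_sketch
```

The trade-off is the usual one: the explicit call avoids the temp but is harder to read than the two-line matmul version.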

Cheers,
Bill

> 
> without making two or three temps, or should I write a matmul that has
> options to do that, or use *GEMM?
> 
> 

Bill Long                                                                       longb at cray.com
Principal Engineer, Fortran Technical Support &   voice:  651-605-9024
Bioinformatics Software Development                      fax:  651-605-9143
Cray Inc./ 2131 Lindau Lane/  Suite 1000/  Bloomington, MN  55425




