(j3.2006) (SC22WG5.5889) 3 levels of parallelism?

Bill Long longb
Wed Jul 5 15:46:02 EDT 2017


> On Jul 5, 2017, at 2:30 PM, Brian Friesen <bfriesen at lbl.gov> wrote:
> 
> On Wed, Jul 5, 2017 at 12:01 PM, Clune, Thomas L. (GSFC-6101) <thomas.l.clune at nasa.gov> wrote:
> Thanks.    I should have realized that array notation was the missing bit.
> 
> It will be interesting to see if Nvidia sees the situation in a similar light.   Gary? ...
> 
> I can (naively) imagine a scenario in which a GPU compiler would do the same thing that Bill mentioned for CPUs:

Right.  If GPUs became as pervasive as multiple cores on a processor chip, I suspect compiler vendors would take advantage of DO CONCURRENT to target GPUs.  Ultimately, this is a customer-demand issue, since generating code for GPUs is a non-trivial task.
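
For concreteness, a minimal sketch of the kind of loop whose DO CONCURRENT semantics make it offloadable (the subroutine and variable names are illustrative, not from the thread):

      subroutine saxpy_dc(n, a, x, y)
        integer, intent(in)    :: n
        real,    intent(in)    :: a, x(n)
        real,    intent(inout) :: y(n)
        integer :: i
        ! DO CONCURRENT asserts the iterations may execute in any
        ! order, so a compiler is free to vectorize, thread, or
        ! offload them as it sees fit.
        do concurrent (i = 1:n)
          y(i) = y(i) + a*x(i)
        end do
      end subroutine saxpy_dc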

Cheers,
Bill

> 
> > Note that, based on the provided semantics, the compiler can choose threading, or vectorization, or both, for the loop, depending on the code involved in the loop body.
> 
> It seems to me that GPUs have 2 levels of intra-node parallelism as well, namely warps+threads. So a GPU compiler encountering a DO CONCURRENT could choose to divide the loop iteration space into particular configurations of warps and threads based on the amount of work in the loop, just like a CPU compiler. Unless I grossly misunderstand GPU parallelism (Gary?).
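
As a hedged sketch of that idea: a nested DO CONCURRENT exposes a multi-dimensional iteration space that a GPU compiler could plausibly tile, mapping one index to thread blocks and the other to threads within a block (warps being the hardware grouping of those threads). The standard mandates only that the iterations be executable in any order; the mapping below is an assumption about one plausible compilation strategy, and the names are illustrative:

      subroutine add2d_dc(n, m, a, b, c)
        integer, intent(in)  :: n, m
        real,    intent(in)  :: a(n,m), b(n,m)
        real,    intent(out) :: c(n,m)
        integer :: i, j
        ! One plausible (not mandated) GPU mapping:
        !   j -> thread blocks, i -> threads within a block;
        !   warps are the hardware scheduling unit of those threads.
        do concurrent (j = 1:m, i = 1:n)
          c(i,j) = a(i,j) + b(i,j)
        end do
      end subroutine add2d_dc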

Bill Long, longb at cray.com
Principal Engineer, Fortran Technical Support & Bioinformatics Software Development
voice: 651-605-9024    fax: 651-605-9143
Cray Inc. / 2131 Lindau Lane / Suite 1000 / Bloomington, MN 55425




