(j3.2006) (SC22WG5.5889) 3 levels of parallelism?

Brian Friesen bfriesen
Wed Jul 5 15:30:22 EDT 2017


On Wed, Jul 5, 2017 at 12:01 PM, Clune, Thomas L. (GSFC-6101)
<thomas.l.clune at nasa.gov> wrote:
>
> Thanks. I should have realized that array notation was the missing bit.
>
> It will be interesting to see if Nvidia sees the situation in a similar
> light. Gary? ...
>

I can (naively) imagine a scenario in which a GPU compiler would do the
same thing that Bill mentioned for CPUs:

> Note that, based on the provided semantics, the compiler can choose
> threading, or vectorization, or both, for the loop, depending on the code
> involved in the loop body.

It seems to me that GPUs have two levels of intra-node parallelism as well,
namely warps and the threads within each warp. So a GPU compiler
encountering a DO CONCURRENT could choose to divide the loop iteration
space into a particular configuration of warps and threads based on the
amount of work in the loop, just like a CPU compiler. That is, unless I
grossly misunderstand GPU parallelism (Gary?).
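
To make the idea concrete, here is a rough sketch (in Python, purely for illustration) of one way a flat DO CONCURRENT iteration space could be split into warps and per-warp threads. The names split_iterations and warps_needed are hypothetical, and the warp size of 32 is the NVIDIA value, assumed here; nothing below describes what any actual compiler does.

```python
WARP_SIZE = 32  # assumption: threads per warp on NVIDIA GPUs; other hardware may differ


def split_iterations(n_iters, warp_size=WARP_SIZE):
    """Map each iteration index 0..n_iters-1 to a (warp, lane) pair.

    Hypothetical helper: it mimics one way a compiler could spread the
    iterations of a DO CONCURRENT construct over warps (outer level)
    and the threads, or lanes, inside each warp (inner level).
    """
    return [(i // warp_size, i % warp_size) for i in range(n_iters)]


def warps_needed(n_iters, warp_size=WARP_SIZE):
    """Number of warps required to cover n_iters, rounding up."""
    return (n_iters + warp_size - 1) // warp_size


if __name__ == "__main__":
    mapping = split_iterations(70)
    print(warps_needed(70))                       # 3
    print(mapping[0], mapping[33], mapping[69])   # (0, 0) (1, 1) (2, 5)
```

With 70 iterations, the last warp is only partly filled (6 of 32 lanes), which is the kind of trade-off a compiler would presumably weigh when choosing a configuration based on the amount of work in the loop.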