(j3.2006) (SC22WG5.5889) 3 levels of parallelism?
Bill Long
longb at cray.com
Wed Jul 5 15:46:02 EDT 2017
> On Jul 5, 2017, at 2:30 PM, Brian Friesen <bfriesen at lbl.gov> wrote:
>
> On Wed, Jul 5, 2017 at 12:01 PM, Clune, Thomas L. (GSFC-6101) <thomas.l.clune at nasa.gov> wrote:
> Thanks. I should have realized that array notation was the missing bit.
>
> It will be interesting to see if Nvidia sees the situation in a similar light. Gary? ...
>
> I can (naively) imagine a scenario in which a GPU compiler would do the same thing that Bill mentioned for CPUs:
Right. If GPUs became as pervasive as multiple cores on a processor chip, I suspect compiler vendors would take advantage of DO CONCURRENT to target GPUs. Ultimately, this is a customer-demand issue, since generating code for GPUs is a non-trivial task.
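For concreteness, a minimal sketch of a DO CONCURRENT loop of the kind under discussion (the saxpy body, array names, and sizes here are illustrative, not from the thread). The construct asserts that the iterations may run in any order, which is what frees a compiler to map them onto CPU threads, vector lanes, or a GPU:

  program dc_saxpy
     implicit none
     integer, parameter :: n = 1000000
     real :: x(n), y(n), a
     integer :: i
     x = 1.0; y = 2.0; a = 0.5
     ! Iterations are independent by assertion; the compiler may
     ! choose threading, vectorization, both, or offload.
     do concurrent (i = 1:n)
        y(i) = a*x(i) + y(i)
     end do
     print *, y(1), y(n)
  end program dc_saxpy
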
Cheers,
Bill
>
> > Note that, based on the provided semantics, the compiler can choose threading, or vectorization, or both, for the loop, depending on the code involved in the loop body.
>
> It seems to me that GPUs have 2 levels of intra-node parallelism as well, namely warps+threads. So a GPU compiler encountering a DO CONCURRENT could choose to divide the loop iteration space into particular configurations of warps and threads based on the amount of work in the loop, just like a CPU compiler. Unless I grossly misunderstand GPU parallelism (Gary?).
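To make that decomposition concrete, here is one hand-written sketch of the warp/lane split a GPU compiler might derive from a flat DO CONCURRENT loop. The warp size of 32 and the explicit two-level nesting are assumptions for illustration only; a real compiler would generate such a mapping internally:

  program warp_sketch
     implicit none
     integer, parameter :: n = 1000000, warp = 32
     real :: x(n), y(n), a
     integer :: i, iw
     x = 1.0; y = 2.0; a = 0.5
     ! Outer loop: one iteration per warp; inner loop: one lane per
     ! element, with min() handling a partial final warp.
     do concurrent (iw = 1:(n + warp - 1)/warp)
        do concurrent (i = (iw-1)*warp + 1 : min(iw*warp, n))
           y(i) = a*x(i) + y(i)
        end do
     end do
     print *, y(1), y(n)
  end program warp_sketch
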
Bill Long                                        longb at cray.com
Principal Engineer, Fortran Technical Support    voice: 651-605-9024
Bioinformatics Software Development              fax:   651-605-9143
Cray Inc./ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425