(j3.2006) (SC22WG5.5889) 3 levels of parallelism?

Bill Long longb
Wed Jul 5 14:56:00 EDT 2017


> On Jul 5, 2017, at 8:02 AM, Clune, Thomas L. (GSFC-6101) <thomas.l.clune at nasa.gov> wrote:
> 
> 
> Coarrays and DO CONCURRENT are major advances for parallel programming in Fortran. However, as we look down the road, I think it is important for us to consider some of the insights that have come from the HPC community. In particular, there is fairly clear consensus that it is important in user code to explicitly manage _3_ different levels of parallelism. This is more explicit in cases like GPUs, but even Intel Phi and conventional processors have shown the importance of carefully coding at each of 3 levels. Roughly speaking, these levels correspond to (1) coarse-grained message passing (inter-node), (2) threading (within-node), and (3) vectorization. But this correspondence is only suggestive - the actual breakdown in GPUs is somewhat different.

1) Coarrays, and the general parallel model that goes with them, cover the inter-node case, or, to use Fortran terminology, the inter-image case. The mapping of images to nodes is outside the scope of the standard. Because you could map images to cores within a node, this model can be applied to level (2), within-node parallelism, as well.

2) DO CONCURRENT does not require threading. It provides the compiler with sufficient information to ensure that threaded code is "safe". The language, in general, provides semantics that permit various forms of optimization; DO CONCURRENT provides enough to allow the compiler to assign different loop iterations to different threads. Note that, based on the provided semantics, the compiler can choose threading, or vectorization, or both, for the loop, depending on the code in the loop body.

3) Fortran was a pioneer in "vectorization", with arrays being first-class objects and "array-syntax" expressions. Array expressions can usually be vectorized trivially. If the target hardware can benefit from vector code, you should expect the compiler to generate it. Automatic vectorization of loops has been available in Fortran compilers for decades. Since Fortran 90, automatic vectorization of array expressions has been the norm.
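
To make the three points above concrete, here is a minimal sketch, not a tuned example. The program and variable names (three_levels, n, a, b, partial, total) are made up for illustration, and whether the compiler actually threads or vectorizes any given statement is up to the compiler and the target hardware.

```fortran
! Illustrative sketch only; all names here are hypothetical.
program three_levels
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  real :: partial[*]     ! coarray scalar: one copy per image
  real :: total, expected
  integer :: i, img

  ! Level (3): a whole-array assignment, a natural candidate
  ! for vectorization.
  a = real(this_image())

  ! Level (2): the iterations are declared independent, so the
  ! compiler is free to thread the loop, vectorize it, or both.
  do concurrent (i = 1:n)
     b(i) = 2.0 * a(i) + 1.0
  end do

  ! Each image reduces its own data, then all images synchronize
  ! so that every partial sum is defined before it is read remotely.
  partial = sum(b)
  sync all

  ! Level (1): image 1 gathers the partial sums from the other
  ! images using coindexed references.
  if (this_image() == 1) then
     total = 0.0
     do img = 1, num_images()
        total = total + partial[img]
     end do

     ! Image img contributes n*(2*img + 1), so with P images the
     ! total is n*P*(P + 2).
     expected = real(n) * real(num_images()) * real(num_images() + 2)
     if (abs(total - expected) > 0.5) error stop 'unexpected total'

     print *, 'images:', num_images(), ' total:', total
  end if
end program three_levels
```

The same source runs unchanged on one image or many; how images map to cores or nodes is decided at build or launch time, outside the standard. With gfortran, for instance, coarray support is selected with the -fcoarray= option.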

Optimistically, compilers claiming to be F2008 conforming should be handling all these cases by default, although some still require compiler options to enable the coarray-based SPMD capabilities. I expect that limitation to go away eventually.

Cheers,
Bill

> 
> In Garching various statements were made that Coarrays are good for both (1) and (2). Likewise statements were made that DO CONCURRENT is good for (2) and (3). And I'm not arguing for or against this. But it would behoove us to be certain that we really can _effectively_ address all 3 levels of parallelism with the standard as is. Otherwise, to ensure that Fortran retains its focus on HPC, we should be looking for a suitable extension that enables explicit control at all 3 levels in an architecture independent manner. Maybe it is obvious to others in the committee, in which case I'll be happy to sit back and absorb wisdom.
> 
> Cheers,
> 
> - Tom
> 
> _______________________________________________
> J3 mailing list
> J3 at mailman.j3-fortran.org
> http://mailman.j3-fortran.org/mailman/listinfo/j3

Bill Long                                                                       longb at cray.com
Principal Engineer, Fortran Technical Support &   voice:  651-605-9024
Bioinformatics Software Development                      fax:  651-605-9143
Cray Inc./ 2131 Lindau Lane/  Suite 1000/  Bloomington, MN  55425




