(j3.2006) (SC22WG5.5895) 3 levels of parallelism?

Keith Bierman khbkhb
Wed Jul 5 15:30:16 EDT 2017


Out of curiosity, how do folks think we could best leverage/hijack an
integrated FPGA resource? Traditionally these have been "coprocessors" with
a great separation from the language and typically hanging off an IO bridge
(conceptually similar to the early array processors). I'm seeing some
movement towards greater / closer integration on the motherboard so that
tight integration with the application program is at least theoretically
within reach.

Typically the folks doing this sort of thing code for the FPGA in Verilog
... and integrate with that in C.

Compiling directly to the FPGA would be nice; but assuming that's still
done in Verilog ... I assume coroutines would be our best bet ... any other
obvious approaches (that someone is prepared to say anything about ...
without an NDA ;>)?

Keith Bierman
khbkhb at gmail.com
303 997 2749

On Wed, Jul 5, 2017 at 1:01 PM, Clune, Thomas L. (GSFC-6101) <
thomas.l.clune at nasa.gov> wrote:

> Bill,
>
> Thanks.    I should have realized that array notation was the missing bit.
>
> It will be interesting to see if Nvidia sees the situation in a similar
> light.   Gary? ...
>
> - Tom
>
>
>
> > On Jul 5, 2017, at 2:56 PM, Bill Long <longb at cray.com> wrote:
> >
> >
> >> On Jul 5, 2017, at 8:02 AM, Clune, Thomas L. (GSFC-6101) <
> thomas.l.clune at nasa.gov> wrote:
> >>
> >>
> >> Coarrays and DO CONCURRENT are major advances for parallel programming
> in Fortran.    However, as we look down the road, I think it is important
> for us to consider some of the insights that have come from the HPC
> community.   In particular, there is fairly clear consensus that it is
> important in user code to explicitly manage _3_ different levels of
> parallelism.    This is more explicit in cases like GPUs, but even
> Intel Phi and conventional processors have shown the importance of
> carefully coding at each of 3 levels.   Roughly speaking, these levels
> correspond to (1) coarse-grained message passing (inter-node),  (2)
> threading (within-node), and (3) vectorization.     But this correspondence
> is only suggestive - the actual breakdown in GPUs is somewhat different.
> >
> > 1) Coarrays, and the general parallel model that goes with them, cover
> the internode case.  Actually inter-image, to use Fortran language.  The
> mapping of images to nodes is outside the scope of the standard.  Because
> you could map images to cores within a node, this model can be applied to
> option (2) as well - within-node.
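[A minimal coarray sketch of the inter-image model described above; the program name and the per-image "work" (just `this_image()`) are illustrative, not from the thread. How images map to cores or nodes is left to the implementation, as Bill notes.]

```fortran
! F2008 coarray sketch: each image holds its own copy of 'total';
! image 1 reads every image's copy after a barrier and reduces them.
program coarray_sum
  implicit none
  integer :: total[*]        ! one instance per image
  integer :: i, grand
  total = this_image()       ! stand-in for per-image work
  sync all                   ! make every image's 'total' visible
  if (this_image() == 1) then
     grand = 0
     do i = 1, num_images()
        grand = grand + total[i]   ! remote read from image i
     end do
     print *, 'sum over images =', grand
  end if
end program coarray_sum
```

[Whether this runs SPMD across nodes or across cores within a node depends entirely on how the implementation launches the images, which is the point being made.]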
> >
> > 2) DO CONCURRENT does not require threading. It provides the compiler
> with sufficient information to ensure that threaded code is "safe".  The
> language, in general, provides semantics that permit various forms of
> optimization.  DO CONCURRENT provides enough to allow the compiler to
> assign different loop iterations to different threads.  Note that, based on
> the provided semantics, the compiler can choose threading, or
> vectorization, or both, for the loop, depending on the code involved in the
> loop body.
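[A sketch of the point about DO CONCURRENT; names and values are illustrative. The construct asserts that iterations are order-independent, which is what licenses the compiler to thread, vectorize, or do both - it does not mandate any of them.]

```fortran
! DO CONCURRENT: the programmer asserts no cross-iteration
! dependence; the compiler may then choose threads, vectors, or both.
program do_conc
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n)
  integer :: i
  b = [(real(i), i = 1, n)]
  do concurrent (i = 1:n)
     a(i) = 2.0 * b(i)       ! each iteration independent of the rest
  end do
  print *, a(1), a(n)        ! 2.0 and 2000.0
end program do_conc
```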
> >
> > 3) Fortran was a pioneer in "vectorization" with arrays being first
> class objects and "array-syntax" expressions.   Array expressions can
> usually be vectorized trivially.  If the target hardware can benefit from
> vector code, you should expect the compiler to generate it.  Automatic
> vectorization of loops has been available in Fortran compilers for decades.
> Since Fortran 90, automatic vectorization of array expressions has been the
> norm.
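[For the record, the kind of whole-array expression meant here; variable names are illustrative. The element-wise semantics carry no ordering constraint, which is why such expressions vectorize trivially.]

```fortran
! An array expression: no explicit loop, element-wise semantics,
! directly mappable onto vector hardware.
program array_expr
  implicit none
  real :: x(8), y(8), z(8)
  x = 1.0
  y = 2.0
  z = x + 3.0*y       ! whole-array expression; vectorizes trivially
  print *, z(1)       ! 7.0
end program array_expr
```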
> >
> > Optimistically, compilers claiming to be F2008 conforming should be
> handling all these cases by default.  Although some still require compiler
> options to enable the coarray-based SPMD capabilities.   I expect that
> limitation to go away eventually.
> >
> > Cheers,
> > Bill
> >
> >>
> >> In Garching various statements were made that Coarrays are good for
> both (1) and (2).   Likewise statements were made that DO CONCURRENT is
> good for (2) and (3).    And I'm not arguing for or against this. But it
> would behoove us to be certain that we really can _effectively_ address all
> 3 levels of parallelism with the standard as is.    Otherwise, to ensure
> that Fortran retains its focus on HPC, we should be looking for a suitable
> extension that enables explicit control at all 3 levels in an architecture
> independent manner.    Maybe it is obvious to others in the committee, in
> which case I?ll be happy to sit back and absorb wisdom.
> >>
> >> Cheers,
> >>
> >> - Tom
> >>
> >> _______________________________________________
> >> J3 mailing list
> >> J3 at mailman.j3-fortran.org
> >> http://mailman.j3-fortran.org/mailman/listinfo/j3
> >
> > Bill Long                                          longb at cray.com
> > Principal Engineer, Fortran Technical Support &    voice: 651-605-9024
> > Bioinformatics Software Development                fax:   651-605-9143
> > Cray Inc./ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN  55425
> >
> >
>
>