[J3] Performance Portability and Fortran: Making Fortran cool again

William Clodius w.clodius at icloud.com
Fri Jan 18 10:41:11 EST 2019



> On Jan 17, 2019, at 10:47 PM, Bill Long via J3 <j3 at mailman.j3-fortran.org> wrote:
> 
> 
>> On Jan 17, 2019, at 7:12 PM, William Clodius via J3 <j3 at mailman.j3-fortran.org> wrote:
>> 
>> 
>> 
>>> On Jan 16, 2019, at 10:24 AM, Bill Long via J3 <j3 at mailman.j3-fortran.org> wrote:
>>> 
>>> Hi Ondrej,
>>> 
>>> This sort of insight is very valuable. Thanks for posting it. 
>>> 
>>> There seems to be a lot of focus on using GPUs.  (Maybe that’s why they asked Gary, who works for NVIDIA, to participate?)
>>> 
>>> I would point out that a DO CONCURRENT construct has semantics that are quite compatible with execution on a GPU.   Typically, DO CONCURRENT constructs are threaded, using the same underlying infrastructure as OpenMP.  I’ve raised adding GPU support with our compiler developers, but the chicken-egg problem is “no customer is asking for this”.  If customers, especially ones as large and visible as LANL, ask, you might get it.   If the standard needs tweaks to better enable GPU execution of DO CONCURRENT, that is something we should look into. 
>> <snip>
>> 
>> What might work for GPUs is defining a special REAL kind, say FASTREAL, that maps to the fastest performing components of the processor. This could default to REAL32 if the system doesn’t have a GPU or other enhanced speed sub-processor.
> 
> Right. For example, if the ‘normal’ reals are KIND = 4 and 8, the ‘fast’ ones could be 104 and 108, which would be the values of named constants in iso_fortran_env, with names like FAST_REAL32_KIND and FAST_REAL64_KIND.  The values of the named constants could be swapped by the compiler back to the ‘normal’ ones depending on whether you compile with -h gpu or -h nogpu (or whatever you want to name the command line options).  This preserves the goal of having the same source code for either mode.
> 
> 
> We would want to be careful with the design, though. There are other forms of “fast” memory on the horizon - HBM (High Bandwidth Memory) glued to the CPU chip, for example. What if your node has both a GPU and HBM on the non-GPU processor?
> 
> Cheers,
> Bill
> 
> 
> Bill Long                                                                       longb at cray.com
> Principal Engineer, Fortran Technical Support &   voice:  651-605-9024
> Bioinformatics Software Development                      fax:  651-605-9143
> Cray Inc./ 2131 Lindau Lane/  Suite 1000/  Bloomington, MN  55425
> 
> 
I would say that:
1. The processor would define separate kinds, say GPU_REAL32 and HBM_REAL32. Used directly, these would be non-portable, but they could be remapped through processor-specific, user-defined modules.
2. Which of them FAST_REAL32 maps to would be processor dependent, but probably controlled from the command line.
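
A rough sketch of how points 1 and 2 might look in user code. None of GPU_REAL32, HBM_REAL32, or FAST_REAL32 exists in any standard or compiler today, so here they are hypothetical stand-ins that all alias REAL32. The module name fast_kinds is also made up for illustration.

```fortran
! Hypothetical sketch only: the kind names below are NOT part of
! iso_fortran_env. They are defined here as aliases of REAL32 so the
! example compiles with any Fortran 2008 compiler.
module fast_kinds
   use, intrinsic :: iso_fortran_env, only: real32
   implicit none
   private
   public :: gpu_real32, hbm_real32, fast_real32

   ! Point 1: stand-ins for the non-portable, processor-defined kinds.
   integer, parameter :: gpu_real32 = real32
   integer, parameter :: hbm_real32 = real32

   ! Point 2: the portable name. In the proposal the processor would
   ! choose this mapping (probably from a command-line option); in this
   ! sketch the user-defined module picks it by hand.
   integer, parameter :: fast_real32 = gpu_real32
end module fast_kinds

program demo
   use fast_kinds, only: fast_real32
   implicit none
   integer, parameter :: n = 8
   real(fast_real32) :: x(n), y(n)
   integer :: i

   x = 1.0_fast_real32

   ! Application code refers only to FAST_REAL32, so the same source
   ! works whichever underlying kind the module selects.
   do concurrent (i = 1:n)
      y(i) = 2.0_fast_real32 * x(i) + real(i, fast_real32)
   end do

   print *, y
end program demo
```

Only the module would need to change from one processor to the next. Application code would use FAST_REAL32 throughout and never name the GPU or HBM kinds directly.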



