(j3.2006) Integration of co-arrays with the intrinsic shift functions

Craig Rasmussen crasmussen
Mon Jul 16 14:06:22 EDT 2007

On Jul 13, 2007, at 8:59 AM, Bill Long wrote:
> Co-arrays basically provide two facilities:  a simple and efficient  
> way to access data on a different image, and ways to enforce  
> execution order between images.  That's about it.  There is no  
> prescription about what the data objects on different images mean  
> or are part of.  If Craig wants to partition a conceptually global  
> array across images ala a data parallel programming model, that's  
> fine. If Aleks wants to think of the co-dimensions as additional  
> planes in a higher dimension array, that's also fine.  The  
> important point is that co-arrays prescribes neither view, it just  
> provides a means to implement either.  I've written code using  
> Craig's model, and it worked quite well for that problem.  Most of  
> the time I employ a third approach.  This is the underlying power  
> of co-arrays.  Because it is fundamentally low level, it is  
> flexible enough to be used for a wide range of problems and  
> programming models.
> Given the intent and design of co-arrays, I think that Craig's  
> proposed intrinsics are not a good idea. (Sorry, Craig).   They are  
> really only useful in the context of a particular usage of co- 
> arrays, namely this HPF style view of data distribution.  That sort  
> of thing is a great idea for a separate library, and these  
> functions could be pretty easily written using the existing co- 
> array capabilities.  Things like this should not be enshrined in  
> the standard.

It WAS fine to say that one can conceptually view co-array
distributions across images in any way one wants.  But I have
identified at least one place in the standard that requires us to
define precisely how one is to view a co-array distribution.  In most
instances an agnostic view is fine, as co-arrays are "fundamentally
low level" (as you say) and provide for programming at a "level
close to what assembly programming was for sequential
languages" [Diaconescu and Zima].  However, the spirit of Fortran is
not assembly language, and to imply that the co-array spec is complete
when it breaks existing Fortran (from the 90 standard) is just plain
wrong in my opinion.

How can we say we have finished integrating a new type (co-arrays)  
into the standard, when it won't work properly with current features?

You mention that this sort of thing is "a great idea for a separate  
library, and these functions could be pretty easily written using the  
existing co-array capabilities."  This is true, but we must keep two
things in mind:
      1. CSHIFT and EOSHIFT are already intrinsic functions.
      2. For performance reasons, it is critical that these functions  
are in the language in order for the compiler to optimize the  
operations.  For example (see my code example below), the compiler  
could inline these functions in a loop body and get rid of a  
temporary array copy.  The compiler could also use two-phase  
communication to interleave communication (prefetch "halo" cells)  
with computation (compute on interior of loop).  These optimizations
would not be possible with libraries.
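
To make the library alternative concrete, here is a minimal sketch of
what such a routine might look like when written with existing
co-array capabilities.  The names co_shift_sketch and co_cshift_lib
are hypothetical, and the sketch assumes a 1D periodic array split
into equal-sized blocks (one per image) and a shift of +1 or -1 only.
It is written in Fortran 2008 coarray syntax and needs a compiler with
coarray support enabled.

```fortran
module co_shift_sketch
  implicit none
contains

  ! Hypothetical library stand-in for the proposed co_cshift intrinsic.
  ! Assumptions (not taken from the proposal): every image holds a block
  ! of the same size (at least one element), the global array is
  ! periodic, and shift is +1 or -1.  The caller must execute "sync all"
  ! before the call so that the neighbouring images' data are up to date.
  function co_cshift_lib(T, shift) result(S)
    real, intent(in)    :: T(:)[*]
    integer, intent(in) :: shift
    real    :: S(size(T))
    integer :: n, il, ir

    n = size(T)

    ! Left and right neighbour images, with periodic wrap-around.
    il = this_image() - 1
    if (il < 1) il = num_images()
    ir = this_image() + 1
    if (ir > num_images()) ir = 1

    select case (shift)
    case (1)             ! same sense as CSHIFT(a, 1): S(i) = a(i+1)
      S(1:n-1) = T(2:n)
      S(n)     = T(1)[ir]
    case (-1)            ! same sense as CSHIFT(a, -1): S(i) = a(i-1)
      S(2:n) = T(1:n-1)
      S(1)   = T(n)[il]
    case default
      error stop "co_cshift_lib: only shift = +1 or -1 is sketched"
    end select
  end function co_cshift_lib

end module co_shift_sketch
```

With this in place the stencil update could be written as
Tnew = (co_cshift_lib(T,-1) + T + co_cshift_lib(T,+1))/3. between two
sync all statements.  Each call returns a full-size temporary S; that
is the array copy an intrinsic would give the compiler a chance to
eliminate.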

So that everyone knows the programming models that Bill and I are  
referring to, I've included an example of a routine that updates by
averaging over a 3-cell stencil (local cell plus 2 1D neighbors).  It
is fine for a programmer to use either model, but I claim the data- 
parallel model provides the following advantages:
     1. Less code (a 30% reduction in code size in real code, much more
in my simple example).
     2. Less complexity and fewer errors (again, as found in converting
real LANL codes).  Consider how long it takes you to verify that
Bill's example is correct.
     3. The data-parallel code is easier to move to heterogeneous
processing units like GPUs.  Microsoft has obtained speed
improvements of up to 17 times by moving code (written in a
data-parallel style) off to a GPU.  LANL's new "advanced architecture"
machine has heterogeneous processing units, and we see heterogeneous
architectures as ubiquitous in the future.


----------------------  Data Parallel Code -----------------------

subroutine update_dp(T, Tnew)
   real :: T(:)[*], Tnew(:)[*]

   sync all
   Tnew = (co_cshift(T,-1) + T + co_cshift(T,+1))/3.
   sync all

end subroutine update_dp ! 6 lines

----------  standard CAF code (is there an error in code?)  ----------

subroutine update_caf(T, Tnew)
   real :: T(:)[*], Tnew(:)[*]
   integer :: i, il, ir, nmax

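   nmax = ubound(T,1)   ! local upper bound, not the co-bound

   ! left neighbor image, wrapping around periodically
   if (this_image() == 1) then
     il = num_images()
   else
     il = this_image() - 1
   end if

   ! right neighbor image, wrapping around periodically
   if (this_image() == num_images()) then
     ir = 1
   else
     ir = this_image() + 1
   end if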

   sync all
   Tnew(1) = (T(nmax)[il] + T(1) + T(2))/3.
   Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir])/3.
   sync all

   do i = 2, nmax-1
     Tnew(i) = (T(i-1) + T(i) + T(i+1))/3.
   end do

end subroutine update_caf ! 22 lines
