(j3.2006) Integration of co-arrays with the intrinsic shift functions

Bill Long longb
Tue Jul 17 18:48:32 EDT 2007

Craig Rasmussen wrote:
> It WAS fine to say that one can conceptually view co-array  
> distributions across images in any way one wants.  But, I have  
> identified at least one place in the standard that requires us to  
> define precisely how one is to view a co-array distribution.  

Where is that place?  We should fix it.

> In most  
> instances an agnostic view is fine, as co-arrays are "fundamentally  
> low level" 

as are most things in Fortran.

> (as you say) and provide for programming at a "level  
> close to what assembly programming was for sequential  
> languages" [Diaconescu and Zima]. 

I don't think anyone who understood co-arrays would make such a claim. 

>  However, the spirit of Fortran is  
> not assembly language and to imply that the co-array spec is complete  
> when it breaks existing Fortran (from the 90 standard) is just plain  
> wrong in my opinion.

Please explain what aspect of Fortran is broken by co-arrays. 

> How can we say we have finished integrating a new type (co-arrays)  
> into the standard, when it won't work properly with current features?

I still don't see which feature of f03 that previously worked is now broken.

> You mention that this sort of thing is "a great idea for a separate  
> library, and these functions could be pretty easily written using the  
> existing co-array capabilities."  This is true, but we must keep two  
> things in mind:
>       1. CSHIFT and EOSHIFT are already intrinsic functions.

There are about 100 pages of intrinsics.  None are broken by co-arrays 
that I can see.

>       2. For performance reasons, it is critical that these functions  
> are in the language in order for the compiler to optimize the  
> operations.  For example (see my code example below), the compiler  
> could inline these functions in a loop body and get rid of a  
> temporary array copy.  The compiler could also use two-phase  
> communication to interleave communication (prefetch "halo" cells)  
> with computation (compute on interior of loop).  These optimizations  
> would not be possible with libraries.

Really? The compiler can inline library routines if they are available, 
preferably in a module.  In that case, the optimizations you are 
suggesting do not look all that difficult.
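
As a rough illustration of the library approach, a shift routine of this
kind could be sketched as a module function.  Note that co_cshift is not a
standard intrinsic; the name is taken from the example quoted further down,
and the semantics here (a 1-D array distributed in image order, periodic
wraparound, shift limited to -1 or +1) are assumptions made for this
sketch.  The caller is assumed to execute sync all before and after use.

```fortran
module co_shift_sketch
  implicit none
contains
  ! Hypothetical helper, not an intrinsic: circular shift by one element
  ! of a 1-D array spread across images in image order.
  function co_cshift(a, shift) result(r)
    real, intent(in)    :: a(:)[*]
    integer, intent(in) :: shift        ! only -1 and +1 are handled here
    real                :: r(size(a))
    integer :: n, me, np, il, ir

    n  = size(a)
    me = this_image()
    np = num_images()
    il = merge(np, me-1, me == 1)       ! left neighbor, periodic
    ir = merge(1,  me+1, me == np)      ! right neighbor, periodic

    select case (shift)
    case (-1)                           ! r(i) = a(i-1), like cshift(a,-1)
      r(1)   = a(n)[il]                 ! first element comes from the left image
      r(2:n) = a(1:n-1)
    case (1)                            ! r(i) = a(i+1), like cshift(a,+1)
      r(1:n-1) = a(2:n)
      r(n)     = a(1)[ir]               ! last element comes from the right image
    case default
      error stop "co_cshift sketch: shift must be -1 or +1"
    end select
  end function co_cshift
end module co_shift_sketch
```

Because the function lives in a module, its body is visible to the compiler
at the call site, which is what makes inlining plausible.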

> So that everyone knows the programming models that Bill and I are  
> referring to, I've included an example of a routine that updates by  
> averaging over a 3 cell stencil (local cell plus 2 1D neighbors).  It  
> is fine for a programmer to use either model, but I claim the data- 
> parallel model provides the following advantages:
>      1. Less code (30% reduction in code size in real code, much more  
> in my simple example).
>      2. Less complex and error prone (again as found in converting  
> real LANL codes).  Consider how long it takes you to verify that  
> Bill's example is correct.
>      3. The data-parallel code is easier to move to heterogeneous  
> processing units like GPUs. 

For the (small number of) examples where data parallel model programming 
actually is relevant to the problem, this may be true.  In the real 
world, this is rarely easy.

>  Microsoft has obtained speed  
> improvements of up to 17 times by moving code (written in data- 
> parallel) off to a GPU. 

Which data parallel programming language were they using?   Was this a 
real application, or a benchmarking stunt?

>  LANL's new "advanced architecture" machine  
> has heterogeneous processing units and we see heterogeneous  
> architectures as ubiquitous in the future.

Quite possible.  Programming (in any model) will be a lot easier if each 
image has the same set of heterogeneous computation elements.  If the 
complexity is confined to an image, then cross-image operations can 
remain simple.

> Regards,
> Craig
> ----------------------  Data Parallel Code -----------------------
> subroutine update_dp(T, Tnew)
>    real :: T(:)[*], Tnew(:)[*]
>    sync all
>    Tnew = (co_cshift(T,-1) + T + co_cshift(T,+1))/3.
>    sync all
> end subroutine update_dp ! 6 lines
> ---------------------- standard CAF code (is there an error in  
> code?)  -------------------------
> subroutine update_caf(T, Tnew)
>    real :: T(:)[*], Tnew(:)[*]
>    integer :: i, il, ir, nmax

I tend to work with predefined values, named mype and npes, kept in a 
module for this_image() and num_images(), so the next few lines would 
look like

          nmax = ubound(T,1)
          il = merge(npes, mype-1, mype==1)
          ir = merge(1,    mype+1, mype==npes)
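
A minimal sketch of such a module, for readers who have not seen the
idiom.  The module name image_info and the routine image_info_init are
hypothetical, chosen only for illustration; only mype and npes come from
the text above.

```fortran
module image_info                       ! hypothetical module name
  implicit none
  ! Readable everywhere, writable only inside this module.
  integer, protected :: mype = 0        ! this image's index, 1..npes
  integer, protected :: npes = 0        ! total number of images
contains
  ! Hypothetical setup routine: call once on every image at startup.
  subroutine image_info_init()
    mype = this_image()
    npes = num_images()
  end subroutine image_info_init
end module image_info
```

Any routine that does "use image_info" after the init call can then compute
il and ir with the two merge lines above.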

>    nmax = co_ubound(T,1)
>    if (this_image() == 1) then
>      il = T(nmax)[num_images()]
>    else
>      il = this_image() - 1
>    end if
>    if (this_image == num_images()) then
>      ir = T(1)[1]
>    else
>      ir = this_image() + 1
>    end if

I would probably have the syncs external, but if kept internal, they 
should surround all the work below.
>    sync all
>    Tnew(1) = (T(nmax)[il] + T(1) + T(2))/3.
>    Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir])/3.
>    sync all

Using a loop here, and then array assignment in the other example, is 
inconsistent.  With array assignment in both:
           sync all
           Tnew(1)        = (T(nmax)[il] + T(1)        + T(2)     )/3.
           Tnew(2:nmax-1) = (T(1:nmax-2) + T(2:nmax-1) + T(3:nmax))/3.
           Tnew(nmax)     = (T(nmax-1)   + T(nmax)     + T(1)[ir] )/3.
           sync all

>    do i = 2, nmax-1
>      Tnew(i) = (T(i-1) + T(i) + T(i+1))/3.
>    end do
> end subroutine update ! 22 lines

OK, 12 lines.  If this were not in a module where mype and npes were 
defined, then maybe 13.  But all of this would be pretty easy for the 
compiler to inline and optimize, especially if the sync statements were 
external.
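
For reference, the fragments above can be assembled into one routine.  This
is only a sketch: the module name stencil_sketch is hypothetical, the image
indices are computed locally rather than taken from a module, the syncs are
kept internal, and each image is assumed to hold at least two elements.

```fortran
module stencil_sketch                   ! hypothetical module name
  implicit none
contains
  ! Three-point periodic average over a 1-D array distributed in image order.
  subroutine update_caf(T, Tnew)
    real, intent(in)    :: T(:)[*]
    real, intent(inout) :: Tnew(:)[*]
    integer :: il, ir, nmax, mype, npes

    mype = this_image()
    npes = num_images()
    nmax = ubound(T, 1)                 ! local extent; assumed >= 2
    il = merge(npes, mype-1, mype == 1)     ! left neighbor image, periodic
    ir = merge(1,    mype+1, mype == npes)  ! right neighbor image, periodic

    sync all                            ! neighbors' T must be ready to read
    Tnew(1)        = (T(nmax)[il] + T(1)        + T(2)     )/3.
    Tnew(2:nmax-1) = (T(1:nmax-2) + T(2:nmax-1) + T(3:nmax))/3.
    Tnew(nmax)     = (T(nmax-1)   + T(nmax)     + T(1)[ir] )/3.
    sync all                            ! all reads done before T changes again
  end subroutine update_caf
end module stencil_sketch
```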



Bill Long                                   longb at cray.com
Fortran Technical Support    &              voice: 651-605-9024
Bioinformatics Software Development         fax:   651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120

