(j3.2006) Integration of co-arrays with the intrinsic shift functions
Craig Rasmussen
crasmussen
Mon Jul 16 14:06:22 EDT 2007
On Jul 13, 2007, at 8:59 AM, Bill Long wrote:
> Co-arrays basically provide two facilities: a simple and efficient
> way to access data on a different image, and ways to enforce
> execution order between images. That's about it. There is no
> prescription about what the data objects on different images mean
> or are part of. If Craig wants to partition a conceptually global
> array across images à la a data parallel programming model, that's
> fine. If Aleks wants to think of the co-dimensions as additional
> planes in a higher dimension array, that's also fine. The
> important point is that co-arrays prescribes neither view, it just
> provides a means to implement either. I've written code using
> Craig's model, and it worked quite well for that problem. Most of
> the time I employ a third approach. This is the underlying power
> of co-arrays. Because it is fundamentally low level, it is
> flexible enough to be used for a wide range of problems and
> programming models.
>
> Given the intent and design of co-arrays, I think that Craig's
> proposed intrinsics are not a good idea. (Sorry, Craig). They are
> really only useful in the context of a particular usage of co-
> arrays, namely this HPF style view of data distribution. That sort
> of thing is a great idea for a separate library, and these
> functions could be pretty easily written using the existing co-
> array capabilities. Things like this should not be enshrined in
> the standard.
>
It WAS fine to say that one can conceptually view co-array
distributions across images in any way one wants. But I have
identified at least one place in the standard, the shift intrinsics
CSHIFT and EOSHIFT, that requires us to define precisely how a
co-array distribution is to be viewed. In most instances an agnostic
view is fine, as co-arrays are "fundamentally low level" (as you say)
and provide for programming at a "level close to what assembly
programming was for sequential languages" [Diaconescu and Zima].
However, the spirit of Fortran is not assembly language, and to imply
that the co-array spec is complete when it breaks existing Fortran
(from the Fortran 90 standard) is just plain wrong in my opinion.
How can we say we have finished integrating co-arrays into the
standard when they don't work properly with existing features?
You mention that this sort of thing is "a great idea for a separate
library, and these functions could be pretty easily written using the
existing co-array capabilities." This is true, but we must keep two
things in mind:
1. CSHIFT and EOSHIFT are already intrinsic functions.
2. For performance reasons, it is critical that these functions be
part of the language so that the compiler can optimize the
operations. For example (see my code example below), the compiler
could inline these functions in a loop body and eliminate a
temporary array copy. The compiler could also use two-phase
communication to overlap communication (prefetching "halo" cells)
with computation (on the interior of the loop). These optimizations
would not be possible with library routines.
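To make the inlining point concrete, here is a minimal serial sketch in plain Fortran (no co-arrays) of the three-cell periodic stencil written two ways. The module and function names (stencil_sketch, stencil_cshift, stencil_fused) are hypothetical and chosen for illustration; the fused loop only shows the kind of code inlining could yield, not what any particular compiler actually generates.

```fortran
! Hypothetical serial sketch (not co-array code): the periodic
! three-cell average written with CSHIFT and as a fused loop.
module stencil_sketch
  implicit none
contains
  ! Array form: each CSHIFT result may be materialized as a temporary.
  function stencil_cshift(t) result(tnew)
    real, intent(in) :: t(:)
    real :: tnew(size(t))
    tnew = (cshift(t, -1) + t + cshift(t, 1)) / 3.
  end function stencil_cshift

  ! Fused form: element i of CSHIFT(t, s) is t(1 + modulo(i + s - 1, n)),
  ! so both shifts become index arithmetic and no temporary is needed.
  function stencil_fused(t) result(tnew)
    real, intent(in) :: t(:)
    real :: tnew(size(t))
    integer :: i, n
    n = size(t)
    do i = 1, n
      tnew(i) = (t(1 + modulo(i - 2, n)) + t(i) + t(1 + modulo(i, n))) / 3.
    end do
  end function stencil_fused
end module stencil_sketch
```

Applied to the same array, the two functions should agree element by element; the fused loop relies only on the standard definition of CSHIFT for a rank-one array of size n.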
So that everyone knows the programming models that Bill and I are
referring to, I've included an example of a routine that updates by
averaging over a three-cell stencil (the local cell plus its two 1D
neighbors). It is fine for a programmer to use either model, but I
claim the data-parallel model provides the following advantages:
1. Less code (a 30% reduction in code size in real code, much more
in my simple example).
2. Less complex and less error-prone (again, as found in converting
real LANL codes). Consider how long it takes you to verify that
Bill's example is correct.
3. The data-parallel code is easier to move to heterogeneous
processing units like GPUs. Microsoft has obtained speed
improvements of up to 17 times by moving code written in a
data-parallel style off to a GPU. LANL's new "advanced architecture"
machine has heterogeneous processing units, and we see heterogeneous
architectures becoming ubiquitous in the future.
Regards,
Craig
---------------------- Data Parallel Code -----------------------
subroutine update_dp(T, Tnew)
  real :: T(:)[*], Tnew(:)[*]
  sync all
  Tnew = (co_cshift(T,-1) + T + co_cshift(T,+1))/3.
  sync all
end subroutine update_dp ! 6 lines
---------------------- standard CAF code (is there an error in the code?) -------------------------
subroutine update_caf(T, Tnew)
  real :: T(:)[*], Tnew(:)[*]
  integer :: i, il, ir, nmax
  nmax = ubound(T,1)                    ! local upper bound, not a co-bound
  if (this_image() == 1) then
    il = num_images()                   ! left neighbor wraps to last image
  else
    il = this_image() - 1
  end if
  if (this_image() == num_images()) then
    ir = 1                              ! right neighbor wraps to image 1
  else
    ir = this_image() + 1
  end if
  sync all                              ! T is complete on every image
  Tnew(1) = (T(nmax)[il] + T(1) + T(2))/3.
  Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir])/3.
  sync all                              ! neighbors are done reading T
  do i = 2, nmax-1
    Tnew(i) = (T(i-1) + T(i) + T(i+1))/3.
  end do
end subroutine update_caf ! 22 lines
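The comparison above depends on giving co_cshift a precise meaning. One reading, which is an assumption here since co_cshift is only a proposal, is that T on images 1 through num_images() forms one global periodic array laid out in image order; that is the neighbor pattern update_caf implements by hand. The serial model below stands in for the co-array with a two-dimensional array whose column k holds T on image k. The names co_cshift_model, co_cshift_global, and update_caf_model are hypothetical, and the code models the semantics only; it is not co-array code.

```fortran
! Hypothetical serial model: column k of tloc plays the role of T(:)
! on image k, so the co-array is read as one global periodic array.
module co_cshift_model
  implicit none
contains
  ! Assumed co_cshift semantics: flatten in image order, apply CSHIFT
  ! to the global array, then split back into per-image blocks.
  function co_cshift_global(tloc, shift) result(r)
    real, intent(in) :: tloc(:, :)
    integer, intent(in) :: shift
    real :: r(size(tloc, 1), size(tloc, 2))
    r = reshape(cshift(reshape(tloc, [size(tloc)]), shift), shape(tloc))
  end function co_cshift_global

  ! Hand-written update in the style of update_caf: end points read one
  ! element from the left (il) or right (ir) image, wrapping at the ends.
  ! Assumes at least two elements per image.
  function update_caf_model(tloc) result(tnew)
    real, intent(in) :: tloc(:, :)
    real :: tnew(size(tloc, 1), size(tloc, 2))
    integer :: i, k, il, ir, nmax, p
    nmax = size(tloc, 1)
    p = size(tloc, 2)
    do k = 1, p
      il = merge(p, k - 1, k == 1)
      ir = merge(1, k + 1, k == p)
      tnew(1, k) = (tloc(nmax, il) + tloc(1, k) + tloc(2, k)) / 3.
      tnew(nmax, k) = (tloc(nmax - 1, k) + tloc(nmax, k) + tloc(1, ir)) / 3.
      do i = 2, nmax - 1
        tnew(i, k) = (tloc(i - 1, k) + tloc(i, k) + tloc(i + 1, k)) / 3.
      end do
    end do
  end function update_caf_model
end module co_cshift_model
```

Under this reading, (co_cshift_global(t,-1) + t + co_cshift_global(t,+1))/3. should match update_caf_model(t), which is the equivalence between the one-line data-parallel update and the hand-written CAF version.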