(j3.2006) Integration of co-arrays with the intrinsic shift functions
Jim Xia
jimxia
Mon Jul 16 15:42:49 EDT 2007
This piece of code seems odd.
subroutine update_caf(T, Tnew)
real :: T(:)[*], Tnew(:)[*]
integer :: i, il, ir, nmax
nmax = co_ubound(T,1) !<-- shouldn't nmax be ubound(T,1)?
if (this_image() == 1) then
il = T(nmax)[num_images()] !<-- shouldn't il = num_images()?
else
il = this_image() - 1
end if
if (this_image == num_images()) then
ir = T(1)[1] !<-- ir should be 1
else
ir = this_image() + 1
end if
sync all
Tnew(1) = (T(nmax)[il] + T(1) + T(2))/3.
Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir])/3.
sync all
do i = 2, nmax-1
Tnew(i) = (T(i-1) + T(i) + T(i+1))/3.
end do
end subroutine update ! 22 lines
Although this code seems to support Craig's point, I still agree with Bill
that the language should be neutral with regard to the memory model
underneath the co-array. It should not explicitly adopt one model over
another. Allowing intrinsics like co_cshift may suggest to some readers that
the language implicitly states which memory model it supports.
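For reference, here is a sketch of how the routine might look with the three corrections suggested above applied: the local extent taken from ubound, and il and ir set to image indices rather than to array values. It also adds the parentheses missing from the second this_image test. The sync all placement is kept as in the original. This is only my reading of the intent, written in Fortran 2008 co-array syntax; the module and program names are made up for illustration, and it assumes each image holds at least two elements.

```fortran
module stencil_caf_sketch            ! hypothetical module name
  implicit none
contains
  subroutine update_caf(T, Tnew)
    real, intent(in)    :: T(:)[*]
    real, intent(inout) :: Tnew(:)[*]
    integer :: i, il, ir, nmax

    nmax = ubound(T, 1)              ! local upper bound, not a co-bound

    ! Left neighbour image, wrapping from image 1 to the last image.
    if (this_image() == 1) then
      il = num_images()
    else
      il = this_image() - 1
    end if

    ! Right neighbour image, wrapping from the last image to image 1.
    if (this_image() == num_images()) then
      ir = 1
    else
      ir = this_image() + 1
    end if

    sync all
    ! Boundary cells need one value from a neighbouring image.
    Tnew(1)    = (T(nmax)[il] + T(1) + T(2)) / 3.
    Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir]) / 3.
    sync all

    ! Interior cells use local data only.
    do i = 2, nmax - 1
      Tnew(i) = (T(i-1) + T(i) + T(i+1)) / 3.
    end do
  end subroutine update_caf
end module stencil_caf_sketch

program demo_update_caf              ! hypothetical driver
  use stencil_caf_sketch
  implicit none
  real :: a(4)[*], b(4)[*]

  a = real(this_image())             ! each image holds its own index
  b = 0.
  call update_caf(a, b)
  print *, this_image(), b
end program demo_update_caf
```

With gfortran this should build with -fcoarray=single for a one-image check, in which case both neighbours are image 1 itself.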
Cheers,
Jim Xia
XL Fortran Compiler Testing
IBM Toronto Lab at 8200 Warden Ave.
Phone (905) 413-3444 Tie-line 969-3444
D2/NAH/8200 /MKM
Craig Rasmussen <crasmussen at lanl.gov>
Sent by: j3-bounces at j3-fortran.org
07/16/2007 02:06 PM
Please respond to: fortran standards email list for J3 <j3 at j3-fortran.org>
To: fortran standards email list for J3 <j3 at j3-fortran.org>
Subject: Re: (j3.2006) Integration of co-arrays with the intrinsic shift functions
On Jul 13, 2007, at 8:59 AM, Bill Long wrote:
>>
> Co-arrays basically provide two facilities: a simple and efficient
> way to access data on a different image, and ways to enforce
> execution order between images. That's about it. There is no
> prescription about what the data objects on different images mean
> or are part of. If Craig wants to partition a conceptually global
> array across images ala a data parallel programming model, that's
> fine. If Aleks wants to think of the co-dimensions as additional
> planes in a higher dimension array, that's also fine. The
> important point is that co-arrays prescribes neither view, it just
> provides a means to implement either. I've written code using
> Craig's model, and it worked quite well for that problem. Most of
> the time I employ a third approach. This is the underlying power
> of co-arrays. Because it is fundamentally low level, it is
> flexible enough to be used for a wide range of problems and
> programming models.
>
> Given the intent and design of co-arrays, I think that Craig's
> proposed intrinsics are not a good idea. (Sorry, Craig). They are
> really only useful in the context of a particular usage of co-
> arrays, namely this HPF style view of data distribution. That sort
> of thing is a great idea for a separate library, and these
> functions could be pretty easily written using the existing co-
> array capabilities. Things like this should not be enshrined in
> the standard.
>
It WAS fine to say that one can conceptually view co-array
distributions across images in any way one wants. But I have
identified at least one place in the standard that requires us to
define precisely how one is to view a co-array distribution. In most
instances an agnostic view is fine, as co-arrays are "fundamentally
low level" (as you say) and provide for programming at a "level
close to what assembly programming was for sequential
languages" [Diaconescu and Zima]. However, the spirit of Fortran is
not assembly language, and to imply that the co-array spec is complete
when it breaks existing Fortran (from the 90 standard) is just plain
wrong in my opinion.
How can we say we have finished integrating a new type (co-arrays)
into the standard, when it won't work properly with current features?
You mention that this sort of thing is "a great idea for a separate
library, and these functions could be pretty easily written using the
existing co-array capabilities." This is true, but we must keep two
things in mind:
1. CSHIFT and EOSHIFT are already intrinsic functions.
2. For performance reasons, it is critical that these functions
are in the language in order for the compiler to optimize the
operations. For example (see my code example below), the compiler
could inline these functions in a loop body and get rid of a
temporary array copy. The compiler could also use two-phase
communication to interleave communication (prefetch "halo" cells)
with computation (compute on interior of loop). These optimizations
would not be possible with libraries.
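To make the first point concrete, the existing Fortran 90 CSHIFT intrinsic already expresses the 3-cell circular stencil for an ordinary (non-co-array) array. The sketch below is a serial illustration only: it uses the standard CSHIFT, not the proposed co_cshift, and the program name, array size, and test data are arbitrary choices for the example. It checks the one-line array form against an explicit loop with wrap-around indices.

```fortran
program stencil_cshift_sketch        ! hypothetical program name
  implicit none
  integer, parameter :: n = 8        ! arbitrary size for the example
  real :: T(n), Tnew(n), Tref(n)
  integer :: i, il, ir

  T = [(real(i*i), i = 1, n)]        ! arbitrary test data

  ! Data-parallel form: cshift(T,-1) supplies T(i-1) and cshift(T,+1)
  ! supplies T(i+1), both with circular wrap-around at the ends.
  Tnew = (cshift(T, -1) + T + cshift(T, +1)) / 3.

  ! Reference: the same average written as an explicit loop.
  do i = 1, n
    il = merge(n, i - 1, i == 1)     ! left neighbour, wrapping to n
    ir = merge(1, i + 1, i == n)     ! right neighbour, wrapping to 1
    Tref(i) = (T(il) + T(i) + T(ir)) / 3.
  end do

  if (any(abs(Tnew - Tref) > 1.0e-5)) error stop "cshift and loop disagree"
  print *, "cshift and loop agree"
end program stencil_cshift_sketch
```

The open question in this thread is what the analogous expression should mean when T is a co-array, where the wrap-around element lives on another image.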
So that everyone knows the programming models that Bill and I are
referring to, I've included an example of a routine that updates by
averaging over a 3-cell stencil (local cell plus two 1D neighbors). It
is fine for a programmer to use either model, but I claim the
data-parallel model provides the following advantages:
1. Less code (30% reduction in code size in real code, much more
in my simple example).
2. Less complex and error prone (again as found in converting
real LANL codes). Consider how long it takes you to verify that
Bill's example is correct.
3. The data-parallel code is easier to move to heterogeneous
processing units like GPUs. Microsoft has obtained speed
improvements of up to 17 times by moving code (written in
data-parallel style) off to a GPU. LANL's new "advanced architecture"
machine has heterogeneous processing units, and we see heterogeneous
architectures as ubiquitous in the future.
Regards,
Craig
---------------------- Data Parallel Code -----------------------
subroutine update_dp(T, Tnew)
real :: T(:)[*], Tnew(:)[*]
sync all
Tnew = (co_cshift(T,-1) + T + co_cshift(T,+1))/3.
sync all
end subroutine update_dp ! 6 lines
---------------------- standard CAF code (is there an error in code?) -------------------------
subroutine update_caf(T, Tnew)
real :: T(:)[*], Tnew(:)[*]
integer :: i, il, ir, nmax
nmax = co_ubound(T,1)
if (this_image() == 1) then
il = T(nmax)[num_images()]
else
il = this_image() - 1
end if
if (this_image == num_images()) then
ir = T(1)[1]
else
ir = this_image() + 1
end if
sync all
Tnew(1) = (T(nmax)[il] + T(1) + T(2))/3.
Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir])/3.
sync all
do i = 2, nmax-1
Tnew(i) = (T(i-1) + T(i) + T(i+1))/3.
end do
end subroutine update ! 22 lines