(j3.2006) Integration of co-arrays with the intrinsic shift functions
Bill Long
longb
Tue Jul 17 18:48:32 EDT 2007
Craig Rasmussen wrote:
>
> It WAS fine to say that one can conceptually view co-array
> distributions across images in any way one wants. But I have
> identified at least one place in the standard that requires us to
> define precisely how one is to view a co-array distribution.
Where is that place? We should fix it.
> In most
> instances an agnostic view is fine, as co-arrays are "fundamentally
> low level"
as are most things in Fortran.
> ( as you say) and provide for programming at a "level
> close to what assembly programming was for sequential
> languages" [Diaconescu and Zima].
I don't think anyone who understood co-arrays would make such a claim.
> However, the spirit of Fortran is
> not assembly language and to imply that the co-array spec is complete
> when it breaks existing Fortran (from the 90 standard) is just plain
> wrong in my opinion.
>
Please explain what aspect of Fortran is broken by co-arrays.
> How can we say we have finished integrating a new type (co-arrays)
> into the standard, when it won't work properly with current features?
>
I still don't see which feature of f03 that previously worked is now broken.
> You mention that this sort of thing is "a great idea for a separate
> library, and these functions could be pretty easily written using the
> existing co-array capabilities." This is true, but we must keep two
> things in mind:
> 1. CSHIFT and EOSHIFT are already intrinsic functions.
>
There are about 100 pages of intrinsics. None are broken by co-arrays
that I can see.
> 2. For performance reasons, it is critical that these functions
> are in the language in order for the compiler to optimize the
> operations. For example (see my code example below), the compiler
> could inline these functions in a loop body and get rid of a
> temporary array copy. The compiler could also use two-phase
> communication to interleave communication (prefetch "halo" cells)
> with computation (compute on interior of loop). These optimizations
> would not be possible with libraries.
>
Really? The compiler can inline library routines if they are available,
preferably in a module. In that case, the optimizations you are
suggesting do not look all that difficult.
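As an illustration of the library approach, here is a minimal sketch of a shift function kept in a module, written only with existing co-array features. The names co_shift_sketch and co_cshift1 are hypothetical. The restriction to shifts of -1 and +1 and the assumption that every image holds a block of the same size are simplifications made for this sketch and are not part of the proposal under discussion.

```fortran
module co_shift_sketch
  implicit none
contains
  ! Hypothetical helper: circular shift by -1 or +1 of the "global" array
  ! formed by concatenating T over images 1..num_images().
  ! Follows the CSHIFT convention: result(i) = global(i + shift).
  ! The caller is responsible for a SYNC ALL before the call, so that the
  ! neighbouring images' copies of T are up to date.
  function co_cshift1(T, shift) result(r)
    real,    intent(in) :: T(:)[*]
    integer, intent(in) :: shift      ! only -1 and +1 are handled here
    real :: r(size(T))
    integer :: n, me, np, il, ir

    n  = size(T)
    me = this_image()
    np = num_images()
    il = merge(np, me-1, me == 1)     ! left neighbour, wrapping around
    ir = merge(1, me+1, me == np)     ! right neighbour, wrapping around

    if (shift > 0) then
      r(1:n-1) = T(2:n)               ! local part
      r(n)     = T(1)[ir]             ! one remote element from the right
    else
      r(2:n) = T(1:n-1)               ! local part
      r(1)   = T(n)[il]               ! one remote element from the left
    end if
  end function co_cshift1
end module co_shift_sketch
```

With the module in scope, a three-point stencil update could be written as `Tnew = (co_cshift1(T,-1) + T + co_cshift1(T,+1))/3.` between two `sync all` statements. Because the function body is visible to the compiler through the module, it is a candidate for inlining.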
> So that everyone knows the programming models that Bill and I are
> referring to, I've included an example of a routine that updates by
> averaging over a 3 cell stencil (local cell plus 2 1D neighbors). It
> is fine for a programmer to use either model, but I claim the data-
> parallel model provides the following advantages:
> 1. Less code (30% reduction in code size in real code, much more
> in my simple example).
> 2. Less complex and error prone (again as found in converting
> real LANL codes). Consider how long it takes you to verify that
> Bill's example is correct.
> 3. The data-parallel code is easier to move to heterogeneous
> processing units like GPUs.
For the (small number of) examples where data parallel model programming
actually is relevant to the problem, this may be true. In the real
world, this is rarely easy.
> Microsoft has obtained speed
> improvements of up to 17 times by moving code (written in data-
> parallel) off to a GPU.
Which data parallel programming language were they using? Was this a
real application, or a benchmarking stunt?
> LANL's new "advanced architecture" machine
> has heterogeneous processing units, and we see heterogeneous
> architectures as ubiquitous in the future.
>
Quite possible. Programming (in any model) will be a lot easier if each
image has the same set of heterogeneous computation elements. If the
complexity is confined to an image, then cross-image operations can
remain simple.
> Regards,
> Craig
>
> ---------------------- Data Parallel Code -----------------------
>
> subroutine update_dp(T, Tnew)
> real :: T(:)[*], Tnew(:)[*]
>
> sync all
> Tnew = (co_cshift(T,-1) + T + co_cshift(T,+1))/3.
> sync all
>
> end subroutine update_dp ! 6 lines
>
> ---------------------- standard CAF code (is there an error in code?) ----------------------
>
> subroutine update_caf(T, Tnew)
> real :: T(:)[*], Tnew(:)[*]
> integer :: i, il, ir, nmax
>
I tend to work with values named mype and npes, predefined in a module,
for this_image() and num_images(), so the next few lines would look like
nmax = ubound(T, 1)
il = merge(npes, mype-1, mype==1)
ir = merge(1, mype+1, mype==npes)
> nmax = co_ubound(T,1)
>
> if (this_image() == 1) then
> il = T(nmax)[num_images()]
> else
> il = this_image() - 1
> end if
>
> if (this_image == num_images()) then
> ir = T(1)[1]
> else
> ir = this_image() + 1
> end if
>
I would probably have the syncs external, but if kept internal, they
should surround all the work below.
> sync all
> Tnew(1) = (T(nmax)[il] + T(1) + T(2))/3.
> Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir])/3.
> sync all
>
Using a loop here, and then array assignment in the other example is
cheating!
sync all
Tnew(1) = (T(nmax)[il] + T(1) + T(2) )/3.
Tnew(2:nmax-1) = (T(1:nmax-2) + T(2:nmax-1) + T(3:nmax))/3.
Tnew(nmax) = (T(nmax-1) + T(nmax) + T(1)[ir] )/3.
sync all
> do i = 2, nmax-1
> Tnew(i) = (T(i-1) + T(i) + T(i+1))/3.
> end do
>
> end subroutine update ! 22 lines
>
OK, 12 lines. If this were not in a module where mype and npes were
defined, then maybe 13. But all of this would be pretty easy to inline
and optimize by the compiler, especially if the sync statements were
removed.
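For reference, the reply fragments above assemble into roughly the following routine. This is a sketch, not a tested implementation: the module name stencil_sketch is hypothetical, mype and npes are set locally here rather than taken from a shared module, and the code assumes every image holds at least two elements of T.

```fortran
module stencil_sketch
  implicit none
contains
  subroutine update_caf(T, Tnew)
    real :: T(:)[*], Tnew(:)[*]
    integer :: il, ir, nmax, mype, npes

    ! Assumption for this sketch: set locally; the reply takes these
    ! two values from a module instead.
    mype = this_image()
    npes = num_images()

    nmax = ubound(T, 1)
    il = merge(npes, mype-1, mype == 1)    ! left neighbour image, wrapping
    ir = merge(1, mype+1, mype == npes)    ! right neighbour image, wrapping

    sync all                               ! neighbours' T must be current
    Tnew(1)        = (T(nmax)[il]  + T(1)        + T(2)     )/3.
    Tnew(2:nmax-1) = (T(1:nmax-2)  + T(2:nmax-1) + T(3:nmax))/3.
    Tnew(nmax)     = (T(nmax-1)    + T(nmax)     + T(1)[ir] )/3.
    sync all                               ! all remote reads of T are done
  end subroutine update_caf
end module stencil_sketch
```

Only the first and last elements touch another image. The interior update is a purely local array assignment.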
Cheers,
Bill
--
Bill Long longb at cray.com
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120