(j3.2006) BLOCK-type statement
Aleksandar Donev
donev1
Fri May 30 13:47:14 EDT 2008
On Friday 30 May 2008 08:52, Bill Long wrote:
> SYNC MEMORY already applies to non-coarrays. You basically ensure all
> the outstanding locally initiated memory operations completed.
Which does not include things external to the processor (such as an MPI thread
running somewhere).
> (Technically, you can get by with coarrays and any local variable with
> the TARGET attribute, but in practice, this will entail some sort of
> "fence" hardware instruction that will take care of all local memory
> operations as a side effect.)
I don't care about "practice" versus "theory". The theory should cover the
practice, and also the *future* practice. At present, SYNC MEMORY *only*
works for TARGETs and coarrays, period, unless we explicitly say otherwise:
not in some hand-waving Note, but in clear, well-defined normative text.
> It's hard for asynchronous to apply to user-defined I/O (such as netCDF,
> or MPI IO), except for the simple implementation where asynchronous ==
> volatile.
The essential difference with VOLATILE is that ASYNCHRONOUS does not disable
optimizations *within* blocks of code in-between I/O statements. This is just
like optimizations being fully enabled within our coarray segments, with the
only difference that no memory references to coarrays and TARGETs may be
moved across segment boundaries. This is why ASYNCHRONOUS is much preferable
to VOLATILE.
What is wrong with this?
--------------------
Within a segment, objects with the ASYNCHRONOUS attribute may be modified by
means external to the processor (like VOLATILEs) in addition to Fortran
pending I/O. The object may not be referenced or defined during such
segments.
--------------------
With this model, which IMO is as simple as it gets, the call to MPI_IRecv, which
initiates the asynchronous transfer, should also include a SYNC MEMORY to
start a new segment. I know that this is not really necessary since the buffer
is an argument to it; however, in other cases the async transfer may itself
be started by a call to a routine that does not take the buffer as an
argument (a pointer could have been saved earlier). Example:
REAL..., ASYNCHRONOUS :: buffer1, buffer2, ...
CALL PrepareNonBlocking(buffer1, buffer2, ...) ! Build internal pointers etc.
! This may take some time to initialize, but is done only once
! No copy in/out will happen if buffers are simply-contiguous
! and the interface has ASYNCHRONOUS on the dummies
....
buffer2=...
CALL BeginNonBlocking() ! Start async transfer
SYNC MEMORY
.... ! Cannot reference buffers within this segment
.... ! This may span across many procedure calls or even scoping units
CALL WaitNonBlocking()
SYNC MEMORY
WRITE(*,*) buffer1
This works for both MPI and other libraries. We need to think a little about
whether the SYNC MEMORY should go before or after each of the CALLs (or
both :-)
> I don't think so. The example below should be just fine with our
> current definitions.
Again, I disagree. I think we want to add something like this:
--------------------
Within a segment, objects with the ASYNCHRONOUS attribute may be modified by
means external to the processor or other images (like VOLATILEs) in addition
to Fortran pending I/O. The object may not be referenced or redefined during
such segments.
--------------------
Note that this covers mixed coarray/MPI programs as well, where there may be
asynchronous MPI communication going on in addition to image-traffic
initiated by the processor. This is similar to how SYNC MEMORY is already
necessary if something like MPI_Barrier is used to synchronize images (see
NOTE 8.39 on page 191).
Even better, so as to not require changing existing codes (too much) or
writing wrappers that do nothing more than add a SYNC MEMORY, I propose
adding a SYNC attribute to procedures, which would cause any CALL to them to
have the effect of a SYNC MEMORY (executed both at the start and at the end
of the call), i.e., would make such CALLs image-control statements. Example:
INTERFACE
SYNC SUBROUTINE MPI_Wait() BIND(C,NAME="MPI_Wait")
END SUBROUTINE
END INTERFACE
This attribute would not be compatible with PURE, of course.
Thoughts?
Aleks