(j3.2006) BLOCK-type statement

Aleksandar Donev donev1
Fri May 30 13:47:14 EDT 2008


On Friday 30 May 2008 08:52, Bill Long wrote:
> SYNC MEMORY already applies to non-coarrays. You basically ensure all
> the outstanding locally initiated memory operations completed.
Which does not include things external to the processor (such as an MPI thread 
running somewhere).

> (Technically, you can get by with coarrays and any local variable with
> the TARGET attribute, but in practice, this will entail some sort of
> "fence" hardware instruction that will take care of all local memory
> operations as a side effect.)
I don't care about "practice" versus "theory". The theory should cover the 
practice, and also the *future* practice. At present, SYNC MEMORY *only* 
works for TARGETs and coarrays, period. Unless we explicitly say otherwise, 
not in some hand-waving Note, but in clear, well-defined normative text.

> It's hard for asynchronous to apply to user-defined I/O (such as netCDF,
> or MPI IO), except for the simple implementation where asynchronous ==
> volatile.
The essential difference with VOLATILE is that ASYNCHRONOUS does not disable 
optimizations *within* blocks of code in-between I/O statements. This is just 
like optimizations being fully enabled within our coarray segments, with the 
only difference that no memory references to coarrays and TARGETs may be 
moved across segment boundaries. This is why ASYNCHRONOUS is much preferable 
to VOLATILE.
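
A minimal sketch of that point using ordinary Fortran asynchronous I/O. The 
file name, array sizes, and the "work" array are arbitrary choices for 
illustration and are not part of the discussion above. Only the window between 
the WRITE and the WAIT restricts the buffer; code touching other variables in 
that window remains freely optimizable.

```fortran
program async_io_sketch
  implicit none
  real, asynchronous :: buffer(1000)   ! may be touched by pending I/O
  real :: work(1000)                   ! ordinary variable, no restrictions
  integer :: u, id, i

  buffer = 1.0
  open(newunit=u, file='async_sketch.dat', form='unformatted', &
       access='stream', asynchronous='yes', status='replace')

  ! Start the transfer; from here until WAIT, buffer must not be redefined.
  write(u, asynchronous='yes', id=id) buffer

  ! Unrelated computation: the compiler may optimize this loop as usual.
  do i = 1, size(work)
     work(i) = sqrt(real(i))
  end do

  ! The pending transfer is complete after WAIT; buffer is usable again.
  wait(u, id=id)
  buffer = 2.0

  close(u, status='delete')
  print *, 'done'
end program async_io_sketch
```

With VOLATILE on buffer instead, every reference to buffer anywhere in the 
scoping unit would have to go to memory, not just those around the I/O 
statements.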

What is wrong with this?
--------------------
Within a segment, objects with the ASYNCHRONOUS attribute may be modified by 
means external to the processor (like VOLATILEs) in addition to Fortran 
pending I/O. The object may not be referenced or defined during such 
segments.
--------------------

With this model, which IMO is as simple as it gets, the call to MPI_IRecv, which 
initiates the asynchronous transfer, should also include a SYNC MEMORY to 
start a new segment. I know that this is not really necessary since the buffer 
is an argument to it; however, in other cases the async transfer may itself 
be started by a call to a routine that does not take the buffer as an 
argument (a pointer could have been saved earlier). Example:

REAL..., ASYNCHRONOUS :: buffer1, buffer2, ...

CALL PrepareNonBlocking(buffer1, buffer2, ...) ! Build internal pointers etc.
	! This may take some time to initialize, but is done only once
	! No copy in/out will happen if buffers are simply-contiguous
	! and the interface has ASYNCHRONOUS on the dummies
....
buffer2=...
CALL BeginNonBlocking() ! Start async transfer
SYNC MEMORY
.... ! Cannot reference buffers within this segment
.... ! This may span across many procedure calls or even scoping units
CALL WaitNonBlocking()
SYNC MEMORY
WRITE(*,*) buffer1

This works for both MPI and other libraries. We need to think a little about 
whether the SYNC MEMORY should go before or after each of the CALLs (or 
both :-)

> I don't think so. The example below should be just fine with our
> current definitions.
Again, I disagree. I think we want to add something like this:
--------------------
Within a segment, objects with the ASYNCHRONOUS attribute may be modified by 
means external to the processor or other images (like VOLATILEs) in addition 
to Fortran pending I/O. The object may not be referenced or redefined during 
such segments.
--------------------
Note that this covers mixed coarray/MPI programs as well, where there may be 
asynchronous MPI communication going on in addition to image-traffic 
initiated by the processor. This is similar to how SYNC MEMORY is already 
necessary if something like MPI_Barrier is used to synchronize images (see 
NOTE 8.39 on page 191).

Even better, so as not to require changing existing codes (too much) or 
writing wrappers that do nothing more than add a SYNC MEMORY, I propose 
adding the SYNC attribute to procedures, which will cause any CALL to them to 
have the effect of a SYNC MEMORY (executed both at the start and end of the 
execution), i.e., would make such CALLs image-control statements. Example:

INTERFACE
	SYNC SUBROUTINE MPI_Wait() BIND(C,NAME="MPI_Wait")
	END SUBROUTINE
END INTERFACE	

This attribute would not be compatible with PURE, of course.
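
For comparison, the kind of do-nothing wrapper that the SYNC attribute is 
meant to avoid might look roughly like the sketch below. The names 
sync_wrappers, call_with_sync, and library_wait are hypothetical, and 
library_wait is only a stand-in for an external routine such as a 
non-blocking-transfer wait. It assumes a compiler with coarray support 
enabled (with gfortran, typically one of the -fcoarray= options).

```fortran
module sync_wrappers
  implicit none
  abstract interface
     subroutine plain_sub()
     end subroutine plain_sub
  end interface
contains
  ! Hypothetical wrapper: brackets the call with SYNC MEMORY so that the
  ! call begins and ends a segment, as a SYNC procedure would.
  subroutine call_with_sync(proc)
    procedure(plain_sub) :: proc
    sync memory
    call proc()
    sync memory
  end subroutine call_with_sync

  ! Stand-in for an external library routine (assumption, does nothing).
  subroutine library_wait()
  end subroutine library_wait
end module sync_wrappers

program wrapper_sketch
  use sync_wrappers
  implicit none
  call call_with_sync(library_wait)
  print *, 'done'
end program wrapper_sketch
```

Every library entry point that starts or completes a transfer would need 
such a wrapper, which is the overhead the proposed attribute would remove.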

Thoughts?
Aleks



