(j3.2006) an alternative to Aleks' asynchronous proposal

Dick Hendrickson dick.hendrickson
Thu Jun 19 17:37:45 EDT 2008

I have a proposal that I believe will solve the
problem of optimizers moving code across MPI_wait
calls.  I'm offering it as an alternative to
Aleks' paper.  It's an expansion of his idea of
a sync procedure attribute.  I think it's more
than syntactic sugar.

Unfortunately, I won't be at the August meeting.
So, I will rough out the proposal.  If anyone is
interested, they can turn it into a full-fledged
paper with page numbers, edits, etc.  If no one
is interested enough to do that, then it's unlikely
the proposal would pass at the August meeting even
if I did it.

Because this is a new idea, it might be too
late to get in anyhow.

However, Aleks seems to be trying to shoehorn
MPI into some sort of asynchronous I/O and
coarray syncmemory model.  I believe that is
a mistake.  Processors are free (even encouraged?)
to treat the ASYNCHRONOUS I/O stuff essentially
as no-ops.  Trying to use an "optional" feature
to enforce good compiler behavior isn't likely
to be portable, nor to be easily understood.
My proposal is a simple syntax thing to add, with a few
constraints.  The meat of the proposal is a few
notes to implementers telling them to do the
right thing and a couple of notes to users
telling them not to do the wrong thing.  Neither
requires much care or time to develop.

I'm proposing a new attribute for subroutines.
I can't think of a good name, so I'll use PERSISTENT.
The basic idea is that a persistent subroutine
maintains access to its actual arguments, and
to all of the actual arguments of any other
persistent subroutines, after it has returned.
This allows things like
        call MPI_do_something(buffer,size,status,id)
        call MPI_wait(id)
to do the right thing.  It's the user's responsibility
to not misuse the variables between the calls;
these are essentially the same restrictions as for asynchronous
I/O.  But, the new thing is that it is the compiler's
responsibility to not move code around the calls nor
to use copy-in/copy-out as an argument passing mechanism
for PERSISTENT routines.
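
To make the hazard concrete, here is a minimal sketch using the
real nonblocking MPI calls (MPI_Irecv, MPI_Isend, MPI_Wait) rather
than the placeholder names above.  The program name, buffer names,
and the self-exchange are my own assumptions, chosen only so the
sketch runs on a single process.  The point is that recvbuf is not
an argument of MPI_Wait, so nothing in the source tells the
compiler that the call can change it.

```fortran
program wait_hazard
  use mpi
  implicit none
  integer, parameter :: n = 4
  real    :: sendbuf(n), recvbuf(n)
  integer :: rank, ierr, sreq, rreq
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  sendbuf = real(rank + 1)
  recvbuf = 0.0

  ! Each rank exchanges one message with itself (an assumption made
  ! here so that the sketch also runs on a single process).
  call MPI_Irecv(recvbuf, n, MPI_REAL, rank, 0, MPI_COMM_WORLD, rreq, ierr)
  call MPI_Isend(sendbuf, n, MPI_REAL, rank, 0, MPI_COMM_WORLD, sreq, ierr)

  ! recvbuf does not appear in these calls.  An optimizer may therefore
  ! believe it can load recvbuf(1) before the waits, or keep the 0.0 it
  ! already has in a register.
  call MPI_Wait(rreq, status, ierr)
  call MPI_Wait(sreq, status, ierr)

  ! The user expects the received value here.
  print *, 'rank', rank, 'received', recvbuf(1)

  call MPI_Finalize(ierr)
end program wait_hazard
```

Whether a given compiler actually moves the load is implementation
dependent; the sketch only shows where the standard leaves it free
to do so.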

As I see it, the PERSISTENT attribute can be declared
in a module and the user will not have to do any code
modifications to his existing calls.  Indeed, users
need not even be aware of the attribute nor what it
does.  (They'll have to correctly use their variables,
but that's been a requirement since 1966).

Here's the outline of the proposal.  I've been a
little heavy-handed on the constraints.  It's
easier to relax an unneeded constraint later on.

Chapter 2.1, add an entry
Persistent subroutine.  A subroutine which maintains
association with its arguments after it has returned.
This allows it to perform actions similar to asynchronous
I/O or to treat its arguments as if they were volatile.

Add a new statement in chapter 5, but not a declaration
        PERSISTENT subroutine-name-list
This allows an implicit interface and effectively
disallows call by dope vector.  (We could do without
this, but a nice side effect is that it allows users
to do timing and debugging by declaring some routine
to be persistent.)

Add PERSISTENT to the prefix list in R1226
with the constraints:

PERSISTENT can only be used with subroutines.  (Let's not
try to figure out whether or not functions get executed
and in which order now).

If PERSISTENT appears, the subroutine requires an
explicit interface and all array arguments must be
explicit shape (otherwise the compiler might be
tempted to try pass by dope vector or by
copy in/copy out.  We don't want to allow dope
vectors because they are usually built on the
stack and might go away while the persistent
subroutine is persisting.)

If PERSISTENT appears, none of the other prefixes
can appear.

If one entity in a generic interface is PERSISTENT,
they all must be.

A PERSISTENT subroutine can't be a finalizer nor
can it be type bound.

Add a note:
NOTE.  The previous constraints are designed to
require a compiler to use what is commonly called
pass-by-address as the calling scheme.  Dope
vectors or copy in/copy out are not allowed.
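
As a rough illustration of how a library vendor might package this,
here is a sketch of an interface module.  The module and routine
names (transfer_interfaces, start_transfer, wait_transfer) are
hypothetical.  PERSISTENT is only proposed syntax and is not
standard Fortran, so it appears in comments; the code as written
is ordinary Fortran that a compiler should accept today.

```fortran
module transfer_interfaces
  implicit none
  interface
     ! Under the proposal this line would read
     !    persistent subroutine start_transfer(buffer, n, id)
     subroutine start_transfer(buffer, n, id)
       integer, intent(in)  :: n
       real                 :: buffer(n)  ! explicit shape, per the constraints
       integer, intent(out) :: id         ! handle for the later wait
     end subroutine start_transfer

     ! Under the proposal this line would read
     !    persistent subroutine wait_transfer(id)
     subroutine wait_transfer(id)
       integer, intent(in) :: id
     end subroutine wait_transfer
  end interface
end module transfer_interfaces
```

A user would simply USE the module and call the routines as before;
only the interface author would ever write the new prefix.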

Add a section similar to 12.7
12.8 PERSISTENT subroutines
A persistent subroutine is a subroutine which
maintains association with its actual arguments,
and possibly the actual arguments of other
persistent subroutines, after it has returned.
The method of maintaining the association is
not specified by this standard.  If an argument
is a pointer, the persistence applies to both the
pointer and its target.  The persistent subroutine
is allowed to access, define, reference, undefine,
or do anything else to arguments that it maintains
(or acquires from another persistent subroutine)
an association to.

The mechanism is designed to allow calling routines
that work asynchronously with the program.   From the
user's point of view, it is similar to asynchronous
I/O.  The expected use is that a worker subroutine
will be called to perform some task, generally on
another processor or in another thread, and sometime
later a wait or status checking routine will be called.
It is entirely the programmer's responsibility to
correctly use variables passed as arguments to
persistent routines.  Generally, if a variable is
used to "initialize" the persistent routine it can
be referenced or defined after the call.  If a
variable is an "input" buffer, it can be referenced
but not defined.  If a variable is an "output" buffer,
it can be neither referenced nor defined.  These user
requirements are similar to the ones for asynchronous
I/O in 9.6.4.  How the user determines when a persistent
routine is done with its arguments is not specified
by this standard.

None of the arguments to a persistent subroutine shall
be an array section.  All whole array arguments must either
be explicit shape arrays or have the CONTIGUOUS attribute.
NOTE: this is intended to prevent either pass by dope
vector or pass by copy in/copy out.
The compiler must use pass by address to allow the
persistence to be maintained.
NOTE: It's OK to pass a pointer to a discontiguous array
since the pointer itself will be passed.

Although the standard does not specifically define
"optimization", it is widely practiced by processors.
A call to a persistent subroutine prevents processors
from applying some common optimizations across the call.
In particular, the compiler is not allowed to do things
such as code motion, common subexpression elimination, or
maintaining values in registers, etc. across a call.
It is as if every accessible variable appeared in a
call statement to a separately compiled external
subroutine with implicit interface and no intents
known.  The compiler must store all variables to
memory before the call and reload any needed ones
after the call.  From the compiler's point of view,
it is as if every accessible variable were declared
"locally volatile" at the call to a persistent routine.
The restriction is to all accessible variables because,
just as for asynchronous I/O, there can be multiple
persistent subroutines in "execution", some calls
might have occurred in different scoping units, and
calling one subroutine could interact with another,
possibly triggering an implicit synchronization.
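
The "as if" rule above can be imitated by hand today, one variable
at a time.  The following is a minimal sketch, not part of the
proposal: the program name and the routine name touch are
hypothetical, and in real use touch would have to live in a
separately compiled file so the compiler cannot see that it is
empty.  Passing buffer to an external routine with implicit
interface means the compiler has to assume the routine may read or
define it, so it stores buffer before the call and reloads it after.

```fortran
program opaque_call_sketch
  implicit none
  external :: touch        ! implicit interface, no intents known
  real :: buffer(4)

  buffer = 1.0

  ! ... imagine a persistent routine was started on buffer earlier
  !     and its wait routine has just returned ...

  ! The compiler must assume touch may change buffer, so the value
  ! cannot be kept in a register across this call.
  call touch(buffer)

  print *, buffer(1)
end program opaque_call_sketch

! Deliberately empty.  It is kept in this file only so the sketch is
! self-contained; compile it separately in real use.
subroutine touch(x)
  real :: x(*)
end subroutine touch
```

A call to a PERSISTENT routine would have this effect for every
accessible variable at once, without the user writing any extra
calls.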
