(j3.2006) (SC22WG5.3867) Fwd: BOUNCE sc22wg5 at open-std.org: Non-member submission from [Jeff Squyres <jsquyres at cisco.com>]

Van Snyder
Wed Jan 21 14:33:53 EST 2009


The complication that Jeff explains below duplicates a small part of the
functionality of Fortran I/O statements.  Let's not duplicate that
functionality as well.  MPI is I/O; we should get at it with I/O, not with
additional complications.

> >>   1) Most people seem to agree that the semantics of the buffers used
> >> for MPI non-blocking transfers and pending input/output storage
> >> affectors are essentially identical, with READ, WRITE and WAIT
> >> corresponding to MPI_Isend, MPI_IRecv and MPI_Wait (and variations).
> >>
> >> Do you agree with this and, if not, why not?
> >
> > I'm an MPI implementor; I don't know enough about Fortran to answer
> > your questions definitively, but I can state what the MPI non-blocking
> > send/receive buffer semantics are.
> >
> > There are several different flavors of non-blocking sends/receives in
> > MPI; I'll use MPI_ISEND and MPI_IRECV as token examples ("I" =
> > "immediate", meaning that the function returns "immediately",
> > potentially before the message has actually been sent or received).
> >
> > 1. When an application invokes MPI_ISEND / MPI_IRECV, it essentially
> > hands off the buffer to the MPI implementation and promises not to
> > write to the buffer until later.  The MPI implementation then "owns"
> > the buffer.
> >
> > 2. A rule is just about to be passed in MPI-2.2 such that *sends*
> > (e.g., MPI_ISEND) can still *read* the buffer while the send is
> > ongoing (writing to the buffer while the send is ongoing is nonsense,
> > of course).
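> >
> > For illustration, a minimal sketch of this hand-off in C (the buffer
> > size, tag and peer rank are arbitrary; the calls themselves are
> > standard MPI):
> >
> >      int buf[100];
> >      MPI_Request req;
> >      /* Hand buf to MPI; the application must not write to it yet */
> >      MPI_Irecv(buf, 100, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
> >      /* ...unrelated work that does not touch buf... */
> >      MPI_Wait(&req, MPI_STATUS_IGNORE);
> >      /* MPI_Wait returned: the application owns buf again */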
> >
> > 3. The buffer is specified by a triple of arguments (I'll explain in
> > terms of C because of my inexperience with Fortran):
> >
> >    - void *buffer: a pointer representing the base of the buffer
> > (NOTE: it may not actually point to the first byte of the message!)
> >    - int count: the number of instances of the datatype in the message
> > (see the next argument)
> >    - MPI_Datatype type: the datatype of the message, implying both the
> > size and the interpretation of the bytes
> >
> > MPI has a number of intrinsic datatypes (such as MPI_INTEGER,
> > representing a single Fortran INTEGER).  The intrinsic MPI datatypes
> > can be combined in several ways to represent complex data structures.
> > Hence, it is possible to build up a user-defined MPI_Datatype that
> > represents a C struct -- even if the struct has memory "holes" in it.
> > As such, MPI_Datatypes can be considered a memory map of (relative
> > offset, type) tuples, where the "relative offset" part is relative to
> > the (buffer) argument in MPI_ISEND/MPI_IRECV/etc.  MPI_INTEGER could
> > therefore be considered a single (0, N-byte integer) tuple (where N is
> > whatever is correct for your platform).
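> >
> > As a rough sketch (not the only way to do it), such a datatype for the
> > struct used in the C examples below might be built with MPI-2 calls
> > like this, computing each member's offset relative to the start of the
> > struct:
> >
> >      struct foo { int a; double b; char c; } foo_instance;
> >      MPI_Datatype foo_datatype;
> >      MPI_Datatype types[3] = { MPI_INT, MPI_DOUBLE, MPI_CHAR };
> >      int blocklens[3] = { 1, 1, 1 };
> >      MPI_Aint disps[3], base;
> >      int i;
> >
> >      MPI_Get_address(&foo_instance, &base);
> >      MPI_Get_address(&foo_instance.a, &disps[0]);
> >      MPI_Get_address(&foo_instance.b, &disps[1]);
> >      MPI_Get_address(&foo_instance.c, &disps[2]);
> >      for (i = 0; i < 3; ++i)
> >          disps[i] -= base;   /* the (relative offset, type) tuples */
> >
> >      MPI_Type_create_struct(3, blocklens, disps, types, &foo_datatype);
> >      MPI_Type_commit(&foo_datatype);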
> >
> > A special buffer, denoted by MPI_BOTTOM, is an arbitrarily-fixed place
> > in memory (usually 0, but it doesn't have to be).  Since MPI_Datatypes
> > are composed of relative offsets, applications can build datatypes
> > relative to MPI_BOTTOM for [effectively] direct placement into memory.
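> >
> > For instance (again only a sketch, assuming MPI-2 calls), the
> > my_datatype used with MPI_BOTTOM in the C example below could record
> > the absolute address of the data as its displacement:
> >
> >      int i[9999];
> >      MPI_Datatype my_datatype;
> >      MPI_Aint addr;
> >      int blocklen = 1;
> >
> >      MPI_Get_address(&i[0], &addr);
> >      MPI_Type_create_hindexed(1, &blocklen, &addr, MPI_INT, &my_datatype);
> >      MPI_Type_commit(&my_datatype);
> >      /* later used as MPI_Isend(MPI_BOTTOM, 1, my_datatype, ...) */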
> >
> > Some Fortran examples:
> >
> >      INTEGER i
> >      CALL MPI_ISEND(i, 1, MPI_INTEGER, ...)
> >    Sends a single INTEGER starting at the buffer pointed to by i
> >
> >      INTEGER iarray(10)
> >      CALL MPI_ISEND(iarray, 10, MPI_INTEGER, ...)
> >    Sends 10 INTEGERs starting at the buffer pointed to by iarray
> >
> >      INTEGER iarray(9999)
> >      CALL MPI_ISEND(iarray, 10, MPI_INTEGER, ...)
> >    Same as above -- sends the first 10 INTEGERs starting at the buffer
> > pointed to by iarray
> >
> >      INTEGER iarray(9999)
> >      CALL MPI_ISEND(iarray(37), 10, MPI_INTEGER, ...)
> >    Sends iarray(37) through iarray(46)
> >
> >      INTEGER iarray(9999)
> >     C ..build up a datatype relative to MPI_BOTTOM that points to iarray..
> >      CALL MPI_ISEND(MPI_BOTTOM, 10, my_datatype, ...)
> >    Sends the first 10 elements of iarray
> >
> > Some C examples:
> >
> >      int i;
> >      MPI_Isend(&i, 1, MPI_INT, ...);
> >    Sends 1 int starting at the buffer pointed to by &i
> >
> >      int i[9999];
> >      MPI_Isend(&i[37], 10, MPI_INT, ...);
> >    Sends i[37] through i[46]
> >
> >      int i[9999];
> >      /* ..build up MPI_Datatype relative to MPI_BOTTOM that points to &i[0].. */
> >      MPI_Isend(MPI_BOTTOM, 1, my_datatype, ...);
> >    Sends i[0]
> >
> >      struct foo { int a; double b; char c; } foo_instance;
> >      /* ..build up MPI_Datatype to represent struct foo.. */
> >      MPI_Isend(&foo_instance, 1, foo_datatype, ...);
> >    Sends the foo struct (likely only transmitting the data, not the
> > "holes")
> >
> > 4. MPI_ISEND and MPI_IRECV also return a request handle that can
> > be passed to MPI later to check and see if the communication
> > associated with that handle has completed.  There are essentially two
> > flavors of the check-for-completion semantic: polling and blocking.
> >
> >    - MPI_TEST accepts a single request handle and polls to see if the
> > associated communication has completed, and essentially returns
> > "true" (the communication has completed; the application now owns the
> > buffer) or "false" (the communication has not yet completed; MPI still
> > owns the buffer).
> >
> >    - MPI_WAIT accepts a single request handle and blocks until the
> > associated communication has completed.  When MPI_WAIT returns, the
> > application owns the buffer associated with the communication.
> >
> >    - There are array versions of MPI_TEST and MPI_WAIT as well; you
> > can pass an array of requests to the array flavors of MPI_TEST (where
> > some may complete and some may not) or MPI_WAIT (where all requests
> > will complete before returning).
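> >
> > A condensed sketch of the two flavors (ranks, tags and buffer sizes
> > here are arbitrary):
> >
> >      int buf1[100], buf2[100], flag;
> >      MPI_Request reqs[2];
> >      MPI_Status statuses[2];
> >
> >      MPI_Irecv(buf1, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &reqs[0]);
> >      MPI_Isend(buf2, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &reqs[1]);
> >
> >      /* Polling: returns at once; flag says whether reqs[0] completed */
> >      MPI_Test(&reqs[0], &flag, &statuses[0]);
> >
> >      /* Blocking, array flavor: returns when both requests complete */
> >      MPI_Waitall(2, reqs, statuses);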
> >
> > 5. All Fortran MPI handles are [currently] expressed as INTEGERs.  The
> > MPI implementation takes these integers and converts them to a back-
> > end C pointer.  We are contemplating changing this for the upcoming
> > F03 MPI bindings to avoid this translation: Fortran handles would then
> > likely have the same representation as C MPI handles (i.e., pointers --
> > or, thought of differently, "very large address-sized integers").
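> >
> > Roughly, the translation inside a Fortran wrapper might look like this
> > (the wrapper's symbol name and argument conventions are implementation-
> > specific and only illustrative; the f2c/c2f routines are the standard
> > MPI-2 conversion functions):
> >
> >      void mpi_wait_(MPI_Fint *request_f, MPI_Fint *status_f, MPI_Fint *ierr)
> >      {
> >          MPI_Request req = MPI_Request_f2c(*request_f); /* INTEGER -> C handle */
> >          MPI_Status status;
> >
> >          *ierr = (MPI_Fint) MPI_Wait(&req, &status);
> >
> >          *request_f = MPI_Request_c2f(req);             /* C handle -> INTEGER */
> >          MPI_Status_c2f(&status, status_f);
> >      }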
> >
> > Hope that made sense!
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >



