(j3.2006) (SC22WG5.3862) Fwd: BOUNCE sc22wg5 at open-std.org: Non-member submission from [Jeff Squyres <jsquyres at cisco.com>]

David Muxworthy d.muxworthy
Wed Jan 21 09:20:23 EST 2009


> Cc: WG5 <sc22wg5 at open-std.org>
> Message-Id: <AB8FDDC3-1A39-4DCD-B0A6-D27C6EA841E9 at cisco.com>
> From: Jeff Squyres <jsquyres at cisco.com>
> To: MPI-3 Fortran working group <mpi3-fortran at lists.mpi-forum.org>
> In-Reply-To: <Prayer.1.3.1.0901211104060.5654 at hermes-2.csi.cam.ac.uk>
> Subject: Re: [MPI3 Fortran] MPI non-blocking transfers
> Date: Wed, 21 Jan 2009 08:51:22 -0500
> References: <Prayer.1.3.1.0901211104060.5654 at hermes-2.csi.cam.ac.uk>
>
> On Jan 21, 2009, at 6:04 AM, N.M. Maclaren wrote:
>
>>   1) Most people seem to agree that the semantics of the buffers used
>> for MPI non-blocking transfers and pending input/output storage
>> affectors are essentially identical, with READ, WRITE and WAIT
>> corresponding to MPI_Isend, MPI_IRecv and MPI_Wait (and variations).
>>
>> Do you agree with this and, if not, why not?
>
> I'm an MPI implementor; I don't know enough about Fortran to answer
> your questions definitively, but I can state what the MPI non-blocking
> send/receive buffer semantics are.
>
> There are several different flavors of non-blocking sends/receives in
> MPI; I'll use MPI_ISEND and MPI_IRECV as token examples ("I" =
> "immediate", meaning that the function returns "immediately",
> potentially before the message has actually been sent or received).
>
> 1. When an application invokes MPI_ISEND / MPI_IRECV, it essentially
> hands off the buffer to the MPI implementation and promises not to
> write to the buffer until MPI indicates that the operation has
> completed.  The MPI implementation then "owns" the buffer.
>
> 2. A rule is about to be passed in MPI-2.2 such that the application
> may still *read* a *send* buffer (e.g., one handed to MPI_ISEND) while
> the send is ongoing (writing to the buffer while the send is ongoing
> is nonsense, of course).
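>
> A minimal C sketch of this hand-off (assuming MPI_Init has been
> called; destination rank 1 and tag 0 are arbitrary placeholders):
>
>      int buf[10];
>      MPI_Request req;
>      /* ...fill buf... */
>      MPI_Isend(buf, 10, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
>      /* MPI now "owns" buf: the application must not write to it here */
>      MPI_Wait(&req, MPI_STATUS_IGNORE);
>      /* the application owns buf again and may modify it */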
>
> 3. The buffer is specified by a triple of arguments (I'll explain in
> terms of C because of my inexperience with Fortran):
>
>    - void *buffer: a pointer to the base of the buffer
> (NOTE: it may not actually point to the first byte of the message!)
>    - int count: the number of elements of the given datatype in the
> message (see the next argument)
>    - MPI_Datatype type: the datatype of the message, implying both the
> size and the interpretation of the bytes
>
> MPI has a number of intrinsic datatypes (such as MPI_INTEGER,
> representing a single Fortran INTEGER).  The intrinsic MPI datatypes
> can be combined in several ways to represent complex data structures.
> Hence, it is possible to build up a user-defined MPI_Datatype that
> represents a C struct -- even if the struct has memory "holes" in it.
> As such, MPI_Datatypes can be considered a memory map of (relative
> offset, type) tuples, where the "relative offset" part is relative to
> the (buffer) argument in MPI_ISEND/MPI_IRECV/etc.  MPI_INTEGER could
> therefore be considered a single (0, N-byte integer) tuple (where N is
> whatever is correct for your platform).
>
> A special buffer, denoted by MPI_BOTTOM, is an arbitrarily-fixed place
> in memory (usually 0, but it doesn't have to be).  Since MPI_Datatypes
> are composed of relative offsets, applications can build datatypes
> relative to MPI_BOTTOM for [effectively] direct placement into memory.
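>
> A sketch of building such a datatype in C (assuming MPI_Init has been
> called; the names and the rank/tag values are just illustrative):
>
>      int iarray[10];
>      int blocklen = 10;
>      MPI_Aint addr;
>      MPI_Datatype my_datatype;
>      MPI_Request req;
>      MPI_Get_address(iarray, &addr);   /* absolute address of iarray */
>      MPI_Type_create_hindexed(1, &blocklen, &addr, MPI_INT, &my_datatype);
>      MPI_Type_commit(&my_datatype);
>      /* the displacement is absolute, so the buffer argument is MPI_BOTTOM */
>      MPI_Isend(MPI_BOTTOM, 1, my_datatype, 1, 0, MPI_COMM_WORLD, &req);
>    Sends all 10 elements of iarray, addressed via MPI_BOTTOM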
>
> Some Fortran examples:
>
>      INTEGER i
>      CALL MPI_ISEND(i, 1, MPI_INTEGER, ...)
>    Sends a single INTEGER starting at the buffer pointed to by i
>
>      INTEGER iarray(10)
>      CALL MPI_ISEND(iarray, 10, MPI_INTEGER, ...)
>    Sends 10 INTEGERs starting at the buffer pointed to by iarray
>
>      INTEGER iarray(9999)
>      CALL MPI_ISEND(iarray, 10, MPI_INTEGER, ...)
>    Same as above -- sends the first 10 INTEGERs starting at the buffer
> pointed to by iarray
>
>      INTEGER iarray(9999)
>      CALL MPI_ISEND(iarray(37), 10, MPI_INTEGER, ...)
>    Sends iarray(37) through iarray(46)
>
>      INTEGER iarray(9999)
>     C ..build up a datatype relative to MPI_BOTTOM that points to
> iarray..
>      CALL MPI_ISEND(MPI_BOTTOM, 10, my_datatype, ...)
>    Sends the first 10 elements of iarray
>
> Some C examples:
>
>      int i;
>      MPI_Isend(&i, 1, MPI_INT, ...);
>    Sends 1 int starting at the buffer pointed to by &i
>
>      int i[9999];
>      MPI_Isend(&i[37], 10, MPI_INT, ...);
>    Sends i[37] through i[46]
>
>      int i[9999];
>      /* ..build up MPI_Datatype relative to MPI_BOTTOM that points to
> &i[0].. */
>      MPI_Isend(MPI_BOTTOM, 1, my_datatype, ...);
>    Sends i[0]
>
>      struct foo { int a; double b; char c; } foo_instance;
>      /* ..build up MPI_Datatype to represent struct foo.. */
>      MPI_Isend(&foo_instance, 1, foo_datatype, ...);
>    Sends the foo struct (likely only transmitting the data, not the
> "holes")
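>
> As a sketch, foo_datatype in the last example might be built like
> this (one possible approach; offsetof() comes from <stddef.h>):
>
>      int          blocklens[3] = { 1, 1, 1 };
>      MPI_Aint     disps[3] = { offsetof(struct foo, a),
>                                offsetof(struct foo, b),
>                                offsetof(struct foo, c) };
>      MPI_Datatype types[3] = { MPI_INT, MPI_DOUBLE, MPI_CHAR };
>      MPI_Datatype foo_datatype;
>      MPI_Type_create_struct(3, blocklens, disps, types, &foo_datatype);
>      MPI_Type_commit(&foo_datatype);
>      /* the displacements are offsets within struct foo, so they are
>         relative to the &foo_instance passed to MPI_Isend; any padding
>         "holes" between members are simply skipped */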
>
> 4. MPI_ISEND and MPI_IRECV return a request handle that can be passed
> back to MPI later to check whether the communication associated with
> that handle has completed.  There are essentially two flavors of the
> check-for-completion semantic: polling and blocking.
>
>    - MPI_TEST accepts a single request handle and polls to see if the
> associated communication has completed, and essentially returns
> "true" (the communication has completed; the application now owns the
> buffer) or "false" (the communication has not yet completed; MPI still
> owns the buffer).
>
>    - MPI_WAIT accepts a single request handle and blocks until the
> associated communication has completed.  When MPI_WAIT returns, the
> application owns the buffer associated with the communication.
>
>    - There are array versions of MPI_TEST and MPI_WAIT as well; you
> can pass an array of requests to the array flavors of MPI_TEST (where
> some may complete and some may not) or MPI_WAIT (where all requests
> will complete before returning).
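>
> A small C sketch of both flavors (req/reqs would have been returned
> by earlier MPI_ISEND / MPI_IRECV calls):
>
>      int flag;
>      MPI_Test(&req, &flag, MPI_STATUS_IGNORE);  /* poll; flag=1 if done */
>      MPI_Wait(&req, MPI_STATUS_IGNORE);         /* block until done */
>
>      MPI_Request reqs[4];
>      /* ...start 4 non-blocking operations... */
>      MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE); /* block until all done */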
>
> 5. All Fortran MPI handles are [currently] expressed as INTEGERs.  The
> MPI implementation takes these integers and converts them to back-end
> C pointers.  We are contemplating changing this for the upcoming F03
> MPI bindings to avoid that translation: Fortran handles would then
> likely have the same representation as C MPI handles (i.e., pointers --
> or, thought of differently, "very large address-sized integers").
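>
> For reference, the existing C interface already exposes this kind of
> translation; a sketch (f_req is the Fortran INTEGER handle, received
> in C as an MPI_Fint):
>
>      MPI_Request c_req = MPI_Request_f2c(f_req);  /* INTEGER -> C handle */
>      MPI_Wait(&c_req, MPI_STATUS_IGNORE);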
>
> Hope that made sense!
>
> -- 
> Jeff Squyres
> Cisco Systems
>



