(j3.2006) (SC22WG5.5028) WG5 vote on draft TS on further coarray features

John Reid John.Reid
Wed Jul 10 05:07:12 EDT 2013


Bill Long has been very efficient in incorporating all the changes 
agreed in Delft to the draft TS. Here is a letter ballot on the new 
draft. I am sorry not to have got this out earlier - I have been 
overwhelmed by other activities since getting back home. Here are:

N1981 Requirements for further coarray features (Long)
      - supersedes N1930
N1983 Draft TS 18508 Additional Parallel Features in Fortran (Long)
      - supersedes N1967
N1986 WG5 letter ballot on N1983 (Reid)

This ballot ends 9 a.m. (UK time) on 12 August.

WG5 committed itself to a ballot on interpretations in July. Because we 
are behind schedule with the TS and I did not want to hold two ballots 
with the same end date, I will not start the interps ballot for 2 weeks.

Best wishes,


-------------- next part --------------
                                         ISO/IEC JTC1/SC22/WG5 N1981

             Requirements for additional parallel features in Fortran

                        John Reid,  28-Jun-2013

A Technical Specification, "Additional Parallel Features in Fortran", is

1. Overall size

S1. The complexity of the TS should be comparable with that of
document N1858, from the point of view of both implementation and
edits to the standard. This is the essence of Resolution G9 of the
Garching meeting (see N1861).  

This set of requirements specifies a TEAM facility different from the
one in N1858, an EVENT facility as an alternative to the NOTIFY/QUERY
facility in N1858, and a simpler set of collective subroutines. It
adds new intrinsic procedures for atomic memory operations, but omits
the parallel I/O facilities in N1858. On balance, the requirement S1
is satisfied.

2. Teams

Teams provide a capability to restrict the image set of remote memory
references, coarray allocations, and synchronizations to a subset of
all the images of the program. This simplifies writing programs that
involve segregated activities (parts of a climate model, for example)
that might more easily be written independently or may have already
been written as independent programs. Teams provide a minimal portable 
mechanism that can be used to enable continued execution in the 
presence of failed images. Teams also provide a mechanism
for subdividing the computation for the sake of better performance
(such as within local SMP domains). Finally, teams provide the
capability to execute procedures (such as library procedures) that use
coarrays internally on a subset of the images of a program. 

T1: When a block of code is executed on images executing as a team,
    1. Image indices shall be relative to the team, starting at 1 and
       ending with the number of images in the team.  

    2. Collective activities that would involve all images, such as
       SYNC ALL, allocation and deallocation of coarrays, collective
       subroutine execution, and inquiry intrinsics such as THIS_IMAGE
       and NUM_IMAGES shall be relative to the team.

T2: At any one time, an image executes as a member of one and only 
    one current team. Access to variables on images of an ancestor team 
    is permitted through syntax that refers to that team. 

T3: It should be possible to split a team into mutually exclusive
    subsets that are themselves teams. This should be dynamic in order
    to allow different groupings of images during different stages of
    execution. It is desirable to have a compact mechanism for an
    image to specify the team to which it wishes to belong.

T4: There shall be a construct mechanism for changing the current
    team, involving the synchronization of all members of the new team
    at the beginning and end of the construct. The construct shall
    support separate execution blocks based on team membership. The
    construct shall make apparent (both to the system and the
    programmer) where team execution begins and ends.

T5: There shall be a type for variables identifying a team collection
    (probably an opaque derived type defined in the intrinsic module

T6: There needs to be a mechanism to find the image index relative to
    the set of an ancestor team. This might best be done by adding an
    optional argument to IMAGE_INDEX that specifies the ancestor team.

T7: An allocatable coarray that is allocated within a team construct
    shall be deallocated before execution of the team construct
    terminates.  A coarray that was allocated in a parent team shall
    not be deallocated within a child team construct.

T8: The restriction that standard input is attached only to image 1 is
    unchanged, and the designated image is image 1 of the original set
    of images present at program startup.
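
The index rules in T1, T3 and T6 can be illustrated with a small
single-process model. This is only a sketch in Python, not Fortran
syntax: the function names are hypothetical, and the ordering of images
inside a newly formed team is an assumption of the sketch, since the
requirements above do not fix it.

```python
# Toy single-process model of requirements T1, T3 and T6 (not Fortran syntax).
# All names are hypothetical; the ordering of images inside a new team
# (parent order is preserved here) is an assumption of this sketch.
from collections import defaultdict


def form_teams(parent_images, team_number_of):
    """Split the parent team into mutually exclusive teams (T3).

    parent_images: image indices of the parent team, in order.
    team_number_of: callable giving the team number each image chooses.
    Returns a dict mapping team number to the list of parent indices.
    """
    teams = defaultdict(list)
    for image in parent_images:
        teams[team_number_of(image)].append(image)
    return dict(teams)


def this_image(team, parent_image):
    """Image index relative to the team, starting at 1 (T1, item 1)."""
    return team.index(parent_image) + 1


def num_images(team):
    """Number of images in the team (T1, item 2)."""
    return len(team)


def parent_index(team, team_image):
    """Index in the parent team of a team-relative image index (T6)."""
    return team[team_image - 1]


initial = list(range(1, 9))  # 8 images at program startup
# Odd-numbered images choose team 1, even-numbered images choose team 2.
teams = form_teams(initial, lambda image: 1 if image % 2 else 2)

print(teams)
print(this_image(teams[2], 6), num_images(teams[2]))
print(parent_index(teams[2], 3))
```

In this model, image 6 of the initial set becomes image 3 of team 2,
and asking for the parent index of image 3 of team 2 recovers 6.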

3. Collectives

A collective subroutine is an intrinsic subroutine that is executed by
a set of images. It performs a computation based on values on the
images of the set. Collective subroutines offer the possibility of
substantially more efficient execution of reduction operations than
would be possible by non-expert programmers. Corresponding routines
are widely used in MPI programs.

C1: A call to a collective subroutine is not an image control
    statement. However, such a call shall appear only in a context
    that allows an image control statement.  Even though calls to
    collective subroutines involve internal synchronization required
    by the usual rules for reference and definition of subroutine
    arguments, they do not facilitate ordering of segments.

C2: If a collective subroutine is invoked on one image, it shall be
    invoked by the same statement on all images of the current team.

C3: A collective subroutine based on a user-written procedure that
    applies the required operation to local variables shall be
    provided. In addition, because they are often needed, there should
    be specific collective subroutines for SUM, MAX, and MIN for
    intrinsic types for which the corresponding operations are
    defined.  Forms that provide the result to just one image or to
    all the images involved should be provided. Beyond this, there
    should be a collective subroutine that broadcasts a value on one
    image to a set of images. Coindexed source and result arguments
    are not permitted.
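
The data movement asked for in C3 can be sketched as follows. This is a
toy Python model, not the proposed Fortran interface: a list stands in
for the values held on the images of the current team, and the names
co_reduce, co_sum, co_max, co_min and co_broadcast are hypothetical.
What happens to the argument on images that do not receive the result
is not stated above; leaving it unchanged is an assumption of the
sketch.

```python
# Toy model of the collectives in C3 (not Fortran syntax).
# values[i - 1] stands for the value held on image i of the current team.
import operator
from functools import reduce


def co_reduce(values, operation, result_image=None):
    """Combine the per-image values with a user-written binary operation.

    With result_image=None every image receives the result; otherwise
    only that image does and the others are left unchanged (assumption).
    """
    combined = reduce(operation, values)
    if result_image is None:
        return [combined] * len(values)
    return [combined if index == result_image else value
            for index, value in enumerate(values, start=1)]


def co_sum(values, result_image=None):
    return co_reduce(values, operator.add, result_image)


def co_max(values, result_image=None):
    return co_reduce(values, max, result_image)


def co_min(values, result_image=None):
    return co_reduce(values, min, result_image)


def co_broadcast(values, source_image):
    """Copy the value on source_image to every image of the team."""
    return [values[source_image - 1]] * len(values)


on_images = [3, 1, 4, 1]  # one value per image, images 1 to 4

print(co_sum(on_images))                  # result on all images
print(co_max(on_images, result_image=2))  # result on image 2 only
print(co_min(on_images))
print(co_broadcast(on_images, source_image=3))
```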

4. Additional intrinsic atomic subroutines

Atomic memory operations provide powerful low-level primitives for
synchronization of activities among images without use of heavy-weight
synchronization and lock statements. They can provide substantial
performance advantages.

A1: Atomic intrinsic subroutines shall be provided for
    atomic-compare-and-swap, atomic-integer-add, atomic-bitwise-and,
    atomic-bitwise-or, and atomic-bitwise-xor.  For the integer add
    and bitwise logical operations, both the direct and "fetch-and"
    versions should be supplied.
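
The difference between the direct and "fetch-and" versions in A1 can be
sketched with a toy Python model. The AtomicInt class and its method
names are hypothetical, and a lock stands in for whatever hardware or
runtime support a processor would actually use; threads stand in for
images.

```python
# Toy model of the atomic operations in A1 (not Fortran syntax).
import operator
import threading


class AtomicInt:
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def _fetch_and_update(self, operation, operand):
        # Read-modify-write as one indivisible step; returns the old value.
        with self._lock:
            old = self._value
            self._value = operation(old, operand)
            return old

    def fetch_add(self, n):
        return self._fetch_and_update(operator.add, n)

    def fetch_and(self, mask):
        return self._fetch_and_update(operator.and_, mask)

    def fetch_or(self, mask):
        return self._fetch_and_update(operator.or_, mask)

    def fetch_xor(self, mask):
        return self._fetch_and_update(operator.xor, mask)

    def add(self, n):
        # "Direct" version: same update, old value not returned.
        self.fetch_add(n)

    def compare_and_swap(self, compare, new):
        # Store new only if the current value equals compare; return old.
        with self._lock:
            old = self._value
            if old == compare:
                self._value = new
            return old

    def load(self):
        with self._lock:
            return self._value


counter = AtomicInt(0)


def worker():
    for _ in range(1000):
        counter.add(1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

flags = AtomicInt(0b1100)
old_flags = flags.fetch_xor(0b1010)       # old value 12, new value 6
swapped = flags.compare_and_swap(6, 0)    # matches, so 0 is stored

print(counter.load(), old_flags, swapped, flags.load())
```

Because every update is indivisible, the four workers always leave the
counter at 4000, with no explicit synchronization between them.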

5. Synchronization using events

The NOTIFY and QUERY statements were proposed in N1858, but for
matching the execution of a NOTIFY statement on one image with the
execution of a QUERY statement on another image, the feature relied on
the numbers of times the statements were executed on the images. This
mechanism is not robust in the presence of segment reordering; for
example, an image that would otherwise be idle might bring other work
forward. The preferred mechanism involves tagged events. The tagging
aspect is important for employing this capability in a library routine
in such a way that is hidden from, and does not interfere with the

E1: There should be a mechanism to allow one-sided ordering of
    execution segments. For example, suppose image I executes
    successive segments I1 and I2 and image J executes successive
    segments J1 and J2; there might be a need for I1 to precede J2
    without the need for J1 to precede I2.

E2: The mechanism should use a data item (tag), accessible on all the
    images, to identify the event. There shall be a type for variables
    used as these tags (probably an opaque derived type defined in the
    intrinsic module ISO_FORTRAN_ENV).

E3: Mechanisms shall be provided to post and test an event on any image, 
    and to wait for an event to be posted to an event variable on that 
    image.  Repeated posts to the same event increment a counter internal 
    to the tag and wait decrements the counter.  The statements 
    implementing event post and wait are image control statements. 
    The test operation may be implemented by an inquiry function, and 
    hence would not be an image control statement.  
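
The counter semantics of E3 and the one-sided ordering of E1 can be
sketched with a toy Python model. The Event class and its post, wait
and query methods are hypothetical names, threads stand in for images,
and nothing here is the proposed Fortran syntax.

```python
# Toy model of requirements E1 and E3 (not Fortran syntax).
import threading


class Event:
    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def post(self):
        # Each post increments the internal counter (E3).
        with self._cond:
            self._count += 1
            self._cond.notify()

    def wait(self):
        # Block until at least one post is pending, then decrement (E3).
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            self._count -= 1

    def query(self):
        # Test operation: report pending posts without blocking.
        with self._cond:
            return self._count


log = []
log_lock = threading.Lock()


def record(segment):
    with log_lock:
        log.append(segment)


event_on_j = Event()  # event variable located on image J


def image_i():
    record("I1")
    event_on_j.post()   # I1 is complete before the post
    record("I2")        # I2 does not wait for J1


def image_j():
    record("J1")
    event_on_j.wait()   # J2 cannot start until I1 has posted
    record("J2")


threads = [threading.Thread(target=image_j), threading.Thread(target=image_i)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pending = Event()
pending.post()
pending.post()
after_two_posts = pending.query()
pending.wait()
after_one_wait = pending.query()

print(log, after_two_posts, after_one_wait)
```

Whatever the thread scheduling, "I1" always appears before "J2" in the
log, while the relative order of "J1" and "I2" is left open, which is
the one-sided ordering described in E1.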
-------------- next part --------------
                                        ISO/IEC JTC1/SC22/WG5 N1986
                       WG5 letter ballot on N1983

                        John Reid, 10 July 2013

This is a WG5 letter ballot on N1983, the second draft DTS for 
TS 18508, Additional Parallel Features in Fortran.

N1983 has the same content as J3/13-293. It was prepared by the editor, 
Bill Long, following the meeting of WG5 and J3 in Delft. Details of
the changes made from the first draft, N1967, are given in J3/13-294. 

The basic requirements were changed during the meeting in Delft and 
are given in N1981, which supersedes N1930.

Please answer the following question "Is N1983 ready for forwarding to 
SC22 as the DTS?" in one of these ways. 

1) Yes.
2) Yes, but I recommend the following changes. 
3) No, for the following reasons.
4) Abstain.

This is an individual vote. Please send your vote to sc22wg5 at open-std.org 
to arrive by 9 a.m. (UK time) on 12 August 2013. 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: N1983.pdf
Type: application/pdf
Size: 251210 bytes
Desc: not available
Url : http://mailman.j3-fortran.org/pipermail/j3/attachments/20130710/53d89876/attachment-0001.pdf 
