(j3.2006) (SC22WG5.5146) Draft result of vote on draft TS
John Reid
John.Reid
Thu Dec 12 12:17:46 EST 2013
WG5,
Here is the third draft of the result of the ballot on N1996.
I have added Daniel Chen's vote and removed Bob Corbett's name from the
list of abstainers.
Best wishes,
John.
ISO/IEC JTC1/SC22/WG5 N1999-3
Result of the WG5 letter ballot on N1996
John Reid
N1997 asked this question
Please answer the following question "Is N1996 ready for forwarding to
SC22 as the DTS?" in one of these ways.
1) Yes.
2) Yes, but I recommend the following changes.
3) No, for the following reasons.
4) Abstain.
The numbers of answers in each category were:
0 for 1) Yes.
0 for 2) Yes, but I recommend the following changes.
10 for 3) No, for the following reasons (Bader, Chen, Cohen, Corbett, Long,
Maclaren, Muxworthy, Reid, Snyder, Whitlock)
0 for 4) Abstain.
The ballot has failed. J3 is requested to prepare a revised version that
takes the comments into account.
Here are the responses in detail
Reinhold Bader
3) No, for the following reasons:
* The resilience feature has not yet received sufficient attention,
* There still exist some problems with ancestor team coindexing,
* Some clarifying words about the event model as well as atomics
may still be needed,
* 13-359 points out a number of outstanding issues that need resolution.
Details on specific issues are given in the following. Unless explicitly
indicated otherwise, all [page:line] markers are references to N1996.
Section 5:
~~~~~~~~~~
(5A) Ancestor coindexing in CHANGE TEAM construct:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Section 5.4 specifies how data transfer between teams can be arranged
for inside a CHANGE TEAM construct. However, it is still not fully
clear for an object that is addressed via
A[outer :: i, j]
how the corank and cobounds of A that are necessary to establish
the coindex-to-image-index mapping are defined in case the referenced
team is a descendant of the team in which the coarray was established.
Based on e-Mail discussion on the Coarray-TS list, it seems that the
best solution may be to allow (oblige?) the programmer to specify this
via a RECODIMENSION statement inside the CHANGE TEAM block in the
above-mentioned case. An example that indicates how this may work is:
REAL, ALLOCATABLE :: a[:,:], b[:]
TYPE(team_type) :: outer, inner
ALLOCATE(a[nx, ny, *], b[*])
FORM TEAM (outer, ...)
CHANGE TEAM(outer)
: ! (X)
: ! initialize a and b using "outer"-local coindexing
FORM TEAM (inner, ...)
CHANGE TEAM (inner)
RECODIMENSION :: a[outer :: p:*], b[outer :: *]
:
a[outer :: i] = ... ! a has new corank and cobounds
b[outer :: j] = ... ! b retains original corank and cobounds
END TEAM
END TEAM
It would be useful to also permit a RECODIMENSION statement for local
coindexing. In the above example, statement (X) could then read
RECODIMENSION :: a[ny,p:*] ! same as a[outer :: ny,p:*]
which would improve support for achieving consistency of coindexing
between the team-local context in "outer" and the ancestor-team
context in "inner".
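The dependence on cobounds can be made concrete with the usual column-major rule that turns cosubscripts into an image index. The following is only a sketch in Python, not Fortran; the function name image_index and the sample cobounds are illustrative assumptions, and it models the arithmetic only, not the proposed RECODIMENSION statement.

```python
def image_index(cosubscripts, lower, upper_but_last):
    """Column-major image index (1-based) for the given cosubscripts.

    lower          -- lower cobound of every codimension
    upper_but_last -- upper cobounds of all codimensions except the last,
                      whose upper cobound is '*' and depends on the team size
    """
    if len(cosubscripts) != len(lower) or len(upper_but_last) != len(lower) - 1:
        raise ValueError("corank mismatch")
    index, stride = 1, 1
    for k, (sub, lo) in enumerate(zip(cosubscripts, lower)):
        if sub < lo:
            raise ValueError("cosubscript below lower cobound")
        if k < len(upper_but_last):
            if sub > upper_but_last[k]:
                raise ValueError("cosubscript above upper cobound")
        index += (sub - lo) * stride
        if k < len(upper_but_last):
            # Extent of this codimension scales the stride of the next one.
            stride *= upper_but_last[k] - lo + 1
    return index


# The same cosubscripts [2,3] under two hypothetical codimension choices:
as_2_by_star = image_index((2, 3), lower=(1, 1), upper_but_last=(2,))  # a[2,*]
as_4_by_star = image_index((2, 3), lower=(1, 1), upper_but_last=(4,))  # a[4,*]
```

The two results differ, which is why a reference through an ancestor team is ambiguous until it is settled which corank and cobounds apply.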
(5B) Comments on normative text in 5.1:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[9:4-5] mentions "image indices" in a context that actually refers to
coindexing.
[9:14] appears to partially duplicate text for semantics specified in
[9:36]; also it may cause confusion because the words "The current team
is ..." also appear as a definition in [9:3-4].
(5C) Image failure:
~~~~~~~~~~~~~~~~~~~
(1) It is desirable that the programmer can determine whether or not
the implementation supports continuing in the face of image
failure. How about requiring STAT_FAILED_IMAGE to be a
negative value if this support is not at all available?
(2) Imagine that a coarray code runs with four images on two
compute nodes, with two images per node. If the interconnect
between the two nodes fails, an implementation supporting
resilience may well continue executing all four images, but
each image pair will have a different assessment of which
images have failed. In my view, the draft TS is lacking some
specification that provides a minimal amount of consistency.
For example, one could specify that failure j implies that
the implementation decompose the initial set of images into
subsets Aj(1),...,Aj(Nj),Bj with the following properties:
(a) for each k in 1,...,Nj, the images in Aj(k) continue
execution, and consider all images outside Aj(k) failed.
(b) for a failure i that occurs after failure j, each Ai(k)
(k=1,...,Ni) must be a subset of some Aj(kk), and Bi must
be a superset of Bj. (I know that saying "after" is fraught
with peril, but determination of the temporal order in
which error detection is performed could surely be left to
the implementation).
(3) As example (A.2.1) shows, using the resiliency feature is not
conditioned on the use of teams. I suggest moving the
description of the feature to a section of its own.
(4) I'm wondering whether some additional words should be put
into normative text about the definition status of variables
whose values depend on references to coarrays on failed
images. Presently there is only the NOTE 5.6, but for an
implementation that can actually continue execution in the
face of a coindexed reference to a failed image this may be
insufficient.
(5) I suspect there are some problems with [30:30-41]; the first one is that
in [30:30-31] SYNC TEAM does not show up, presumably because the error
detection is restricted to the current team; however this leaves the
situation for SYNC TEAM undefined. The second one is that [30:39-41]
appears to mostly remove the synchronization properties of image control
statements for the non-failed images even in case the error condition is
STAT_FAILED_IMAGE. If this is intended, the resilience aspect of example
(A.2.1) will not work. However, I think even example (A.1.2) cannot, in
general, work properly in this case, for at least two reasons I can think
of:
* re-entering a team execution context via CHANGE TEAM may not have
appropriate synchronization properties (unless perhaps the spec refers
to the *new* team when saying "current team", but this is not fully
clear)
* deallocation of coarrays inside the team execution context should
surely follow the same rule as SYNC ALL, leading to a race condition
on the non-failed images.
So the following questions arise:
* was there a particular reason to remove the synchronization properties
of image control statements, in case STAT_FAILED_IMAGE occurs? If so,
why was it not applied to (DE)ALLOCATE?
* would it not be more appropriate to describe the effect of
STAT_FAILED_IMAGE for each image control statement individually,
retaining as much synchronization as possible? Otherwise I fear that
invalid data will flood the program while it is attempting to recover.
(6) With the most recent F08 interps the MOVE_ALLOC intrinsic may
now be an image control statement. Therefore it will be necessary
to add a STAT argument to this subroutine, because otherwise
code using it cannot be made resilient. Also, a MOVE_ALLOC that
uses coarray actual arguments cannot be PURE (but fixing that may
be outside the scope of the TS).
(7) Note 5.7 points out that image 1 plays a special role because
of the standard input being preconnected to that image; however
in the context of fail-safe execution this may not be that
relevant since the recommended practice is to specify input
files via command line arguments anyway. It may also be worth
pointing out that standard error and standard output will
probably get lost if image 1 is among the failed images; again
this does not necessarily adversely affect fail-safe execution if
the program's I/O is appropriately set up.
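The consistency condition suggested in item (2) above can be written down as a small check. This is a hedged sketch in Python rather than proposed TS text: the names is_refinement and considered_failed are hypothetical, and Bj is read here as the set of images that have actually stopped executing, which is an assumption about the intent.

```python
def is_refinement(earlier, later):
    """Check properties (a)/(b) for two successive failures.

    Each argument is a pair (groups, stopped): 'groups' is a list of
    frozensets A(1..N) of images that keep running together, 'stopped'
    is the frozenset B of images no longer executing.
    """
    groups_j, stopped_j = earlier
    groups_i, stopped_i = later
    # Every later group must lie inside some earlier group ...
    nested = all(any(a_i <= a_j for a_j in groups_j) for a_i in groups_i)
    # ... and the stopped set may only grow.
    return nested and stopped_j <= stopped_i


def considered_failed(all_images, groups, image):
    """Images that 'image' regards as failed: everything outside its group."""
    for group in groups:
        if image in group:
            return all_images - group
    raise ValueError("image is not in any surviving subset")


all_images = frozenset({1, 2, 3, 4})
# Interconnect failure: both node pairs keep running, nothing has stopped.
split = ([frozenset({1, 2}), frozenset({3, 4})], frozenset())
# Later, image 4 stops as well.
after_loss = ([frozenset({1, 2}), frozenset({3})], frozenset({4}))
# A regrouping across the earlier split would violate property (b).
regrouped = ([frozenset({2, 3})], frozenset({1, 4}))
```

In the four-image scenario, images 1 and 2 regard images 3 and 4 as failed and vice versa, yet the two successive decompositions remain consistent in the sense of (a) and (b).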
(5D) Note 5.2
~~~~~~~~~~~~~
The first line of that NOTE has the text "array A(0,N+1)" which
presumably should read "array A(0:N+1)". Furthermore, I assume
that array elements A(1) and A(N) are updated by the iteration
procedure, and therefore a second "SYNC TEAM (INITIAL)" statement
needs to be inserted just prior to END DO.
(5E) Team identity
~~~~~~~~~~~~~~~~~~
Given the quite complicated constraints on teams not being allowed
in a variable definition context, and also the addition of GET_TEAM,
it might be worth considering a definition of team that is more
decoupled from a concrete instance stored in a particular team
variable. A team might be characterized by
* the subset of images in the initial team
* the value of its ID
* the mapping of local image indices to initial-team image indices.
(based on this, a comparison operator might even be provided).
This would also allow prefabrication of teams via FORM TEAM
statements executed in the initial team that can subsequently
be used in nested CHANGE TEAM blocks, subject to inclusivity
rules, i.e. requirements that assure that it is always a superset
of images of any referenced subteam that invokes CHANGE TEAM.
It is unclear to me whether such prefabrication is permitted
under the present draft's provisions - if so, there appears
to be a lack of consistency in any case.
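As an illustration of the value-like characterization suggested above, a team can be modelled as an immutable record whose equality follows from its contents. This is a sketch in Python under stated assumptions: the class name Team and its fields are hypothetical, and the ordered tuple of members is used to encode both the subset of images and the mapping of local image indices to initial-team image indices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Team:
    team_id: int
    # Initial-team image indices, listed in local image-index order.
    members: tuple

    def initial_index(self, local_index):
        """Map a 1-based local image index to the initial-team image index."""
        return self.members[local_index - 1]

    def contains(self, other):
        """Inclusivity rule: every image of 'other' is also in this team."""
        return set(other.members) <= set(self.members)


initial = Team(team_id=0, members=(1, 2, 3, 4, 5, 6))
odd = Team(team_id=1, members=(1, 3, 5))
same_as_odd = Team(team_id=1, members=(1, 3, 5))
reordered = Team(team_id=1, members=(3, 1, 5))
```

Two records compare equal exactly when ID, membership and index mapping agree, independent of which variable holds them.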
Section 6:
~~~~~~~~~~
Given the discussion on events on the coarray-ts mailing list
as well as the Editor's comments on clause 6 (13-359 I-6[a-c])
I think proper usage of events involves the following requirements:
(6A) on the programmer: the number of posts to an event in
otherwise unordered segments must always be guaranteed to
match against the same number of waits. A situation that illustrates
this is
Image 1          Image 2          Image 3
                 A2               A3
A1               Post ev[1]       Post ev[1]
Wait(1) ev
B1
Wait(2) ev
C1
where the question is: how is B1 ordered against A2 or A3?
Given the present wording in N1996 [16:8-13] and NOTE 6.2, the following
interleavings of atomic event updates might occur:
case 1                       case 2
Post ev[1] on Image 2        Post ev[1] on Image 3
Post ev[1] on Image 3        Post ev[1] on Image 2
Wait(1) ev on Image 1        Wait(1) ev on Image 1

case 3                       case 4
Post ev[1] on Image 2        Post ev[1] on Image 3
Wait(1) ev on Image 1        Wait(1) ev on Image 1
Post ev[1] on Image 3        Post ev[1] on Image 2
the first two of which imply that B1 is ordered against both A2 and
A3, but there is no information available to the program as to which of
the four actually happened in any run!
So the answer is indeed: B1 may be ordered against either A2, or A3
or both. It follows that both Waits need to be performed by a program
that wants to ensure segment ordering against both posting images
(i.e., C1).
(6B) On the implementation: It must be guaranteed that event counts
as seen by EVENT WAIT and EVENT_QUERY on the image on which the
event is located will eventually see the updates resulting from
EVENT POST statements issued on any image irrespective of segment
ordering. See also the comments on the atomic examples below.
It may be appropriate to delete querying on remote events if a
stronger requirement is not considered desirable.
Because of (6A) and (6B), I tend to agree with the Editor's items
I-6a and I-6b, but believe that I-6c is not needed, except perhaps
for diagnostic purposes.
In particular, example (A.2.2) starts out violating (6A) and tries
to get around this by using MAX_COUNT; multiple producer programs
should use a different method (e.g. teams with one task per team
acting as a producer) in order to improve scalability. John Reid
has suggested a much better example (tree structure from a
multifrontal solver) that should be used as a replacement for (A.2.2).
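The four interleavings listed under (6A) can be enumerated mechanically. The sketch below is a Python model, not TS semantics: it assumes that posts and the first wait act as indivisible steps on a single counter, and the names interleavings, post2, post3 and wait1 are hypothetical.

```python
from itertools import permutations


def interleavings():
    """Return (order, posts_seen_by_wait) for every feasible ordering.

    'post2' and 'post3' are the EVENT POSTs on images 2 and 3, 'wait1' is
    the first EVENT WAIT on image 1. The wait can only complete when the
    count is at least one.
    """
    feasible = []
    for order in permutations(("post2", "post3", "wait1")):
        count = 0
        seen = []            # posts that completed before the wait
        seen_by_wait = None
        for op in order:
            if op == "wait1":
                if count == 0:
                    break    # wait cannot complete first; ordering infeasible
                count -= 1
                seen_by_wait = tuple(seen)
            else:
                count += 1
                seen.append(op)
        if seen_by_wait is not None:
            feasible.append((order, seen_by_wait))
    return feasible


cases = interleavings()
both = [order for order, seen in cases if len(seen) == 2]
only_one = [order for order, seen in cases if len(seen) == 1]
```

In this model two of the four feasible orderings have both posts preceding the first wait and two have only one, matching cases 1-2 and 3-4 above; the program itself cannot tell them apart.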
Section 7:
~~~~~~~~~~
(7A) FAILED_IMAGES:
~~~~~~~~~~~~~~~~~~~
It may not be desirable to exit the team execution context to
obtain information about which images in the initial team
have failed. Therefore I suggest adding an optional DISTANCE
argument to the FAILED_IMAGES intrinsic.
(7B) GET_TEAM:
~~~~~~~~~~~~~~
Because of [11:4-5], the second example's statement
A [PARENT_TEAM :: 1] = 4.2
in [25:1] is non-conforming (the text [11:4-5] was introduced to
address my comment N1989/(A.2.3)). My conclusion is that at the
very least the DISTANCE argument should be removed from this
function. The second example might read
SUBROUTINE TT (A)
USE, INTRINSIC :: ISO_FORTRAN_ENV
REAL :: A[*]
TYPE(TEAM_TYPE) :: INVOKING_TEAM, NEW_TEAM
INTEGER :: I, ID
CALL GET_TEAM(INVOKING_TEAM)
ID = ... ! calculate team membership
FORM TEAM(ID, NEW_TEAM)
CHANGE TEAM(NEW_TEAM)
... ! process A on each team and define I
SYNC TEAM (INVOKING_TEAM)
... = A[INVOKING_TEAM :: I]
... ! further processing not involving A
END TEAM
END SUBROUTINE
In the above situation I'd consider it advisable to have a separate
subroutine dummy of TYPE(TEAM_TYPE) anyway, so the usefulness of
GET_TEAM reduces to producing the value of the initial team.
(7C) Intrinsics in section 7.5:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In [26:11] and [26:21] brackets "()" should probably be added after
the intrinsic names.
Annex A:
~~~~~~~~
Example (A.1.2):
~~~~~~~~~~~~~~~~
Replace [36:15] by
"IF (this_image() <= images_used) THEN
read_checkpoint = .FALSE.
ELSE
read_checkpoint = .TRUE.
END IF"
Reason: images outside the working set will always need to read a
checkpoint once activated.
In [36:36], replace "SUBTEAM" by "TEAM" (renamed construct).
Example (A.2.1):
~~~~~~~~~~~~~~~~
In [37:36], add ")" after "num_images()"
In [37:37], add ")" at the end of the statement
Delete [37:48] and [38:1]. Image failure here may or may not imply that
the corresponding work item is lost. In any case, the program as written
here cannot re-send it. An appropriate comment could be added as a
replacement for the two deleted lines.
Example (A.2.2):
~~~~~~~~~~~~~~~~
See the discussion near the end of the comments on Section 6 above.
Examples for atomic usage:
~~~~~~~~~~~~~~~~~~~~~~~~~~
Section A.3.2.2 gives a number of examples that produce possibly
surprising results. I'd also like to see some that illustrate useful
expected behaviour. For example,
INTEGER(atomic_int_kind) :: x[*] = 0, z = 0
CALL ATOMIC_ADD(x[1], 1) ! (A)
IF (THIS_IMAGE() == 2) THEN
wait : DO
CALL ATOMIC_REF(z, x[1]) ! (B)
IF (z == NUM_IMAGES()) THEN
EXIT wait
END IF
END DO : wait ! (C)
END IF
NOTE 13.1 in the Fortran 2008 standard says that such use is processor-
dependent. However, I'd like to know the answers to the following
questions for the above example:
(1) Is the "wait" loop guaranteed to complete? If this is not the case,
I think some words should be added in the normative text defining
the atomic's semantics along the lines "The effect of the complete
sequence of executed atomic updates shall eventually
become visible to all images even if no segment ordering occurs."
Performance variations we must live with under the regime of QOI,
but the purpose the atomics were designed for should be fulfilled
by every implementation.
It seems to me that this may also be the root cause for Nick's
worries about circularity for the event model; the same principle
should therefore also apply to the count stored inside a
TYPE(event_type) variable (maybe only locally?); this is required
to have guaranteed progress in example (A.2.1), [37:46],[38:7].
(2) Assuming that SYNC MEMORY statements are added to the above
immediately before (A), and immediately after (C), is it guaranteed
that the segments preceding the first SYNC MEMORY on all images
are ordered against the segment following the second SYNC MEMORY
on image 2?
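For comparison, the intended behaviour of the example can be modelled with threads standing in for images. This is a hedged Python sketch, not Fortran: the class AtomicInt and the function run are hypothetical, and the lock-protected counter provides the "eventually visible" property by construction, which is precisely what question (1) asks the TS to guarantee.

```python
import threading


class AtomicInt:
    """Lock-protected integer standing in for an atomic variable."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def add(self, delta):
        with self._lock:
            self._value += delta

    def ref(self):
        with self._lock:
            return self._value


def run(num_images):
    """Run the example with num_images >= 2 threads; return z on image 2."""
    x = AtomicInt()          # plays the role of x[1]
    observed = {}

    def image(me):
        x.add(1)                         # (A)
        if me == 2:
            while True:                  # the "wait" loop
                z = x.ref()              # (B)
                if z == num_images:
                    break
            observed["z"] = z            # (C)

    threads = [threading.Thread(target=image, args=(me,))
               for me in range(1, num_images + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed["z"]


final_z = run(4)
```

The loop terminates here only because every update reaches the shared counter; an implementation that may delay visibility indefinitely gives no such guarantee.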
_______________________________________________________________________
Daniel Chen
3) No, for the following reasons:
1. From Van and Reinhold's comments, I think there is indeed an issue with
corank, cobounds and coindex mapping inside a CHANGE TEAM construct.
I think the RECODIMENSION as well as the coassociation proposal should be
further studied and considered.
The following are some minor comments to N1996.
2. [16:] 6.4. Should there be a constraint that is the same as C(604) for the
event variable in an EVENT WAIT statement?
3. [20-22] 7.4.7: SOURCE argument for all the collectives should be a coarray.
N1996 only explicitly states it for CO_BROADCAST.
4. [22:] 7.4.9. It states SOURCE shall not be polymorphic. The same wording
should be added for the RESULT argument.
_______________________________________________________________________
Malcolm Cohen
3) No, for the following reasons.
a. The design is still under active technical development; in particular,
- the team design has not reached consensus, with additional features and
changes being requested,
- the team design needs (at a minimum) much more explanation,
- the event design has been recently changed, and it is far from clear that
the new version is correctly described and sufficient for purpose.
b. Many technical and editorial problems and ambiguities as reported by others.
I continue to be of the opinion that an explicit formal memory model for atomics
would be a very good idea, but would not vote No purely on that alone.
_______________________________________________________________________
Robert Corbett
3) No, for the following reasons.
I am persuaded by the comments of those who voted
earlier that the draft TS is not ready to be forwarded
to SC22.
In addition to those comments, I have an editorial
comment, which by itself would not have caused me to
vote no.
The second sentence of Clause 3 seems to be out-of-place.
The sentence is obviously true, so obviously true that I
wondered why it was worth stating. I was told that it
was needed to make it clear that the name ISO_FORTRAN_ENV
in Clauses 3.3 and 3.4 referred to ISO_FORTRAN_ENV as
extended by the TS. Again, I thought that to be obvious,
but if it is worth stating, it should be stated explicitly.
I suggest deleting the second sentence of Clause 3 and
replacing Clauses 3.3 and 3.4 with
3.3
event variable
scalar variable of the type EVENT_TYPE(6.2) from the
intrinsic module ISO_FORTRAN_ENV as extended by this
Technical Specification.
3.4
team variable
scalar variable of the type TEAM_TYPE(5.2) from the
intrinsic module ISO_FORTRAN_ENV as extended by this
Technical Specification.
_______________________________________________________________________
Bill Long
No, for the following reasons.
I. Minor editorial fixes.
-------------------------
1) In 5.3, [10:17] delete "scalar". {The rule R504 for a team
variable already says "scalar", so it is redundant here.}
2) In 5.3 the paragraph at [10:22-23] effectively prohibits
deallocation of a team variable for an active team construct. This
seems to make [9:34] redundant. Propose to delete [9:34].
3) In 5.4 Note 5.2 line 1, "A(0,N+1)" -> "A(0:N+1)".
4) In 5.5 [11:15] delete "It is an image control statement." and
insert "The FORM TEAM statement is an image control statement." at the
beginning of [12:1]. Then merge the paragraphs [12:1-2] and
[12:3-6]. {Move image control statement bit to para where we discuss
the meaning. Parallel to other subclauses describing statements that
are image control statements.}
5) 5.5 Note 5.4 line 1, replace "coarrays regarded" with
"corresponding coarrays on each image representing parts of a larger
array". {Avoid potential confusion about coarrays being global
objects.}
6) 5.7 Note 5.7 line 2, delete "on modern hardware". {The word
"modern" becomes dated, inconsistent with the nature of a standard.}
7) In 6.3 [15:29] replace "event variable's count" with "count of the
event variable". {Parallel wording to EVENT WAIT.}
8) In 7.1 [17:8] replace "intrinsics" with "intrinsic procedures".
{Subroutines and functions are pure, not 'intrinsics'.}
9) In 7.4, for the OLD arguments at [18:19], [19:6], [19:37], [20:10],
replace "shall be a scalar of type integer with the same kind as ATOM"
with "shall be a scalar and of the same type and kind as
ATOM". {Wording more like ATOMIC_CAS, and allows for future
possibility that additional types are allowed for ATOM.}
10) In 7.4.3 [19:27], replace "prior to the comparison" with "used for
performing the comparison operation". {Clearer and more like similar
wording in other examples.}
11) In 7.4.9 [22:33] replace "continues until" with "terminates
when". {Possibly clearer - current text is not specific about what
more might happen.}
12) In 7.4.13 [24:19], replace "The corresponding actual argument"
with "It". {The argument descriptions for intrinsic procedures are for
the actual arguments. See f2008 [325:5-6].}
13) In 7.5.2 [26:27] after "image index" insert "of the invoking
image". {Clarification}
14) In 8.11 [33:27-28] replace "function" with "subroutine"
twice. {From Dan email.}
15) Noted misc fixes in Reinhold's ballot at [36:15], [36:36], [37:36]
and [37:37], all of which appear valid.
II. More significant fixes/questions.
-------------------------------------
1) In 2 Normative reference, do we need to include references to the
Corrigenda? If we do, how does this affect the Edits clause?
2) In 5.2 [9:34] could be clarified to begin "The team variable
specified in the CHANGE TEAM statement of the current change team
construct...shall not be deallocated." {It is possible for there to
be multiple team variables with the same value. Ones not appearing in
an active CHANGE TEAM statement should be OK to deallocate.} Note: See
I-2 above. If that is accepted, this edit is moot.
3) In 5.4 [11:1-4] replace the first sentence of the para with "If
<team-variable> appears in an image selector its value shall be the
same as the team variable specified in the CHANGE TEAM statement of a
currently executing change team construct or the initial team. The
image index computed using the specified cosubscripts is interpreted
as an image index in the team specified by <team-variable>." {The
wording about FORM TEAM and GET_TEAM is duplicated in [10:19-21].
Furthermore, the original text was unclear that the value relative to
the team is the image index.}
4) In 5.7 [12:24] Is the term "collective activity" well defined?
5) In 5.7, after Note 5.7, should we include a note saying that
continued execution can depend on the nature of the program/algorithm?
6) Subclause 6.5 might not be needed at all depending on the outcome
of the discussion on MAX_COUNT.
7) In 7.3 [17:28-29] This para is overkill. It is allowed, for
example, that the VALUE argument be a coarray, and there is no such
requirement in that case. You could also have a coarray STAT
argument. Needs to be restricted to the argument on which the
collective operation takes place.
8) In 7.4, in the descriptions of the ATOMIC_* subroutines, we use the
"becomes defined with" terminology frequently. In other parts of the
document we have moved to "is assigned". Do we want these changed as
well?
9) In 7.4.9 CO_REDUCE [22:16] the statement "and the function shall be
executed by all images of the current team" is not true. It is
allowed, for example, for just one of the images to do the whole
computation. We intend that, for any image that does execute the
function, it is the same function.
10) In 7.4.9 [22:17] Is it allowed for the RESULT argument to be
polymorphic? Seems not symmetric with SOURCE.
11) 7.4.11 [23:24] In EVENT_QUERY, there should be an ERRMSG argument
as well. Compare with the GET_xxx intrinsics.
12) In 7.4.13 [24:19-20], is the sentence "The corresponding
... ancestors." needed? The sentence is poorly worded, and redefining
what is actually intended here is already prohibited elsewhere.
Propose to delete rather than repair. If that is accepted, I-12 above
is moot.
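Item 9) above can be illustrated by counting which images actually invoke the operation in one possible reduction schedule. This is a sketch in Python under stated assumptions: the name tree_reduce and the binary-tree schedule are illustrative, and an implementation is free to use any other schedule.

```python
def tree_reduce(values, operation):
    """Pairwise tree reduction.

    values[k] is the contribution of image k+1. Returns the reduced value
    and, per image, how many times that image applied 'operation'.
    """
    partial = list(values)
    calls = [0] * len(values)
    stride = 1
    while stride < len(partial):
        for left in range(0, len(partial) - stride, 2 * stride):
            # The image at position 'left' combines its partial result
            # with the one 'stride' positions to its right.
            partial[left] = operation(partial[left], partial[left + stride])
            calls[left] += 1
        stride *= 2
    return partial[0], calls


result, calls = tree_reduce([1, 2, 3, 4, 5], lambda a, b: a + b)
```

With five images this schedule applies the operation four times in total, and three of the images never execute it at all.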
III. Issues not yet resolved.
-----------------------------
1) The MAX_COUNT feature in EVENT POST has problems (expanded from
13-359).
The intention is that operations on the count variable of an event be
atomic. That is easy for a plain EVENT POST (atomic add 1) and EVENT
WAIT (atomic add -1). This also is the case for an EVENT POST with a
COUNT= specifier (atomic fetch-and-add 1) which would provide
potentially useful information to the executing image. Similarly, an
EVENT CLEAR statement could be implemented as (atomic and
0). Alternatively, an EVENT CLEAR could be implemented as an EVENT
WAIT with a CLEAR='yes' qualifier, for example. Or, with richer
semantics as an EVENT WAIT (UNTIL_COUNT = <scalar-int-expr>) form that
would wait until the count got to the indicated level and then
subtract the UNTIL_COUNT value from the event variable count and
complete. These alternatives need consideration, as they provide
useful functionality and can still be implemented atomically.
However, including a MAX_COUNT specifier in an EVENT POST statement
can lead to a race condition. This is fundamentally two operations - a
fetch of the current value, followed by a decision on whether to
increment. It is possible to get around this with repeated retries
with a compare-and-swap operation, but the implementation will be
significantly slower and potentially deadlock. Therefore, I think the
current MAX_COUNT= specifier is problematic and needs repair or
removal. Note that there is a special case that would work - a binary
only version that only sets the count to 1 if it is currently 0
(atomic compare-and-swap). As long as the user never executes a
'non-binary' event post on that event variable this could be usable.
That involves either restricting MAX_COUNT to be 1 if it is specified,
or to change the spelling to something like BINARY='yes', with the
default 'no'.
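The difference between a plain post and a bounded post can be sketched as follows. This is a Python model only: the class AtomicCount, the functions post, post_with_max and hammer, and the use of a lock to emulate hardware atomics are all assumptions made for illustration, not a description of any implementation.

```python
import threading


class AtomicCount:
    """Lock-based stand-in for an atomically updated event count."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def fetch_add(self, delta):
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def post(count):
    """Plain EVENT POST: one atomic add, no retry needed."""
    count.fetch_add(1)


def post_with_max(count, max_count):
    """Bounded post: fetch, decide, then compare-and-swap, retrying on conflict."""
    while True:
        old = count.load()
        if old >= max_count:
            return False                 # already at the limit
        if count.compare_and_swap(old, old + 1):
            return True
        # Another image changed the count between load and swap; retry.


def hammer(num_threads, attempts, max_count):
    """Many concurrent bounded posts; returns (final count, successful posts)."""
    count = AtomicCount()
    successes = []

    def worker():
        for _ in range(attempts):
            if post_with_max(count, max_count):
                successes.append(1)      # list.append is thread-safe in CPython

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count.load(), len(successes)


final_count, successful_posts = hammer(8, 50, 5)
```

The bound holds in this model, but only through the retry loop; the plain post needs a single atomic operation.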
2) EVENT QUERY loose ends (from 13-359).
In 7.4.11 EVENT_QUERY, the COUNT argument is assigned the value 0 if
an error occurs. Not very informative. Perhaps count=-1 would be more
useful in the error case.
In 7.4.11 EVENT_QUERY, if the STATUS argument is not present and an
error condition occurs, does the program terminate? It appears
not. That is the same as for GET_COMMAND and friends with a STATUS
argument. But the opportunities for failure here are greater (EVENT
image is failed, for example). Should a valid value be
STAT_FAILED_IMAGE?
3) Deallocation of a saved coarray at the end of a CHANGE TEAM
construct (from 13-359).
Note 5.1 explains that an implementation is responsible for
deallocating coarrays at the end of a CHANGE TEAM construct. This is
not trivial, since a coarray with the SAVE attribute that is allocated
in a called subprogram will need to be tracked by the runtime in case
the subprogram is called inside a CHANGE TEAM construct. No suggestion
for a change - just a heads up to implementors.
4) Do we want a cobounds remapping facility? This would be a new
feature. Background and discussion follows.
In N1996, the TS 18508 draft from J3 meeting 202, the facility
provided by the modified image selectors allows references to coarrays
on images that are not part of the current team. This is enabled by
syntax that specifies a different team that is in effect for that
reference. The team has to be an ancestor of the current team, and
include the image specified.
The cobounds for a coarray can only be specified in a declaration or
allocate statement. Changing to a different team does not alter the
cobounds or corank of an existing coarray. A coarray has only one set
of bounds at a given time, and only allocatable coarrays can change
their cobounds during program execution.
The identification of the correct physical PE containing the coarray
being referenced using the new syntax involves two steps: Using the
specified cosubscripts and the current cobounds for the coarray, an
image index is computed. The image index is then converted to a
physical PE by a team-specific mapping.
Suppose a coarray,
REAL :: A(:)[N1,N2,*]
exists (either static, or allocatable and allocated with those
cobounds) on each image on entry to a CHANGE TEAM construct. For
statements executed during execution of the CHANGE TEAM construct:
Case 1: No team is specified in the reference:
X(:) = A(:)[i,j,k]
This reference is relative to the current team. The cobounds used to
compute the image index are the ones that existed when the CHANGE TEAM
construct began. If the computed image index is outside the range
1..num_images() for the current team, the reference is in error. If
the image index is in the valid range, the mapping between image
indices and physical PE for the current team is used to identify the
physical PE containing the referenced coarray. The computation of the
correct coarray location is unambiguous in this case, though the
selection of the values [i,j,k] might not be intuitive.
Case 2: An ancestor team, pteam, is specified in the reference:
X(:) = A(:)[pteam :: i,j,k]
This reference is relative to the team specified by the value of the
team variable pteam. The computation of the image index is exactly the
same as in Case 1, with the current cobounds of A(:) used. The value
of num_images() used in the range check for a valid image index is the
number of images in team pteam. The image identified has to be an
image that is part of team pteam; otherwise an error occurs. The
mapping between the computed image index and a physical PE location
for A is the one for team pteam.
For the team-modified image selector syntax to work, the
implementation would need to keep track of the mapping and
num_images() information for all ancestors of the current team, and
associate that with the team variable. This is probably the case
anyway. It is not necessary to keep track of cobound information
separately for each team - that information is tied to the coarray,
not the team.
As noted in N1983, in the comments on the previous TS draft from Van
Snyder, the correct values for the cosubscripts in Case 1 are not
intuitive unless the corank is one. The existing team-modified syntax
in Case 2 does not address that problem.
A facility enabled by a RECODIMENSION statement has been discussed on
coarray-ts to address this problem. A RECODIMENSION defines the
current cobounds for a coarray that exists during execution of the
construct and is associated with the previously existing coarray of
the same name. The cobounds and corank of the construct coarray may be
different from those of the existing coarray. The association is
similar to argument association. This is superior to actual argument
association in that a procedure call is not involved. How use of this
feature would affect the ability to access the corresponding coarray
on an image outside the current team (using a team-modified image
specifier) is not quite as clear.
Alternatively, a syntax similar to the associate construct, as
suggested by Van, could be employed. That has the advantage of using a
different name for the construct entity, which would permit use of the
original name for accesses outside the current team.
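The two-step lookup described in Cases 1 and 2 above can be sketched for the corank-3 coarray A(:)[N1,N2,*]. This is a Python model under stated assumptions: the names image_index and resolve, the lists of physical PEs, and the choice N1 = N2 = 2 are all hypothetical and serve only to show that the cobounds belong to the coarray while the range check and the mapping belong to the team.

```python
def image_index(i, j, k, n1, n2):
    """Step 1: image index for cosubscripts [i,j,k] with cobounds [n1,n2,*]."""
    if not (1 <= i <= n1 and 1 <= j <= n2 and k >= 1):
        raise ValueError("cosubscript outside cobounds")
    return 1 + (i - 1) + (j - 1) * n1 + (k - 1) * n1 * n2


def resolve(cosubscripts, n1, n2, team_pes):
    """Step 2: map the image index to a physical PE of the given team.

    team_pes lists the physical PEs of the team in image-index order, so
    len(team_pes) plays the role of num_images() for that team.
    """
    index = image_index(*cosubscripts, n1, n2)
    if index > len(team_pes):
        raise IndexError("image index exceeds num_images() of the team")
    return team_pes[index - 1]


# Hypothetical layout: pteam has 8 images, the current team has 4 of them.
pteam_pes = [0, 1, 2, 3, 4, 5, 6, 7]
current_pes = [0, 2, 4, 6]

case1 = resolve((2, 1, 1), 2, 2, current_pes)   # A(:)[2,1,1]
case2 = resolve((2, 1, 1), 2, 2, pteam_pes)     # A(:)[pteam :: 2,1,1]
```

The same cosubscripts give the same image index in both cases but a different physical PE, and cosubscripts such as [1,1,2] are valid relative to pteam yet out of range for the smaller current team.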
_______________________________________________________________________
Nick Maclaren
3) No, for the following reasons.
I regard comments A, B, H, I, J and K as the most serious, as they are
either not fixable by additional function or wording changes or certain
to cause massive problems.
Many of the comments in N1989 have not been addressed. These include
(with some slight modifications):
Teams
-----
Comment A
---------
5.2 and 5.3, p9:16-*, p10:*-35. It is still not clear whether
TEAM_TYPE objects have value or association semantics. C502 and C503
are not enough, because of the implicit copying implied by passing
assumed-shape arrays to explicit-shape or assumed-size ones, and the
wording (e.g. R502) says 'variable'. This is linked to the next point,
but is not the same.
However, I forgot the VALUE attribute and vector subscripts. Fortran
2008 12.5.2.3p4 and 16.6.1.6p4 make it very clear that a VALUE dummy
argument and dummies corresponding to vector subscripted arrays are NOT the
same variable as the actual argument. While it could be said that these
are variable definition contexts, they are NOT in the list in 16.6.5.
Is it permitted to have VALUE dummies, or vector subscript actual
arguments, and where is that stated in normative text?
Either the above loopholes must be closed, or TEAM_TYPE variables must
be stated to have value semantics (in which case forbidding assignment
is not needed). I cannot propose edits, as I have never discovered
what mental model other people are using.
Comment B
---------
5.3, p10:28-35. Executing a common CHANGE TEAM statement the same
number of times is not enough, because the variable could be a dummy
argument associated with a different team on different images. There
needs to be an explicit restriction (probably in lines 14-16) that all
variables must have been created by the same execution of the same FORM
SUBTEAM statement with the same team-id.
TYPE(TEAM_TYPE) :: a[NUM_IMAGES()]
DO i = 1,NUM_IMAGES()
    FORM TEAM (i,a(i))
END DO
CALL Fred(a(i))

SUBROUTINE Fred (x)
    TYPE(TEAM_TYPE) :: x
    CHANGE TEAM (x)
    ...
I cannot find anywhere in the text that is forbidden, but it clearly
makes no sense. In particular, 5.3 p10:28-35 becomes nonsense if it
is allowed. This is simple to fix.
5.3 p10:19-21. After "intrinsic subroutine GET TEAM (7.4.13).", add:
"All members of the team specified by team-variable shall execute
the CHANGE TEAM statement, and team-variable shall specify the same
team on all images."
Comment C
---------
5.6 p11:17+. There is nothing said about when resources may be
released, and no mechanism for the user to free them. This is not
reasonable, and there needs to be some defined way for a programmer to
avoid memory leaks when using FORM SUBTEAM heavily. Note that allowing
deallocation is NOT enough, as cleaning up teams needs synchronisation,
just as creating them does.
Comment D
---------
7.4.15 p26:5-7. I can find no guarantee that the subteam id. is
assigned in a defined order, and hope that is not the case. The example
comments should say "Code for half of the images in the current team"
and "Code for the other half of the images in the current team".
Events
------
Comment E
---------
6.3 p15:34. This still makes no sense, as an image control statement
cannot occur within a segment! It should say something like "How
sequences of posts that are not ordered by other segment ordering rules
interleave with each other is processor dependent."
Comment F
---------
7.4.11 p23:25-36. It needs to say that EVENT_QUERY may be used in
segments that are unordered with respect to EVENT POST on the same
variable.
Collectives
-----------
Comment G
---------
7.4.9 p22:16. This makes no sense and does not address the comment
in N1989, anyway. A reduction over N images needs only N-1 pairwise
operations. It would be far better to leave it completely open and
change:
", and the function shall be executed by all the images of the
current team."
to:
". It is unspecified on which images it will be called, how
many times and on which arguments."
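To illustrate the N-1 count, here is a minimal Python sketch (a model only, not Fortran and not taken from N1996) of a pairwise tree reduction that counts how often the user function is invoked. The name tree_reduce and the particular tree schedule are assumptions made for illustration; a processor is free to choose any other schedule.

```python
import operator


def tree_reduce(values, op):
    """Pairwise (tree) reduction; returns (result, number of calls to op)."""
    calls = 0
    level = list(values)
    while len(level) > 1:
        nxt = []
        # Combine neighbouring pairs; each combination is one call to op.
        for i in range(0, len(level) - 1, 2):
            nxt.append(op(level[i], level[i + 1]))
            calls += 1
        if len(level) % 2:
            nxt.append(level[-1])  # the odd one out is carried up unchanged
        level = nxt
    return level[0], calls


# Seven contributions, one per hypothetical image.
total, calls = tree_reduce(range(1, 8), operator.add)
print(total, calls)  # prints: 28 6
```

With 7 contributions the function is called 6 times, so under any schedule at least one of the 7 images has no need to call it at all.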
New Substantive Points
----------------------
EVENT POST MAX_COUNT
--------------------
Comment H
---------
Upon thinking of how to implement these facilities, I realise that the
availability of MAX_COUNT causes a serious performance loss. Without
that, EVENT POST can be implemented by a simple fence and message sent
to the event owner. With that, it needs to wait for a response from the
event owner, which will often cause the posting image to block until the
owning image reaches an active coarray statement. This is also noted in
13-359.
Requiring a maximum count of 1 would have the same loss in performance,
but it would also reduce the model to one whose semantics are
understood. As I have said before, I would regard that as a price worth
paying.
In short, I think that MAX_COUNT is a very bad idea, as it combines
the disadvantages of both general and binary semaphores.
EVENT_QUERY
-----------
Comment I
---------
There have been multiple inconclusive Email debates on exactly what is
specified, with no consensus on what should be said in normative text.
I have been convinced by them that it is not possible to produce a
consistent specification for EVENT_QUERY without introducing a
synchronisation model by the back door. This is particularly serious
because of the EVENT_QUERY example in A.2.1 pp37-8.
One of the issues raised was whether programs like the following are
conforming:
Example event_1:
INTEGER :: x[*]
On image 1                On image 2
POST EVENT (q[2])         CALL EVENT_QUERY (q, n)
x[3] = 123                IF (n >= 2) THEN
POST EVENT (q[2])             WAIT EVENT (q)
                              x[3] = 456
                          END IF
Similarly, it is unclear whether the following program is required
to complete:
Example event_2:
On image 1                On image 2
POST EVENT (q[2])         DO
                              CALL EVENT_QUERY (q, n)
                              IF (n > 0) EXIT
                          END DO
Also, does the same answer hold if image 1 is the event owner?
These are NOT minor points, because they have a major impact on how
EVENT_QUERY can be implemented. In particular, if those examples are to
work, EVENT_QUERY has to be implemented using very similar mechanisms to
EVENT POST with MAX_COUNT=0. Even if we restricted EVENT_QUERY to local
operation, it would have to probe for incoming posts for example event_2
to work. Example event_1 would not add any extra inefficiency, but
would complicate the logic for synchronisation on many systems.
At this late stage, I think the only feasible solution is to omit
EVENT_QUERY entirely, pending a memory model.
Atomic Subroutines
------------------
Comment J
---------
A.3.2 is very welcome, and clarifies the current atomic subroutines
considerably. Unfortunately, it is not enough to avoid problematic
issues with the new atomic subroutines.
The underlying problem is that the word 'atomic' has many possible
meanings, which have drifted over time, and not all of them make sense
with the new atomic subroutines. There is an official ISO dictionary, but I
have not been able to access a copy. Either Fortran needs to refer to
some reasonably authoritative and explicit definition or it needs to
define what it means. In particular, it has more-or-less the following
meaning:
An operation completes in its entirety or makes no change to system
state, without any other agent being able to see an intermediate
condition, but without ANY implication of data consistency.
This is one of the meanings used by Intel (see the Intel 64 and IA-32
Architectures Software Developer's Manual, volume 3, 8.1, Locked Atomic
Operations and 8.1.1 Guaranteed Atomic Operations).
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
However, there is a problem here, which is whether the term 'atomic'
implies 'coherence', which essentially means that updates cannot simply
get lost even if they occur in parallel. This was not needed when
atomic operations were used for interrupt handlers, but was rapidly
discovered to be critical for parallelism when multi-cpu computers
started to be used. The experts I have contacted have told me that
modern computer science convention is to assume it, but that it is not
implicit and any rigorous specification should state it explicitly.
The C++ standard does just that (see 1.10 Multi-threaded executions and
data races, paragraph 6).
Note that Intel atomic accesses are coherent by default, but incoherent
atomic accesses are possible if the above rules are followed but the
MTRR of the memory is set to WB (see 8.2.5 Strengthening or Weakening
the Memory-ordering Model).
With the existing atomic subroutines, the lack of coherence is not
observable, provided that it does not cause two simultaneous atomic
definitions to fail. However, without coherence, even simple use of
(say) ATOMIC_ADD to accumulate totals is likely to give the wrong
answer. See the next comment for a proposed solution.
Comment K
---------
Unfortunately, the above problem is made worse by the new atomic
subroutines, which are functionally equivalent to OpenMP's 'capture'
atomics. I enquired of the same experts; the reason that most papers
do not describe composite operations like update and capture is that
doing so is much harder than for simple loading and storing; in
particular, C++ does not have any such concepts in its memory model.
Take the following program:
INTEGER(ATOMIC_INT_KIND) :: x[*] = 0
INTEGER :: n = 0
On image 1                      On image 2
CALL ATOMIC_OR(x[3],z'1',n)     CALL ATOMIC_OR(x[3],z'1',n)
PRINT *, n                      PRINT *, n
Printing 0 and 0 could very reasonably be said to be a valid
optimisation, not least because the assignment to the OLD value is not
part of the atomic operation. Indeed, the same could be said even if
image 2 ORed z'2' into the value rather than z'1'.
Consider the following program:
INTEGER(ATOMIC_INT_KIND) :: x[*] = 0
INTEGER :: n = 0
On image 1 On image 2
CALL ATOMIC_OR(x[3],z'1',n) CALL ATOMIC_OR(x[3],z'2',n)
PRINT *, n
CALL ATOMIC_OR(x[3],z'4',n) CALL ATOMIC_OR(x[3],z'8',n)
PRINT *, n
This can obviously print 8 and 9, but is it allowed to print 0 and 3?
And, worse, is it allowed to print 4 and 12?
Fortran must do the same as C++ and say what it means, even if it does
not specify a memory model; allowing such lunacies as the above (and
they ARE plausible optimisations, even the second) is a recipe for
massive user confusion. A possible solution would be to modify C++'s
rule, and change sentence 2 of Fortran 2008 13.1 paragraph 3 from:
"The effect of executing an atomic subroutine is as if the
subroutine were executed instantaneously, thus not overlapping other
atomic actions that might occur asynchronously."
to
"The effect of executing atomic subroutines on a single atomic
object is as if the subroutines were executed in some unspecified
serial order, with none of the accesses to that object in any one
subroutine execution interleaving with those in any other."
I believe that is the bare minimum necessary for sanity.
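What a serial-order rule of this kind would permit for the first program in this comment (both images ORing z'1' into x[3]) can be checked mechanically. The following Python sketch is a model only: serial_outcomes is a hypothetical helper, and ATOMIC_OR is assumed to behave as an indivisible fetch-then-OR that returns the previous value in OLD. It enumerates every serial order that respects each image's own program order.

```python
from itertools import permutations


def serial_outcomes(programs, initial=0):
    """Enumerate the OLD values produced by every serial order.

    programs: one list of OR masks per image, in program order.
    Returns a set of tuples; entry [img][k] is the OLD value seen by
    the k-th fetch-or executed by image img.
    """
    ops = [(img, k) for img, prog in enumerate(programs)
           for k in range(len(prog))]
    outcomes = set()
    for order in permutations(ops):
        # Discard orders that reorder operations within one image.
        if any(order.index((img, k)) > order.index((img, k + 1))
               for img, prog in enumerate(programs)
               for k in range(len(prog) - 1)):
            continue
        x = initial
        old = [[None] * len(prog) for prog in programs]
        for img, k in order:
            old[img][k] = x        # fetch the previous value ...
            x |= programs[img][k]  # ... and OR in the mask, as one step
        outcomes.add(tuple(tuple(o) for o in old))
    return outcomes


# First program: image 1 and image 2 each OR z'1' into x[3].
first = serial_outcomes([[0x1], [0x1]])
print(sorted(first))  # prints: [((0,), (1,)), ((1,), (0,))]
```

Under that assumption exactly one image receives 0 and the other receives 1; the outcome 0 and 0 does not occur in any serial order.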
_______________________________________________________________________
David Muxworthy
3) No, for the following reasons.
Clearly, consensus on the design has not yet been achieved. Whether
the eventual design can be implemented satisfactorily on multiple
platforms is still to be proved. The statement about inclusion in the
next revision of ISO/IEC 1539-1 (Introduction paragraph 5) should
refer instead to "a future revision".
______________________________________________________________________
John Reid
3) No, for the following reasons.
1.
Rather hurriedly in Delft, we added the option of a team variable appearing
in an image selector, e.g., a[parent::i,j]. The intention was to allow the
cosubscripts of a coarray declared in an ancestor to be interpreted in
exactly the same way in a change team construct as in the ancestor, for
example, when performing halo exchanges.
This does not work well if the coarray is a dummy argument because its
name and the names of its ancestors are unknown. J3 therefore added the
intrinsic GET_TEAM to place a copy of the value of the team
variable of the ancestor at a level DISTANCE in a local team variable.
It seems to me that a much better solution would be to specify the
distance directly in the image selector, e.g., a[distance::i,j]. It is
much simpler and there is far less scope for inconsistent setting of
team variables. I would like GET_TEAM to be removed entirely.
2.
We have added an optional DISTANCE argument to NUM_IMAGES. We need to do
this also for LCOBOUND and UCOBOUND so that the coshape of an ancestor
can be determined.
3.
Add a new subroutine FAIL_IMAGE() whose effect is to cause the executing
image to behave as failed. This is needed for the testing of a program
that is intended to continue execution in the presence of failed images.
An optional argument IMAGE might be added to give the effect of
communication between the executing image and image IMAGE having failed -
each would continue executing but see the other as failed.
4.
I support the concept of RECODIMENSION that Reinhold Bader suggests in his
ballot. I also support the view that Malcolm Cohen expressed in an email:
"I would prefer different syntax to be used when one intends to
re-codimension an array, perhaps
RECODIMENSION :: ...whatever
and this would not be a general specification statement, but part of the
CHANGE TEAM syntax which would then be
<change-team-stmt>
[ <recodimension-stmt> ]...
<block>
<end-change-team-stmt>
And rather than "modifying the attribute of an existing object"
(horrible), it would be declaring a "construct entity that is associated
with the local entity of the same name". We do already have construct
entities that are associated with outer-scoped objects (via ASSOCIATE
and SELECT TYPE) so this is not a new concept.
In any case one must write quite a bit of new text to specify how this
is going to work, but making it a construct entity is probably easier
than making the CHANGE TEAM construct into a scoping unit thus wheeling
out host association (already complicated) and then adding more
complication to it.
> I suppose a logical question would be whether this should also be
> allowed in a BLOCK construct. Perhaps that should be left as an
> integration issue.
No, it cannot be left as an integration issue. It should be either part
of the CHANGE TEAM syntax (and described as a construct entity), or a
normal specification-stmt in which case the CHANGE TEAM construct ought
to be a scoping unit with a specification-part. Or some slight tweak of
those major options."
5. (An edit)
[11:15-16] Consider the sentence "The value of team-id specifies the team
to which the executing image belongs." This is nonsense: the current
team is the team to which the executing image belongs. Replace it by
"The value of team-id specifies the new team to which the executing image
will belong."
_______________________________________________________________________
Van Snyder
3) No, for the following reasons.
I have been on holiday in Asia for most of November. I have not had the
time to study the entire DTS in detail. Therefore, I comment here on
only one aspect of the DTS.
I remain unconvinced that teams have been correctly designed, but if so
they are not sufficiently well described. The phrase "image indices are
relative to the current team" in Subclause 5.1 does not adequately
explain what parts of a coarray are accessible in a subteam. The
mapping from coarray coelements accessible in the parent team, and their
cosubscripts, to coarray coelements accessible in the subteam, and their
cosubscripts, needs to be more explicitly explained.
More importantly, it is not possible to change the coextents of leading
codimensions of coarrays of corank greater than one when a subteam
commences execution. This means that teams are not very useful if one
has coarrays of corank greater than one.
Finally, it is not possible for a subteam to access more coelements than
the number of images in the subteam. This makes it difficult to handle
cross-boundary effects in, say, an elliptic PDE problem, without using
cosubscripts relative to a specific ancestor team. This is a
fundamentally bad idea, that is antithetical to one of the reasons
advanced for providing teams: software reuse.
Reinhold proposed a RECODIMENSION statement. This would presumably have
an effect on leading codimensions analogous to the effect of a DIMENSION
statement on leading dimensions of a non-coarray dummy argument of a
procedure.
The attached text file provides more detail concerning what I believe to
be the problems, and proposes a scheme of coarray coassociation similar
to the association established by the ASSOCIATE construct.
.......................................................................
1. Problem description
----------------------
Within a subteam created by a CHANGE TEAM construct, it is desired to
access a portion of a coarray belonging to the parent team, using
cosubscripts such that the range of accessible coelements, taken in
coarray coelement order, depends upon the subteam. It is undesirable to
require the subteam to be aware of the mapping from coelements of the
parent coarray to coelements germane to the subteam.
The phrase "image indices are relative to the current team" in Subclause
5.1 does not adequately explain what parts of a coarray are accessible
in a subteam. The mapping from coarray coelements accessible in the
parent team, and their cosubscripts, to coarray coelements accessible in
the subteam, and their cosubscripts, needs to be more explicitly
explained. If this is described elsewhere, it needs to be in Subclause
5.1.
For example, if one forms a subteam using 1+mod(this_image(),2) for the
<team-id>, it is not obvious that the coelements of coarrays accessible
in each subteam are the odd-numbered and even-numbered coelements of
coarrays in the parent team, taken in the parent team's coarray
coelement order (a concept we have not defined).
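As a concrete illustration of the mapping in question, the Python sketch below (a model, not Fortran) groups parent image indices by the team-id 1+mod(this_image(),2). It assumes, which the draft does not guarantee, that images are numbered within each new team in order of their parent image index; form_subteams is a hypothetical name.

```python
def form_subteams(num_images):
    """Group parent image indices (1-based) by team-id = 1 + mod(image, 2)."""
    teams = {}
    for image in range(1, num_images + 1):
        team_id = 1 + image % 2
        teams.setdefault(team_id, []).append(image)
    return teams


teams = form_subteams(8)
for team_id in sorted(teams):
    # Position in each list (counting from 1) stands for the image index
    # within that subteam, under the ordering assumption stated above.
    print(team_id, teams[team_id])
```

For 8 images this lists parent images 2, 4, 6, 8 under team-id 1 and parent images 1, 3, 5, 7 under team-id 2.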
More importantly, it is impossible to change the coextents of the
leading codimensions of coarrays of corank greater than one when a subteam
commences execution. Suppose one has 100 images and a coarray with
coextent [1:10,1:10]. Suppose one wishes to divide the current team
into four subteams of 25 members each, each accessing a quadrant of the
coarray with coextent [1:5,1:5]. All subteams would access the coarray
with the leading coextent declared in the parent team, in this case
[1:10]. We have no concept of copointers, coassociation, copointer
corank remapping, cosections, or coassociation during procedure
reference or execution of an ASSOCIATE or SELECT TYPE statement,
analogous to pointer association, argument association, or construct
association for arrays. This means either that teams are not very
useful if one has coarrays of corank greater than one, or a subteam must
be made aware of at least some properties of the mapping from the parent
team to the subteam, analogously to the way that subprograms having
explicit-shape dummy arguments need to be told what parts of leading
dimensions to use.
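The effect of the inherited leading coextent can be seen from the usual mapping between image index and cosubscripts (first codimension varying fastest). The Python sketch below is a model only; cosubscripts is a hypothetical helper, and it assumes the 25 images of a subteam are numbered 1 to 25 within that subteam.

```python
def cosubscripts(image, lcobounds, leading_coextents):
    """Cosubscripts of a 1-based image index, first codimension fastest.

    lcobounds: lower cobound of every codimension.
    leading_coextents: coextent of every codimension except the last,
    whose upper cobound is open ([..., *]).
    """
    remainder = image - 1
    subs = []
    for lower, extent in zip(lcobounds, leading_coextents):
        subs.append(lower + remainder % extent)
        remainder //= extent
    subs.append(lcobounds[-1] + remainder)
    return tuple(subs)


# Subteam of 25 images that keeps the parent's leading coextent of 10.
inherited = [cosubscripts(i, (1, 1), (10,)) for i in range(1, 26)]
# The same 25 images if the leading coextent could be redeclared as 5.
wanted = [cosubscripts(i, (1, 1), (5,)) for i in range(1, 26)]

print(inherited[-1], wanted[-1])  # prints: (5, 3) (5, 5)
```

With the inherited coextent the 25 images appear with cosubscripts running over [1:10,1:3], with only five images at the final value of the second cosubscript, rather than over the [1:5,1:5] quadrant that was intended.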
A third problem is that it is not possible to access coelements outside
the mapping for the current team, or for parts of a coarray to be
accessible in more than one subteam using subteam-relative cosubscripts.
For example, in the above problem, one might wish to divide the
[1:10,1:10] coarray into pieces with coextents [1:6,1:6], with the first
team having coelements [1:6,1:6], the second having [4:10,1:6], the
third having [1:6,4:10], and the last having [4:10,4:10]. This makes it
difficult to handle cross-boundary effects between regions of, say, an
elliptic PDE problem. The present DTS requires the use of cosubscripts
whose values apply to a specific ancestor team. This is a fundamentally
bad idea, that is antithetical to one of the reasons advanced for
providing teams: software reuse.
3. Proposal
-----------
An addition to the syntax of the CHANGE TEAM statement, analogous to the
ASSOCIATE statement, could specify coassociation.
For example, assuming s = 1+mod(this_image(),2) one might use the
following to associate A1 with the odd (even) coelements of A.
change team ( t(s), a1 => a[s:*:2] )
! Herein, A1 is a coarray that is coassociated with the
! odd-numbered coelements of A in subteam 1 and the even-numbered
! ones in subteam 2, of which there are n/2, and therefore
! cosubscripts of A1 in the range 1:n/2 access the expected
! coelements of A.
...
end team
The mapping from A to A1 is not necessarily the same as the mapping
established by the FORM TEAM statement. If it is necessary for the
mappings to correspond, that should be explicitly required. A STAT=
specifier value could indicate a mismatch.
In the case of a coarray of corank greater than one, one might compute
cobounds that depend upon the subteam id, and do something like
change team ( t2(s1,s2), c1 => c[i1:i2,j1:*] )
One could calculate the cosubscripts to handle the problem of a
cosection belonging to more than one subteam. For example, one subteam
might have i1:i2 == 1:6 while another has i1:i2 == 4:10. This would be
inconsistent with the proposition that the mapping shall correspond to
the one implied by the FORM SUBTEAM statements that created the team
variable.
Vector cosubscripted cosections would not be an insurmountable problem
here (until A1 or C1 is an actual argument corresponding to a coarray
dummy argument, which should perhaps be prohibited), because the
processor clearly could see the vector.
_______________________________________________________________________
Stan Whitlock
3) No, for the following reasons.
From several different comments and discussions, I think there are issues
with corank, cobounds, and coindex mapping inside a change team construct.
The recodimension and the coassociation proposals also appear to need
further work.