(j3.2006) (SC22WG5.5405) TS revision and straw vote
John Reid
Thu Jan 1 13:40:51 EST 2015
WG5,
I attach the following documents:
N2039 Response to the WG5 straw ballot on N2033 (Long and Reid)
This contains responses to the straw ballot comments, taking account of
emails on the coarray discussion list.
N2040 Draft TS 18508 Additional Parallel Features in Fortran (Long)
- supersedes N2033
This is a new draft TS, incorporating the edits in N2039 and one
additional minor edit from the editor.
N2041 Editor's report for WG5/N2040 (Long)
This contains the editor's report on all the changes from N2033.
N2044 WG5 straw ballot on N2040 (Reid)
This is a new straw ballot with deadline 17 January.
Note that there are far fewer changes this time than last time.
They address all the straw vote comments except those of Nick Maclaren.
Happy New Year!
John.
-------------- next part --------------
ISO/IEC JTC1/SC22/WG5 N2044
WG5 straw ballot on N2040
John Reid, 1 January 2015
This is a WG5 straw ballot on N2040, the seventh draft DTS for
TS 18508, Additional Parallel Features in Fortran.
N2040 was prepared by the editor, Bill Long, to accord with the
response in N2039 to the straw ballot result in N2038. Details of all
the changes made from the previous draft, N2033, are given in N2041.
Please answer the following question "Is N2040 ready for forwarding to
SC22 as the DTS?" in one of these ways.
1) Yes.
2) Yes, but I recommend the following changes.
3) No, for the following reasons.
4) Abstain.
This is an individual vote. Please send your vote to sc22wg5 at open-std.org
to arrive by 9 a.m. (UK time) on 17 January 2015.
-------------- next part --------------
ISO/IEC JTC1/SC22/WG5 N2039
Response to the WG5 straw ballot on N2033
Bill Long and John Reid
This paper contains responses to the comments in the WG5 straw ballot
on N2033 (see N2038) and a set of edits to N2033.
Reinhold Bader (A) wrote
"Now that the TS has the concept of stalled images, I think that image
control statements without a STAT= specification that involve a
failed image could now relatively easily be made to result in the
executing image becoming stalled, instead of terminating the program.
This would make development of fail-safe packages much easier,
because the fail-safety can be designed in a top-down manner i.e.
library code that synchronizes, allocates or deallocates must not
necessarily be modified."
Response
We see the inclusion of a STAT= specification in an image control
statement as desirable and believe that this will not have a big
effect on performance. This is in contrast to protecting against
remote references on failed images, for which there is no syntax
and which would have performance overheads. Furthermore, we now
have a fairly simple model of image control statements without a
STAT= specifier - if successful they form segment boundaries and
can be used for segment ordering and hence data consistency,
or they fail and the program aborts, making data consistency
issues irrelevant. Therefore, we do not favor the extension of
stalling to image control statements without a STAT= specification.
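As an illustration of the two cases in this model, here is a minimal
sketch that uses only Fortran 2008 syntax. The program name and the
variable ierr are hypothetical, chosen for the example; nothing in it
is taken from the text of the TS.

```fortran
program sync_stat_sketch
  use, intrinsic :: iso_fortran_env, only: stat_stopped_image
  implicit none
  integer :: ierr   ! hypothetical name for the STAT= variable

  ! With STAT=, an error condition is reported in ierr and the
  ! executing image continues, so the program can decide what to do.
  sync all (stat=ierr)
  if (ierr == stat_stopped_image) then
    print *, 'image', this_image(), ': some image has stopped'
  else if (ierr /= 0) then
    print *, 'image', this_image(), ': sync all failed, stat =', ierr
  end if

  ! Without STAT=, the statement either succeeds and forms a segment
  ! boundary, or an error condition initiates error termination.
  sync all
end program sync_stat_sketch
```

On a run in which no image stops or fails, both statements simply
synchronize and nothing is printed.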
.......
Reinhold Bader (B) wrote
(B) Section 7.
It is not specified what happens if no STAT argument is specified.
Response
We suggest this edit (copied from [18:24-25])
[17:32+] Add new para
"If a condition occurs that would assign a nonzero value to a STAT
argument but the STAT argument is not present, error termination
is initiated."
.......
Reinhold Bader (D) wrote
(D) Collective intrinsics CO_BROADCAST (7.4.10) and
CO_REDUCE (7.4.13)
There still seems to be some missing text with respect to invoking
these intrinsics with objects of derived type that have POINTER
components.
Response
He is concerned about a pointer appearing to be associated with
a target on a remote image. This can already happen in Fortran
2008 and such a pointer is regarded as undefined, see 16.5.2.5
of the Standard:
"The association status of a pointer becomes undefined when
...
(2) the pointer is pointer-assigned to a target on a different image,"
No edits to N2033 are needed.
.......
Reinhold Bader (C) and (E) are relevant only if the change proposed in
(A) is accepted. Thus, there are no separate responses for these
comments.
.......
Tobias Burnus wrote
The DTS does not address finalization of CO_BROADCAST and CO_REDUCE for
derived types which have finalizers.
For CO_BROADCAST, simply adding a statement like the following should
be sufficient and implementation wise, it should be simple as one can
simply finalize it before the actual data transfer: In the description
of "A" append: "On all images of the current team but on the image
specified by SOURCE_IMAGE, A is finalized before it becomes defined."
For CO_REDUCE, the implementation will be more difficult; still, I
believe it makes sense to require finalization.
Response
For CO_BROADCAST, we already have the words [22:24] "A becomes defined,
as if by intrinsic assignment, ..." and the standard states in 4.5.6.3,
para 9: "When an intrinsic assignment statement is executed,
the variable is finalized after evaluation of expr and before the
definition of the variable." Therefore, no change is needed.
For CO_REDUCE, these edits are needed:
[24:22] After "to A" add "as if by intrinsic assignment".
[24:23] After "to A" add "as if by intrinsic assignment".
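The rule quoted from 4.5.6.3 can be observed on a single image with
ordinary intrinsic assignment. The following is a minimal sketch; the
module, type, and procedure names are hypothetical, and it does not
call CO_BROADCAST or CO_REDUCE. It shows only the "as if by intrinsic
assignment" behaviour that the words rely on.

```fortran
module counted_mod
  implicit none
  type :: counted                  ! hypothetical type with a finalizer
    integer :: id = 0
  contains
    final :: counted_final
  end type counted
contains
  subroutine counted_final(x)
    type(counted), intent(inout) :: x
    ! Report which object is being finalized.
    print *, 'finalizing object with id', x%id
  end subroutine counted_final
end module counted_mod

program finalize_on_assignment
  use counted_mod
  implicit none
  type(counted) :: a, b

  a%id = 1
  b%id = 2
  ! Intrinsic assignment: per 4.5.6.3, a is finalized after the
  ! right-hand side is evaluated and before a becomes defined.
  a = b
  print *, 'after assignment, a%id =', a%id
end program finalize_on_assignment
```

On a processor that implements 4.5.6.3 in full, the final subroutine
should be invoked for a while it still holds its old value. Compiler
support for finalization has varied, so the observed behaviour may
differ on older processors.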
.......
Malcolm Cohen wrote
I agree with Robert Corbett's vote.
I am somewhat taken aback that we've suddenly added this new concept
(stalled images) with far-reaching effects (and more proposed in other
comments) at the last minute.
It needs to be clear that it is possible to implement the "reliability"
(failed/stalled/whatever image) features efficiently on a variety of
architectures. It should not require incompatible changes to an existing
coarray implementation (which the current draft certainly seems to do). I
have no problem with some "bells and whistles" potentially requiring extra
work, but a reasonably effective subset needs to be workable without heroic
efforts, and without affecting programs that do not use the feature.
Response
See our responses below to Robert Corbett's vote.
.......
Malcolm Cohen wrote
Re finalization, I agree with Tobias Burnus' comments that it would be good
for this to be spelled out in detail for CO_BROADCAST and CO_REDUCE. For
the latter it should say that the result of applying the function is
finalized, including the final function application, (the latter is as if
the output variable were assigned an expression that is the last function
reference). It should, I think, also be stated that the finalizations of
the intermediate function results are done on the image that actually
invoked the function, so that any deallocations are handled by the image
that did the allocations.
Response
See our responses below to Tobias Burnus' comments.
.......
Robert Corbett wrote
My primary objection is to the requirements given in the third
paragraph of Clause 5.9 [14:19-23]. I do not see how the specified
semantics can be implemented without compromising the performance of
codes that do not make use of the feature. I am not certain that
the semantics can be implemented at all in some common environments.
Response
The intention was that implementations be permitted not to support
transfer of control to the END TEAM statement, but the present wording
does not say this. We think these edits are needed to address this
comment and your other comments:
[14:9] Change "detect that an image has stalled" to "manage a stalled
image".
[14:19-25] Replace these two paragraphs by the following two
paragraphs:
"If an image, in a statement other than an image control statement or an
invocation of a collective or atomic subroutine, attempts to reference
or define data using an <image-selector> that identifies an image that
has failed, the executing image becomes a stalled image. If the
<image-selector> identifies the initial team or the processor does not
have the ability to manage a stalled image, the executing image remains
a stalled image for the rest of the execution of the program.
Otherwise, the executing image resumes execution at the
END TEAM statement of the construct after execution of all
finalizations and deallocations that would have occurred during the
normal completion of active procedures that were invoked within the
CHANGE TEAM construct.
While an image is stalled, other images can still access data
on that image. If an image is stalled in the initial team, it
participates in normal termination as if it had initiated normal
termination."
.......
Robert Corbett wrote
Clause 3.7 [5:41]
What does it mean for an image to have "encountered" an
<image-selector>? I know we use the usual meaning of a word when
we do not specify its meaning, but that rule is inadequate for this
case. For example, if an image executes a statement that contains an
<image-selector>, but that <image-selector> is part of an operand that
is not evaluated, has the <image-selector> been "encountered?" My
guess is that it has not, but I cannot tell that from the draft TS.
Response
Your guess is correct. Together with the rewrite of [14:19-25] in
our response to your first comment, these edits are needed:
[5:41] Change "has encountered" to ", in a statement other than an
image control statement or an invocation of a collective or atomic
subroutine, attempts to reference or define data using".
[32:14] Change "has encountered" to ", in a statement other than an
image control statement or an invocation of a collective or atomic
subroutine, attempts to reference or define data using".
.......
Robert Corbett wrote
Clause 5.9, paragraph 3 [14:20-23]
When does a stalled image transfer control to the END TEAM statement?
Can it happen immediately or must it wait until all other images that
are part of the same team have completed, failed, or stalled?
Response
In order to preserve symmetric memory, it would be necessary for the
stalled image to participate in coarray deallocations. Also, it is
intended that data on the stalled image remain available to executing
images. The edits needed are included in the rewrite of [14:19-25]
in our response to your first comment.
.......
Robert Corbett wrote
Clause 5.9, paragraph 3 [14:20-23]
Are the deallocations and finalizations subject to any requirements
w.r.t. the order in which they are performed? For example, during
normal execution, allocatable objects that are part of an instance of
an internal procedure will be deallocated before the allocatable
objects that are part of the related instance of the host procedure.
Is there any requirement that that ordering be respected by the
stalled image?
Response
Yes, the order should be respected. The edits needed are included in
the rewrite of [14:19-25] in our response to your first comment.
.......
Bill Long wrote
I recommend the following change.
N2033: [14:23] Delete ',without synchronization of coarray deallocations'.
Response
This edit has been included in the response to Robert Corbett.
.......
Nick Maclaren wrote
I agree with Robert Corbett and Malcolm Cohen about stalled images, but
believe that they have understated the issue. The requirement is to
handle the 'knock-on' effect of image A failing, image B getting stuck
as a consequence, and image C then needing to interact with image C. I
agree with the authors that the concept is essential if support is to be
provided for failed images, and that is one of the reasons that I have
consistently voted against the whole feature or abstained.
Response
I think you mean "... image C then needing to interact with image B."
As part of our response to Robert Corbett, we have added this sentence
"While an image is stalled, other images can still access data
on that image."
An implementation can treat a stalled image as being very like an image
that has initiated normal termination.
.......
Nick Maclaren wrote
I have implemented error recovery in run-time systems, have used and
worked on it in several contexts, and know that I am not smart enough to
specify it for a language like Fortran. Of the thousand or so language
and environment specifications I have seen, I have never seen one
specify this successfully, even for a single environment. It might be
possible in Haskell, but Fortran is not Haskell. From the lack of
convergence of these documents and the comments on the mailing list,
this TS seems to be failing in the ways that so many others have failed
before it. It is doubtful that adding this facility takes "full account
of the state of the art" (see the ISO Directives).
Response
Perhaps you were aiming for too perfect a system. It is not intended
that all possible failure scenarios be covered. For example, NOTE 5.9
explains that it might be impossible to recover from failure of image 1.
.......
Nick Maclaren wrote
I believe that there is no chance whatsoever that this issue can be
resolved, and WG5 still keep to the schedule agreed in Las Vegas (see
N2020 and N2024). Indeed, I doubt that it could be done with even a
year's delay. Solving this problem is not within the state of the art,
despite considerable efforts in a great many contexts over the past
half-century.
I believe that the whole feature of support for failing and stalled
images should be removed, possibly specified in another TS, and not
integrated until there is significant implementation and user experience
in a fairly wide variety of environments.
Response
We would like to remind you that support of failed images is not
required of the processor. It is our belief that agreeing to remove the
feature would lead to several "no" votes and that we have to "agree to
disagree".
.......
Nick Maclaren wrote
Many or most of the comments in N2013 on events have still not been
addressed, nor have some of the ones on atomics and collectives. In
particular, there are assumptions of cross-facility coherence and
progress but no normative text requiring them - indeed, quite the
opposite. It is doubtful that the current TS is "consistent, clear and
accurate" (see the ISO Directives). This is extremely serious, as
adopting an inconsistent set of assumptions will make it almost
impossible to deliver the target specified in 1. Scope, paragraph 2,
even ignoring the problem of the schedule.
I do not believe that this issue is as intractable as the previous one,
because specifying data and control flow and progress are within "the
state of the art". However, I am doubtful that the facilities in the TS
can be implemented efficiently without special hardware or operating system
support, while still delivering the consistency and progress that seem
to be assumed. However, even if there are no consistency problems to be
resolved, I do not believe that accepting these aspects of this TS is
compatible with keeping to the agreed schedule.
I believe that this area needs further clarity, even if not a polished
specification, before the TS should be accepted. I am not repeating the
relevant comments in N2013, because there is little point - there has
been little relevant change to the drafts.
Response
It would be very helpful to have explicit suggestions for edits. These
could then be considered for inclusion. We urge Nick Maclaren to provide
suggested edits through his national body.
.......
Dan Nagle wrote
I recommend the following change.
[27:14-15] change "a nonzero value" to "a positive value"
Error values are positive.
Response
We agree with this edit.
.......
John Reid wrote
I recommend the following changes.
[10:19] Delete "or be the value of a team variable for the initial
team".
Reason. Execution of FORM TEAM is always required.
Response
The error lies in the first part of this sentence: "The <team-variable>
shall have been defined by execution of a FORM TEAM statement in the
team that executes the CHANGE TEAM statement". It was intended that
other means of defining the team variable, including the use of
GET_TEAM, be permitted. We therefore suggest this edit:
[10:18] Change "The <team-variable> shall have been defined" to
"The value of <team-variable> shall be the value of a team variable
defined".
Further edits are needed to allow for the case where the team is the
initial team:
[10:20] Change "those defined" to "those of team variables defined".
[10:21] Change "team." to "team or be the values of a team variable for
the initial team."
[11:9] After "designator" add "and the current team is not the initial
team".
[11:12] After "TEAM_ID." add "If TEAM_ID= appears in a coarray designator
and the current team is the initial team, the value of <scalar-int-expr>
is ignored."
.......
John Reid wrote
I recommend the following changes.
[10:38], [12:21], [29:34], [34:8], [34:13]. At the end of the sentence
add "since execution last began in this team" (wavy underlined on page
34).
Reason. We need to allow for teams changing during the execution of the
program. At the October meeting, these words were added at [13:5],
[35:26], [35:37], and [36:1].
Response
We agree with these edits, except that the edit for [10:38] should be
[10:38] At the end of the sentence add "since execution last began in
the team that was current before execution of the CHANGE TEAM statement"
and this edit is needed at [10:41]:
[10:41] At the end of the sentence add "since execution last began in
the team that was current before execution of the corresponding
CHANGE TEAM statement".
Malcolm Cohen later drew our attention to the fact that the concept of
construct completion, used at [10:32] and in the new text for
[14:19-25] in our first response to Robert Corbett, has not been
defined. We therefore propose this further edit:
[10:23] At the end of the paragraph, add the sentence "A CHANGE TEAM
construct completes execution by executing its END TEAM statement."
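For readers less familiar with the constructs involved, here is a
minimal sketch of a team variable defined by FORM TEAM and then used
in a CHANGE TEAM construct. It is written in the syntax later
published in Fortran 2018, which we assume corresponds to the draft
under ballot; the draft may differ in detail. The names half and
color are hypothetical.

```fortran
program change_team_sketch
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: half    ! hypothetical team variable
  integer :: color           ! hypothetical team number, 1 or 2

  ! Split the images of the current team into two new teams:
  ! odd-numbered images get color 1, even-numbered images get color 2.
  color = 1 + mod(this_image() - 1, 2)
  form team (color, half)

  ! Inside the construct, this_image() and num_images() are
  ! relative to the new team.
  change team (half)
    print *, 'team', color, ': image', this_image(), 'of', num_images()
  end team   ! the construct completes by executing this statement
end program change_team_sketch
```

Here the team variable is defined by FORM TEAM in the team that
executes CHANGE TEAM, which is the case the original wording at
[10:18] already covered.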
.......
John Reid wrote
I recommend the following changes.
[13:1] Change "the team" to "team".
Reason. Definite article is wrong here.
[13:5] Remove space before period.
Response
We agree with these edits.
.......
John Reid wrote
I recommend the following changes.
[14:9] Change "detect that an image has stalled" to "manage a stalled
image".
[14:20] After "becomes a stalled image" add ". If the processor does
not have the ability to manage a stalled image, the executing image
becomes a stalled image for the rest of the execution of the program.
If the processor has the ability to manage a stalled image, the
executing image becomes a stalled image"
Reason. I think the intention is to allow implementations not to
support stalled images transferring control to the END TEAM statement.
Stalling will still happen and will need to be permanent.
Response
These edits have been superseded by the response to Robert Corbett.
.......
Van Snyder wrote
No, for similar reasons to Robert Corbett, Malcolm Cohen, and Nick
Maclaren. I am concerned that we have added incompletely
thought-through things at the last minute.
Response
See the responses to Robert Corbett, Malcolm Cohen, and Nick Maclaren.
.......
Stan Whitlock wrote
Yes, but I recommend the changes in Bill Long and John Reid's ballots.
Response
See the responses to Bill Long and John Reid.
-------------- next part --------------
ISO/IEC JTC1/SC22/WG5 N2041
Editor's report for WG5/N2040
Bill Long, 31 December 2014
Document WG5/N2040 was prepared by applying edits to WG5/N2033 based
on the responses to comments resulting from the ballot in
WG5/N2035. The comments from that ballot are compiled in WG5/N2038
with responses to those comments in WG5/N2039. The edits are based on
WG5/N2039. Additionally, one minor editorial fix is included at the
end of this paper. The edits from WG5/N2039 are ordered here based on
page and line number. Source references back to WG5/N2039 are included
as well as a summary of the reason for the edit.
Edits to WG5/N2033 based on comment responses in WG5/N2039.
============================================================
Clause 3 Terms and definitions
------------------------------
[5:41] Change "has encountered" to ", in a statement other than an
image control statement or an invocation of a collective or atomic
subroutine, attempts to reference or define data using".
{Source: Robert Corbett ballot comment response.}
{Reason: The existing definition of "stalled image" used the vague
term "encountered", and omitted key limitations to distinguish
stalled images from failed images.}
Clause 5 Teams
--------------
[10:18] Change the beginning of the sentence from "The <team-variable>
shall have been defined by execution of a FORM TEAM statement" to "The
value of <team-variable> shall be the value of a team variable defined
by execution of a FORM TEAM statement".
{Source: Comment from John Reid for [10:19], modified.}
{Reason: The beginning part of the sentence had syntax semantics for
team variables, whereas the ending part had value semantics. This
was inconsistent. Value semantics are correct. Edit for the
beginning of the sentence.}
{Ed: The original ballot comment removed the last part of the
sentence. This was the wrong direction for a change. The new edit
was substituted.}
[10:20] Change "those defined" to "those of team variables defined".
[10:21] Change "team." to "team or be the values of team variables
for the initial team."
{Source: Comment from John Reid for [10:19] extended}
{Reason: Need to use value semantics for team variables in this
sentence, and allow for the specified team being the initial team.}
[10:23] At the end of the paragraph, add the sentence "A CHANGE TEAM
construct completes execution by executing its END TEAM statement."
{Source: Added edit based on comments from Malcolm Cohen.}
{Reason: We subsequently say deallocations happen "when execution
of a CHANGE TEAM construct completes", but do not specify precisely
what that means.}
[10:38] At the end of the sentence add "since execution last began in
the team that was current before execution of the CHANGE TEAM statement"
{Source: Comment from John Reid, modified}
{Reason: CHANGE TEAM is similar to SYNC TEAM regarding segment
ordering, so have similar wording.}
{Ed: The original insertion, "since execution last began in this
team", is unclear in this case because the statement changes the
team. Edit modified to make it clear which team is meant.}
[10:41] At the end of the sentence add "since execution last began in
the team that was current before execution of the corresponding
CHANGE TEAM statement".
{Source: Comment from John Reid, extended}
{Reason: END TEAM is similar to SYNC TEAM regarding segment
ordering, so have similar wording.}
{Ed: This was an added edit - END TEAM was omitted from the
original ballot comment.}
[11:9] After "designator" add "and the current team is not the initial
team".
[11:12] After "TEAM_ID." add "If TEAM_ID= appears in a coarray
designator and the current team is the initial team, the value of
<scalar-int-expr> is ignored."
{Source: Comment from John Reid for [10:19] extended.}
{Reason: Account for the case that the current team is the initial
team.}
[12:21] At the end of the sentence add "since execution last began in
this team".
{Source: Comment from John Reid.}
{Reason: FORM TEAM is similar to SYNC ALL regarding segment
ordering, so make the wording parallel.}
[13:1] Change "the team" to "team".
{Source: Comment from John Reid.}
{Reason: Definite article is wrong here.}
[13:5] Remove space before period.
{Source: Comment from John Reid.}
{Reason: typo.}
[14:9] Change "detect that an image has stalled" to "manage a stalled
image".
{Source: Comment from John Reid.}
{Reason: The wording in the following edit relates to the case
where the processor can manage the stalled image.}
[14:19-25] Replace these two paragraphs by the following two
paragraphs:
"If an image, in a statement other than an image control statement or
an invocation of a collective or atomic subroutine, attempts to
reference or define data using an <image-selector> that identifies an
image that has failed, the executing image becomes a stalled image. If
the <image-selector> identifies the initial team or the processor does
not have the ability to manage a stalled image, the executing image
remains a stalled image for the rest of the execution of the program.
Otherwise, the executing image resumes execution at the END TEAM
statement of the construct after execution of all finalizations and
deallocations that would have occurred during the normal completion of
active procedures that were invoked within the CHANGE TEAM construct.
While an image is stalled, other images can still access data on that
image. If an image is stalled in the initial team, it participates in
normal termination as if it had initiated normal termination."
{Source: Comments by Robert Corbett and Bill Long, modified in
WG5/N2039.}
{Reason: Correct definition of a stalled image, clarify the steps
involved in managing a stalled image (if that capability is
supported by the implementation), and clarify what happens if an
image stalls while the current team is the initial team.}
Clause 7 Intrinsic procedures
-----------------------------
[17:32+] Add new para
"If a condition occurs that would assign a nonzero value to a STAT
argument but the STAT argument is not present, error termination
is initiated."
{Source: Comment from Reinhold Bader.}
{Reason: The semantics for STAT not present in an atomic subroutine
call was inadvertently omitted. Corresponding text at [18:24-25]
for collective subroutine calls is copied here.}
[24:22] After "to A" add "as if by intrinsic assignment".
[24:23] After "to A" add "as if by intrinsic assignment".
{Source: Comment from Tobias Burnus.}
{Reason: Need to address whether finalization occurs during
execution of CO_REDUCE. The edits tie this to the rules for
intrinsic assignment, avoiding additional edits in 4.5.6.3 "When
finalization occurs" in the base standard.}
[27:15] Change "nonzero" to "positive".
{Source: Comment from Dan Nagle.}
{Reason: Error values are positive.}
{Ed: The original edit, [27:14-15] change "a nonzero value" to "a
positive value", did not account for the "processor-dependent"
between "nonzero" and "value".}
[29:34] At the end of the sentence add "since execution last began in
this team".
{Source: Comment from John Reid.}
{Reason: MOVE_ALLOC is similar to SYNC ALL regarding segment
ordering, so make the wording parallel.}
Clause 8 Required editorial changes
-----------------------------------
[32:14] Change "has encountered" to ", in a statement other than an
image control statement or an invocation of a collective or atomic
subroutine, attempts to reference or define data using".
{Source: Robert Corbett ballot comment response.}
{Reason: Same as for [5:41] above. This is the replication of that
text in Clause 8.}
[34:8] At the end of the sentence add "\uwave{since execution last
began in this team}".
{Source: Comment from John Reid.}
{Reason: ALLOCATE is similar to SYNC ALL regarding segment
ordering, so make the wording parallel.}
[34:13] At the end of the sentence add "\uwave{since execution last
began in this team}".
{Source: Comment from John Reid.}
{Reason: DEALLOCATE is similar to SYNC ALL regarding segment
ordering, so make the wording parallel.}
Additional Edit to WG5/N2033 for a minor editorial repair.
==========================================================
Clause 8 Required editorial changes
-----------------------------------
[35:26-27] Extend \uwave{} to include all of "since execution last
began on this team".
{Ed: Incorrect LaTeX encoding caused only the first letter to have
the under wave.}
-------------- next part --------------
A non-text attachment was scrubbed...
Name: N2040.pdf
Type: application/pdf
Size: 331644 bytes
Desc: not available
Url : http://mailman.j3-fortran.org/pipermail/j3/attachments/20150101/e9c213c7/attachment-0001.pdf