(j3.2006) (SC22WG5.5468) Response to the previous straw ballot on the TS
John Reid
John.Reid
Thu Mar 12 05:19:35 EDT 2015
Dear all,
Bill and I have revised our response document in the light of decisions
made at the recent J3 meeting. Here is the final version.
John.
ISO/IEC JTC1/SC22/WG5 N2046
Response to the WG5 straw ballot on N2040
Bill Long and John Reid
This paper contains responses to the comments in the WG5 straw ballot
on N2040 (see N2045) and a set of edits to N2040.
Reinhold Bader wrote
[1:7] After "examples" add " that illustrate the semantics described"
Reason: Many of these examples are in the Annex.
Response
The following edit addresses this and a related comment from
Malcolm Cohen.
[1:7] Replace "that the examples ... conforming" with "the correct
execution of the examples in Annex A that illustrate the semantics
described in clauses 6 and 7". [See J3/15-122]
.......
Reinhold Bader wrote
[14:30] Replace "constuct" by "construct".
[17:7] Delete superfluous space after "ISO_FORTRAN_ENV".
[44:15], [44:17] Replace "subcauses" by "subclauses", twice.
Response
Agreed. The sentence containing the first edit was removed as
part of a separate edit. [See J3/15-123 and J3/15-147.]
.......
Reinhold Bader wrote
[14:30+] Add the following text:
"Deallocation of coarrays is delayed until the statement that
performs the deallocation on all active images of the current
team has synchronized these images."
Reason: Avoid a race condition for definitions/references to
such coarrays on the stalled image (cf. [31:31-34], [36:11-15]).
It may be appropriate to also add a note that such a statement
must be a DEALLOCATE statement or an invocation of MOVE_ALLOC, either of
which must have STAT= specified.
Response
In view of the no votes of Robert Corbett, Malcolm Cohen, and
Van Snyder, the concept of stalled images has been replaced
by continued execution using a processor-dependent value when a
data object is referenced on a failed image.
[See J3/15-147 and J3/15-151r1]
.......
Reinhold Bader wrote
[33:16+] Add missing bullet
"* extensions of image selector syntax and semantics provide the
capability to access coarray data across team boundaries;"
Response
This was included in a rewrite of that subclause. [See J3/15-156r1]
.......
Reinhold Bader wrote
[33:19-20] Replace "provide low-level primitives ... computation;" by
" provide the ability to perform non-trivial operations across image
boundaries on scalars of some intrinsic types in unordered segments;"
Reason: The text should describe what the atomics do, beyond the
already existing ones.
Response
Agreed, but a more precise wording is needed.
[33:19-20] Replace "atomic memory operations provide low-level
primitives ... computation;" by
"additional atomic subroutines for integer addition, compare
and swap, and bitwise computations;"
[See J3/15-128r2 and J3/15-156r1]
.......
Reinhold Bader wrote
[35:24-25] Replace by
"{In 4.5.6.2 The finalization process, replace the text of NOTE 4.48}
An implementation might need to ensure that when more than one coarray
must be deallocated by execution of a single statement, they are
deallocated in the same order on all images in the current team."
Reason: The term "event" now has a defined meaning that has nothing to
do with the NOTE's scenario.
Response
The editor, Malcolm Cohen, pointed out that "event" is used in many
other places as an ordinary English word.
.......
Reinhold Bader wrote
[36:35-36] Consider the statement
SYNC MEMORY
executed by all active images of the current team, one image of which
has failed. According to the semantics defined here and in [38:25-26]
error termination must be initiated on each executing image of the
current team; in particular this involves cross-image activity
that was not required by Fortran 2008. Was this intended? If not, is
it sufficient to make the following edit to [36:35]:
Replace "If" by "Except in a SYNC MEMORY statement, if"?
Response
A SYNC MEMORY statement may be waiting for an outstanding memory
operation to complete. If this involves a failed image, it
will never complete. This is an error situation. If the user wishes
execution to continue, he or she needs to have added a STAT=
specifier. No change to the text is needed. However, it is the case
that SYNC MEMORY should not be in the list at [38:25]. An edit is
provided to fix this.
[38:25] Replace "SYNC ALL, or SYNC MEMORY" by "or SYNC ALL".
[See J3/15-125r2]
.......
Reinhold Bader wrote
[37:14] Before "FORM TEAM", insert "\uwave{EVENT POST, EVENT WAIT,}".
Reason: Similar to locks, events only impose one-way segment ordering,
and this ordering is already defined in [18:21-24], so a SYNC MEMORY
appears unnecessary. See 09-193r2 for the reasoning for LOCK/UNLOCK.
Response
Agreed. [See J3/15-125r2]
.......
Reinhold Bader wrote
[37:18+] Add a new edit
"{In 8.5.2 Segments, edit the first sentence of NOTE 8.34 as follows}
The model upon which the interpretation of a program is based is that
there is a permanent memory location for each coarray and that all
images \uwave{on which it is established} can access it."
Response
Agreed. [See J3/15-125r2]
.......
Reinhold Bader wrote
[38:13] Delete "on all images"
Reason: For each statement it is clear on which images it is
executed; this may be a subset of all images.
Response
Agreed, but J3 chose this edit:
[38:13] Replace "on all images" by "on the involved images".
[See J3/15-125r2]
.......
Reinhold Bader wrote
[42:22] Replace "in the current team when the coarray was established"
by "in the most remotely removed current or ancestor team in which
the coarray is established."
Reason: The problem with the present wording is that the set of
images on which a coarray is established may change throughout
execution time (and also across images). To avoid ambiguity, I
suggest looking at the establishment at the point (and the image)
where the intrinsic is executed. This also seems appropriate
for assuring composability of the coarray team concept - a huge
UCOBOUND that cannot be addressed by any means in the local
context would not seem to make sense.
Response
The text is talking about the coarray as it now is. No change is
needed.
.......
Malcolm Cohen wrote
(a) I agree with Robert Corbett's vote. My recommendation is that the transfer
of control to the END TEAM statement should be available only for access to
failed image data from within the CHANGE TEAM construct itself.
(a2) 5.9 states
"Otherwise, the executing image resumes execution at the END TEAM statement of
the construct"
"the construct" lacks definition. There can be many CHANGE TEAM constructs, and
more than one of them can be active. Presumably what is meant is either
(i) the innermost such construct
or
(ii) the innermost such construct whose END TEAM statement has a STAT=
specifier.
This needs to be explicitly stated. I note that in the case of executing code
outside (but called from) a CHANGE TEAM construct, "innermost" has no meaning.
Response
In view of your no vote and those of Robert Corbett and Van Snyder, the concept
of stalled images has been replaced by continued execution using a processor-
dependent value when a data object is referenced on a failed image.
[See J3/15-147 and J3/15-151r1]
.......
Malcolm Cohen wrote
(b) The TS has merely scratched the surface of the semantics that are
being specified for stalled image handling; much more work needs to be
done to clarify what is supposed to happen (e.g. which variables become
undefined, etc.). Even for failed images some additional work appears
to be needed...
Response
It is intended that implementors should be able to support failed
images without losing any optimization opportunities that would be
available without such support. Since the point of failure within
a segment will be unknown, it seems simplest to specify that any
data object that might be defined or undefined by execution of the
segment will be undefined. We suggest the following edit:
[14:12+] Add "When an image fails during the execution of a segment,
a data object on a non-failed image becomes undefined if it might be
defined or undefined by execution of a statement of the segment other
than an invocation of an atomic subroutine."
[See J3/15-135r3]
.......
Malcolm Cohen wrote
(c) I do not agree with the syntax for specifying a team variable in an
image-selector, as we use double colons following type-specs and other related
attributes, which this is certainly not. A single colon would be acceptable.
Response
We have considered both single and double colon notation. The problem
with the single colon notation is that it visually looks like the
notation for an array section. While we currently do not allow
co-sections, they are an obvious candidate for a future extension.
Earlier coarray extensions had already allowed them.
In an email, Reinhold Bader wrote
"Alternatively, one could consider making the notation analogous to the
TEAM_ID one, say
type(team_type) :: ancestor
a[i, TEAM=ancestor] = ...
Some characters more to type in, but also more consistent."
We prefer this syntax. As well as being "more consistent", the meaning
is obvious to the reader of code - it does not rely on remembering the
meaning of the double colon. We suggest the following edits:
[11:4-6] Replace R624 and C509 by
"R624 <image-selector> <<is>> <lbracket> <cosubscript-list> <>
<> [, <team-identifier>] <rbracket>
R624a <team-identifier> <<is>> TEAM_ID = <scalar-int-expr>
<<or>> TEAM = <team-variable>"
[11:7] Replace "<team-variable>" by "TEAM =" and "it" by
"<team-variable>".
[11:14+] In NOTE 5.2, line 2, replace
"[ancestor::i]" by "[i,TEAM=ancestor]".
[11:14+] In NOTE 5.3, lines -4 and -3, replace
"[INITIAL::ME+1]" by "[ME+1,TEAM=INITIAL]" and
"[INITIAL::ME-1]" by "[ME-1,TEAM=INITIAL]".
[28:45] Replace
"[PARENT_TEAM::1]" by "[1,TEAM=PARENT_TEAM]".
[35:28-30] Replace R624 and C627A by
"R624 <image-selector> <<is>> <lbracket> <cosubscript-list> <>
<> [, <team-identifier>] <rbracket>
R624a <team-identifier> <<is>> TEAM_ID = <scalar-int-expr>
<<or>> TEAM = <team-variable>"
[35:33] Replace "<team-variable> or a TEAM_ID specifier if either"
with "a <team-identifier> if it".
[See J3/15-124]
.......
Malcolm Cohen wrote
(d) FAIL IMAGE is insufficiently specified.
- The syntax is "FAIL IMAGE <stop-code>". I see no purpose in using the
<stop-code> BNF rule here.
- "Execution of a FAIL IMAGE statement causes the executing image to behave as
if it has failed."
I think that should be "...become a failed image."
- " No further statements are executed by that image."
I think it would be clearer to state explicitly that image termination is not
initiated by this statement, e.g.
" Neither normal nor error termination is initiated, but no further statements
are executed by that image."
- "When an image executes a FAIL IMAGE statement, its stop code, if any, is made
available in a processor-dependent manner."
This is not only completely useless, but also missing any useful recommendation;
e.g. for STOP and ERROR STOP we recommend "formatted output to [ERROR_UNIT]".
Response
J3 decided to remove the stop code from the FAIL IMAGE statement. If
the user wishes to output a message, he or she can do so with a WRITE
statement executed before the FAIL IMAGE statement.
[See J3/15-129r2]
.......
Malcolm Cohen wrote
(e) Clause 1 states
"This Technical Specification does not specify formal data consistency or
progress models. Some level of asynchronous progress is required to ensure that
the examples in clauses 6 and 7 are conforming."
- point 1: there are no examples in clause 6;
- point 2: I found no useful examples in clause 7, by which I mean any that are
bigger than 1 statement and that make any use of data consistency or
asynchronous progress;
- point 3: were there useful examples the question would not be whether they
were conforming, but whether they WORKED on any conforming implementation of the
TS.
Response
J3 adopted the following edit in response to these points:
[1:7] Replace "that the examples ... conforming" with "the correct
execution of the examples in Annex A that illustrate the semantics
described in clauses 6 and 7". [See J3/15-122]
.......
Malcolm Cohen wrote
(f) Clause 1 continues
"Developing the formal data consistency and progress models is left
until the integration of these facilities into ISO/IEC 1539-1."
We need to get started on this straight away, not leave it to the last minute.
Response
A start was made on these issues [See J3/15-139]. Since they apply to the
features of Fortran 2008, it would seem appropriate to address them during
the integration phase for Fortran 2015 rather than within the Technical
Specification.
.......
Robert Corbett wrote
I am still concerned about the features described in Clause 5.9.
I understand that allowing stalled images to resume execution
is a desired feature. I am not convinced that the feature as
described in the DTS can be implemented without imposing a
severe performance penalty. I understand that the ability to
resume stalled images is an optional feature. I think that
even an optional feature should be required to be implementable.
I would change my vote if a description of how the feature could
be implemented is provided, assuming that the proposed
implementation is reasonable. (Implementation via an interpreter,
for example, would not satisfy me.) I would like the proposed
implementation to be based on hardware and systems software that
is commonly available. A proposal for an implementation for
x86/x64 Linux would be fine. The description need not be part of the
TS; a separate paper, not subject to approval, would suffice.
One implementation proposal I shall not accept is that the
implementation should be the same as whatever the GCC
implementation of C++ does for exception handling. I spoke
with a member of Oracle's C++ team, and he said that Oracle's
implementation of C++ exception handling could not do
everything I told him the DTS requires.
Response
In view of your no vote and those of Malcolm Cohen and Van Snyder, the
concept of stalled images has been replaced by continued execution using a
processor-dependent value when a data object is referenced on a failed
image. [See J3/15-147 and J3/15-151r1]
.......
Robert Corbett wrote
The DTS imposes some implicit requirements on processors. For
example, some Fortran features require an implementation to
perform synchronization. An implementation of a CRITICAL
construct, a SYNC ALL statement, a parallel reduction, or
input/output is likely to involve synchronization. If an
image stalls on a data reference during the execution of a
CRITICAL construct within the scope of execution of a CHANGE
TEAM construct, I assume that the DTS assumes that a lock
held by the image as part of the synchronization done for
the CRITICAL construct must be released before execution of
the stalled image resumes.
Response
Agreed. [See J3/15-135r3]
.......
Robert Corbett wrote
The DTS does not appear to impose a requirement that storage
allocated during execution of a stalled image be released before
execution of the stalled image resumes. Is the possible memory
leak permitted?
Fortran processors often acquire system resources during execution.
For example, some operating systems allow a process to use at most
a fixed number of locks and events. To avoid running out of the
system resources, the process must release resources it acquired
when it no longer needs them. Is it intended that the DTS require
that a process release such resources as are no longer needed when
an associated stalled image resumes execution, or is it a quality
of implementation issue?
Response
In view of your no vote and those of Malcolm Cohen and Van Snyder, the
concept of stalled images has been replaced by continued execution using a
processor-dependent value when a data object is referenced on a failed
image. [See J3/15-147 and J3/15-151r1]
.......
Nick Maclaren wrote
No, for the reasons given in N2038, N2013 and other votes. I need to
reiterate that neither response in N2039 even addresses my comments. I
believe that incorporating the TS into the main standard will cause
serious harm to Fortran, because the (semantic) difficulties cannot be
resolved (let alone specified unambiguously) in the time available.
Indeed, it is not clear even that they ARE soluble, because this TS is
specifying a feature that is beyond the state of the art, and has been
for half a century. I would be prepared to change my vote to abstain if
the decision to incorporate it were reversed.
Response
We believe that agreeing to delay incorporation to a later revision of
the Fortran standard would lead to several "no" votes. Failure to
standardize a resilience capability before compilers implement F2015
would lead to vendors implementing incompatible schemes, hurting the
goal of code portability.
.......
David Muxworthy wrote
3) No, for the following reasons.
The timescales specified in N2024 do not allow adequate time for the
designs in N2040 to be implemented and proved to be robust and
portable before being standardized. I would change my vote if "the
next revision" on page iv were to be changed to "a future revision".
Response
See response to Nick Maclaren above.
.......
Dan Nagle wrote
change "functions" to "subroutines" at [31:3]
change "function" to "subroutine" at [31:17]
Response
Agreed. [See J3/15-123]
.......
Anton Shterenlikht wrote
A.3.1
In the first example, why are x and y defined as coarray variables?
This fact seems to be completely unused.
Also, is it not possible for image P to read x_dot_y (line 8) from
image Q, before this variable has been defined on image Q in line 7?
Is this what Note 7.4 is saying?
In the second example, line 17, j_max is undefined. I think what
was meant is:
16 integer :: j_max, j_max_location
j_max = j
17 call co_max(j_max)
Response
While it is not necessary for x and y to be defined as coarray variables
in the first example, the present code is correct and J3 prefers to
leave it unchanged.
It is not possible for image P to read x_dot_y (line 8) from
image Q before this variable has been defined on image Q in line 7
because the code implementing the collective on image P must wait
for the argument association on image Q to have occurred and this
must be after the definition on line 7. We suggest this edit to NOTE 7.4.
[20:28+] In line 3 of NOTE 7.4, after "images." add "A transfer from
an image cannot occur before the collective subroutine has been
invoked on that image."
We agree with your edit for the second example.
[See J3/15-130r1]
.......
Van Snyder wrote
I am concerned by Robert Corbett's comments.
I don't quite know what to do because I'm not expert in the area of
synchronization mechanisms. One concern that especially troubles me is
Robert's observation that there is no specification that locks held by a
stalled image be unlocked when the image resumes execution, and critical
sections in which stalled images were executing when they stalled be
considered to be completed. I don't know what other "gotchas" lurk in
similar areas.
My contacts on the Ada committee assure me that exception handling can
be done with very low overhead, but I have not asked them about
interactions between exception handling, locks, critical sections, and
synchronization.
I'm tempted to abstain, but the lack of description of the interaction
of resuming a stalled image with synchronization, locks, and critical
sections leads me to vote no.
Response
In view of your no vote and those of Malcolm Cohen and Robert Corbett, the
concept of stalled images has been replaced by continued execution using a
processor-dependent value when a data object is referenced on a failed
image. [See J3/15-147 and J3/15-151r1]