(j3.2006) (SC22WG5.5385) Straw vote on draft DTS
Sun Dec 7 14:46:03 EST 2014
Please answer the following question "Is N2033 ready for forwarding to
SC22 as the DTS?" in one of these ways.
2) Yes, but I recommend the following changes.
3) No, for the following reasons.
Yes, but I recommend the following changes.
N2033: [14:23] Delete ",without synchronization of coarray deallocations".
Tom Clune, and others since, have noted that this phrase increases the uncertainty of how the recovery of a stalled image is expected to be implemented. Additionally, it conflicts with a basic tenet of coarrays: that the existence of a coarray should be consistent across the images where the coarray was allocated. If a stalled image prematurely deallocates a coarray, accesses from an active image might produce nonsense results, or even fail. This would be an undesirable exception to our normal rules.
Additional general comments:
Nick explained the rationale behind the stalled image classification. I would just add one background note. Most of the modes of inter-image activity involve statements (image control statements or calls to intrinsics) that have an optional STAT= specifier or STAT argument. In those cases, an abnormal state can be detected by a programmer and explicitly acted upon with statements in the program. If the program fails to use these facilities (no STAT= specified, or omits the optional STAT argument) and an error condition occurs, the program aborts, as has long been the case. The one exception to this model is a simple reference or definition of a variable on a remote image using the image-selector syntax. There is no "STAT" method available there, nor would it make much sense, since the designator that includes the image selector could be in many places of a complicated expression or statement. The stalled image facility addresses this case, plugging an otherwise serious hole.
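To make the contrast concrete, here is a minimal sketch, assuming a compiler with Fortran 2008 coarray support. The program name and the variables x, y, and ierr are hypothetical and chosen only for illustration. It shows an image control statement whose abnormal state can be caught with STAT=, followed by a plain image-selector reference, which is written as it would be under the rules described above, with no place for a STAT.

```fortran
program stat_sketch
  use, intrinsic :: iso_fortran_env, only: stat_stopped_image
  implicit none
  integer :: x[*]        ! scalar coarray: one copy on every image
  integer :: ierr, y

  x = this_image()

  ! Image control statement with STAT=: an abnormal state is reported
  ! in ierr, and the program decides what to do about it.
  sync all (stat=ierr)
  if (ierr == stat_stopped_image) then
     print *, 'image', this_image(), ': an image has stopped'
  else if (ierr /= 0) then
     print *, 'image', this_image(), ': sync all failed, stat =', ierr
  else
     ! Plain remote reference through an image selector. As described
     ! in the text above, there is nowhere to attach a STAT here, so
     ! the program cannot detect a problem on the remote image at
     ! this point.
     y = x[num_images()]
     print *, 'image', this_image(), ': read', y
  end if
end program stat_sketch
```

Had the STAT= specifier been omitted from SYNC ALL, an error condition there would terminate the program, which is the long-standing behavior mentioned above.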
There is substantial opinion that implementing stalled image recovery is not easy. I do not disagree. In simplest terms, it is equivalent to implementing the infrastructure for an exception handling mechanism. It is a bit simpler - the handler is basically internal to the runtime rather than user-specified, and if the relevant END TEAM statement lacks a STAT= specifier, the code would end up aborting anyway, so there is no need to do much before then. However, the basic process of unwinding the call stack (if there is one) that grew after the CHANGE TEAM statement execution is more or less the same as for an exception handler. Given that exception handlers already exist in other languages, and certainly at the system level, the argument that implementors do not know how to do this seems weak at best. I understand grumbling about hard work, not claims of inability.
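The shape of the construct in question can be sketched as follows. This is only an illustration under stated assumptions: it uses the team syntax as I understand it from the draft (FORM TEAM, CHANGE TEAM, END TEAM with STAT=), and the names team_sketch, half, work, and ierr are hypothetical. The comments describe the recovery behavior as outlined above, not a confirmed implementation.

```fortran
program team_sketch
  use, intrinsic :: iso_fortran_env, only: team_type
  implicit none
  type(team_type) :: half
  integer :: ierr

  ! Split the images into two teams, numbered 1 and 2.
  form team (1 + mod(this_image(), 2), half)

  change team (half)
     ! Any procedure calls made here grow the call stack beyond the
     ! point where CHANGE TEAM was executed.
     call work()
  end team (stat=ierr)
  ! Under the recovery model discussed above, a stalled image would
  ! have its stack unwound back to this END TEAM statement. With
  ! STAT= present the program can react; without it, the program
  ! would abort.
  if (ierr /= 0) then
     print *, 'image', this_image(), ': END TEAM reported stat =', ierr
  end if

contains

  subroutine work()
    ! Inside the construct, this_image() and num_images() refer to
    ! the current (child) team.
    print *, 'team-local image', this_image(), 'of', num_images()
  end subroutine work

end program team_sketch
```

The point of the sketch is that the recovery target is a single, statically known statement, and the handler lives in the runtime. That is what makes this a simpler case than a general, user-specified exception handler.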
The more general question of whether Fortran should include fault tolerance on a timely schedule at all is really a question of Fortran's future relevance in the HPC marketplace. And that is the only market where Fortran has a significant fraction of programming language mindshare. The need for this capability is in the 2018-2020 "exascale" time frame. If we miss that window, we're seriously disadvantaged. The Fortran 2015 standard (with compilers available ~2018) is our last opportunity to meet the schedule. Alternatives like MPI and SHMEM are actively making progress in this area, realizing the same target dates are looming.
The idea that vendors need to implement a facility like fault tolerance before including it in the standard is out of touch with the realities of modern-day compiler development. It might have been viable in the past, but today's compiler vendors will implement a feature AFTER it is in the standard, not before. Not only is this an economic reality, but also a positive for program portability. In many past cases where vendors implemented new facilities outside the standard, the features ended up being "extensions" that don't go away but perpetually lead to non-portable code for programmers who use them. On platforms with multiple Fortran compilers, this is a recurring frustration.
Finally, Tobias raised, and Malcolm elaborated and provided details on, the issue of finalization in the context of CO_BROADCAST and (especially) CO_REDUCE. This issue is a side effect of the introduction of intrinsic subroutines that allow INTENT(INOUT) arguments of types that specify finalization. This case was not envisioned (or relevant) when the current subclause "When finalization occurs" was written. Modification to the TS to account for this would be in Clause 8. I see this as essentially an integration issue. While this is important, the TS process also does allow for subsequent modifications during integration, so I don't see this as an issue that should block the TS from progressing to a vote.
Bill Long longb at cray.com
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc./ Cray Plaza, Suite 210/ 380 Jackson St./ St. Paul, MN 55101