(j3.2006) (SC22WG5.5385) Straw vote on draft DTS
Thu Dec 11 14:26:14 EST 2014
On Dec 7, 2014, at 11:46 AM, Bill Long <longb at cray.com> wrote:
I have not attempted to fully digest the issues involved in the vote on the draft TS, but I have one query and a few comments supportive of Bill's response below.
> There is substantial opinion that implementing stalled image recovery is not easy. I do not disagree. In simplest terms, it is equivalent to implementing the infrastructure to handle an exception handling mechanism. It is a bit simpler - the handler is basically internal to the runtime rather than user-specified, and if the relevant END TEAM statement lacks a STAT= specifier, the code would end up aborting anyway, so there is no need to do much before then. However, the basic process of unwinding the call stack (if there is one) that grew after the CHANGE TEAM statement execution is more or less the same as for an exception handler. Given that exception handlers already exist in other languages, and certainly at the system level, the argument that implementors do not know how to do this seems weak at best. I understand grumbling about hard work, not claims of inability.
I will probably suffer for asking this, but just out of curiosity, would building such an infrastructure make it more likely that a future standard would support user-specified exception-handling? If so, then from a user perspective, this would be a nice stepping stone.
> The more general question of whether Fortran should include fault tolerance on a timely schedule at all is really a question of Fortran's future relevance in the HPC marketplace. And that is the only market where Fortran has a significant fraction of programming language mindshare. The need for this capability is in the 2018-2020 "exascale" time frame. If we miss that window, we're seriously disadvantaged. The Fortran 2015 standard (with compilers available ~2018) is our last opportunity to meet the schedule. Alternatives like MPI and SHMEM are actively making progress in this area, realizing the same target dates are looming.
I find this reasoning very compelling. I've been for some time touting the failed-image feature set as an example of Fortran leading the way in an area that everyone recognizes as important at the exascale. Letting this feature set slip beyond Fortran 2015 would likely push Fortran to the back of the pack just as more and more users are getting access to coarrays. We need their early experiences with coarrays to be positive and the path forward to appear promising.
In my experience teaching modern Fortran courses, coarrays appear to have breathed new life into a language whose demise had long been rumored. Earlier this year, students started asking me to move coarray parallel programming to the beginning of my classes, and I now use coarrays throughout each class. (My new motto is "serial code is legacy code.") Furthermore, considering the installed base of the Cray Compiler Environment, the Intel compiler, and the upcoming GFortran 5.0 release, I can't help but wonder if Fortran might soon be (or already is) the most widely available PGAS language on the planet. Finding ways to capitalize on this momentum by anticipating trends and responding early with relevant features might go a long way toward ensuring the language's future.
Will there be any possibility for implementors to leverage the MPI or SHMEM progress Bill is citing? GFortran/OpenCoarrays is already doing this in other areas: using MPI's one-sided communication and collective communication to support the same in coarray Fortran.
> The idea that vendors need to implement a facility like fault tolerance before including it in the standard is out of touch with the realities of modern-day compiler development. It might have been viable in the past, but today's compiler vendors will implement a feature AFTER it is in the standard, not before. Not only is this an economic reality, but also a positive for program portability. In many cases from the past where vendors implemented new facilities outside the standard, the features ended up being "extensions" that don't go away but perpetually lead to non-portable code for programmers who use them. On platforms with multiple Fortran compilers, this is a recurring frustration.
Moreover, it can be hard to get vendors to implement features even long after they are in the standard, and GFortran appears to be the only implementor that is actively pushing beyond the standard by already supporting the collective communication features proposed in this draft TS. I don't think it would be good for the viability of the language to wait for vendor implementations to precede standardization, especially in this timely area. The elephants in this room are the funding agencies that are putting massive financial, technological, and human resources behind the push to reach the exascale in a timely manner. Of necessity, other parallel programming models will tackle this problem and, as suggested above, I would hope that implementors can leverage their work if implementing their own solution proves cost-prohibitive. That ability to build on top of a range of communication libraries is, to me, one of the best features of coarray Fortran.