(j3.2006) (SC22WG5.4935) [ukfortran] WG5 ballot on first draft TS 18508, Additional Parallel Features in Fortran
Bill Long
longb
Thu Mar 14 14:58:35 EDT 2013
On 3/14/13 9:36 AM, N.M. Maclaren wrote:
>> >What is proposed is very similar to the way we treat I/O errors. There
>> >is a mechanism for notification of a problem (STAT=, like I/O) and a
>> >way to identify where the error occurred (failed images index values;
>> >the I/O unit number is already available to the users). Unlike I/O
>> >where we have singled out some failure modes (end-of-file, for example),
>> >we did not specify particular modes of failure for images. In current
>> >experience, it is almost always a non-recoverable memory error, but I
>> >think we should wait for more data before being more specific. The
>> >current spec is intentionally minimal.
> The major difference is that I/O errors affect just one file, and the
> minor one is that many of them are actually recoverable (though not, at
> present, in Fortran). The killer about node failures is that they are
> necessarily NOT so localised.
>
I/O errors like reading past the end of file will affect just that file.
Errors related to hardware failure of a disk array might affect all of
the files used by the program.
Any incomplete data transfer into or out of a failed image is probably
corrupt, and the standard needs to be written assuming that is the case.
How many other images are affected will depend highly on the nature of
the program.
I would note that we already have STAT= specifiers on existing
statements like SYNC ALL. These already provide a means to register a
failed image by defining the status variable with a processor-dependent
value. The new feature in the TS draft is to make that particular error
status equal to the value of a standard-defined named constant. This
change is motivated by the new capability of effectively changing the
number of images in the job, so it is potentially possible for the
program to actually do something about the problem.
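A minimal sketch of the STAT= usage described above. It assumes the spellings later adopted in Fortran 2018 (the named constant STAT_FAILED_IMAGE in ISO_FORTRAN_ENV and the intrinsic FAILED_IMAGES); the draft under ballot may name them differently, and a compiler without failed-image support will reject it. The program name and the variable istat are illustrative only.

```fortran
program sync_with_stat
  ! Assumed names: stat_failed_image and failed_images() follow the
  ! Fortran 2018 spellings, which may differ from the draft TS.
  use, intrinsic :: iso_fortran_env, only: stat_failed_image, &
                                           stat_stopped_image
  implicit none
  integer :: istat

  ! With STAT= present, an error condition defines istat rather than
  ! terminating the program.
  sync all (stat=istat)

  if (istat == 0) then
     print *, 'image', this_image(), ': synchronized with all images'
  else if (istat == stat_failed_image) then
     ! The failure is reported through a standard-defined constant, so
     ! the program can portably ask which images are gone and react.
     print *, 'image', this_image(), ': failed images:', failed_images()
  else if (istat == stat_stopped_image) then
     print *, 'image', this_image(), ': another image has stopped'
  else
     ! Any other nonzero value is processor dependent.
     print *, 'image', this_image(), ': SYNC ALL error, stat =', istat
  end if
end program sync_with_stat
```

Without the named constant, the failed-image case would fall into the final processor-dependent branch, where a portable program cannot tell it apart from any other error.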
Cheers,
Bill
--
Bill Long longb at cray.com
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc./Cray Plaza, Suite 210/380 Jackson St./St. Paul, MN 55101