(j3.2006) (SC22WG5.4936) [ukfortran] WG5 ballot on first draft TS 18508, Additional Parallel Features in Fortran

N.M. Maclaren nmm1
Fri Mar 15 06:02:54 EDT 2013


On Mar 14 2013, Bill Long wrote:
>
>> The major difference is that I/O errors affect just one file, and the
>> minor one is that many of them are actually recoverable (though not, at
>> present, in Fortran).  The killer about node failures is that they are
>> necessarily NOT so localised.
>
>I/O errors like reading past the end of file will affect just that file. 
>Errors related to hardware failure of a disk array might affect all of 
>the files used by the program.

All right, "almost all I/O errors affect just one file".  Allocation
failure can mean that the run-time system has got corrupted, too.
We shouldn't allow rare, nasty cases to control what the standard says
about the vastly more common ones that can be handled.

Also, for better software engineering, quite a few people believe
that we should be specifying the state following easily recoverable
I/O errors.  But let's not discuss that one here.  My point is that
I/O and allocation errors are not really comparable to image failure.

>Any incomplete data transfer into or out of a failed image is probably 
>corrupt, and the standard needs to be written assuming that is the case. 
>How many other images are affected will depend highly on the nature of 
>the program.

And, as I said, it ALSO needs to say that standard output and standard
error are likely to be corrupt.  That's bad news for diagnostics!

But allowing this also affects any team of which that image is a member,
unless we constrain teams to be inactive objects except when explicit
actions are performed on them.  And every image is a member of the
original team.  So the consequences of this are not simple, and could
take many hours of consideration, discussion and drafting, probably over
several meetings.  That isn't good news for the schedule ....

>I would note that we already have STAT= specifiers on existing 
>statements like SYNC ALL.  These already provide a means to register a 
>failed image by defining the status variable with a processor-dependent 
>value.

I didn't much like that, if you recall.

>The new feature in the TS draft is to make that particular error
>status equal to the value of a standard-defined named constant.  This
>change is motivated by the new capability of effectively changing the 
>number of images in the job, so it is potentially possible for the 
>program to actually do something about the problem.

And that is what I think is a step too far, unless we also put the very
significant effort into defining exactly what the requirements are on a
processor, and state exactly how much a programmer can assume.

While there are no specified STAT values, any use of that facility
is obviously processor-dependent, and so it is the processor's job to
specify such constraints.  That was someone else's argument, perhaps
yours.  But, by specifying a value, the standard is taking on that
responsibility.


In summary, I am not adamantly against the facility, but I shall be
against it until and unless there is a clear and explicit statement
of exactly what the processor is required to do if it returns that
value, and exactly how much the program may assume if it gets it.
The answer "nothing and nothing" is acceptable, but it needs to be
in normative text, with my points above in informative text.

As I started by saying, this is NOT a minor addition.


Regards,
Nick Maclaren.