[J3] FORM TEAM statement NEW_INDEX= specifier & failed images

John Reid John.Reid at stfc.ac.uk
Sun May 12 14:51:29 EDT 2019


Nathan,
> 
> I think there's still a problem with the FORM TEAM statement in the
> program from C.6.8. Suppose the program is executed by 11 images, so 1
> is intended to be a spare. If image 9 in the initial team fails
> immediately before it executes the first FORM TEAM statement, then
> image 10 in the initial team, which executes FORM TEAM with a
> team-number == 1 and NEW_INDEX == 10 (== me), will have specified a
> NEW_INDEX= value greater than the number of possible images in the new
> team. (In general, it appears that if an image whose image index in
> the initial team is > 1 and < images_used fails in the "setup" DO
> construct before the FORM TEAM statement, a similar situation can
> occur.)

Yes, this has not been allowed for, but it is a low-probability event.
Image 9 was active when image 1 referenced FAILED_IMAGES. Nevertheless,
we should cover the case. We seem to need to test the status after the
FORM TEAM statement.
> 
> Additionally, if this is an error condition for FORM TEAM, per 11.6.9
> p5 ("If an error condition other than detection of a failed image
> occurs, the team variable becomes undefined"), the simulation_team
> team variable would be undefined, and I assume execution of a
> subsequent CHANGE TEAM statement would result in undefined behavior?

Yes, we need a test after the FORM TEAM statement.
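
A rough sketch of such a test follows. It is not the C.6.8 text: the
program name, the fixed team number 1, and the choice to give up with
ERROR STOP are assumptions made for illustration only. Whether an
out-of-range NEW_INDEX= value really is an error condition is the
point the interp would have to settle.

```fortran
program form_team_status_check
  use, intrinsic :: iso_fortran_env, only: team_type, stat_failed_image
  implicit none
  type(team_type) :: simulation_team
  integer :: me, status

  ! Assumption for this sketch: every image joins team 1 and asks to
  ! keep its initial-team image index as its index in the new team.
  me = this_image()

  form team (1, simulation_team, new_index=me, stat=status)

  ! STAT_FAILED_IMAGE still leaves simulation_team defined. Any other
  ! nonzero status means simulation_team is undefined (11.6.9 p5), so
  ! it must not be referenced in a CHANGE TEAM statement.
  if (status /= 0 .and. status /= stat_failed_image) then
    error stop 'FORM TEAM failed: simulation_team is undefined'
  end if

  change team (simulation_team, stat=status)
    ! ... the simulation would run here ...
  end team
end program form_team_status_check
```

A real program would presumably recompute the NEW_INDEX= values and
retry FORM TEAM rather than stop; that recovery logic is left out here.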

It looks as if we need to set up an interp request.

John.


> 
> Best,
> 
> --
> Nathan
> 
> On Sun, May 12, 2019 at 7:38 AM John Reid <John.Reid at stfc.ac.uk> wrote:
>>
>> Nathan,
>>
>> Nathan Weeks via J3 wrote:
>>> Hi all,
>>>
>>> Thanks for the helpful clarification (and identifying where the standard
>>> is unclear). I'll note that this issue impacts the first failed-images
>>> example in section C.6.8 of the Fortran 2018 standard, so there is
>>> motivation for clarification in the standard itself.
>>
>> I think we were a bit hasty in choosing to assign failed images to new
>> teams in a processor-dependent manner. We definitely want the C.6.8
>> example to work. It was always a design objective that following an
>> image failure, it would be possible to form a new team of active images
>> and continue the calculation there. We don't want any failed images in
>> the team because we want to be able to test for newly failed images.
>>
>> Anyway, 11.1.5.2, para 5 says
>>
>> 5 Successful execution of a CHANGE TEAM statement performs an implicit
>> synchronization of all images of the new team that is identified by
>> team-value. All active images of the new team shall execute the same
>> CHANGE TEAM statement. On each image of the new team, execution of the
>> segment following the CHANGE TEAM statement is delayed until all other
>> images of that team have executed the same statement the same number of
>> times in the original team.
>>
>> It is clearly expected that all images of the team are active. The
>> adjective "active" is not used in the first and third sentences. It
>> should be deleted from the second, for consistency.
>>
>> To go back to your question:
>>
>> "What happens in the case where an image specifies both NEW_INDEX= and
>> STAT= in a FORM TEAM statement, and the image index specified for
>> NEW_INDEX= turns out to be greater than the number of images in the
>> new team due to image failure during the execution of FORM TEAM?",
>>
>> I think this is an error condition. Note that in C.6.8 the NEW_INDEX
>> values are carefully set.
>>
>> Cheers,
>>
>> John.

