(j3.2006) (SC22WG5.4439) Comments to 10-166 (early coarray TR draft)

John Reid John.Reid
Tue Apr 19 10:56:50 EDT 2011


Tobias,

Thanks very much for this comment, which we will take into account when 
considering the coarray TR during the meeting in June.

I would like to draw your attention to the WG5 paper N1835. Do you agree that 
your comment is essentially in support of Bill Long's proposal 1? Do you have 
any comments on the other proposals?

With best wishes,

John.

> Admittedly, this is probably bad timing, as everyone is interested in TR 
> 29113 and not in other work items. However, I happened to have time to 
> glance at 10-166 (draft of coarray TR, dated 2010/02/18).
> 
> First, I spotted an "ALL STOP" which should be an ERROR STOP (in A.1.1).
> 
> Secondly, I miss a way to broadcast values to all images (or to a 
> team); unless I have missed something, even with the TR one still has to do:
> 
>   if (this_image()==1) then
>     ! READ input file
>     ! Distribute values:
>     do image = 2, num_images()
>       z[image] = z
>     end do
>   end if
>   SYNC ALL
> 
> (Alternatively, put a "SYNC IMAGES(*)" in the IF branch and a 
> "SYNC IMAGES(1)" in an ELSE branch.) I think sending the value to each 
> other image, one image at a time, is rather slow if many images are 
> involved. (Assume a calculation on 6k Blue Gene processors, or one using 
> the full 294,912 processors of the HPC system 600 metres from here.) On 
> such systems, sending the configuration can then take a significant 
> fraction of the total computation time. That time is wasted, especially 
> as there is a dedicated collective network with one-to-all broadcast 
> functionality.
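
For illustration, the one-to-all distribution above can also be written as a
binomial tree using only Fortran 2008 coarray features, so that it needs about
log2(num_images()) rounds instead of num_images()-1 sequential puts from
image 1. This is only a sketch under stated assumptions: the program name, the
variables stride and partner, and the value 42.0 standing in for the contents
of the input file are invented for the example. It is not taken from 10-166
and does not use the dedicated collective network mentioned above.

```fortran
program tree_broadcast
  implicit none
  real    :: z[*]                 ! value to distribute (coarray scalar)
  integer :: me, n, stride, partner

  me = this_image()
  n  = num_images()

  ! Assumption: 42.0 stands in for the value read from the input file.
  if (me == 1) z = 42.0

  ! In the round with a given stride, images 1..stride already hold z and
  ! each sends it to image me+stride; the set of holders doubles per round.
  stride = 1
  do while (stride < n)
    if (me <= stride) then
      partner = me + stride
      if (partner <= n) then
        z[partner] = z            ! put the value on the partner image
        sync images (partner)     ! tell the partner that z has arrived
      end if
    else if (me <= 2*stride) then
      partner = me - stride
      sync images (partner)       ! wait for the put from the sender
    end if
    stride = 2*stride
  end do

  ! Every image should now hold the value defined on image 1.
  if (z /= 42.0) error stop "broadcast failed"
end program tree_broadcast
```

Each pair of images synchronizes exactly once, so no SYNC ALL is needed; an
image only waits for the image it receives from.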
> 
> For reductions, the draft provides the most important ones. However, I 
> again see some unneeded communication, since "A collective subroutine is 
> one that is invoked on a team of images to perform a calculation on 
> those images and which assigns the value of the result on all of them" 
> (4.1.1). While that is often the desired result, one frequently needs 
> the result on only one image. Coming back to calculations on a 
> many-processor system: performing the collective operation in a 
> tree-like manner and sending the result to a single reduction-master 
> image is faster than collecting it on all images, especially since there 
> is a barrier (team synchronization) after the reduction, which could be 
> avoided on all but the one image that is interested in the result.
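
As a sketch of the point about reductions, a tree-like sum whose result ends
up on a single image can be expressed with Fortran 2008 coarray features
alone. The names below (tree_reduce, partial, stride, partner), the choice of
image 1 as the reduction master, and the use of the image index as the local
contribution are assumptions made for the example; this is not the
collective interface of 10-166.

```fortran
program tree_reduce
  implicit none
  integer :: partial[*]           ! running partial sum on each image
  integer :: me, n, stride, partner

  me = this_image()
  n  = num_images()

  ! Assumption: each image contributes its own image index.
  partial = me

  ! In each round, an image whose zero-based index is a multiple of
  ! 2*stride absorbs the partial sum of the image stride places above it.
  stride = 1
  do while (stride < n)
    if (mod(me-1, 2*stride) == 0) then
      partner = me + stride
      if (partner <= n) then
        sync images (partner)     ! wait until the partner's sum is final
        partial = partial + partial[partner]
      end if
    else if (mod(me-1, 2*stride) == stride) then
      partner = me - stride
      sync images (partner)       ! signal that this partial sum is ready
      exit                        ! this image takes no further part
    end if
    stride = 2*stride
  end do

  ! Only image 1 holds the full result.
  if (me == 1) print *, 'sum on image 1:', partial
end program tree_reduce
```

Every image other than image 1 synchronizes only with its children and then
once with its parent, so there is no team-wide barrier after the reduction.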
> 
> Tobias
> 
> PS: It would be nice if someone could save the three comments so that 
> they can be discussed when the topic comes up again after TR 29113.
> 
> PPS: I hope I have found the latest draft.
