[J3] asynchronous coarray collectives?
Jeff Hammond
jehammond at nvidia.com
Sat May 6 19:46:40 UTC 2023
Also, I had time to look at the spec, and your example is also illegal because of 11.1.7.1:
The execution order of the iterations can be left indeterminate (DO CONCURRENT);
Your example code will deadlock if the compiler avails itself of this opportunity and the iteration order is not identical on all images. Regardless of whether this is a perverse implementation, it is a legal one.
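A toy model makes this deadlock argument concrete. It is not coarray code. It assumes, for illustration only, that a collective call can complete only when every image is blocked in that same collective. The module, constants and function (collective_order_model, CO_SUM_CALL and friends, completes) are hypothetical names. Each column of the input array is one image's sequence of collective calls.

```fortran
! Toy model, not coarray code. Assumption: a collective call completes only
! when every image is blocked in the same collective. All names are hypothetical.
module collective_order_model
  implicit none
  ! Labels for the three collectives in the example.
  integer, parameter :: CO_SUM_CALL = 1, CO_MIN_CALL = 2, CO_MAX_CALL = 3
contains
  ! order(k, img) is the k-th collective called by image img.
  ! Returns .true. if every image's call sequence can run to completion.
  logical function completes(order)
    integer, intent(in) :: order(:, :)
    integer :: k
    completes = .true.
    do k = 1, size(order, 1)
      ! Every image is now blocked in its k-th call. If the images are not
      ! all in the same collective, none of them can proceed: deadlock.
      if (any(order(k, :) /= order(k, 1))) then
        completes = .false.
        return
      end if
    end do
  end function completes
end module collective_order_model
```

If both images call co_sum, co_min, co_max in that order, completes returns .true.; if the second image happens to run the DO CONCURRENT iterations as co_min, co_sum, co_max, the very first calls mismatch and it returns .false.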
I know you hate everything I do here and must provide negative feedback every time I mention the word asynchronous, but in the future it would be nice if you could reply with suggestions that violate the standard in fewer than two ways.
Jeff
On 6. May 2023, at 22.37, Jeff Hammond via J3 <j3 at mailman.j3-fortran.org> wrote:
External email: Use caution opening links or attachments
GCC and Intel compilers believe that your suggestion is invalid Fortran, Van. What compiler did you test with?
jhammond at nuclear:~$ ifx -c codc.F90
codc.F90(7): error #8890: A DO CONCURRENT or CRITICAL construct may not contain a call to a collective subroutine. [CO_SUM]
if (i.eq.1) call co_sum(A)
-------------------------^
codc.F90(8): error #8890: A DO CONCURRENT or CRITICAL construct may not contain a call to a collective subroutine. [CO_MIN]
if (i.eq.2) call co_min(B)
-------------------------^
codc.F90(9): error #8890: A DO CONCURRENT or CRITICAL construct may not contain a call to a collective subroutine. [CO_MAX]
if (i.eq.3) call co_max(C)
-------------------------^
compilation aborted for codc.F90 (code 1)
jhammond at nuclear:~$ gfortran -fcoarray=single -c codc.F90
codc.F90:7:34:
7 | if (i.eq.1) call co_sum(A)
| 1
Error: Subroutine call to intrinsic ‘co_sum’ in DO CONCURRENT block at (1) is not PURE
codc.F90:8:34:
8 | if (i.eq.2) call co_min(B)
| 1
Error: Subroutine call to intrinsic ‘co_min’ in DO CONCURRENT block at (1) is not PURE
codc.F90:9:34:
9 | if (i.eq.3) call co_max(C)
| 1
Error: Subroutine call to intrinsic ‘co_max’ in DO CONCURRENT block at (1) is not PURE
jhammond at nuclear:~$ cat codc.F90
subroutine stuff(A,B,C,D)
implicit none
real, intent(inout) :: A, B, C
real, intent(in) :: D(:)
integer :: i
do concurrent (i=1:3)
if (i.eq.1) call co_sum(A)
if (i.eq.2) call co_min(B)
if (i.eq.3) call co_max(C)
end do
print*,D
end subroutine stuff
On 6. May 2023, at 22.32, Jeff Hammond via J3 <j3 at mailman.j3-fortran.org> wrote:
I was not aware that coarray collectives were pure. They have strong side effects. Are you sure they are pure?
If the DO CONCURRENT is executed in a random order on a single thread, which seems like a legal implementation, this will deadlock whenever the images do not all choose the same order.
Finally, how does DO CONCURRENT tell the Fortran implementation to take advantage of network offloading?
On 6. May 2023, at 21.39, Van Snyder via J3 <j3 at mailman.j3-fortran.org> wrote:
Try this:
subroutine stuff(A,B,C,D)
implicit none
double precision, intent(inout) :: A, B, C
double precision, intent(in) :: D(:)
integer :: I
do concurrent ( i = 1:3 )
select case ( i )
case ( 1 )
call co_sum(A)
case ( 2 )
call co_min(B)
case ( 3 )
call co_max(C)
end select
end do
print*,D
end subroutine stuff
Bill Long pointed out several years ago that a fork/join construct would be syntactic sugar for this.
On Sat, 2023-05-06 at 07:59 +0000, Jeff Hammond via J3 wrote:
How do I make a Fortran coarray program like this...
subroutine stuff(A,B,C,D)
implicit none
double precision, intent(inout) :: A, B, C
double precision, intent(in) :: D(:)
call co_sum(A)
call co_min(B)
call co_max(C)
print*,D
end subroutine stuff
...behave like this...
subroutine stuff(A,B,C,D)
use mpi_f08
implicit none
double precision, intent(inout) :: A, B, C
double precision, intent(in) :: D(:)
type(MPI_Request) :: R(3)
call MPI_Iallreduce(MPI_IN_PLACE, A, 1, MPI_DOUBLE_PRECISION, MPI_SUM, MPI_COMM_WORLD, R(1))
call MPI_Iallreduce(MPI_IN_PLACE, B, 1, MPI_DOUBLE_PRECISION, MPI_MIN, MPI_COMM_WORLD, R(2))
call MPI_Iallreduce(MPI_IN_PLACE, C, 1, MPI_DOUBLE_PRECISION, MPI_MAX, MPI_COMM_WORLD, R(3))
print*,D
call MPI_Waitall(3,R,MPI_STATUSES_IGNORE)
end subroutine stuff
...in the sense that it is possible for the network to execute the communication operations asynchronously relative to the print statement?
Do any compilers, e.g. Cray’s, automatically convert coarray operations to asynchronous communication and push the completion of those operations as far down as possible?
Jeff
More information about the J3 mailing list