[J3] asynchronous coarray collectives?

This is clearly not a Fortran feature today, but one idea for the future
is to add an event_type event argument to the collectives. It is recorded
on GitHub at https://github.com/j3-fortran/fortran_proposals/issues/272


type(event_type) :: a
call co_sum(x, event = a) ! "Immediate return" like MPI_Iallreduce
call do_something_independent(y)
event wait (a) ! can't use x until co_sum is done
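
For comparison, here is a sketch of how the MPI variant quoted below might
look when written out in full. The communicator (MPI_COMM_WORLD), the
request indices R(1) to R(3), and the double precision declarations paired
with MPI_DOUBLE_PRECISION are my assumptions; the quoted original does not
show them. The asynchronous attribute is also my addition, intended to keep
the compiler from moving accesses to A, B, and C across the wait.

```fortran
subroutine stuff(A, B, C, D)
  use mpi_f08
  implicit none
  ! Assumption: double precision, matching MPI_DOUBLE_PRECISION below.
  double precision, intent(inout), asynchronous :: A, B, C
  double precision, intent(in) :: D(:)
  type(MPI_Request) :: R(3)

  ! Start three nonblocking reductions; each call returns without
  ! waiting for the reduction to complete.
  ! Assumption: the reductions run over MPI_COMM_WORLD.
  call MPI_Iallreduce(MPI_IN_PLACE, A, 1, MPI_DOUBLE_PRECISION, &
                      MPI_SUM, MPI_COMM_WORLD, R(1))
  call MPI_Iallreduce(MPI_IN_PLACE, B, 1, MPI_DOUBLE_PRECISION, &
                      MPI_MIN, MPI_COMM_WORLD, R(2))
  call MPI_Iallreduce(MPI_IN_PLACE, C, 1, MPI_DOUBLE_PRECISION, &
                      MPI_MAX, MPI_COMM_WORLD, R(3))

  ! Independent work that may overlap with the communication.
  print *, D

  ! A, B, and C must not be referenced until all three requests complete.
  call MPI_Waitall(3, R, MPI_STATUSES_IGNORE)
end subroutine stuff
```

The event wait in the proposal sketch above would play the role that
MPI_Waitall plays here.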


> More specifically, Bill, I am looking for a way that a Fortran compiler
> could use network operations such as the following from Cray DMAPP, which
> might already be supported by Cray’s libpgas.
> https://support.hpe.com/hpesc/public/docDisplay?docId=a00113974en_us&page=dmapp_c_greduce_start.html
> https://support.hpe.com/hpesc/public/docDisplay?docId=a00113974en_us&page=dmapp_c_pset_wait.html
> I have not paid enough attention to libfabric so I can’t describe this in
> terms of Slingshot software, other than MPI.  I assume you know people who
> know the answers.
> How would reordering those statements make them execute simultaneously or
> offload to the network coprocessor?
> I recognize the typo and fixed it in the version of the code for which
> test results were reported.
> Hi Jeff,
> The declaration "double" is not going to get past any compiler I know
> about. Maybe DOUBLE PRECISION or REAL(8), ...
> The 3 calls and the print statement don't have data dependencies, so a
> compiler could re-order the statements.  If this were a real-life code, a
> case might be made.
> How do I make a Fortran coarray program like this...
> subroutine stuff(A,B,C,D)
>  implicit none
>  double, intent(inout) :: A, B, C
>  double, intent(in) :: D(:)
>  call co_sum(A)
>  call co_min(B)
>  call co_max(C)
>  print*,D
> end subroutine stuff
> ...behave like this...
> subroutine stuff(A,B,C,D)
>  use mpi_f08
>  implicit none
>  double, intent(inout) :: A, B, C
>  double, intent(in) :: D(:)
>  type(MPI_Request) :: R(3)
>  call MPI_Iallreduce(MPI_IN_PLACE, A, 1, MPI_DOUBLE, MPI_SUM,
>  call MPI_Iallreduce(MPI_IN_PLACE, B, 1, MPI_DOUBLE, MPI_MIN,
>  call MPI_Iallreduce(MPI_IN_PLACE, C, 1, MPI_DOUBLE, MPI_MAX,
>  print*,D
>  call MPI_Waitall(3,R,MPI_STATUSES_IGNORE)
> end subroutine stuff
> ...in the sense that it is possible for the network to execute the
> communication operations asynchronously relative to the print statement?
> Do any compilers, e.g. Cray’s, automatically convert coarray operations to
> asynchronous communication and push the completion of those operations as
> far down as possible?
> Jeff
