(j3.2006) (SC22WG5.4523) Coarray comments from Germany
John Reid
John.Reid
Mon Sep 12 09:35:58 EDT 2011
WG5,
Reinhold has sent me two coarray comments from Germany.
The first comes from Reinhold himself, who says "I've talked with Uwe
Küster (HLRS), who has experience with the Cray implementation, as well
as Tobias Burnus; both share my serious doubts that the technical
content of N1858 is suitable (without a serious redesign effort)."
The second comes from Uwe, who wants an improved version of NOTIFY/QUERY
to be very high on the priority list.
Cheers,
John.
-------------- next part --------------
Name: comment_coarray_TS.txt
URL: <http://j3-fortran.org/pipermail/j3/attachments/20110912/98c32dbb/attachment.txt>
Coarray Fortran enables the programmer to formulate one-sided
communication in a simple and intuitive way via the codimension syntax.
Why is this important?
In a modern computer we see latencies for data access of various kinds.
These are memory and cache latencies, and much larger latencies in the
interconnection network.
Latencies hinder good performance because they limit
the bandwidth that is actually achievable for small messages.
In a given architecture latencies cannot be reduced. They can be avoided
by concatenating many single data items into one stream, so that the
latency appears only once, at the beginning of the stream.
Or they can be hidden behind other useful operations.
When a consumer requests data from a remote source, a latency typically
appears twice: once for sending the request (the remote address)
and once for transferring the data back. The advantage is that the consumer
can consume the data directly after arrival.
a = b[remote_proc]
allows a to be used immediately after this statement. Unless the
compiler can move the fetch of b[] to an earlier point in time,
we have to wait out a long latency, even assuming that b[] is already
defined in the remote memory.
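
As an illustration of the "get" direction, here is a minimal sketch. The
program name and the choice of remote_proc (the next image, wrapping
around) are assumptions made for the example only. The SYNC ALL is there
because b must be defined on the remote image before it is fetched.

```fortran
program get_example
  implicit none
  real :: a
  real :: b[*]            ! one copy of b on every image
  integer :: remote_proc

  ! Hypothetical partner: the next image, wrapping around to image 1.
  remote_proc = merge(1, this_image() + 1, this_image() == num_images())

  b = real(this_image())
  sync all                ! b must be defined everywhere before it is read

  a = b[remote_proc]      ! get: the request goes out, the data comes back
  ! a is usable immediately, but only after the full round trip.
  if (a /= real(remote_proc)) error stop "unexpected value fetched"
  print *, "image", this_image(), "fetched", a, "from image", remote_proc
end program get_example
```

The sketch also runs on a single image, where remote_proc is image 1
itself.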
Using the opposite direction
a[target_proc] = b
costs the image where "b" resides almost no latency. The price is paid
by the image target_proc, which does not know when the data will
arrive. A synchronizing call
sync images([target_proc,remote_proc])
ensures that "a" can be used on target_proc. But it requires more
than twice the latency.
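
A minimal sketch of the "put" direction followed by pairwise
synchronization. The image numbers source_proc and target_proc, the
value 42.0 and the program name are assumptions for illustration. Here
each of the two images names the other in its own SYNC IMAGES statement.

```fortran
program put_example
  implicit none
  ! Hypothetical image numbers for the sender and the receiver.
  integer, parameter :: source_proc = 1, target_proc = 2
  real :: a[*] = 0.0
  real :: b

  if (num_images() < 2) stop "run with at least 2 images"

  if (this_image() == source_proc) then
    b = 42.0
    a[target_proc] = b          ! put: nearly free for the sender
    sync images (target_proc)   ! but now the sender has to wait as well
  else if (this_image() == target_proc) then
    sync images (source_proc)   ! after this, a is safe to use here
    if (a /= 42.0) error stop "a did not arrive"
    print *, "image", this_image(), "received", a
  end if
end program put_example
```

Note that the sender is held at the synchronization point too, which is
exactly the coupling between sender and receiver that the text argues
against.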
A well-formulated and well-programmed parallel algorithm should
contain as few synchronization points as possible to ensure high
performance for a large number of active images.
The flow of information should go only in one direction in order to
decouple sender and receiver. This removes some latencies and
allows for pipelining.
If the order of the information transfer is not changed, a special
trailer at the end of the transmitted data can inform the target
processor about successful arrival of data.
The consuming processor may wait for the data or can do other
work in the meantime.
That is the purpose of NOTIFY --> QUERY pairs.
The sending processor informs via NOTIFY that it has initiated the
transmission and has transferred the data to the transmitting hardware.
The image target_proc recognizes the message as the trailer of the data.
Without the NOTIFY --> QUERY dependence the one-sided communication
capabilities of Coarray Fortran are not complete, and unwanted
synchronization via "sync images" or "sync all" is needed instead.
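
NOTIFY and QUERY as discussed here are not part of Fortran 2008. To
illustrate the same one-directional pattern, the following sketch uses
the EVENT POST and EVENT WAIT statements that were later standardized in
Fortran 2018. This is a swapped-in mechanism, not the NOTIFY/QUERY syntax
under discussion. The names data_ready, source_proc and target_proc and
the value 42.0 are assumptions for the example.

```fortran
program event_example
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  ! Hypothetical image numbers for the sender and the receiver.
  integer, parameter :: source_proc = 1, target_proc = 2
  type(event_type) :: data_ready[*]   ! plays the role of the "trailer"
  real :: a[*] = 0.0
  real :: b

  if (num_images() < 2) stop "run with at least 2 images"

  if (this_image() == source_proc) then
    b = 42.0
    a[target_proc] = b
    event post (data_ready[target_proc])  ! announce the data, do not wait
    ! The sender is free to continue with other work here.
  else if (this_image() == target_proc) then
    event wait (data_ready)               ! blocks until the post arrives
    if (a /= 42.0) error stop "a did not arrive"
    print *, "image", this_image(), "received", a
  end if
end program event_example
```

Only the receiving image waits. The sender never blocks, so information
flows in one direction only.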
Remark 1:
Because the notifying image could proceed to another context and
would produce other NOTIFYs in this new context for other purposes,
it will be necessary to differentiate between the different contexts.
Remark 2:
QUERY([proc]) will wait and block image proc in the case that
the image target_proc has not yet received the data.
This is very different from the behaviour of QUERY([proc], READY=ready),
which will block neither image target_proc nor image proc.
I would recommend different names, e.g. BLOCKING_QUERY for the first
case.
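
The distinction between a blocking and a non-blocking query can be
sketched with the Fortran 2018 event facilities, again as a stand-in for
the proposed QUERY and not as its actual syntax. EVENT WAIT blocks, while
the intrinsic EVENT_QUERY only reports the current count. The names
data_ready, pending, source_proc and target_proc are assumptions for the
example.

```fortran
program query_example
  use, intrinsic :: iso_fortran_env, only: event_type
  implicit none
  ! Hypothetical image numbers for the sender and the receiver.
  integer, parameter :: source_proc = 1, target_proc = 2
  type(event_type) :: data_ready[*]
  real :: a[*] = 0.0
  integer :: pending

  if (num_images() < 2) stop "run with at least 2 images"

  if (this_image() == source_proc) then
    a[target_proc] = 42.0
    event post (data_ready[target_proc])
  else if (this_image() == target_proc) then
    do
      ! Non-blocking: returns the number of posts not yet waited for.
      call event_query(data_ready, pending)
      if (pending > 0) exit
      ! ... other useful work could be done here in the meantime ...
    end do
    ! EVENT_QUERY alone does not order the segments, so a wait is still
    ! needed before a is read; it returns at once because a post is pending.
    event wait (data_ready)
    if (a /= 42.0) error stop "a did not arrive"
    print *, "image", this_image(), "received", a
  end if
end program query_example
```

Here the two behaviours already carry different names, which is the kind
of separation recommended above.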
Uwe Küster (kuester at hlrs.de)