(j3.2006) [ukfortran] (SC22WG5.4944) AW: Thoughts on Reinhold's thoughts
N.M. Maclaren
nmm1
Sat Mar 30 07:30:33 EDT 2013

On Mar 30 2013, Bader, Reinhold wrote:
>>
>> 'Outside' images could push data into the subteam, but it could not use
>> it, because there is no way to synchronise.
>
> My data_feeder example effectively uses double buffering and separates
> buffer exchange (a pure memory operation) via a partial synchronization
> operation in the ancestor team.

I don't see that it helps, unfortunately. At this level, it doesn't
make much difference whether an operation is a pure memory one or
involves computation.

>> Well, actually, there is,
>> but I am in two minds about whether it is a facility or a loophole that
>> needs closing.
>>
>> If 'outside' synchronises with 'inside' using consistent atomics and
>> SYNC MEMORY, should that be legal?
>
> If coindexing ancestor-inherited coarrays is possible. I believe it
> shouldn't be.

Now I am puzzled. That is almost exactly the case you are trying to
tackle in data_feeder. I didn't mean data transfer via consistent
atomics, but the ordering. The data transfer would be via coindexing.
As far as I can see, my suggestion provides precisely the facility
you want.
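
As an illustration only (it is not taken from data_feeder), the pattern
of transferring data by coindexing and ordering it with atomics plus
SYNC MEMORY looks roughly like this in Fortran 2008, within a single set
of images. The names payload, ready, producer and consumer are made up
for the sketch. Whether the same pattern should be legal across a team
boundary is exactly the open question.

```fortran
program atomic_ordering_sketch
  use, intrinsic :: iso_fortran_env, only: atomic_logical_kind
  implicit none

  ! Hypothetical names, chosen for this sketch only.
  integer, parameter :: producer = 1, consumer = 2
  integer :: payload(4)[*]                  ! ordinary coarray carrying the data
  logical(atomic_logical_kind) :: ready[*]  ! flag touched only via atomics
  logical :: seen

  payload = 0
  ready = .false.
  sync all                        ! every image starts from a known state

  if (num_images() >= 2) then
    if (this_image() == producer) then
      payload(:)[consumer] = [1, 2, 3, 4]   ! data transfer via coindexing
      sync memory                 ! end the segment that wrote the data
      call atomic_define(ready[consumer], .true.)   ! then signal
    else if (this_image() == consumer) then
      seen = .false.
      do while (.not. seen)
        call atomic_ref(seen, ready)        ! spin on the local flag
      end do
      sync memory                 ! begin a segment ordered after the write
      print *, 'consumer received', payload
    end if
  end if
end program atomic_ordering_sketch
```

This needs a coarray-enabled compiler and at least two images; on a
single image it does nothing. The atomics carry only the ordering, and
the data itself moves through the ordinary coarray.
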
My concern is that the implementation needs to assume that ordinary
coarrays can change value in the same way that volatile variables can
(i.e. they can be modified by actions in other universes). For example,
SYNC MEMORY has to assume that any untouched coarrays or sections of
coarrays may have changed, which is NOT good for performance!

On the other hand, I don't see that as catastrophic, given that Fortran's
PGAS model is essentially identical to MPI's passive-target one-sided
communication and, to some extent, we already have that issue. What it
does mean is that the implementation CANNOT simply generate fast,
untracked access to coarrays and put all of the synchronisation in
SYNC MEMORY. That won't fly.

Let's consider a push-driven model. At the very least, an
implementation will have to implement release in a SYNC MEMORY by
handshaking with all images on which the invoking image has updated
data, ensuring that they know what has changed AND have taken a safe
copy. Worse, that includes all LOCAL coarray data. This is because it
will have to implement acquire in a SYNC MEMORY by ensuring that all
data updated on its image is merged in, and it has flushed all cached
copies of data held on other images.

The effect of this is that a SYNC ALL on a team is likely to be as
expensive as a SYNC ALL on the whole image space, irrespective of the
size of the team. That's not pretty, and risks implementations taking
short cuts and getting it wrong.

Naturally, that doesn't necessarily apply to systems with hardware RDMA
support, because the data synchronisation is automatic. But it applies
to anything based on Ethernet, OpenIB or any other asynchronous message
passing interface.

Regards,
Nick Maclaren.