(j3.2006) (SC22WG5.3615) Preparing for the Tokyo meeting
N.M. Maclaren
nmm1
Tue Nov 4 13:08:58 EST 2008
This is not my message! It first wasn't whitelisted, and then I sent it to
the wrong address. Anyway, I am sending it again.
Nick.
Jim Xia wrote to the J3 list:
>
> Would you please share with us on what architectures the coarray will be
> less than helpful. I thought one strength of coarrays is they're
> architecture neutral. My imagination is limited by whatever machine
> architectures we're having today but I'm interested in learning its
> potential limitations in future, so I'd like to hear your opinion where
> you can foresee the coarray feature will fail.
I don't get the J3 list, so have only just seen this. I shall be in Tokyo,
with a pure coarray hat on, so please let us have some in-depth discussions.
I attach a copy of a long paper on implementation techniques that the UK
people and Bill have seen, and Bill and I have debated. Despite its length,
it glosses over the technical details, because those quickly get into interrupt,
memory and device handler designs (hardware, firmware and operating system).
I should be very happy to discuss these, preferably over a drink or two!
My personal executive summary is that, if we exclude VOLATILE, the only
critical technical issue is what Fortran should say about progress. As
N1744 says, there are several places where things need saying explicitly,
but the issue there is wording rather than agreement on intent. Since
writing N1744, I have had discussions with Aleksandar, and have realised
that I underestimated the importance of the progress issue. It is not
insoluble, but it needs a hard decision (and then some wording to express
that decision).
I append (not attach) a short description of the issue, which may yet
appear in a paper.
The killer is that, if Fortran requires 'transparent' access to coarrays on
other images (i.e. accesses that proceed irrespective of what that image is
doing),
it is implementable using DEFINED hardware and software facilities only if
the hardware, operating system and compiler are all provided by the same
organisation (or ones that collaborate so closely as to be almost one).
Of course, that is assuming that people want reliable implementations.
On the other hand, if it is to be implementable using only facilities that
are defined in formal or informal standards, it will be almost unusable.
That's not nice, at all. That is precisely why MPI has specified what it
has, and why UPC and POSIX threads do not work as many people claim that
they do. And, yes, those problems arise in practice :-(
The worst systems are 'commodity clusters'. I should be very interested
to talk to Toon about this, but the problem is one of very low-probability
failures, because the generated code relies on behaviour that is genuinely
undefined, not merely processor dependent. Memory race conditions are the
main (but not the only) example.
VOLATILE coarrays make these problems a hundred times worse - and, from my
experience with POSIX threads, RDMAs, OpenMP etc., I do mean a hundred times
and possibly more.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1 at cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679
Explanation of the Progress Issue
---------------------------------
The question is whether images P and Q can communicate through a coarray
on image R, irrespective of what R is doing at the time. This is
extremely hard to implement on some systems, at least when R is in a
call to a companion processor, performing I/O or in a long-running
'pure' CPU loop.
For example:
    PROGRAM Progress
        INTEGER :: one[*] = 0

        SELECT CASE (THIS_IMAGE())
            CASE(1)
                one[9] = 123+one[8]
                SYNC IMAGES ( (/ 2 /) )
            CASE(2)
                SYNC IMAGES ( (/ 1 /) )
                PRINT *, one[9]
            CASE(8)
                one[2] = 456+one[1]
                SYNC IMAGES ( (/ 9 /) )
            CASE(9)
                SYNC IMAGES ( (/ 8 /) )
                PRINT *, one[1]
        END SELECT
    END PROGRAM Progress
Consider a processor where an image services requests for coarray data
that it owns only when it reaches an image control statement; this is
common for MPI, and is also done by the reference implementation of UPC.
The above program will deadlock, because image 1 will not reach its SYNC
IMAGES until after images 8 and 9 have responded, and image 8 will not
reach its SYNC IMAGES until after images 1 and 2 have responded.
Obviously, that is a poor implementation of coarrays, but that is not
the point at issue. The question is whether it is a conforming
processor in the sense of 1.4 paragraph 2.
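
The deadlock argument above can be checked mechanically with a small model.
The following Python sketch is only an illustration, not a description of any
real coarray, MPI or UPC runtime. It assumes a toy processor in which each
image is a list of steps, a remote access completes only when every owning
image is "servicing", and an image is servicing only while it waits in SYNC
IMAGES or after it has run out of statements. The names run, PROGRESS and
the transparent flag are hypothetical, and SYNC IMAGES is simplified to a
pairwise rendezvous.

```python
# Toy model of the Progress example. All names here are hypothetical.
# Each step is ("remote", owners) for a statement that reads or writes
# coarray data owned by those images, or ("sync", partner) for
# SYNC IMAGES with a single partner image.
PROGRESS = {
    1: [("remote", (9, 8)), ("sync", 2)],   # one[9] = 123+one[8]
    2: [("sync", 1), ("remote", (9,))],     # PRINT *, one[9]
    8: [("remote", (2, 1)), ("sync", 9)],   # one[2] = 456+one[1]
    9: [("sync", 8), ("remote", (1,))],     # PRINT *, one[1]
}


def run(programs, transparent):
    """Return the sorted list of images that never finish."""
    pc = {img: 0 for img in programs}

    def current(img):
        steps = programs[img]
        return steps[pc[img]] if pc[img] < len(steps) else None

    def servicing(img):
        # Assumption: requests are answered only while the owner waits in
        # SYNC IMAGES or after it has executed all of its statements.
        step = current(img)
        return step is None or step[0] == "sync"

    progressed = True
    while progressed:
        progressed = False
        for img in programs:
            step = current(img)
            if step is None:
                continue
            kind, arg = step
            if kind == "remote":
                # Transparent access never waits for the owning image.
                if transparent or all(servicing(owner) for owner in arg):
                    pc[img] += 1
                    progressed = True
            elif current(arg) == ("sync", img):
                # Both partners have reached their matching SYNC IMAGES.
                pc[img] += 1
                pc[arg] += 1
                progressed = True
    return sorted(img for img in programs if current(img) is not None)


if __name__ == "__main__":
    print("service only at image control statements:", run(PROGRESS, False))
    print("transparent access:", run(PROGRESS, True))
```

Under the first model no image can take a step: images 1 and 8 each wait
for the other to service a request, and images 2 and 9 wait in SYNC IMAGES
for partners that never arrive. With transparent access every image runs to
completion.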
-------------- next part --------------
A non-text attachment was scrubbed...
Name: paper_4.txt
Type: application/unknown
Size: 40864 bytes
Desc: paper_4.txt
Url : http://j3-fortran.org/pipermail/j3/attachments/20081104/9ae351a9/attachment-0001.bin