(j3.2006) BLOCK-type statement

Van Snyder Van.Snyder
Fri May 30 15:19:26 EDT 2008


On Fri, 2008-05-30 at 08:01 -0600, Keith Bierman wrote:
> On May 30, 2008, at 12:00 AM, Aleksandar Donev wrote:
> 
> >
> >
> > Now, note that in practice, when they implement co-arrays (:-) no
> > compiler is likely to move any statements across SYNC MEMORY lines.  
> > But,
> 
> 
> One would have thought that systems with Transactional Memory could  
> potentially move statements across SYNC lines (if/when it can be  
> determined that the probability of a rollback is acceptably small).
> 
> <http://www.cs.wisc.edu/trans-memory/>

I'm pleased to see people thinking about this again.  This was the
central idea behind dataflow computing (look for Arvind and I-store).
I-store was central to the tagged-token dataflow computer "Monsoon" that
Greg Papadopoulos (CTO and VP of R&D at Sun) constructed at MIT for his
Masters project in 1983 (See Arvind, RS Nikhil, KK Pingali,
"I-structures: Data structures for parallel computing," Computation
Structures Group Memo 269, MIT EECS).  It was also behind the HEP,
Horizon, P-RISC, J-star, T, and the Tera MTA.  Rishiyur Nikhil proposed a
"Von Neumann / Dataflow" architecture by adding three instructions to a
RISC core, and pairing it with an I-store and two context queues (ready,
and waiting for a transaction).  See RS Nikhil and Arvind, "Can Dataflow
Subsume Von Neumann Computing?" in Proc. 16th Annual International
Symposium on Computer Architecture, pp. 262--272, 1989 or
http://ieeexplore.ieee.org/iel4/5803/15479/00714561.pdf.

Notwithstanding that nobody has so far done it "right", this is probably
the only viable way to advance throughput with current silicon
technology.  Programming 80-core processors effectively, even using
coarrays, will be hell.  With split-phase memory transactions one can
exploit fine-grain parallelism far more easily and effectively.  To
exploit this, Fortran will need a fork/join construct, which can be
simulated in an ugly way with a CASE construct inside a DO CONCURRENT
construct, critical sections and/or locks and/or monitors for atomic
updates, and lazy argument evaluation semantics for pure functions.
Please don't pretend that fine-grain parallelism is easy and efficient
with p-threads.  P-threads are appropriate for medium-grain parallelism,
provided you are prepared to invest an amount of work and travail
similar to what one endures when using MPI for coarse-grain parallelism.
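To make the ugly simulation concrete, here is a minimal sketch (the task
bodies are hypothetical placeholders) of forking two unlike tasks with a
SELECT CASE construct inside DO CONCURRENT; the end of the construct is
the implicit join:

```fortran
program fork_join_sketch
  implicit none
  integer :: task
  real :: result(2)

  ! "Fork": each iteration is one task; unlike tasks are selected by CASE.
  do concurrent (task = 1:2)
    select case (task)
    case (1)
      result(1) = 1.0   ! body of the first task (hypothetical)
    case (2)
      result(2) = 2.0   ! body of the second task (hypothetical)
    end select
  end do
  ! "Join": the construct does not complete until every iteration has.
  print *, sum(result)
end program fork_join_sketch
```

Each iteration defines a distinct element of RESULT, so the sketch stays
within the DO CONCURRENT locality rules; atomic updates to shared data
would still need the critical sections, locks, or monitors mentioned
above.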

A split-phase transaction and the ASYNCHRONOUS attribute would solve the
MPI problem at hand: If the program references or tries to change a
variable with the ASYNCHRONOUS attribute, the thread blocks.  Maybe
processors should simulate split-phase memory transactions on variables
that have the ASYNCHRONOUS attribute, until hardware (i.e., I-store)
does it for them.  Of course, fine-grain parallelism would be helpful
for keeping the processor busy doing something else while it waits.
That is the essence of dataflow computing.
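For reference, the existing asynchronous I/O machinery already has this
split-phase shape (a sketch; the file name is hypothetical): between the
asynchronous READ and the WAIT, the ASYNCHRONOUS buffer may not be
referenced or defined, and that is exactly where a split-phase
implementation would block:

```fortran
program async_sketch
  implicit none
  real, asynchronous :: buf(1000)
  integer :: unit, id

  open (newunit=unit, file='data.bin', form='unformatted', &
        access='stream', asynchronous='yes')
  read (unit, asynchronous='yes', id=id) buf   ! phase 1: initiate
  ! ... other work; buf must not be referenced or defined here ...
  wait (unit, id=id)                           ! phase 2: complete
  print *, buf(1)                              ! safe: transfer is done
end program async_sketch
```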

-- 
Van Snyder                    |  What fraction of Americans believe 
Van.Snyder at jpl.nasa.gov       |  Wrestling is real and NASA is fake?
Any alleged opinions are my own and have not been approved or
disapproved by JPL, CalTech, NASA, the President, or anybody else.
