(j3.2006) (SC22WG5.3633) [ukfortran] [Fwd: Preparing for the Tokyo meeting]

Bill Long longb
Thu Nov 6 13:28:36 EST 2008



N.M. Maclaren wrote:
> On Nov 6 2008, Bill Long wrote:
>   
>>> While that is true, multi-user (shared) systems are NOT going away, and
>>> parallel applications are a right b*gg*r to schedule on such things.   
>>>       
>> Really?  All of our systems are configured as multi-user (shared) 
>> systems, and run parallel applications almost exclusively.
>>     
>
> I am, of course, not talking about whether the systems are accessible to
> many different users, but about when the system is running a mixture of
> tasks for different users at the same time.  

That's what I'm talking about as well.  It's rare that a single program 
uses all of one of our systems. 


> Usually including a mixture
> of active and inactive interactive sessions (with all the consequences that
> GUIs imply), and relatively modest background and batch jobs of a wide
> variety of characteristics.
>
> Such systems are the workhorses of most research establishments.
>
>   
>> Job 
>> scheduling for multi-processor systems (including vanilla clusters) has 
>> been a solved problem for years.
>>     
>
> Actually, it has been a provably intractable problem since the early 1970s!
> Anyway, I am not talking about the simple (if still nasty) case of how to
> assign jobs to systems, but about the lower level.
>
> Firstly, most systems have no way to prevent one job from hogging resources
> like memory, cache, TLB entries, and so on.  

That's a management / policy decision, not a software constraint.  
Certainly not an issue relevant to whether coarrays should be in the 
Fortran standard.


> There are rarely any facilities
> to say "give this process N cores or don't run it" 

The providers of things like PBS, Moab, LSF, ... will be very
disappointed to learn that their products don't work.

> or "restrict this program
> to x% of the cache or TLB entries".
>
> Secondly, the thread schedulers are almost always optimised for interactive
> (GUI) work, and behave very badly for parallel programs (which normally want
> gang scheduling).  Specialised HPC systems are the main exception.
>
> Thirdly, there are some EVIL problems to do with interrupts and mixed
> workloads, but they are far too complicated to discuss via Email.
>
>   
>>> Some administrators forbid them, and jump hard on users caught running
>>> them.  
>>>       
>> I can understand an administrator jumping on users for running serial 
>> codes on a cluster, on grounds that they are wasting resources since 
>> they can run codes like that on their desktop machines.   The problem of 
>> an administrator who bans parallel jobs from a parallel computer is 
>> probably handled best by the (un)employment office.  Certainly not an 
>> issue for WG5 or J3.
>>     
>
> You are assuming that the world is like Cray.  It isn't.  I am referring
> to the multi-core systems that are used for mixed workloads, of the sort
> I describe above.  And, yes, I managed Cray-like systems for a decade, so
> I can talk both languages.
>   

The world and Cray are not as different as they once were.  We use SUSE 
Linux on our systems - not all that unusual.

>   
>>> So serial Fortran will not go away any time soon, and some people will
>>> positively want coarray-free compilers (or a fixable mode). 
>>>       
>> I agree that having a compiler switch that allows the user to fix 
>> num_images at compile time (to 1, for example), or to disable the 
>> recognition of coarrays entirely, is a fine vendor feature.  That gives 
>> the user freedom to live in a serial-only world.  ...
>>     
>
> It's not specifiable.  Some coarray programs can survive that environment;
> others can't; and it isn't possible to write a clear specification of which
> can and which can't.
>   

What is not specifiable?  We have a compiler flag, -Xn, where n is the
number of images, that fixes the number of images at compile time.  That
looks specifiable to me.  Sure, if a program explicitly references image 2
and is run with only 1 image, it will get an error.  But that is clearly
specified in the standard as non-conforming.
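
As a rough illustration of the image-2 case, here is a minimal sketch of a
program that stays conforming whether it is built for one image or many.
The program and variable names (image_guard, x) are made up for this
example and do not come from any particular code; the only coarray
features assumed are the this_image() and num_images() intrinsics and
sync all.

```fortran
program image_guard
  implicit none
  integer :: x[*]            ! hypothetical scalar coarray: one copy per image

  x = this_image()           ! each image stores its own image index
  sync all                   ! make every image's x visible before any remote read

  if (num_images() >= 2) then
     ! Image 2 exists, so the remote reference x[2] is valid.
     if (this_image() == 1) print *, 'value on image 2:', x[2]
  else
     ! Built or run with a single image: referencing x[2] here would be
     ! non-conforming, so the program skips it.
     print *, 'only one image; skipping the reference to image 2'
  end if
end program image_guard
```

With the number of images fixed at 1, only the else branch executes, so
the reference to image 2 is never reached.  A program that used x[2]
without the num_images() guard is the non-conforming case described
above.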

> What many people will want is a switch that says "give an error on any use
> of any coarray feature".
>   

That is what I meant by "disable the recognition of coarrays".  Our 
compiler switch is -hnocaf.  Such a switch is trivial for any vendor to 
implement.

>   
>>> Er, the vast majority of Fortran users have never USED or even SEEN a
>>> vector machine (and, no, I don't count SSE etc.)  You have, and I have,
>>> but the kiddies - including post-docs here :-) - I teach never have.
>>> Some have never even HEARD of them!
>>>       
>> Sorry, but SSE does count.  ...
>>     
>
> SSE does not count because:
>
>     a) It's not scalable. In almost any serious scientific program, you 
> need 64-bit reals, so SSE supports up to 2-way. Aw, gee. It is therefore 
> not relevant to a high-level discussion of scalable parallel features, such 
> as coarrays.
>   

Vectorization (either SSE or X2)  is not an alternative to coarrays.  
They coexist and complement each other.  SSE is still vectorization.  
I would expect the registers to get longer in the 
future as chip vendors try to boost performance at fixed clock rates.

>     b) Because of that, SSE optimisations are handled by all compilers in
> similar ways to instruction scheduling, and not like true vectorisation, as
> was used on the IBM 3090, Hitachi S-3600, many Fujitsus, almost all Crays
> and so on.
>
>   

I can't speak for IBM, Hitachi, or Fujitsu, but at least I now know that 
you are not aware of what happens in Cray's compiler.


Cheers,
Bill

-- 
Bill Long                                   longb at cray.com
Fortran Technical Support    &              voice: 651-605-9024
Bioinformatics Software Development         fax:   651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120

            



