(j3.2006) (SC22WG5.3633) [ukfortran] [Fwd: Preparing for the Tokyo meeting]
Bill Long
longb
Thu Nov 6 13:28:36 EST 2008
N.M. Maclaren wrote:
> On Nov 6 2008, Bill Long wrote:
>
>>> While that is true, multi-user (shared) systems are NOT going away, and
>>> parallel applications are a right b*gg*r to schedule on such things.
>>>
>> Really? All of our systems are configured as multi-user (shared)
>> systems, and run parallel applications almost exclusively.
>>
>
> I am, of course, not talking about whether the systems are accessible to
> many different users, but about when the system is running a mixture of
> tasks for different users at the same time.
That's what I'm talking about as well. It's rare that a single program
uses all of one of our systems.
> Usually including a mixture
> of active and inactive interactive sessions (with all the consequences that
> GUIs imply), and relatively modest background and batch jobs of a wide
> variety of characteristics.
>
> Such systems are the workhorses of most research establishments.
>
>
>> Job
>> scheduling for multi-processor systems (including vanilla clusters) has
>> been a solved problem for years.
>>
>
> Actually, it has been a provably intractable problem since the early 1970s!
> Anyway, I am not talking about the simple (if still nasty) case of how to
> assign jobs to systems, but about the lower level.
>
> Firstly, most systems have no way to prevent one job from hogging resources
> like memory, cache, TLB entries, and so on.
That's a management / policy decision, not a software constraint.
Certainly not an issue relevant to whether coarrays should be in the
Fortran standard.
> There are rarely any facilities
> to say "give this process N cores or don't run it"
The providers of things like PBS, moab, lsf, ... will be very
disappointed to learn that their products don't work.
> or "restrict this program
> to x% of the cache or TLB entries".
>
> Secondly, the thread schedulers are almost always optimised for interactive
> (GUI) work, and behave very badly for parallel programs (which normally want
> gang scheduling). Specialised HPC systems are the main exception.
>
> Thirdly, there are some EVIL problems to do with interrupts and mixed
> workloads, but they are far too complicated to discuss via Email.
>
>
>>> Some administrators forbid them, and jump hard on users caught running
>>> them.
>>>
>> I can understand an administrator jumping on users for running serial
>> codes on a cluster, on grounds that they are wasting resources since
>> they can run codes like that on their desktop machines. The problem of
>> an administrator who bans parallel jobs from a parallel computer is
>> probably handled best by the (un)employment office. Certainly not an
>> issue for WG5 or J3.
>>
>
> You are assuming that the world is like Cray. It isn't. I am referring
> to the multi-core systems that are used for mixed workloads, of the sort
> I describe above. And, yes, I managed Cray-like systems for a decade, so
> I can talk both languages.
>
The world and Cray are not as different as they once were. We use SUSE
Linux on our systems - not all that unusual.
>
>>> So serial Fortran will not go away any time soon, and some people will
>>> positively want coarray-free compilers (or a fixable mode).
>>>
>> I agree that having a compiler switch that allows the user to fix
>> num_images at compile time (to 1, for example), or to disable the
>> recognition of coarrays entirely, is a fine vendor feature. That gives
>> the user freedom to live in a serial-only world. ...
>>
>
> It's not specifiable. Some coarray programs can survive that environment;
> others can't; and it isn't possible to write a clear specification of which
> can and which can't.
>
What is not specifiable? We have a compiler flag, -Xn, where n is the
number of images, that will fix the number at compile time. That looks
specifiable to me. Sure, if a program explicitly references image 2 and
is run with only 1 image, it will get an error. But that is clearly
specified in the standard as non-conforming.
> What many people will want is a switch that says "give an error on any use
> of any coarray feature".
>
That is what I meant by "disable the recognition of coarrays". Our
compiler switch is -hnocaf. Such a switch is trivial for any vendor to
implement.
>
>>> Er, the vast majority of Fortran users have never USED or even SEEN a
>>> vector machine (and, no, I don't count SSE etc.) You have, and I have,
>>> but the kiddies - including post-docs here :-) - I teach never have.
>>> Some have never even HEARD of them!
>>>
>> Sorry, but SSE does count. ...
>>
>
> SSE does not count because:
>
> a) It's not scalable. In almost any serious scientific program, you
> need 64-bit reals, so SSE supports up to 2-way. Aw, gee. It is therefore
> not relevant to a high-level discussion of scalable parallel features, such
> as coarrays.
>
Vectorization (either SSE or X2) is not an alternative to coarrays.
They coexist and complement each other. SSE is still vectorization.
And I would expect the length of the registers to get longer in the
future as chip vendors try to boost performance at fixed clock rates.
> b) Because of that, SSE optimisations are handled by all compilers in
> similar ways to instruction scheduling, and not like true vectorisation, as
> was used on the IBM 3090, Hitachi S-3600, many Fujitsus, almost all Crays
> and so on.
>
>
I can't speak for IBM, Hitachi, or Fujitsu, but at least I know now you
are not aware of what happens in Cray's compiler.
Cheers,
Bill
--
Bill Long longb at cray.com
Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120