(j3.2006) (SC22WG5.3630) [ukfortran] [Fwd: Preparing for the Tokyo meeting]
Thu Nov 6 03:42:28 EST 2008
On Nov 6 2008, Bill Long wrote:
>> While that is true, multi-user (shared) systems are NOT going away, and
>> parallel applications are a right b*gg*r to schedule on such things.
>Really? All of our systems are configured as multi-user (shared)
>systems, and run parallel applications almost exclusively.
I am, of course, not talking about whether the systems are accessible to
many different users, but about when the system is running a mixture of
tasks for different users at the same time. Usually including a mixture
of active and inactive interactive sessions (with all the consequences that
GUIs imply), and relatively modest background and batch jobs of a wide
variety of characteristics.
Such systems are the workhorses of most research establishments.
>scheduling for multi-processor systems (including vanilla clusters) has
>been a solved problem for years.
Actually, it has been a provably intractable problem since the early 1970s!
Anyway, I am not talking about the simple (if still nasty) case of how to
assign jobs to systems, but about the lower level.
Firstly, most systems have no way to prevent one job from hogging resources
like memory, cache, TLB entries, and so on. There are rarely any facilities
to say "give this process N cores or don't run it" or "restrict this program
to x% of the cache or TLB entries".
Secondly, the thread schedulers are almost always optimised for interactive
(GUI) work, and behave very badly for parallel programs (which normally want
gang scheduling). Specialised HPC systems are the main exception.
Thirdly, there are some EVIL problems to do with interrupts and mixed
workloads, but they are far too complicated to discuss via Email.
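
To make the "give this process N cores or don't run it" point concrete, here is a minimal sketch of the nearest thing a program can do for itself: refuse to start unless it is allowed to run on at least N cores. The helper names allowed_cores and require_cores are made up for illustration, and the check assumes a platform where Python exposes os.sched_getaffinity (Linux, typically), falling back to os.cpu_count elsewhere. It only inspects what the process is permitted to use; it reserves nothing, and it does not stop other jobs from competing for the same cores, cache or TLB entries.

```python
import os


def allowed_cores() -> int:
    """Return how many cores this process is currently permitted to run on."""
    # os.sched_getaffinity is not available on every platform, so fall
    # back to the total core count when it is missing.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def require_cores(needed: int) -> int:
    """Raise RuntimeError unless at least `needed` cores are permitted."""
    available = allowed_cores()
    if available < needed:
        raise RuntimeError(
            f"need {needed} cores, but only {available} are permitted"
        )
    return available
```

A parallel job could call require_cores(N) before starting any worker threads. That is a start-up check only, not the scheduler-level guarantee described above.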
>> Some administrators forbid them, and jump hard on users caught running
>I can understand an administrator jumping on users for running serial
>codes on a cluster, on grounds that they are wasting resources since
>they can run codes like that on their desktop machines. The problem of
>an administrator who bans parallel jobs from a parallel computer is
>probably handled best by the (un)employment office. Certainly not an
>issue for WG5 or J3.
You are assuming that the world is like Cray. It isn't. I am referring
to the multi-core systems that are used for mixed workloads, of the sort
I describe above. And, yes, I managed Cray-like systems for a decade, so
I can talk both languages.
>> So serial Fortran will not go away any time soon, and some people will
>> positively want coarray-free compilers (or a fixable mode).
>I agree that having a compiler switch that allows the user to fix
>num_images at compile time (to 1, for example), or to disable the
>recognition of coarrays entirely, is a fine vendor feature. That gives
>the user freedom to live in a serial-only world. ...
It's not specifiable. Some coarray programs can survive that environment;
others can't; and it isn't possible to write a clear specification of which
can and which can't.
What many people will want is a switch that says "give an error on any use
of any coarray feature".
>> Er, the vast majority of Fortran users have never USED or even SEEN a
>> vector machine (and, no, I don't count SSE etc.) You have, and I have,
>> but the kiddies - including post-docs here :-) - I teach never have.
>> Some have never even HEARD of them!
>Sorry, but SSE does count. ...
SSE does not count because:
a) It's not scalable. In almost any serious scientific program, you
need 64-bit reals, so SSE supports up to 2-way. Aw, gee. It is therefore
not relevant to a high-level discussion of scalable parallel features, such
b) Because of that, SSE optimisations are handled by all compilers in
similar ways to instruction scheduling, and not like true vectorisation, as
was used on the IBM 3090, Hitachi S-3600, many Fujitsus, almost all Crays
and so on.
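
The "2-way" figure in (a) is simply register width divided by element width. A small sketch of that arithmetic follows. The helper name simd_lanes is made up for illustration; the 128-bit width is that of an SSE XMM register.

```python
SSE_REGISTER_BITS = 128  # width of one SSE XMM register


def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Return how many elements of the given width fit in one register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide register width")
    return register_bits // element_bits


print(simd_lanes(SSE_REGISTER_BITS, 64))  # 64-bit reals: prints 2
print(simd_lanes(SSE_REGISTER_BITS, 32))  # 32-bit reals: prints 4
```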
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1 at cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679