[J3] [EXTERNAL] Questions about DO CONCURRENT and locality

Ondřej Čertík ondrej at certik.us
Mon Jul 13 12:21:16 EDT 2020


Malcolm,

Just wanted to say thanks for the email below explaining the history and motivations behind the decisions in the current standard. It is very helpful. I haven't replied here yet, because we are still debating the details at:

https://github.com/j3-fortran/fortran_proposals/issues/62

And I currently need more time to study and debate the original proposal and the example in it to see if there is any issue left that might need to be addressed by the standard, or not.
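
For reference, here is a minimal sketch of the kind of loop under discussion. It is reconstructed from the proposal text quoted below (the K(J)==L(J) forwarding case), so the exact loop in 19-134 may differ. The program name, array sizes, and data are my own assumptions for illustration. It uses the explicit SHARED list that Tom mentions, and the index data is chosen so that no iteration touches an element of T defined by another iteration. It needs a compiler that supports Fortran 2018 locality specifiers.

```fortran
program do_concurrent_locality
  implicit none
  integer, parameter :: n = 5      ! hypothetical problem size
  integer :: j
  integer :: k(n), l(n)
  real :: a(n), b(n), t(n)

  a = [1.0, 2.0, 3.0, 4.0, 5.0]
  k = [3, 1, 5, 2, 4]              ! a permutation: each T element is written once
  l = k                            ! each iteration reads only what it just wrote
  t = 0.0
  b = 0.0

  ! Explicit locality (F2018): every variable is declared SHARED, so the
  ! processor does not have to work out the locality of T on its own.
  ! This is only valid because no iteration references an element of T
  ! that a different iteration defines.
  do concurrent (j = 1:n) shared(a, b, t, k, l)
    t(k(j)) = a(j)
    b(j) = t(l(j))
  end do

  ! With l == k, each b(j) is expected to equal a(j).
  print *, b
end program do_concurrent_locality
```

If L were changed so that L(J) equals K(I) for some other iteration I, the SHARED version would no longer conform, which is the sort of data-dependent case the thread is about.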

Ondrej

On Mon, Jul 6, 2020, at 8:47 PM, Malcolm Cohen via J3 wrote:
> Hi Ondrej,
> 
> 
> The “simple edit” to the standard changes the semantics for existing 
> code. That is a non-starter, unless the standard is badly broken as is. 
> There is absolutely not universal agreement that the standard is broken.
> 
> 
> The user merely needs to specify the locality explicitly to be “shared”.
> 
> 
> To compare and contrast with OpenMP, if one fails to specify the 
> locality in OpenMP, one nearly always gets SHARED. This is maximally 
> error-prone as data races abound. Fortran 2008 (this is NOT a Fortran 
> 2018 feature) made the default “intelligently shared or private”; this 
> gets the “right answer” more often than OpenMP, but at the cost of not 
> always being able to parallelise a loop (though other optimisations are 
> enabled – DO CONCURRENT is not only about parallelisation, though that 
> is certainly a major thing).
> 
> 
> And even though F2008 didn’t parallelise as many loops as OpenMP would 
> have, it got more of the cases where LOCAL locality was needed right 
> (OpenMP always parallelised, but often gave the wrong answer if the 
> user accidentally omitted a PRIVATE clause).
> 
> 
> The issue of not being able to parallelise loops as well as both 
> vendors and users wanted is what led us, after a lot of discussion, to 
> add explicit locality specs in F2018.
> 
> 
> So, in my opinion, we already discussed the issues raised by this 
> paper. We discussed them at great length. Our solution (explicit 
> locality) is in F2018 already. I don’t think there is any such thing as 
> a “perfect solution” to these issues (different people have different 
> goals and requirements, and there are always trade-offs), but I do 
> think that what we have is pretty good. Someone who prefers the OpenMP 
> “pedal-to-the-metal and no brakes” approach can just add 
> DEFAULT(SHARED) to the DO CONCURRENT statement; yes, that does require 
> writing something marginally different, but it is hardly a huge 
> imposition.
> 
> 
> Cheers,
> 
> -- 
> 
> ..............Malcolm Cohen, NAG Oxford/Tokyo.
> 
> 
> *From:* J3 <j3-bounces at mailman.j3-fortran.org> *On Behalf Of *Ondřej 
> Čertík via J3
> *Sent:* Tuesday, July 7, 2020 1:31 AM
> *To:* J3 Mailinglist <j3 at mailman.j3-fortran.org>
> *Cc:* Ondřej Čertík <ondrej at certik.us>
> *Subject:* Re: [J3] [EXTERNAL] Questions about DO CONCURRENT and 
> locality
> 
> 
> Hi Tom,
> 
> On Mon, Jul 6, 2020, at 10:15 AM, Clune, Thomas L. (GSFC-6101) wrote:
> > 
> > Hi Ondrej,
> > 
> > If I understand correctly, merely “guaranteeing” that the loop can be 
> > parallelized is insufficient. The compiler needs to “know” whether to 
> > treat any given variable as SHARED or LOCAL. Small modifications of the 
> > example here can result in a different requirement in that regard to 
> > allow correct parallelization. 
> > 
> > In this example the user could specify SHARED(A,B,T,K,L) which would 
> > enable parallelization. (But would do unspeakable things if provided 
> > the data from your cases 4 & 5.)
> > 
> > I’m not sure what more you can be asking for. The compilers cannot 
> > determine the proper locality at compile time due to insufficient 
> > information, and cannot do so _efficiently_ at run time. Additional 
> > information must be provided by the programmer. 
> 
> I agree with this assessment, and I think the original proposal:
> 
> https://j3-fortran.org/doc/year/19/19-134.txt
> 
> that we are discussing here is trying to ensure that is the case in the 
> standard itself. The proposal says:
> 
> "A processor implementing a parallel execution model of this construct
> with the semantics required by the text of the Fortran 2018 standard
> must capture the value of A(J) and substitute that value for T(L(J))
> conditionally when K(J)==L(J). When there are multiple references to
> and definitions of elements of T, the number of these conditional
> forwardings grows as the product of the number of references and
> definitions. The induced overhead reduces the efficacy of parallel
> execution."
> 
> Which seems to suggest that the standard is currently requiring the 
> compiler to do runtime checks, and as you said and I agree, this cannot 
> be done efficiently at runtime.
> 
> The proposal is then suggesting a simple edit to the standard to 
> resolve the issue.
> 
> 
> I didn't write the proposal, so I hope I understood it correctly.
> 
> Ondrej
> 
> 

