[J3] Proposals for new intrinsic procedures

Jeff Hammond jehammond at nvidia.com
Fri Jan 6 07:14:38 UTC 2023


A common use case is computing offsets into an array containing blocks of non-constant size. LLNL physics developers mentioned that they use prefix sums regularly.
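
A minimal sketch of that use case, written in C++17 only because `std::exclusive_scan` already exists there. The helper name `block_offsets` and the sample sizes are illustrative assumptions, not taken from any LLNL code.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: given the size of each block, return the index at
// which each block starts in a flat array that stores the blocks back to back.
std::vector<std::size_t> block_offsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets(sizes.size());
    // Exclusive scan: offsets[i] = sizes[0] + ... + sizes[i-1], offsets[0] = 0.
    std::exclusive_scan(sizes.begin(), sizes.end(), offsets.begin(),
                        std::size_t{0});
    return offsets;
}
```

For block sizes 3, 1, 4, 2 this gives offsets 0, 3, 4, 8, and the flat array needs 8 + 2 = 10 elements.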

An intrinsic is definitely faster than what most users can write. Parallelizing scan is nontrivial (see, e.g., http://www.mgarland.org/papers/2016/scan/).
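
To illustrate where the difficulty comes from, here is a sketch of the classic three-phase chunked scan, written serially in C++17. It is not necessarily the method of the paper linked above, and the name `chunked_inclusive_sum` and its `chunk` parameter are assumptions for illustration. Phases 1 and 3 treat each chunk independently, which is where a parallel implementation could distribute work; phase 2 is the cross-chunk dependency that a simple loop hides.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: inclusive prefix sum computed in three phases over
// fixed-size chunks. Every loop runs serially in this sketch.
std::vector<long> chunked_inclusive_sum(const std::vector<long>& x,
                                        std::size_t chunk) {
    const std::size_t n = x.size();
    std::vector<long> out(n);
    if (n == 0) return out;
    chunk = std::clamp<std::size_t>(chunk, 1, n);  // keep chunk in [1, n]
    const std::size_t nchunks = (n + chunk - 1) / chunk;

    // Phase 1 (independent per chunk): scan each chunk on its own.
    for (std::size_t c = 0; c < nchunks; ++c) {
        const std::size_t lo = c * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        std::inclusive_scan(x.begin() + lo, x.begin() + hi, out.begin() + lo);
    }

    // Phase 2 (cross-chunk dependency): an exclusive scan of the chunk
    // totals yields the amount by which each chunk must be shifted.
    std::vector<long> carry(nchunks);
    for (std::size_t c = 0; c < nchunks; ++c) {
        carry[c] = out[std::min(n, (c + 1) * chunk) - 1];
    }
    std::exclusive_scan(carry.begin(), carry.end(), carry.begin(), 0L);

    // Phase 3 (independent per chunk): add each chunk's carry-in.
    for (std::size_t c = 0; c < nchunks; ++c) {
        const std::size_t lo = c * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        for (std::size_t i = lo; i < hi; ++i) out[i] += carry[c];
    }
    return out;
}
```

The result does not depend on the chunk size: the input 1, 2, ..., 7 yields 1, 3, 6, 10, 15, 21, 28 whether `chunk` is 1, 3, or larger than the input.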

You can also ask the OpenMP folks why they added it to OpenMP 5. It has been in C++ STL and MPI for decades at this point.

Jeff

On 6. Jan 2023, at 0.08, Long, Bill F <william.long at hpe.com> wrote:


Do we still have the requirement that a new feature have a "use case"? Other than being part of sending HPF to extinction, I'm not aware of any use for prefix_sum. (And would an intrinsic be faster than a roll-your-own function?)

Cheers,
Bill

________________________________
From: J3 <j3-bounces at mailman.j3-fortran.org> on behalf of Clune, Thomas L. (GSFC-6101) via J3 <j3 at mailman.j3-fortran.org>
Sent: Thursday, January 5, 2023 2:54 PM
To: Jeff Hammond <jehammond at nvidia.com>; General J3 interest list <j3 at mailman.j3-fortran.org>
Cc: Clune, Thomas L. (GSFC-6101) <thomas.l.clune at nasa.gov>
Subject: Re: [J3] [EXTERNAL] Proposals for new intrinsic procedures


I’m misremembering the terminology. (Only needed this capability on one project.) A scan can be inclusive or exclusive, with the former apparently being the more common usage. And one can be derived from the other rather trivially. Not immediately finding what these are called for the special case of sums …
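
As a sketch of the inclusive/exclusive distinction for sums, assuming C++17's `std::inclusive_scan`; the helper names `inclusive_sum` and `exclusive_from_inclusive` are made up for illustration. The exclusive result is simply the inclusive one shifted right by one position, with 0 in front.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive prefix sum: out[i] = x[0] + ... + x[i].
std::vector<int> inclusive_sum(const std::vector<int>& x) {
    std::vector<int> out(x.size());
    std::inclusive_scan(x.begin(), x.end(), out.begin());
    return out;
}

// Exclusive prefix sum derived from the inclusive one:
// out[0] = 0 and out[i] = x[0] + ... + x[i-1] for i > 0.
std::vector<int> exclusive_from_inclusive(const std::vector<int>& x) {
    const std::vector<int> inc = inclusive_sum(x);
    std::vector<int> out(x.size());  // value-initialized, so out[0] == 0
    for (std::size_t i = 1; i < x.size(); ++i) out[i] = inc[i - 1];
    return out;
}
```

For the input 1, 2, 3, 4 the inclusive sums are 1, 3, 6, 10 and the exclusive sums are 0, 1, 3, 6.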



Given that MPI has not seen a need to implement both, it seems plausible that just one is sufficient.



- Tom

From: Jeff Hammond <jehammond at nvidia.com>
Date: Thursday, January 5, 2023 at 3:30 PM
To: j3 <j3 at mailman.j3-fortran.org>
Cc: "Clune, Thomas L. (GSFC-6101)" <thomas.l.clune at nasa.gov>
Subject: Re: [J3] [EXTERNAL] Proposals for new intrinsic procedures



Parallel prefix_sum / exclusive and inclusive scans are available in many programming models, including MPI, OpenMP 5, and C++17. There is clear evidence that these are useful and implementable in parallel in shared memory (both CPU and GPU) and distributed memory.



I am aware of no prior art for postfix. I’ve never even heard this term before today.



Can you please share the applications currently implementing their own postfix_sum in parallel that would benefit from this intrinsic?



Jeff


On 5. Jan 2023, at 22.06, Clune, Thomas L. (GSFC-6101) via J3 <j3 at mailman.j3-fortran.org> wrote:


Brad,



If we do `prefix_sum()` then, IMO, we should also do `postfix_sum()`.



And then what about `prefix_cosum()` and `postfix_cosum()`? Slippery slopes and all that …



I believe such intrinsics go to JOR, but am far from certain. For the IEEE-754 functions, I would hope the initial request could just be one paper, as it should focus on the “why”. Edits for the individual (new) IEEE functions might be separate papers for more pragmatic reasons.



A separate paper for the (misnamed) scan procedures.







Cheers,

- Tom



From: J3 <j3-bounces at mailman.j3-fortran.org> on behalf of j3 <j3 at mailman.j3-fortran.org>
Reply-To: j3 <j3 at mailman.j3-fortran.org>
Date: Thursday, January 5, 2023 at 2:35 PM
To: j3 <j3 at mailman.j3-fortran.org>
Cc: Brad Richardson <everythingfunctional at protonmail.com>
Subject: [EXTERNAL] [J3] Proposals for new intrinsic procedures



Hi all,



Would proposals for new intrinsic functions be acceptable to submit for
the upcoming meeting? Specifically I've been asked to write up the
papers for:

* prefix sum (commonly known as scan, but of course that name's already
taken), both inclusive and exclusive
* All functions recommended by IEEE-754, most of which Fortran already
has, but the whole list is reproduced below



exp
expm1
exp2
exp2m1
exp10
exp10m1
log
log2
log10
logp1
log2p1
log10p1
hypot
rSqrt
compound
rootn
pown
sin
cos
tan
sinPi
cosPi
tanPi
asin
acos
atan
atan2
sinh
cosh
tanh
acosh
atanh



Some point-of-order questions also. Which subgroup is in charge of the
intrinsic procedures so that I can coordinate with them? Would it be
best to write a separate paper for each new procedure, put them all in
a single paper, or some scheme for grouping certain subsets? And whilst
we don't have a final version of F2023 to draft the edits against, would
it be worthwhile to go ahead and detail out what they will look like?



Thanks in advance for any feedback.

Regards,
Brad



