[J3] [EXTERNAL] Re: Custom Memory Allocator
Reuben D. Budiardja
reubendb at ornl.gov
Tue Nov 19 14:37:43 EST 2019
Hi Vipul,
On 11/16/19 11:57 AM, Vipul Parekh via J3 wrote:
> On Fri, Nov 15, 2019 at 7:16 PM Reuben D. Budiardja via J3
> <j3 at mailman.j3-fortran.org> wrote:
>> .. we do want to avoid
>> giving up the advantages of allocatable. And some OpenMP runtime
>> surprisingly works better with Fortran allocatable vs pointer. Hence the
>> question. .
>
> Hi Reuben,
>
> Re: "some OpenMP runtime surprisingly works better with Fortran
> allocatable vs pointer", would you know if this performance gain is
> due to OpenMP runtime taking advantage of what the Fortran standard
> offers in section 8.5.7, "An object is contiguous if it is .. an array
> allocated by an ALLOCATE statement," Meaning it presumes the "Fortran
> allocatable" are all CONTIGUOUS in its handling where it does not with
> "Fortran pointer"?
I don't know if that's the case, since I don't have inside knowledge of
what the OpenMP compiler / runtime is doing. We can only observe it from
the application perspective.
From the profiler we also observe that when a Fortran pointer is used,
the OpenMP runtime performs more data transfers on entry to a TARGET
region. This is a performance hit that somehow doesn't happen when we
use a Fortran allocatable.
> And if so, would you benefit from using CONTIGUOUS attribute with
> objects in Fortran (dummy arguments perhaps?) whose memory is
> allocated using cudaMallocManaged?
We tried that and it doesn't help. One workaround is to pass the pointer
to a subroutine with an assumed-shape dummy argument. In that case it
seems that, inside the subroutine, it just looks like a regular array to
the compiler, so it can optimize better.
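
A minimal sketch of that workaround, for illustration only. The names
(kernels, scale_field, field, n) are hypothetical, and the pointer here
is allocated with a plain ALLOCATE rather than associated with memory
from cudaMallocManaged, which is an assumption made to keep the example
self-contained. Whether it changes data-transfer behavior depends on the
compiler and OpenMP runtime.

```fortran
! Hypothetical example: all names below are illustrative.
module kernels
  implicit none
contains
  ! Assumed-shape dummy: inside this routine, a is an ordinary array,
  ! neither POINTER nor ALLOCATABLE.
  subroutine scale_field(a, factor)
    real, intent(inout) :: a(:)
    real, intent(in)    :: factor
    integer :: i
    ! The directives are treated as comments when OpenMP is disabled.
    !$omp target teams distribute parallel do map(tofrom: a)
    do i = 1, size(a)
      a(i) = factor * a(i)
    end do
    !$omp end target teams distribute parallel do
  end subroutine scale_field
end module kernels

program demo
  use kernels
  implicit none
  integer, parameter :: n = 8
  real, pointer :: field(:)

  ! Assumption: plain ALLOCATE stands in for managed memory here.
  allocate(field(n))
  field = 1.0

  ! Pointer actual argument, assumed-shape (non-pointer) dummy.
  call scale_field(field, 2.0)
  print *, field(1), field(n)

  deallocate(field)
end program demo
```

The module gives scale_field the explicit interface that an
assumed-shape dummy requires.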
Best,
Reuben