[J3] [EXTERNAL] Re: Custom Memory Allocator

Reuben D. Budiardja reubendb at ornl.gov
Tue Nov 19 14:37:43 EST 2019

Hi Vipul,

On 11/16/19 11:57 AM, Vipul Parekh via J3 wrote:
> On Fri, Nov 15, 2019 at 7:16 PM Reuben D. Budiardja via J3
> <j3 at mailman.j3-fortran.org> wrote:
>> .. we do want to avoid
>> giving up the advantages of allocatable. And some OpenMP runtime
>> surprisingly works better with Fortran allocatable vs pointer. Hence the
>> question.
> Hi Reuben,
> Re: "some OpenMP runtime surprisingly works better with Fortran
> allocatable vs pointer", would you know if this performance gain is
> due to the OpenMP runtime taking advantage of what the Fortran standard
> offers in section 8.5.7, "An object is contiguous if it is .. an array
> allocated by an ALLOCATE statement"? That is, does the runtime presume
> that Fortran allocatables are always CONTIGUOUS, whereas it cannot
> presume that of Fortran pointers?

I don't know whether that's the case, since I don't have inside knowledge
of what the OpenMP compiler/runtime is doing; we can only observe it from
the application side. The profiler does seem to show that when a Fortran
pointer is used, the OpenMP runtime performs more data transfers when
entering a TARGET region. That is a performance hit which, for some
reason, doesn't occur when we use a Fortran allocatable.
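The contiguity rule quoted from section 8.5.7 can be checked directly with the Fortran 2018 intrinsic IS_CONTIGUOUS (available in gfortran 9 and later). The sketch below, whose module and procedure names are hypothetical, shows why a compiler or runtime may assume contiguity for an allocated allocatable but not, in general, for a pointer, which can legally be associated with a strided section. Whether this is what the OpenMP runtime actually exploits is not established here.

```fortran
! Illustrates Fortran 2018, 8.5.7: an array allocated by ALLOCATE is
! contiguous, whereas a pointer may be associated with a strided,
! non-contiguous section. Module/procedure names are hypothetical.
module contiguity_demo
  implicit none
  private
  public :: check_contiguity
contains
  ! Reports the contiguity of an allocated allocatable and of a pointer
  ! associated with every other element of a target array.
  subroutine check_contiguity(n, alloc_contig, ptr_contig)
    integer, intent(in)  :: n
    logical, intent(out) :: alloc_contig, ptr_contig
    real, allocatable         :: a(:)
    real, allocatable, target :: t(:)
    real, pointer             :: p(:)

    allocate(a(n))
    alloc_contig = is_contiguous(a)   ! always .true. for an allocated array

    allocate(t(2*n))
    p => t(1:2*n:2)                   ! stride 2: a legal pointer target
    ptr_contig = is_contiguous(p)     ! .false. whenever n > 1

    deallocate(a, t)                  ! p is left undefined and not used again
  end subroutine check_contiguity
end module contiguity_demo
```

Calling `check_contiguity(8, ac, pc)` should yield `ac = .true.` and `pc = .false.`; a runtime that only sees a pointer descriptor has to allow for the second case.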

> And if so, would you benefit from using the CONTIGUOUS attribute on
> objects in Fortran (dummy arguments, perhaps?) whose memory is
> allocated using cudaMallocManaged?

We tried that, and it doesn't help. One workaround is to pass the pointer
to a subroutine with an assumed-shape dummy argument. Inside the
subroutine the dummy then seems to look like a regular array to the
compiler, so it can optimize better.
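A minimal sketch of that workaround follows. Assumption: C `malloc`/`free`, bound through ISO_C_BINDING, stand in for `cudaMallocManaged`/`cudaFree` so the example runs without CUDA; all module and procedure names are hypothetical. Externally allocated memory is viewed through a Fortran pointer via `c_f_pointer`, and that pointer is then passed to a procedure whose dummy is assumed-shape with no POINTER attribute, so inside the procedure (and its OpenMP TARGET region) it is an ordinary array.

```fortran
! Sketch of the assumed-shape workaround. Assumption: C malloc/free stand
! in for cudaMallocManaged/cudaFree. All names here are hypothetical.
module managed_array_demo
  use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_double, &
                                         c_f_pointer, c_associated, c_sizeof
  implicit none
  private
  public :: alloc_external, free_external, scale_kernel

  interface
    function c_malloc(nbytes) bind(C, name="malloc") result(p)
      import :: c_ptr, c_size_t
      integer(c_size_t), value :: nbytes
      type(c_ptr) :: p
    end function c_malloc
    subroutine c_free(p) bind(C, name="free")
      import :: c_ptr
      type(c_ptr), value :: p
    end subroutine c_free
  end interface

contains

  ! Allocate n doubles outside Fortran and return a Fortran pointer view.
  subroutine alloc_external(n, a, handle)
    integer, intent(in) :: n
    real(c_double), pointer, intent(out) :: a(:)
    type(c_ptr), intent(out) :: handle
    real(c_double) :: probe            ! used only to query the element size
    handle = c_malloc(int(n, c_size_t) * c_sizeof(probe))
    if (.not. c_associated(handle)) error stop "allocation failed"
    call c_f_pointer(handle, a, [n])
  end subroutine alloc_external

  ! Release the external memory and disassociate the pointer view.
  subroutine free_external(a, handle)
    real(c_double), pointer, intent(inout) :: a(:)
    type(c_ptr), intent(in) :: handle
    nullify(a)
    call c_free(handle)
  end subroutine free_external

  ! Assumed-shape dummy without POINTER: the compiler treats x as a
  ! regular array here. The directive is ignored without OpenMP.
  subroutine scale_kernel(x, alpha)
    real(c_double), intent(inout) :: x(:)
    real(c_double), intent(in)    :: alpha
    integer :: i
    !$omp target teams distribute parallel do map(tofrom: x)
    do i = 1, size(x)
      x(i) = alpha * x(i)
    end do
  end subroutine scale_kernel

end module managed_array_demo
```

The caller keeps the POINTER view (`call alloc_external(n, a, h)`) and simply writes `call scale_kernel(a, 2.0_c_double)`; only the kernel's dummy changes. Whether this actually reduces data transfers depends on the compiler and OpenMP runtime in use.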

