[J3] Problem with contiguous
Bill Long
longb at cray.com
Tue Dec 11 02:57:29 EST 2018
> On Dec 10, 2018, at 4:49 PM, Van Snyder via J3 <j3 at mailman.j3-fortran.org> wrote:
>
> I have a routine that's more than 1000 lines.
OK
>
> It needs more than a hundred work arrays.
Yikes.
>
> The routine is invoked for many problems of many different sizes, but I
> know the maximum size (at run time, not compile time).
>
If the max size is known at the beginning of the run, one option is to create a module with declaring the arrays as allocatable, with a contained INIT routine that is called before your routine is called the first time. The INIT routine allocates the arrays. At least you pay the allocation cost only once.
> Originally, the arrays were pointers, local to that routine, explicitly
> allocated, for each problem. This was before pointers could be declared
> to be contiguous.
>
> Then, they were allocatable, local to that routine, explicitly
> allocated, for each problem.
>
> Then, they were automatic, local to that routine.
>
Local automatic arrays are typically “allocated” on the stack, which is usually a low cost activity. You might check compiler options to make sure you are using the stack and not the heap for the automatic arrays.
> A profiler told us we were were spending a significant fraction of run
> time, about 8% overall and much more for that routine alone, in the
> storage allocator.
>
> So we made the routine internal to the one that repeatedly calls it with
> different problem sizes, made the work arrays automatic in the calling
> routine, with extents dependent upon the maximum problem size, and
> accessed them by host association. Because they're automatic, they're
> simply contiguous. Because they're in the calling routine, outside the
> loop of calls to the internal one, the storage-allocator cost was
> dramatically reduced.
Similar concept to the module solution.
>
> Then, we had to make the routine stand on its own again, so we added
> hundreds of assumed-shape dummy arguments for the work arrays.
Seems ill-advised on many performance grounds. I’d prefer the module option and get rid of all the extra assumed-shape arguments.
>
> The work arrays are still automatic in the calling routine (now
> routines), still declared something like
>
> real(rp) :: Sps_Path(max_f,n_sps)
>
> where max_f is the maximum problem size, and n_sps is the same for every
> problem. We need to compute column-wise dot products, so we don't want
> to exchange the dimensions, which would make the whole array contiguous
> because n_sps is the same for every problem, but this is not helpful
> because rows are never contiguous.
>
> The routine is referenced
>
> call one_frequency ( ..., sps_path(:npf,:), ... )
>
> where npf is the current problem size. The original code, in which
> Sps_Path etc. were accessed by host association, had sps_path(:npf,:) in
> every array reference, but now they're whole-array references with a
> requirement that the actual argument is just the relevant section. It
> would be costly and error-prone to change them back.
Seems like you have made costly changes multiple times before.
Cheers,
Bill
>
> We noticed a performance degradation because the dummy arguments are not
> contiguous.
>
> They can't be declared CONTIGUOUS -- because they're not. Well, we
> could declare them CONTIGUOUS at the expense of copy-in (and sometimes
> copy-out too). VALUE would make the expense more obvious, but the
> copying expense would overwhelm the cost increment incurred by
> discontiguity.
>
> But every section with stride = 1 in the first dimension is contiguous.
> For some four-dimensional arrays, it's every such section in the first
> three dimensions that are contiguous.
>
> I suspect others have had the same or a similar problem.
>
> Many years ago, I proposed a different notation for partly-contiguous
> assumed-shape arrays:
>
> real(rp), intent(in) :: sps_path(::1, :)
>
> meaning "it's contiguous in the first dimension, but columns are not
> necessarily contiguous one to another."
>
> More generally:
>
> real, intent(...) :: A(::1, ::1, :)
>
> meaning "it's contiguous only in the first two dimensions, but not after
> that," etc. There was a constraint that a dimension could not be
> declared to be contiguous unless all previous dimensions are contiguous.
> For example
>
> real, intent(in) :: A(:,:,::1)
>
> would be prohibited (because it cannot be correct that an array is
> discontiguous in the first two dimensions but contiguous in the third
> dimension).
>
> We rejected that notation in favor of CONTIGUOUS. which says the entire
> array is contiguous.
>
> PROPOSAL:
>
> Change CONTIGUOUS to
>
> R802 <attr-spec> <<is>> ...
> <<or>> CONTIGUOUS [ ( <scalar-int-constant-expr> ) ]
>
> with a constraint that the value of <scalar-int-constant-expr> shall not
> exceed the rank of the array.
>
> It would mean "the array is contiguous through the first
> <scalar-int-constant-expr> dimensions. If <scalar-int-constant-expr>
> does not appear, it means the same as we have now: contiguous through
> all dimensions.
>
> For the case of my four-dimensional arrays known to be contiguous in the
> first three dimensions, I also know that the first two dimension's
> bounds are constants (1:2). Another proposal (that was rejected) was to
> allow to mix explicit and assumed shape (and contiguity) in one
> declaration, so I could declare an array
>
> complex(rp) :: IncOptDepth(2,2,::1,:)
>
> This array is contiguous in the first three dimensions, but in addition
> the processor knows, by looking at the declaration, that the first two
> extents are constant. Therefore, there is some hope that when such an
> array is an argument, e.g. to MATMUL:
>
> b = matmul(incOptDepth(:,:,i,j), c)
>
> that the processor would optimize it. We noticed a performance
> difference between this and
>
> b = matmul(incOptDepth(1:2,1:2,i,j), c)
>
> so some processor was optimizing it. In the current context, to mix the
> two proposals, I would write
>
> complex(rp), contiguous(3) :: IncOptDepth(2,2,:,:)
>
> More generally, it is useful to be able to specify explicit extent after
> assumed extent, e.g.,
>
> real(rp) :: Buf(:,2)
>
> in which an explicit-extent dimension following a discontiguous
> assumed-extent one does not imply contiguity.
>
> But mixing assumed and explicit extent was a different proposal,
> requiring much more surgery, included here only for reference.
>
>
Bill Long longb at cray.com
Principal Engineer, Fortran Technical Support & voice: 651-605-9024
Bioinformatics Software Development fax: 651-605-9143
Cray Inc./ 2131 Lindau Lane/ Suite 1000/ Bloomington, MN 55425
More information about the J3
mailing list