[J3] Problem with contiguous
Van Snyder
Van.Snyder at jpl.nasa.gov
Mon Dec 10 21:49:34 EST 2018
I have a routine that's more than 1000 lines.
It needs more than a hundred work arrays.
The routine is invoked for many problems of many different sizes, but I
know the maximum size (at run time, not compile time).
Originally, the arrays were pointers, local to that routine, explicitly
allocated, for each problem. This was before pointers could be declared
to be contiguous.
Then, they were allocatable, local to that routine, explicitly
allocated, for each problem.
Then, they were automatic, local to that routine.
A profiler told us we were were spending a significant fraction of run
time, about 8% overall and much more for that routine alone, in the
storage allocator.
So we made the routine internal to the one that repeatedly calls it with
different problem sizes, made the work arrays automatic in the calling
routine, with extents dependent upon the maximum problem size, and
accessed them by host association. Because they're automatic, they're
simply contiguous. Because they're in the calling routine, outside the
loop of calls to the internal one, the storage-allocator cost was
dramatically reduced.
Then, we had to make the routine stand on its own again, so we added
hundreds of assumed-shape dummy arguments for the work arrays.
The work arrays are still automatic in the calling routine (now
routines), still declared something like
real(rp) :: Sps_Path(max_f,n_sps)
where max_f is the maximum problem size, and n_sps is the same for every
problem. We need to compute column-wise dot products, so we don't want
to exchange the dimensions, which would make the whole array contiguous
because n_sps is the same for every problem, but this is not helpful
because rows are never contiguous.
The routine is referenced
call one_frequency ( ..., sps_path(:npf,:), ... )
where npf is the current problem size. The original code, in which
Sps_Path etc. were accessed by host association, had sps_path(:npf,:) in
every array reference, but now they're whole-array references with a
requirement that the actual argument is just the relevant section. It
would be costly and error-prone to change them back.
We noticed a performance degradation because the dummy arguments are not
contiguous.
They can't be declared CONTIGUOUS -- because they're not. Well, we
could declare them CONTIGUOUS at the expense of copy-in (and sometimes
copy-out too). VALUE would make the expense more obvious, but the
copying expense would overwhelm the cost increment incurred by
discontiguity.
But every section with stride = 1 in the first dimension is contiguous.
For some four-dimensional arrays, it's every such section in the first
three dimensions that are contiguous.
I suspect others have had the same or a similar problem.
Many years ago, I proposed a different notation for partly-contiguous
assumed-shape arrays:
real(rp), intent(in) :: sps_path(::1, :)
meaning "it's contiguous in the first dimension, but columns are not
necessarily contiguous one to another."
More generally:
real, intent(...) :: A(::1, ::1, :)
meaning "it's contiguous only in the first two dimensions, but not after
that," etc. There was a constraint that a dimension could not be
declared to be contiguous unless all previous dimensions are contiguous.
For example
real, intent(in) :: A(:,:,::1)
would be prohibited (because it cannot be correct that an array is
discontiguous in the first two dimensions but contiguous in the third
dimension).
We rejected that notation in favor of CONTIGUOUS. which says the entire
array is contiguous.
PROPOSAL:
Change CONTIGUOUS to
R802 <attr-spec> <<is>> ...
<<or>> CONTIGUOUS [ ( <scalar-int-constant-expr> ) ]
with a constraint that the value of <scalar-int-constant-expr> shall not
exceed the rank of the array.
It would mean "the array is contiguous through the first
<scalar-int-constant-expr> dimensions. If <scalar-int-constant-expr>
does not appear, it means the same as we have now: contiguous through
all dimensions.
For the case of my four-dimensional arrays known to be contiguous in the
first three dimensions, I also know that the first two dimension's
bounds are constants (1:2). Another proposal (that was rejected) was to
allow to mix explicit and assumed shape (and contiguity) in one
declaration, so I could declare an array
complex(rp) :: IncOptDepth(2,2,::1,:)
This array is contiguous in the first three dimensions, but in addition
the processor knows, by looking at the declaration, that the first two
extents are constant. Therefore, there is some hope that when such an
array is an argument, e.g. to MATMUL:
b = matmul(incOptDepth(:,:,i,j), c)
that the processor would optimize it. We noticed a performance
difference between this and
b = matmul(incOptDepth(1:2,1:2,i,j), c)
so some processor was optimizing it. In the current context, to mix the
two proposals, I would write
complex(rp), contiguous(3) :: IncOptDepth(2,2,:,:)
More generally, it is useful to be able to specify explicit extent after
assumed extent, e.g.,
real(rp) :: Buf(:,2)
in which an explicit-extent dimension following a discontiguous
assumed-extent one does not imply contiguity.
But mixing assumed and explicit extent was a different proposal,
requiring much more surgery, included here only for reference.
More information about the J3
mailing list