[J3] Problem with contiguous
Van Snyder
van.snyder at jpl.nasa.gov
Tue Dec 11 14:37:50 EST 2018
On Tue, 2018-12-11 at 13:25 -0500, Vipul Parekh wrote:
> For the situation
> you describe, a minimal working example will be most useful.
This isn't a working example, but it illustrates the problem.
Solution 1 (host association):
real(rp) :: Sps_Path(max_f,n_sps)
...
call one_frequency ( ... )
contains
subroutine One_Frequency ( ... )
...
sps_path(1:npf,sps_i) = ... ! LHS is contiguous here
Solution 1 has good performance, but One_Frequency isn't reusable.
Solution 2 (dummy argument with actual argument section):
real(rp) :: Sps_Path(max_f,n_sps)
...
call one_frequency ( ... sps_path(1:npf,:), ... )
contains
subroutine One_Frequency ( ... Sps_Path ... )
...
real(rp), intent(inout) :: Sps_Path(:,:) ! assigned below, so not intent(in)
...
sps_path(:,sps_i) = ... ! LHS is NOT contiguous here, and the
! performance difference is measurable.
Multiply this by a hundred arrays, used in a thousand statements, and
the performance difference adds up. But One_Frequency is reusable.
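
A minimal sketch of the Solution 2 situation, assuming rp is double precision and using made-up extents (neither is given above); the subroutine name show is hypothetical. It asks IS_CONTIGUOUS at run time about the whole dummy and about one column. "NOT contiguous" above is best read as "not known to be contiguous at compile time": without the CONTIGUOUS attribute the compiler cannot assume unit stride for an assumed-shape dummy, whatever the run-time answer turns out to be.

```fortran
program section_contiguity
  implicit none
  integer, parameter :: rp = kind(1.0d0)   ! assumed kind; the post does not define rp
  integer, parameter :: max_f = 8, n_sps = 3
  integer, parameter :: npf = 5            ! fewer rows in use than allocated
  real(rp) :: sps_path(max_f,n_sps)

  sps_path = 0.0_rp
  call show ( sps_path(1:npf,:) )          ! Solution 2 style: section actual argument
  call show ( sps_path )                   ! whole-array actual argument, for comparison

contains

  subroutine show ( x )                    ! hypothetical stand-in for One_Frequency
    real(rp), intent(in) :: x(:,:)         ! assumed shape, no CONTIGUOUS attribute
    ! These are run-time answers only. The generated code must still work
    ! for any stride, because the compiler cannot see every caller.
    print *, 'whole dummy contiguous?  ', is_contiguous ( x )
    print *, 'single column contiguous?', is_contiguous ( x(:,1) )
  end subroutine show

end program section_contiguity
```

One would expect the first call to report the whole dummy as not contiguous and the single column as contiguous, though the standard leaves some of these cases processor dependent.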
One might be tempted to say "You idiot! Use solution 3" (dummy argument
with whole-array actual argument):
real(rp) :: Sps_Path(max_f,n_sps)
...
call one_frequency ( ... sps_path, ... )
contains
subroutine One_Frequency ( ... Sps_Path ... )
...
real(rp), intent(inout), contiguous :: Sps_Path(:,:) ! assigned below, so not intent(in)
...
sps_path(1:npf,sps_i) = ... ! LHS is contiguous here.
That might be wonderful advice if you're writing new code from scratch.
But that's not always possible, especially when somebody gives you a
module, written in the style of Solution 2, and says "Use this in your
model. You have two weeks to get the run time under fifteen hours."
Using the whole of sps_path in one place where the first dimension ought
to have a 1:npf section can introduce mysterious errors that are
difficult to find. Just guessing, but I suspect that was the reason not
to use solution 3 in our case. Whole array, or whole column section, is
more reliably correct than an explicit part-of-a-column section.
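
A minimal sketch of the hazard just described, assuming rp is double precision and made-up extents; fill_columns and its argument n are hypothetical stand-ins. With a whole-array actual argument the callee has to carry npf itself, and every statement must remember the 1:n section.

```fortran
program whole_array_actual
  implicit none
  integer, parameter :: rp = kind(1.0d0)   ! assumed kind; the post does not define rp
  integer, parameter :: max_f = 8, n_sps = 3
  real(rp) :: sps_path(max_f,n_sps)
  integer :: npf

  npf = 5
  sps_path = -1.0_rp                       ! sentinel marks rows that must stay untouched
  call fill_columns ( sps_path, npf )      ! Solution 3 style: whole-array actual argument

  if ( any ( sps_path(1:npf,:) /= 1.0_rp ) ) error stop 'active rows not filled'
  if ( any ( sps_path(npf+1:,:) /= -1.0_rp ) ) error stop 'inactive rows overwritten'
  print *, 'ok'

contains

  subroutine fill_columns ( x, n )         ! hypothetical stand-in for One_Frequency
    real(rp), intent(inout), contiguous :: x(:,:)
    integer, intent(in) :: n               ! rows in use; must be passed separately
    integer :: j
    do j = 1, size(x,2)
      x(1:n,j) = 1.0_rp                    ! writing x(:,j) instead would clobber rows n+1:
    end do
  end subroutine fill_columns

end program whole_array_actual
```

Replacing x(1:n,j) with x(:,j) in a single statement still compiles and runs; only the sentinel check catches it, which is the kind of error that is hard to find in a thousand-line procedure.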
Assume you have already learned from profiler runs that contiguity makes
a difference, but you can't declare Sps_Path to be contiguous in
solution 2. You suspect it would make a difference if you could somehow
declare that every column is contiguous, but each column is not
contiguous to the next one. A few experiments, such as Solution 4:
real(rp), intent(inout), target :: Sps_Path(:,:) ! written through the pointer, so not intent(in)
real(rp), pointer, contiguous :: Sps_Path_Col(:)
...
sps_path_col => sps_path(1:npf,sps_i)
sps_path_col = ... ! LHS is contiguous here
verify the conjecture. It's better than solution 2, but not as good as
solution 1.
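
The pointer approach can be sketched as a complete program, assuming rp is double precision and made-up extents; fill_column and col are hypothetical names. The dummy keeps its Solution 2 declaration apart from TARGET, and the CONTIGUOUS pointer is set once per column.

```fortran
program column_pointer
  implicit none
  integer, parameter :: rp = kind(1.0d0)   ! assumed kind; the post does not define rp
  integer, parameter :: max_f = 8, n_sps = 3, npf = 5
  real(rp), target :: sps_path(max_f,n_sps)

  sps_path = 0.0_rp
  call fill_column ( sps_path(1:npf,:), 2 )   ! section actual argument, as in Solution 2

  if ( any ( sps_path(1:npf,2) /= 2.0_rp ) ) error stop 'column 2 not filled'
  if ( any ( sps_path(npf+1:,:) /= 0.0_rp ) ) error stop 'rows beyond npf touched'
  if ( any ( sps_path(:,1) /= 0.0_rp ) ) error stop 'column 1 touched'
  print *, 'ok'

contains

  subroutine fill_column ( x, j )          ! hypothetical stand-in for One_Frequency
    real(rp), intent(inout), target :: x(:,:)
    integer, intent(in) :: j
    real(rp), pointer, contiguous :: col(:)
    ! The target of a CONTIGUOUS pointer must itself be contiguous. A single
    ! column is, provided the actual argument has unit stride in its first
    ! dimension; that is an assumption about the caller, not something the
    ! compiler checks here.
    col => x(:,j)
    col = real ( j, rp )                   ! later uses of col need no further changes
  end subroutine fill_column

end program column_pointer
```

Some compilers may warn about the pointer assignment at higher warning levels, since they cannot prove the target is contiguous.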
If you have a thousand lines, and a hundred variables, and two weeks,
it's difficult to meet the deadline and get performance somewhere
between solution 1 and solution 2.
The reason that solution 4 isn't as good as solution 1 is probably that
TARGET and POINTER can subvert much of the improvement gotten by
CONTIGUOUS. That was one of the reasons why I proposed, many years ago,
that it ought to be possible to annotate TARGET with a list of names of
pointers that might be associated with it, and to annotate POINTER with
a list of targets with which it might be associated:
real(rp), intent(inout), target(sps_path_col) :: Sps_Path(:,:)
real(rp), pointer(sps_path), contiguous :: Sps_Path_Col(:)
...
sps_path_col => sps_path(1:npf,sps_i)
sps_path_col = ... ! LHS is contiguous here
but that proposal was also rejected.
One might suggest that these column operations should be hidden in
myriad small subroutines, but that transformation isn't feasible with a
two-week schedule, and procedure references might subvert improvement
gotten by CONTIGUOUS.
One might suggest solution 5:
associate ( sps_path_col => sps_path(1:npf,sps_i) )
sps_path_col = ... ! LHS is contiguous here
end associate
but that transformation might need to be done for every use of sps_path_col
(because the references are scattered all over the procedure), unlike
the pointer solution. It also can't be done with a tight deadline, and
might well double or triple the subprogram size.
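
For comparison, a minimal runnable sketch of the ASSOCIATE form, assuming rp is double precision and made-up extents; fill_column and col are hypothetical names. The associate name exists only inside the construct, so every scattered use of the column must either get its own construct or sit inside one large enclosing one.

```fortran
program column_associate
  implicit none
  integer, parameter :: rp = kind(1.0d0)   ! assumed kind; the post does not define rp
  integer, parameter :: max_f = 8, n_sps = 3, npf = 5
  real(rp) :: sps_path(max_f,n_sps)

  sps_path = 0.0_rp
  call fill_column ( sps_path(1:npf,:), 2 )   ! section actual argument, as in Solution 2

  if ( sps_path(1,2) /= -2.0_rp ) error stop 'first element wrong'
  if ( any ( sps_path(2:npf,2) /= 2.0_rp ) ) error stop 'column 2 not filled'
  if ( any ( sps_path(npf+1:,:) /= 0.0_rp ) ) error stop 'rows beyond npf touched'
  print *, 'ok'

contains

  subroutine fill_column ( x, j )          ! hypothetical stand-in for One_Frequency
    real(rp), intent(inout) :: x(:,:)      ! no TARGET or POINTER needed
    integer, intent(in) :: j
    associate ( col => x(:,j) )
      col = real ( j, rp )                 ! col names the column only inside this block
      col(1) = -col(1)
    end associate
    ! Any later use of the column needs another ASSOCIATE construct.
  end subroutine fill_column

end program column_associate
```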
We invented CONTIGUOUS for a good reason. It is still not tuned precisely
enough to address all of the problems it could.