# (j3.2006) Loop fusion vs. vectorization

Bill Long longb
Mon Oct 5 17:46:37 EDT 2009

```
Van Snyder wrote:
> I have a routine for bilinear interpolation of a 2-d array zOld onto a
> 1-d path z through the (x,y) basis for zOld, coded thus.  Assume proper
> declarations, where sizes of xNew, yNew, z, hx, hy are the same, and the
> shape of zOld is (size(xOld),size(yOld)).
>
>   ! Hunt for xNew(:), yNew(:) in xOld(:), yOld(:)
>   call hunt ( xOld, xNew, hx )
>   call hunt ( yOld, yNew, hy )
>   do i = 1, size(z)
>     ix = hx(i)
>     iy = hy(i)
>     dx = ( xNew(i) - xOld(ix) ) / ( xOld(ix+1) - xOld(ix) )
>     dy = ( yNew(i) - yOld(iy) ) / ( yOld(iy+1) - yOld(iy) )
>     z = &
>       & ( 1.0 - dy ) * (( 1.0 - dx ) * zOld(ix,iy) + dx * zOld(ix+1,iy)) + &
>       & dy * (( 1.0 - dx ) * zOld(ix,iy+1) + dx * zOld(ix+1,iy+1) )
>   end do

Did you mean z(i) = .... here?

>
> Again assuming properly dimensioned variables, and assuming a proposal I
> made to allow subscripting a rank-k array zOld with a rank r array H,
> where the extent of the first dimension of H is k, giving a rank r-1
> object having the shape of the last r-1 dimensions of H, I could write
> this as
>
>   integer :: H(2,size(z))
>   call hunt ( xOld, xNew, h(1,:) )
>   call hunt ( yOld, yNew, h(2,:) )
>   dx = ( xNew - xOld(h(1,:)) ) / ( xOld(h(1,:)+1) - xOld(h(1,:)) )
>   dy = ( yNew - yOld(h(2,:)) ) / ( yOld(h(2,:)+1) - yOld(h(2,:)) )
>   z = &
>     & ( 1.0 - dy ) * (( 1.0 - dx ) * zOld(h) + &
>     &                 dx * zOld(reshape((/h(1,:)+1,h(2,:)/),(/2,size(z)/))) + &
>     & dy * (( 1.0 - dx ) * zOld(reshape((/h(1,:),h(2,:)+1/),(/2,size(z)/))) + &
>             dx * zOld(reshape((/h(1,:)+1,h(2,:)+1/),(/2,size(z)/))) )
>
> I write it the first way for two reasons
>
> (1) because I have to
> (2) because in the old days, caches were small enough and loop overhead
> was large enough to recommend fusing the loops.
>
> Again assuming the proposal, would version (2) vectorize sufficiently
> well to make it preferable to version (1), or would I still be better
> off fusing the loops as in version (1)?

I would not expect too much different in vectorization, and consecutive
array assignments with matching shapes can be automatically fused by the
compiler. Promoting dx and dy to arrays could cause them to be actually
stored, as opposed to being replaced by compiler temps when they are
scalars - this could hurt performance.  More important, compared to
version 1, I consider version 2 to border on unreadable.

Cheers,
Bill

>

--
Bill Long                                   longb at cray.com
Fortran Technical Support    &              voice: 651-605-9024
Bioinformatics Software Development         fax:   651-605-9142
Cray Inc., 1340 Mendota Heights Rd., Mendota Heights, MN, 55120

```