(j3.2006) Static dataflow analysis, runtime undefined variable reference checking.

Van Snyder vsnyder
Fri Oct 30 19:41:27 EDT 2009


Runtime checking for references to undefined variables is useful, but
when such an event occurs, it doesn't tell me the path by which I
arrived at such a reference.  It would be useful if the compiler could
give me a backtrace of basic blocks, perhaps with a length limited by
some absurdly large constant (and ideally with cycles not repeated).
This helps with repairing undefined variable references, but not with
the "what definitions reach this reference" or "what references does
this definition reach" questions that arise during maintenance (reducing
the spectrum of answers to such questions was one of the motivations for
construct-scope variables).  Some (most?) compilers have an option (or
default) to produce a call tree traceback with line numbers when a fatal
runtime error occurs.  An option to produce a basic-block traceback for
all fatal runtime errors, not just the "reference to undefined variable"
error, would be useful.
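One way a runtime could keep such a basic-block backtrace is sketched below in Python. It assumes the compiler instruments every basic block to call a hypothetical `enter_block(n)` hook. Cycles are not repeated because the recorder erases loops: when a block recurs, everything recorded since its previous occurrence is dropped. The length is capped by a large constant (`limit`, also hypothetical). This is an illustration, not how any particular compiler does it.

```python
from collections import deque


class BlockTrace:
    """Loop-erased, length-limited trace of basic-block numbers (hypothetical runtime hook)."""

    def __init__(self, limit=100_000):
        self.limit = limit      # the "absurdly large constant"
        self.path = deque()     # current cycle-free path of block numbers
        self.pos = {}           # block number -> absolute index in the trace
        self.base = 0           # absolute index of self.path[0]

    def enter_block(self, block):
        if block in self.pos:
            # Block seen before: erase the cycle back to its earlier occurrence.
            keep = self.pos[block] - self.base + 1
            while len(self.path) > keep:
                del self.pos[self.path.pop()]
            return
        if len(self.path) == self.limit:
            # Drop the oldest block to stay within the length limit.
            del self.pos[self.path.popleft()]
            self.base += 1
        self.pos[block] = self.base + len(self.path)
        self.path.append(block)

    def backtrace(self):
        """Blocks on the path to the current point, oldest first."""
        return list(self.path)
```

Loop erasure keeps the trace no longer than the number of distinct blocks, even in long-running loops, at the cost of forgetting how many times each loop was taken.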

I suspect most compilers do static dataflow analysis for purposes of
optimization.

I would find it very useful to have reports derived from static dataflow
analysis, especially something like the following:

        "On the path to the reference to variable X at line 1234 in
        foo.f90 consisting of basic blocks 1, 5, 23, the variable X is
        not assigned a value."
{and it would be useful to have an option that prints a listing with
basic block numbers, or alternatively to report each basic block as a
line-number range}.
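A report like the first one could come from a breadth-first search of the control-flow graph that refuses to enter any block that assigns X. The sketch assumes a hypothetical representation in which `succ` maps each basic-block number to its successors and `defines_x` is the set of blocks that assign X. It also assumes the reference in the target block precedes any assignment to X in that block. It returns one such path (the shortest, counted in blocks).

```python
from collections import deque


def undefined_path(succ, entry, ref, defines_x):
    """Return one list of basic blocks from entry to ref on which X is never
    assigned, or None if every path assigns X first.

    Assumes (hypothetically) that the reference to X in block ref comes
    before any assignment to X in that block."""
    if entry in defines_x and entry != ref:
        return None
    parent = {entry: None}
    queue = deque([entry])
    while queue:
        b = queue.popleft()
        if b == ref:
            # Walk parent links back to entry, then reverse.
            path = []
            while b is not None:
                path.append(b)
                b = parent[b]
            return path[::-1]
        for s in succ.get(b, ()):
            # A block that assigns X cuts the path, except the reference block itself.
            if s in parent or (s in defines_x and s != ref):
                continue
            parent[s] = b
            queue.append(s)
    return None
```

For example, with `succ = {1: [2, 5], 2: [23], 5: [23], 23: []}` and X assigned only in block 2, the search would yield the path 1, 5, 23.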

        "On all paths from the entry bar to subroutine foo at line 123
        in foo.f90 to the reference to variable X at line 1234 in
        foo.f90, the variable X is not assigned a value."
{and then enumerate all the paths as in the previous example.}
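The "all paths" report is the negation of a forward may-analysis: X is unassigned on every path to the reference exactly when no path reaching it passes through an assignment. A Python sketch over a boolean lattice follows. `succ` (block number to successors) and `defines_x` (blocks that assign X) are a hypothetical representation, and enumerating the individual paths is left out.

```python
def never_defined_before(succ, entry, ref, defines_x):
    """True if X is unassigned on *every* path from entry to block ref.

    Forward may-analysis: maybe_in[b] becomes True if some path from entry
    to the start of b passes through a block that assigns X."""
    # Restrict the analysis to blocks reachable from entry.
    reachable, stack = {entry}, [entry]
    while stack:
        for s in succ.get(stack.pop(), ()):
            if s not in reachable:
                reachable.add(s)
                stack.append(s)
    if ref not in reachable:
        return False  # no path reaches ref, so there is nothing to report
    maybe_in = {b: False for b in reachable}
    changed = True
    while changed:  # values only go False -> True, so this terminates
        changed = False
        for b in reachable:
            out = maybe_in[b] or b in defines_x
            for s in succ.get(b, ()):
                if out and not maybe_in[s]:
                    maybe_in[s] = True
                    changed = True
    return not maybe_in[ref]
```

Because the lattice is only True/False per block, this costs little more than a reachability pass, which is one reason such a report seems cheap for a compiler that already does dataflow analysis.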

        "On all paths from the assignment to variable X at line 1234 in
        foo.f90 that return from the subroutine or terminate the
        program, the value of the variable X is not referenced."
{and then enumerate all the paths using their BB numbers, as in the
first example.}
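The third report is a "dead definition" check, which is ordinary backward liveness analysis: the assigned value is never referenced exactly when X is not live on exit from the assigning block. The Python sketch below assumes a hypothetical block-level summary: `uses_x` holds blocks that reference X before assigning it, `defines_x` holds blocks that assign X, and the assignment being checked is the last access to X in its block.

```python
def is_dead_store(succ, def_block, uses_x, defines_x):
    """True if the value assigned to X at the end of def_block is not
    referenced on any path to a return or stop (X is dead on exit).

    Backward liveness: live_in[b] is True if some path from the start of b
    references X before any reassignment."""
    blocks = set(succ) | {s for ss in succ.values() for s in ss}
    live_in = {b: False for b in blocks}
    changed = True
    while changed:  # monotone: values only go False -> True
        changed = False
        for b in blocks:
            live_out = any(live_in[s] for s in succ.get(b, ()))
            new = b in uses_x or (live_out and b not in defines_x)
            if new != live_in[b]:
                live_in[b] = new
                changed = True
    return not any(live_in[s] for s in succ.get(def_block, ()))
```

A reassignment on a path kills the old value, so a definition followed on every path by another assignment is reported as dead, which matches "the value of the variable X is not referenced."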

A message of the form
        "On some path from the assignment to variable X at line 1234 in
        foo.f90 that returns from the subroutine or stops the program,
        the value of the variable X is not referenced"
would be actively unhelpful.  John Rice (a professor of computer science
at Purdue) had a Fortran 66 program named ELLPACK, which produced
elliptic PDE solvers.  He ran it through a static dataflow analyzer
called DAVE (by Lee Osterweil).  It produced 3000 pages of "Fatal Error"
messages.  He set a grad student to work on it for a semester.  All the
messages were of this kind.  No changes were made in the code as a
result of this waste of time.

Global dataflow analysis that takes every kind of association into
account would be best, of course, but an analysis restricted to
variables that are neither in common nor accessed by use association
would still be useful.  Even if local variables were excluded on paths
where they are actual arguments associated with dummy arguments of
unspecified intent, the result would be useful in most cases.

I understand that in some cases involving the SAVE, TARGET, or
DIMENSION attributes the analyzer might have to throw up its hands in
despair and report "Gee, I just don't know."

A printed listing would be useful, but an interactive tool that allows
one to click on a variable definition and see all references that
definition reaches, or click on a variable reference and see all
definitions that reach that reference, would be very valuable for
maintenance, especially of unfamiliar code.  If some paths reaching a
reference leave the variable without a value, it would be nice if those
paths turned red.  If no path from a definition to a return or stop
references the variable, it would be nice if those paths turned red
too.  It would be useful even if it only worked for local scalars
without the SAVE and TARGET attributes.
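The "which definitions reach this reference" query behind such a tool is classic reaching-definitions analysis. A minimal Python sketch follows, restricted (as suggested above) to local scalars so that no aliasing has to be modeled. The representation is hypothetical: `succ` maps each basic block to its successors, and `block_defs` maps a block to its `(def_id, var)` pairs in program order.

```python
def reaching_definitions(succ, block_defs):
    """Forward gen/kill analysis. Returns IN[b]: the set of def_ids that
    reach the start of block b along some path."""
    blocks = set(succ) | {s for ss in succ.values() for s in ss} | set(block_defs)
    all_defs = {}  # var -> every def_id of that var
    for defs in block_defs.values():
        for d, v in defs:
            all_defs.setdefault(v, set()).add(d)
    gen, kill = {}, {}
    for b in blocks:
        last = {}  # var -> last def_id of that var in block b
        for d, v in block_defs.get(b, ()):
            last[v] = d
        gen[b] = set(last.values())
        # A block kills every other definition of the variables it assigns.
        kill[b] = set().union(*(all_defs[v] for v in last)) - gen[b]
    preds = {b: [] for b in blocks}
    for b, ss in succ.items():
        for s in ss:
            preds[s].append(b)
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN
```

The reverse query (which references a definition reaches) would come from inverting this map and then scanning each block's references in order; that step is omitted here.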

Of course, such tools would be useful for other languages, but might not
produce much useful information about most C programs because of C
pointer semantics.

Does your compiler (or some other tool) report this?  Does it collect
enough information but not report this?  Is there a prospect it might
report it?  Is there a prospect it might collect and report it?  How
much would it cost to make either or both such tools, or augment your
compiler to produce such "batch" reports?  How much would a license to
use them cost?

-- 
Van Snyder                    |  What fraction of Americans believe 
Van.Snyder at jpl.nasa.gov       |  Wrestling is real and NASA is fake?
Any alleged opinions are my own and have not been approved or
disapproved by JPL, CalTech, NASA, the President, or anybody else.