For this project, you must complete just one of three different program analyses. Each analysis uses a different tool and implementation language. You may choose to implement whichever analysis you like. Keep in mind that your experiences with the analysis and implementation tool could be quite beneficial for your final project, so choose something that interests you. You may choose between
The call graph of a program is a directed graph where the nodes of the graph correspond to the functions within the program and an edge exists from A to B if B is called from within A. For example, the C code below has the given call graph.
int a() { b(); }
int b() { c(); }
int c() { a(); }

A dynamic call graph contains only those nodes and edges that occurred
within a specific execution of a program. Thus, if getInput()
returns 0 below, the dynamic call graph is as presented.
int a()
{
if (getInput() != 0)
b();
else
c();
}

You will use a tool called Valgrind to monitor the execution of a program in order to determine the dynamic call graph for that execution. The final result of your analysis will be an association between the observed functions and the set of functions called from within them. For example, the code
void a(){
if ( b() )
c();
else
d();
}
int b(){
c();
return 0;
}
void c(){
d();
}
void d(){
}
yields the output:
Valgrind is a binary instrumentation framework for developing dynamic analyses. It simulates binary programs and allows integrated tools to manipulate and observe the simulation on a very fine grained level. Many such example tools are provided with the main infrastructure. You will use Valgrind to monitor program executions and build the dynamic call graph, displaying the graph in the specified format upon completion of the program. The Valgrind Manual provides information on the core infrastructure and on writing new tools.
The latter provides in depth instructions on how to start. By modifying the analyzed program's behavior in the instrumentation phase, you will cause the program to construct its own call graph when it executes. This is similar to the aspect oriented programming that you previously saw in class. You will need to work with a very small portion of the VEX IR for instrumentation.
Note that the main distribution of Valgrind is presently lacking the file autogen.sh, which it requires to compile new tools. We have conveniently provided this for you.
-g in gcc). You may look at the built in tools to see how they retrieve such information as function names. The simplest tool to examine and understand is Lackey.main body of the program. You should ignore these and leave them out of your final memory graph.Nullness analysis is a means of determining whether or not a variable holds a null value at a given point in code. Methods exist for performing a nullness dataflow analysis with varying levels of soundness, precision, and complexity. For this project you will implement a particular, simple intraprocedural nullness analysis focussing on local variables, over either Java or C code, using an instance of a dataflow framework.
The particular properties of the nullness analysis you will perform are not complex, and they can be applied to either C or Java code. Your goal is to determine whether or not scalar variables are null, non-null, or possibly null at any given program point. You are not expected to perform any additional analyses, e.g. alias analysis, so the facets of either language that we must consider shall be limited. You will perform your analysis such that, for any dereference of a variable, your analysis provides an error if that variable is conservatively null and provides a warning if the nullness of the variable cannot be statically determined at that point. Obviously, this can only apply to variables used as references or pointers.
Because this analysis will be intraprocedural, we can only ascertain facts about the initial nullness of local variables. We assume no knowledge for any global variables or formal arguments, so they may possibly be null but are not known to be null. Furthermore, because we are only maintaining information for scalars, you need not determine the nullness of values gotten by dereferencing local variables. Thus, we know nothing about values gotten through pointer indirection, fields, or arrays. In each case, you may always assume that the value there could be either null or non-null.
Obviously, nullness propagates via assignment, thus in
1) Object x = new Object(); 2) Object y = null; 3) x = y;after statement 3,
x has a nullness of "null" that propagated from y.
Your analysis should also be primitively predicate aware. That is, if a path
condition for an if statement is of the form
"x == null" then this constraint affects both
branches of the if statement. On the true branch, x must be known
as null, and on the false branch, x must be known to be non-null. You need
only handle the basic constraints of a variable being equal or not equal to null. Thus, in
1) if ( null != y ) 2) y.someMethod(); 3) else 4) ...; // If y dereferenced, known null -> ERROR 5) y.anotherMethod(); // Possibly null dereference -> WARNINGby the constraint
y != null on the first branch,
we know that y must not be null on line 2, so that method call
is safe. The nullness of y at 4 is also constrained by the
condition so that y is known to be null there. Finally, line 5
succeeds statements in which y could have been null or non-null,
so we conservatively know nothing about its nullness at this point.
Also, note in the above example how dereferencing a variable known to be null causes an error, while dereferencing a variable that might be null causes only a warning.
A final caveat of the analysis is that it must be sensitive to a variable's dereference history. Specifically, note that after a variable has been dereferenced, execution continues only if the dereference was successful. Thus, after a variable has been dereferenced while holding a value, that value cannot possibly be null. For instance, in
1) x.someMethod(); 2) y = x.someField;we know at line 2 that
x must not be null because otherwise the
execution would not have continued on to that statement.
You will use one of two dataflow engines to implement this analysis. You can either use Soot, which uses Java to analyze Java bytecode, or you can use CIL, which uses OCaml to analyze C source code. Each engine and language has its own technicalities; the choice is yours as to which you use, and the TA can provide some assistance as necessary. Your analysis should allow the inspection of all functions/methods a C source file or Java class and display a list of errors for known bad dereferences and warnings for possibly bad dereferences. e.g.
Soot is a Java bytecode analysis and transformation framework. You can make use of it to perform informative analyses by creating transformations that simply use its analysis engine and report the results. The Soot Survival Guide provides an all encompassing tutorial on how to use it effectively, and the Soot API provides all the resources you'll need to complete the project.
To help you start, I am providing you with the classes NullWarner
and NullnessTagger. These add a new transform to the Soot
framework and invoke it upon analyzed classes. By invoking a
flow analysis of your devising within the transform, you are able
to use it to "tag" all statements with error information that you can then scan
and print out. Assuming that you have all of Soot's libraries on your
classpath, you can then run the analysis on a compiled Java class, e.g.
Test.class, as such:
.bashrc:
thisCIL is another intermediate representation system, but it holds an IR for C that can then be used for transforming into alternative C source code. While CIL operates on C, it is written in OCaml, and so are new analyses written for it. The main CIL homepage provides an explanation of how CIL works, as well as extensive MLDoc for using its structures. The provided skeletal file has additional hints and pointers for both OCaml and working with CIL.
The skeletal file applies a simplification transform into a 3 address code
as described in the
CIL Manual. This
should make the process easier. You can perform dataflow analysis in CIL
by creating an appropriate Transfer struct and using it with an appropriate
Dataflow functor.
CIL compiles in your additional module, the file I provided, in order to enable the analysis within CIL's framework. First, however, you must configure CIL to do so. Before compiling CIL, download the provided module to a working directory, say "/my/path/to/work", then in CIL's main directory configure it via
plainCilPrinter to print out the structural form of the components.There are no corrections at this time.
A minimal code basis is available here to get you started:
The project must be turned in by 29 February 2008. You should turn in all of your new or modified Java, OCaml, or C source files. Additionally, include a "readme" file with your name(s), your choice of project, and any other information that you feel is pertinent. You may email your projects directly to the TA at wsumner@cs.purdue.edu, or you may deliver them by hand in CD format.
If you are having trouble submitting your files, talk to the TA for help before the deadline.