[Checkers] IGJ And Type Inference

Wed Mar 19 12:59:23 EDT 2008

Hi guys again,

I haven't studied the literature available about type inference and  
the different inference tools out there yet, so I will simply post  
what my intuition and goals are for sanity check.  Please correct my  
understanding.

I would like to proceed with the following points:

1. I believe that any type inference tool need to work on both: source  
and bytecode.  The bytecode is necessary for analysis on third-party  
libraries, whose licenses may not allow you to reverse-engineer as  
decompilation is a form of reverse-engineering.  The source is  
essential for users to annotate their own (potentially partially  
annotated) code, where annotation should go into the source directly.   
However, we can use the bytecode as intermediate representation and  
figure out a way to insert the resulting annotations back in the  
source.  I assume that this mainly done within IDEs, and eclipse  
provides a very nice mutable AST tree that allows for tool to modify  
the tree.

2. I believe that it is highly desirable (yet not quite important)  
that the type inference tool does not load the classes.  Requiring to  
load the classes to be annotated would prevent the type inference tool  
to be able to annotate the JDK and the libraries the tool is depending  
on. It will also force the user to setup the classpath and the jni  
native library paths which may not be required for inference (if we  
supply the user with a flag for default annotations for those  
libraries).

In another note, my initial assumption is that the core of the type  
inference is a constraints solver with the following inputs:
1. Relations between symbols and the operations that are done on them  
(e.g. method 'add()' was called on symbol 'myList', method 'setX'  
contains an assignment to field 'X', etc).
2. A set of constraints (e.g. can only call mutable method on mutable  
receivers; if no mutable method is called on a type it's immutable,  
etc).
And then generates the least restrictive types that satisfy all the  
constraints. The type inference framework should provide a  
representation that could be easily constructed to generate the graph  
of the first kind.

> Jaime had a choice between building on what Matthew Tschantz had  
> started,
> or writing a new system entirely from scratch.
I will talk further with Jaime about his experiences with Soot and see  
how to refractor his code to allow for other type inferences.

> It's a requirement to process classfiles.  At the time Matthew started
> Javarifier, ASM and Soot looked like the best choice for processing
> classfiles.
ASM and Soot by first glance gives you a view of bytecode in a very  
low level view. Soot is designed for performing optimizations, thus  
all the representations are geared towards that.  ASM is more general  
allows one to analyze the bytecode in almost anyway they like.  While  
we might still need them to actually parse the files, it might be nice  
to have a intermediate framework that efficiently generates a graph  
representation that satisfy the first condition.

>  WALA: http://wala.sourceforge.net
>    IBM "T.J. Watson Libraries for Analysis" of bytecode.
>    WALA is a subset of IBM's DOMO program analysis infrastructure.
>    Seems like a good choice for new projects (as of late 2006).
>    Should be solid, since it is used by commercial projects within  
> IBM.
I haven't gone wrong with using an IBM library.  I will look at it.

> Another possibility might be write an inference tool that builds  
> somehow on
> the checkers framework or at least on the JSR 308 compiler  
> infrastructure.
> I don't know whether this approach makes sense.

The JSR308 infrastructure can be easily extended for a type inference  
tool for source, especially that it already parses JSR308  
annotations.  I don't find that desirable necessarily because of the  
classpath limitation described above.

However, if the type inference framework is done rightly, the  
relations graph is independent from (yet close to, for performance  
reasons) the parser, so we can have ASM, Soot, eclipsec, and javac  
front-ends (anything one likes).

- Mahmood