[Checkers] Thoughts on Type Inference: Part 1

Thu Sep 11 00:36:04 EDT 2008

Greetings,

I wanted to share with you (and document) my thoughts on type
inference for IGJ, mainly to ensure that we are on the same page (i.e.
please correct my understanding) and to record some of our decisions
for future reference.

__ Main lessons from my recent studies __
- Examine the API more thoroughly before making any decisions.
- Prototyping reveals some limitations that simple reading does not  
expose.

__ Existing Static Analysis Frameworks and Rewriting __
There are many static analysis frameworks for bytecode: WALA, Soot,
ASM, etc.  Bytecode analysis is ideal when no source rewriting is
required.  When rewriting source, however, many source details get
desugared and lost during compilation to bytecode.  Some of these
frameworks also do not support generics.
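To make the desugaring point concrete, here is a small hand-written illustration (not actual javac or decompiler output). The first method is what the programmer writes; the second is roughly what the bytecode expresses once generics are erased and the enhanced for loop and unboxing are expanded. The class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class DesugarDemo {
    // Source form: generics, an enhanced for loop, and auto-unboxing.
    static int sumSugared(List<Integer> xs) {
        int sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return sum;
    }

    // Roughly what the bytecode expresses: a raw List, an explicit
    // Iterator, a cast to Integer, and an explicit intValue() call.
    @SuppressWarnings("rawtypes")
    static int sumDesugared(List xs) {
        int sum = 0;
        for (Iterator it = xs.iterator(); it.hasNext(); ) {
            int x = ((Integer) it.next()).intValue();
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> xs = new ArrayList<Integer>();
        xs.add(1);
        xs.add(2);
        xs.add(3);
        System.out.println(sumSugared(xs) + " " + sumDesugared(xs));
        // Type arguments are erased: both lists have the same runtime class.
        System.out.println(new ArrayList<String>().getClass()
                == new ArrayList<Integer>().getClass());
    }
}
```

Both methods compute the same result, but only the first one shows where a source-level tool would place an annotation on the element type `Integer`.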

So we decided to do our type inference analysis for IGJ on source.
This limits our choice of frameworks.  Source analysis is more
difficult than bytecode analysis, as Java's rules are very complex,
especially with Java 5 features (generics, (un-)boxing, var-args,
enums, etc.).  We want two main features in a framework:
- AST, type, and symbol information about the Java code.
- A rewriting engine to manipulate Java code (mainly to add annotations).

I have examined three frameworks that allow for Java source code  
analysis:

1. Eclipse
. Eclipse contains the jdt.core plug-in, which provides an AST
interface.  The plug-in can be used as a stand-alone, headless
component, independent of the Eclipse GUI.  It provides an AST with
the necessary type and binding information and supports code rewriting.
. Advantage: the JDT interface is clean and integrates well with the
Eclipse GUI, which is widely used.
. Disadvantages:
  - The JDT API doesn't allow for bytecode-source interpolation
analysis: one cannot feed any supplementary analysis done on bytecode
into the source analysis.  While this may not be necessary for IGJ
now, I anticipate that it'll be a limitation in the future.
  - The JDT API differs from the javac API and thus has a learning
curve.  This is bad, especially if we make our framework a supplement
to the Checker Framework.
  - JDT doesn't recognize type annotations yet.  Better Eclipse
support for JSR 308 would be desirable for partial type inference,
but is not strictly required.

2. Netbeans
. I honestly cannot figure out how Netbeans works!  There is some
documentation out there, but I do not know of a single Netbeans user.
. The Netbeans refactoring engine uses the same API as the compiler
(javac); Netbeans only provides some utility methods and a usage cache.
. Danny presided over the 1st Workshop on Refactoring Tools, where the
API was presented.
. Opinion: I do not think it's worth depending on the Netbeans-specific
API, as we might as well use javac directly, or at most use only the
self-contained utility classes that do not require the Netbeans
framework (pending some licensing issues).

3. Javac
. javac provides a read-only interface to the Java code.  We might need
to either develop our own rewriting engine or find an existing one.
Hopefully this is not too difficult, since we only need to add
annotations.
. Advantages:
   - Same framework and API as the Checker Framework.  We can reuse a
lot of our analysis and utility methods.
   - Permits interpolation between bytecode and source, which we may
do in the future.
. Disadvantages:
   - We need to build (or find) a rewriting engine.
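Since we only need to add annotations, a first rewriting engine on top of javac could be plain text insertion at the source positions javac reports. Below is a minimal sketch using only the JDK's public com.sun.source API: it parses and attributes a source string, finds every field whose type is a class or interface type, and inserts an annotation before that type, working from the end of the text so that earlier offsets stay valid. The class name `AnnotateFields` and the `@ReadOnly` string are placeholders for illustration, not Checker Framework code, and the sketch ignores complications such as qualified type names (where a JSR 308 type annotation belongs before the simple name).

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.SourcePositions;
import com.sun.source.util.TreePath;
import com.sun.source.util.TreePathScanner;
import com.sun.source.util.Trees;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;
import java.util.TreeSet;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.type.TypeKind;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class AnnotateFields {
    // An in-memory source file, so the sketch needs nothing on disk.
    static final class Source extends SimpleJavaFileObject {
        private final String code;

        Source(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Inserts "annotation " before the declared type of every field whose
    // type is a class or interface type; primitive fields are skipped.
    static String annotateFields(String className, String code, String annotation) {
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler().getTask(
                null, null, null, null, null,
                Collections.singletonList(new Source(className, code)));
        Iterable<? extends CompilationUnitTree> units;
        try {
            units = task.parse();
            task.analyze(); // attribute the trees so symbols and types are available
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Trees trees = Trees.instance(task);
        SourcePositions positions = trees.getSourcePositions();
        // A sorted set, so fields declared together (String a, b;) share one insertion.
        TreeSet<Long> offsets = new TreeSet<>();
        for (CompilationUnitTree unit : units) {
            new TreePathScanner<Void, Void>() {
                @Override
                public Void visitVariable(VariableTree node, Void unused) {
                    Element element = trees.getElement(getCurrentPath());
                    if (element != null
                            && element.getKind() == ElementKind.FIELD
                            && element.asType().getKind() == TypeKind.DECLARED) {
                        long start = positions.getStartPosition(unit, node.getType());
                        if (start >= 0) {
                            offsets.add(start);
                        }
                    }
                    return super.visitVariable(node, unused);
                }
            }.scan(new TreePath(unit), null);
        }
        // javac's trees stay read-only; edits are applied to the text instead,
        // last offset first so that earlier offsets remain valid.
        StringBuilder out = new StringBuilder(code);
        for (long offset : offsets.descendingSet()) {
            out.insert((int) offset, annotation + " ");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "class Point {\n  int x;\n  String label;\n  Point next;\n}\n";
        System.out.print(annotateFields("Point", src, "@ReadOnly"));
    }
}
```

Keeping the trees read-only and editing the text separately matches javac's model; a real engine would also need to merge edits from several analyses and handle positions that javac does not record.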

__ Decompilation __
When running on bytecode, we are planning (at least for the time
being) to decompile the classfiles and run the inference tool on the
generated source.  I am planning to use jad, the most frequently
referenced decompiler online.  It has the following limitations
(http://www.kpdus.com/jad.html#bugs):

> # In certain cases when decompiling classes which contain inner  
> classes, Jad cannot reliably sort out the extra arguments added to  
> class constructors by Java compiler. In those cases Jad can generate  
> constructors with an incorrect number of arguments or can fail to  
> declare some local variables as final.
>
> # In those rare cases when Jad is unable to fully decompile  
> constructs like labeled blocks with breaks or nested loops with  
> inter-loop break/continue statements it generates the source with  
> labels and goto statements which reflects program's control flow and  
> displays the message "Couldn't fully decompile method <name>". Also  
> when Jad couldn't reconstruct all try-catch-finally statements it  
> displays the message "Couldn't resolve all exception handlers in  
> method <name>".
>
> # Currently Jad makes no use of the Java class hierarchy  
> information. Consequently, Jad always chooses java.lang.Object as a  
> common superclass of two different classes and inserts auxiliary  
> casts where necessary.
>
> # Jad doesn't handle inlined functions well.
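The decompilation step could be driven from Java roughly as follows. This is a sketch: the class and method names are made up, and the jad flags (`-o` overwrite, `-r` restore package directories, `-s` output extension, `-d` output directory) are assumed to match jad's documented usage (`jad -o -r -sjava -dsrc ...`), so they should be checked against the installed jad version.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class DecompileStep {
    // Builds a jad command line that decompiles the given class files into
    // .java files under outDir, following the package directory layout.
    static List<String> jadCommand(File outDir, List<File> classFiles) {
        List<String> cmd = new ArrayList<>();
        cmd.add("jad");
        cmd.add("-o");                     // overwrite existing output without asking
        cmd.add("-r");                     // restore the package directory structure
        cmd.add("-sjava");                 // write .java files instead of .jad
        cmd.add("-d" + outDir.getPath());  // output directory
        for (File f : classFiles) {
            cmd.add(f.getPath());
        }
        return cmd;
    }

    // Runs jad (which must be on the PATH) and returns its exit status.
    // The inference tool would then be run on the sources under outDir.
    static int decompile(File outDir, List<File> classFiles)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(jadCommand(outDir, classFiles))
                .inheritIO()
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        List<File> inputs = new ArrayList<>();
        for (String a : args) {
            inputs.add(new File(a));
        }
        // Only prints the command; call decompile(...) to actually run jad.
        System.out.println(String.join(" ", jadCommand(new File("decompiled"), inputs)));
    }
}
```

Restoring the package directories keeps the generated sources in a layout that javac accepts directly, which matters if the inference tool is built on javac.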

Also, we need to be a bit more cautious about native methods.  We can
replace them with skeleton methods, but we must tell the inference
tool that they are unanalyzable.
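A minimal sketch of the skeleton idea, assuming we read compiled classes via reflection (the decompiled source carries the same information in its `native` declarations). For each native method it emits a stub with the same modifiers (minus `native`), return, parameter, and exception types, plus a body that throws. The `@Unanalyzable` marker is a hypothetical annotation the inference tool would recognize; the names `NativeStubs` and `skeleton` and the parameter names `p0`, `p1`, ... are illustrative placeholders, since class files do not always record parameter names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class NativeStubs {
    // Builds a skeleton declaration for one native method, tagged with the
    // hypothetical @Unanalyzable marker and with a body that always throws.
    static String skeleton(Method m) {
        StringBuilder sb = new StringBuilder("@Unanalyzable ");
        // Mask to real method modifiers: getModifiers() may carry the VARARGS or
        // BRIDGE bits, which Modifier.toString would print as transient/volatile.
        String mods = Modifier.toString(
                m.getModifiers() & Modifier.methodModifiers() & ~Modifier.NATIVE);
        if (!mods.isEmpty()) {
            sb.append(mods).append(' ');
        }
        sb.append(m.getReturnType().getTypeName()).append(' ')
          .append(m.getName()).append('(');
        Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(params[i].getTypeName()).append(" p").append(i); // placeholder names
        }
        sb.append(')');
        // Keep checked exceptions so callers' catch clauses still compile.
        Class<?>[] exceptions = m.getExceptionTypes();
        for (int i = 0; i < exceptions.length; i++) {
            sb.append(i == 0 ? " throws " : ", ").append(exceptions[i].getTypeName());
        }
        sb.append(" { throw new UnsupportedOperationException(\"native method stub\"); }");
        return sb.toString();
    }

    // Returns skeletons for all native methods declared directly in cls.
    static List<String> nativeSkeletons(Class<?> cls) {
        List<String> out = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) {
                out.add(skeleton(m));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : nativeSkeletons(Object.class)) {
            System.out.println(s);
        }
    }
}
```

Throwing from the stub body keeps every stub compilable regardless of return type, while the marker, not the body, is what tells the inference tool to treat the method conservatively.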

Regards,
Mahmood