[Checkers] [Fwd: ISSTA 2008: Your paper has been accepted [resent]]

Mon Mar 31 10:58:39 EDT 2008

Forwarding in case this wasn't sent to everyone. It's also in SVN under 
papers/checkers-framework/reviews-issta2008.txt.

- Matt

-------- Original Message --------
Subject: ISSTA 2008: Your paper has been accepted [resent]
Date: Mon, 31 Mar 2008 14:20:17 +0100
From: issta2008 at easychair.org
To: mpapi at csail.mit.edu

[Resent due to unexpanded macros.  My apologies for the resulting 
confusion!  - Andreas]

Dear author,

Thank you for your submission to the ISSTA 2008 conference.
The program committee met on March 29-30 and considered the
100 submitted papers.  I am happy to report that your paper has
been accepted for inclusion in the ISSTA 2008 technical program.

All papers went through a thorough review process, involving at least
three reviewers.  The quality of submissions has been very high this
year, and we have accepted 26 papers out of 100 submissions.
Congratulations!

The reviews of your paper are enclosed.  Please consider them
carefully when preparing the camera-ready version of your paper.
Procedures on how and by when to do this will be sent to you shortly.

ISSTA takes place in Seattle on July 20-24, 2008.  We look forward to
seeing the final version of your paper and your presentation at ISSTA!

All the best,

Andreas Zeller - ISSTA 2008 PC Chair

---------------------------------------------

Paper: 84
Title: Practical Pluggable Types for Java

-------------------- review 1 --------------------

This paper introduces the Checkers Framework, a framework for
implementing "pluggable type systems" as annotation processors.
Checkers works with JSR308 annotations; it offers flow-sensitive
(albeit intra-procedural) type qualifier inference; Four pluggable
type systems have been implemented using Checkers, and they have been
tested by running on real programs.

Java annotations provide a non-intrusive way for rich, user-defined
constraints to be placed upon programs. Even though annotation
processiong tool (APT) has been available since Java 5 for such
checkers to be hooked into the Java compilation process, it is a
primitive interface and difficult to use.  This paper is one in a
series to provide frameworks such that annotation checkers can be
implemented more easily.

Given that there has been other frameworks with the same
pluggable/extensible type system goal in mind (e.g. JavaCOP, JQual,
and to some sense extensible compiler frameworks such as JastAdd or
Silver), the strength of this paper is in its evaluation of the
Checkers framework. It provides implementations of four different
annotation-based contstraint systems, evaluates them by running on
real programs, and provides data on the number of errors and false
positives caught. It is clear the authors put a lot of effort into the
evaluation.

The authors deliberately chose to go with a pure Java approach,
instead of using a declarative syntax for stating type checking rules
as other frameworks have done. This choice has the advantage of
Checkers being tightly integrated with the Java language environment
-- an advantage one cannot argue against. I do, however, find the
paper does not quite convince me of the other two claimed advantages.

1. The type system implemented can be arbitrarily complex, because of
the expressiveness of Java.  The four examples demonstrated in the
paper, as far as I can see, are all possible with at least JavaCOP
(the need for JSR308 annotations aside).  If I am mistaken, it is a
good idea then to point out in the paper exactly where the extra
expressiveness is beneficial.

Additionally, type rules are generally defined to be syntax-directed
and functional -- and it is questionable whether allowing rules that
cannot be specified in a syntax-directed and functional manner is a
good idea at all.  I am really not sure how rich the rule-specification
language needs to be.

Furthermore, the advantage of using a declarative language for rule
specification is that you can make sure rules are always defined to
satisfy certain properties. For example, JavaCOP rules are all checked
such that the modular type checking property of Java is maintained. I
imagine this is not possible with Checkers.

As an aside: I think an interesting challange in pluggable type
systems is how you can help rule implementers make sure they don't do
something really silly: like creating cyclic type lattices, or
creating rules where progress and preservation are broken, or how you
can make sure two separately developed pluggable type systems interact
well.  These tasks are probably easier tackled when you have a
declarative rule language.

2. Ease of use.  It is probably easier to implement annotation
checkers using Checkers than using the Java APT. However, I found the
paper's demonstration of ease of use a bit lacking.

First, authors use Figure 2 to demonstrate the size of the type
systems implemented with Checkers. This table shows the number of
*methods* needed in the visitor and type factory for each
checker. This is more an indication of how simple these type systems
themselves are, rather than how simple it is to implement these type
systems.  A closer look at the actual implementation of these methods
(downloaded from the website provided by authors) show that these
methods can be quite involved.  Additionally, Figure 2 (nor elsewhere
in the paper) does not mention the classes extending the abstract
class BaseTypeChecker, which has to be provided by each checker. This
again could be rather complex.  For instance, looking at IGJChecker
(extends BaseTypeChecker), it is not immediately clear to me what
rules are being implemented.

Comparatively, rules in JavaCOP and JQual, both of which use
declarative rule-based syntax, are quite easy to read.

Some smaller comments:

I do not think the comparison to other nullness finding tools in
Section 5.1 serves your purpose. You listed how many errors were found
by FindBugs, JLint, and PMD. However, as you state, these tools do not
require annotations on programs. So the comparison simply does not
provide readers with any information as to how well either the
Checkers Framework, or your implementation of the nullness checker,
works.  I would have much liked to see a comparison to the number of
annotations that you DO have to provide for the program to be checked
by other checkers.  For instance, JQual places major emphasis on its
qualifier inference capabilities.  Does your tool, and the NNEL
default, allow less annotations than JQual?

Related work: it is a bit deceiving to put JastAdd in the "Null
pointer dereference checking" section of the Related Work
section. JastAdd is a framework for building extensible compilers. In
fact, I am pretty sure using JastAdd to implement non-transforming
type checkers like the ones shown in the paper is quite simple.
Furthermore, the latest version of JastAdd handles Java generics
features fairly completely.

You state that JavaCOP is not publicly available as of Sept. 2007.
I'm not sure when JavaCOP became publicly available, but I was able to
download it and inspect the examples (but not run) from
http://www.cs.ucla.edu/~smarkstr/javacop/ on 3/12/2008.  You might
want to check it out, for comparison.

-------------------- review 2 --------------------

This paper first presents the design of the Checker Framework.
A type system designer uses the Checker Framework in defining custom 
type qualifiers for Java and creating compiler plug-ins that
check programs that use those qualifiers.  The definitions of
type qualifiers conform to the JSR 175/308 annotation syntax which
enables backward-compatible.  The compiler plug-ins are
themselves Java programs that extend three framework classes---a
visitor class, a type factory class, and a compiler interface
class.  Those classes provide basic functionalities---such as
checking for assignment of a superclass object to a subclass object, 
flow-sensitive
local type inferencing, type hierarchy management, and so on---which are
commonly used by type checkers. These
pre-encapsulated functionalities make instantiation of the Checker 
Framework relatively easy.

The brunt of the
paper (7 of 11 pages) describes four case studies that involve four type
checkers produced using the Checker Framework---a nullness checker, an 
interning checker, and two mutation
checkers.  Results of annotating and type checking several real software 
programs (most of them developed in house for this and other research 
done by this group) are reported. The paper discusses the types of 
errors that they found and how they fixed the errors, and also the 
sources of false positives (warnings of potential type errors that can't 
actually occur).  The paper also discusses how easy the checkers were to 
define and use. The source code of the framework, the source code of the
checkers, and the related documentation are accessible on the
Internet.

The Checker Framework appears to be an extremely well engineered 
framework, which can be used to quickly produce useful type checkers. 
However, there is nothing particularly novel in *how* it is engineered. 
And the idea to qualify types for custom type checking is not new.  The 
first 3 pages are essentially a tool description. The paper does not 
describe new techniques.  A new default type convention is introduced: 
NNEL (non null except locals).  But that is a relatively modest 
variation of the two more standard conventions.

The case studies are interesting and contribute anecdotal information 
relating to usability, effectiveness of checking, etc.  It is too bad 
that more quantitative information was not collected and presented to 
support many claims in the discussions of the results.

While the paper is well written, poor organization makes for difficult 
reading.  There are too many forward references in early pages to things 
that will come later in the paper.  Many of the examples in the first 
half of the paper would make sense/be easier to understand if presented 
later, after the reader sees some example type qualifiers and their 
semantics.

Another presentation problem is a tendency to jump to details without 
providing sufficient background/context.  For example:  The reader needs 
more context to understand comments such as: "for instance to avoid 
false positive warnings for methods like Collections.toArray, 
Properties.getProperty, and Class.cast."

Also: "A total of 55 trivial refactorings were required."  Required for 
what?  For fixing errors?  To compensate for "type-system weaknesses", 
"tool weaknesses"?  Also, is this the total across all of the example 
programs that were checked?  And were there other (non-trivial) 
refactorings?  If not, it should be stated that "A total of 55 
refactorings were required to ..."  Then, its fine to say that all were 
trivial.

Again, details without the information needed to understand them abound 
in Sections 7 and 8.  The paper should be self contained.  The paper 
should provide the definitions from [34] and [38] needed to appreciate 
the discussion of results.

Data should be provided to support claims for ease of use (in 4.1). 
Statements about time spent annotating vs understanding causes of 
warnings and about small number of causes accounting for most false 
positives should be backed by numbers.  Similarly, for claims that 
annotated code is easier to understand.

Low level details of the errors found are not particularly illuminating. 
  Giving a few representative examples is useful.  But there is no 
reason to go to the level of which method each error was in and what 
specific values/expressions were not interned/properly annotated/etc. 
and in how many branches, etc.

Some minor typos/comments:

On page 2, in the second paragraph of section 3, missed word "is"
in "The key purpose of the Checkers Framework [is] to apply the rules
for a type system..."

On page 3, do you really mean "for reasons of simplicity and usability" 
to qualify "are not expressible in the source code"?  Expressibility is 
usually meant as an absolute:  something is expressible in a notation or 
it is not.  I would instead say "even ones that are not easily expressed 
in the source code" ... or "... are not simple to express ..."

On page 4, in "extract annotations from, and insert them in": the "in" 
should be "into"

What is "a featureful ... utility"?

On page 4-5, the terse descriptions of the programs utilized are not 
very informative.

On page 5, in the last second paragraph, duplicated word "the" in
"...As another example, [the] the checker's advisories revealed dead
code..."

On page 6, "non-null" should be capitalized when it starts a sentence.

On page 9, what is the "?" notation?

Authors need to force hyphenation in several places.

-------------------- review 3 --------------------

Summary:
This paper presents a framework for pluggable type qualifiers.
It allows addition of custom type qualifiers as follows. A designer
defines the qualifiers and adds a compiler plug-in (i.e., a Checker)
which enforces their semantics.  Programmers can annotate their
programs and use the plug-in to check for errors (i.e., violations
of the specified type semantics).
The paper presents 4 checkers (a Nullness checker, an Interning
checker, a simpler Mutation checker and a more complex Mutation
checker). It presents results of running these checkers on a large
code base: the checkers uncover many significant, non-trivial
program errors.

This is a well-written paper, and I would like to see it published.
The strengths of the paper are the following:

First, it presents a practical framework for pluggable types. The
framework works on the complete Java language, it is expressive,
and it is integrated with the Java compiler.

Second, the framework is definitely useful. The paper argues usability
convincingly, and presents strong empirical evaluation which shows
that the framework can be used to reason about non-trivial properties
and uncover real errors.

Third, the system is publicly available which will likely lead to new
research; I can easily see researchers using the framework to check
for certain properties (e.g., security-related properties).

The weaknesses of the paper (in my opinion) are the following:

First, there is a significant annotation burden. The framework is
only a checker, without inference capabilities, and requires a lot
of work by the programmer. Independent checks cannot be easily
performed because writing annotations requires close knowledge of
the code (i.e., annotations are best written by the programmer).
More effort should be spent on scalable inference.

Second, it is not clear whether the system expresses properties
beyond supertype-subtype relationships. The paper claims that it
does, but all the example instantiations are lattices.

Third, and most important, the general ides of pluggable types is
not new. It is a great idea, very useful in practice, and I support
research and development of checking and inference frameworks. The
biggest issue of this paper is that it is very similar to JQual
(ref. 25). As mentioned earlier, the pluggable types here appear
to handle only supertype-subtype relationships which was exactly
the model in JQual and CQual. Moreover, JQual is superior because
it performs type inference and generally requires substantially
smaller number of annotations. The current framework is more
practical than JQual â€“ it handles generics and is tightly integrated
with the Java compiler (I believe however, that these are engineering
issues, and JQual could be easily enhanced to handle them).

That said, I still support this paper strongly: the practicality
and usefulness of the framework as well as the case study could be
of great value to the ISSTA community.

One comment: when describing the different instantiations and the
types in each instantiation, please state the types and their
semantics clearly. For example, in the description of Javari,
types QReadOnly, Assignable, and Mutable are never defined.

-------------------- review 4 --------------------

Some comments from the PC meeting:
  -The main issue for all reviewers was how the paper refused to compare 
meaningfully with past work (JavaCOP, JQual) because this might expose 
disadvantages. The authors need to be more forthcoming in their 
evaluation and especially avoid misleading statements.

  -A minor point raised was that the first contribution claimed in the 
paper is for the JSR, which seems an invalid contribution: the JSR is 
only discussed extremely briefly in the paper and also the JSR has an 
author who is not an author of this paper.

-------------------- review 5 --------------------

This paper presents the Checkers framework, a system to implement
pluggable types for Java.  Checkers differs from previous work
in that it relies on standard Java facilities from four
JSRs,  so the implementation is compatible with the standard Java
toolchain.  The paper also presents some observations and insights on
various type systems.

I think the paper has two broad contributions, which can be evaluated
independently:
1) An implementation of pluggable types using facilities from the Java tool
set, and
2) Experiences and observations with a few type systems implemented
using the system.

Regarding contribution 1:

Overall, there are clear advantages Java standard APIs to implement
this kind of tool, as motivated in the paper.   I don't see
any deep insights or surprising conclusions from the implementation;
the discussion is shallow and does not dive into details regarding
interesting implementation challenges or problems.   The main contribution
here is a proof point that an implementation is possible, and
there are apparently no "gotchas" hiding in the weeds.

The authors exploit the richer annotations provided
by JSR 308 (originally submitted by one of the authors of the paper),
and the annotation processing APIs providing by Java 6.  This seems like
a better implementation strategy than previous work published before
such APIs existed, which had to invent Java AST APIs in order to
implement checkers.

I think a main weakness of the paper is that it does not critically compare
the pros and cons of the Checkers frameworks against other approaches. 
I believe previous
systems such as JavaCOP and JQUal provider higher-level (if less expressive)
declarative languages for expressing similar checkers.  The Checkers 
framework appears
more general and expressive, but it's not clear whether this makes writing
checkers substantially more complex.  I would have liked to see clear 
comparisons
of writing checkers in this system as compared with the equivalent 
checkers in
previous work.

The paper does not present code examples showing checkers in this system.
It is not clear how complicated writing
a checker is, particular a flow-sensitive GEN-KILL checker as
described in 3.4.  Figure 2 shows some statistics on checker size, but
without code examples, it is difficult to evaluate these numbers.
It is not clear how Checkers compares to the previous work regarding
ease of writing checkers, since it is not clear how to relate the
statistics in Figure 2 to type checkers from previous work.

On a presentation note .... the paper makes claims such as "It was easy"
before presenting empirical evidence to back up the claim; it would
be better to present the data first and then draw conclusions.

Regarding contribution 2:

The paper presents some case studies with four different checkers,
with anecdotes describing difficulties encountered.   These stories
are valuable and shed light on the practical implications of using
these type qualifiers.    I found these to be the most interesting
part of the paper.

The NNEL type checker seems like a good design, and seems novel.

The intern checker is a good idea; while intern-ing can usually be enforced
for user types with a factory pattern, the intern checker has the
advantage of working with types from the standard libraries.

The discussion of JavaCOP in the related work section is unnecessarily
dismissive, and contains several incorrect statements or 
misrepresentations:
1) "JavaCOP has not been evaluated -- only fragments of checkers
have been written, and those checkers have never been run on any
programs".   Looking at the OOPSLA 2006 paper, I see experimental results
on several real codes with several checkers, so I conclude this statement
is false.
2) JavaCOP is not available (as of Sept. 2007) ... remedied by a footnote
etc ... the ISSTA deadline was January 2008, so it is inappropriate to
criticize work based on data from Sept. 2007.
3) The paper fails to mention that the JavaCOP system in fact implemented
several of the same checkers reported in this paper.
In fact it seems that only the flow-sensitive null checker could be
not easily be expressed in JavaCOP?  There is no objective comparison
with JavaCOP in terms of ease of writing checkers.

Normally I trust that innocent problems with related work discussions
can be fixed.  However, JavaCOP seems to be the among the most closely 
related work,
and the authors are clearly familiar with it, so I'm troubled by
the misrepresentations listed above.

The remainder of the related work section is adequate.