Null dereference verification for Java programs using NPEDetector

Introduction

NPEDetector classifies each dereference in a Java program as being safe or as having the potential to result in a null pointer exception. NPEDetector is primarily described in the following two research papers: Null dereference verification via over-approximated weakest pre-conditions analysis in OOPSLA 2011, and Two techniques to improve the precision of a demand-driven null-dereference verification approach in Science of Computer Programming, 2015.

NPEDetector is distributed as an importable Eclipse project. It is based on a modified version of WALA 1.3.4. It requires Eclipse IDE 3.6.0 or later, and JDK 1.5 or later. NPEDetector is distributed as open-source under the MIT License.

Running the virtual machine

For ease of use, NPEDetector, the modified version of Wala it is based on, as well Eclipse and JDK have been packaged as a VirtualBox VM image made available here. This VM image requires at least 10 GB of free disk space, and preferably a machine with 8 GB RAM. The user-name to login upon booting up the VM is "npedetectvm", and the password is "abcd1234".

Running the NPEDetector tool as an Eclipse project on sample benchmarks

After starting the virtual machine, start ./eclipse from the folder /home/npedetectvm/eclipse.
Choose the folder /home/npedetectvm/NPEDetector as the workspace. It contains the NPEDetector tool project, Wala projects and a toy Java project called NPETest.
Some projects in eclipse will show build errors, but they do not affect the tool execution, hence they may be ignored.

We discuss below how to run the NPEDetector tool on the toy program NPETest, as well as on a real benchmark bcel. For ease of understanding, the Eclipse installation is provided with run configurations for all of these. The different modes and analysis options are explained in Section 5. The user should refer to Section 4 for understanding the output of the tool.

Running the tool on a toy program - NPETest

The code for NPETest has been provided as a project in the same workspace. The correpsonding .jar file is present in the /home/npedetectvm/NPEDetector/ folder. The user is encouraged to look into the NPETest code for better understanding of the tool output (see class test.NPETest1). The following two run configurations have been provided for NPETest:

Whole Program : The run configuration NPEDriver_NPETest, analyzes the program in "wholeprogram" mode, using "wholeprogram" analysis option. Run NPEDetector using the NPEDriver_NPETest configuration. The tool will print the output on the console, as well as produce an output file master.log in the folder /home/npedetectvm/NPEDetector/WALA_output/NPETest.
Demand-Driven : The run configuration NPEDriver_NPETest_DD, analyzes the program in "demand-driven" mode, using the "wholeprogram" option. This configuration also produces its output into the same file as mentioned above.

Running the tool on a real program - bcel

The bcel.jar file is present in the /home/npedetectvm/NPEDetector/ folder. The run configuration NPEDriver_bcel analyzes the bcel.jar file in "wholeprogram" mode with the "wholeprogram" analysis option. The tool prints the output on the console, and produces an output file master.log in the folder /home/npedetectvm/NPEDetector/WALA_output/bcel.

Understanding the outputs

The NPEDetector outputs the file master.log. The output file contains the following information:

Important statistics of the analysis, such as the total time consumed, peak memory utilized, NPE analysis processing time, etc.
Then it lists all the unsafe and safe dereferences as Wala IR statements. Every line contains 10 fields demarcated by an asterisk(*). The key fields of interest are fields 3, 8, 9 and 10. The descriptions of these fields are as follows:
1. Field 3 represents the time taken to analyze the dereference due to this statement.
2. Field 8 is the instruction number of the dereferencing statement in the Wala IR.
3. Field 9 is the actual Wala IR statement containing the dereference. It which could be a method call statement, field read/write statement or array read/write statement.
4. Field 10 is the name of the method containing the dereference
Next the log file categorizes the unsafe dereferences in different categories e.g., Null Assigns, Iteration based unsafe dereference, etc.
Then the methods that weren't called anywhere in the program are listed.
Then information about the library methods that were analyzed is provided.

Running the tool on any .jar file

Section 3 demonstrated the method to run NPEDetector for specific examples. In order to run NPEDetector on other any Java program, the class analysis.core.NPEDriver in the project NPEBugDetection needs to be run in Eclipse as a standard Java application, with the following input parameters :

-appJar "Path-to-the-jar-file"
-dataDir "Data-directory-path"
-mode "wholeprogram | demanddriven"
-NPEAnalysisOpt "wholeprogram | thinsliced"
-outputdir "Output-directory-path"
-srcCaller "Method-name-in-Bytecode-syntax"
-tgtStmt "WALA-IR-Line-number"

NPEDetector analyzes the jar file specified using the -appJar option.

For running successfully, the tool also requires certain configuration files as input. These files can be placed in a folder, and the absolute path to this folder is provided to the tool using the -dataDir option. The necessary files are :

Entry points file: This file should list the classes (at least one) in the input jar file that contain "main" methods of interest (referred to as entry points). The file need not include all the classes containing "main" methods (e.g., classes written for testing purposes could be ignored). The analysis will only process the methods that are transitively reachable from the specified entry points. The entry-points file should be named as "entryPoints_<jar-file-name>".txt and should have the following format. The first line in the entry-points file should specify the number of classes listed in the file. The subsequent lines should contain the qualified names of the entry point classes (one per line) in the Java bytecode format. For a sample, see the entry-points file /home/npedetectvm/NPEDetector/NPEBugDetection/dat/entrypoints_NPETest.txt.

Exclusion files. The tool uses several exclusion files to make the analysis (and WALA libraries) more precise and efficient. These include JavaRegression60Exclusions.txt, pureExclusions.txt, skipList.txt and analyseList.txt. The JavaRegression60Exclusions.txt specifies a set of package names that will be ignored by the WALA analyses (see Wala Tips and Tricks for more details). The other files are specific to our analysis and are described in both the papers that are referred to at the beginning of this document. Samples of the exclusion files can be found in the directory /home/npedetectvm/NPEDetector/NPEBugDetection/dat/.

The analysis supports two modes of operation:

wholeprogram mode in which all the dereferences in the specified jar files are analyzed.
demanddriven mode, in which only the dereference that is specified as an argument is analyzed.

The -mode option should be used to specify the mode of the analysis.

In case the "demanddriven" mode is chosen, two other options, namely, -srcCaller and -tgtStmt, are necessary. The -srcCaller option should be used to specify the qualified name and signature of the method containing the dereference, in the Java bytecode format. (See this resource for the bytecode syntax of Methods).
The WALA IR line number of the dereference to be analyzed should be specified using the -tgtStmt option. Refer to the WALA tutorials to learn how to get the WALA IR line number of a statement. Also, see the run configuration PDFWalaIR, which emits the WALA IR of a given method as a PDF document (currently the configuration displays the IR of the test.NPETest1.printlen()) method.

The analysis supports two analysis "options" :

wholeprogram analysis : Runs a more precise, unoptimized analysis.
thinsliced analysis : Runs a less precise, but optimized analysis.

The -NPEAnalysisOpt option is used to specify the type of analysis required. It should be noted that the "wholeprogram" mode and "wholeprogram" NPE analysis options are not related to each other, and the naming is an unfortunate coincidence. Any mode can be paired with any analysis option.

The directory to contain the outputs of the analysis can be specified using the -outputdir option.