Testing and analysis of web applications using page models

Introduction

Motivation

Code-based analysis of web applications is challenging because of features such as client-server communication, control-flow and data-flow through dynamically generated HTML pages, and the runtime support that the web server framework provides to the server-side code. In fact, as a result of these challenges, there are very few real tools or even research approaches for end-to-end white-box analysis of web applications. By an "end-to-end" analysis we mean one that accounts for the control-flows and data-flows induced by the generated pages, in addition to those within the server-side code. It is easy to see that pages affect control-flow; for example, a user could click different links on a page. More subtly, they also affect data-flow. For instance, the server could populate the options in a drop-down list using values from a server-side set S; now, if the user selects one of these options and transfers control back to the server, a precise analysis should conclude that the option sent by the user is an element of the set S and not an arbitrary value.

Our approach and tool

With the aim of enabling end-to-end white-box analysis of web applications, we propose an approach that automatically translates each page specification in the web application (which is usually in an HTML-based scripting language) into a "page model". A page model is a method in the same language as the server-side code. The page model conservatively over-approximates the control-flows and data-flows that are possible due to the page under all possible server-side states. Links in a page become calls to server-side request-processing routines. The page model includes code that randomly simulates the user's choices, such as which option to choose from a drop-down list or which link to click.
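To make this concrete, the following is a minimal, hypothetical sketch of what a page model might look like for a page containing a drop-down list (populated from a server-side collection) and two links. All names here are illustrative assumptions; this is not the exact code our tool emits.

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    // Hypothetical page model for a "cart" page (illustrative names only).
    public class CartPageModel {
        private static final Random rand = new Random();

        // productIds: the server-side collection used to populate the drop-down list.
        // session: a simplified stand-in for the server-side session state.
        public static void show(List<String> productIds, Map<String, Object> session) {
            // Simulate the user picking one of the rendered options. The chosen
            // value is guaranteed to be an element of productIds, which is the
            // data-flow fact a precise analysis should be able to recover.
            String chosen = productIds.get(rand.nextInt(productIds.size()));

            // Simulate the user's control-flow choice among the links on the page;
            // each link becomes a call to a server-side request-processing routine.
            if (rand.nextBoolean()) {
                removeItem(chosen, session);   // the "remove item" link
            } else {
                checkout(session);             // the "checkout" link
            }
        }

        // Stubs standing in for server-side request-processing routines.
        private static void removeItem(String id, Map<String, Object> session) { }
        private static void checkout(Map<String, Object> session) { }
    }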

Our approach also performs certain modifications on the server-side code; these are described in Sec. 4.1 of our paper.

We have implemented our approach as a tool, in the context of J2EE applications. JSP (Java Server Pages) is a prevalent page-specification language, and is part of the J2EE standard. Our tool takes as input a web application and translates its JSP page specifications into page models in pure Java. The tool also performs the modifications to the server-side code mentioned above. The generated page models, in conjunction with the modified server-side code, constitute a standard non-web program in the language of the server side. This program is therefore amenable to white-box analysis using standard, off-the-shelf analysis tools that do not specifically address the complexities of web applications.

Our tool handles a rich set of J2EE features, such as scope and session management, HTML forms and links, JSP tags, EL expressions, and scriptlets. However, certain aspects, e.g., adding the "import" statements necessary to resolve package references, are not automated, neither in the page models nor in the server-side code. These changes are relatively simple for a developer to perform, but would require more effort to automate. For our experiments, we made these changes manually in the programs emitted by our tool.

Experiments

We demonstrate the versatility and usefulness of our approach by applying two off-the-shelf analysis tools to the (non-web) programs generated by our tool.

Functional property checking using JPF

The first analysis tool we tried is JPF (Java Path Finder), a model checker based on combined concrete and symbolic execution (i.e., concolic execution) for checking functional properties of programs. The page models generated by our approach enable JPF to exhaustively explore different user choices, either concretely (links, forms, drop-down lists, etc.) or symbolically (text inputs), and to traverse application paths that involve visits to multiple pages in sequence. The functional properties we checked were challenging properties that require traversing many pages in sequence; e.g., multiple registrations using the same email ID should not be allowed, and if the shopping cart is filled and then emptied, the total value shown in the cart should become zero.
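For illustration, a property like the shopping-cart one can be encoded as a simple oracle that a driver invokes at the end of each explored page sequence; JPF then reports a failed assertion (an uncaught AssertionError) as a property violation on any explored path. This is a hedged sketch with invented names, not the actual harness in the artifact.

    // Hypothetical oracle for the shopping-cart property (illustrative names).
    public class CartPropertyOracle {
        public static void check(int itemsInCart, double cartTotal) {
            // Property: if the cart has been emptied, the displayed total must be zero.
            if (itemsInCart == 0) {
                assert cartTotal == 0.0 : "emptied cart shows a non-zero total";
            }
        }
    }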

Fault localization using Zoltar

The second off-the-shelf tool we consider is Zoltar, a fault localization tool based on dynamic analysis. We used this tool to find (seeded) bugs in web applications. We were able to run Zoltar in a completely off-the-shelf manner, which would not have been possible on the web applications as-is due to their tiered architecture, and due to the instrumentation that would have been required in the server-side framework and libraries.

Static slicing

In our submitted paper we also report results from applying static slicing, using the slicing tool Wala, on the programs generated by our approach. These experiments were the simplest for us to perform, and are easily reproducible. However, due to the complexity of packaging and building all the packages that constitute the Wala tool into the VM, we omit this experiment from this artifact submission.

Assumptions and limitations

Summary of our contributions

Note about reviewer information: Our artifact does not collect or send out any information about reviewers.

The rest of this README file is organized as follows. Section 2 describes the web applications that we have selected as benchmarks. Section 3 describes how to run our tool on web applications to produce analyzable pure-Java programs that incorporate the generated page models. Section 4 describes how to run the JPF model checker to check functional properties on the benchmark programs. Finally, Section 5 describes how to run Zoltar on the benchmark programs to perform fault localization.

Selected benchmarks

Benchmark              JSP page specifications   Server-side Java classes
Trainers Direct (TD)   18                        30
Royal Odyssey (RO)     55                        3
Music Store (MS)       31                        34
Help Desk (HD)         27                        3
iTrust (IT)            9                         19

The table above provides information about the five real web applications that we selected for our experiments. We selected these benchmarks primarily because they are real, publicly available applications from a variety of domains.

TD, MS, and RO are e-commerce applications. HD is a help-desk application for tasks such as entering complaint tickets, managing an address book, etc. IT is a medical records management application. Of these benchmarks, HD and IT are frequently downloaded, with 995 and 7292 downloads, respectively, since January 2010.

The first four benchmarks above could be translated by our tool with very minimal code modifications. IT was different, because many of its pages use JavaScript, and because they use JSP expressions, which are now deprecated in favor of the newer EL expression standard. Our tool handles only EL expressions. Therefore, we had to manually remove the JavaScript from the pages, and to manually translate the JSP expressions into EL expressions. IT is also very large (223 JSP pages and 365 Java classes). Therefore, to restrict the scope of our manual changes, we chose a smaller module within this application and analyzed only this module; the statistics in the table above pertain only to this module. The usage of JavaScript in these pages was minimal. Still, its removal means that the precision and/or correctness of some of our analyses may have been affected for this benchmark.

Running our translation tool

We have packaged our artifact as a VirtualBox VM image, available for download here: ISSTA2017_artifact.ova. It requires at least 12 GB of free disk space, and preferably a machine with 8 GB of RAM. The user name for logging in after booting the VM is "issta2017artifact", and the password is "abcd1234".

The programs generated by our tool cannot be analyzed directly by the analysis tools, because of the manual changes required on top of the generated code, as mentioned above. Therefore, we have placed the manually modified source files directly in the projects where JPF and Zoltar are to be run (see below). In this section, as samples, we discuss how our translator can be run on two of the benchmarks, namely TD and IT.

  • cd to /home/issta2017artifact/translation, and run run.sh.
  • The generated page models and modified server-side code are placed in the sub-folders IT and TD.
  • One can cd into each of these sub-folders and type ant to build the generated code. There will be build errors, for the reasons discussed above, such as missing import statements. In general these need to be fixed manually; however, it is not necessary for the artifact evaluator to do this, because we have already placed the manually modified code in the JPF and Zoltar projects.

Experiments using JPF

Introduction

In this experiment, we run the concolic execution tool that comes with JPF (i.e., Symbolic PathFinder) on the non-web programs generated by our tool, to detect functional errors of the sort that manifest only when users visit specific sequences of pages with specific kinds of inputs on each page. The idea is to use JPF to visit page sequences exhaustively by simulating link clicks, to exhaustively select all possible drop-down list options in each page, and to enter symbolic values into text boxes to simulate all possible text inputs. The objective then is to see whether the property is violated, up to a given bound bP on the number of pages visited sequentially in any single run of the application. Note that JPF cannot be applied directly to web applications in an end-to-end manner, unless a non-web program is produced from the given web application as in our approach. This is because web applications have complex features such as client-server communication, control-flow and data-flow through generated HTML pages, and runtime support provided to the server-side code by the web server framework.
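The following hedged sketch illustrates this mechanism, assuming JPF's Verify API for exhaustive finite choices and Symbolic PathFinder's Debug.makeSymbolicString helper for symbolic text inputs. The driver structure, names, and bound value are invented for illustration; our generated drivers differ in their details.

    import gov.nasa.jpf.vm.Verify;      // JPF choice-generator API
    import gov.nasa.jpf.symbc.Debug;    // Symbolic PathFinder helper

    // Hedged sketch of a bounded exploration driver (illustrative names).
    public class BoundedDriver {
        static final int PAGE_BOUND = 5;   // the bound bP (illustrative value)

        public static void main(String[] args) {
            for (int page = 0; page < PAGE_BOUND; page++) {
                // JPF backtracks over this choice, so all links are explored:
                int link = Verify.getInt(0, 2);         // e.g., a page with 3 links
                // Text boxes receive symbolic strings, covering all possible inputs:
                String text = Debug.makeSymbolicString("input" + page);
                visitPage(link, text);                  // dispatch to a page model
            }
        }

        static void visitPage(int link, String text) { /* calls page models */ }
    }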

We check 19 functional properties across the five selected benchmarks. The properties for benchmarks TD, RO, MS, and HD were identified by graduate students who were not aware of our work, by actually using the web applications and observing their behavior. The properties for IT were identified by us directly from its user documentation. The breakdown of the properties among the benchmarks is as follows: 5 properties for TD, 5 for MS, and 3 each for HD, RO, and IT. We have provided intuitive descriptions of all the properties in the file /home/issta2017artifact/jpf/property-descriptions.txt.

Due to the exhaustiveness of the search, and due to the symbolic values given to text boxes, a key soundness guarantee of this experiment is that if JPF does not find a violation of a property, then no actual execution of the web application (that visits no more than bP pages) can violate the property.

Baseline

To serve as a baseline, we also provide a simplified variant of our approach, which simulates a previous approach. This variant also involves running JPF on translated applications (i.e., non-web programs), using JPF to exhaustively traverse page sequences up to the bound bP. The difference from our approach is that the baseline ignores page contents, and focuses on concolic execution of the server-side code alone. Whenever a page is visited, it is modeled as simply sending symbolic values for all request parameters from the page. That is, effectively, all drop-down lists are considered to contain arbitrary options. This has the effect of increasing the number of false positives among the reported property violations. This baseline simulates previous approaches such as [ref. 14, 27] in our paper. We call this baseline Weave, after the approach of [ref. 14].
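The following hedged sketch contrasts the two models of the same drop-down list; the names are illustrative assumptions, not the actual generated code.

    import gov.nasa.jpf.symbc.Debug;
    import java.util.List;
    import java.util.Random;

    public class DropDownModels {
        private static final Random rand = new Random();

        // Our page model: the submitted value is one of the options that the
        // server actually rendered into the page.
        static String pageModelChoice(List<String> renderedOptions) {
            return renderedOptions.get(rand.nextInt(renderedOptions.size()));
        }

        // Weave-style baseline: the submitted value is an unconstrained symbolic
        // string, i.e., the drop-down list is treated as containing arbitrary
        // options. Paths that are infeasible in the real application become
        // feasible here, which is what inflates the false positives.
        static String baselineChoice() {
            return Debug.makeSymbolicString("param");
        }
    }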

Steps to be followed by the reviewer to check all properties using JPF

1. Start a terminal window inside the VM, and cd to the directory /home/issta2017artifact/jpf/.
2. Run the script run.sh in this directory. This will analyze all 19 properties (i.e., 5 properties for TD, 5 for MS, and 3 each for HD, RO, and IT), one by one, using both our approach and the Weave baseline described above. The on-screen messages that appear while the script is running indicate how many properties have been checked so far.
3. Once the script run.sh completes, open the output file summary-table.html in the same directory to view the summarized results.
The generated summary HTML file contains two tables. The first table (titled PageModel) contains results from our approach for all 19 properties. The second table contains results from the Weave baseline for all 19 properties.

Both tables have the same format. A fragment of the first table (i.e., from our approach) is shown below.

Property   Violations Found?   Num. page sequences   Num. unique page sequences   Time (in s)   Page Bound
propHD1    No                  8                     8                            1.64          25
propHD2    Yes                 17                    5                            1.40          15

Each row in the table pertains to one of the 19 properties. The columns in the table have the following meanings, respectively:

1. The short name of the property. (The letters "HD" in the name indicate that the property is for the benchmark HD.)
2. Whether JPF reports a property violation in any of the page sequences that it traverses while checking this property.
3. The total number of page sequences traversed (i.e., the total number of runs of the benchmark explored). Note that all these runs of the benchmark occur as part of a single run of JPF, as enabled by its backtracking approach.
4. The total number of unique sequences of pages traversed, across all runs.
5. The time taken (in seconds) for all the runs.
6. The value of the page bound bP that we used while checking this property. Note that we chose this bound manually, by identifying the minimum-length page sequence that would need to be traversed in order to check the property.
Note that multiple runs of the benchmark may correspond to the same unique page sequence. This is because the drop-down list option selected in two corresponding pages in two runs could differ, or the symbolic string entered in a text box in two corresponding pages could cause control to go through different paths in the server-side code.

The results

We summarize the results of functional property checking as follows. Eight of the 19 properties were found to be violated by our approach: propHD2, propIT2, propMS3, propMS4, propTD1, propTD2, propTD3, and propTD4 (please see the file /home/issta2017artifact/jpf/summary-table.html). Six of these violations were reproducible by us (all except propIT2 and propMS3). This is evidence of the low false-positive rate of our approach. The Weave baseline reports five violations in addition to the eight reported by our approach, for the properties propTD5, propMS1, propMS5, propRO1, and propRO2. These five reports are necessarily false positives, as explained in our paper.

Detailed output files and other artifacts

Experiments using Zoltar

Zoltar is a fault localization tool based on dynamic analysis. Our objective in this experiment was to use Zoltar to identify bug locations in web applications. Note that Zoltar is not applicable as-is to web applications, for the reasons discussed earlier in Section 1. Therefore, we applied Zoltar to the translated (non-web) programs produced by our tool. We first set up the experiment as follows:
1. We manually created a set of test-cases for each benchmark. A test-case is a sequence of pages to visit, along with a tuple of input choices for each visited page, and a final oracle check of a desired property. All our test cases were passing test cases. We devised our own text format to represent test-cases.
2. We manually modified the page models generated by our tool to follow the provided test inputs instead of choosing links and drop-down list options randomly (a hedged sketch of this change appears after this list).
3. We seeded bugs in each benchmark program. To do this in an unbiased manner, we used the PIT mutation testing tool. This tool uses a set of patterns to suggest mutations, and reports those mutations that cause at least one test case to fail.
4. From the mutations reported by PIT, we selected 20 mutations for each benchmark program. Applying these mutations one at a time, we created 20 buggy versions of each benchmark.
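As an illustration of step 2 above, the following is a hedged sketch of the kind of change involved; the names and the exact shape of the generated code are hypothetical.

    import java.util.Iterator;
    import java.util.List;

    // Hedged sketch: a page model helper that consumes scripted choices from
    // a test-case instead of choosing randomly (illustrative names only).
    public class ScriptedChoices {
        private final Iterator<Integer> script;   // choices parsed from our text format

        public ScriptedChoices(List<Integer> choices) {
            this.script = choices.iterator();
        }

        // Replaces the call rand.nextInt(options.size()) in the generated
        // page model with the next scripted index.
        public String choose(List<String> options) {
            return options.get(script.next());
        }
    }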

Steps to be followed by the reviewer to execute Zoltar on all buggy versions of all benchmarks

1. Start a terminal window inside the VM, and cd to the directory /home/issta2017artifact/zoltar/.
2. Run the script run.sh in this directory. This script will apply Zoltar to each of the 20 buggy versions of each benchmark, using the test-cases provided for the respective benchmarks.
3. Once the script run.sh completes, open the output file summary-table.html in the same directory to view the summarized results.
The table summary-table.html contains 21 rows for each benchmark. Each of the first 20 rows corresponds to a buggy version of the benchmark. The first two columns give the name of the benchmark and the buggy version's number. The third column gives the name of the method in which the bug is seeded. The last column, "rank of buggy method", is the important one. Zoltar assigns a suspicion score to each method, indicating how likely it is that the method contains the buggy statement. The last column indicates the percentage of methods in the benchmark whose suspicion scores are greater than or equal to the suspicion score of the method containing the seeded bug. A lower percentage indicates that the buggy method appears high in Zoltar's report (a list of methods ranked in descending order of suspicion score), and hence that a developer would identify it quickly. The 21st row gives the geometric mean of the ranks of the buggy methods across the 20 buggy versions.
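In our notation (consistent with the worked example in the "Detailed output files" subsection below), if score(m) is the suspicion score that Zoltar assigns to method m, and N is the total number of methods in the benchmark, then:

    rank(buggy) = 100 * |{ m : score(m) >= score(buggy) }| / N    (in percent)

For instance, a buggy method that alone holds the top score among N = 392 methods gets a rank of 100 * 1/392, which is reported as 0.25% in the example below.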

The results

It should be clear from the table /home/issta2017artifact/zoltar/summary-table.html that Zoltar is extremely effective at localizing faults when applied to our generated non-web programs. The geometric mean of the rank of the buggy method over the 20 buggy versions of each benchmark ranges from just 0.24% (benchmark IT) to around 5% (RO). This means that, on average, a developer would have to examine just 0.24% to 5% of the methods in an application before zeroing in on the buggy method.

Detailed output files and other artifacts

For each benchmark bm, there is a folder /home/issta2017artifact/zoltar/bm/ in the VM image. For instance, there are folders /home/issta2017artifact/zoltar/MS, /home/issta2017artifact/zoltar/TD, etc. Under each of these folders there are folders named Mutation1, Mutation2, ..., Mutation20. Each Mutation folder contains a buggy version of the benchmark, within its src/ sub-folder, as well as an output/ sub-folder containing detailed output files. For example, consider the file /home/issta2017artifact/zoltar/MS/Mutation1/output/individualScores.txt. There are 2705 lines in this detailed report. Three of these 2705 lines have the highest suspicion score of 1.0, and all three belong to the method where the bug was seeded, namely client.Index.jspService(). In fact, two of these lines are exactly the location where the bug occurs. Therefore, the file /home/issta2017artifact/zoltar/summary-table.html reports the rank of the buggy method client.Index.jspService() in Mutation1 of benchmark MS as 0.25% (1st method out of a total of 392 methods in the application).

Reviewers can try their own mutations

We encourage the reviewer to seed their own errors and see how effective Zoltar is at identifying the bug locations. An arbitrary seeded bug may not be a good candidate, because Zoltar can only find bugs that cause at least one of the provided test cases to fail. To this end, using the tool PIT, we have identified about 22 candidate mutations for the benchmark MS, each of which causes at least one test to fail. These candidate mutations are listed in the file /home/issta2017artifact/zoltar/MS/ZoltarList.txt. For instance, the first mutation suggested in this file is as follows:

Mutation 1: com.client.run/MultiUserDriver.java, line 148
    Location: exitTo
    Mutation: removed call to client/Index::jspService
    Killed by: com.music.tests.EmailTest.test2(com.music.tests.EmailTest)
    Status: KILLED

This suggestion can be tried as follows:

1. cd to /home/issta2017artifact/zoltar/MS/NoMutation. This is a non-buggy version of the MS benchmark.
2. Open the file src/com/client/run/MultiUserDriver.java, go to line 148, and remove the call to the method Index.jspService (this is the suggested mutation; see the PIT report above and the hedged sketch after this list).
3. cd back to the directory /home/issta2017artifact/zoltar/MS/NoMutation, and run ant and then run.sh in the same directory.
4. cd to the output/ sub-folder, and look at the file individualScores.txt. The lines of code in the method exitTo of class MultiUserDriver (the method containing the just-seeded bug) should have high suspicion scores (i.e., close to 1.0).
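For reference, the following hedged sketch shows the shape of the mutation in step 2. The surrounding code is hypothetical; only the removed call is taken from the PIT report above, and the real signature of jspService in the generated code may differ.

    // Stub standing in for the generated page model of the Index page.
    class Index {
        static void jspService() { /* renders the Index page */ }
    }

    class MultiUserDriverSketch {
        void exitTo() {
            // ...
            Index.jspService();   // deleting this call (around line 148) seeds the bug
            // ...
        }
    }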