IMPECS Workshop on Program Analysis
September 17-20, 2012
Department of Computer Science and Automation
Indian Institute of Science, Bangalore



Abstracts

Satish Chandra

Title: Fault Localization in Enterprise Applications

Abstract:
Data-centric programs primarily interact with databases to get collections of content, process each entry in the collection(s), and output another collection or write it back to the database. Bugs in data-centric programs usually show up in the form of incorrect entries in the output collections.

In this talk, I'll present a dynamic analysis technique for localizing faults in data-centric programs. In our approach, we use a precise slicing algorithm to break a faulty execution trace into multiple slices, such that each slice maps to an entry in the output collection. We then compute the "semantic" difference between the slices that correspond to correct entries and those that correspond to incorrect ones. The semantic differencing is a generalization of the usual notion of control-flow based differencing that has been previously used in fault localization. I'll discuss the effectiveness of this technique over a suite of real-world ABAP programs.

This talk is based on joint work with several of my colleagues at IBM Research, India.

Aditya Nori

Title: Interpolants as Classifiers

Abstract:
Computing good invariants is key to effective and efficient program verification. In this talk, I will describe our experience using machine learning techniques for computing invariants or interpolants useful for program verification. Informally, an interpolant is a predicate that separates good program states from bad program states, and a set of appropriately chosen interpolants forms a proof of correctness of the program. Our main technical insight is that interpolants in program verification can be viewed as classifiers in supervised machine learning. This view has the following advantages: First, we are able to use off-the-shelf classification techniques, in particular support vector machines (SVMs), for interpolation. Second, we show that SVMs can find relevant predicates for a number of benchmarks. Since classification algorithms are predictive, the interpolants computed via classification are likely to be relevant predicates or invariants. Finally, the machine learning view also enables us to handle superficial non-linearities. Even if the underlying problem structure is linear, the symbolic constraints can give an impression that we are solving a non-linear problem. Since learning algorithms try to mine the underlying structure directly, we can discover the linear structure for such problems. We demonstrate the feasibility of our approach via experiments over several benchmarks from various papers on program verification.
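
To make the classifier view concrete, here is a minimal sketch (ours, not the authors' tool) that fits a linear SVM to sampled "good" and "bad" program states and reads off the separating half-space as a candidate interpolant. The sampled states, the intended invariant x <= y, and the use of scikit-learn as the off-the-shelf classifier are all illustrative assumptions.

    # Illustrative sketch: a linear SVM as a candidate-interpolant generator.
    import numpy as np
    from sklearn.svm import LinearSVC

    good = np.array([[0, 0], [1, 2], [2, 2], [3, 5]])   # sampled reachable ("good") states (x, y)
    bad = np.array([[1, 0], [3, 2], [5, 1], [2, 0]])     # sampled error ("bad") states

    X = np.vstack([good, bad])
    labels = np.array([1] * len(good) + [-1] * len(bad))

    clf = LinearSVC(C=1000.0).fit(X, labels)             # large C: insist on separating the samples
    w, b = clf.coef_[0], clf.intercept_[0]
    # Candidate interpolant: w[0]*x + w[1]*y + b >= 0 holds on the good samples and
    # fails on the bad ones; a verifier would still have to check that it is inductive.
    print(f"candidate predicate: {w[0]:.2f}*x + {w[1]:.2f}*y + {b:.2f} >= 0")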

Supratik Chakraborty

Title: Quantifier Elimination for Linear Modular Equalities, Disequalities and Inequalities

Abstract:
Quantifier elimination is a key operation in several program analysis algorithms. It is used, for example, in computing strongest post-conditions or transfer functions, in computing relational abstractions, etc. In this talk, we focus on affine programs with fixed width registers, in which equalities, disequalities or inequalities appear as conditions guarding branches/loops. For analysis of such programs, eliminating quantifiers from Boolean combinations of linear modular inequalities, disequalities and equalities is an important task. We first discuss a layered algorithm for quantifier elimination from a conjunction of linear modular in/dis/equalities. Our algorithm uses a set of computationally efficient heuristics to identify cases where the original system of constraints can be simplified without sacrificing semantic equivalence. Experiments show that this often leads to significant simplifications, resulting in increased efficiency in solving the quantifier elimination problem. If none of the simplifications work, our algorithm eliminates quantifiers using a complete (but computationally expensive) algorithm. We then present three different approaches for extending the quantifier elimination technique for conjunctions of constraints to boolean combinations of constraints. We also present experimental results comparing the performance of these approaches.
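
As a small illustration of the underlying problem (our example, not the algorithm from the talk): over 3-bit registers, eliminating x from the linear modular equality 4x + y = 0 (mod 8) yields the quantifier-free constraint y = 0 (mod 4). The brute-force check below confirms the equivalence that a quantifier elimination procedure must establish symbolically.

    # Brute-force check of a quantifier elimination instance over 3-bit bit-vectors:
    # "exists x. 4*x + y == 0 (mod 8)"  is equivalent to  "y == 0 (mod 4)".
    MOD = 8  # 2**3, i.e. 3-bit registers

    with_quantifier = {y for y in range(MOD)
                       if any((4 * x + y) % MOD == 0 for x in range(MOD))}
    quantifier_free = {y for y in range(MOD) if y % 4 == 0}

    print(sorted(with_quantifier))            # [0, 4]
    assert with_quantifier == quantifier_free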

This is joint work with Ajith John.

R. Venkatesh

Title: Analyzing Industry Code for Bugs

Abstract:
This talk will describe a "Towards Zero Defects" program being executed at TRDDC. As part of the program, we analyzed and categorized defects found during system testing and defined program analysis equations to detect these categories of bugs during analysis. The aim is to systematically increase the number of categories of defects that can be detected through analysis. To quickly implement solvers for these equations we used PRISM, a highly scalable dataflow analysis framework. Difference-based analysis and model checking were used to improve precision. We continue to explore various abstraction techniques to achieve high precision while analyzing several million lines of code. The talk will present the various approaches we explored or are exploring, and also what the industry hopes to achieve through static analysis.

Andreas Podelski

Title: Inductive Data Flow Graphs

Abstract:
The correctness of a sequential program can be shown by the annotation of its control flow graph with inductive assertions. We propose inductive data flow graphs, data flow graphs with incorporated inductive assertions, as the basis of an approach to verifying concurrent programs. An inductive data flow graph accounts for a set of dependencies between program actions in interleaved thread executions, and therefore stands as a representation for the set of concurrent program traces which give rise to these dependencies. The approach first constructs an inductive data flow graph and then checks whether all program traces are represented. The size of the inductive data flow graph is polynomial in the number of data dependencies (in a sense that can be made formal); it does not grow exponentially in the number of threads unless the data dependencies do. The approach shifts the burden of the exponential explosion towards the check whether all program traces are represented, i.e., to a combinatorial problem (over finite graphs).

The talk is based on joint work with Azadeh Farzan and Zachary Kincaid (University of Toronto).

Helmut Seidl

Title: Side-effecting Constraint Systems: a Swiss Army Knife for Interprocedural Analysis of Concurrent Programs

Abstract:
Side-effecting constraint systems were originally introduced for the analysis of multi-threaded code. In this talk, we show that this formalism provides one common ground for realizing efficient interprocedural analyses of programs, possibly with dynamic function calls, where varying amounts of context-sensitivity are used. We also indicate that these constraint systems are well-suited to combining such context-sensitive analyses of local properties with flow-insensitive analyses of global properties, e.g., about the heap, and thus form the ideal basis for building general-purpose infrastructures for static analyzers. One such infrastructure is the analyzer generator Goblint, maintained at TUM, which has been used to evaluate the approach practically on real-world examples.
Joint work with: Kalmer Apinis and Vesal Vojdani.
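
The toy solver below (a sketch in the spirit of the talk, not Goblint's actual interface; all names are ours) shows what "side-effecting" means operationally: while the right-hand side of one unknown is being evaluated, it may contribute values to other unknowns. This is how a context-sensitive analysis of procedures can feed a flow-insensitive global invariant within a single constraint system.

    # A minimal worklist solver for side-effecting constraint systems (sets joined by union).
    def solve(rhs, start):
        """rhs[x] is a function of (get, side): 'get' reads another unknown and records
        a dependency; 'side' contributes a value to another unknown (the side effect)."""
        value, infl, worklist = {}, {}, [start]

        def eval_rhs(x):
            def get(y):
                infl.setdefault(y, set()).add(x)        # re-evaluate x whenever y changes
                if y not in value:
                    value[y] = set()
                    worklist.append(y)                  # demand-driven: also solve y
                return value[y]

            def side(y, contribution):
                old = value.get(y, set())
                if not contribution <= old:
                    value[y] = old | contribution
                    worklist.extend(infl.get(y, ()))    # the side effect changed y
            return rhs.get(x, lambda get, side: set())(get, side)

        while worklist:
            x = worklist.pop()
            value.setdefault(x, set())
            new = value[x] | eval_rhs(x)
            if new != value[x]:
                value[x] = new
                worklist.extend(infl.get(x, ()))
        return value

    # Example: two "procedures" analyzed context-sensitively; both side-effect the
    # flow-insensitive global unknown G (say, an abstraction of the heap).
    def rhs_main(get, side):
        side("G", {"alloc@main"})
        return get("f")

    def rhs_f(get, side):
        side("G", {"alloc@f"})
        return {"ret_f"}

    print(solve({"main": rhs_main, "f": rhs_f}, "main"))   # G collects both contributions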

Jan Reineke

Title: Timing Analysis and Timing Predictability

Abstract:
In hard real-time systems, timely computation of outputs is as important as computing the correct output values. Worst-case execution time (WCET) analysis is a fundamental step in proving that all timing constraints of an application are met. Given a program and a microarchitecture, the task of WCET analysis is to determine an upper bound on the execution time of the program on the given microarchitecture under all possible program inputs. While this task is in general undecidable, commercial WCET analysis tools exist. In the first part of my talk I will introduce the WCET analysis problem and the basic structure of state-of-the-art WCET analysis tools.

Current microarchitectural developments make sound, precise, and efficient WCET analysis increasingly difficult: contemporary processor architectures employ deep pipelines, branch predictors, and caches to improve performance. Further, multicore processors share buses, caches, and other resources, introducing interference between logically independent programs. These features are claimed to make such processors "unpredictable". Formally capturing the notion of predictability has so far proved to be hard. In the second part of my talk I will present the first formalization of the notion of predictability for caches.

Kartik Nagar

Title: A Comprehensive Cache Analysis for Multi-level Data Caches

Abstract:
Obtaining a precise and safe estimate of the Worst Case Execution Time (WCET) of programs is an important problem for real-time systems. Modelling the underlying microarchitecture, especially the caches, while determining WCET estimates has an enormous impact on their precision. Abstract Interpretation based cache analysis is widely used for estimating the impact of caches on WCET, and in our work, we propose a new extension to the classical single-level cache analysis. Our work targets a multi-level data cache hierarchy with writebacks, and we use the novel idea of partial and pseudo cache blocks to improve the precision of Multi-level Must Analysis. We also propose a safe way of performing May and Persistence Analysis in the presence of writebacks.
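
For readers unfamiliar with abstract interpretation based cache analysis, the sketch below shows the classical single-level LRU must-analysis that this work builds on: each block is mapped to an upper bound on its age, and a block whose bound is below the associativity is guaranteed to be in the cache. The code and access sequence are illustrative only; the talk's contributions (multi-level data caches, writebacks, partial and pseudo cache blocks) are not modelled here.

    # Classical must-analysis for one fully-associative LRU cache set.
    ASSOC = 4   # 4 ways, LRU replacement

    def access(state, b):
        """Abstract transformer: blocks younger than b's old bound age by one;
        b becomes the most recently used block (age bound 0)."""
        old = state.get(b, ASSOC)                 # ASSOC means "possibly not cached"
        new = {}
        for x, a in state.items():
            aged = a + 1 if a < old else a
            if aged < ASSOC:                      # bounds >= ASSOC: the block may be evicted
                new[x] = aged
        new[b] = 0
        return new

    def join(s1, s2):
        """At control-flow joins keep only blocks cached along both paths,
        with the more pessimistic (larger) age bound."""
        return {x: max(s1[x], s2[x]) for x in s1.keys() & s2.keys()}

    state = {}
    for block in ["A", "B", "A", "C"]:            # a straight-line access sequence
        state = access(state, block)
    print(state)   # {'A': 1, 'B': 2, 'C': 0}: each would be a guaranteed hit if accessed next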

Bernd Finkbeiner

Title: Lazy Synthesis

Abstract:
We present an automatic method for the synthesis of processes in a reactive system from specifications in linear-time temporal logic (LTL). The synthesis algorithm executes a loop consisting of three phases: Solve, Check, and Refine. In the Solve phase, a candidate solution is obtained as a model of a Boolean constraint system; in the Check phase, the candidate solution is checked for reachable error states; in the Refine phase, the constraint system is refined to eliminate any errors found in the Check phase. The algorithm terminates when an implementation without errors is found. We call our approach lazy, because constraints on possible process implementations are only considered incrementally, as needed to rule out incorrect candidate solutions. This contrasts with the standard eager approach, where the full specification is considered right away. We report on experience in the arbiter synthesis for the AMBA bus protocol, where lazy synthesis leads to significantly smaller implementations than the previous eager approach. The talk is based on joint work with Swen Jacobs.
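
The Solve/Check/Refine structure can be seen in miniature below: a schematic loop (our toy reconstruction, not the authors' LTL synthesis procedure) that lazily accumulates constraints from discovered errors until a candidate truth table implements a small specification.

    from itertools import product

    # Toy lazy synthesis loop: find a Boolean function f over two inputs with
    # f(a, b) == a or b on every input, adding constraints only as errors are found.
    def find_error(candidate):
        """Check phase: return an input on which the candidate violates the spec, or None."""
        for a, b in product([0, 1], repeat=2):
            if candidate[(a, b)] != (a | b):
                return (a, b)
        return None

    constraints = {}                                  # partial truth table, grown lazily
    while True:
        # Solve phase: any candidate consistent with the constraints gathered so far
        # (unconstrained inputs default to 0).
        candidate = {inp: constraints.get(inp, 0) for inp in product([0, 1], repeat=2)}
        err = find_error(candidate)                   # Check phase
        if err is None:
            break
        constraints[err] = 1 - candidate[err]         # Refine phase: rule out the failing candidate

    print(candidate)   # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}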

Ruzica Piskac

Title: Software Synthesis using Automated Reasoning

Abstract:
Software synthesis is a technique for automatically generating code from a given specification. The goal of software synthesis is to make software development easier, while increasing both the productivity of the programmer and the correctness of the produced code. In this talk I will present an approach to synthesis that relies on the use of automated reasoning and decision procedures. First I will describe how to generalize decision procedures into predictable and complete synthesis procedures. Here completeness means that the procedure is guaranteed to find code that satisfies the given specification. I will illustrate the process of turning a decision procedure into a synthesis procedure using linear integer arithmetic as an example. This approach is implemented in Comfusy, an extension to the Scala compiler and a tool for complete functional synthesis. It accepts as input expressions with input and output variables and outputs code that computes output values as a function of input values. In addition, it also outputs the preconditions that the input variables have to satisfy in order for a solution to exist. Next I will outline a synthesis procedure for specifications given in the form of type constraints. The procedure derives code snippets that use given library functions, taking into account polymorphic type constraints as well as code behavior. We implemented this approach in a tool called InSynth. To help developers compose functionality from existing libraries and from user-defined values, InSynth synthesizes and suggests valid expressions of a given type at a given program point.
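
To give a concrete flavour of complete functional synthesis for linear integer arithmetic, here is an illustration in Python rather than Comfusy's Scala (the function names are ours): from the declarative specification, a synthesis procedure emits both witness code and the precondition under which a solution exists.

    # Illustrative only: the shape of input and output in complete functional synthesis.
    def spec(a, y):
        # Declarative input: "choose y such that 5*y + 2 == a".
        return 5 * y + 2 == a

    def synthesized_choose(a):
        # What a synthesis procedure emits: a precondition on the inputs plus
        # witness code computing the output from the inputs.
        assert (a - 2) % 5 == 0, "precondition: no integer solution exists otherwise"
        return (a - 2) // 5

    y = synthesized_choose(17)
    assert spec(17, y)         # y == 3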

Nishant Sinha

Title: Finding Bugs Without Looking All Over

Abstract:
Most symbolic bug detection techniques perform search over the program control flow graph based on either forward symbolic execution or backward weakest precondition computation. The complexity of determining inter-procedural all-path feasibility makes it difficult for such analyses to judge up-front whether the behavior of a particular caller or callee procedure is relevant to a given property violation. Consequently, these methods analyze several program fragments irrelevant to the property, often repeatedly, before arriving at a goal location or an entry point, thus wasting resources and diminishing their scalability.

In this talk I will present a systematic and scalable technique for focused bug detection which, starting from the goal function, employs alternating backward and forward exploration on the program call graph to lazily infer a small scope of program fragments, sufficient to detect the bug or show its absence. The method learns caller and callee invariants for procedures from failed exploration attempts and uses them to direct future exploration towards a scope pertinent to the violation. This is joint work with N. Singhania, S. Chandra and M. Sridharan at IBM Research Labs.

Aditya Kanade

Title: Testing and Analysis of Data Format Compatibility of Programs

Abstract:
Modular design and composition of programs is a scalable methodology for development of large software systems. Correctness of the composed system depends on compatibility of the component programs. In many software systems, the programs manipulate and exchange complex data like network packets and files. Checking compatibility of such programs involves comparing the data formats followed by them. In this paper, we motivate the problem of data format compatibility with examples. We present a testing approach for addressing this problem. The approach involves static program inversion and dynamic constraint solving. We also discuss a static analysis approach for inference of data formats embedded in programs and apply it to formalize and check compatibility of producer-consumer and version-related programs.

Uday Khedker

Title: Liveness-Based Pointer Analysis

Abstract:
Precise flow- and context-sensitive pointer analysis (FCPA) is generally considered prohibitively expensive for large programs; most tools relax one or both of the requirements for scalability. We argue that precise FCPA has been judged over-harshly: the vast majority of points-to pairs calculated by existing algorithms are never used by any client analysis or transformation because they involve dead variables. We therefore formulate FCPA in terms of a joint points-to and liveness analysis which we call L-FCPA. Our analysis computes points-to information only for live pointers and its propagation is sparse (restricted to the live ranges of the respective pointers). We implemented a naive L-FCPA in GCC-4.6.0 using linked lists. Evaluation on SPEC2006 showed a significant increase in the precision of points-to pairs compared to GCC's analysis. Interestingly, our naive implementation turned out to be faster than GCC's analysis for all programs under 30kLoC. Further, L-FCPA showed that fewer than 4% of basic blocks had more than 8 points-to pairs. We conclude that the usable points-to information and the required context information are small and sparse, and argue that approximations (e.g. weakening flow or context sensitivity) are not only undesirable but also unnecessary for performance.
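
The core idea, computing points-to pairs only for pointers that are live, can be illustrated on a straight-line toy program (our simplification: the actual analysis is interprocedural, handles indirect assignments, and lets liveness and points-to information depend on each other).

    # Toy liveness-restricted points-to analysis on straight-line code.
    # Statement forms (invented for this sketch): ("addr", p, x) is p = &x,
    # ("copy", p, q) is p = q, and ("use", p) is a dereference of p.
    prog = [("addr", "p", "a"),
            ("addr", "q", "b"),    # q is never used afterwards: a dead pointer
            ("copy", "r", "p"),
            ("use",  "r")]

    # Backward pass: pointers live *after* each statement.
    live_after, live = [set() for _ in prog], set()
    for i in range(len(prog) - 1, -1, -1):
        live_after[i] = set(live)
        kind, var = prog[i][0], prog[i][1]
        if kind == "use":
            live.add(var)
        else:                      # addr/copy define var
            live.discard(var)
            if kind == "copy":
                live.add(prog[i][2])

    # Forward pass: compute points-to pairs, but keep them only for live pointers.
    pts, trace = {}, []
    for i, st in enumerate(prog):
        if st[0] == "addr":
            pts[st[1]] = {st[2]}
        elif st[0] == "copy":
            pts[st[1]] = set(pts.get(st[2], set()))
        pts = {v: t for v, t in pts.items() if v in live_after[i]}
        trace.append(dict(pts))
    print(trace)   # q -> {b} is never recorded, because q is dead after its definition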

Amogh Margoor

Title: Improving the Precision of a Scalable, Demand-Driven Null-Dereference Analysis

Abstract:
We describe an approach and a tool for sound, demand-driven null-dereference analysis for Java programs, via over-approximated weakest precondition analysis. The primary focus of this talk is on the optimizations that we introduce for efficient analysis, which allow long program paths to be traversed and hence higher precision to be achieved. One of our major optimizations is to bypass certain expensive-to-analyze constructs, such as virtual calls with too many possible targets, by directly transferring dataflow facts from points after the construct to points before it along certain kinds of def-use edges. A second major optimization is to use manually constructed summaries of Java container class methods, rather than analyzing the code of these methods directly. A third optimization involves reasoning about arbitrarily long access paths that may arise in a program due to traversal of recursive data structures, by using a finite closed-form representation of these access paths. We demonstrate that our optimizations result in a 48% reduction in false positives over the baseline, without significant impact on running time.

Krishna Nandivada

Title: A Transformation Framework for Optimizing Task-Parallel Programs

Abstract:
Task parallelism has become increasingly popular, with programming models such as OpenMP 3.0, Cilk, Java Concurrency, X10, Chapel and Habanero-Java (HJ) addressing the requirements of multicore programmers. While task parallelism increases productivity by allowing the programmer to express multiple levels of parallelism, it can also lead to performance degradation due to increased overheads. In this paper, we introduce a transformation framework for optimizing task-parallel programs with a focus on task creation and task termination operations. Our framework includes a definition of data dependence in task-parallel programs, a happens-before analysis algorithm, and a range of program transformations for optimizing task parallelism. Further, we discuss the impact of exception semantics on the specified transformations, and extend them to handle task-parallel programs with precise exception semantics.

Vinayaka Bandishti

Title: Tiling Stencil Computations to Maximize Parallelism

Abstract:
Most stencil computations inherently have the property of tile-wise concurrent start, i.e., there always exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. However, existing automatic tiling frameworks often choose hyperplanes that enforce pipelined start-up, which not only leads to load imbalance but also reduces the number of tiles that can be executed in parallel. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible.

We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start of a statement in any program with affine data accesses. We then provide a sound and complete approach to find such hyperplanes for stencil computations. Experimental evaluation on a 12-core Intel Westmere multicore shows that our code is able to outperform a tuned domain-specific stencil code generator by about 4% to 20%. In other cases, we outperform previous compiler techniques by a factor of 2X to 16.4X.

Narayan Kumar

Title: Verifying Temporal Properties of Multi-pushdown Systems

Abstract:
Multipushdown systems provide a natural formal model for recursive multi-threaded programs. In this talk we survey some of the recent results on the verification of restricted classes of multi-pushdown systems.

Rupak Majumdar

Title: A Perfect Model for Bounded Verification

Abstract:
A class of languages C is perfect if it is closed under Boolean operations and the emptiness problem is decidable. Perfect language classes are the basis for the automata-theoretic approach to model checking: a system is correct if the language generated by the system is disjoint from the language of bad traces. Regular languages are perfect, but because the disjointness problem for CFLs is undecidable, no class containing the CFLs can be perfect.

In practice, verification problems for language classes that are not perfect are often under-approximated by checking if the property holds for all behaviors of the system belonging to a fixed subset. A general way to specify a subset of behaviors is by using bounded languages (languages of the form w1* ... wk* for fixed words w1,...,wk). A class of languages C is perfect modulo bounded languages if it is closed under Boolean operations relative to every bounded language, and if the emptiness problem is decidable relative to every bounded language.

We consider finding perfect classes of languages modulo bounded languages. We show that the class of languages accepted by multi-head pushdown automata is perfect modulo bounded languages, and characterize the complexities of the associated decision problems. We show that computations of several known models of systems, such as recursive multi-threaded programs, recursive counter machines, and communicating finite-state machines, can be encoded as multi-head pushdown automata, giving uniform and optimal underapproximation algorithms modulo bounded languages.

[Joint work with Javier Esparza and Pierre Ganty]

Sudipta Chattopadhyay

Title: Scalable and Precise Refinement of Cache Timing Analysis via Model Checking

Abstract:
Hard real-time systems require absolute guarantees on their execution times. Determining the worst-case execution time (WCET) of a program has therefore become an important problem to address. However, performance-enhancing features of a processor (e.g. caches) make WCET analysis a difficult problem. In this paper, we propose a novel approach that combines abstract interpretation and model checking for different varieties of cache analysis, ranging from single-core to multi-core platforms. Our modeling is used to develop a precise yet scalable timing analysis method on top of the Chronos WCET analysis tool. Experimental results demonstrate that we can obtain significant improvements in precision with reasonable analysis time overhead.

Aravind Acharya

Title: Model Checking in Presburger Counter Systems using Accelerations

Abstract:
Model checking is a powerful technique for analyzing reachability and temporal properties of finite state systems. Many practical systems are naturally modeled as counter systems, where the domain of counter values is possibly infinite. We identify a class of counter systems, and propose a new technique to check whether a system from this class satisfies a given CTL formula. The key novelty of our approach is a way to use existing reachability analysis techniques to answer both "until" and "global" properties. We also give results from applying our approach to several natural examples, which illustrate its scope.

PVR Murthy

Title: Speculative Resource Acquisition

Abstract:
The choice of resource acquisition patterns such as lazy acquisition and eager acquisition can have an impact on the response time and/or scalability of an application. The lazy acquisition pattern is conservative, as resources are acquired on demand only, on the first access, which can result in a task waiting and a poor response. On the other hand, in eager acquisition, no waiting may be necessary on the first access, while all the resources acquired ahead may not be accessed eventually. Our work presents a simple algorithm to compute statically the probability with which a program unit P accesses a resource r, given branch probabilities. Branch probabilities may be available through developer annotations or usage profiles. This results in eagerly acquiring, at run-time, only those resources that are required with high probability, minimizing the likelihood of not accessing acquired resources at all. To achieve this, at run-time only those resources required with a probability greater than a chosen threshold value are speculatively acquired. The talk draws a comparison of the lazy, eager and speculative resource acquisition strategies and indicates which resources need to be acquired eagerly ahead of their first access and which ones lazily.
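
A minimal sketch of the probability computation (our own formulation and node encoding, assuming independent branch outcomes): access probabilities are computed bottom-up from branch probabilities, and resources whose probability crosses a chosen threshold are acquired eagerly, the rest lazily.

    # Estimate the probability that a program unit accesses resource r, then decide
    # the acquisition strategy. Nodes: ("access", r), ("seq", [children]),
    # ("if", branch_prob, then_node, else_node).
    def access_prob(node, r):
        kind = node[0]
        if kind == "access":
            return 1.0 if node[1] == r else 0.0
        if kind == "seq":                 # accessed in a sequence if accessed in any element
            p_miss = 1.0
            for child in node[1]:
                p_miss *= 1.0 - access_prob(child, r)
            return 1.0 - p_miss
        if kind == "if":
            p = node[1]
            return p * access_prob(node[2], r) + (1 - p) * access_prob(node[3], r)
        return 0.0

    unit = ("seq", [("if", 0.9, ("access", "db"), ("seq", [])),
                    ("if", 0.1, ("access", "printer"), ("seq", []))])

    THRESHOLD = 0.5
    for res in ["db", "printer"]:
        p = access_prob(unit, res)
        print(res, p, "acquire eagerly" if p >= THRESHOLD else "acquire lazily")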

Himanshu Kumar Singh

Title: Stepwise Value Flow Inference

Abstract:
We present an approach for inferring possible values of a variable, starting from a statement of interest, using stepwise backward unification. The method is based on refinement of variable-value associations in path- and flow-sensitive unification steps. It can be applied to infer the value flow across procedure calls. This work is currently in progress and is aimed at supporting demand-driven identification of the source of potential errors flagged by static analysis tools.

Prakash Saivasan

Title: Multipushdown Systems and Parity Games

Abstract:
Pushdown systems have been used to model sequential recursive programs and have been studied in detail. Problems such as reachability, computation of pre*, and parity games have been solved for them. In recent years, restricted versions of systems with multiple pushdowns have been studied (bounded context switching, bounded phase, ordered multi-pushdown, bounded scope, etc.). We survey some of the results in this area and describe a simple inductive argument for the decidability of parity games over bounded-phase pushdown systems. We will also provide a matching lower bound for this problem.

Anirudh Santhiar

Title: Math API Discovery by Mining Unit Tests

Abstract:
Implementing algorithms from many scientific and engineering disciplines requires extensive use of functions, operators and types from mathematics. Since the Java Standard Library supports only the most rudimentary of these, users have to look to external libraries for fast and precise implementations. Choosing, from among many candidates, an API that best supports a programming task is difficult because of the size and heterogeneity of the candidate APIs. We propose a novel technique for Math API discovery and composition by mining unit tests. Our queries are typed mathematical expressions that are interpreted on the mined test data. We introduce the notion of test frequency to rank retrieved methods, and show how to compute it using a fast and scalable parallel algorithm.

To demonstrate these techniques, we developed a tool, MathFinder, that is integrated into the Eclipse IDE and works as a high precision API search engine. A programmer can express a computation need to MathFinder as a simple query. MathFinder will return a ranked list of code snippets, perhaps from multiple target APIs. It is straightforward to complete these snippets and insert them into the user's source code.
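
A rough sketch of the retrieval step (our simplification, with hypothetical method names and mined data): a query is interpreted on mined unit-test triples, and candidate methods are ranked by the fraction of their test data that the query explains.

    import math

    # Hypothetical mined unit-test data: (method, arguments, observed result) triples.
    mined = [
        ("LibA.cuberoot", (27.0,), 3.0),
        ("LibA.cuberoot", (8.0,), 2.0),
        ("LibB.sqrt", (9.0,), 3.0),
        ("LibB.sqrt", (16.0,), 4.0),
        ("LibB.sqrt", (2.0,), 1.4142135623730951),
    ]

    # Query: "a unary method f with f(x)**2 == x", interpreted on the test data.
    def matches(args, result):
        return len(args) == 1 and math.isclose(result * result, args[0])

    scores = {}
    for method, args, result in mined:
        total, hits = scores.get(method, (0, 0))
        scores[method] = (total + 1, hits + (1 if matches(args, result) else 0))

    ranked = sorted(scores, key=lambda m: scores[m][1] / scores[m][0], reverse=True)
    print(ranked)   # ['LibB.sqrt', 'LibA.cuberoot'] for this query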

This is joint work with Omesh Pandita and Aditya Kanade.

Shalini Kaleeswaran

Title: Learning Assertions for Program Repair

Abstract:
Debugging is an essential part of the software development cycle. Debugging involves localizing the fault and then fixing it. This work helps solve the problem of fixing a fault once it is identified. We propose to provide clues to the programmer in the form of assertions that show how the fault is to be fixed. Learning these assertions is posed as a classification problem. We combine program verification and machine learning techniques to help in program repair.

Subhajit Roy

Title: Probabilistic Dataflow Analysis using Path Profiles on Structure Graphs

Abstract:
Speculative optimizations are increasingly becoming popular for improving program performance by allowing transformations that benefit frequently traversed program paths. Such optimizations are based on dataflow facts which are mostly true, though not always safe. Probabilistic dataflow analysis frameworks infer such facts about a program, while also providing the probability with which a fact is likely to be true. We propose a new Probabilistic Dataflow Analysis Framework which uses path profiles and information about the nesting structure of loops to obtain improved probabilities of dataflow facts.
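
The basic use of a path profile can be seen in the toy computation below (paths and counts are invented): the probability of a dataflow fact at a program point is estimated from the relative frequency of the profiled paths along which the fact holds, which is exactly what a speculative optimization consumes.

    # Estimating the probability of a dataflow fact from a path profile.
    profile = {                              # acyclic path -> execution count
        "entry-A-B-exit": 70,
        "entry-A-C-exit": 25,
        "entry-D-C-exit": 5,
    }
    # Paths along which the fact of interest (say, "x == 0 at point P") holds.
    fact_holds_on = {"entry-A-B-exit", "entry-A-C-exit"}

    total = sum(profile.values())
    prob = sum(c for path, c in profile.items() if path in fact_holds_on) / total
    print(prob)   # 0.95: a speculative optimization may bet on the fact, with a safety net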

Chandan G

Title: Automatic Generation of Near-Optimal Data Movement Code for Distributed Heterogeneous Architectures

Abstract:
We propose a generic framework that automatically handles data movement and synchronization when compiling for distributed heterogeneous architectures. Our framework allows a user to distribute computation across computing units without having to worry about communication between them. The techniques presented are oblivious to transformations applied prior to parallelization, as well as to the choice of parallelization and distribution. Our framework generates communication code that minimizes the amount of data movement for a particular choice of these, the problem size, and the number of computing units. All of this is accomplished using a combination of powerful static analysis relying on the polyhedral compiler framework and lightweight runtime routines generated using it.
