Title: Fault Localization in Enterprise Applications
Abstract:
Data-centric programs primarily interact with databases: they retrieve
collections of data, process
each entry in those collections, and output another collection or write
the results back to the database.
Bugs in data-centric programs usually show up in the form of incorrect
entries in the output
collections.
In this talk, I'll present a dynamic analysis technique for localizing
faults in data-centric programs.
In our approach, we use a precise slicing algorithm to break a faulty
execution trace into
multiple slices, such that each slice maps to an entry in the output
collection. We then compute
the "semantic" difference between the slices that correspond to correct
entries and those that
correspond to incorrect ones. The semantic differencing is a generalization
of the usual notion
of control-flow based differencing that has been previously used in fault
localization. I'll discuss
the effectiveness of this technique on a suite of real-world ABAP
programs.
This talk is based on joint work with several of my colleagues at IBM
Research, India.
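To make the idea concrete, here is a minimal Python sketch of the overall recipe (it is not the precise slicing algorithm or the semantic differencing from the talk): the trace is split into per-output-entry slices, and statements that occur only in slices of incorrect entries are reported as suspicious. The trace encoding and all names below are hypothetical.

    # Minimal sketch (not the talk's precise algorithm): break an execution trace
    # into per-output-entry slices and report statements that occur in slices of
    # incorrect entries but in no slice of a correct entry.

    def slices_per_entry(trace):
        """trace: list of (statement_id, output_entry_id) events."""
        slices = {}
        for stmt, entry in trace:
            slices.setdefault(entry, set()).add(stmt)
        return slices

    def suspicious_statements(trace, incorrect_entries):
        slices = slices_per_entry(trace)
        bad = set().union(*(slices[e] for e in incorrect_entries))
        good = set().union(*(s for e, s in slices.items() if e not in incorrect_entries))
        # The "semantic difference" of the talk is approximated here by a plain set difference.
        return bad - good

    # Hypothetical trace: statements s1..s4 contribute to output entries e1..e3.
    trace = [("s1", "e1"), ("s2", "e1"), ("s1", "e2"),
             ("s3", "e2"), ("s1", "e3"), ("s4", "e3")]
    print(suspicious_statements(trace, incorrect_entries={"e3"}))   # {'s4'}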
Title: Interpolants as classifiers
University of Freiburg
Abstract:
Computing good invariants is key to effective and efficient program
verification. In this talk, I will describe our experience using machine
learning techniques for computing invariants or interpolants useful for
program verification. Informally, an interpolant is a predicate that separates
good program states from bad program states, and a set of appropriately chosen
interpolants forms a proof of correctness of the program.
Our main technical insight is that interpolants in program verification can
be viewed as classifiers in supervised machine learning. This view has the
following advantages: First, we are able to use off-the-shelf classification
techniques, in particular support vector machines (SVMs), for interpolation.
Second, we show that SVMs can find relevant predicates for a number of
benchmarks. Since classification algorithms are predictive, the interpolants
computed via classification are likely to be relevant predicates or invariants.
Finally, the machine learning view also enables us to handle superficial
non-linearities. Even if the underlying problem structure is linear, the
symbolic constraints can give an impression that we are solving a non-linear
problem. Since learning algorithms try to mine the underlying structure
directly, we can discover the linear structure for such problems. We
demonstrate the feasibility of our approach via experiments over several
benchmarks from various papers on program verification.
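As an illustration of the classifier view (the data, the feature encoding, and the use of scikit-learn below are assumptions made for this sketch, not necessarily the setup used in the work), a linear SVM separating sampled good states from bad states yields a hyperplane that can be read back as a candidate interpolant:

    # Sketch of "interpolants as classifiers": fit a linear SVM to separate
    # sampled good program states from bad ones; the separating hyperplane
    # w.x + b = 0 is read back as a candidate predicate. Data is hypothetical.
    from sklearn import svm

    good = [[0, 1], [1, 2], [2, 4]]   # states (x, y) seen on correct runs
    bad  = [[3, 0], [4, 1], [5, 2]]   # states from which an error is reachable
    X = good + bad
    labels = [1] * len(good) + [-1] * len(bad)

    clf = svm.SVC(kernel="linear")
    clf.fit(X, labels)

    w, b = clf.coef_[0], clf.intercept_[0]
    # Candidate interpolant: w[0]*x + w[1]*y + b >= 0 separates good from bad states.
    print(f"candidate predicate: {w[0]:.2f}*x + {w[1]:.2f}*y + {b:.2f} >= 0")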
Title: Quantifier Elimination for Linear Modular Equalities, Disequalities
and Inequalities
Abstract:
Quantifier elimination is a key operation in several program analysis
algorithms. It is used, for example, in computing strongest post-conditions
or transfer functions, in computing relational abstractions, etc. In
this talk, we focus on affine programs with fixed width registers, in which
equalities, disequalities or inequalities appear as conditions guarding
branches/loops. For analysis of such programs, eliminating quantifiers
from Boolean combinations of linear modular inequalities, disequalities
and equalities is an important task. We first discuss a layered algorithm for quantifier
elimination from a conjunction of linear modular in/dis/equalities.
Our algorithm uses a set of computationally efficient heuristics to identify
cases where the original system of constraints can be simplified without
sacrificing semantic equivalence. Experiments show that this often
leads to significant simplifications, resulting in increased efficiency in
solving the quantifier elimination problem. If none of the simplifications work,
our algorithm eliminates quantifiers using a complete (but computationally
expensive) algorithm. We then present three different approaches for extending
the quantifier elimination technique for conjunctions of constraints
to Boolean combinations of constraints. We also present experimental
results comparing the performance of these approaches.
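As a small hand-worked example (constructed here, not taken from the talk) of the cheapest layer, where one conjunct fixes the quantified variable and substitution suffices:

    \exists x.\; \bigl( x \equiv 1 \ (\mathrm{mod}\ 4) \;\wedge\; x + y \equiv 2 \ (\mathrm{mod}\ 4) \bigr)
        \;\Longleftrightarrow\; y \equiv 1 \ (\mathrm{mod}\ 4)

Substituting the value of x fixed by the first conjunct into the second eliminates the quantifier without any expensive reasoning; the complete (but costly) procedure is needed only when no such simplification applies.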
Abstract:
This talk will describe a "Towards Zero Defects" program being executed at
TRDDC. As part of the program we analyzed and categorized defects found
during system testing and defined program analysis equations to detect
these categories of bugs during analysis. The aim is to systematically
increase the number of categories of defects that can be detected through
analysis. To quickly implement solvers for these equations we used PRISM,
a highly scalable dataflow analysis framework. Difference-based
analysis and model checking were used to improve precision. We continue to
explore various abstraction techniques to achieve high precision while
analyzing several million lines of code. The talk will present the various
approaches we explored or are exploring and also what the industry hopes
to achieve through static analysis.
Abstract:
The correctness of a sequential program can be shown by the annotation of
its control flow graph with inductive assertions. We propose inductive data
flow graphs, data flow graphs with incorporated inductive assertions, as
the basis of an approach to verifying concurrent programs. An inductive
data flow graph accounts for a set of dependencies between program actions
in interleaved thread executions, and therefore stands as a representation
for the set of concurrent program traces which give rise to these dependencies.
The approach first constructs an inductive data flow graph and then checks
whether all program traces are represented. The size of the inductive data
flow graph is polynomial in the number of data dependencies (in a sense that
can be made formal); it does not grow exponentially in the number of threads
unless the data dependencies do. The approach shifts the burden of the
exponential explosion towards the check whether all program traces are
represented, i.e., to a combinatorial problem (over finite graphs).
The talk is based on joint work with Azadeh Farzan and Zachary Kincaid,
University of Toronto.
Title: Side-effecting Constraint Systems: a Swiss Army Knife for Interprocedural
Analysis of Concurrent Programs
Abstract:
Side-effecting constraint systems were originally
introduced for the analysis of multi-threaded code.
In this talk, we show that this formalism provides one common ground
for realizing efficient interprocedural analyses of programs,
possibly with dynamic function calls, where varying degrees of
context sensitivity are used.
We also indicate that these constraint systems are well-suited to
combine such context-sensitive analyses of local properties with
flow-insensitive analyses of global properties, e.g., about
the heap, and thus form an ideal basis for building general-purpose
infrastructures for static analyzers.
One such infrastructure is the analyzer generator Goblint, maintained
at TUM, which has been used to evaluate the approach on real-world
examples.
Joint work with: Kalmer Apinis and Vesal Vojdani.
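The following Python sketch illustrates only the core mechanism of the formalism (it is not Goblint's actual fixpoint engine): a right-hand side, while being evaluated for its own unknown, may additionally contribute values to other unknowns, such as a flow-insensitive global, through a side callback. All names and lattice choices below are assumptions.

    # Naive round-robin solver for a side-effecting constraint system (sketch).
    def solve(rhss, bottom=frozenset(), join=frozenset.union):
        values = {}
        changed = True
        while changed:
            changed = False
            def get(u):
                return values.get(u, bottom)
            def side(u, contrib):          # side-effect: join a contribution into another unknown
                nonlocal changed
                new = join(values.get(u, bottom), contrib)
                if new != values.get(u, bottom):
                    values[u] = new
                    changed = True
            for unknown, rhs in rhss.items():
                new = join(values.get(unknown, bottom), rhs(get, side))
                if new != values.get(unknown, bottom):
                    values[unknown] = new
                    changed = True
        return values

    # Two "local" unknowns (program points of different threads); each also
    # contributes what it writes to a single flow-insensitive global 'G'.
    def p1(get, side):
        side("G", frozenset({"x->a"}))                 # contribution to the global unknown
        return frozenset({"local1"})

    def p2(get, side):
        side("G", frozenset({"x->b"}))
        return frozenset({"local2"}) | get("G")        # local result also reads the global

    print(solve({"p1": p1, "p2": p2}))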
Abstract:
In hard real-time systems, timely computation of outputs is as important as
computing the correct output values. Worst-case execution time (WCET)
analysis is a fundamental step in proving that all timing constraints of an
application are met. Given a program and a microarchitecture, the task of
WCET analysis is to determine an upper bound on the execution time of the
program on the given microarchitecture under all possible program inputs.
While this task is in general undecidable, commercial WCET analysis tools
exist. In the first part of my talk I will introduce the WCET analysis
problem and the basic structure of state-of-the-art WCET analysis tools.
Current microarchitectural developments make sound, precise, and efficient
WCET analysis increasingly difficult: contemporary processor architectures
employ deep pipelines, branch predictors, and caches to improve performance.
Further, multicore processors share buses, caches, and other resources,
introducing interference between logically-independent programs. These
features are claimed to make such processors "unpredictable". Formally
capturing the notion of predictability has so far proved to be hard. In the
second part of my talk I will present the first formalization of the notion
of predictability for caches.
Kartik Nagar
Title: A Comprehensive Cache Analysis for
Multi-level Data Caches
Abstract:
Obtaining a precise and safe estimate of the Worst-Case
Execution Time (WCET) of programs is an important
problem for real-time systems. Modelling the underlying microarchitecture,
especially the caches, while determining WCET
estimates has an enormous impact on their precision. Abstract Interpretation-based
cache analysis is widely used for estimating the impact of caches on
WCET, and in our work, we propose a new extension to the classical single-level
cache analysis. Our work targets a multi-level data cache hierarchy with
writebacks, and we use the novel idea of partial and pseudo cache blocks to
improve the precision of Multi-level Must Analysis. We also propose a safe
way of performing May and Persistence Analysis in the presence of
writebacks.
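For background, the sketch below shows the classical single-level LRU must analysis that multi-level analyses build on (abstract cache states map blocks to upper bounds on their ages); the partial and pseudo cache blocks for multi-level data caches with writebacks are not captured here, and the associativity is an assumed value.

    # Classical single-level LRU "must" cache analysis (sketch).
    ASSOC = 4   # assumed associativity of one cache set

    def must_update(abs_cache, block):
        """abs_cache: dict block -> age upper bound (0..ASSOC-1). Access `block`."""
        old_age = abs_cache.get(block, ASSOC)        # ASSOC means "not guaranteed cached"
        updated = {}
        for b, age in abs_cache.items():
            if b == block:
                continue
            new_age = age + 1 if age < old_age else age   # only younger blocks age
            if new_age < ASSOC:
                updated[b] = new_age
        updated[block] = 0                           # accessed block becomes youngest
        return updated

    def must_join(c1, c2):
        """Join at control-flow merges: keep blocks present in both, take the larger age."""
        return {b: max(c1[b], c2[b]) for b in c1.keys() & c2.keys()}

    cache = {}
    for b in ["A", "B", "A", "C"]:
        cache = must_update(cache, b)
    print(cache)   # {'B': 2, 'A': 1, 'C': 0}: all three blocks are guaranteed to be cached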
Abstract:
We present an automatic method for the synthesis of processes in a reactive
system from specifications in linear-time temporal logic (LTL). The synthesis
algorithm executes a loop consisting of three phases: Solve, Check, and Refine.
In the Solve phase, a candidate solution is obtained as a model of a Boolean
constraint system; in the Check phase, the candidate solution is checked for
reachable error states; in the Refine phase, the constraint system is refined
to eliminate any errors found in the Check phase. The algorithm terminates
when an implementation without errors is found. We call our approach lazy,
because constraints on possible process implementations are only considered
incrementally, as needed to rule out incorrect candidate solutions. This
contrasts with the standard eager approach, where the full specification is
considered right away. We report on experience in the arbiter synthesis for
the AMBA bus protocol, where lazy synthesis leads to significantly smaller
implementations than the previous eager approach. The talk is based on joint
work with Swen Jacobs.
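A skeleton of the Solve/Check/Refine loop, with the three phases as hypothetical placeholders (in the actual approach, Solve queries a solver for a model of the Boolean constraint system, Check model checks the candidate for reachable error states, and Refine adds constraints excluding the reported error):

    # Skeleton of lazy synthesis; solve/check/refine are placeholder callables.
    def lazy_synthesis(initial_constraints, solve, check, refine):
        constraints = initial_constraints
        while True:
            candidate = solve(constraints)       # Solve: model of the constraint system
            if candidate is None:
                return None                      # constraints unsatisfiable: unrealizable
            error = check(candidate)             # Check: search for reachable error states
            if error is None:
                return candidate                 # no errors found: synthesis succeeded
            constraints = refine(constraints, candidate, error)   # Refine: rule out the error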
Title: Software Synthesis using Automated Reasoning
Abstract:
Software synthesis is a technique for automatically generating code from
a given specification. The goal of software synthesis is to make
software development easier, while increasing both the productivity of
the programmer and the correctness of the produced code. In this talk I
will present an approach to synthesis that relies on the use of
automated reasoning and decision procedures. First I will describe how
to generalize decision procedures into predictable and complete
synthesis procedures. Here completeness means that the procedure is
guaranteed to find code that satisfies the given specification. I will
illustrate the process of turning a decision procedure into a synthesis
procedure using linear integer arithmetic as an example. This approach
is implemented in Comfusy. Comfusy is an extension to the Scala compiler
and a tool for complete functional synthesis. It accepts as input
expressions with input and output variables and outputs code that
computes output values as a function of input values. In addition, it
also outputs the preconditions that the input variables have to satisfy
in order for a solution to exist. Next I will outline a synthesis
procedure for specifications given in the form of type constraints. The
procedure derives code snippets that use given library functions. The
procedure takes into account polymorphic type constraints as well as
code behavior. We implemented this approach in a tool called InSynth. To
help developers compose functionality from existing libraries and
from user-defined values, InSynth synthesizes and suggests valid
expressions of a given type at a given program point.
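A small illustration, rendered in Python rather than Scala, of the kind of input/output relation such a synthesis procedure for linear integer arithmetic can handle; the specification, function name and trivial precondition below are a constructed example, not taken from Comfusy's documentation.

    # Specification relating input `total` to outputs (h, m, s):
    #     total == 3600*h + 60*m + s  and  0 <= m < 60  and  0 <= s < 60
    # A complete synthesis procedure emits code computing the outputs from the
    # input plus a precondition on the input; here the precondition is True,
    # since a solution exists for every integer `total`.

    def decompose(total):
        h, rest = divmod(total, 3600)    # floor division matches the integer model
        m, s = divmod(rest, 60)
        assert total == 3600 * h + 60 * m + s and 0 <= m < 60 and 0 <= s < 60
        return h, m, s

    print(decompose(7384))   # (2, 3, 4): 2 h, 3 min, 4 s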
Abstract:
Most symbolic bug detection techniques perform search over the program
control flow graph based on either forward symbolic execution or backward
weakest preconditions computation. The complexity of determining
inter-procedural all-path feasibility makes it difficult for such analysis
to judge up-front whether the behavior of a particular caller or callee
procedure is relevant to a given property violation. Consequently, these
methods analyze several program fragments irrelevant to the property, often
repeatedly, before arriving at a goal location or an entrypoint, thus
wasting resources and diminishing their scalability.
In this talk I will present a systematic and scalable technique for focused
bug detection which, starting from the goal function, employs alternating
backward and forward exploration on the program call graph to lazily infer
a small scope of program fragments, sufficient to detect the bug or show
its absence. The method learns caller and callee invariants for procedures
from failed exploration attempts and uses them to direct future exploration
towards a scope pertinent to the violation. This is joint work with N.
Singhania, S. Chandra and M. Sridharan at IBM Research Labs.
Title: Testing and Analysis of Data Format Compatibility of Programs
Abstract:
Modular design and composition of programs is a scalable methodology
for development of large software systems. Correctness of the composed
system depends on compatibility of the component programs. In many
software systems, the programs manipulate and exchange complex data
like network packets and files. Checking compatibility of such
programs involves comparing the data formats followed by them. In this
paper, we motivate the problem of data format compatibility with
examples. We present a testing approach for addressing this problem.
The approach involves static program inversion and dynamic constraint
solving. We also discuss a static analysis approach for inference of
data formats embedded in programs and apply it to formalize and check
compatibility of producer-consumer and version-related programs.
Abstract:
Precise flow- and context-sensitive pointer analysis (FCPA) is generally
considered prohibitively expensive for large programs; most tools relax
one or both of the requirements for scalability. We argue that precise
FCPA has been over-harshly judged---the vast majority of points-to pairs
calculated by existing algorithms are never used by any client analysis
or transformation because they involve dead variables. We therefore
formulate a FCPA in terms of a joint points-to and liveness analysis
which we call L-FCPA. Our analysis computes points-to information only
for live pointers and its propagation is sparse (restricted to live
ranges of respective pointers). Further, our analysis
- uses strong liveness, effectively including dead code elimination,
- calculates must-points-to information from may-points-to information
afterwards instead of using a mutual fixed-point, and
- uses value-based termination of call strings during interprocedural
analysis (which reduces the number of call strings significantly).
We implemented a naive L-FCPA in GCC-4.6.0 using linked lists.
Evaluation on SPEC2006 showed significant increase in the precision of
points-to pairs compared to GCC's analysis. Interestingly, our naive
implementation turned out to be faster than GCC's analysis for all
programs under 30kLoC. Further, L-FCPA showed that fewer than 4% of
basic blocks had more than 8 points-to pairs. We conclude that the
usable points-to information and the required context information is
small and sparse and argue that approximations (e.g. weakening flow or
context sensitivity) are not only undesirable but also unnecessary for
performance.
Reference:
Uday P. Khedker, Alan Mycroft, and Prashant Singh Rawat.
Liveness Based Pointer Analysis.
International Static Analysis Symposium (SAS 2012). France 2012.
(This supersedes the earlier paper Lazy Pointer Analysis that was
posted in the ACM Computing Research Repository, abs/cs/1112.5000.)
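A tiny constructed example of the central idea (points-to facts are computed only for live pointers); none of the flow- and context-sensitive interprocedural machinery of L-FCPA, such as value-based call strings, is shown here.

    # Straight-line program:            Liveness (computed backward):
    #   1: x = &a                       x is live after line 1 (used at line 3)
    #   2: y = &b                       y is never used, hence never live
    #   3: print(*x)
    stmts = [("x", "a"), ("y", "b")]        # pointer assignments p = &v
    live_after = {1: {"x"}, 2: {"x"}}       # live pointer variables after each line

    points_to = {}
    for line, (ptr, var) in enumerate(stmts, start=1):
        if ptr in live_after[line]:         # record facts for live pointers only
            points_to[ptr] = {var}

    print(points_to)   # {'x': {'a'}} -- the pair y -> b is never computed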
Title: Improving the precision of a scalable, demand-driven
null-dereference analysis
Abstract:
We describe an approach and a tool for sound, demand-driven null
dereference analysis for Java programs, via over-approximated weakest
precondition analysis. The primary focus of this talk is on optimizations
that we introduce for efficient analysis, which allow long program paths to
be traversed and hence higher precision to be achieved. One of our major
optimizations is to bypass certain expensive-to-analyze constructs, such as
virtual calls with too many possible targets, by directly transferring
dataflow facts from points after the construct to points before along
certain kinds of def-use edges. A second major optimization is to use
manually constructed summaries of Java container class methods, rather than
analyze the code of the method directly. A third optimization involves
reasoning about arbitrarily long access paths that may arise in a program
due to traversal of recursive data structures, by using a finite
closed-form representation of these access paths. We demonstrate that our
optimizations result in a 48% reduction in false positives over the
baseline, without significant impact on running time.
Title: A Transformation Framework for Optimizing Task-Parallel Programs
Abstract:
Task parallelism has become an increasingly popular trend, with programming
models such as OpenMP 3.0, Cilk, Java Concurrency, X10, Chapel and
Habanero-Java (HJ) addressing the requirements of multicore programmers. While task
parallelism increases productivity by allowing the programmer to express
multiple levels of parallelism, it can also lead to performance
degradation due to increased overheads. In this paper, we introduce a
transformation framework for optimizing task-parallel programs with a
focus on task creation and task termination operations. Our framework
includes a definition of data dependence in task-parallel programs, a
happens-before analysis algorithm, and a range of program transformations
for optimizing task parallelism. Further, we discuss the impact of
exception semantics on the specified transformations, and extend them to
handle task-parallel programs with precise exception semantics.
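As a loose analogy in Python (the framework itself targets async/finish-style constructs in languages like Habanero-Java, not Python futures), one kind of transformation removes a task creation whose result is awaited immediately, since the happens-before and dependence information shows no useful overlap is possible:

    # Analogy only: eliminating a task whose result is joined immediately.
    from concurrent.futures import ThreadPoolExecutor

    def work(x):
        return x * x

    def before(x):
        with ThreadPoolExecutor() as ex:
            fut = ex.submit(work, x)    # task creation
            return fut.result()         # immediate join: no parallelism is gained

    def after(x):
        return work(x)                  # transformed: direct call, same happens-before order

    assert before(5) == after(5) == 25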
Title: Tiling Stencil Computations to Maximize Parallelism
Abstract:
Most stencil computations inherently have the property of tile-wise
concurrent start, i.e., there always exists a face of the iteration space
and a set of tiling hyperplanes such that all tiles along that face can be
started concurrently. This provides load balance and maximizes parallelism.
However, existing automatic transformation frameworks for tiling
often choose hyperplanes that enforce pipelined start-up, which not only
leads to load imbalance, but also reduces the number of tiles that can
be executed in parallel. We address this issue with a new tiling technique
that ensures concurrent start-up as well as perfect load-balance
whenever possible.
We first provide necessary and sufficient conditions on tiling hyperplanes
to enable concurrent start of a statement in any program with
affine data accesses. We then provide a sound and complete approach
to find such hyperplanes for stencil computations. Experimental evaluation
on a 12-core Intel Westmere multicore shows that our code is able
to outperform a tuned domain-specific stencil code generator by about
4% to 20%. In other cases, we outperform previous compiler techniques
by a factor of 2X to 16.4X.
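For reference, here is a 1-D Jacobi-style stencil of the kind such techniques target (a constructed example, not one of the evaluated benchmarks): each point (t, i) depends on its three neighbours at time t-1, and the choice of tiling hyperplanes over this 2-D iteration space determines whether tiles along the spatial face can start concurrently or only in a pipelined fashion.

    # 1-D Jacobi-style stencil: point (t, i) reads (t-1, i-1), (t-1, i), (t-1, i+1).
    def jacobi_1d(a, timesteps):
        n = len(a)
        for _ in range(timesteps):               # time dimension t
            nxt = a[:]
            for i in range(1, n - 1):            # space dimension i
                nxt[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
            a = nxt
        return a

    print(jacobi_1d([0.0, 0.0, 9.0, 0.0, 0.0], timesteps=2))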
Title: Verifying Temporal Properties of Multi-pushdown Systems
Abstract:
Multi-pushdown systems provide a natural formal model for recursive
multi-threaded programs. In this talk we survey some of the recent
results on the verification of restricted classes of multi-pushdown
systems.
Abstract:
A class of languages C is perfect if it is closed under Boolean operations and
the emptiness problem is decidable. Perfect language classes are the basis for
the automata-theoretic approach to model checking: a system is correct if the
language generated by the system is disjoint from the language of bad traces.
Regular languages are perfect, but because the disjointness problem for CFLs is
undecidable, no class containing the CFLs can be perfect.
In practice, verification problems for language classes that are not perfect
are often under-approximated by checking if the property holds for all
behaviors of the system belonging to a fixed subset. A general way to
specify a subset of behaviors is by using bounded languages (languages of the
form w1* ... wk* for fixed words w1,...,wk). A class of languages C is perfect
modulo bounded languages if it is closed under Boolean operations relative to
every bounded language, and if the emptiness problem is decidable relative to
every bounded language.
We consider finding perfect classes of languages modulo bounded languages. We
show that the class of languages accepted by multi-head pushdown automata is
perfect modulo bounded languages, and we characterize the complexity of the
associated decision problems. We show that computations of several known models of
systems, such as recursive multi-threaded programs, recursive counter machines,
and communicating finite-state machines can be encoded as multi-head pushdown
automata, giving uniform and optimal underapproximation algorithms modulo
bounded languages.
Title: Scalable and Precise Refinement of Cache Timing Analysis via Model Checking
Abstract:
Hard real-time systems require absolute guarantees
on their execution times. Estimating the worst-case execution time
(WCET) of a program has therefore become an important
problem to address. However, performance-enhancing features
of a processor (e.g. cache) make WCET analysis a difficult
problem. In this paper, we propose a novel approach of
combining abstract interpretation and model checking for
different varieties of cache analysis ranging from single to
multi-core platforms. Our modeling is used to develop a precise
yet scalable timing analysis method on top of the Chronos
WCET analysis tool. Experimental results demonstrate that we
can obtain significant improvement in precision with reasonable
analysis time overhead.
Title: Model Checking in Presburger Counter Systems using Accelerations.
Abstract:
Model checking is a powerful technique for analyzing reachability and temporal
properties of finite state systems. Many practical systems are naturally modeled
as counter systems, where the domain of counter values is possibly infinite. We
identify a class of counter systems, and propose a new technique to check whether
a system from this class satisfies a given CTL formula. The key novelty of our
approach is a way to use existing reachability analysis techniques to answer
both "until" and "global" properties. We also give some results from
applying our approach to several natural examples, which illustrates the scope
of our approach.
Abstract:
Choice of resource acquisition patterns such as lazy acquisition and eager
acquisition can have an impact on the response time and/or scalability of
an application. The lazy acquisition pattern is conservative: resources are
acquired only on demand, at first access, which can result in a task
waiting and poor response times. On the other hand, with eager acquisition,
no waiting may be necessary on the first access, although not all resources
acquired ahead of time may eventually be accessed. Our work presents a simple
algorithm to compute statically the probability with which a program unit
P accesses a resource r, given branch probabilities. Branch probabilities
may be available through developer annotations or usage profiles. This
results in eagerly acquiring, at run-time, only those resources that are
required with high probability, minimizing the likelihood of not
accessing acquired resources at all. To achieve this, at run-time
only those resources required with a probability greater than a chosen
threshold value are speculatively acquired. The talk draws a comparison
of lazy, eager, and speculative resource acquisition strategies
and indicates which resources need to be acquired eagerly, ahead of their
first access, and which ones lazily.
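A minimal rendering of this computation in Python (the CFG, branch probabilities and threshold below are hypothetical): the probability of reaching an access to r is the sum, over acyclic paths that reach such an access, of the product of the branch probabilities along the path.

    # Hypothetical acyclic CFG: node -> list of (successor, branch probability).
    cfg = {
        "entry": [("b1", 0.7), ("b2", 0.3)],
        "b1":    [("use_r", 1.0)],
        "b2":    [("exit", 1.0)],
        "use_r": [("exit", 1.0)],
        "exit":  [],
    }

    def access_probability(cfg, node, target, seen=()):
        if node == target:
            return 1.0
        return sum(p * access_probability(cfg, succ, target, seen + (node,))
                   for succ, p in cfg[node] if succ not in seen)

    p = access_probability(cfg, "entry", "use_r")
    THRESHOLD = 0.5
    print(p, "acquire eagerly" if p > THRESHOLD else "acquire lazily")   # 0.7 -> eager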
Abstract:
We present an approach for inferring possible values of a variable, starting
from a statement of interest using a stepwise backward unification. The
method is based on refinement of variable-value associations in path- and
flow-sensitive unification steps. It can be applied to infer the value flow
across procedure calls.
This work is currently in progress and is aimed at supporting
demand-driven identification of the source of potential errors flagged by static
analysis tools.
Abstract:
Pushdown systems have been used to model sequential recursive programs and
studied in detail. Problems such as reachability, computation of pre* and
parity games have been solved. In recent years, restricted versions of
systems with multiple pushdowns have been studied (bounded context switch,
bounded phase, ordered multi-pushdown, bounded scope, ...). We survey some
of the results in this area and describe a simple inductive argument for
the decidability of parity games over bounded-phase pushdown systems. We
will also provide a matching lower bound for this problem.
Abstract:
Implementing algorithms from many scientific and engineering
disciplines requires extensive use of functions, operators and types
from mathematics. Since the Java Standard Library supports only the
most rudimentary of these, users have to look to external libraries
for fast and precise implementations. Choosing an API that best
supports a programming task from many is difficult because of the size
and heterogeneity of candidate APIs. We propose a novel technique for
Math API discovery and composition by mining unit tests. Our queries
are typed mathematical expressions that are interpreted on the mined
test data. We introduce the notion of test frequency to rank retrieved
methods, and show how to compute it using a fast and scalable parallel
algorithm.
To demonstrate these techniques, we developed a tool, MathFinder, that
is integrated into the Eclipse IDE and works as a high precision API
search engine. A programmer can express a computation need to
MathFinder as a simple query. MathFinder will return a ranked list of
code snippets, perhaps from multiple target APIs. It is
straightforward to complete these snippets and insert them into the
user's source code.
This is joint work with Omesh Pandita and Aditya Kanade.
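A toy rendering of the ranking idea (method names and mined data below are invented, and MathFinder itself operates on mined unit tests rather than a hard-coded table): the query is interpreted on input/output pairs mined from tests, and candidate methods are ranked by how many observations agree with it.

    import math

    # Hypothetical observations mined from unit tests: method -> [(args, result), ...]
    mined = {
        "LibA.cbrt": [((8.0,), 2.0), ((27.0,), 3.0)],
        "LibB.sqrt": [((4.0,), 2.0), ((9.0,), 3.0), ((16.0,), 4.0)],
        "LibC.log":  [((1.0,), 0.0)],
    }

    query = lambda x: x ** 0.5          # the user's typed expression: "square root of x"

    def test_frequency(observations, query, tol=1e-9):
        return sum(1 for args, result in observations
                   if math.isclose(query(*args), result, abs_tol=tol))

    ranked = sorted(mined, key=lambda m: test_frequency(mined[m], query), reverse=True)
    print(ranked)   # 'LibB.sqrt' ranks first: it agrees with the query on all mined tests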
Abstract:
Debugging is an essential part of the software development cycle.
Debugging involves localizing the fault and then fixing it. This work helps
solve the problem of fixing a fault once it is identified. We propose to
provide clues to the programmer in the form of assertions that show how
the fault is to be fixed. Learning these assertions is posed as a
classification problem. We combine program verification and machine learning
techniques to help in program repair.
Title: Probabilistic Dataflow Analysis using Path Profiles on
Structure Graphs
Abstract:
Speculative optimizations are increasingly becoming popular for
improving program performance by allowing transformations that benefit
frequently traversed program paths. Such optimizations are based on
dataflow facts which are mostly true, though not always safe.
Probabilistic dataflow analysis frameworks infer such facts about a
program, while also providing the probability with which a fact is
likely to be true. We propose a new Probabilistic Dataflow Analysis
Framework which uses path profiles and information about the nesting
structure of loops to obtain improved probabilities of dataflow facts.
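A toy version of the basic computation (profile data and names are hypothetical): with path profiles available, the probability of a dataflow fact at a program point is the profiled frequency of the paths reaching that point on which the fact holds, divided by the total frequency of paths reaching it.

    # Path profile at a program point: path id -> (execution frequency, fact holds?)
    profile = {
        "p1": (900, True),    # e.g. paths on which x is a known constant at the point
        "p2": (80,  True),
        "p3": (20,  False),   # path that kills the fact
    }

    def fact_probability(profile):
        total = sum(freq for freq, _ in profile.values())
        holds = sum(freq for freq, ok in profile.values() if ok)
        return holds / total

    print(fact_probability(profile))   # 0.98: the fact is a good candidate for speculation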
Title: Automatic Generation of Near-Optimal Data Movement Code for Distributed
Heterogeneous Architectures
Abstract:
We propose a generic framework that automatically handles the task of
data movement and synchronization when compiling for distributed heterogeneous
architectures. Our framework allows a user to distribute computation across
computing units, without the need to worry about communication between
computing units. Techniques presented are oblivious to transformations applied
prior to parallelization, choice of parallelization and distribution. Our
framework generates communication code that minimizes the amount of data
movement for a particular choice of these, problem size and number of computing
units. All of this is accomplished using a combination of powerful static
analysis relying on the polyhedral compiler framework and lightweight runtime
routines generated using it.