Events 

Seminars 

Prof. I. G. Sarma Memorial Lecture Series 

Prof. Priti Shankar Memorial Seminar Series 

Multimedia Archive 



UPCOMING SEMINARS 

Series: M.Sc. (Engg) Colloquium Title: Large Scale Graph Processing in a Distributed Environment  Speaker: Nitesh Upadhyay
 Faculty Advisor: Y.N. Srikant
 Date and Time: Thursday, June 01, 2017, 12:00 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Graph algorithms are used ubiquitously across domains. They exhibit
parallelism, which can be exploited on parallel architectures such as
multicore processors and accelerators. However, real-world graphs are massive
and cannot fit into the memory of a single machine. Such large graphs are
partitioned and processed in a distributed cluster environment consisting of
multiple GPUs and CPUs.
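To make the partitioning step concrete, here is a minimal illustrative sketch
(not the scheme of any particular framework; the hash-placement rule and the
toy graph are assumptions for this example). Each vertex is placed on a
machine by hashing its id, and edges whose endpoints land on different
machines become "cut" edges that would require communication in a distributed
run:

```python
def hash_partition(edges, num_parts):
    """Place each vertex on a machine by hashing its id, then classify
    every edge as local (both endpoints on one machine) or cut
    (endpoints on different machines, i.e. requiring communication)."""
    def part(v):
        # Vertex ids here are ints, so hashing is deterministic in CPython.
        return hash(v) % num_parts

    local, cut = [], []
    for u, v in edges:
        (local if part(u) == part(v) else cut).append((u, v))
    return local, cut

# A 4-cycle with one chord, split across 2 machines.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
local, cut = hash_partition(edges, 2)
```

Real frameworks use far more careful placement (minimizing the cut, balancing
load), but the local-versus-cut distinction above is exactly what drives the
communication costs discussed below.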
Existing frameworks that facilitate large-scale graph processing in
distributed clusters have their own styles of programming and require
extensive involvement by the user in communication and synchronization
aspects. Adapting to these frameworks is an overhead for the programmer.
Furthermore, these frameworks have been developed to target only CPU clusters
and cannot harness GPU architectures.
We provide a backend framework to the graph domain-specific language Falcon
for large-scale graph processing on CPU and GPU clusters. The motivation
behind choosing this DSL as a frontend is its shared-memory-based imperative
programming model. Our framework generates Giraph code for CPU clusters.
Giraph code runs on the Hadoop cluster and is known for scalable and
fault-tolerant graph processing. For GPU clusters, our framework applies a
set of optimizations to reduce computation and communication latency, and
generates efficient CUDA code coupled with MPI. Experimental evaluations show
the scalability and performance of our framework for both CPU and GPU
clusters. The performance of the framework-generated code is comparable to
manual implementations of various algorithms in distributed environments.
Speaker Bio: Nitesh Upadhyay is an M.Sc. (Engg) student in the CSA department.
Host Faculty: Y.N. Srikant
 Series: M.Sc. (Engg) Colloquium Title: HAShCache: Heterogeneity-Aware Shared DRAMCache for Integrated Heterogeneous Systems  Speaker: Adarsh Patil
 Faculty Advisor: R. Govindarajan
 Date and Time: Wednesday, May 31, 2017, 11:00 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Integrated Heterogeneous System (IHS) processors pack
throughput-oriented GPGPUs alongside latency-oriented CPUs on the same die,
sharing certain resources, e.g., the shared last-level cache, the
network-on-chip (NoC), and the main memory. The demands for memory accesses
and other shared resources from GPU cores can exceed those of CPU cores by
2 to 3 orders of magnitude. This disparity poses significant problems in
exploiting the full potential of these architectures.
In this work, we propose adding a large-capacity stacked DRAM, used as a
shared last-level cache, to IHS processors. However, adding the DRAMCache
naively leaves significant performance on the table due to the disparate
demands from CPU and GPU cores for DRAMCache and memory accesses. In
particular, the imbalance can significantly reduce the performance benefits
that the CPU cores would otherwise have enjoyed with the introduction of the
DRAMCache, necessitating heterogeneity-aware management of this shared
resource for improved performance.
Further, in this work we propose three simple techniques to enhance the
performance of CPU applications while ensuring little to no performance
impact on the GPU. Specifically, we propose (i) PrIS, a prioritization
scheme for scheduling CPU requests at the DRAMCache controller; (ii) ByE, a
selective and temporal bypassing scheme for CPU requests at the DRAMCache;
and (iii) Chaining, an occupancy-controlling mechanism for GPU lines in the
DRAMCache through pseudo-associativity. The resulting cache, HAShCache, is
heterogeneity-aware and can adapt dynamically to address the inherent
disparity of demands in an IHS architecture.
We enhance the gem5-gpu simulator to model an IHS architecture with a stacked
DRAM as cache, complete with coherence and unified physical memory. Using
this setup, we perform a detailed experimental evaluation of the proposed
HAShCache, which results in an average system performance improvement of 41%
over a naive DRAMCache and over 200% over a baseline system with no stacked
DRAMCache.
Speaker Bio: Adarsh Patil is an M.Sc. (Engg) student of CSA. He obtained his
B.Tech from M.S. Ramaiah Institute of Technology and has two years of
industry experience at Goldman Sachs in the Virtualization and Linux
engineering team.


PAST SEMINARS 

Series: CSA Distinguished Lecture Title: Old and New Algorithms for Guarding Polygons  Speaker: Prof. Subir Kumar Ghosh
Professor
Ramakrishna Mission Vivekananda University  Date and Time: Thursday, May 25, 2017, 10:30 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract The art gallery problem is to determine the number of guards (or
cameras) that are sufficient to cover or see every point in the interior
of an art gallery. An art gallery can be viewed as a polygon P (with or
without holes) with a total of n vertices, and guards as points in P. Any
point z in P is said to be visible from a guard g if the line segment
joining z and g does not intersect the exterior of P. This problem was
first posed by Victor Klee at a conference in 1973, and in the course of
time, it has become one of the important problems in computational
geometry, with extensive applications to the surveillance of buildings
such as airport terminals and railway stations.
A polygon is said to be guarded by a set of chosen guards if every interior
point of P is visible from some guard within the guard set. In this lecture,
we will first present an O(log n)-approximation algorithm to find the
minimum number of guards necessary for guarding P, which was designed by
Ghosh way back in 1986. For this problem, the approximation ratio was
improved to O(log log OPT) by King and Kirkpatrick in 2011. Ghosh had also
conjectured in 1986 that a constant-factor approximation algorithm exists
for this problem in the case of polygons without holes. This conjecture was
settled recently for a special class of P (without holes) by Bhattacharya,
Ghosh and Roy, who presented a 6-approximation algorithm for this
problem. An outline of this algorithm will be provided in the lecture.
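Roughly speaking, the O(log n) ratio comes from casting guard placement as a
set-cover instance (each candidate guard "covers" the region it sees) and
running the greedy heuristic. The combinatorial core can be sketched on
abstract sets; the visibility computation that produces these sets is
omitted, and the candidate guards and cells below are invented for
illustration:

```python
def greedy_set_cover(universe, sets):
    """Greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements; this yields an O(log n)-approximate cover.
    Here `sets` maps a candidate guard to the set of region cells it sees
    (assumed precomputed and discretized for this sketch)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda g: len(sets[g] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("universe not coverable by the given sets")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Toy instance: 5 cells, 3 candidate guards with known visibility.
cells = {1, 2, 3, 4, 5}
visible = {"g1": {1, 2, 3}, "g2": {3, 4}, "g3": {4, 5}}
guards = greedy_set_cover(cells, visible)
```

Greedy picks g1 (covers three cells) and then g3, skipping the redundant g2;
the lecture's algorithms replace these hand-coded sets with actual visibility
polygons.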
Chromatic art gallery problems arise from applications in areas like
landmark-based navigation and wireless networks. One such problem is the
weak conflict-free guarding of polygons, where the objective is to colour
a guard set for P using a minimum number of colours such that each point
within P sees at least one guard with a unique colour. Finally, we
will present an algorithm for weak conflict-free guarding of P (without
holes) where the guard set is coloured using only O(log n) colours.
Speaker Bio: Subir Kumar Ghosh was a professor of computer science and
discrete applied mathematics at the Tata Institute of Fundamental Research,
Mumbai till July 2015. Since then, he has been a professor at RKM Vivekananda
University, Belur, West Bengal. He has been a Fellow of the Indian Academy of
Sciences since 1998. Recently, he was awarded a professorship by the National
Board for Higher Mathematics for his lifetime contributions to mathematics.
He has worked as a visiting scientist at many reputed universities and
research institutes around the world, and has given several invited lectures
at national and international conferences, workshops, and institutes in India
and abroad. Currently, he is the Chair of the Steering Committee of an
international conference on algorithms and discrete applied mathematics
(CALDAM). He is the author of a book and around sixty-six papers in the
fields of computational geometry, robot path planning, geometric graph
theory and algorithms (sequential, parallel, online and approximation).
Host Faculty: Prof. Sathish Govindarajan
 Series: M.Sc. (Engg) Colloquium Title: COMPILATION OF GRAPH ALGORITHMS FOR HYBRID, CROSS-PLATFORM AND DISTRIBUTED ARCHITECTURES  Speaker: Parita Patel
 Faculty Advisor: Y.N. Srikant
 Date and Time: Tuesday, May 23, 2017, 12:00 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Graph algorithms are used abundantly in various disciplines. These
algorithms perform poorly due to random memory accesses and negligible
spatial locality. To improve performance, the parallelism exhibited by these
algorithms can be exploited by leveraging modern high-performance parallel
computing resources. Implementing graph algorithms for these parallel
architectures, however, requires manual thread management and memory
management, which becomes tedious for a programmer. Moreover, large-scale
graphs cannot fit into the memory of a single machine. One solution is to
partition the graph either across the multiple devices of a single node or
across multiple nodes of a distributed network. All the available frameworks
for such architectures demand unconventional programming, which is difficult
and error-prone.
To address these challenges, we propose a framework for the compilation of
graph algorithms written in an intuitive graph domain-specific language,
Falcon. The framework targets shared-memory parallel architectures,
computational accelerators, and distributed architectures (CPU and GPU
clusters). First, it analyses the abstract syntax tree (generated by Falcon)
and gathers essential information. Subsequently, it generates optimized
OpenCL code for shared-memory parallel architectures and computational
accelerators, and OpenCL coupled with MPI for distributed architectures. The
motivation behind generating OpenCL code is its platform-agnostic and
vendor-agnostic behaviour, i.e., it is portable to all kinds of devices. Our
framework makes memory management, thread management, message passing, etc.,
transparent to the user. To the best of our knowledge, none of the available
domain-specific languages, frameworks or parallel libraries handle portable
implementations of graph algorithms.
Experimental evaluations demonstrate that the generated code performs
comparably to state-of-the-art non-portable and hand-tuned implementations.
The results also show the portability and scalability of our framework.
Speaker Bio: Parita Patel is an M.Sc. (Engg) student in the CSA department.
Host Faculty: Y.N. Srikant
 Series: Department Seminar Title: Deep Learning made easy with DARVIZ  Speaker: Shreya Khare and Anush Sankaran
 Date and Time: Thursday, May 18, 2017, 3:30 PM
 Venue: CSA Lecture Hall (Room No. 117, Ground Floor)
Abstract Struggling with implementing research papers in deep learning?
Confused about which deep learning framework to use: Tensorflow, Theano,
CAFFE, etc.? Wasting hours of precious time debugging code for building the
right deep learning model? You will no longer face these challenges with
DARVIZ. It is a visual programming IDE in which you can design a deep
learning model using an intuitive drag-and-drop interface. Once a model is
designed, DARVIZ can write execution-ready deep learning code for you on
three platforms: Tensorflow, CAFFE, and Theano. The tool enables extremely
rapid prototyping of deep learning models and implementation of
state-of-the-art papers.
Speaker Bio: Shreya Khare is a software engineer working in the area of deep learning and automation at IBM Research in Bangalore, India. She is currently working on DARVIZ, a deep learning IDE for implementing, visualizing and validating deep learning models. Her research interests include machine learning, speech processing, computer vision and natural language processing. She completed her MS by Research from the Department of Computer Science, IIT Madras.
Anush Sankaran is a Researcher in the Cognitive Solutions and Services team. His research interests include deep learning, image processing, human cognition, and their applications. He is now primarily leading efforts in two projects: DARVIZ and Machine Learning for Creativity. He is currently pursuing the Ph.D. degree at the Indraprastha Institute of Information Technology, New Delhi, India. He was a recipient of the TCS Ph.D. Research Fellowship from 2010 to 2015. He has published many peer-reviewed conference and journal papers, and received Best Poster Awards at IEEE BTAS 2013 and IEEE IJCB 2014. Anush received the B.Tech. degree (Gold Medal) in computer science from the Coimbatore Institute of Technology, Coimbatore, India, in 2010.
Host Faculty: Aditya Kanade
 Series: Department Seminar Title: Software Transactional Memory Systems for Processing Large-Scale Graph Analytics  Speaker: Sathya Peri
 Date and Time: Wednesday, May 17, 2017, 3:30 PM
 Venue: CSA Multimedia Class (Room No. 252, First Floor)
Abstract Nowadays, multicore systems are quite ubiquitous, from cell phones
to high-performance computation servers. These systems were introduced
around 2004, when 50 years of exponential improvement in the performance of
sequential computers ended. In order to get a continued speedup on
these processors, applications need to be able to harness the
parallelism of the underlying hardware. This is commonly achieved
using multithreading. Yet writing correct and scalable multithreaded
programs is far from trivial. In multithreaded programs, sets of
semantically related actions may need to execute in mutual exclusion
to avoid semantic inconsistencies.
Traditionally, multithreaded programs were developed in conjunction
with locks to address these issues. But programming with locks has
many disadvantages, such as deadlocks, priority inversion, etc., and
makes it difficult to build scalable software systems. Importantly,
lock-based software components are difficult to compose, i.e., to build
larger software systems from simpler software components. In recent
years, Software Transactional Memory (STM) has garnered significant
interest as an elegant alternative for developing concurrent
code. Software transactions address many of the shortcomings of
lock-based systems, and multithreaded programs built with STMs sidestep many
of these challenges.
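To make the transactional style concrete, here is a deliberately simplified,
illustrative STM sketch (a global commit lock with read-set validation and
optimistic retry; real STM designs such as those discussed in the talk are
far more sophisticated, and all names below are invented for the example):

```python
import threading

class TVar:
    """A transactional variable: a value plus a version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.reads = {}    # TVar -> version observed at first read
        self.writes = {}   # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:            # read-your-own-writes
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value          # buffered until commit

    def commit(self):
        with _commit_lock:
            # Validate: abort if any variable we read has since changed.
            if any(t.version != v for t, v in self.reads.items()):
                return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True

def atomically(action):
    """Run `action(tx)` until its commit succeeds (optimistic retry)."""
    while True:
        tx = Transaction()
        action(tx)
        if tx.commit():
            return

# Composable atomic transfer: no deadlock-prone lock ordering needed.
a, b = TVar(100), TVar(0)
def transfer(tx):
    tx.write(a, tx.read(a) - 30)
    tx.write(b, tx.read(b) + 30)
atomically(transfer)
```

The point of the sketch is composability: `transfer` touches two variables
atomically without the programmer choosing a lock order, which is exactly the
property lock-based components lack.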
In the past few years, analytics over large data (also commonly referred to
as Big Data analytics) has become very important in various industries. Many
problems in large data and data mining can be cast as large-scale graph
analytics problems. These typically involve operations on large graphs with
millions of vertices and edges.
In this talk, I will give an overview of Software Transactional Memory
(STM). I will describe the correctness requirements of STM systems and their
current state. Then, I will briefly describe how STMs can benefit
large-scale graph analytics. I will specifically describe how STMs can
benefit the problem of colouring graphs with millions of vertices.
Speaker Bio: Dr. Sathya Peri is an Associate Professor at IIT Hyderabad. He
received his master's in Computer Science from Madurai Kamaraj University,
India in 2001. After working as a software engineer at HCL Technologies,
India for around a year, he joined the University of Texas at Dallas as a
graduate student, where he obtained his Master's and Ph.D. degrees in
Computer Science in 2004 and 2007, respectively. He worked under the
guidance of Prof. Neeraj Mittal in the area of monitoring distributed
systems. He then worked at INRIA, Rennes, France as a postdoc from Sept 2007
to Dec 2008 in the area of gossip protocols for peer-to-peer systems. From
Jan 2009 to Apr 2010, he was a postdoc at Memorial University working with
Prof. Vidyasankar in the area of Software Transactional Memory.
Host Faculty: Deepak D'Souza
 Series: M.Sc. (Engg) Colloquium Title: An Exploratory Framework for Cyclone Identification and Tracking  Speaker: Akash Anil Valsangkar
 Faculty Advisor: Prof. Vijay Natarajan
 Date and Time: Wednesday, May 17, 2017, 11:00 AM
 Venue: CSA Multimedia Class (Room No. 252, First Floor)
Abstract Analyzing depressions plays an important role in meteorology,
particularly in the study of cyclones. Studying the temporal evolution of
cyclones, in particular, requires a robust depression-tracking framework.
Depressions and cyclones are not well-defined objects, and their shape and
size characteristics change over time. Further, current extraction and
tracking methods depend on many parameters and are often specialized for
particular regions. In the visualization literature, most tracking
algorithms use spatial analysis for feature identification and noise
removal. With state-of-the-art methods, temporal noise leads to cluttered
visualization.
We propose a pipeline for the identification and tracking of cyclones over
time. Our method requires only a few intuitive parameters. Cyclone
identification within each time step is based on well-established
topological concepts, enabling robust tracking. Candidate tracks are
computed from an optical flow field. These tracks are clustered within a
moving time window and forwarded to a final tracking step. An exploratory
framework helps in the study of cyclone movement and in identifying smooth,
representative tracks. Multiple case studies demonstrate the effectiveness
of the method in tracking cyclones, in both the northern and southern
hemispheres.
 Series: M.Sc. (Engg) Thesis Defense Title: Utilizing Worker Groups and Task Dependencies in Crowdsourcing  Speaker: Mr. Prakhar Ojha
 Faculty Advisor: Dr. Partha Pratim Talukdar
 Date and Time: Monday, May 08, 2017, 11:00 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Crowdsourcing has evolved from solving simpler tasks, like
image classification, to more complex tasks such as document editing,
language translation, product design, etc. Unlike microtasks performed
by a single worker, these complex tasks require a team of workers, and the
group output is often the atomic unit of evaluation. If the task requester
is interested in making individual payments based on workers' respective
efforts in the group, she will need a strategy to discriminate workers (who
contribute positively) from idlers (who do not contribute to the group
task). This is a non-trivial problem, especially since only the group output
is evaluated. In the first part of this talk, I shall demonstrate how ideas
from Group Testing may be effectively used for this challenging problem.
This is the first application of Group Testing to crowdsourcing. I shall
also list open problems in this area.
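Classical non-adaptive group testing separates "defectives" from the rest
using pooled tests. In the crowdsourcing analogy, a hedged sketch of the
standard COMP decoder looks as follows, under the simplifying (and assumed,
not taken from the talk) OR model: a group's output is good exactly when it
contains no idler, so any worker who appears in at least one good-output
group is cleared, and everyone else remains a suspect:

```python
def comp_decode(groups, outcomes, workers):
    """COMP decoder for non-adaptive group testing: under the OR model,
    anyone who appears in a group whose output was good (a 'negative'
    test) cannot be an idler; all remaining workers stay suspect."""
    cleared = set()
    for members, bad in zip(groups, outcomes):
        if not bad:                   # a good output clears all members
            cleared.update(members)
    return sorted(set(workers) - cleared)

workers = ["w1", "w2", "w3", "w4"]
groups = [["w1", "w2"], ["w2", "w3"], ["w3", "w4"]]
outcomes = [False, True, True]   # only the first group's output was good
suspects = comp_decode(groups, outcomes, workers)
```

With enough well-chosen groups, the suspect set shrinks to exactly the
idlers; the talk's contribution concerns designing and analyzing such
schemes for the crowdsourcing setting.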
In the second part of the talk, I shall focus on Relational Crowdsourcing
(RelCrowd), a novel crowdsourcing paradigm where human intelligence tasks
(HITs) are created by taking task interdependencies into account. I shall
describe how this framework may be used in the evaluation of large knowledge
graphs (KGs); we call the resulting system KGEval. Estimating the accuracy
of automatically constructed KGs is a challenging problem due to their size
and diversity. We show that the objective optimized by KGEval is submodular
and NP-hard to optimize exactly, thereby allowing for approximation
algorithms with guarantees. Through experiments on real-world datasets, our
results demonstrate that KGEval is able to estimate KG accuracy more
accurately than other competitive baselines, while requiring significantly
fewer human evaluations.
 Series: CSA Distinguished Lecture Title: The Magnificent Seven  Speaker: Prof. Vivek Shripad Borkar
Institute Chair Professor
Department of Electrical Engineering
Indian Institute of Technology, Bombay  Date and Time: Thursday, May 04, 2017, 2:30 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract This talk will give an overview of restless bandits and then
introduce, one by one, seven restless bandits. The summary in
brief is:
Bandits are a pest
if they don't get much rest,
so do relax a little
your constraints a la Whittle.
Then all that it takes
is a simple index.
Speaker Bio: Prof. Vivek Borkar has made stellar contributions in the areas
of stochastic control and optimization, reinforcement learning, random
processes, and several other areas in control and optimization. He is a
J.C. Bose National Fellow and a Fellow of IEEE, INSA, IASc, INAE, NASI, and
TWAS. He is a recipient of several best paper awards and is widely known
for many pathbreaking fundamental results in the above areas.
Host Faculty: Prof. Y Narahari
 Series: M.Sc. (Engg) Thesis Defense Title: Stochastic Newton methods with enhanced Hessian estimation  Speaker: Mr. Danda Sai Koti Reddy
M.Sc (Engg.) student at CSA department  Faculty Advisor: Prof. Shalabh Bhatnagar
 Date and Time: Thursday, May 04, 2017, 11:00 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Optimization problems involving uncertainties are common in a
variety of engineering disciplines such as transportation systems,
manufacturing, communication networks, healthcare and finance. The large
number of input variables and the lack of a system model prohibit a precise
analytical solution, and a viable alternative is to employ simulation-based
optimization. The idea here is to simulate the stochastic system under
consideration a few times while updating the system parameters, until a good
enough solution is obtained.
Formally, given only noise-corrupted measurements of an
objective function, we wish to find a parameter which minimizes the
objective function. Iterative algorithms using statistical methods search
the feasible region to improve upon the candidate parameter. Stochastic
approximation algorithms are the best suited, most studied, and most applied
algorithms for finding solutions when the feasible region is a
continuously valued set. One can use information on the gradient/Hessian
of the objective to aid the search process. However, due to the lack of
knowledge of the noise distribution, one needs to estimate the
gradient/Hessian from noisy samples of the cost function obtained from
simulation. Simple gradient search schemes take many iterations to
converge to a local minimum and are heavily dependent on the choice of
step-sizes. Stochastic Newton methods, on the other hand, can counter the
ill-conditioning of the objective function as they incorporate
second-order information into the stochastic updates. Stochastic Newton
methods are often more accurate than simple gradient search schemes.
We propose enhancements to the Hessian estimation scheme used in
two recently proposed stochastic Newton methods, based on the ideas of
random directions stochastic approximation (2RDSA) and simultaneous
perturbation stochastic approximation with three function measurements
(2SPSA-3), respectively. The proposed scheme reduces the error in the
Hessian estimate by (i) incorporating a zero-mean feedback term; and (ii)
optimizing the step-sizes used in the Hessian recursion. We prove that
both 2RDSA and 2SPSA-3 with our Hessian improvement procedure converge
asymptotically to the true Hessian. The advantage of both 2RDSA and
2SPSA-3 is that they require only 75% of the per-iteration simulation cost
of 2SPSA with improved Hessian estimation (2SPSA-IH). Numerical
experiments show that 2RDSA-IH outperforms both 2SPSA-IH and 2RDSA without
the improved Hessian estimation procedure.
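The first-order simultaneous perturbation idea underlying these second-order
schemes can be sketched as follows: estimate the gradient from just two
noisy function measurements along a random +/-1 perturbation. This is basic
1SPSA on a toy noisy quadratic with fixed gains, not the 2RDSA/2SPSA-3
Hessian machinery of the talk; the objective, gains and seeds are invented
for illustration:

```python
import random

def spsa_minimize(f, x, iters=2000, a=0.05, c=0.1, seed=0):
    """1SPSA: per iteration, perturb all coordinates simultaneously by a
    random Bernoulli +/-1 direction, estimate the gradient from the two
    resulting noisy measurements of f, and take a gradient step."""
    rng = random.Random(seed)
    x = list(x)
    for _ in range(iters):
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        f_plus = f([xi + c * di for xi, di in zip(x, delta)])
        f_minus = f([xi - c * di for xi, di in zip(x, delta)])
        # Two measurements suffice regardless of dimension.
        ghat = [(f_plus - f_minus) / (2 * c * di) for di in delta]
        x = [xi - a * gi for xi, gi in zip(x, ghat)]
    return x

# Noisy measurements of f(x) = ||x||^2, minimized at the origin.
rng_noise = random.Random(1)
noisy = lambda x: sum(xi * xi for xi in x) + 0.01 * rng_noise.gauss(0, 1)
x_star = spsa_minimize(noisy, [2.0, -3.0])
```

The key property is that the measurement cost per iteration is independent
of the dimension, which is also what makes the 75% per-iteration cost
comparison between the second-order variants meaningful.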
 Series: Department Seminar Title: Practical Support for Strong, Serializability-Based Memory Consistency  Speaker: Dr. Swarnendu Biswas
Postdoctoral fellow
working with Prof. Keshav Pingali, UT, Austin  Date and Time: Wednesday, May 03, 2017, 11:30 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Writing correct and scalable multithreaded programs using shared
memory is hard because of improper synchronization, which leads to data
races. Data races often indicate the presence of concurrency bugs, and have
led to several high-profile failures in the past, such as the Therac-25
accident, the 2003 Northeast blackout, and the 2012 glitch in NASDAQ
Facebook share prices. Unfortunately, current shared-memory languages
and systems provide weak end-to-end consistency guarantees in the
presence of data races. This problem of unsatisfactory semantics for
racy executions needs to be resolved, but efficient solutions have remained
elusive even after years of research.
This talk describes our ongoing research to provide strong end-to-end
memory consistency models for shared-memory languages and systems. I
will present software-only and hardware-based approaches which provide
serializability of synchronization-free regions for all program
executions. A feature of our work is that it allows us to rethink the
design of other system features, such as cache coherence. Our work
advances the state of the art by making the designs practical and
showing that strong memory consistency is achievable.
Speaker Bio: Swarnendu Biswas is currently a postdoctoral fellow working
with Prof. Keshav Pingali at UT Austin. He did his PhD with Dr. Michael Bond
at The Ohio State University. His graduate research involved developing
program analyses for detecting and ensuring concurrency correctness
properties in multithreaded programs and memory models. He is currently
working on approximate and high-performance computing. His work has been
recognized with the ACM SRC Grand Finals 2016 Award, an OOPSLA
Distinguished Paper Award, and an OOPSLA Distinguished Artifact Award.
Swarnendu received an MS degree from IIT Kharagpur, and did his BE at
NIT Durgapur.
Host Faculty: Dr. Murali Krishna Ramanathan
 Series: Department Seminar Title: The Construction of a Semidefinite Representation of a Convex Set from its Projections  Speaker: Dr. Anusuya Ghosh
Postdoctoral fellow in IIM Bangalore  Date and Time: Tuesday, May 02, 2017, 10:00 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Construction methods for semidefinite representations of convex
sets play a major role in recent research on modern convex optimization.
Even when a convex set is known to be semidefinite representable, there is
no general way to construct its semidefinite representation. This work deals
with a construction method for the semidefinite representation of a convex
body when its projections onto lower-dimensional spaces are known. An
algorithm has been proposed, and an approximate solution has been achieved
for such a construction. Further, we show that the approximate solution
converges to the convex set. We present a detailed comparison of our
construction method with other methods available in the recent literature.
Speaker Bio: Anusuya Ghosh is currently working as a postdoctoral fellow at
IIM Bangalore. She received her PhD in Industrial Engineering and Operations
Research from IIT Bombay. She also received the "Best Project Award" from
the Department of Mathematics at IIT Kharagpur in 2009 (during her M.Sc.).
Host Faculty: Chiranjib Bhattacharyya
 Series: M.Sc. (Engg) Colloquium Title: Fast Actively Secure OT Extension for Short Secrets  Speaker: Mr Ajith S
M.Sc. (Engg) Student
Dept. of CSA
IISc  Faculty Advisor: Dr Arpita Patra
 Date and Time: Monday, May 01, 2017, 11:00 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Oblivious Transfer (OT) is one of the most fundamental
cryptographic primitives, with widespread application in general secure
multiparty computation (MPC) as well as in a number of tailored and
special-purpose problems of interest such as private set intersection
(PSI), private information retrieval (PIR), and contract signing, to name a
few. Often, instantiations of OT require prohibitive communication and
computation complexity. OT extension protocols were introduced to compute a
very large number of OTs, referred to as extended OTs, at the cost of a
small number of OTs, referred to as seed OTs.
We present a fast OT extension protocol for small secrets in the active
setting. Our protocol, when used to produce 1-out-of-n OTs, outperforms all
known actively secure OT extensions. Our protocol is built on the
semi-honest secure extension protocol of Kolesnikov and Kumaresan of
CRYPTO’13 (referred to as the KK13 protocol henceforth), which is the
best-known OT extension for short secrets. At the heart of our protocol lies
an efficient consistency-checking mechanism that relies on the linearity of
Walsh-Hadamard (WH) codes. Asymptotically, our protocol adds a communication
overhead of O(µ log κ) bits over the KK13 protocol irrespective of the
number of extended OTs, where κ and µ refer to the computational and
statistical security parameters, respectively. Concretely, our protocol,
when used to generate a large enough number of OTs, adds only 0.011-0.028%
communication overhead and 4-6% runtime overhead in both LAN and WAN over
the KK13 extension. The runtime overhead drops below 2% when, in addition,
the number of inputs of the sender in the extended OTs is large enough.
As an application of our proposed extension protocol, we show that it can
be used to obtain the most efficient PSI protocol secure against a
malicious receiver and a semi-honest sender.
 Series: M.Sc. (Engg) Colloquium Title: A Study of the Thompson Sampling Approach for Sleeping Multi-Armed Bandit Problems  Speaker: Mr. Aritra Chatterjee
M.Sc. (Engg) Student, CSA, IISc  Faculty Advisor: Prof. Y. Narahari
 Date and Time: Friday, April 28, 2017, 4:00 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract The multi-armed bandit (MAB) problem provides a convenient
abstraction for many online learning problems arising in modern
applications, including Internet advertising, crowdsourcing, online
procurement, smart grids, etc. Several variants of the MAB problem have been
proposed to extend the basic model to a variety of practical and general
settings. The sleeping multi-armed bandit (SMAB) problem is one such
variant, where the subset of available arms varies with time. This study is
focused on analyzing the efficacy of the Thompson Sampling algorithm for
solving the SMAB problem.
Any algorithm for the classical MAB problem is expected to choose one of K
available arms (actions) in each of T consecutive rounds. Each choice of an
arm generates a stochastic reward from an unknown but fixed distribution.
The goal of the algorithm is to maximize the expected sum of rewards over
the T rounds (or, equivalently, to minimize the expected total regret),
relative to the best fixed action in hindsight. In many of these settings,
however, not all arms may be available in any given round. For example, in
Internet advertising, some advertisers might choose to stay away from the
auction due to budget constraints; in crowdsourcing, some workers may not be
available at a given time due to time-zone differences, etc. This implies
that the algorithm needs to work only with the set of available arms.
Further, unlike in the classical setting, the performance of an algorithm
can no longer be evaluated relative to the best fixed arm in hindsight; the
regret is now measured relative to the best "available" arm in
hindsight. One possible way is to compute the overall regret as the
worst-case regret over all possible sample paths of availabilities (that
is, under adversarial selection of available arms).
In the literature, upper confidence bound (UCB)-based approaches have been
proposed and investigated for the SMAB problem. Our contribution is to
investigate a Thompson Sampling-based approach. Our key result establishes a
logarithmic regret bound, which non-trivially generalizes a similar bound
known for this approach in the classical MAB setting. Our bound also matches
(up to constants) the best-known lower bound for the SMAB problem. We also
show, via detailed simulations, that the Thompson Sampling approach in fact
outperforms the known algorithms for the SMAB problem.
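As a concrete illustration, here is a minimal sketch of Beta-Bernoulli
Thompson Sampling adapted to sleeping arms (the availability pattern, arm
means and priors below are invented for the example, not taken from the
thesis): each round, posterior samples are drawn only for the arms that are
awake, the argmax among them is played, and its posterior is updated:

```python
import random

def sleeping_thompson(means, available_sets, seed=0):
    """Beta-Bernoulli Thompson Sampling over sleeping arms: per round,
    sample an index from each *available* arm's Beta posterior, play the
    argmax, and update that arm's posterior with the Bernoulli reward."""
    rng = random.Random(seed)
    wins = [1] * len(means)     # Beta(1, 1) uniform priors
    losses = [1] * len(means)
    pulls = [0] * len(means)
    for avail in available_sets:
        samples = {a: rng.betavariate(wins[a], losses[a]) for a in avail}
        arm = max(samples, key=samples.get)
        reward = 1 if rng.random() < means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

means = [0.2, 0.5, 0.8]                        # arm 2 is best when awake
rounds = [[0, 1, 2]] * 3000 + [[0, 1]] * 1000  # arm 2 sleeps at the end
pulls = sleeping_thompson(means, rounds)
```

When arm 2 goes to sleep, the algorithm gracefully shifts to the best
available arm (arm 1), which is exactly the per-round comparator used in the
sleeping-bandit regret definition above.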
 Series: Ph.D. Colloquium Title: New Methods for Learning from Heterogeneous and Strategic Agents  Speaker: Divya Padmanabhan (PhD student)
 Faculty Advisor: Prof. Shirish Shevade, Prof. Y. Narahari
 Date and Time: Friday, April 21, 2017, 12:00 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract In this thesis, we address several problems concerned with learning
from multiple heterogeneous agents. The problems are relevant to several
applications today, such as crowdsourcing and sponsored search auctions. In
scenarios such as crowdsourcing, the agents are noisy and provide the
training data for the learning problems; however, the noise levels of the
agents are unknown. Any learning algorithm which relies on information
provided by these agents must learn their qualities while simultaneously
learning a model. Another challenge arises when agents are strategic. The
agents could be strategic about the effort they put into performing a
task. They could also be strategic in reporting the costs they incur. The
learning algorithms suffer if incorrect information is provided by the
agents. Therefore, it is necessary to ensure that the agents exhibit
desirable behaviour. We solve the following problems that arise while
learning from such agents.
(1) Multi-label Classification with Heterogeneous Noisy Agents
Multi-label classification is a well-known supervised machine learning
problem where each instance is associated with multiple classes. Since
several labels can be assigned to a single instance, one of the key
challenges in this problem is to learn the correlations between the
classes. We first assume labels from a perfect source and propose a novel
topic model called Multi-Label Presence-Absence Latent Dirichlet Allocation
(ML-PA-LDA). In the current era, a natural way to procure the training
dataset is by mining user-generated content or directly through users on a
crowdsourcing platform. In this more practical scenario of crowdsourcing,
an additional challenge arises as the labels of the training instances are
provided by noisy, heterogeneous crowd workers with unknown qualities.
With this motivation, we further augment our topic model to the scenario
where the labels are provided by multiple noisy sources, and refer to this
model as ML-PA-LDA-MNS. With experiments on standard datasets, we show
that the proposed models achieve superior performance over the state of
the art.
(2) Active Linear Regression with Heterogeneous, Noisy and Strategic Agents
We study the problem of training a linear regression model by procuring
labels from multiple noisy agents or crowd annotators, under a budget
constraint. We propose a Bayesian model for linear regression from multiple
noisy sources and use variational inference for parameter estimation. When
labels are sought from agents, it is important to minimize the number of
labels procured, as every call to an agent incurs a cost. Towards this, we
adopt an active learning approach. In this specific context, we prove the
equivalence of well-studied active learning criteria such as entropy
minimization and expected error reduction. For the purpose of annotator
selection in active learning, we observe a useful connection with the
multi-armed bandit framework. Due to the nature of the distribution of the
rewards on the arms, we resort to the Robust Upper Confidence Bound (UCB)
scheme with a truncated empirical mean estimator to solve the annotator
selection problem. This yields provable guarantees on the regret. We
further apply our model to the scenario where annotators are strategic,
and design suitable incentives to induce them to put in their best efforts.
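The truncated empirical mean at the heart of the Robust UCB scheme can be sketched as follows: the i-th observation is kept only if its magnitude is below a threshold that grows with i, which bounds the influence of heavy-tailed outliers. The raw-moment bound `u`, tail parameter `eps` and confidence `delta` below are illustrative placeholders, not values from the talk.

```python
import math

def truncated_mean(rewards, u=1.0, eps=1.0, delta=0.05):
    """Truncated empirical mean estimator (as used by Robust UCB for
    heavy-tailed rewards): observation i is discarded if its magnitude
    exceeds (u*i / log(1/delta))^(1/(1+eps))."""
    n = len(rewards)
    total = 0.0
    for i, x in enumerate(rewards, start=1):
        thresh = (u * i / math.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
        if abs(x) <= thresh:
            total += x     # keep well-behaved observations
    return total / n

# An extreme outlier is clipped out instead of dragging the estimate:
clean = truncated_mean([0.5] * 100)
spiked = truncated_mean([0.5] * 99 + [1000.0])
```

A plain empirical mean of the second sequence would be about 10.5; the truncated estimator stays near 0.5, which is what makes UCB indices built on it robust.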
(3) Ranking with Heterogeneous Strategic Agents
We look at the problem where a planner must rank multiple strategic agents,
a problem that has many applications, including sponsored search auctions
(SSA). Stochastic multi-armed bandit (MAB) mechanisms have been used to
solve this problem. Existing stochastic MAB mechanisms with a deterministic
payment rule, proposed in the literature, necessarily suffer a regret of
Omega(T^{2/3}), where T is the number of time steps. This happens because
the existing mechanisms consider the worst-case scenario, where the means
of the agents' stochastic rewards are separated by a very small amount that
depends on T. Our idea is to allow the planner to indicate the resolution,
Delta, with which the agents must be distinguished. This immediately leads
us to introduce the notion of Delta-regret. We propose a dominant strategy
incentive compatible (DSIC) and individually rational (IR), deterministic
MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family
of MAB algorithms. The proposed mechanism, Delta-UCB, achieves a
Delta-regret of O(log T). We first establish the results for single-slot
SSA and then non-trivially extend the results to the case where multiple
slots are to be allocated.
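The Delta-regret idea can be illustrated with a standard successive-elimination loop that stops refining once its confidence width falls below Delta/2, since agents whose means are closer than Delta need not be distinguished. This is a generic sketch of the elimination principle, not the DSIC payment mechanism from the talk; the confidence-width formula is a textbook Hoeffding-style choice.

```python
import math, random

def delta_separate(arm_means, delta, conf=0.05, seed=0):
    """Pull all surviving arms each round; drop an arm once its empirical
    mean falls below the leader's by more than twice the confidence
    width; stop once the width is below delta/2."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts, sums = [0] * n, [0.0] * n
    alive = set(range(n))
    t = 0
    while True:
        t += 1
        for a in alive:                       # one pull per surviving arm
            sums[a] += 1 if rng.random() < arm_means[a] else 0
            counts[a] += 1
        width = math.sqrt(math.log(2 * n * t * t / conf) / (2 * t))
        means = {a: sums[a] / counts[a] for a in alive}
        best = max(means.values())
        alive = {a for a in alive if means[a] >= best - 2 * width}
        if width < delta / 2 or len(alive) == 1:
            return alive       # arms not separable at resolution delta

# Means separated by more than delta=0.2: the best arm survives.
survivors = delta_separate([0.2, 0.5, 0.9], delta=0.2)
```

Because the loop never shrinks the width below delta/2, the number of pulls per arm stays O(log(...)/delta^2) instead of depending on the (possibly tiny) true gaps, which is the intuition behind escaping the Omega(T^{2/3}) barrier.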
 Series: M.Sc. (Engg) Colloquium Title: An improved lower bound for multi-r-ic depth four
circuits as a function of the number of input variables  Speaker: Mr. Sumant Hegde
M.Sc. (Engg) student at CSA department  Faculty Advisor: Dr. Chandan Saha
 Date and Time: Friday, April 21, 2017, 11:00 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract In this work we study the multi-r-ic formula model introduced by Kayal
and Saha (2015) and improve upon the lower bound for multi-r-ic depth
four circuits given in the work by Kayal, Saha and Tavenas (2016)
[KST16], when viewed as a function of the number of input variables N.
The improvement leads to superpolynomial lower bounds for
values of r significantly higher than what is known from prior works.
A (syntactically) multi-r-ic formula is an arithmetic formula in which
the formal degree with respect to every variable is at most r at
every gate. The formal degree of an input gate with respect to a
variable x is defined to be 1 if the gate is labelled with x, and 0 if it
is labelled with a field element or a different variable. The formal degree of a sum
(respectively, product) gate with respect to x is defined as the
maximum (respectively, sum) of the formal degrees of its children with respect
to x. A multi-r-ic formula thus computes a polynomial in which the
individual degree of every variable is bounded by r.
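The recursive definition of formal degree above translates directly into code. The tuple representation of formula gates below is an illustrative assumption.

```python
def formal_degree(gate, x):
    """Formal degree of an arithmetic-formula gate with respect to
    variable x: 1 for a leaf labelled x, 0 for any other leaf,
    max over children at a sum gate, sum over children at a product gate."""
    op, *children = gate
    if op == 'leaf':
        return 1 if children[0] == x else 0
    child_degrees = [formal_degree(c, x) for c in children]
    return max(child_degrees) if op == '+' else sum(child_degrees)

# (x*x + x) * x: the sum gate has formal degree max(2, 1) = 2 in x,
# and the outer product adds one more.
f = ('*',
     ('+', ('*', ('leaf', 'x'), ('leaf', 'x')), ('leaf', 'x')),
     ('leaf', 'x'))
```

A formula is multi-r-ic exactly when this quantity is at most r at every gate, for every variable.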
Multi-r-ic formulas are a natural extension of the relatively
well-studied multilinear formulas in the work by Raz (2009) and
Raz and Yehudayoff (2009) [RY09]. In this work, we focus on multi-r-ic formulas
that compute multilinear polynomials. They are interesting because they
allow the formal degree of the formula to be as high
as r times the number of underlying variables. This gives extra room for ‘clever’
cancellations of the high-degree components inside the formula, thereby
making this type of formula harder to analyse (as formula
homogenization is not known to be doable without blowing up the size
superpolynomially unless the degree is very small, as Raz (2010) showed).
Most lower bound proofs in the literature operate under the
restriction of low formal degree or multilinearity (as in the works by
Raz (2009), [RY09], Kayal, Saha and Saptharishi (2014) and Kayal,
Limaye, Saha and Srinivasan (2014)). In this light, multi-r-ic
formulas computing multilinear polynomials form a reasonable intermediate
model to study in order to gain some insight on how to deal with high
formal degree in general formulas. Another motivation
for understanding the high formal degree case better (even at depth three)
comes from the depth reduction result by Gupta, Kamat, Kayal and
Saptharishi (2013).
With the aim of making progress on multi-r-ic formula lower bounds,
[KST16] gave a (N/(d·r^2))^Ω(√(d/r)) lower bound for multi-r-ic depth
four formulas computing the N-variate Iterated Matrix Multiplication
(IMM) polynomial of degree d. As a function of N, the lower bound is at
most 2^Ω(√(N/r^3)), attained when d = Θ(N/r^2). In this thesis, our focus
is on getting multi-r-ic depth four formulas with larger r into
the arena of models that provenly admit a superpolynomial lower bound. In
[KST16], r can be at most N^(1/3) for the bound to remain
superpolynomial. Our result (stated below) gives a superpolynomial lower bound
for multi-r-ic depth four formulas where r can be as high as (N·log
N)^(0.9).
Theorem: Let N, d, r be positive integers such that 0.51·N ≤ d ≤ 0.9·N
and r ≤ (N·log N)^(0.9). Then there is an explicit N-variate
degree-d multilinear polynomial in VNP such that any multi-r-ic
depth four circuit computing it has size 2^Ω(√(N·(log N)/r)).
The theorem yields a better lower bound than that of [KST16], when
viewed as a function of N. Also, the bound matches the best-known
lower bound (as a function of N) for multilinear (r = 1) depth four
circuits, given by [RY09], which is 2^Ω(√(N·log N)).
The improvement is obtained by analysing the shifted partials dimension
(SPD) of an N-variate polynomial in VNP (as opposed to a VP
polynomial in [KST16]) in the high degree range of Θ(N), and comparing
it with the SPD of a depth four multi-r-ic circuit. In [KST16],
a variant of shifted partials, called shifted skewed partials, is
critically used to analyse the IMM polynomial (which is in VP). We
observe that SPD (without “skew”) is still effective for the
Nisan-Wigderson polynomial (which is in VNP), and yields a better
lower bound as a function of N when the degree d is naturally chosen to be high.
Our analysis gives a better range for r and a better lower bound in the
high degree regime, not only for depth four multi-r-ic circuits
but also for the weaker models: multi-r-ic depth three circuits and
multi-r-ic depth four circuits with low bottom support. These (weaker)
models are instrumental in gaining insight about general
depth four multi-r-ic circuits, both in [KST16] and in our work.
 Series: Ph.D. Thesis Defense Title: Scalable Sparse Bayesian Nonparametric and Matrix
Tri-factorization Models for Text Mining Applications  Speaker: Ranganath B. N
 Faculty Advisor: Prof. Shalabh Bhatnagar
 Date and Time: Tuesday, April 18, 2017, 2:00 PM
 Venue: CSA Conference Room (Room No. 225, First Floor)
Abstract Hierarchical Bayesian models and matrix factorization methods
provide an unsupervised way to learn the latent components of data from
grouped or sequence data. For example, in document data, a latent
component corresponds to a topic, with each topic being a distribution
over a finite vocabulary of words. For many applications, there exist
sparse relationships between the domain entities and the latent components
of the data. Traditional approaches to topic modeling do not take these
sparsity considerations into account. Modeling these sparse relationships
helps in extracting relevant information, leading to improvements in topic
accuracy and to scalable solutions. In this thesis, we explore these
sparsity relationships for different applications, such as text
segmentation, topical analysis and entity resolution in dyadic data,
through Bayesian and matrix tri-factorization approaches, proposing
scalable solutions.
In our first work, we address the problem of segmentation of a collection
of sequence data, such as documents, using probabilistic models. Existing
state-of-the-art hierarchical Bayesian models are connected to the notion
of Complete Exchangeability or Markov Exchangeability. Bayesian
nonparametric models based on the notion of Markov Exchangeability, such
as HDP-HMM and Sticky HDP-HMM, allow very restricted permutations of
latent variables in grouped data (topics in documents), which in turn
leads to computational challenges for inference. At the other extreme,
models based on Complete Exchangeability, such as HDP, allow arbitrary
permutations within each group or document, and inference is
significantly more tractable as a result, but segmentation is not
meaningful using such models. To overcome these problems, we explore a
new notion of exchangeability, Block Exchangeability, that lies between
Markov Exchangeability and Complete Exchangeability, for which
segmentation is meaningful but inference is computationally less expensive
than for both Markov and Complete Exchangeability. Parametrically, Block
Exchangeability requires a sparser set of transition parameters, linear in
the number of states, compared to the quadratic order for Markov
Exchangeability, which is still less than that for Complete
Exchangeability, for which the parameters are on the order of the number
of documents. Based on this notion, we propose a nonparametric Block
Exchangeable Model (BEM), which we have shown to be a superclass of
Complete Exchangeability and a subclass of Markov Exchangeability. We
propose a scalable inference algorithm for BEM to infer the topics for
words and the segment boundaries associated with topics for a document,
using a collapsed Gibbs sampling procedure.
Empirical results show that BEM outperforms state-of-the-art
nonparametric models in terms of scalability and generalization ability,
and shows nearly the same segmentation quality on a news dataset, a
product-review dataset and a synthetic dataset. Interestingly, we can
tune the scalability by varying the block size through a parameter in our
model, for a small trade-off in segmentation quality.
In addition to exploring the association between documents and words,
we also explore sparse relationships for dyadic data, where associations
between one pair of domain entities, such as (documents, words), and
associations between another pair, such as (documents, users), are
completely observed. We motivate the analysis of such dyadic data by
introducing an additional discrete dimension, which we call topics, and
explore sparse relationships between the domain entities and the topics,
namely user-topic and document-topic associations respectively.
In our second work, for this problem of sparse topical analysis of dyadic
data, we propose a formulation using sparse matrix tri-factorization. This
formulation requires sparsity constraints not only on the individual
factor matrices, but also on the product of two of the factors. To the
best of our knowledge, this problem of sparse matrix tri-factorization has
not been studied before. We propose a solution that introduces a surrogate
for the product of factors and enforces sparsity on this surrogate, as
well as on the individual factors, through L1-regularization.
The resulting optimization problem is efficiently solvable in an
alternating minimization framework over subproblems involving the
individual factors, using the well-known FISTA algorithm.
For the subproblems that are constrained, we use a projected variant of
the FISTA algorithm.
We also show that our formulation leads to independent subproblems when
solving for a factor matrix, thereby supporting a parallel implementation
and leading to a scalable solution.
We perform experiments over bibliographic and product review data to show
that the proposed framework, based on the sparse tri-factorization
formulation, results in better generalization ability and factorization
accuracy compared to baselines that use sparse bi-factorization.
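The kind of L1-regularized subproblem that arises for each factor in such an alternating-minimization scheme can be sketched with a standard FISTA solver for the lasso. This is a generic textbook sketch on synthetic data, not the thesis implementation; the step sizes and problem dimensions are illustrative.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of the L1 norm (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def fista_lasso(A, b, lam, iters=200):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)   # prox-gradient step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x

# Sparse-recovery toy: only the first two coordinates are active.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.zeros(10)
x_true[:2] = [3.0, -2.0]
x_hat = fista_lasso(A, A @ x_true, lam=0.1)
```

For a constrained subproblem, the soft-thresholding step would be followed by a projection onto the feasible set, which is the projected variant mentioned above.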
Although the second work performs sparse topical analysis of dyadic data,
finding sparse topical associations for the users, user references with
different names could belong to the same entity, while references with the
same name could belong to different entities. The problem of entity
resolution is widely studied in the research community; the goal is to
identify the real users associated with the user references in the
documents. Finally, we focus on the problem of entity resolution in dyadic
data, where associations between one pair of domain entities, such as
documents-words, and associations between another pair, such as
documents-users, are observed; an example of such data is bibliographic
data.
In our final work, for this problem of entity resolution in bibliographic
data, we propose a Bayesian nonparametric `Sparse Entity Resolution
Model' (SERM), exploring the sparse relationships between the grouped
data, involving a grouping of the documents, and the topics/author
entities in the group. Further, we also exploit the sparseness between an
author entity and the associated author aliases. Grouping of the
documents is achieved with a stick-breaking prior for the Dirichlet
processes (DP). To achieve sparseness, we propose a solution that
introduces separate Indian Buffet Process (IBP) priors over the topics and
the author entities for the groups, and a kNN mechanism for selecting
author aliases for the author entities.
We propose a scalable inference procedure for SERM by appropriately
combining the partially collapsed Gibbs sampling scheme of the Focused
Topic Model (FTM), the inference scheme used for the parametric IBP prior,
and the kNN mechanism. We perform experiments over the bibliographic
datasets Citeseer and Rexa to show that the proposed SERM model improves
the accuracy of entity resolution by finding relevant author entities
through modeling sparse relationships, and is scalable when compared to
the state-of-the-art baseline.
 Series: Department Seminar Title: Identifying groups of strongly correlated variables through
Smoothed Ordered Weighted l1-norms  Speaker: Mr. Raman Sankaran
PhD student, CSA  Faculty Advisor: Prof. Chiranjib Bhattacharyya
 Date and Time: Thursday, April 13, 2017, 3:00 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract The failure of LASSO to identify groups of correlated predictors in linear
regression has sparked significant research interest. Recently, various
norms [1, 2] were proposed, which can be best described as instances of
ordered weighted l1 norms (OWL) [3], as an alternative to the l1
regularization used in LASSO. OWL can identify groups of correlated
variables, but it forces the model to be constant within a group. This
artifact induces unnecessary bias in the model estimation. In this paper
we take a submodular perspective and show that OWL can be posed as the
Lovász extension of a suitably defined submodular function. The
submodular perspective not only explains the group-wise constant
behavior of OWL, but also suggests alternatives. The main contribution
of this paper is smoothed OWL (SOWL), a new family of norms, which not
only identifies the groups but also allows the model to be flexible
inside a group. We establish several algorithmic and theoretical
properties of SOWL, including group identification and model consistency.
We also provide algorithmic tools to compute the SOWL norm and its
proximal operator, whose computational complexity (O(d log d)) is
significantly better than that of general purpose solvers (O(d^2 log d)).
In our experiments, SOWL compares favorably with OWL in the regimes
of interest.
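For context, the plain OWL norm itself is cheap to evaluate: sort the magnitudes in decreasing order and take the weighted sum against the non-increasing weight vector, an O(d log d) operation dominated by the sort. This is a sketch of the standard OWL definition, not of the SOWL machinery from the paper.

```python
def owl_norm(x, w):
    """Ordered weighted l1 (OWL) norm: sum of w[i] * |x|_[i], where
    |x|_[1] >= |x|_[2] >= ... are the magnitudes sorted in decreasing
    order and w is non-negative and non-increasing."""
    mags = sorted((abs(v) for v in x), reverse=True)   # O(d log d)
    return sum(wi * mi for wi, mi in zip(w, mags))

# Special cases: all-ones weights give the l1 norm; a single leading
# weight gives the l-infinity norm.
v = [3.0, -1.0, 2.0]
l1 = owl_norm(v, [1.0, 1.0, 1.0])
linf = owl_norm(v, [1.0, 0.0, 0.0])
```

Strictly decreasing weights (as in OSCAR-style norms) are what tie correlated variables together into groups, which is the behavior the abstract discusses.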
Speaker Bio: Raman Sankaran is a PhD student in the Department of CSA.
This is a practice talk for AISTATS.
 Series: M.Sc. (Engg) Colloquium Title: Supervised Classification of Missense Mutations as Pathogenic or Tolerated
using Ensemble Learning Methods.  Speaker: Ms. Rashmi Balasubramanyam
M.Sc. student  Faculty Advisor: Prof. Chiranjib Bhattacharyya
 Date and Time: Thursday, April 06, 2017, 9:30 AM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Missense mutations account for more than 50% of the mutations known to be involved in human inherited diseases. Missense classification is a challenging task that involves sequencing the genome, identifying the variations, and assessing their deleteriousness. Carried out in the laboratory, this is a very laborious, time- and cost-intensive task. Advancements in bioinformatics have led to several large-scale next-generation genome sequencing projects, and subsequently to the identification of genome variations. Several studies have combined this data with information on established deleterious and neutral variants to develop machine learning based classifiers.
There are significant issues with missense classifiers, due to which missense classification is still an open area of research. These issues fall under two broad categories. (a) Dataset overlap: the performance estimates reported by the state-of-the-art classifiers are overly optimistic, as they have often been evaluated on datasets that overlap significantly with their training datasets. There is also no comparative analysis of these tools on a common benchmark dataset that has no overlap with the training datasets, making it impossible to identify the best classifier among them; indeed, no such common benchmark dataset is available. (b) Inadequate capture of vital biological information about the protein and the mutations, such as the conservation of long-range amino acid dependencies and changes in certain physicochemical properties of the wild-type and mutant amino acids due to the mutation. It is also not clear how to extract and use this information. In addition, some classifiers use structural information that is not available for all proteins.
In this study, we compiled a new dataset, containing around 2-15% overlap with the popularly used training datasets, with 18,036 mutations in 5,642 proteins. We reviewed and evaluated 15 state-of-the-art missense classifiers (SIFT, PANTHER, PROVEAN, PhD-SNP, Mutation Assessor, FATHMM, SNPs&GO, SNPs&GO3D, nsSNPAnalyzer, PolyPhen-2, SNAP, MutPred, PON-P2, CONDEL and Meta-SNP) using six metrics: accuracy, sensitivity, specificity, precision, NPV and MCC. When evaluated on our dataset, we observe huge performance drops from what has been claimed. The average drops in performance for these 13 classifiers are around 15% in accuracy, 17% in sensitivity, 14% in specificity, 7% in NPV, 24% in precision and 30% in MCC. With this we show that the performance of these tools is not consistent across datasets, and thus not reliable for practical use in a clinical setting.
As we observed that the performance of the existing classifiers is poor in general, we set out to develop a new classifier that is robust, performs consistently across datasets, and improves on the state-of-the-art classifiers. We developed a novel method of capturing long-range amino acid dependency conservation by boosting the conservation frequencies of substrings of amino acids of various lengths around the mutation position using the AdaBoost learning algorithm. This score alone performed on par with the sequence-conservation-based tools in classifying missense mutations. Popularly used sequence conservation properties were then combined with these boosted long-range dependency conservation scores using the AdaBoost algorithm. This reduced the class bias and improved the overall accuracy of the classifier. We trained a third classifier by incorporating changes in 21 important physicochemical properties due to the mutation. In this case, we observed that the overall performance further improved and the class bias further reduced. The performance of our final classifier is comparable with the state-of-the-art classifiers. We did not find any significant improvement, but the class-specific accuracies and precisions are marginally better, by around 1-2%, than those of the existing classifiers. In order to understand our classifier better, we dissected our benchmark dataset into (a) seen and unseen proteins, and (b) pure and mixed proteins, and analyzed the performance in detail. Finally, we concluded that our classifier performs consistently across each of these categories of seen, unseen, pure and mixed proteins.
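The boosting step described above can be illustrated with a minimal AdaBoost over decision stumps. The features and labels below are synthetic stand-ins (in the talk the features are substring conservation frequencies); this is a generic sketch of the AdaBoost reweighting loop, not the thesis pipeline.

```python
import numpy as np

def adaboost_stumps(X, y, rounds=5):
    """Minimal AdaBoost with decision stumps; labels must be in {-1, +1}.
    Each round fits the best weighted stump, then up-weights mistakes."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                       # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] >= thr, sign, -sign)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        pred = np.where(X[:, j] >= thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)           # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))

    def predict(Xq):
        score = sum(a * np.where(Xq[:, j] >= t, s, -s)
                    for a, j, t, s in ensemble)
        return np.sign(score)
    return predict

# Toy data: "pathogenic" (+1) iff feature 0 exceeds a threshold.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = np.where(X[:, 0] > 0.2, 1, -1)
predict = adaboost_stumps(X, y)
acc = (predict(X) == y).mean()
```

The weighted vote over many weak substring-based stumps is what lets boosting aggregate conservation evidence at multiple substring lengths.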
 Series: Ph.D. Thesis Defense Title: Program Repair by Automated Generation of Hints  Speaker: Ms. Shalini Kaleeswaran
Ph.D. student  Faculty Advisor: Prof. Aditya Kanade
 Date and Time: Wednesday, April 05, 2017, 12:00 PM
 Venue: CSA Seminar Hall (Room No. 254, First Floor)
Abstract Programming has become an important skill in today's technology-driven
world. It is a complex activity, because of which programmers make mistakes
in their software. Student programmers make mistakes in their programs due
to a lack of programming experience and also due to a lack of understanding
of the algorithmic concepts being used. An experienced programmer in
industry, on the other hand, makes mistakes in the software he develops
because of the sheer complexity of industry-grade software. Debugging
one's programs, i.e., finding and fixing faults, is one of the most
challenging activities for a programmer. This motivates the development of
automated techniques that are effective in assisting programmers to repair
faults in their programs.
In this thesis, we present new computerassisted approaches to help
programmers repair their programs. Finding repair for a fault in the
program is essentially a search over the space of all possible repairs.
Traditional repair techniques output a complete repaired program that
eliminates the fault in the original program. This ambitious goal of
finding a complete repair is often unachievable because the search space
becomes too large to navigate efficiently. The key insight of our work is
that programmers only need guidelines that help them understand the faults
they made and suggestions on how to fix them, as opposed to a complete
repair. Therefore, this thesis proposes a novel perspective of solving the
program repair problem, where we generate textual hints that direct the
programmer on how to fix the fault. A hint specifies which part of the
program needs modification as well as how to modify it. For student
programmers, our hints help them reflect on the mistakes they made and
improve their understanding of the problem being solved in the program. For
experienced programmers, our hints help them easily identify the cause of
a fault and guide them towards a repair. Our approaches use novel
combinations of syntactic, symbolic and statistical reasoning techniques to
achieve this goal of semi-automated program repair.
Our first contribution is an algorithmic technique for automated synthesis
of repair hints. Our technique takes as input a faulty program and a set of
test cases and outputs a ranked list of repair hints that suggest how to
rectify a faulty statement in the program. In this technique, we leverage
symbolic execution and statistical correlation analysis to identify
expressions that are likely to occur in the repaired code. Then we use
syntactic pattern matching to combine these expressions algorithmically and
generate repair hints. Since we restrict ourselves to finding ``building
blocks'' of repair, our hints are very effective in guiding the programmer
to the final repair, even in scenarios where a complete repair cannot be
obtained. We implemented this technique in a tool called
``MintHint'', which performs automated synthesis of repair hints for C
programs. We evaluated the effectiveness of MintHint through a user
study and found that programmers who used MintHint were able to repair
more faults than those who didn't. In addition, they arrived at the repair
5.8 times faster.
Our next contribution is a semi-supervised feedback generation methodology
for student programming assignments. In this methodology, we take a set of
student submissions as input and generate personalized, verified feedback
for each submission, suggesting modifications to fix faults in the
program, if any. Student submissions are first clustered by solution
strategy, which is followed by an automated feedback generation phase. We
use unsupervised clustering to reduce the burden on the instructor in
providing reference implementations for each possible solution strategy.
The instructor is called upon simply to identify a correct submission in
each cluster, which acts as the reference implementation for all
submissions in the cluster. In the next phase, we use a novel
counterexample-guided feedback generation algorithm that compares a
student submission with a reference implementation to generate verified
feedback. We applied this methodology to programs implementing iterative
dynamic programming algorithms and developed a tool called ``DPAssist''
that works on C programs. We evaluated DPAssist on a dataset of 2226
student submissions to 4 programming problems from the programming contest
site CodeChef, and successfully generated verified feedback for 85% of them.