Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India

UPCOMING SEMINARS

Series: M.Sc. (Engg) Colloquium
Title: Large Scale Graph Processing in a Distributed Environment

  • Speaker: Nitesh Upadhyay
  • Faculty Advisor: Y.N. Srikant
  • Date and Time: Thursday, June 01, 2017, 12:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Graph algorithms are used ubiquitously across domains. They exhibit parallelism, which can be exploited on parallel architectures such as multi-core processors and accelerators. However, real-world graphs are massive and cannot fit into the memory of a single machine. Such large graphs are partitioned and processed in a distributed cluster environment consisting of multiple GPUs and CPUs. Existing frameworks that facilitate large-scale graph processing on distributed clusters have their own styles of programming and require extensive involvement by the user in communication and synchronization aspects. Adapting to these frameworks is an overhead for the programmer. Furthermore, these frameworks target only CPU clusters and lack the ability to harness GPU architectures.

We provide a back-end framework to the graph domain-specific language Falcon for large-scale graph processing on CPU and GPU clusters. The motivation behind choosing this DSL as a front-end is its shared-memory-based imperative programmability. Our framework generates Giraph code for CPU clusters; Giraph code runs on Hadoop clusters and is known for scalable and fault-tolerant graph processing. For GPU clusters, our framework applies a set of optimizations to reduce computation and communication latency and generates efficient CUDA code coupled with MPI.

Experimental evaluations show the scalability and performance of our framework on both CPU and GPU clusters. The performance of the framework-generated code is comparable to that of manual implementations of various algorithms in distributed environments.

Speaker Bio:
Nitesh Upadhyay is an M.Sc. (Engg.) student in the CSA department.

Host Faculty: Y.N. Srikant


Series: M.Sc. (Engg) Colloquium
Title: HAShCache: Heterogeneity Aware Shared DRAMCache for Integrated Heterogeneous Systems

  • Speaker: Adarsh Patil
  • Faculty Advisor: R. Govindarajan
  • Date and Time: Wednesday, May 31, 2017, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Integrated Heterogeneous System (IHS) processors pack throughput-oriented GPGPUs alongside latency-oriented CPUs on the same die, sharing certain resources, e.g., the shared last-level cache, the network-on-chip (NoC), and the main memory. The demand for memory accesses and other shared resources from GPU cores can exceed that of CPU cores by 2 to 3 orders of magnitude. This disparity poses significant problems in exploiting the full potential of these architectures.

In this work, we propose adding a large-capacity stacked DRAM, used as a shared last-level cache, to IHS processors. However, adding the DRAMCache naively leaves significant performance on the table due to the disparate demands from CPU and GPU cores for DRAMCache and memory accesses. In particular, the imbalance can significantly reduce the performance benefits that the CPU cores would otherwise have enjoyed with the introduction of the DRAMCache, necessitating heterogeneity-aware management of this shared resource for improved performance. We therefore propose three simple techniques to enhance the performance of CPU applications while ensuring very little to no performance impact on the GPU: (i) PrIS, a prioritization scheme for scheduling CPU requests at the DRAMCache controller; (ii) ByE, a selective and temporal bypassing scheme for CPU requests at the DRAMCache; and (iii) Chaining, an occupancy-controlling mechanism for GPU lines in the DRAMCache through pseudo-associativity. The resulting cache, HAShCache, is heterogeneity-aware and can adapt dynamically to address the inherent disparity of demands in an IHS architecture.

We enhance the gem5-gpu simulator to model an IHS architecture with a stacked DRAM as cache, complete with coherence and unified physical memory. Using this setup we perform detailed experimental evaluation of the proposed HAShCache which results in an average system performance improvement of 41% over a naive DRAMCache and over 200% improvement over a baseline system with no stacked DRAMCache.

Speaker Bio:
Adarsh Patil is an M.Sc. (Engg.) student in the CSA department. He obtained his B.Tech. from M.S. Ramaiah Institute of Technology and has two years of industry experience at Goldman Sachs in the Virtualization and Linux engineering team.


PAST SEMINARS

Series: CSA Distinguished Lecture
Title: Old and New Algorithms for Guarding Polygons

  • Speaker: Prof. Subir Kumar Ghosh
                   Professor
                   Ramakrishna Mission Vivekananda University
  • Date and Time: Thursday, May 25, 2017, 10:30 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
The art gallery problem is to determine the number of guards (or cameras) sufficient to cover or see every point in the interior of an art gallery. An art gallery can be viewed as a polygon P (with or without holes) with a total of n vertices, and guards as points in P. A point z in P is said to be visible from a guard g if the line segment joining z and g does not intersect the exterior of P. This problem was first posed by Victor Klee at a conference in 1973, and in the course of time it has become one of the important problems in computational geometry, with extensive applications to the surveillance of buildings such as airport terminals and railway stations.
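
As a concrete illustration of this visibility definition, here is a minimal sketch assuming the shapely geometry library and a simple polygon without holes; the gallery coordinates are made up for the example:

```python
from shapely.geometry import Polygon, LineString

def visible(polygon, guard, point):
    """z is visible from g iff the segment joining them never enters the
    exterior of P, i.e., the closed polygon covers the segment."""
    return polygon.covers(LineString([guard, point]))

# Hypothetical L-shaped gallery; the notch blocks views across the corner.
gallery = Polygon([(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)])
print(visible(gallery, (1, 1), (3, 1)))          # True
print(visible(gallery, (3.5, 1.5), (1.5, 3.5)))  # False: segment exits P
```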

A polygon is said to be guarded by a set of chosen guards if every interior point of P is visible from some guard within the guard set. In this lecture, we will first present an O(log n)-approximation algorithm for finding the minimum number of guards necessary for guarding P, designed by Ghosh back in 1986. For this problem, the approximation ratio was improved to O(log log OPT) by King and Kirkpatrick in 2011. Ghosh had also conjectured in 1986 that a constant-factor approximation algorithm exists for this problem in the case of polygons without holes. This conjecture was settled recently for a special class of P (without holes) by Bhattacharya, Ghosh and Roy, who presented a 6-approximation algorithm for the problem. An outline of this algorithm will be provided in the lecture.

Chromatic art gallery problems arise from applications in areas like landmark-based navigation and wireless networks. One such problem is the weak conflict-free guarding of polygons, where the objective is to colour a guard set for P using a minimum number of colours such that each point within P sees at least one guard with a unique colour. Finally, we will present an algorithm for weak conflict-free guarding of P (without holes) where the guard set is coloured using only O(log n) colours.

Speaker Bio:
Subir Kumar Ghosh was a professor of computer science and discrete applied mathematics at the Tata Institute of Fundamental Research, Mumbai, till July 2015. Since then, he has been a professor at RKM Vivekananda University, Belur, West Bengal. He has been a Fellow of the Indian Academy of Sciences since 1998. Recently, he was awarded a professorship by the National Board for Higher Mathematics for his lifetime contributions to mathematics. He has worked as a visiting scientist at many reputed universities and research institutes around the world, and has given several invited lectures at national and international conferences, workshops and institutes in India and abroad. Currently, he is the Chair of the Steering Committee of CALDAM, an international conference on algorithms and discrete applied mathematics. He is the author of a book and around sixty-six papers in the fields of computational geometry, robot path planning, geometric graph theory and algorithms (sequential, parallel, on-line and approximation).

Host Faculty: Prof. Sathish Govindarajan


Series: M.Sc. (Engg) Colloquium
Title: Compilation of Graph Algorithms for Hybrid, Cross-Platform and Distributed Architectures

  • Speaker: Parita Patel
  • Faculty Advisor: Y.N. Srikant
  • Date and Time: Tuesday, May 23, 2017, 12:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Graph algorithms are used abundantly in various disciplines. These algorithms perform poorly due to random memory accesses and negligible spatial locality. To improve performance, the parallelism exhibited by these algorithms can be exploited by leveraging modern high-performance parallel computing resources. However, implementing graph algorithms for these parallel architectures requires manual thread management and memory management, which becomes tedious for a programmer.

Large-scale graphs cannot fit into the memory of a single machine. One solution is to partition the graph either across multiple devices of a single node or across multiple nodes of a distributed network. All the available frameworks for such architectures demand unconventional programming, which is difficult and error-prone.

To address these challenges, we propose a framework for the compilation of graph algorithms written in an intuitive graph domain-specific language, Falcon. The framework targets shared-memory parallel architectures, computational accelerators and distributed architectures (CPU and GPU clusters). First, it analyses the abstract syntax tree (generated by Falcon) and gathers essential information. Subsequently, it generates optimized OpenCL code for shared-memory parallel architectures and computational accelerators, and OpenCL coupled with MPI code for distributed architectures. The motivation behind generating OpenCL code is its platform-agnostic and vendor-agnostic behaviour, i.e., it is portable to all kinds of devices. Our framework makes memory management, thread management, message passing, etc., transparent to the user. To the best of our knowledge, none of the available domain-specific languages, frameworks or parallel libraries handles portable implementations of graph algorithms.

Experimental evaluations demonstrate that the generated code performs comparably to the state-of-the-art non-portable implementations and hand-tuned implementations. The results also show portability and scalability of our framework.

Speaker Bio:
Parita Patel is an M.Sc. (Engg.) student in the CSA department.

Host Faculty: Y.N. Srikant


Series: Department Seminar
Title: Deep Learning made easy with DARVIZ

  • Speaker: Shreya Khare and Anush Sankaran
  • Date and Time: Thursday, May 18, 2017, 3:30 PM
  • Venue: CSA Lecture Hall (Room No. 117, Ground Floor)

Abstract
Struggling with implementing research papers in deep learning? Confused about which deep learning framework to use - Tensorflow, Theano, CAFFE, etc.? Wasting hours of precious time debugging code to build the right deep learning model? You will no longer face these challenges!

DARVIZ is a visual programming IDE in which you can design a deep learning model using an intuitive drag-and-drop interface. Once a model is designed, DARVIZ can write execution-ready deep learning code for you on three platforms - Tensorflow, CAFFE, and Theano. This tool enables extremely rapid prototyping of deep learning models and implementation of state-of-the-art papers.

Speaker Bio:
Shreya Khare is a software engineer working in the area of deep learning and automation at IBM Research in Bangalore, India. She is currently working on DARVIZ, a deep learning IDE for implementing, visualizing and validating deep learning models. Her research interests include machine learning, speech processing, computer vision and natural language processing. She completed her MS by Research from the Department of Computer Science, IIT Madras. Anush Sankaran is a researcher in the Cognitive Solutions and Services team. His research interests include deep learning, image processing, human cognition, and their applications. He is primarily leading efforts in two projects: DARVIZ and Machine Learning for Creativity. He is currently pursuing a Ph.D. at the Indraprastha Institute of Information Technology, New Delhi, India. He was a recipient of the TCS Ph.D. Research Fellowship from 2010 to 2015. He has published many papers in peer-reviewed conferences and journals and received Best Poster Awards at IEEE BTAS 2013 and IEEE IJCB 2014. Anush received his B.Tech. degree (Gold Medal) in computer science from the Coimbatore Institute of Technology, Coimbatore, India, in 2010.

Host Faculty: Aditya Kanade


Series: Department Seminar
Title: Software Transactional Memory Systems for Processing Large-Scale Graph Analytics

  • Speaker: Sathya Peri
  • Date and Time: Wednesday, May 17, 2017, 3:30 PM
  • Venue: CSA Multimedia Class (Room No. 252, First Floor)

Abstract
Nowadays, multi-core systems are ubiquitous, from cell phones to high-performance computation servers. These systems were introduced around 2004, when 50 years of exponential improvement in the performance of sequential computers ended. To get continued speedup on these processors, applications need to harness the parallelism of the underlying hardware. This is commonly achieved using multi-threading. Yet writing correct and scalable multi-threaded programs is far from trivial: sets of semantically related actions may need to execute in mutual exclusion to avoid semantic inconsistencies.

Traditionally, multi-threaded programs have been developed using locks to address these issues. But programming with locks has many disadvantages, such as deadlocks and priority inversion, and makes it difficult to build scalable software systems. Importantly, lock-based software components are difficult to compose, i.e., to build larger software systems from simpler software components. In recent years, Software Transactional Memory (STM) has garnered significant interest as an elegant alternative for developing concurrent code, and software transactions address many of these shortcomings of lock-based systems.

In the past few years, analytics over large data (also commonly referred to as Big-Data analytics) has become very important in various industries. Many problems in large data and data mining can be cast as large-scale graph analytics problems, typically involving operations on large graphs with millions of vertices and edges.

In this talk, I will give an overview of Software Transactional Memory (STM). I will describe the correctness requirements of STM systems and their current state. Then, I will briefly describe how STMs can benefit large-scale graph analytics, specifically the problem of coloring graphs with millions of vertices.

Speaker Bio:
Dr. Sathya Peri is an Associate Professor at IIT Hyderabad. He received his master's degree in Computer Science from Madurai Kamaraj University, India, in 2001. After working as a software engineer at HCL Technologies, India, for around one year, he joined the University of Texas at Dallas as a graduate student, where he obtained his Master's and Ph.D. degrees in Computer Science in 2004 and 2007, respectively. He worked under the guidance of Prof. Neeraj Mittal in the area of monitoring distributed systems. He then worked at INRIA, Rennes, France, as a postdoc from September 2007 to December 2008 in the area of gossip protocols for peer-to-peer systems. From January 2009 to April 2010, he was a postdoc at Memorial University, working with Prof. Vidyasankar in the area of Software Transactional Memory.

Host Faculty: Deepak D'Souza


Series: M.Sc. (Engg) Colloquium
Title: An Exploratory Framework for Cyclone Identification and Tracking

  • Speaker: Akash Anil Valsangkar
  • Faculty Advisor: Prof. Vijay Natarajan
  • Date and Time: Wednesday, May 17, 2017, 11:00 AM
  • Venue: CSA Multimedia Class (Room No. 252, First Floor)

Abstract
Analyzing depressions plays an important role in meteorology, particularly in the study of cyclones. In particular, the study of the temporal evolution of cyclones requires a robust depression tracking framework. Depressions and cyclones are not well-defined objects, and their shape and size characteristics change over time. Further, current extraction and tracking methods depend on many parameters and are often specialized for particular regions. In the visualization literature, most tracking algorithms use spatial analysis for feature identification and noise removal. With state-of-the-art methods, temporal noise leads to cluttered visualization.

We propose a pipeline for the identification and tracking of cyclones over time. Our method requires only a few intuitive parameters. Cyclone identification within each time step is based on well-established topological concepts, enabling robust tracking. Candidate tracks are computed from an optical flow field, clustered within a moving time window, and forwarded to a final tracking step. An exploratory framework helps in the study of cyclone movement and in identifying smooth, representative tracks. Multiple case studies demonstrate the effectiveness of the method in tracking cyclones in both the northern and southern hemispheres.


Series: M.Sc. (Engg) Thesis Defense
Title: Utilizing Worker Groups and Task Dependencies in Crowdsourcing

  • Speaker: Mr. Prakhar Ojha
  • Faculty Advisor: Dr. Partha Pratim Talukdar
  • Date and Time: Monday, May 08, 2017, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Crowdsourcing has evolved from solving simpler tasks, like image classification, to more complex tasks such as document editing, language translation, product design, etc. Unlike micro-tasks performed by a single worker, these complex tasks require a team of workers, and the group output is often the atomic unit of evaluation. If the task-requester is interested in making individual payments based on workers' respective efforts in the group, she will need a strategy to discriminate workers (who contribute positively to the group task) from idlers (who do not). This is a non-trivial problem, especially since only the group output is evaluated. In the first part of this talk, I shall demonstrate how ideas from Group Testing may be effectively used for this challenging problem. This is the first application of Group Testing to crowdsourcing. I shall also list open problems in this area.
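
As a taste of the group testing idea (the talk's actual scheme may differ), here is the classical COMP decoder on a made-up pooling design: a group's output is good iff it contains at least one contributor, and any worker appearing in a failed group is ruled out as a contributor.

```python
import numpy as np

def comp_decode(pools, outcomes):
    """COMP decoding: a worker appearing in any failed group cannot be a
    contributor; everyone else remains a candidate contributor.
    pools: (num_groups, num_workers) 0/1 membership matrix
    outcomes: (num_groups,) 1 if the group's output was good, else 0."""
    in_failed_group = pools[outcomes == 0].sum(axis=0) > 0
    return ~in_failed_group  # True = candidate contributor

# Toy example: worker 2 is the only contributor, so a group succeeds
# exactly when it contains worker 2.
pools = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 0, 1]])
truth = np.array([0, 0, 1, 0])
outcomes = (pools @ truth > 0).astype(int)
print(comp_decode(pools, outcomes))  # [False False  True False]
```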

In the second part of the talk, I shall focus on Relational Crowdsourcing (RelCrowd), a novel crowdsourcing paradigm where human intelligence tasks (HITs) are created by taking task inter-dependencies into account. I shall describe how this framework may be used in the evaluation of large knowledge graphs (KGs); we call the resulting system KGEval. Estimating the accuracy of automatically constructed KGs is a challenging problem due to their size and diversity. We shall show that the objective optimized by KGEval is submodular and NP-hard to optimize, thereby allowing for approximation algorithms with guarantees. Through experiments on real-world datasets, we demonstrate that KGEval is able to estimate KG accuracy more accurately than other competitive baselines, while requiring significantly fewer human evaluations.


Series: CSA Distinguished Lecture
Title: The Magnificent Seven

  • Speaker: Prof. Vivek Shripad Borkar
                   Institute Chair Professor
                   Department of Electrical Engineering
                   Indian Institute of Technology, Bombay
  • Date and Time: Thursday, May 04, 2017, 2:30 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
This talk will give an overview of restless bandits and then introduce, individually, seven restless bandits. The summary, in brief:

Bandits are a pest

if they don't get much rest,

so do relax a little

your constraints a la Whittle.

Then all that it takes

is a simple index.

Speaker Bio:
Prof. Vivek Borkar has made stellar contributions in the areas of stochastic control and optimization, reinforcement learning, random processes, and several other areas in control and optimization. He is a J.C. Bose National Fellow and a Fellow of IEEE, INSA, IASc, INAE, NASI, and TWAS. He is a recipient of several best paper awards and is widely known for many path-breaking fundamental results in the above areas.

Host Faculty: Prof. Y Narahari


Series: M.Sc. (Engg) Thesis Defense
Title: Stochastic Newton methods with enhanced Hessian estimation

  • Speaker: Mr. Danda Sai Koti Reddy
                   M.Sc (Engg.) student at CSA department
  • Faculty Advisor: Prof. Shalabh Bhatnagar
  • Date and Time: Thursday, May 04, 2017, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Optimization problems involving uncertainties are common in a variety of engineering disciplines, such as transportation systems, manufacturing, communication networks, healthcare and finance. The large number of input variables and the lack of a system model prohibit a precise analytical solution, and a viable alternative is to employ simulation-based optimization. The idea here is to simulate the stochastic system under consideration a number of times while updating the system parameters until a good enough solution is obtained.

Formally, given only noise-corrupted measurements of an objective function, we wish to find a parameter that minimizes the objective function. Iterative algorithms using statistical methods search the feasible region to improve upon the candidate parameter. Stochastic approximation algorithms are the best suited, most studied and most widely applied algorithms for finding solutions when the feasible region is a continuously valued set. One can use information on the gradient/Hessian of the objective to aid the search process. However, due to the lack of knowledge of the noise distribution, one needs to estimate the gradient/Hessian from noisy samples of the cost function obtained from simulation. Simple gradient search schemes take many iterations to converge to a local minimum and are heavily dependent on the choice of step-sizes. Stochastic Newton methods, on the other hand, can counter ill-conditioning of the objective function, as they incorporate second-order information into the stochastic updates, and are often more accurate than simple gradient search schemes.

We propose enhancements to the Hessian estimation scheme used in two recently proposed stochastic Newton methods, based on the ideas of random directions stochastic approximation (2RDSA) and simultaneous perturbation stochastic approximation with three function measurements (2SPSA-3), respectively. The proposed scheme reduces the error in the Hessian estimate by (i) incorporating a zero-mean feedback term and (ii) optimizing the step-sizes used in the Hessian recursion. We prove that both 2RDSA and 2SPSA-3 with our Hessian improvement procedure converge asymptotically to the true Hessian. The advantage of both 2RDSA and 2SPSA-3 is that they require only 75% of the per-iteration simulation cost of 2SPSA with improved Hessian estimation (2SPSA-IH). Numerical experiments show that 2RDSA-IH outperforms both 2SPSA-IH and 2RDSA without the improved Hessian estimation procedure.
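
For orientation, here is a minimal sketch of the simultaneous perturbation idea underlying these methods: two noisy function evaluations along one random direction estimate the entire gradient, and per-iteration Hessian estimates are smoothed by a weighted recursion. This shows the generic flavour only, not the thesis's 2RDSA-IH/2SPSA-IH scheme; `f` is a stand-in noisy objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_gradient(f, theta, c):
    """SPSA: two noisy evaluations along one random +/-1 direction
    estimate the whole gradient, whatever the dimension."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    return (f(theta + c * delta) - f(theta - c * delta)) / (2 * c) * (1.0 / delta)

def smoothed_hessian(H_prev, H_new, k):
    """Average per-iteration Hessian estimates; the thesis optimizes these
    weights rather than using plain 1/(k+1) averaging."""
    w = 1.0 / (k + 1)
    return (1 - w) * H_prev + w * H_new

# Stand-in noisy quadratic objective: f(x) = x^T A x + noise.
A = np.diag([1.0, 10.0])
f = lambda x: x @ A @ x + 0.01 * rng.standard_normal()
print(spsa_gradient(f, np.array([1.0, 1.0]), c=0.1))
```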


Series: Department Seminar
Title: Practical Support for Strong, Serializability-Based Memory Consistency

  • Speaker: Dr. Swarnendu Biswas
                   Postdoctoral fellow
                   working with Prof. Keshav Pingali, UT, Austin
  • Date and Time: Wednesday, May 03, 2017, 11:30 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Writing correct and scalable multithreaded programs using shared memory is hard because of improper synchronization, which leads to data races. Data races often indicate the presence of concurrency bugs and have led to several high-profile failures in the past, such as the Therac-25 accident, the 2003 Northeast blackout, and the 2012 glitch in NASDAQ Facebook share prices. Unfortunately, current shared-memory languages and systems provide weak end-to-end consistency guarantees in the presence of data races. This problem of unsatisfactory semantics for racy executions needs to be resolved, but efficient solutions have remained elusive even after years of research.

This talk describes our ongoing research on providing strong end-to-end memory consistency models for shared-memory languages and systems. I will present software-only and hardware-based approaches that provide serializability of synchronization-free regions for all program executions. A feature of our work is that it allows us to rethink the design of other system features, such as cache coherence. Our work advances the state of the art by making the designs practical and showing that strong memory consistency is achievable.

Speaker Bio:
Swarnendu Biswas is currently a postdoctoral fellow working with Prof. Keshav Pingali at UT Austin. He did his PhD with Dr. Michael Bond at The Ohio State University. His graduate research involved developing program analyses for detecting and ensuring concurrency correctness properties in multithreaded programs and memory models. He is currently working on approximate and high-performance computing. His work has been recognized with the ACM SRC Grand Finals 2016 Award, an OOPSLA Distinguished Paper Award and an OOPSLA Distinguished Artifact Award. Swarnendu received an MS degree from IIT Kharagpur and a BE from NIT Durgapur.

Host Faculty: Dr. Murali Krishna Ramanathan


Series: Department Seminar
Title: The Construction of Semidefinite Representation of Convex Set from its Projections

  • Speaker: Dr. Anusuya Ghosh
                   Post-doctoral fellow in IIM Bangalore
  • Date and Time: Tuesday, May 02, 2017, 10:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Construction methods for semidefinite representations of convex sets play a major role in recent research on modern convex optimization. Even when a convex set is known to be semidefinite representable, there is no general way to construct its semidefinite representation. This work deals with a method for constructing a semidefinite representation of a convex body when its projections onto lower-dimensional spaces are known. An algorithm is proposed that yields an approximate solution for such a construction, and we show that the approximate solution converges to the convex set. We also present a detailed comparison of our construction method with other methods available in the recent literature.

Speaker Bio:
Anusuya Ghosh is currently a post-doctoral fellow at IIM Bangalore. She received her PhD in Industrial Engineering and Operations Research from IIT Bombay. She also received the "Best Project Award" from the Department of Mathematics at IIT Kharagpur in 2009 (during her M.Sc.).

Host Faculty: Chiranjib Bhattacharyya


Series: M.Sc. (Engg) Colloquium
Title: Fast Actively Secure OT Extension for Short Secrets

  • Speaker: Mr Ajith S
                   M.Sc. (Engg) Student
                   Dept. of CSA
                   IISc
  • Faculty Advisor: Dr Arpita Patra
  • Date and Time: Monday, May 01, 2017, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Oblivious Transfer (OT) is one of the most fundamental cryptographic primitives, with widespread application in general secure multi-party computation (MPC) as well as in a number of tailored and special-purpose problems of interest such as private set intersection (PSI), private information retrieval (PIR) and contract signing, to name a few. Often the instantiations of OT require prohibitive communication and computation complexity. OT extension protocols were introduced to compute a very large number of OTs, referred to as extended OTs, at the cost of a small number of OTs, referred to as seed OTs.

We present a fast OT extension protocol for small secrets in the active setting. When used to produce 1-out-of-n OTs, our protocol outperforms all known actively secure OT extensions. It builds on the semi-honest secure extension protocol of Kolesnikov and Kumaresan of CRYPTO'13 (referred to as the KK13 protocol henceforth), which is the best-known OT extension for short secrets. At the heart of our protocol lies an efficient consistency-checking mechanism that relies on the linearity of Walsh-Hadamard (WH) codes. Asymptotically, our protocol adds a communication overhead of O(µ log κ) bits over the KK13 protocol, irrespective of the number of extended OTs, where κ and µ denote the computational and statistical security parameters, respectively. Concretely, when used to generate a large enough number of OTs, our protocol adds only 0.011-0.028% communication overhead and 4-6% runtime overhead, both in LAN and WAN, over the KK13 extension. The runtime overhead drops below 2% when, in addition, the number of inputs of the sender in the extended OTs is large enough.
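
To see the linearity that such a consistency check can rely on: the Walsh-Hadamard codeword of a k-bit message x is the 2^k-bit string whose a-th bit is the inner product <x, a> mod 2, so WH(x) XOR WH(y) = WH(x XOR y). A minimal sketch, illustrating only the code and not the protocol:

```python
def wh_encode(x: int, k: int) -> list:
    """Walsh-Hadamard codeword of a k-bit message x: for every a in
    {0,1}^k, the bit <x, a> mod 2 (parity of x AND a)."""
    return [bin(x & a).count("1") % 2 for a in range(2 ** k)]

k = 4
x, y = 0b1011, 0b0110
cx, cy = wh_encode(x, k), wh_encode(y, k)
# Linearity: bitwise XOR of two codewords is the codeword of the XOR.
assert [bx ^ by for bx, by in zip(cx, cy)] == wh_encode(x ^ y, k)
```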

As an application of our proposed extension protocol, we show that it can be used to obtain the most efficient PSI protocol secure against a malicious receiver and a semi-honest sender.


Series: M.Sc. (Engg) Colloquium
Title: A Study of Thompson Sampling Approach for Sleeping Multi-Armed Bandit Problems

  • Speaker: Mr. Aritra Chatterjee
                   M.Sc. (Engg) Student, CSA, IISc
  • Faculty Advisor: Prof. Y. Narahari
  • Date and Time: Friday, April 28, 2017, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online learning problems arising in modern applications, including Internet advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (S-MAB) problem is one such variant, where the subset of available arms varies with time. This study focuses on analyzing the efficacy of the Thompson Sampling algorithm for solving the S-MAB problem.

Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or, equivalently, to minimize the expected total regret), relative to the best fixed action in hindsight. In many settings, however, not all arms may be available in any given round. For example, in Internet advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to time zone differences. This implies that the algorithm needs to work only with the set of available arms. Further, unlike in the classical setting, the performance of an algorithm can no longer be evaluated relative to the best fixed arm in hindsight; the regret is now measured relative to the best "available" arm in hindsight. One possible way is to compute the overall regret as the worst-case regret over all possible sample paths of availabilities (that is, under adversarial selection of available arms).

In the literature, upper confidence bound (UCB)-based approaches have been proposed and investigated for the S-MAB problem. Our contribution is to investigate a Thompson Sampling based approach. Our key result establishes a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the S-MAB problem. Further, we show via detailed simulations that the Thompson Sampling approach in fact outperforms the known algorithms for the S-MAB problem.
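
For concreteness, a minimal sketch of Beta-Bernoulli Thompson Sampling restricted to the arms available in each round; `pull` and `available` are caller-supplied stand-ins, and the talk's exact algorithm and analysis may differ in details:

```python
import numpy as np

def sleeping_thompson(T, n_arms, pull, available, seed=0):
    """Thompson Sampling over the arms offered each round.
    pull(arm) -> 0/1 reward; available(t) -> indices of arms offered in
    round t (both are stand-ins supplied by the environment)."""
    rng = np.random.default_rng(seed)
    succ = np.ones(n_arms)   # Beta posterior: alpha (prior 1)
    fail = np.ones(n_arms)   # Beta posterior: beta (prior 1)
    for t in range(T):
        avail = np.asarray(available(t))
        theta = rng.beta(succ[avail], fail[avail])  # one draw per available arm
        arm = int(avail[np.argmax(theta)])          # best sampled available arm
        r = pull(arm)
        succ[arm] += r
        fail[arm] += 1 - r
    return succ, fail
```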


Series: Ph.D. Colloquium
Title: New Methods for Learning from Heterogeneous and Strategic Agents

  • Speaker: Divya Padmanabhan (PhD student)
  • Faculty Advisor: Prof. Shirish Shevade, Prof. Y. Narahari
  • Date and Time: Friday, April 21, 2017, 12:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
In this thesis, we address several problems concerned with learning from multiple heterogeneous agents. The problems are relevant to several applications today, such as crowdsourcing and sponsored search auctions. In scenarios such as crowdsourcing, the agents that provide the training data for the learning problems are noisy, and their noise levels are unknown. Any learning algorithm which relies on information provided by these agents must learn their qualities while simultaneously learning a model. Another challenge arises when agents are strategic: they could be strategic about the efforts they put into performing a task, and also about reporting the costs they incur. Learning algorithms suffer if incorrect information is provided by the agents, so it is necessary to ensure that the agents exhibit desirable behaviour. We solve the following problems that arise while learning from such agents.

(1) Multi-label Classification with Heterogeneous Noisy Agents

Multi-label classification is a well-known supervised machine learning problem where each instance is associated with multiple classes. Since several labels can be assigned to a single instance, one of the key challenges in this problem is to learn the correlations between the classes. We first assume labels from a perfect source and propose a novel topic model called Multi-Label Presence-Absence Latent Dirichlet Allocation (ML-PA-LDA). In the current era, a natural way to procure the training dataset is by mining user-generated content or directly through users on a crowdsourcing platform. In this more practical scenario of crowdsourcing, an additional challenge arises, as the labels of the training instances are provided by noisy, heterogeneous crowd-workers with unknown qualities. With this motivation, we further augment our topic model to the scenario where the labels are provided by multiple noisy sources, and refer to this model as ML-PA-LDA-MNS. With experiments on standard datasets, we show that the proposed models achieve superior performance over the state of the art.

(2) Active Linear Regression with Heterogeneous, Noisy and Strategic Agents

We study the problem of training a linear regression model by procuring labels from multiple noisy agents or crowd annotators, under a budget constraint. We propose a Bayesian model for linear regression from multiple noisy sources and use variational inference for parameter estimation. When labels are sought from agents, it is important to minimize the number of labels procured, as every call to an agent incurs a cost. Towards this, we adopt an active learning approach. In this specific context, we prove the equivalence of well-studied active learning criteria such as entropy minimization and expected error reduction. For annotator selection in active learning, we observe a useful connection with the multi-armed bandit framework. Due to the nature of the distribution of the rewards on the arms, we resort to the Robust Upper Confidence Bound (UCB) scheme with a truncated empirical mean estimator to solve the annotator selection problem. This yields provable guarantees on the regret. We further apply our model to the scenario where annotators are strategic, and design suitable incentives to induce them to put in their best efforts.



(3) Ranking with Heterogeneous Strategic Agents

We look at the problem where a planner must rank multiple strategic agents, a problem that has many applications including sponsored search auctions (SSA). Stochastic multi-armed bandit (MAB) mechanisms have been used to solve this problem. Existing stochastic MAB mechanisms with a deterministic payment rule, proposed in the literature, necessarily suffer a regret of Ω(T^(2/3)), where T is the number of time steps. This happens because the existing mechanisms consider the worst-case scenario, where the means of the agents' stochastic rewards are separated by a very small amount that depends on T. Our idea is to allow the planner to indicate the resolution, Δ, with which the agents must be distinguished. This immediately leads us to introduce the notion of Δ-regret. We propose a dominant strategy incentive compatible (DSIC) and individually rational (IR) deterministic MAB mechanism, based on ideas from the Upper Confidence Bound (UCB) family of MAB algorithms. The proposed mechanism, Δ-UCB, achieves a Δ-regret of O(log T). We first establish the results for single-slot SSA and then non-trivially extend the results to the case where multiple slots are to be allocated.
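
As a rough illustration of the Δ-regret idea only (the actual Δ-UCB mechanism also handles payments and incentive compatibility), here is a UCB-style elimination sketch that stops separating arms once they are indistinguishable at resolution Δ:

```python
import numpy as np

def delta_elimination(pull, n_arms, delta, horizon):
    """Round-robin play with UCB-style elimination at resolution delta:
    drop an arm once its UCB falls below (best LCB + delta); the empirical
    leader is always kept and is provably delta-optimal once widths shrink.
    A sketch of the idea only, not the DSIC mechanism from the talk."""
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    active = list(range(n_arms))
    for t in range(horizon):
        arm = active[t % len(active)]
        sums[arm] += pull(arm)
        counts[arm] += 1
        if not (counts[active] > 0).all():
            continue  # play every active arm once before eliminating
        means = sums[active] / counts[active]
        conf = np.sqrt(2.0 * np.log(horizon) / counts[active])
        lead = int(np.argmax(means - conf))
        best_lcb = means[lead] - conf[lead]
        active = [a for j, a in enumerate(active)
                  if j == lead or means[j] + conf[j] >= best_lcb + delta]
    return active
```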


Series: M.Sc. (Engg) Colloquium
Title: An improved lower bound for multi-r-ic depth four circuits as a function of the number of input variables

  • Speaker: Mr. Sumant Hegde
                   M.Sc. (Engg) student at CSA department
  • Faculty Advisor: Dr. Chandan Saha
  • Date and Time: Friday, April 21, 2017, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
In this work we study the multi-r-ic formula model introduced by Kayal and Saha (2015) and improve upon the lower bound for multi-r-ic depth four circuits given in the work by Kayal, Saha and Tavenas (2016) [KST16], when viewed as a function of the number of input variables N. The improvement leads to superpolynomial lower bounds for values of r significantly higher than what is known from prior works.

A (syntactically) multi-r-ic formula is an arithmetic formula in which the formal degree with respect to every variable is at most r at every gate. The formal degree of an input gate with respect to a variable x is defined to be 1 if the gate is labelled with x and 0 if it is labelled with a field element or a different variable. The formal degree of a sum (respectively, product) gate with respect to x is defined as the maximum (respectively, sum) of the formal degrees of its children with respect to x. A multi-r-ic formula computes a polynomial with individual degree of every variable bounded by r.
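
To make the definition concrete, here is a small sketch that evaluates formal degree over a formula represented as nested tuples (a hypothetical encoding, chosen just for illustration):

```python
def formal_degree(gate, var):
    """Formal degree w.r.t. `var`: 1/0 at leaves; max over children at a
    sum gate; sum over children at a product gate."""
    kind = gate[0]
    if kind == 'var':
        return 1 if gate[1] == var else 0
    if kind == 'const':
        return 0
    child_degs = [formal_degree(g, var) for g in gate[1]]
    return max(child_degs) if kind == '+' else sum(child_degs)

# x*x*y + x has formal degree 2 in x: a multi-2-ic formula.
f = ('+', [('*', [('var', 'x'), ('var', 'x'), ('var', 'y')]), ('var', 'x')])
print(formal_degree(f, 'x'), formal_degree(f, 'y'))  # 2 1
```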

Multi-r-ic formulas are a natural extension of the relatively well-studied multilinear formulas of Raz (2009) and Raz and Yehudayoff (2009) [RY09]. In this work, we focus on multi-r-ic formulas that compute multilinear polynomials. They are interesting because they allow the formal degree of the formula to be as high as r times the number of underlying variables. This gives extra room for ‘clever’ cancellations of the high-degree components inside the formula, thereby making this type of formula harder to analyse (as formula homogenization is not known to be doable without blowing up the size superpolynomially unless the degree is very small, as Raz (2010) showed). Most lower bound proofs in the literature operate under the restriction of low formal degree or multilinearity (as in the works of Raz (2009), [RY09], Kayal, Saha and Saptharishi (2014) and Kayal, Limaye, Saha and Srinivasan (2014)). In this light, multi-r-ic formulas computing multilinear polynomials form a reasonable intermediate model to study in order to gain some insight on how to deal with high formal degree in general formulas. Another motivation for understanding the high formal degree case better (even at depth three) comes from the depth reduction result of Gupta, Kamat, Kayal and Saptharishi (2013).

With the aim of making progress on multi-r-ic formula lower bounds, [KST16] gave a (N/(d·r^2))^Ω(√(d/r)) lower bound for multi-r-ic depth four formulas computing the N-variate Iterated Matrix Multiplication (IMM) polynomial of degree d. As a function of N, the lower bound is at most 2^Ω(√(N/r^3)), attained when d = Θ(N/r^2). In this thesis, our focus is on bringing multi-r-ic depth four formulas with larger r into the arena of models that provenly admit a superpolynomial lower bound. In [KST16], r can be at most N^(1/3) for the bound to remain superpolynomial. Our result (stated below) gives a superpolynomial lower bound for multi-r-ic depth four formulas where r can be as high as (N·log N)^(0.9).

Theorem: Let N, d, r be positive integers such that 0.51·N ≤ d ≤ 0.9·N and r ≤ (N·log N)^(0.9). Then there is an explicit N-variate degree-d multilinear polynomial in VNP such that any multi-r-ic depth four circuit computing it has size 2^Ω(√(N·(log N)/r)).

The theorem yields a better lower bound than that of [KST16], when viewed as a function of N. Also, the bound matches the best known lower bound (as a function of N) for multilinear (r = 1) depth four circuits, given by [RY09], which is 2^Ω(√(N·log N)).

The improvement is obtained by analysing the shifted partials dimension (SPD) of an N-variate polynomial in VNP (as opposed to a VP polynomial in [KST16]) in the high degree range of Θ(N), and comparing it with the SPD of a depth four multi-r-ic circuit. In [KST16], a variant of shifted partials, called shifted skewed partials, is critically used to analyse the IMM polynomial (which is in VP). We observe that SPD (without “skew”) is still effective for the Nisan-Wigderson polynomial (which is in VNP), and yields a better lower bound as a function of N when the degree d is naturally chosen to be high.

Our analysis gives a better range for r and a better lower bound in the high degree regime, not only for depth four multi-r-ic circuits but also for the weaker models: multi-r-ic depth three circuits and multi-r-ic depth four circuits with low bottom support. These (weaker) models are instrumental in gaining insight into general depth four multi-r-ic circuits, both in [KST16] and in our work.


Series: Ph.D. Thesis Defense
Title: Scalable Sparse Bayesian Nonparametric and Matrix Tri-factorization Models for Text Mining applications

  • Speaker: Ranganath B. N
  • Faculty Advisor: Prof. Shalabh Bhatnagar
  • Date and Time: Tuesday, April 18, 2017, 2:00 PM
  • Venue: CSA Conference Room (Room No. 225, First Floor)

Abstract
Hierarchical Bayesian models and matrix factorization methods provide an unsupervised way to learn the latent components of data from grouped or sequence data. For example, in document data, a latent component corresponds to a topic, with each topic a distribution over a finite vocabulary of words. For many applications, there exist sparse relationships between the domain entities and the latent components of the data. Traditional approaches to topic modeling do not take these sparsity considerations into account. Modeling these sparse relationships helps in extracting relevant information, leading to improvements in topic accuracy and scalable solutions. In this thesis, we explore these sparsity relationships for different applications, such as text segmentation, topical analysis and entity resolution in dyadic data, through Bayesian and matrix tri-factorization approaches, proposing scalable solutions.

In our first work, we address the problem of segmenting a collection of sequence data, such as documents, using probabilistic models. Existing state-of-the-art hierarchical Bayesian models are connected to the notion of Complete Exchangeability or Markov Exchangeability. Bayesian nonparametric models based on Markov Exchangeability, such as HDP-HMM and Sticky HDP-HMM, allow only very restricted permutations of the latent variables in grouped data (topics in documents), which in turn leads to computational challenges for inference. At the other extreme, models based on Complete Exchangeability, such as HDP, allow arbitrary permutations within each group or document, and inference is significantly more tractable as a result, but segmentation is not meaningful using such models. To overcome these problems, we explore a new notion of exchangeability, Block Exchangeability, that lies between Markov Exchangeability and Complete Exchangeability, for which segmentation is meaningful but inference is computationally less expensive than for both Markov and Complete Exchangeability. Parametrically, Block Exchangeability contains a sparser number of transition parameters, linear in the number of states, compared to the quadratic order for Markov Exchangeability, which in turn is less than that for Complete Exchangeability, whose parameters are on the order of the number of documents. We propose a nonparametric Block Exchangeable Model (BEM) based on this new notion, which we show to be a superclass of Complete Exchangeability and a subclass of Markov Exchangeability. We also propose a scalable inference algorithm for BEM that infers the topics for words and the segment boundaries associated with topics in a document, using a collapsed Gibbs sampling procedure. Empirical results show that BEM outperforms state-of-the-art nonparametric models in terms of scalability and generalization ability, with nearly the same segmentation quality on a news dataset, a product review dataset and a synthetic dataset. Interestingly, the scalability can be tuned by varying the block size through a model parameter, for a small trade-off in segmentation quality.

In addition to exploring the association between documents and words, we also explore the sparse relationships for dyadic data, where associations between one pair of domain entities such as (documents, words) and associations between another pair such as (documents, users) are completely observed. We motivate the analysis of such dyadic data introducing an additional discrete dimension, which we call topics, and explore sparse relationships between the domain entities and the topic, such as of user-topic and document-topic respectively.

In our second work, for this problem of sparse topical analysis of dyadic data, we propose a formulation using sparse matrix tri-factorization. This formulation requires sparsity constraints not only on the individual factor matrices, but also on the product of two of the factors. To the best of our knowledge, this problem of sparse matrix tri-factorization has not been studied before. We propose a solution that introduces a surrogate for the product of factors and enforces sparsity on this surrogate, as well as on the individual factors, through L1-regularization. The resulting optimization problem is efficiently solvable in an alternating minimization framework over sub-problems involving individual factors, using the well-known FISTA algorithm. For the sub-problems that are constrained, we use a projected variant of the FISTA algorithm.
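
For reference, here is a minimal sketch of the FISTA iteration on an L1-regularized least-squares problem; the thesis's sub-problems share this general shape but differ in their exact objectives and constraints:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, n_iter=500):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum step
        x, t = x_new, t_new
    return x
```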

We also show that our formulation leads to independent sub-problems for solving each factor matrix, thereby supporting a parallel implementation and leading to a scalable solution. We perform experiments on bibliographic and product review data to show that the proposed sparse tri-factorization framework results in better generalization ability and factorization accuracy compared to baselines that use sparse bi-factorization.

Even though the second work performs sparse topical analysis of dyadic data, finding sparse topical associations for the users, user references with different names could belong to the same entity, and those with the same name could belong to different entities. This problem of entity resolution, where the goal is to identify the real users associated with the user references in the documents, is widely studied in the research community.

Finally, we focus on the problem of entity resolution in dyadic data, where associations between one pair of domain entities, such as documents-words, and associations between another pair, such as documents-users, are observed; bibliographic data is one example.

In our final work, for this problem of entity resolution in bibliographic data, we propose a Bayesian nonparametric Sparse Entity Resolution Model (SERM), exploring the sparse relationships between the grouped data, involving a grouping of the documents, and the topics/author entities in the group. Further, we also exploit the sparseness between an author entity and the associated author aliases. Grouping of the documents is achieved with the stick-breaking prior for Dirichlet processes (DP). To achieve sparseness, we propose a solution that introduces separate Indian Buffet Process (IBP) priors over topics and author entities for the groups, and a k-NN mechanism for selecting author aliases for the author entities.

We propose a scalable inference procedure for SERM by appropriately combining the partially collapsed Gibbs sampling scheme of the Focussed Topic Model (FTM), the inference scheme used for the parametric IBP prior, and the k-NN mechanism. We perform experiments on the bibliographic datasets Citeseer and Rexa to show that the proposed SERM model improves the accuracy of entity resolution by finding relevant author entities through modeling sparse relationships, and is scalable when compared to the state-of-the-art baseline.


Series: Department Seminar
Title: Identifying groups of strongly correlated variables through Smoothed Ordered Weighted l1-norms

  • Speaker: Mr. Raman Sankaran
                   PhD student, CSA
  • Faculty Advisor: Prof. Chiranjib Bhattacharyya
  • Date and Time: Thursday, April 13, 2017, 3:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
The failure of LASSO to identify groups of correlated predictors in linear regression has sparked significant research interest. Recently, various norms [1, 2] were proposed, which can be best described as instances of ordered weighted l1 norms (OWL) [3], as an alternative to the l1 regularization used in LASSO. OWL can identify groups of correlated variables but it forces the model to be constant within a group. This artifact induces unnecessary bias in the model estimation. In this paper we take a submodular perspective and show that OWL can be posed as the Lovász extension of a suitably defined submodular function. The submodular perspective not only explains the group-wise constant behavior of OWL, but also suggests alternatives. The main contribution of this paper is smoothed OWL (SOWL), a new family of norms, which not only identifies the groups but also allows the model to be flexible inside a group. We establish several algorithmic and theoretical properties of SOWL, including group identification and model consistency. We also provide algorithmic tools to compute the SOWL norm and its proximal operator, whose computational complexity (O(d log d)) is significantly better than that of general purpose solvers (O(d^2 log d)). In our experiments, SOWL compares favorably with OWL in the regimes of interest.
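
For context, the OWL norm itself is simple to evaluate: sort the absolute entries in non-increasing order and take a weighted sum against a non-increasing weight vector. A minimal sketch of this standard definition (not the paper's SOWL smoothing):

```python
import numpy as np

def owl_norm(w, lam):
    """OWL(w) = sum_i lam_i * |w|_(i), with |w|_(1) >= |w|_(2) >= ... and
    lam non-increasing. lam = (1,...,1) recovers the l1 norm;
    lam = (1,0,...,0) recovers the l-infinity norm."""
    return float(np.dot(np.sort(np.abs(w))[::-1], lam))

w = np.array([0.5, -2.0, 1.0])
print(owl_norm(w, np.array([3.0, 2.0, 1.0])))  # 3*2.0 + 2*1.0 + 1*0.5 = 8.5
```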

Speaker Bio:
Raman Sankaran is a PhD student in the Department of CSA. This is a practice talk for AISTATS.


Series: M.Sc. (Engg) Colloquium
Title: Supervised Classification of Missense Mutations as Pathogenic or Tolerated using Ensemble Learning Methods

  • Speaker: Ms. Rashmi Balasubramanyam
                   M.Sc Student
  • Faculty Advisor: Prof Chiranjib Bhattacharyya
  • Date and Time: Thursday, April 06, 2017, 9:30 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Missense mutations account for more than 50% of the mutations known to be involved in human inherited diseases. Missense classification is a challenging task that involves sequencing the genome, identifying the variations, and assessing their deleteriousness. This is a laborious, time- and cost-intensive task to carry out in the laboratory. Advancements in bioinformatics have led to several large-scale next-generation genome sequencing projects, and subsequently to the identification of genome variations. Several studies have combined this data with information on established deleterious and neutral variants to develop machine-learning-based classifiers.

There are significant issues with existing missense classifiers, due to which missense classification is still an open area of research. These issues fall under two broad categories. (a) Dataset overlap: the performance estimates reported by the state-of-the-art classifiers are overly optimistic, as they have often been evaluated on datasets that overlap significantly with their training datasets. Moreover, there is no comparative analysis of these tools on a common benchmark dataset that has no overlap with the training datasets, making it impossible to identify the best classifier among them; indeed, no such common benchmark dataset has been available. (b) Inadequate capture of vital biological information about the protein and the mutations, such as the conservation of long-range amino acid dependencies and changes in certain physico-chemical properties of the wild-type and mutant amino acids due to the mutation. It is also not clear how to extract and use this information, and some classifiers use structural information that is not available for all proteins.

In this study, we compiled a new dataset containing 18,036 mutations in 5,642 proteins, with only around 2-15% overlap with the popularly used training datasets. We reviewed and evaluated 15 state-of-the-art missense classifiers - SIFT, PANTHER, PROVEAN, PhD-SNP, Mutation Assessor, FATHMM, SNPs&GO, SNPs&GO-3D, nsSNPAnalyzer, PolyPhen-2, SNAP, MutPred, PON-P2, CONDEL and MetaSNP - using six metrics: accuracy, sensitivity, specificity, precision, NPV and MCC. When evaluated on our dataset, these tools show huge performance drops from what has been claimed: average drops of around 15% in accuracy, 17% in sensitivity, 14% in specificity, 7% in NPV, 24% in precision and 30% in MCC. With this we show that the performance of these tools is not consistent across datasets, and thus not reliable for practical use in a clinical setting.
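
For reference, the six metrics in terms of binary confusion-matrix counts (standard definitions, not the study's evaluation code; the example counts are made up):

```python
import math

def metrics(tp, fp, tn, fn):
    """The six metrics used above, from confusion-matrix counts
    (pathogenic = positive class)."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on pathogenic
        "specificity": tn / (tn + fp),   # recall on tolerated
        "precision":   tp / (tp + fp),
        "NPV":         tn / (tn + fn),
        "MCC":         (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }

print(metrics(tp=80, fp=20, tn=70, fn=30))
```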

As we observed that the performance of the existing classifiers is poor in general, we attempted to develop a new classifier that is robust, performs consistently across datasets, and improves on the state-of-the-art classifiers. We developed a novel method of capturing the conservation of long-range amino acid dependencies by boosting the conservation frequencies of substrings of amino acids of various lengths around the mutation position, using the AdaBoost learning algorithm. This score alone performed on par with the sequence-conservation-based tools in classifying missense mutations. We then combined popularly used sequence conservation properties with these boosted long-range dependency conservation scores using the AdaBoost algorithm; this reduced the class bias and improved the overall accuracy of the classifier. We trained a third classifier by additionally incorporating the changes in 21 important physico-chemical properties due to the mutation, upon which the overall performance further improved and the class bias further reduced. The performance of our final classifier is comparable with the state-of-the-art classifiers: we did not find any significant improvement, but the class-specific accuracies and precisions are marginally better, by around 1-2%, than those of the existing classifiers. To understand our classifier better, we dissected our benchmark dataset into (a) seen and unseen proteins, and (b) pure and mixed proteins, and analyzed the performance in detail. We conclude that our classifier performs consistently across each of these categories of seen, unseen, pure and mixed proteins.


Series: Ph.D. Thesis Defense
Title: Program Repair by Automated Generation of Hints

  • Speaker: Ms. Shalini Kaleeswaran
                   Ph.D student
  • Faculty Advisor: Prof. Aditya Kanade
  • Date and Time: Wednesday, April 05, 2017, 12:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Programming has become an important skill in today's technology-driven world. It is a complex activity, and programmers make mistakes in their software. Student programmers make mistakes in their programs due to lack of programming experience and lack of understanding of the algorithmic concepts being used. An experienced programmer in industry, on the other hand, makes mistakes in the software he develops because of the sheer complexity of industry-grade software. Debugging one's programs, i.e., finding and fixing faults, is one of the most challenging activities for a programmer. This necessitates the development of automated techniques that are effective in assisting programmers in repairing faults in their programs.

In this thesis, we present new computer-assisted approaches to help programmers repair their programs. Finding a repair for a fault in a program is essentially a search over the space of all possible repairs. Traditional repair techniques output a complete repaired program that eliminates the fault in the original program. This ambitious goal of finding a complete repair is often unachievable because the search space becomes too large to navigate efficiently. The key insight of our work is that programmers only need guidelines that help them understand the faults they made and suggestions on how to fix them, as opposed to a complete repair. This thesis therefore proposes a novel perspective on the program repair problem, where we generate textual hints that direct the programmer on how to fix the fault. A hint specifies which part of the program needs modification as well as how to modify it. For student programmers, our hints help them reflect on the mistakes they made and improve their understanding of the problem being solved in the program. For experienced programmers, our hints help them easily identify the cause of a fault and guide them towards a repair. Our approaches use novel combinations of syntactic, symbolic and statistical reasoning techniques to achieve this goal of semi-automated program repair.

Our first contribution is an algorithmic technique for the automated synthesis of repair hints. Our technique takes as input a faulty program and a set of test cases, and outputs a ranked list of repair hints that suggest how to rectify a faulty statement in the program. In this technique, we leverage symbolic execution and statistical correlation analysis to identify expressions that are likely to occur in the repaired code. We then use syntactic pattern matching to combine these expressions algorithmically and generate repair hints. Since we restrict ourselves to finding "building blocks" of the repair, our hints are very effective in guiding the programmer to the final repair, even in scenarios where a complete repair cannot be obtained. We implemented this technique in a tool called "MintHint", which performs automated synthesis of repair hints for C programs. We evaluated the effectiveness of MintHint in a user study and found that programmers who used MintHint were able to repair more faults than those who did not, and obtained the repair 5.8 times faster.

Our next contribution is a semi-supervised feedback generation methodology for student programming assignments. We take a set of student submissions as input and generate personalized, verified feedback for each submission that suggests modifications to fix faults in the program, if any. Student submissions are first clustered by solution strategy, which is followed by an automated feedback generation phase. We use unsupervised clustering to reduce the burden on the instructor of providing reference implementations for each possible solution strategy. The instructor is called upon simply to identify a correct submission in each cluster, which acts as the reference implementation for all submissions in the cluster. In the next phase, we use a novel counter-example guided feedback generation algorithm that compares a student submission with a reference implementation to generate verified feedback. We applied this methodology to programs implementing iterative dynamic programming algorithms and developed a tool called "DPAssist" that works on C programs. We evaluated DPAssist on a dataset of 2226 student submissions to 4 programming problems from the programming contest site CodeChef and successfully generated verified feedback for 85% of them.

Copyright: CSA, IISc 2017 | Phone: +91-80-22932368 | Fax: +91-80-23602911