Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India

UPCOMING SEMINARS

Series: Research Student Colloquium
Title: Data Races and Static Analysis for Interrupt-Driven Kernels

  • Speaker: Ms. Nikita Chopra
                   M.Sc.(Engg.) student
                   Dept. of CSA
  • Date and Time: Friday, July 27, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Embedded software is widespread and increasingly employed in safety-critical applications in medical, automobile, and aerospace domains. In this work, we consider a class of interrupt-driven programs that model the kernel API libraries of some popular real-time embedded operating systems and the synchronization mechanisms they use. We define a natural notion of data races and a happens-before ordering for such programs. The key insight is the notion of disjoint blocks to define the synchronizes-with relation. This notion also suggests an efficient and effective lockset based analysis for race detection. It also enables us to define efficient "sync-CFG" based static analyses for such programs, which exploit data-race freedom. We use this theory to carry out static analysis on the FreeRTOS kernel library to detect races and to infer simple relational invariants on key kernel variables and data-structures.
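For illustration, below is a minimal, hypothetical Python sketch of a lockset-style race check (an Eraser-like analysis over locks). The talk's analysis instead builds the synchronizes-with relation from disjoint blocks, so treat this only as an analogy for the lockset idea it mentions.

```python
# Hypothetical Eraser-style lockset sketch: a variable is flagged as potentially
# racy when the intersection of locks held across all of its accesses becomes
# empty. The analysis in the talk uses disjoint blocks rather than plain locks;
# this is only an illustration of the lockset idea.

def lockset_analysis(accesses):
    """accesses: iterable of (variable, set_of_locks_held_at_access) pairs."""
    candidate = {}   # variable -> locks that protected every access seen so far
    races = set()
    for var, held in accesses:
        if var not in candidate:
            candidate[var] = set(held)
        else:
            candidate[var] &= held   # keep only locks held on *every* access
        if not candidate[var]:
            races.add(var)           # no common lock protects var -> potential race
    return races

# 'count' is accessed once without 'q_lock', so it is reported.
print(lockset_analysis([("count", {"q_lock"}), ("count", set()), ("head", {"q_lock"})]))
```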

Speaker Bio:
Nikita Chopra is an M.Sc.(Engg.) student in the CSA department. She is working in Programming Languages Lab under the supervision of Prof. Deepak D'Souza. Her broad research interests lie in the area of static program analysis.

Host Faculty: Prof. Sunil L Chandran & Prof. Shalabh Bhatnagar


Series: Ph.D. Colloquium
Title: Design of a Large-Scale Distributed File System for Peer-to-Peer Networks

  • Speaker: Mr. Hrishikesh Dewan
                   PhD(ERP)
  • Faculty Advisor: Prof. R. C. Hansdah and Dr. Girish Suryanarayana (Orgn)
  • Date and Time: Wednesday, July 25, 2018, 3:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
We present, in this thesis, a distributed file storage system called Julunga. Julunga is designed for peer-to-peer systems supporting federated data centre environments, such as cloud computing centres, as well as individual nodes. In Julunga, the metadata, the namespace and the data blocks of files are entirely distributed, with no hard limit on the number of files that can be accommodated in a single directory and no physical limit on the size of a file. Julunga easily supports files of exabyte size, as well as large numbers of concurrent users updating, reading and writing the same file or directory. The location of a file's data blocks is determined using functions, eliminating the need for file allocation tables. Many new data structures and algorithms were designed that take the locality and preferences of users into account to provide optimal storage locations for files and file system metadata. Julunga is built on top of a new overlay network named 'Afuronto' that is suited for large-scale peer-to-peer networks requiring fast message transmission, and we use operators drawn from evolutionary computing to emulate a semi-structured network. We present the network structure and the various algorithms required for essential communication and maintenance. We also study the proposed network by simulation and compare its performance with existing overlay networks. The simulation results show that, on average, at most six hops are required for a message originating at any node in the network to reach any other node. The network so designed is scalable, resilient to failures, and can include nodes of arbitrary processing capacity, network bandwidth, and storage.
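As an aside, the idea of computing block locations with a function instead of a file allocation table can be sketched with consistent hashing; the ring construction, hash choice and names below are hypothetical and are not Julunga's actual placement scheme (which also weighs locality and user preferences).

```python
import hashlib
from bisect import bisect_right

# Hypothetical sketch: computing a block's location with a function (consistent
# hashing over a ring of virtual nodes) instead of a file-allocation table.
# The node names, virtual-node count and hash below are made up for illustration.

def build_ring(nodes, vnodes=64):
    points = []
    for node in nodes:
        for i in range(vnodes):
            h = int(hashlib.sha1(f"{node}#{i}".encode()).hexdigest(), 16)
            points.append((h, node))
    return sorted(points)

def locate_block(ring, file_id, block_index):
    """Return the node responsible for block `block_index` of file `file_id`."""
    key = int(hashlib.sha1(f"{file_id}:{block_index}".encode()).hexdigest(), 16)
    keys = [h for h, _ in ring]
    return ring[bisect_right(keys, key) % len(ring)][1]

ring = build_ring(["node-a", "node-b", "node-c"])
print(locate_block(ring, "movie.mkv", 0), locate_block(ring, "movie.mkv", 1))
```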

Reaching consensus is an essential problem in the design of distributed systems. To date, many algorithms for achieving consensus in a distributed system have been proposed; however, only a few of them work well in practical implementations. For Julunga, we have designed and used a new consensus algorithm called Majuli. Majuli is designed to be easier for system developers to understand, and it resembles the natural techniques people use to reach agreement when acting co-operatively. We show the safety and liveness properties of the proposed consensus algorithm and demonstrate its usefulness in a few distributed algorithms used in Julunga that require agreement or consensus for their correct operation.

Finally, we describe a new data structure called Hash-Trie Tree that we have developed to parse and locate REST-based services. The implementations of the overlay network and the file system are exposed as REST-based services over the HTTP(S) protocol. The Hash-Trie Tree is used in our custom HTTP server, written in C#, which performs favourably when compared with other REST-based HTTP servers such as Windows Communication Foundation and Jersey. We show how our implementation supports REST-based service declarations, organizes the various methods for faster access, and scales to support thousands of concurrent users.
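The name Hash-Trie Tree suggests a trie keyed on path segments; the following is only a rough Python sketch of that general routing idea, not the C# data structure described in the thesis.

```python
# Minimal sketch of routing REST paths with a trie keyed on path segments.
# This only illustrates the general idea suggested by the name "Hash-Trie Tree";
# the actual data structure in the thesis may differ substantially.

class RouteTrie:
    def __init__(self):
        self.children = {}    # path segment -> RouteTrie
        self.handler = None

    def add(self, path, handler):
        node = self
        for seg in path.strip("/").split("/"):
            node = node.children.setdefault(seg, RouteTrie())
        node.handler = handler

    def lookup(self, path):
        node = self
        for seg in path.strip("/").split("/"):
            node = node.children.get(seg)
            if node is None:
                return None
        return node.handler

routes = RouteTrie()
routes.add("/files/list", lambda: "listing")
print(routes.lookup("/files/list")())   # -> listing
```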


Series: M.Tech (Research) Colloquium
Title: Guarding Terrain Using k Watchtowers

  • Speaker: Nitesh Tripathi
                   Dept. of CSA
  • Faculty Advisor: Prof. Sunil L. Chandran
  • Date and Time: Tuesday, July 24, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
The discrete k-watchtower problem for a polyhedral terrain T in 3D with n vertices is to find k vertical segments, called watchtowers, of smallest height, whose bottom end-points (bases) lie on vertices of T, such that every point of T is visible from the top end-point (tip) of at least one of those watchtowers. In other words, we have to find k watchtowers whose bases lie on vertices of T, such that every point of T is visible from the tip of at least one watchtower and the maximum height among these k watchtowers is minimized. Zhu gave an algorithm for this problem when k = 1 with running time O(n log n). Agrawal et al. proposed a polynomial time algorithm using the parametric search technique for this problem with k = 2. Surprisingly, no result is known for the problem when k > 2. In this talk, we discuss our proposed algorithm to solve the k-watchtower problem for a fixed constant k. Our algorithm does not use parametric search and its running time is polynomial in n.

Speaker Bio:
Nitesh is a Masters student in the CSA department, working in the Theory Lab. He is advised by Prof. Sunil L. Chandran.


Series: Department Seminar
Title: On Randomization and Combinatorics in Computational Geometry, Discrete Mathematics, and Combinatorial Representation Theory

  • Speaker: Dr. Kunal Dutta
                   Postdoc Researcher
                   Data Shape group
                   Inria Sophia
                   France
  • Date and Time: Monday, July 23, 2018, 3:30 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
In this talk we shall see three very different areas of applications of combinatorics in computer science and mathematics, illustrating different flavours of combinatorial reasoning.

First, we consider lower bounds on the maximum size of an independent set, as well as the number of independent sets, in k-uniform hypergraphs, together with an extension to the maximum size of a subgraph of bounded degeneracy in a hypergraph. Joint works with C. R. Subramanian (IMSc, Chennai), Dhruv Mubayi (UIC, Chicago), Jeff Cooper (UIC, Chicago) and Arijit Ghosh (IMSc, Chennai).

Next, we shall look at Haussler's Packing Lemma from Computational Geometry and Machine Learning, for set systems of bounded VC dimension. We shall go through its generalization to the Shallow Packing Lemma for systems of shallow cell complexity, and see how it can be used to prove the existence of small representations of set systems, such as epsilon nets, M-nets, etc. Joint works with Arijit Ghosh (IMSc, Chennai), Nabil Mustafa (ESIEE Paris), Bruno Jartoux (ESIEE Paris) and Esther Ezra (Georgia Inst. Tech., Atlanta).

The last problem is on the decomposition, into irreducible representations, of the Weil representation of the full symplectic group associated to a finite module of odd order over a Dedekind domain. We shall discuss how a poset structure defined on the orbits of finite abelian p-groups under automorphisms can be used to show the decomposition of the Weil representation is multiplicity-free, as well as parametrize the irreducible subrepresentations, compute their dimensions in terms of p, etc. Joint works with Amritanshu Prasad (IMSc, Chennai).

Host Faculty: Prof. Sunil L Chandran


Series: Ph.D. Colloquium
Title: Problems on bend-number, circular separation dimension and maximum edge 2-colouring

  • Speaker: Mr. Abhiruk Lahiri
                   Ph.D Student
                   Dept. of CSA
  • Faculty Advisor: Prof. Sunil L Chandran
  • Date and Time: Monday, July 23, 2018, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Representation of graphs as the intersection graphs of geometric objects has a long history. The objective is to find a collection of "simple" sets S such that a given graph G is its intersection graph. We are interested in two types of intersection representations motivated by the VLSI circuit layout problem. In these models, vertices are represented by polygonal paths with alternating horizontal and vertical segments. The complexity of a path is measured by the number of bends it has. The objective is to minimise the maximum number of bends used by any path in a representation. This minimum number (over all possible representations) is called the bend number of the graph.

In the first model, two vertices share an edge if and only if the corresponding paths intersect. A graph that can be represented in such a way is called a VPG graph. We study a subclass of the planar graphs in this model. In the second model, two vertices of the graph share an edge if and only if the corresponding paths overlap on a segment of non-zero length. A graph that can be represented in such a way is called an EPG graph. For this model, we study Halin graphs (a subclass of the planar graphs), fully subdivided graphs, and minimally 2-connected graphs. Using one of these results, we show that optimization problems such as maximum independent set and minimum dominating set are APX-hard on 1-bend EPG graphs and their subclass called L-EPG graphs. We devise a constant-factor approximation algorithm for the maximum independent set problem on 1-bend EPG graphs that guarantees an approximation ratio of 3, improving the previously known ratio of 4.

In the second part, we study the notion of circular separation dimension, which was introduced recently by Douglas West. Formally, a pair of non-adjacent edges is said to be separated in a circular ordering of vertices if the endpoints of the two edges do not alternate in the ordering. The circular separation dimension (CSD) of a graph G is the minimum number of circular orderings of the vertices of G such that every pair of non-adjacent edges is separated in at least one of the circular orderings. We establish a new upper bound for CSD in terms of the chromatic number of the graph. We further study this question for special graph classes such as series-parallel graphs and two-outerplanar graphs.
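The separation condition in this definition is easy to check directly: two disjoint edges are separated in a circular ordering exactly when their endpoints do not alternate around the circle. A small illustrative Python check (not taken from the thesis):

```python
def separated(order, e1, e2):
    """order: list of vertices in circular order; e1, e2: disjoint edges (pairs).
    Returns True iff the endpoints of e1 and e2 do not alternate around the circle."""
    pos = {v: i for i, v in enumerate(order)}
    a, b = sorted((pos[e1[0]], pos[e1[1]]))
    # e1 and e2 alternate exactly when one endpoint of e2 lies strictly inside
    # the arc (a, b) and the other lies outside it.
    inside = [a < pos[v] < b for v in e2]
    return inside[0] == inside[1]

# In the ordering v1 v2 v3 v4, edges (v1,v2) and (v3,v4) are separated,
# while (v1,v3) and (v2,v4) alternate and are not.
order = ["v1", "v2", "v3", "v4"]
print(separated(order, ("v1", "v2"), ("v3", "v4")))   # True
print(separated(order, ("v1", "v3"), ("v2", "v4")))   # False
```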

In the final part, we study the maximum edge 2-colouring problem. For a graph G, the maximum edge 2-colouring problem seeks the maximum number of colours that can be used to colour the edges of the graph such that the edges incident on a vertex span at most two distinct colours. The problem is well studied in combinatorics, in the context of the anti-Ramsey number. Algorithmically, the problem is known to be NP-hard. It is also known that, assuming the Unique Games Conjecture, no polynomial time algorithm can approximate it to a factor better than 3/2. The obvious, and only known, algorithm assigns distinct colours to the edges of a maximum matching and a distinct colour to each remaining connected component. We establish an improved approximation bound of 8/5 for this algorithm on triangle-free graphs with a perfect matching.
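For concreteness, the matching-based algorithm described above can be sketched as follows (a rough Python illustration assuming the networkx library is available; the talk's contribution is the improved 8/5 analysis of this algorithm, not the code itself).

```python
import networkx as nx

# Sketch of the matching-based edge 2-colouring described above: one fresh colour
# per maximum-matching edge, then one fresh colour per connected component of the
# remaining edges. Every vertex sees at most its matching colour plus its
# component colour, so the constraint is respected.
def edge_2_colouring(G):
    matching = nx.max_weight_matching(G, maxcardinality=True)   # maximum matching
    colour, c = {}, 0
    for u, v in matching:
        colour[frozenset((u, v))] = c
        c += 1
    rest = nx.Graph([(u, v) for u, v in G.edges()
                     if frozenset((u, v)) not in colour])
    for comp in nx.connected_components(rest):
        for u, v in rest.subgraph(comp).edges():
            colour[frozenset((u, v))] = c
        c += 1
    return colour, c    # c = total number of colours used

G = nx.cycle_graph(5)
_, used = edge_2_colouring(G)
print(used)   # 4 on a 5-cycle: 2 matching colours + 2 component colours
```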


PAST SEMINARS

Series: CSA Faculty Colloquium
Title: Efficient Algorithms for Structured Prediction Problems

  • Speaker: Prof. Shirish K Shevade.
  • Date and Time: Friday, July 20, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
In many real-world prediction problems, the output is a structured object like a sequence or a tree or a graph. Such problems are referred to as structured prediction problems. Support vector machines (SVMs) have shown promise for solving these problems. Learning structural SVMs amounts to solving a convex quadratic program with a huge number of constraints. The number of constraints is typically exponential, which makes the problem intractable by general purpose optimization methods. In this talk, we discuss a fast sequential dual method for training structural SVMs. We also discuss a simple and efficient algorithm for solving a non-convex optimization problem for structured prediction in a semi-supervised setting.
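To illustrate why the exponential constraint set can be avoided, here is a toy Python/NumPy sketch of subgradient training of the structural SVM hinge loss, where loss-augmented inference picks only the single most violated output per example. This is a generic sketch on a tiny multiclass instance with made-up names and data; it is not the sequential dual method discussed in the talk.

```python
import numpy as np

# Toy subgradient training of a structural SVM with loss-augmented inference.
# The structured output here is just a class label with a one-hot joint feature;
# the point is that each step examines only the most violated output instead of
# enumerating the exponentially many constraints.

def joint_feature(x, y, K):
    phi = np.zeros(K * len(x))
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def most_violated(w, x, y, K):
    # argmax over outputs of loss(y, y') + w . phi(x, y')   (0/1 loss here)
    scores = [(y != yh) + w @ joint_feature(x, yh, K) for yh in range(K)]
    return int(np.argmax(scores))

def train(X, Y, K, lr=0.1, C=1.0, epochs=50):
    w = np.zeros(K * X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            yh = most_violated(w, x, y, K)
            # hinge active: take a subgradient step on that single constraint
            if (y != yh) + w @ joint_feature(x, yh, K) > w @ joint_feature(x, y, K):
                w += lr * C * (joint_feature(x, y, K) - joint_feature(x, yh, K))
            w *= 1.0 - lr / len(X)   # shrinkage for the quadratic regulariser
    return w

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([0, 1, 0])
w = train(X, Y, K=2)
print([int(np.argmax([w @ joint_feature(x, y, 2) for y in range(2)])) for x in X])
```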

Speaker Bio:
Shirish Shevade received his Ph.D. from the Indian Institute of Science, Bangalore, India, in 2002. He is currently an Associate Professor in the Department of Computer Science and Automation at the Indian Institute of Science. His research interests span many areas of Machine Learning such as Support Vector Machines, Gaussian Processes and semi-supervised learning. He is a Senior Member of IEEE.

Host Faculty: Prof. Sunil Chandran & Prof. Shalabh Bhatnagar


Series: M.Tech (Research) Thesis Defense
Title: Design of Trusted Market Platforms using Permissioned Blockchains and Game Theory

  • Speaker: Ms. Shivika Narang
                   M.Tech. (Research) Student
                   Dept. of CSA
                   IISc
  • Faculty Advisor: Prof. Y. Narahari
  • Date and Time: Thursday, July 19, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
The blockchain concept forms the backbone of a new wave technology that promises to be deployed extensively in a wide variety of industrial and societal applications. Governments, financial institutions, banks, industrial supply chains, service companies, and even educational institutions and hospitals are investing in a substantial manner in the hope of improving business efficiency and operational robustness through deployment of blockchain technology. This thesis work is concerned with designing trustworthy business-to-business (B2B) market platforms drawing upon blockchain technology and game theory.

The proposed platform is built upon three key ideas. First, we use permissioned blockchains with smart contracts as a technically sound approach for building the B2B platform. The blockchain deploys smart contracts that govern the interactions of enterprise buyers and sellers. Second, the smart contracts are designed using a rigorous analysis of a repeated game model of the strategic interactions between buyers and sellers. We show that such smart contracts induce honest behavior from buyers and sellers. Third, we embed cryptographic regulation protocols into the permissioned blockchain to ensure that business sensitive information is not revealed to the competitors. We believe our work is an important step in the direction of building a powerful B2B platform that maximizes social welfare and enables trusted collaboration between strategic enterprise agents.


Series: CSA Distinguished Lecture
Title: A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics

  • Speaker: Professor Moses Charikar
                   Donald E. Knuth Professor of Computer Science
                   Stanford University
  • Date and Time: Tuesday, July 17, 2018, 11:30 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
We study the Stochastic Gradient Langevin Dynamics (SGLD) algorithm for non-convex optimization. The algorithm performs stochastic gradient descent, where in each step it injects appropriately scaled Gaussian noise to the update. We analyze the algorithm’s hitting time to an arbitrary subset of the parameter space. Two results follow from our general theory: First, we prove that for empirical risk minimization, if the empirical risk is pointwise close to the (smooth) population risk, then the algorithm finds an approximate local minimum of the population risk in polynomial time, escaping suboptimal local minima that only exist in the empirical risk. Second, we show that SGLD gives an alternate approach and an improvement for learning linear classifiers under zero-one loss.
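A minimal NumPy sketch of the SGLD update follows: a (stochastic) gradient step plus Gaussian noise whose scale is tied to the step size; the inverse temperature name beta and the toy objective are illustrative assumptions.

```python
import numpy as np

# Minimal SGLD sketch: gradient step plus injected Gaussian noise scaled by the
# step size and an inverse temperature (beta). Names and the toy objective are
# illustrative; the talk analyzes hitting times of this kind of dynamics.

def sgld_step(w, grad, step, beta=1.0, rng=None):
    rng = rng or np.random.default_rng()
    noise = rng.normal(size=w.shape) * np.sqrt(2.0 * step / beta)
    return w - step * grad(w) + noise

# Toy run on f(w) = ||w||^2 / 2; with minibatches, grad would be a noisy estimate.
grad = lambda w: w
rng = np.random.default_rng(0)
w = np.ones(3)
for _ in range(2000):
    w = sgld_step(w, grad, step=0.01, rng=rng)
print(np.round(w, 2))   # wanders in a neighbourhood of the minimiser at the origin
```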

Joint work with Yuchen Zhang and Percy Liang. (COLT '17)

Speaker Bio:
Moses Charikar is the Donald E. Knuth professor of Computer Science at Stanford University. He obtained his PhD from Stanford in 2000, spent a year in the research group at Google, and was on the faculty at Princeton from 2001-2015. He is broadly interested in the design and analysis of algorithms with an emphasis on approximation algorithms for hard problems, metric embeddings and algorithmic techniques for big data. He won the best paper award at FOCS 2003 for his work on the impossibility of dimension reduction, the best paper award at COLT 2017 and the 10 year best paper award at VLDB 2017. He was jointly awarded the 2012 Paris Kanellakis Theory and Practice Award for his work on locality sensitive hashing, and was named a Simons Investigator in theoretical computer science in 2014.

Host Faculty: Dr. Anand Louis


Series: Department Seminar
Title: A Case for (Limited) Explicit Dataflow Graph Execution

  • Speaker: Dr. Gagan Gupta
                   Computer Architecture Researcher
                   Microsoft Research
                   USA
  • Date and Time: Tuesday, July 17, 2018, 10:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
As Moore’s law slows down, we must find new ways of scaling processor performance. Von Neumann ISAs have been successful because they provide a clean conceptual target to software while running the complete gamut of algorithms reasonably well. We badly need clean new abstractions that utilize fine-grain parallelism and run energy efficiently. Prior work (such as the UT-Austin TRIPS EDGE ISA and others) showed how to form blocks of computation containing limited-scope dataflow graphs, which can be thought of as small structures (DAGs) mapped to silicon. In this presentation I describe some post-TRIPS work that addresses the limitations of the original ISA, and how those extensions can provide energy-efficient execution for single threads compared to a conventional out-of-order superscalar design. I will also describe a specific microarchitecture based on these extensions, and show early results. Finally, I will describe how this notion of “structured computation” can be extended to form accelerators dynamically with minor changes to the CPU, or extended to synthesize efficient accelerators that are specialized to a specific workload.

Speaker Bio:
Gagan Gupta has worked in the semiconductor industry for over two decades. Presently he is a computer architecture researcher at Microsoft Research. He has led engineering and marketing teams to launch commercially successful processors at LSI Logic, Huawei, and ARC International, and influenced R&D strategies in the industry (e.g., at Intel and Sandisk). Gagan has a Ph.D. in Computer Sciences from the University of Wisconsin-Madison. His work has been recognized with awards at multiple industrial and academic fora, and is regularly cited in trade publications.

Host Faculty: Dr. Arkaprava Basu


Series: M.Sc. (Engg) Colloquium
Title: Optimizing Matrix Computations with PolyMage

  • Speaker: Ms. Kumudha K N
  • Faculty Advisor: Prof. Uday Kumar Reddy
  • Date and Time: Friday, July 13, 2018, 11:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Linear algebra computations and other arbitrary affine accesses are ubiquitous in applications from domains like scientific computing and digital signal processing (DSP). Libraries such as OpenBLAS, MKL, and FFTW provide efficient hand-optimized implementations of the matrix and vector primitives used in these domains for various architectures. Applications are then built upon these standard library routines to obtain high performance. However, these libraries do not perform well for all matrix sizes and obtain sub-optimal performance for small matrices. The interface of these libraries can also be fairly complex, requiring several input parameters, so an even higher level of abstraction is often desired to improve productivity. Further, by using these libraries, the opportunity to optimize across different library calls is lost, and traditional programming that exploits this locality through library functions becomes complex.

The work in this thesis proposes (i) a tile size selection model that works for arbitrary affine accesses and eschews auto-tuning, (ii) a simple heuristic to determine the profitability of mapping a computation to a library call, falling back to generated code otherwise, (iii) an intra-tile optimization technique that exposes inner-loop parallelism, enabling a general-purpose compiler's vectorizer to generate vector instructions, and (iv) a DSL approach with high-level primitives and functions that allows computations to be expressed efficiently. The proposed optimizations are implemented in PolyMage, a domain-specific language (DSL) for image processing pipelines whose optimizer already performs fusion, tiling, and loop optimizations for such pipelines. We extend PolyMage's compiler to support the above optimizations; it is thus able to optimize computations from additional domains, including dense linear algebra and certain DSP computations.
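As a rough illustration of the kind of tiling transformation involved (PolyMage emits tiled C code with vectorizable inner loops; this plain Python/NumPy sketch and its tile size are only for exposition):

```python
import numpy as np

# Plain-Python illustration of loop tiling: the i/j/k matrix-multiply loops are
# restructured to walk over T x T tiles so each tile of A, B and C stays in cache,
# and the intra-tile work is a contiguous block computation that a vectorizer can
# handle. The tile size T here is arbitrary, not a tuned value.

def tiled_matmul(A, B, T=32):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for ii in range(0, n, T):
        for jj in range(0, m, T):
            for kk in range(0, k, T):
                C[ii:ii+T, jj:jj+T] += A[ii:ii+T, kk:kk+T] @ B[kk:kk+T, jj:jj+T]
    return C

A, B = np.random.rand(100, 80), np.random.rand(80, 60)
print(np.allclose(tiled_matmul(A, B), A @ B))   # True
```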

We experimentally evaluate these optimizations on modern multicore systems using representative benchmarks from PolyBench, digital signal processing, image processing and neural networks. The results are compared to state-of-the-art optimization approaches and frameworks in each domain. Experiments on DSP benchmarks show that our optimizations achieve a mean speedup of 7.7x over the existing PolyMage optimizer, 5.1x over NumPy, and 1.9x over MATLAB toolboxes with parallel computing support. On linear algebra computations from the PolyBench benchmark suite, we obtain speedups of 7.5x over the existing PolyMage optimizer, 3.6x over Pluto, and 4.1x over PPCG.


Series: M.Sc. (Engg) Colloquium
Title: Active Learning for Efficient Testing of Student Programs

  • Speaker: Mr. Ishan Rastogi
                   M.Sc. (Engg.) Student
                   Dept. of CSA
  • Faculty Advisor: Prof. Shirish Shevade and Prof. Aditya Kanade
  • Date and Time: Friday, July 13, 2018, 10:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Automated evaluation of student programs is an important problem since it eases the instructor's burden by reducing or altogether eliminating the time required for program evaluation. A solution to this problem would also make it possible to scale up courses having computer programming content to a larger audience, such as in a massive open online course (MOOC) setting. The problem is, however, inherently difficult as students can make a variety of mistakes spanning the entire gamut -- from shallow syntactic errors to more involved semantic ones. Among these, identifying whether a program has a logical error is perhaps the most challenging, since the error may only surface in a subset of all valid test cases. Additionally, unlike syntax errors, logical errors are task specific and hence require additional input from the instructor (for example, in the form of a set of quality test cases) for every new task introduced into the system.

In this talk, we will discuss our work, Automated Testing using Active learning and Symbolic-execution (ATAS), an algorithm designed to identify whether a student program is logically incorrect in an online setting. This algorithm builds upon recent advances in both symbolic execution and active learning. Symbolic execution is a program analysis technique which can generate test cases through symbolic constraint solving. Our method makes use of a reference implementation of the task as its sole input. We compare our method with a symbolic execution-based baseline on six programming tasks retrieved from CodeForces, comprising about 23,000 student submissions. We demonstrate an average improvement of over 2.5x over the baseline in terms of runtime (thus making it more suitable for online evaluation), without significant degradation in evaluation accuracy.
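The role the reference implementation plays can be illustrated with a bare-bones differential-testing loop in Python; ATAS additionally uses symbolic execution to generate tests and active learning to decide when to stop, both of which this hypothetical sketch omits.

```python
import random

# Bare-bones differential testing against a reference implementation: the
# reference acts as the oracle, and random tests probe the student submission.
# The task, the buggy submission and the test generator are all made up here.

def reference(xs):            # instructor's solution: sum of positive numbers
    return sum(x for x in xs if x > 0)

def student(xs):              # hypothetical buggy submission: ignores the last element
    return sum(x for x in xs[:-1] if x > 0)

def is_logically_incorrect(student_fn, reference_fn, trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 6))]
        if student_fn(xs) != reference_fn(xs):
            return True, xs    # counterexample found
    return False, None

print(is_logically_incorrect(student, reference))
```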

Speaker Bio:
Ishan is a masters student at the CSA department, working in Intelligent Systems Lab. He is advised by Prof. Shirish Shevade and Prof. Aditya Kanade.


Series: Department Seminar
Title: Leveraging Data and AI for Early Warning of Social and Economic Disruption

  • Speaker: Prof. Rohini Srihari
                   Professor,
                   Dept. of Computer Science and Engineering
                   State University of New York, Buffalo.
  • Date and Time: Thursday, July 12, 2018, 11:30 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
According to the Institute of Economics and Peace, conflict costs the world 13.3% of its GDP, or $13.6 trillion per year. And these numbers don’t even begin to tell the story of the human costs of violence, which disproportionately affects women and youth.

This talk focuses on the design and development of a software platform, groundTruth, for early warning of social and economic disruption in emerging markets and fragile states. This initiative leverages the following developments: (i) major advances in artificial intelligence, (ii) widespread adoption of mobile devices and the internet in emerging countries, and (iii) the availability of an unprecedented amount of data, including social media and sensor data. A taxonomy of risk drivers, types of disruptions, and key indicators useful in predicting disruptions will be presented. Predictive analytics algorithms based on machine learning models (including deep learning models) will be discussed along with initial results. These have been trained and tested on rich, manually curated, global event databases. A demo of the current system will be presented along with plans for further development.

Speaker Bio:
Rohini Srihari is an educator, entrepreneur and computer scientist. She is a Professor in the Dept. of Computer Science and Engineering at the State University of New York, Buffalo. Her research has focused on various aspects of multilingual text mining and natural language processing and has been funded by agencies such as DARPA and the National Science Foundation. She has founded and led language technology companies, delivering content analytics solutions and services to enterprise customers across multiple industries. She is currently serving as the Chief Data Scientist with PeaceTech Lab based in Washington, DC where she is overseeing data and technology programs with the potential for major societal impact. She received a B. Math. from the University of Waterloo, Canada and a PhD in Computer Science from the University at Buffalo.

Host Faculty: Prof. M Narasimha Murty


Series: Ph.D. Colloquium
Title: Recommendations in Complex Networks: Unifying Structure into Random Walk.

  • Speaker: Sharad Kumar Nandanwar
  • Faculty Advisor: Prof. M. Narasimha Murty
  • Date and Time: Thursday, July 12, 2018, 10:00 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Making recommendations, or predicting links that are likely to exist in the future, is one of the central problems in network science and graph mining. Despite modern state-of-the-art approaches for link prediction, traditional approaches such as the Resource Allocation Index and Adamic-Adar still find heavy use because of their simplicity and competitive performance. Our preliminary investigation reveals that a major fraction of the missing links observed in the near future are links between nodes that are one hop apart. Current "friend of a friend is a friend" approaches provide a mechanism for aggregating the effect of triangles closed by the inclusion of an edge. In this work we look beyond these triangles and hunt for higher-order structures in the common neighborhood. However, with the idealistic choice of cliques as higher-order structures, the problem easily gets out of hand, even for common ego-networks involving 50-100 nodes. In the wake of this, we define a dense structure, k-support, as an approximation to the clique. Further, for a given value of k, we propose a goodness measure to capture the contribution of the common neighborhood, with respect to k-support, towards the link strength. The proposed goodness measure with different k values is then exploited to learn a supervised model.
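For reference, the classical common-neighborhood scores mentioned above can be computed as follows (a small Python sketch assuming networkx is available; the k-support goodness measure proposed in the talk, which looks for dense structures inside the common neighborhood, is not reproduced here).

```python
import math
import networkx as nx

# Classical common-neighbourhood link-prediction scores: Resource Allocation
# sums 1/deg over common neighbours, Adamic-Adar sums 1/log(deg).

def resource_allocation(G, u, v):
    return sum(1.0 / G.degree(w) for w in nx.common_neighbors(G, u, v))

def adamic_adar(G, u, v):
    return sum(1.0 / math.log(G.degree(w)) for w in nx.common_neighbors(G, u, v)
               if G.degree(w) > 1)

G = nx.karate_club_graph()
print(round(resource_allocation(G, 0, 33), 3), round(adamic_adar(G, 0, 33), 3))
```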

We next take a multi-class classification view of the recommendation problem. This is suited to cases where the cardinality of the set of target objects is reasonably small. Classification of entities based on the underlying network structure is an important problem and has been studied extensively in the literature. Networks encountered in practice are sparse and have many missing and noisy links. Statistical learning techniques have been used in intra-network classification; however, they typically exploit only the local neighborhood and so may not perform well. We propose a novel structural neighborhood-based classifier that learns using a random walk, where we label a node based on how the nodes in its k-th-level neighborhood are labeled. We observe that random walks of short length are helpful in classification, while emphasizing the role of longer random walks may cause the underlying Markov chain to converge to a stationary distribution. Considering this, we take a lazy random walk based approach with a variable termination probability for each node, based on the node's structural properties, including its degree.
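A toy NumPy sketch of a lazy random walk with a node-dependent termination probability is shown below; the termination rule and the scoring are assumptions for illustration, not the classifier developed in the thesis.

```python
import numpy as np

# Toy lazy-random-walk label scoring: walk from an unlabeled node, terminating at
# each node with a node-dependent probability (tied to degree below, a guessed
# rule), and tally the labels of labeled nodes visited along the way.

def walk_label_scores(adj, labels, start, walks=2000, seed=0):
    rng = np.random.default_rng(seed)
    stop_prob = {v: 1.0 / (1.0 + len(adj[v])) for v in adj}   # hypothetical rule
    scores = {}
    for _ in range(walks):
        v = start
        while rng.random() >= stop_prob[v]:        # keep walking, else terminate
            v = adj[v][rng.integers(len(adj[v]))]  # uniform step to a neighbour
            if labels.get(v) is not None:
                scores[labels[v]] = scores.get(labels[v], 0) + 1
    return scores

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
labels = {1: "A", 4: "B"}
print(walk_label_scores(adj, labels, start=0))   # label "A" dominates near node 0
```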

Further, we observe that conventional loss functions penalize a node based on a function of its predicted label and target label. Such loss functions under-perform when learning on a network having overlapping classes. In a relational setting, even though the ground truth is not known for the unlabeled nodes, some evidence is present in the form of the labeling acquired by the nodes in their neighborhood. Considering this, we propose a structural loss function for learning in networks, based on the hypothesis that loss is induced when a node fails to acquire a label that is consistent with the labels of the majority of the nodes in its neighborhood. We further combine this with a novel semantic regularizer, which we call the homophily regularizer, to capture the smooth transition of discriminatory power and behavior of semantically similar nodes.

Hybrid recommender systems have shown the power of exploiting relationships amongst objects which directly or indirectly affect the recommendation task. However, the effect of all relations is not equal, and choosing the right balance among them for a recommendation problem at hand is non-trivial. We model these interactions using a Heterogeneous Information Network and propose a systematic framework for learning their influence weights for a given recommendation task. We also address the issue of redundant results, which is very prevalent in recommender systems. To alleviate redundancy in recommendations, we use a Vertex Reinforced Random Walk (a non-Markovian random walk) over a heterogeneous graph. It works by boosting the transitions to influential nodes while simultaneously shrinking the weights of others. This discourages the recommendation of multiple influential nodes that lie in close proximity to each other, thus ensuring diversity.
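The reinforcement idea behind the Vertex Reinforced Random Walk can be sketched as follows (illustrative Python/NumPy; the transition weights and normalisation are assumptions, not the thesis' exact formulation).

```python
import numpy as np

# Sketch of a vertex-reinforced random walk: the probability of moving to a
# neighbour grows with how often that neighbour has already been visited, so a
# few influential nodes get reinforced while redundant nearby nodes are crowded out.

def vrrw(adj, start, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    visits = {v: 1 for v in adj}      # every node starts with weight 1
    v = start
    for _ in range(steps):
        nbrs = adj[v]
        w = np.array([visits[u] for u in nbrs], dtype=float)
        v = nbrs[rng.choice(len(nbrs), p=w / w.sum())]
        visits[v] += 1
    return visits

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(vrrw(adj, start=0))   # visit counts concentrate on a few nodes
```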


Series: Department Seminar
Title: The coincidence based uniformity testing achieves $l_2$-tolerance

  • Speaker: Mr. Sutanu Gayen
                   Ph.D. student
                   Department of Computer Science and Engineering
                   University of Nebraska-Lincoln
  • Date and Time: Tuesday, July 10, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Distribution testing is an area at the intersection of computer science, statistics and information theory. In recent years, there has been a surge of interest in this area, due to its importance in data analytics. In this talk I will start by defining the central question in this area: given sample access to an unknown distribution $D$ on $n$ items, decide between the following two cases: 1) does $D$ have a certain property? versus 2) is $D$ far from having the property?

In particular, the focus of this talk is uniformity testing, the question of testing whether the unknown distribution $D$ is uniform or far from uniform. Algorithms have been proposed for a number of variations of uniformity testing, which I will mention. Then, I will present a new result of ours which gives new guarantees for a known algorithm for this problem. In particular, we give a tester by slightly modifying the coincidence based tester of Paninski from 2008, and show that this new tester achieves the strongest known guarantee for uniformity testing. Interestingly, our analysis is quite simple.
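For intuition, a collision-counting tester in the spirit of Paninski's coincidence-based test can be sketched in a few lines of Python; the slack threshold below is ad hoc, and the talk's result concerns a sharper analysis of a slightly modified tester, not this exact rule.

```python
from collections import Counter
import random

# Collision-counting sketch: draw m samples from the unknown distribution on n
# items, count pairwise collisions, and accept "uniform" when the count is close
# to the uniform expectation m(m-1)/(2n). Far-from-uniform distributions produce
# many more collisions.

def collision_count(samples):
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def looks_uniform(samples, n, slack=1.5):
    m = len(samples)
    expected = m * (m - 1) / (2 * n)   # expected collisions under uniformity
    return collision_count(samples) <= slack * expected

rng = random.Random(0)
n, m = 1000, 300
uniform_samples = [rng.randrange(n) for _ in range(m)]
skewed_samples = [rng.randrange(n // 10) for _ in range(m)]   # far from uniform
print(looks_uniform(uniform_samples, n), looks_uniform(skewed_samples, n))  # typically: True False
```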

This is a joint work with A. Pavan and N.V. Vinodchandran.

Speaker Bio:
Sutanu Gayen is a fifth year Ph.D. student in the Department of Computer Science and Engineering, University of Nebraska-Lincoln. He is working with Dr. Vinodchandran N. Variyam. He works in the area of streaming algorithms, distribution testing, and probabilistic algorithms in general. His homepage is at https://cse.unl.edu/~sgayen/.

Host Faculty: Dr. Arnab Bhattacharyya


Series: M.Sc. (Engg) Colloquium
Title: IO Pattern Aware Methods to Improve the Performance and Lifetime of NAND SSD

  • Speaker: Mr. Arpith K
                   M.Sc. (Engg.) Student
                   Dept. of CSA
  • Faculty Advisor: Prof. K Gopinath
  • Date and Time: Tuesday, July 10, 2018, 11:30 AM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
Solid State Drives (SSDs) based on NAND flash use transistors to store information persistently. Data is stored as a threshold voltage in each flash cell. Due to their energy efficiency, small form factor and resilience to physical shock, they are the storage medium of choice in a variety of mobile devices. Internal parallelism enables SSDs to deliver significantly higher IO performance than traditional magnetic disks, and they are hence used in data centers. Modern flash memories have transistors that can store multiple bits, enabling the production of SSDs with higher capacities and a lower cost-per-bit. As of 2018, cells with the ability to store three bits are widely used, with Intel and Micron having just announced the availability of the first commercial SSD with quad-level cells.

However, such high-density SSDs suffer from longer latencies to program and read data, resulting in reduced throughputs, when compared to flash memories that store a single bit per cell. Also, they suffer from reduced reliability. Mechanisms to detect bit errors and prevent data loss add to performance overheads.

In this talk, we will discuss two proposed system-level solutions which, using knowledge of the workload's IO patterns, can improve the performance and lifetime of NAND-based solid state drives. The first work proposes combining the various page types in a wordline into a single logical page called a Melded-Page. This improves the read performance of an SSD by mitigating the overheads involved in the read operation. Using this method, we achieve performance improvements of up to 44% on distributed workloads that use the Hadoop Distributed File System (HDFS).

Second, to improve the write performance and lifetime of an SSD, we propose a modified programming scheme called the Hot Page Aware Relaxed Program Sequence scheme. Experimental results show an average improvement of 56% in the performance of the SSD when compared to methods that use the shadow program sequence scheme. We also observe that the number of pages backed up is reduced by an average of 85%. When compared to methods that use dynamic SLC caching, the proposed scheme reduces the number of block erases by an average of 61%.

Speaker Bio:
Arpith is a masters student at the CSA department, working in Computer Architecture and Systems Lab. He is advised by Prof. K. Gopinath.


Series: CSA Faculty Colloquium
Title: Going beyond worst case analysis in the design of approximation algorithms

  • Speaker: Dr. Anand Louis
                   Assistant Professor
                   Dept. of CSA
  • Date and Time: Friday, June 29, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
In the study of algorithms for combinatorial optimization problems, the best known algorithms often provide a somewhat underwhelming performance guarantee, while simple heuristics perform remarkably well in practice. A possible explanation for this phenomenon could be that, for many problems, the instances arising in practice tend to have some inherent structure that makes them "easier" than the worst-case instances. Many attempts have been made to understand the structural properties of these instances, and to use them in designing algorithms specifically for such instances, which could perform much better than algorithms for general instances. I will talk about a few vignettes in this direction.

Speaker Bio:
Anand is an Assistant Professor at CSA. He received his Ph.D. in 'Algorithms, Combinatorics, and Optimization' from Georgia Tech in 2014, and held a postdoctoral position at Princeton University before moving to IISc in September 2016. His research interests lie in the theory of algorithms and optimization.

Host Faculty: Prof. Sunil L Chandran & Prof. Shalabh Bhatnagar


Series: Department Seminar
Title: Hardness of learning noisy halfspaces using polynomial thresholds

  • Speaker: Mr. Suprovat Ghoshal
                   Ph.D. student
                   Department of CSA
  • Faculty Advisor: Dr. Arnab Bhattacharyya and Dr. Siddharth Barman
  • Date and Time: Wednesday, June 27, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
We prove the hardness of weakly learning halfspaces in the presence of adversarial noise using polynomial threshold functions (PTFs). In particular, we prove that for any positive integer d and parameter ε, it is NP-hard to decide: given a set of {−1,1}-labeled points in R^n, whether (YES case) there exists a halfspace that classifies a (1−ε)-fraction of the points correctly, or (NO case) any degree-d PTF classifies at most a (1/2+ε)-fraction of the points correctly. This strengthens to all constant degrees the previous NP-hardness of learning using degree-2 PTFs shown by Diakonikolas et al. (2011). The latter result had remained the only progress over the works of Feldman et al. (2006) and Guruswami et al. (2006) ruling out weakly proper learning of adversarially noisy halfspaces.

Joint work with Arnab Bhattacharyya and Rishi Saket.

Speaker Bio:
Suprovat Ghoshal is a Ph.D. student at the CSA department, co-advised by Arnab Bhattacharyya and Siddharth Barman. *This is a practice talk for COLT 2018.*

Host Faculty: Dr. Arnab Bhattacharyya


Series: Department Seminar
Title: Multiplayer Parallel Repetition for Expanding Games

  • Speaker: Dr. Rakesh Venkat
                   Postdoctoral Fellow
                   Hebrew University of Jerusalem
                   Israel
  • Date and Time: Tuesday, June 26, 2018, 3:00 PM
  • Venue: CSA Multimedia Class (Room No. 252, First Floor)

Abstract
A two-player game is an important construct used in conjunction with the PCP theorem to show hardness of approximating various NP-Hard problems. It consists of two non-communicating players, Alice and Bob, trying to win against a verifier V, who sends them questions drawn from a known distribution D. The goal of Alice and Bob is to come up with strategies to provide answers to these questions that are consistent with a known predicate checked by the verifier. Since they cannot communicate after receiving their respective questions, they might not be able to win the game with probability 1. The maximum probability (over D) of winning such a game is called the value of the game.

Raz first showed that repeating a two-player game in parallel n-times (where n question pairs are drawn independently from D and given to the players) drives down the probability of the players winning *all* the rounds exponentially with n. In contrast to two-player games, very little is known about the parallel-repetition of games with k players for k>=3. The only known upper bound on the value of a general n-repeated, k-player game is due to Verbitsky, which proves a weak Inverse-Ackermann(n) decay. Some special classes of games have been shown to have exponential decay in value w.r.t. n.

In this talk, I will show that a large class of multiplayer games do indeed exhibit an exponential decay in value under parallel repetition. Our result recovers previously known exponential decay theorems for special classes as a corollary. We also point out a simple game not handled by the above class, and conjecture that it is in fact, the hardest case. (Based on joint work with Irit Dinur, Prahladh Harsha and Henry Yuen).

Speaker Bio:
The speaker is a postdoctoral fellow at the Hebrew University of Jerusalem, Israel. He obtained his PhD from TIFR, Mumbai. His research interests are in Approximation Algorithms, Spectral graph Theory and Hardness of Approximation.

Host Faculty: Dr. Anand Louis


Series: Ph.D. Thesis Defense
Title: Algorithms for Multilingual IR in Low Resource Languages using Weakly Aligned Corpora

  • Speaker: Mr. Goutham Tholpadi
                   Ph.D student
                   Dept. of CSA
                   IISc
  • Faculty Advisor: Prof. Chiranjib Bhattacharyya & Prof. Shirish K Shevade
  • Date and Time: Tuesday, June 26, 2018, 2:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
A large section of the world population is multilingual, and a large proportion of the digitized textual content generated today is in non-English languages. As a result, multilingual information retrieval (MLIR) has become a real and pressing need. Most MLIR methods rely on machine translation and other linguistic resources such as dictionaries, taggers, parsers, etc. However, most of these resources are unavailable in low resource languages, e.g. in many Indian languages. For many MLIR problems, an alternative approach that has shown some success is to map all texts to an abstract semantic space of concepts where similarity can be computed uniformly across languages. These concepts are induced using statistical models learned from multilingual "weakly aligned" corpora which are available even in low resource languages. A number of models have been proposed for learning topical correspondence from weakly aligned corpora that are document-aligned. However, in recent times, there is a growing interest in modeling topical correspondence from user generated content (UGC) data from social networks, commenting widgets, etc. A general pattern seen in such data is the presence of a main object (such as an article, post, tweet, image, etc.) and a collection of dependent objects (such as comments, replies, retweets, tags, etc.). However very little work has been done on topical correspondence models for corpora where the dependent objects are multilingual. This task is made more challenging due to the presence of romanization, sparsity, and noise.

Topic models are often used to compute topical affinity between textual units, which is a core task in most MLIR problems. However, the topic models that can be learned from weak alignment are not directly applicable to the computation of all kinds of topical affinities. In particular, the size of the textual units involved in the affinity computation has a significant bearing on the applicability of a model, and on the method of application. In this thesis, we focus on low resource languages and develop a series of hierarchical Bayesian models for capturing topical correspondence in multilingual news-comment corpora. The final model, called the Multi-glyphic Correspondence Topic Model (MCTM), captures topical correspondence between comments and the article, between comments and specific sections of the article, among the comments themselves, and between comments and other articles. The model handles problems of data sparsity, noise and comment romanization, and is shown to be a better fit for article-comment data when compared to existing models.

From an application perspective, we consider three different MLIR problems corresponding to different levels with respect to the size of the textual units involved in the topical affinity computation step. The first MLIR problem (at the level of single words) involves inducing translation correspondences in arbitrary language pairs using Wikipedia data. We present an approach for translation induction that leverages Wikipedia corpora in auxiliary languages using the notion of translingual themes. We propose extensions that enable the computation of cross lingual semantic relatedness between words and apply the method to the task of cross-lingual Wikipedia title suggestion. The second MLIR problem (at the level of sentences or paragraphs) involves the detection of topical relevance between specific portions of articles and comments. Using MCTM, we devise methods for topical categorization of comments with respect to the article. In the third MLIR problem (at the level of document collections), the task is to generate keyword summaries for a cluster of documents. We develop an architecture for performing Scatter/Gather on multilingual document collections and a labeling procedure that generates keyword labels on-the-fly to summarize a multilingual document cluster in any language.


Series: Department Seminar
Title: Using Inherent Structures to design Lean 2-layer RBMs

  • Speaker: Mr. Abhishek Bansal
                   Software Engineer
                   IBM India Research Lab, Delhi
  • Date and Time: Monday, June 25, 2018, 4:00 PM
  • Venue: CSA Seminar Hall (Room No. 254, First Floor)

Abstract
The representational power of Restricted Boltzmann Machines (RBMs) with multiple layers is poorly understood and remains an important area of active research. Motivated by the Inherent Structure formalism (Stillinger & Weber, 1982), extensively used in analysing spin glasses, we propose a novel measure called Inherent Structure Capacity (ISC), which characterizes the capacity of a fixed-architecture RBM by the expected number of modes of the distributions emanating from the RBM with parameters drawn from a prior distribution. Though ISC is intractable, we provide a tractable upper bound. The bound suggests that for a single-layer RBM the ISC approaches a finite constant, and to increase it one needs to add additional layers. Furthermore, we introduce Lean RBMs, which are multi-layer RBMs where each layer can have at most O(n) units, the number of visible units being n. We show that for every single-layer RBM with Ω(n^(2+r)), r >= 0, hidden units there exists a two-layered lean RBM with Θ(n^2) parameters with the same ISC, establishing that 2-layer RBMs can achieve the same representational power as single-layer RBMs while using far fewer parameters. To the best of our knowledge, this is the first result which quantitatively establishes the need for layering.

Speaker Bio:
The speaker is a Software Engineer at IBM India Research Lab, Delhi. He completed his B.Tech. in Manufacturing Science & Engineering from IIT Delhi and his M.E. in CSE from IISc Bengaluru. *This is a practice talk for ICML 2018.*

Host Faculty: Prof. Chiranjib Bhattacharyya

Copyright: CSA, IISc 2018