Seminars

Rethinking the IID Assumption for Data-efficient and Robust NLP

Series: Department Seminar

Speaker: Dr. Pradeep Dasigi, Allen Institute for AI

Date/Time: Nov 11, 11:00 AM

Location: CSA Seminar Hall (Room No. 254, First Floor)

Abstract:
The standard practice for training Machine Learning models is to assume access to independent and identically distributed (IID) labelled data, to optimise for the average loss on a training set, and to measure generalisation on a held-out in-distribution test set. While this assumption is grounded in theory, it often does not hold in practice. In this talk I will focus on two issues with this setup that affect NLP systems: 1) obtaining sufficient high-quality labelled training data can be expensive, making it difficult to train models that generalise well to in-distribution test sets; 2) even when large enough labelled training sets are available, they may come with unwanted correlations between labels and task-independent features. These spurious correlations are a consequence of unavoidable biases in dataset collection processes, and they can provide shortcuts that allow models to generalise well to in-distribution test sets sharing those spurious correlations without actually learning the intended tasks.
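
For reference, the average-loss objective and IID sampling assumption described above can be written in the standard textbook form (this notation is not specific to the talk):

```latex
% Empirical Risk Minimisation: average loss over an IID training sample.
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr),
\qquad (x_i, y_i) \overset{\text{iid}}{\sim} \mathcal{D}
```

Generalisation is then measured by the expected loss on held-out samples drawn from the same distribution.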

To illustrate the first problem, I will present Qasper, a complex document-level Question Answering task over research papers, and an associated dataset collection process that requires expert annotators. To address the problem, I will present a data-efficient training method that leverages data from other tasks where labelled data is easier to obtain. In contrast with recent work that trained massive multitask models (e.g. T0, FLAN) on tens of millions of instances from all available datasets without any knowledge of the target tasks, our method selects small subsets of multitask training instances that are relevant to the target tasks, using only unlabelled target-task instances. Our method is algorithmically simple, yet quite effective and data-efficient: on Qasper and ten other datasets, the target-task-specific models outperform the T0 model of the same size by up to 30% without accessing target-task labels (zero-shot), and by up to 23% when accessing a few target-task labels (few-shot), all while using about 2% of the multitask data used to train T0 models.
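
As a rough illustration of the selection step described above, the sketch below retrieves the multitask pool instances most similar to the unlabelled target-task inputs. TF-IDF similarity, the function name, and its arguments are all assumptions standing in for the actual method, not the method presented in the talk.

```python
# Illustrative sketch: select multitask training instances relevant to a
# target task using only unlabelled target-task inputs. TF-IDF similarity
# is a stand-in for whatever representation the real method uses.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_relevant_subset(multitask_pool, target_inputs, budget):
    """Return the `budget` pool instances most similar to the target task.

    multitask_pool: list of training texts drawn from many source tasks
    target_inputs:  list of *unlabelled* target-task inputs
    budget:         number of instances to keep (e.g. ~2% of the pool)
    """
    vectorizer = TfidfVectorizer().fit(multitask_pool + target_inputs)
    pool_vecs = vectorizer.transform(multitask_pool)
    target_vecs = vectorizer.transform(target_inputs)
    # Score each pool instance by its best similarity to any target input.
    scores = cosine_similarity(pool_vecs, target_vecs).max(axis=1)
    top = np.argsort(scores)[::-1][:budget]
    return [multitask_pool[i] for i in top]
```

Scoring each pool instance by its best match to any target input keeps the selected subset focused on the target task while discarding the roughly 98% of multitask data that is unlikely to help.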

To address the second problem, as an alternative to Empirical Risk Minimisation (ERM), which optimises for the average training loss under the IID assumption, I will present a novel optimisation technique based on Group Distributionally Robust Optimisation (G-DRO), which optimises for worst-group performance over a known, pre-defined set of groups in the training data. Since spurious correlations are often unknown, directly applying G-DRO to this problem is not feasible. Our method, AGRO, simultaneously identifies error-prone groups in the training data and optimises for the model's performance on them. We show that on several NLP and Vision tasks, AGRO-based models outperform models trained using ERM by up to 8% on known error-prone subsets of in-distribution test data, and by up to 10% on out-of-distribution test sets, without using any knowledge of the distribution shifts.
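
For intuition, here is a minimal sketch of a single Group DRO training step of the kind AGRO builds on, assuming group labels are already known; AGRO's group-discovery step is omitted, and every name here is illustrative rather than the talk's implementation.

```python
# Minimal Group DRO step: instead of the average loss (ERM), optimise a
# weighted loss whose weights are pushed toward the worst-performing groups.
import torch

def group_dro_step(model, x, y, group_ids, q, optimizer, eta=0.01):
    """One Group DRO update.

    x, y:      a training batch
    group_ids: tensor of group indices, one per example in the batch
    q:         per-group weight vector on the probability simplex
    eta:       step size for the group-weight update
    """
    # Per-example losses (no reduction), here for a classification model.
    per_example = torch.nn.functional.cross_entropy(
        model(x), y, reduction="none")
    group_losses = torch.zeros(q.numel(), device=x.device)
    for g in range(q.numel()):
        mask = group_ids == g
        if mask.any():
            group_losses[g] = per_example[mask].mean()
    # Exponentiated-gradient step: shift weight toward high-loss groups.
    q = q * torch.exp(eta * group_losses.detach())
    q = q / q.sum()
    # Optimise the q-weighted (robust) loss instead of the plain average.
    loss = (q * group_losses).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return q
```

Up-weighting high-loss groups drives the model to reduce its worst-group loss rather than the average, which removes the incentive to rely on shortcuts that only help the majority groups.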

Speaker Bio:
Dr. Pradeep Dasigi is a researcher in Artificial Intelligence, specializing in Natural Language Processing, at the Allen Institute for AI in Seattle, USA. Before joining the Allen Institute, he completed his Ph.D. at CMU and his Master's at Columbia University. His research interests lie in Natural Language Processing and Machine Learning.

Host Faculty: R Govindarajan