Seminars

Fairness, Alignment & Audit: A Triptych for Responsible Modern AI

Series: M.Tech (Research) Colloquium

Speaker: Nirjhar Das, M.Tech (Research) student, Dept. of CSA, IISc

Date/Time: Apr 07 14:00:00

Location: CSA Auditorium (Room No. 104, Ground Floor)

Faculty Advisor: Prof. Siddharth Barman

Abstract:
In this thesis, we investigate three building blocks of modern artificial intelligence (AI)---nearest neighbor search, large language models, and classification in algorithmic decision-making---through the lens of responsible AI. As AI models are deployed across all aspects of society, fairness, alignment, and audit become crucial ingredients for designing responsible AI and for regulating it. In this thesis, we study principled formulations and algorithmic questions in these areas: fairness and diversity in nearest neighbor search, alignment of large language models, and auditing the fairness of classifiers in algorithmic decision-making.

In the first part of the thesis, we study fairness and diversity in the context of nearest neighbor search. Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented generation (RAG). In such recent applications, in addition to the relevance (similarity) of the returned neighbors, diversity among the neighbors is a central requirement. In this work, we develop principled welfare-based formulations of NNS for realizing diversity across attributes. Our formulations are based on welfare functions---from mathematical economics---that satisfy central diversity (fairness) and relevance (economic efficiency) axioms. With a particular focus on Nash social welfare, we note that our welfare-based formulations provide objective functions that adaptively balance relevance and diversity in a query-dependent manner. Notably, such a balance was absent in the prior constraint-based approach, which imposed a fixed level of diversity and optimized only for relevance. We develop efficient nearest neighbor algorithms with provable guarantees for the welfare-based objectives. Experimental results demonstrate that our approach is practical and substantially improves diversity while maintaining high relevance of the retrieved neighbors.
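To make the query-dependent balance concrete, here is a minimal sketch of a greedy selection rule driven by a Nash-social-welfare-style objective, where each attribute group's utility is the total similarity of its selected neighbors. The `greedy_nsw_select` helper and its (similarity, group) encoding are illustrative assumptions, not the algorithms developed in the thesis.

```python
import math

def greedy_nsw_select(candidates, k, eps=1e-6):
    """Greedily pick k neighbors to maximize a Nash-social-welfare-style
    objective: the product (equivalently, sum of logs) of per-group
    utilities, where a group's utility is the total similarity of the
    selected candidates carrying that attribute.

    candidates: list of (similarity, group) pairs.
    Illustrative sketch only -- not the thesis algorithm.
    """
    utility = {g: eps for _, g in candidates}  # eps keeps log finite

    def log_gain(c):
        sim, g = c
        return math.log(utility[g] + sim) - math.log(utility[g])

    selected, remaining = [], list(candidates)
    for _ in range(min(k, len(candidates))):
        best = max(remaining, key=log_gain)   # largest marginal NSW gain
        remaining.remove(best)
        selected.append(best)
        utility[best[1]] += best[0]
    return selected

# Query-dependent balance: highly similar items still win early, but once
# a group's utility is large, its marginal log-gain shrinks, so
# under-represented groups start to be favored.
cands = [(0.95, "A"), (0.94, "A"), (0.93, "A"), (0.80, "B"), (0.10, "B")]
picked = greedy_nsw_select(cands, k=3)
```

Note how a pure top-3-by-similarity ranking would return only group "A" items, whereas the log-concave welfare objective trades a small amount of relevance for representation of group "B".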

In the second part, we investigate the problem of preference-based alignment of large language models (LLMs). LLMs aligned using Reinforcement Learning from Human Feedback (RLHF) have shown remarkable generation abilities across numerous tasks. However, collecting high-quality human preferences creates a costly bottleneck in practical deployments, and hence training data are often budgeted. In these scenarios, it is crucial to collect training data (e.g., contexts, a pair of generations for each context, and a preference indicating which generation is better) carefully, yet most existing methods sample contexts uniformly at random from a given collection. Given this, under the Bradley-Terry-Luce preference model and with a small budget of training data, we show that uniform sampling of contexts can lead to a policy (i.e., an aligned model) that suffers a constant sub-optimality gap from the optimal policy. This highlights the need for an adaptive context sampling strategy for effective alignment under a small sample budget. To address this, we reformulate RLHF within the contextual preference bandit framework, treating generations as actions, and give a nearly complete characterization of the sub-optimality gap in terms of both lower and upper bounds. Finally, we perform experiments on practical datasets to validate our algorithm's efficacy over existing methods, establishing it as a sample-efficient and cost-effective solution for LLM alignment.
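To fix notation, the sketch below encodes the Bradley-Terry-Luce preference probability and a toy adaptive context picker. The helper names and the diagonal-covariance scoring are assumptions made for illustration, in the spirit of optimal-design-style exploration for preference bandits, not the specific algorithm proposed in the thesis.

```python
import math

def btl_prob(r_a, r_b):
    """Bradley-Terry-Luce model: probability that the generation with
    reward r_a is preferred over the one with reward r_b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))

def pick_context(feature_diffs, cov_inv_diag):
    """Adaptive context selection sketch: choose the context whose
    action-feature difference is least explored so far, scored against a
    diagonal approximation of the inverse design covariance.
    Hypothetical helper for illustration -- not the thesis algorithm."""
    def score(z):
        return sum(zi * zi * c for zi, c in zip(z, cov_inv_diag))
    return max(range(len(feature_diffs)), key=lambda i: score(feature_diffs[i]))

# Equal rewards give a 50/50 preference; a reward edge shifts it upward.
p_tie = btl_prob(0.0, 0.0)        # 0.5
p_edge = btl_prob(1.0, 0.0)       # > 0.5

# The under-explored second context (large, unexplored feature diff) wins.
chosen = pick_context([(1.0, 0.0), (0.0, 2.0)], (1.0, 1.0))
```

The contrast with uniform sampling is that `pick_context` concentrates the preference-label budget on contexts whose generation pairs are still poorly distinguished, which is what avoids the constant sub-optimality gap described above.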

In the third and final part of the thesis, we study the problem of auditing the fairness of a given classifier under partial feedback, where true labels are available only for positively classified individuals (e.g., loan repayment outcomes are observed only for approved applicants). We introduce a novel cost model for acquiring additional labeled data, designed to more accurately reflect real-world costs such as credit assessment, loan processing, and potential defaults. Our goal is to find optimal fairness audit algorithms that are more cost-effective than random exploration and natural baselines. In our work, we consider two audit settings: a black-box model with no assumptions on the data distribution, and a mixture model, where features and true labels follow a mixture of exponential family distributions. In the black-box setting, we propose a near-optimal auditing algorithm under mild assumptions and show that a natural baseline can be strictly suboptimal. In the mixture model setting, we design a novel algorithm that achieves significantly lower audit cost than in the black-box case. Empirically, we demonstrate the strong performance of our algorithms on real-world fair classification datasets such as Adult Income and Law School, consistently outperforming baselines by around 50% in audit cost.
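The partial-feedback setting and the label-acquisition cost can be illustrated with a small sketch: the auditor observes labels of approved applicants for free and pays per label for rejected ones. The `audit_tpr_gap` helper below is a hypothetical random-exploration baseline of exactly the kind the thesis aims to outperform, not the proposed audit algorithm, and its field names are assumptions.

```python
import random

def audit_tpr_gap(applicants, label_cost, budget, rng=None):
    """Sketch of a fairness audit under partial feedback. True labels are
    free for positively classified individuals; for rejected individuals
    the auditor pays `label_cost` per acquired label, up to `budget`.
    Estimates the true-positive-rate gap between groups 0 and 1 from the
    observed labels. Illustrative random-exploration baseline only.

    applicants: list of dicts with keys 'group' (0/1),
                'pred' (classifier decision, 0/1), 'label' (ground truth).
    """
    rng = rng or random.Random(0)
    observed = [a for a in applicants if a["pred"] == 1]  # labels come free
    rejected = [a for a in applicants if a["pred"] == 0]
    rng.shuffle(rejected)
    spent = 0.0
    for a in rejected:            # buy labels at random until budget runs out
        if spent + label_cost > budget:
            break
        spent += label_cost
        observed.append(a)
    def tpr(group):
        pos = [a for a in observed if a["group"] == group and a["label"] == 1]
        if not pos:
            return 0.0
        return sum(a["pred"] for a in pos) / len(pos)
    return abs(tpr(0) - tpr(1)), spent

data = [{"group": g, "pred": p, "label": l} for g, p, l in
        [(0, 1, 1), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0)]]
gap, spent = audit_tpr_gap(data, label_cost=1.0, budget=2.0)
```

The point of the cost model is visible even here: every label bought for a rejected applicant draws down the budget, so an audit strategy that chooses *which* rejected individuals to label (rather than shuffling at random) can certify the same fairness gap at lower cost.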