View all Seminars  |  Download ICal for this event

Algorithms for Fair Clustering

Series: M.Tech (Research) Colloquium- ON-LINE

Speaker: Ms. Swati Allabadi M.Tech (Research) Dept. of CSA

Date/Time: Jul 28 15:30:00

Location: Microsoft Teams - ON-LINE

Faculty Advisor: Prof. Anand Louis and Prof. Arindam Khan

Many decisions today are taken by various machine learning algorithms, hence it is crucial to accommodate fairness in such algorithms to remove/reduce any kind of bias in the decision. We incorporate fairness in the problem of clustering. Clustering is a classical machine learning problem in which the task is to partition the data points into various groups such that the data points belonging to one group are more similar to each other than the data points belonging to some other group in the partition.
In our model, each data point belongs to one or more number of categories. We define fairness in terms of two constraints, restricted dominance and minority protection. While ensuring fairness in the clustering, we consider each data point in only of the categories from the set of categories it belongs to. Our model ensures that no category is either in minority or in dominance in any of the clusters. Representation of a category in a cluster is considered not in absolute terms but in proportion to its presence in the whole dataset. We give bi-criteria approximation for fair clustering whose objective is to minimise $L_p$-norm where $p$ can take any integral value. We implement this algorithm and do experiments to compare it with the state-of-the-art. For any $epsilon >0$, we give a randomized algorithm with approximation ratio of $(1 + epsilon)$ for fair clustering for points lying in Euclidean space whose objective is to minimise $L_1$-norm (or $L_2$-norm). For points lying in $mathbb{R}^d$, the run time of this algorithm for $L_2$-norm is $Oleft(nd cdot 2^{tilde{O}(k/epsilon)}right) + poly(n) cdot 2^{tilde{O}(k/epsilon)}$, where $n$ represents the size of the dataset. For $L_1$-norm, the run time of this algorithm is $Oleft(nd cdot 2^{tilde{O}(k/epsilon^{O(1)})}right) + poly(n) cdot 2^{tilde{O}(k/epsilon^{O(1)})} $. Given a $gamma$-perturbation resilient instance of clustering in the metric space $(V,d)$, we also give a bi-criteria approximation for the fair clustering of the same instance while changing its metric to $d $. Here, $d $ is any metric which is a $gamma$-perturbation of $(V,d)$.
Microsoft teams link:

Speaker Bio:

Host Faculty: