Seminars
Linear Time Streaming Algorithms for Heavy Tailed Statistics
Series: Bangalore Theory Seminars
Speaker: Aniket Das, Google Research
Date/Time: Aug 09 11:00:00
Location: Online Talk (See Teams link below)
Abstract:
In this talk, I shall discuss the problem of high-dimensional statistical estimation under heavy tailed data distributions (i.e. data distributions for which only lower order moments are guaranteed to be finite). This problem finds applications in a wide variety of domains including the natural sciences, finance, machine learning and operations research.
While recent advances in algorithmic statistics have led to the development of efficient polynomial-time algorithms for this problem via the Sum of Squares proofs-to-algorithms paradigm, these algorithms are either: 1) primarily focused on heavy tailed mean estimation, or 2) incur a time and space complexity that is at least quadratic (and often a high degree polynomial) in the dimension d and the inverse confidence level δ⁻¹, thereby making them infeasible for deployment in most practical settings.
I shall present our recent work which aims to analyze the fine-grained information-computation-memory tradeoffs associated with this problem by considering the task of heavy tailed statistical estimation in the streaming setting (i.e. with an O(d) memory constraint). To this end, we develop linear time streaming algorithms that (nearly) attain the optimal statistical rate for a wide variety of high dimensional estimation tasks such as Mean Estimation, Linear Regression, LAD Regression, Classification and Generalized Linear Models. In particular, by recasting the heavy tailed statistical estimation problem as Stochastic Convex Optimization with a heavy tailed oracle, we present a unified analysis which proves that the widely deployed Clipped Stochastic Gradient Descent Algorithm (nearly) attains the optimal statistical rate in linear time and space.
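As an illustration of the clipping idea mentioned above, here is a minimal sketch (not the speaker's actual algorithm) of Clipped SGD for streaming heavy tailed mean estimation: the stochastic gradient of the squared loss at sample x is (θ − x), and clipping its norm at a threshold τ bounds the influence of outlier samples while using only O(d) memory. The function name, step size schedule, and threshold value are illustrative assumptions, not from the talk.

```python
import numpy as np

def clipped_sgd_mean(stream, tau=5.0):
    """Streaming mean estimate via Clipped SGD (illustrative sketch).

    Keeps only the current iterate theta in memory, so the space
    usage is O(d) regardless of the number of samples.
    """
    theta = None
    for t, x in enumerate(stream, start=1):
        x = np.asarray(x, dtype=float)
        if theta is None:
            theta = np.zeros_like(x)
        g = theta - x                    # stochastic gradient of 0.5*||theta - x||^2
        norm = np.linalg.norm(g)
        if norm > tau:
            g *= tau / norm              # clip gradient norm at tau
        theta = theta - g / t            # step size 1/t (illustrative choice)
    return theta
```

For a symmetric heavy tailed distribution such as a Student-t with 3 degrees of freedom, the clipped iterate tracks the true mean even though higher moments are large.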
A key component of our analysis is a novel and highly generalizable iterative refinement technique for establishing fine-grained concentration of vector-valued martingales via PAC Bayes Theory, which could be of independent interest for proving sharp concentration guarantees for more general stochastic processes.
Joint work with Dheeraj Nagaraj, Arun Suggala, Prateek Varshney and Soumyabrata Pal.
Microsoft Teams link:
Link
We are grateful to the Kirani family for generously supporting the theory seminar series
Hosts: Rameesh Paul, KVN Sreenivas, Rahul Madhavan, Debajyoti Kar