Seminars

From Model Improvements to Human Outcomes: Evaluation as Infrastructure for Accountable and Inclusive AI

Series: Department Seminar

Speaker: Dr. Sunayana Sitaram

Date/Time: Mar 05, 10:00

Location: CSA Auditorium (Room No. 104, Ground Floor)

Abstract:
As AI systems are rapidly deployed across domains and geographies,
evaluation has become one of the most consequential yet under-examined
components of the AI pipeline. What we measure shapes what models
optimize for, and ultimately, who they work for. In this talk, I argue
that evaluation is not merely a diagnostic tool for improving
accuracy, but a form of infrastructure that determines accountability,
representation, and equity in AI systems. Without rigorous and
inclusive evaluation, advances in AI risk deepening existing
linguistic, cultural, and socio-economic divides. I will first outline
key challenges in evaluation, including coverage, representativeness
and rigor. I will then present my work on multilingual and
community-grounded evaluation, including large-scale benchmarking
efforts and data creation pipelines designed to reflect real-world
information needs. I will discuss recent research on LLM-as-judge
methods, examining when automated evaluation aligns with human
judgment and where it fails, particularly in multilingual and
culturally nuanced settings. Finally, I will introduce Samiksha, a
large-scale evaluation across 11 Indian languages built through
community participation, and situate it within a broader research
agenda that connects model evaluation to user experience and
downstream impact.

Speaker Bio:
Dr. Sunayana Sitaram is a Principal Researcher at Microsoft Research India, where she works on speech and natural language processing with a focus on making AI systems more inclusive and multilingual. At MSRI, she collaborates with and leads interdisciplinary teams that span NLP, machine learning, linguistics, HCI, and social science. Her research spans evaluation and benchmarking of large language models, multilingual speech and NLP, code-switching, and responsible AI. Her work has involved participatory approaches to evaluation, data collection and policy to ensure that AI models and systems reflect the preferences of users from diverse regions and cultures. She completed her MS and PhD in Language and Information Technology at the Language Technologies Institute, Carnegie Mellon University. She received her B.Tech from NIT Surat.

Host Faculty: R Govindarajan

Department of Computer Science and Automation
