Seminars
Preference Learning from Human Response Time
Series: Bangalore Theory Seminars
Speaker: Ayush Sawarni, Stanford University
Date/Time: Jun 27, 11:00 AM
Location: CSA Auditorium (Room No. 104, Ground Floor)
Abstract:
In this talk, I will present our recent work on enhancing human preference learning by utilizing response-time information, an implicit signal that reveals the strength of a user's choice. Under the EZ diffusion model, we formulate learning the reward function from binary preferences and response times with a Neyman-orthogonal loss function. While standard preference-learning approaches incur errors that scale exponentially with the reward magnitude, incorporating response time with our approach reduces this to polynomial scaling. We provide finite-sample guarantees in rich nonparametric function spaces (including RKHS). We validate our theory across three experimental settings (synthetic linear rewards, random neural-network rewards, and a semi-synthetic text-to-image preference task) and observe improved sample efficiency. I will conclude by discussing some techniques from semi-parametric statistics used in our proofs, which may be broadly applicable beyond preference learning. Paper: Link
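To make the role of response time concrete, below is a minimal sketch of the standard EZ-diffusion identities, treating the reward gap between two options as the drift rate. The boundary parameter `a`, the unit diffusion-noise convention, and the function names are illustrative assumptions, not details taken from the paper; the sketch only shows why response time carries strength information that binary preferences alone discard.

```python
import numpy as np

# Sketch of the EZ diffusion model of a binary choice, in a common
# parameterization with unit diffusion noise (an assumption): the reward
# gap d = r(x) - r(y) plays the role of the drift rate, and `a` is the
# decision boundary. The choice reveals the sign of d; the expected
# response time reveals its magnitude.

def choice_prob(d, a=1.0):
    """P(choose x over y): a logistic function of the reward gap."""
    return 1.0 / (1.0 + np.exp(-a * d))

def mean_decision_time(d, a=1.0):
    """Expected decision time: large |d| -> fast, confident choices."""
    d = np.where(np.abs(d) < 1e-8, 1e-8, d)  # avoid division by zero
    return (a / (2.0 * d)) * np.tanh(a * d / 2.0)

# A weak preference (small gap) is slow and near-random; a strong one is
# fast and near-deterministic.
for gap in [0.1, 1.0, 4.0]:
    print(gap, choice_prob(gap), mean_decision_time(gap))
```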
Microsoft Teams link:
Link
We are grateful to the Kirani family (Link) and the Walmart Center for Tech Excellence (Link) for generously supporting this seminar series.
Hosts: Rameesh Paul, KVN Sreenivas, Rahul Madhavan, Debajyoti Kar, Nirjhar Das