Seminars
Layers as Lenses: Feature Learning via Positive Feedback in Gated Deep Networks
Series: Department Seminar
Speaker: Dr. Harish Ramaswamy, Assistant Professor, Department of Data Science and AI, Indian Institute of Technology Madras
Date/Time: Jun 29 16:00:00
Location: CSA Auditorium (Room No. 104, Ground Floor)
Abstract:
End-to-end gradient-based training of multi-layer architectures has been immensely successful across machine learning tasks, yet principled explanations for why, when, and how it discovers task-relevant structure in data remain incomplete. Existing viewpoints based on representational power, input-space folding, optimization conditioning, the neural tangent kernel, and hierarchical feature learning offer useful insights, but leave important aspects of end-to-end feature learning unresolved. We propose a new narrative for gated deep networks trained by gradient descent. From the viewpoint of any given layer, the other layers act as a lens that assigns importance weights to the input, facilitating the emergence of features without global scope that might otherwise be overlooked. Gradient methods then update each layer to fit data that has been importance-weighted by a lens formed by the remaining layers. In this view, individual neuron parameters serve a dual role: separator for data lensed by other layers, and component of the lens for other layers. We hypothesize that the resulting positive feedback is a key mechanism underlying feature learning in gated deep networks. We support this narrative with theoretical analysis and empirical results on the recently proposed Deep Linearly Gated Network (DLGN), an architecture that combines elements of deep linear and ReLU networks. In particular, our analysis provides predictive control over gradient-descent trajectories from initialization in the DLGN setting, and leads naturally to a notion of hierarchical feature discovery in the feature-learning regime. We also present evidence suggesting that aspects of this narrative may extend to ReLU networks.
Speaker Bio:
Dr. Harish Ramaswamy is an assistant professor in the Department of Data Science and AI at IIT Madras. He was previously a research scientist at IBM Research labs, a postdoc at the University of Michigan, and a PhD scholar in the Department of Computer Science and Automation at the Indian Institute of Science (IISc), Bangalore. His broad research areas are machine learning, statistical learning theory, and optimisation. He has played a key role in building the theory of consistent convex surrogate-minimising algorithms in supervised learning. More recently, he has been interested in deep learning theory.
Host Faculty: Prof. Chiranjib Bhattacharyya
