Seminars


MSCCL++: Rethinking GPU Communication Abstractions for AI Inference

Series: Department Seminar

Speaker: Roshan Dathathri, Researcher, Systems Research Group, Microsoft Research Redmond

Date/Time: Feb 06, 11:30

Location: CSA Auditorium (Room No. 104, Ground Floor)

Abstract:
AI applications increasingly run on fast-evolving, heterogeneous hardware to maximize performance, but general-purpose communication libraries lag in supporting new hardware features. Performance-minded programmers therefore often build custom communication stacks that are fast but error-prone and non-portable.

This talk introduces MSCCL++, a design methodology for developing high-performance, portable communication kernels. It provides (1) a low-level, performance-preserving primitive interface that exposes minimal hardware abstractions while hiding the complexities of synchronization and consistency, (2) a higher-level DSL for application developers to implement workload-specific communication algorithms, and (3) a library of efficient algorithms implementing the standard collective API, enabling adoption by users with minimal expertise.
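To make the primitive layer concrete, below is a minimal host-side sketch of the put/signal/wait pattern that such an interface might expose, with two CPU threads standing in for two GPU ranks. The Channel type and its method names are illustrative assumptions based on the abstract's description, not MSCCL++'s actual API.

// A minimal sketch of put/signal/wait semantics: two CPU threads stand in
// for two GPU ranks, and std::atomic flags stand in for hardware signaling.
// The Channel type and method names are hypothetical, not MSCCL++'s API.
#include <atomic>
#include <cstdio>
#include <cstring>
#include <thread>
#include <vector>

struct Channel {
    std::vector<char>* peerBuf;   // the peer's receive buffer
    std::atomic<int>*  peerFlag;  // flag the peer waits on
    std::atomic<int>*  myFlag;    // flag our peer signals

    // One-sided copy into the peer's buffer (stands in for an NVLink/RDMA put).
    void put(size_t dstOff, const char* src, size_t bytes) {
        std::memcpy(peerBuf->data() + dstOff, src, bytes);
    }
    // Publish the put and notify the peer that the data has landed.
    void signal() { peerFlag->fetch_add(1, std::memory_order_release); }
    // Block until the peer's matching signal arrives.
    void wait() {
        while (myFlag->load(std::memory_order_acquire) == 0) { /* spin */ }
    }
};

int main() {
    std::vector<char> buf0(8), buf1(8);
    std::atomic<int> flag0{0}, flag1{0};

    // Each rank puts its payload into the peer's buffer, signals, then waits
    // for the symmetric transfer; synchronization lives inside the channel.
    auto rank = [](Channel ch, const char* msg) {
        ch.put(0, msg, 8);
        ch.signal();
        ch.wait();
    };
    std::thread t0(rank, Channel{&buf1, &flag1, &flag0}, "from r0");
    std::thread t1(rank, Channel{&buf0, &flag0, &flag1}, "from r1");
    t0.join(); t1.join();

    std::printf("rank0 received: %s\n", buf0.data());  // "from r1"
    std::printf("rank1 received: %s\n", buf1.data());  // "from r0"
    return 0;
}

The point of the sketch is the separation of concerns: data movement (put), visibility (signal), and waiting (wait) are independent primitives, which is what allows a higher-level DSL to compose them into workload-specific collective algorithms.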

Compared to state-of-the-art baselines, MSCCL++ achieves geomean speedups of 1.7x (up to 5.4x) for collective communication and 1.2x (up to 1.38x) for AI inference workloads. MSCCL++ is in production use in multiple AI services on Microsoft Azure, and has also been adopted by RCCL, the GPU collective communication library maintained by AMD. MSCCL++ is open source and available at Link. Our two years of experience with MSCCL++ suggest that its abstractions are robust, enabling support for new hardware features, such as multimem, within weeks of development.

Speaker Bio:
Roshan Dathathri is a researcher in the Systems Research Group at Microsoft Research Redmond. He received his PhD from the University of Texas at Austin, where he was advised by Dr. Keshav Pingali. His research interests are broadly in the field of programming languages and systems, with an emphasis on optimizing compilers and runtime systems for distributed and heterogeneous architectures. His current focus is on building efficient systems for AI. His past work includes systems for distributed, heterogeneous graph processing and privacy-preserving neural network inference. His work has been published in PLDI, ASPLOS, MLSys, VLDB, IPDPS, PPoPP, and other conferences.

Host Faculty: Prof. Uday Kumar Reddy B