Seminars
Optimizing KV Cache Management for Efficient LLM Inferencing
Series: CSA Faculty Colloquium
Speaker: Prof. Govindarajan Ramaswamy, Professor, Dept. of CSA, IISc
Date/Time: Jun 05 16:00:00
Location: CSA Auditorium (Room No. 104, Ground Floor)
Abstract:
KV cache management contributes significantly to LLM inference time. Longer sequences, multi-turn conversations, and KV cache pruning methods further increase the time spent on it. In this talk, we propose two techniques for efficient KV cache management. The first, BMC, is a KV cache memory allocation scheme that trades some extra compute and memory space for lower allocation and copy overhead, thereby making LLM inference faster. We repurpose the redundant space and compute for speculative decoding, which further improves overall inference performance. The second, EQUIP, is an equivariance-preserving in-place update scheme for token-pruning approaches. EQUIP minimizes the cost of token eviction (typically implemented using shift-and-append or scatter-gather operations) to improve LLM inference performance. EQUIP is being presented at ACL 2026.
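The overheads named in the abstract can be illustrated with a small cost model. The sketch below is not the BMC or EQUIP implementation, which this announcement does not describe; the class names, the `block_size` parameter, and the row-counting scheme are all hypothetical. It only counts how many cached rows get copied when a cache grows one token at a time versus in over-allocated blocks, and how many rows move when a token is evicted by shifting versus by overwriting the hole in place. The in-place variant reorders rows, so it assumes that attention results do not depend on the storage order of cached entries.

```python
class ConcatCache:
    """Grow-by-one cache: every append allocates a new buffer and copies all rows."""

    def __init__(self):
        self.rows = []
        self.allocations = 0
        self.rows_copied = 0

    def append(self, row):
        # Models concatenation-style growth: new buffer, copy old rows, add one.
        new_rows = list(self.rows)
        self.rows_copied += len(self.rows)
        self.allocations += 1
        new_rows.append(row)
        self.rows = new_rows


class BlockCache:
    """Hypothetical block-wise cache: over-allocates, so most appends are in place."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.buf = []          # preallocated slots; None marks unused padding
        self.length = 0        # number of live rows
        self.allocations = 0
        self.rows_copied = 0

    def append(self, row):
        if self.length == len(self.buf):
            # Out of capacity: allocate one more block and copy the live rows once.
            new_buf = [None] * (len(self.buf) + self.block_size)
            new_buf[:self.length] = self.buf[:self.length]
            self.rows_copied += self.length
            self.allocations += 1
            self.buf = new_buf
        self.buf[self.length] = row
        self.length += 1

    def evict_shift(self, idx):
        """Shift-style eviction: every later row moves down by one slot."""
        moved = self.length - idx - 1
        for i in range(idx, self.length - 1):
            self.buf[i] = self.buf[i + 1]
        self.length -= 1
        self.buf[self.length] = None
        return moved

    def evict_in_place(self, idx):
        """In-place eviction: overwrite the hole with the last live row.

        Assumption: reordering cached rows is acceptable to the consumer.
        """
        last = self.length - 1
        moved = 0
        if idx != last:
            self.buf[idx] = self.buf[last]
            moved = 1
        self.buf[last] = None
        self.length -= 1
        return moved


def run(n_tokens=64, block_size=16):
    """Append n_tokens rows to both caches and return them for inspection."""
    naive, blocked = ConcatCache(), BlockCache(block_size)
    for t in range(n_tokens):
        naive.append(t)
        blocked.append(t)
    return naive, blocked


if __name__ == "__main__":
    naive, blocked = run()
    print("grow-by-one:", naive.allocations, "allocations,", naive.rows_copied, "rows copied")
    print("block-wise: ", blocked.allocations, "allocations,", blocked.rows_copied, "rows copied")
    print("shift eviction moved", blocked.evict_shift(10), "rows")
    print("in-place eviction moved", blocked.evict_in_place(10), "rows")
```

The block-wise cache pays for its savings with unused padding slots, which is the kind of compute and space trade-off the abstract refers to.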
Speaker Bio:
Govindarajan received his B.Sc. degree in Mathematics from Madras University in 1981, and his B.E. (Electronics and Communication) and Ph.D. (Computer Science) degrees from the Indian Institute of Science, Bangalore, in 1984 and 1989, respectively. Since 1995, he has been with the Supercomputer Education and Research Centre and the Department of Computer Science and Automation, Indian Institute of Science, Bangalore. His research interests are in the areas of High Performance Computing, Compilation Techniques, and Computer Architecture. He is a Fellow of the Indian National Academy of Engineering.
Host Faculty: Prof. Sumit Kumar Mandal
