Seminars

Optimizing KV Cache Management for Efficient LLM Inferencing

Series: CSA Faculty Colloquium

Speaker: Prof. Govindarajan Ramaswamy, Professor, Dept. of CSA, IISc

Date/Time: Jun 05 16:00:00

Location: CSA Auditorium (Room No. 104, Ground Floor)

Abstract:
KV cache management contributes significantly to LLM inference time. Increasing sequence lengths, multi-turn conversations, and KV cache pruning methods further increase the time spent on it. In this talk, we propose two techniques for efficient KV cache management. The first technique, BMC, is a KV cache memory allocation scheme that trades off some compute (and memory space) to avoid excessive allocation and copy overhead, thereby making LLM inference faster. We repurpose the redundant space and compute for speculative decoding, which further improves overall inference performance. The second technique, EQUIP, is an equivariance-preserving in-place update scheme for token-pruning approaches. EQUIP minimizes the cost of token eviction (typically implemented using shift-and-append or scatter-gather operations) to improve LLM inference performance. EQUIP is being presented at ACL 2026.
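
As a rough illustration of the allocation trade-off mentioned in the abstract (not the BMC scheme itself, whose details are not given here), the sketch below counts how many cache entries get copied when a cache is reallocated to an exact fit for every new token, versus when its capacity grows in fixed-size blocks. The function names and the `block` parameter are hypothetical and chosen only for this example.

```python
def copies_with_exact_fit(num_tokens: int) -> int:
    """Entries copied if the cache is reallocated to an exact fit for every new token."""
    copied = 0
    for length in range(num_tokens):
        # Growing from `length` to `length + 1` entries copies the existing ones.
        copied += length
    return copied


def copies_with_block_growth(num_tokens: int, block: int) -> tuple[int, int]:
    """Return (entries copied, unused slots left over) when capacity grows by `block`."""
    capacity = 0
    copied = 0
    for length in range(num_tokens):
        if length == capacity:
            # Cache is full: reallocate with one extra block and copy existing entries.
            copied += length
            capacity += block
    return copied, capacity - num_tokens


if __name__ == "__main__":
    print(copies_with_exact_fit(10))
    print(copies_with_block_growth(10, block=4))
```

Under these assumptions, block growth copies far fewer entries at the price of some unused slots, which is the kind of spare space the abstract says can be repurposed.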

Speaker Bio:
Govindarajan received his B.Sc. degree in Mathematics from Madras University in 1981, and his B.E. (Electronics and Communication) and Ph.D. (Computer Science) degrees from the Indian Institute of Science, Bangalore, in 1984 and 1989, respectively. Since 1995, he has been with the Supercomputer Education and Research Centre and the Department of Computer Science and Automation, Indian Institute of Science, Bangalore. His research interests are in the areas of High Performance Computing, Compilation Techniques, and Computer Architecture. He is a Fellow of the Indian National Academy of Engineering.

Host Faculty: Prof. Sumit Kumar Mandal