Seminars


What really limits GPU query execution? Analyzing query execution using a learned performance bottleneck model.

Series: M.Tech (Research) Colloquium

Speaker: Chethan Ravindranath, M.Tech (Research) ERP student, Dept. of CSA

Date/Time: Apr 21, 10:00 AM

Location: CSA Auditorium (Room No. 104, Ground Floor)

Faculty Advisor: Prof. Govindarajan R

Abstract:
GPUs are increasingly used for analytical query processing due to their massive parallelism and high memory bandwidth. While most existing GPU query engines rely on library-based execution, we argue that compilation is essential for high performance. Yet the relative performance of code-generation strategies remains largely unexplored: the impact of design decisions such as join implementations and divergence-mitigation techniques on end-to-end query performance on GPUs is poorly understood. No prior work compares these design points in a controlled manner, making it difficult to isolate the performance impact of individual decisions. Moreover, there is little systematic analysis of the microarchitectural factors that determine query performance on GPUs, leaving designers of high-performance GPU query engines with little guidance.

In this work, we present a comprehensive microarchitectural analysis of compilation-based GPU query execution using a custom MLIR-based query compiler. Our compiler enables controlled exploration of the query-execution design space by generating code variants that differ only in the design choice being evaluated. Using this framework, we evaluate TPC-H queries across multiple GPUs and scale factors. We develop a novel machine-learning-based bottleneck analysis framework, trained on hardware performance counters collected from executions of the generated kernels, that identifies the dominant microarchitectural factors determining query performance across the benchmark suite. Our analysis challenges prevailing assumptions about GPU query execution: we show that memory traffic and dynamic instruction count dominate query execution performance, while factors traditionally considered critical, such as warp divergence and DRAM bandwidth utilization, have negligible impact in compiled GPU query execution. These findings provide principled guidance for designing high-performance compilation-based GPU query engines.
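To make the idea of a "learned bottleneck model" concrete, here is a minimal, hypothetical sketch of the general approach: fit a regression model that predicts kernel runtime from hardware performance counters, then rank counters by learned feature importance to surface the dominant bottlenecks. The counter names, synthetic data generator, and model choice below are illustrative assumptions, not the speaker's actual framework.

```python
# Hypothetical sketch of learned bottleneck analysis: train a regressor on
# per-kernel hardware counters and rank counters by feature importance.
# Counter names and the synthetic data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic "profiles": one row per kernel execution, one column per counter.
counters = ["dram_bytes", "inst_executed", "warp_divergence", "l2_hit_rate"]
X = rng.uniform(size=(500, len(counters)))

# Assumed ground truth mirroring the talk's finding: runtime is dominated by
# memory traffic and instruction count; divergence contributes almost nothing.
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 0.01, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Counters with the highest learned importance are the inferred bottlenecks.
ranked = sorted(zip(counters, model.feature_importances_),
                key=lambda p: p[1], reverse=True)
for name, importance in ranked:
    print(f"{name:16s} {importance:.3f}")
```

In a real framework the rows would come from a profiler run over the compiled kernels rather than a synthetic generator, but the ranking step is the same: the model attributes runtime variance to counters, separating dominant factors from negligible ones.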

Speaker Bio:
Chethan Ravindranath is an ERP M.Tech (Research) student in the Dept. of CSA. He obtained his B.Tech in Computer Science from PES University, Bangalore. He has 17 years of experience as a software engineer and is currently a Chief Engineer at National Instruments, Bangalore.