Seminars

TREEBEARD: A Schedule-Guided Retargetable Compiler for Decision Tree Inference

Series: Ph.D. Colloquium

Speaker: Ashwin Prasad, CSA, IISc

Date/Time: Jul 14, 10:00 AM

Location: CSA Lecture Hall Room No. 112 (Ground Floor)

Faculty Advisor: R Govindarajan and Uday Kumar Reddy Bondhugula

Abstract:
The proliferation of machine learning, together with the rapid evolution of the hardware ecosystem, has led to a surge in demand for model inference on a variety of hardware. Decision tree-based models are the most popular models for tabular data. Our work is motivated by the problems encountered when targeting inference of these models to run at peak performance on CPU and GPU targets. Decision tree ensemble inference is usually performed with libraries such as XGBoost, LightGBM, and Sklearn on CPUs, and RAPIDS FIL on GPUs. These libraries incorporate a fixed set of optimizations for the hardware targets they support. However, these solutions are neither portable nor able to achieve the best possible performance on the specific hardware they target, because they do not specialize the inference code to the model being used and to the specific target processor, leaving significant performance on the table.
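The tree-walk computation these libraries perform can be sketched in a few lines. This is an illustrative flat-array encoding and traversal loop, not any library's actual data structure or API:

```python
# Each tree is a flat list of nodes:
# (feature_index, threshold, left_child, right_child, leaf_value),
# where feature_index == -1 marks a leaf node.
def predict_one(trees, x):
    """Ensemble prediction: sum the leaf value reached in each tree."""
    score = 0.0
    for nodes in trees:
        i = 0
        while nodes[i][0] != -1:              # walk root-to-leaf
            feat, thresh, left, right, _ = nodes[i]
            i = left if x[feat] < thresh else right
        score += nodes[i][4]                  # accumulate leaf value
    return score

# Two tiny example stumps:
#   tree_a: x[0] < 0.5 -> 1.0, else 2.0
#   tree_b: x[1] < 0.3 -> -1.0, else 0.5
tree_a = [(0, 0.5, 1, 2, 0.0), (-1, 0.0, 0, 0, 1.0), (-1, 0.0, 0, 0, 2.0)]
tree_b = [(1, 0.3, 1, 2, 0.0), (-1, 0.0, 0, 0, -1.0), (-1, 0.0, 0, 0, 0.5)]
print(predict_one([tree_a, tree_b], [0.7, 0.1]))  # 2.0 + (-1.0) = 1.0
```

The data-dependent branching in the inner `while` loop is exactly what makes this workload hard for fixed library kernels to run at peak speed on every model and processor.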

To address this problem, we designed and built Treebeard, a schedule-guided, retargetable compiler for decision tree-based models that searches over several optimization choices and automatically generates high-performance inference routines for CPUs and GPUs. Treebeard has two core components. The first is a scheduling language that encapsulates the optimization space, along with techniques to efficiently explore this space. The second is an optimizing, retargetable compiler that can generate code for any specified schedule by lowering the inference computation to optimized CPU or GPU code through multiple intermediate abstractions. By applying model-specific optimizations at the higher levels, tree-walk optimizations at the middle level, and machine-specific optimizations lower down, Treebeard specializes the inference code for each model on each supported target. Treebeard combines several novel optimizations at various abstraction levels and uses different data layouts, loop structures, and caching strategies to mitigate architectural bottlenecks and achieve portable performance across a range of targets.
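The idea of specializing inference code to a fixed model can be illustrated with a toy code generator that compiles one tree into straight-line branching code. All names here are hypothetical and for illustration only; they are not Treebeard's actual scheduling language or compiler pipeline, which lowers through MLIR abstractions rather than generating Python:

```python
# Toy model specialization: turn a flat-array tree into nested ifs,
# eliminating the interpretive node-array walk at inference time.
def compile_tree(nodes, i=0, depth=1):
    feat, thresh, left, right, value = nodes[i]
    pad = "    " * depth
    if feat == -1:                            # leaf: emit its value
        return f"{pad}return {value}\n"
    return (f"{pad}if x[{feat}] < {thresh}:\n"
            + compile_tree(nodes, left, depth + 1)
            + f"{pad}else:\n"
            + compile_tree(nodes, right, depth + 1))

# Stump: x[0] < 0.5 -> 1.0, else 2.0
tree = [(0, 0.5, 1, 2, 0.0), (-1, 0.0, 0, 0, 1.0), (-1, 0.0, 0, 0, 2.0)]
src = "def predict(x):\n" + compile_tree(tree)
ns = {}
exec(src, ns)                 # materialize the specialized function
print(ns["predict"]([0.7]))   # takes the else-branch: 2.0
```

A schedule-guided compiler goes well beyond this sketch, also choosing loop structures (batch vs. tree ordering), data layouts, and caching strategies per model and per target, but the payoff is the same: code specialized to one model rather than a generic interpreter over all models.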

We implement Treebeard using the MLIR compiler infrastructure and demonstrate its utility by evaluating it on a diverse set of benchmarks. On the CPU, Treebeard is significantly faster than the state-of-the-art systems XGBoost, Treelite, and Hummingbird: by 2.6x, 4.7x, and 5.4x respectively in a single-core setting, and by 2.3x, 2.7x, and 14x respectively in multi-core settings. On the GPU, Treebeard-generated code is an order of magnitude faster than XGBoost and, on average, about 2-5x faster than RAPIDS FIL and Tahoe across several batch sizes. While the other systems target only NVIDIA GPUs, Treebeard achieves competitive performance on AMD GPUs as well.

Speaker Bio:
Ashwin Prasad is an ERP Ph.D. student in the CSA department. He obtained his M.Sc. (Engg.) degree from IISc and his B.Tech. from RVCE, Bangalore. He has more than 15 years of experience as a software engineer and is currently a Chief Engineer at National Instruments, Bangalore.

Host Faculty: R Govindarajan