
Research
My interests are in the design of compilers, programming models, and runtimes for high-performance Artificial Intelligence (AI) systems powered by multicores and accelerators, with an emphasis on automatic parallelization and high performance. Computational domains of particular interest to me include stencil computations, image processing pipelines, dense linear algebra, and deep learning.
Research Tools / Software
- MLIR
- PLUTO
- PolyMage (not to be confused with PolyMage Labs and PolyBlocks)
Compilers for AI lab - my group's page
Publications
Google scholar profile, DBLP, BibTeX
-
SilvanForge: A Schedule-Guided Retargetable Compiler for Decision Tree Inference
Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R Govindarajan, and Uday Bondhugula.
ACM Symposium on Operating Systems Principles (SOSP), 2024.
-
HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description
Kingshuk Majumder and Uday Bondhugula
ASPLOS 2023.
-
Treebeard: An Optimizing Compiler for Decision Tree-Based ML Inference
Ashwin Prasad, Sampath Rajendra, Kaushik Rajan, R Govindarajan, Uday Bondhugula.
IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct 2022.
-
MLIR-Based Code Generation for GPU Tensor Cores
Navdeep Katel, Vivek Khandelwal, and Uday Bondhugula.
ACM/IEEE International Conference on Compiler Construction (CC), Apr 2022.
-
A Practical Tile Size Selection Model for Affine Loop Nests
Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, and Uday Bondhugula.
ACM International Conference on Supercomputing (ICS'21), Jun 2021.
-
MLIR: Scaling Compiler Infrastructure for Domain-Specific Computation
Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko.
ACM International Symposium on Code Generation and Optimization (CGO), 12 pages, Feb 2021.
-
Effective Loop Fusion in Polyhedral Compilation using Fusion Conflict Graphs
Aravind Acharya, Uday Bondhugula, Albert Cohen.
ACM Transactions on Architecture and Code Optimization (TACO), vol 17, issue 4, article no. 26, Sep 2020.
-
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems
Karan Aggarwal and Uday Bondhugula
ACM Transactions on Parallel Computing, vol 7, number 4, article no. 22, Nov 2020.
- An Effective Fusion and Tile Size Model for PolyMage
Abhinav Jangda and Uday Bondhugula
ACM Transactions on Programming Languages and Systems (TOPLAS), vol 42, issue 3, article no. 12, 27 pages, Nov 2020.
- Bitwidth Customization in Image Processing Pipelines using Interval Analysis and SMT Solvers
Suresh Purini, Vinamra Benara, Ziaul Chowdhury, Uday Bondhugula.
ACM SIGPLAN International Conference on Compiler Construction (CC), pages 167-178, Feb 2020.
-
Optimizing the Linear Fascicle Evaluation Algorithm for Many-Core Systems
Karan Aggarwal, Uday Bondhugula
International Conference on Supercomputing (ICS), Jun 2019.
-
Polyhedral Auto-Transformation with No Integer Linear Programming
Aravind Acharya, Uday Bondhugula, Albert Cohen
ACM SIGPLAN PLDI, Jun 2018.
-
An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
Abhinav Jangda, Uday Bondhugula
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2018.
Artifact evaluated (reusable and available).
-
Optimizing Geometric Multigrid Method Computation using a DSL Approach [slides, benchmarks]
Vinay Vasista, Kumudha KN, Siddharth Bhat, Uday Bondhugula
Supercomputing (SC), Nov 2017.
-
Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations
Uday Bondhugula, Vinayaka Bandishti, Irshad Pananilath
IEEE Transactions on Parallel and Distributed Systems (TPDS), pg 1285-1298, Vol 28, Issue 5, May 2017.
(extended version of SC'12 paper)
-
A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
Nitin Chugh, Vinay Vasista, Suresh Purini, Uday Bondhugula
IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2016), Sep 2016.
-
Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory
Roshan Dathathri, Ravi Teja Mullapudi, Uday Bondhugula
ACM Transactions on Parallel Computing (TOPC), vol 3, issue 2, Jul 2016.
-
SMO: An Integrated Approach to Intra-Array and Inter-Array Storage Optimization [PDF, Tool]
Somashekaracharya Bhaskaracharya, Uday Bondhugula, Albert Cohen
ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2016.
-
Automatic Storage Optimization for Arrays [Tool]
Somashekaracharya Bhaskaracharya, Uday Bondhugula, Albert Cohen
ACM Transactions on Programming Languages and Systems (TOPLAS), vol 38, issue 3, Apr 2016.
Selected for presentation at ACM SIGPLAN PLDI'16, Jun 2016.
-
The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests [PDF]
Uday Bondhugula, Aravind Acharya, Albert Cohen
ACM Transactions on Programming Languages and Systems (TOPLAS), vol 38, issue 3, Apr 2016.
-
An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations [PDF, slides]
Irshad Pananilath, Aravind Acharya, Vinay Vasista, Uday Bondhugula
ACM Transactions on Architecture and Code Optimization (TACO), volume 12, issue 2, article 14, July 2015.
-
PolyMage: Automatic Optimization for Image Processing Pipelines [PDF]
Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
ASPLOS '15: International Conference on Architectural Support for Programming Languages and Operating Systems, Mar 2015.
-
PLUTO+: Near-Complete Modeling of Affine Transformations for Parallelism and Locality [PDF]
Aravind Acharya, Uday Bondhugula
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2015.
-
Tiling and Optimizing Time-Iterated Computations over Periodic Domains [PDF, slides, code]
Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache
IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2014), Aug 2014.
Nominated for the best paper award.
-
Effective automatic computation placement and data allocation for parallelization of regular programs
Chandan Reddy, Uday Bondhugula
ICS '14 Proceedings of the 28th ACM international conference on Supercomputing, Jun 2014.
-
Automatic data allocation and buffer management for multi-GPU machines
Thejas Ramashekar, Uday Bondhugula
ACM Transactions on Architecture and Code Optimization (TACO), Vol 10, No. 4, Article 60, Dec 2013.
-
Compiling Affine Loop Nests for Distributed-Memory Parallel Architectures [PDF, tool, slides]
Uday Bondhugula
ACM/IEEE Supercomputing (SC '13), Nov 2013, Denver, USA.
- Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory [PDF, Tool]
Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula
International Conference on Parallel Architectures and Compilation Techniques (PACT 2013), Sep 2013, Edinburgh, UK.
- PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language [PDF]
Somashekar Bhaskaracharya, Uday Bondhugula
International Conference on Compiler Construction (CC 2013), Mar 2013, Rome, Italy.
-
Tiling Stencil Computations to Maximize Parallelism [PDF, code, tool]
Vinayak Bandishti, Irshad Pananilath, and Uday Bondhugula
ACM/IEEE Supercomputing, Nov 2012, Utah, USA.
Note: please refer to the journal extension (TPDS 2017, listed above) instead of this paper.
-
Loop Transformations: Convexity, Pruning, and Optimization [PDF]
Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, P Sadayappan, and Nicolas Vasilache
ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), Jan 2011, Austin, USA.
-
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, and P Sadayappan
Supercomputing (SC), 2010, New Orleans, USA.
- A Model for Fusion and Code Motion in an Integrated Auto-Parallelizing Compiler [PDF]
Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and L. Renganarayana
International Conference on Parallel Architectures and Compilation Techniques (PACT), Sep 2010, Vienna, Austria.
- Compact Multi-dimensional Kernel Extraction for Register Tiling
L. Renganarayana, Uday Bondhugula, Salem Derisavi, Alexandre E. Eichenberger, and Kevin O'Brien
Supercomputing (SC), 2009, Portland, USA.
- Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors [PDF]
M. Baskaran, N. Vydyanathan, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2009, Raleigh, North Carolina.
- Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors
Qingda Lu, Christophe Alias, Uday Bondhugula, Thomas Henretty, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan, Yongjian Chen, Haibo Lin, and Tin-fook Ngai.
International Conference on Parallel Architectures and Compilation Techniques (PACT), 2009, Raleigh, USA.
-
A Practical Automatic Polyhedral Parallelizer and Locality Optimizer [PDF]
Uday Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan.
ACM SIGPLAN Programming Language Design and Implementation (PLDI), Jun 2008, Tucson, Arizona, USA.
ACM SIGPLAN Most Influential Paper Award in 2018.
-
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model [PDF]
Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
International Conference on Compiler Construction (ETAPS CC), Apr 2008, Budapest, Hungary.
-
A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs [PDF]
Muthu Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
ACM International Conference on Supercomputing (ICS), Jun 2008, Island of Kos, Greece.
-
Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories
Muthu Baskaran, Uday Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2008, Salt Lake City, Utah.
-
Effective Automatic Parallelization of Stencil Computations [PDF]
S. Krishnamoorthy, M. Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
ACM SIGPLAN Programming Language Design and Implementation (PLDI), Jun 2007, San Diego, California.
-
Automatic Mapping of Nested Loops to FPGAs [PDF]
Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar 2007, San Jose, California.
-
Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths [PDF]
Uday Bondhugula, A. Devulapalli, James Dinan, J. Fernando, Pete Wyckoff, E. Stahlberg, and P. Sadayappan.
IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '06), Apr 2006, Napa Valley, California.
-
Parallel FPGA-based All-Pairs Shortest-Paths in a Directed Graph [PDF | talk | HDL code]
Uday Bondhugula, Ananth Devulapalli, Joseph Fernando, Pete Wyckoff, and P. Sadayappan.
20th IEEE International Parallel & Distributed Processing Symposium (IPDPS '06), Apr 2006, Rodos, Greece.
-
High Performance RDMA-based All-to-all Broadcast for InfiniBand Clusters [PDF]
S. Sur, Uday Bondhugula, A. Mamidala, H.-W. Jin, and D. K. Panda.
12th IEEE International Conference on High Performance Computing (HiPC '05), Dec 2005, Bangalore, India.
Research Reports
- MLIR: A Compiler Infrastructure for the End of Moore's Law
Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko.
arXiv:2002.11054, Feb 2020.
-
High Performance Code Generation in MLIR: An Early Case Study with GEMM
Uday Bondhugula
arXiv preprint arXiv:2003.00532, Mar 2020.
- Automatic Intra-Array Storage Optimization [PDF]
Somashekaracharya G Bhaskaracharya, Uday Bondhugula, Albert Cohen
IISc-CSA-TR-2014-3, Nov 2014.
-
Handling Negative Coefficients in Automatic Transformation Schedules
Uday Bondhugula, Albert Cohen
Technical report, IISc-CSA-TR-1, Feb 2014.
Superseded by the Pluto+ paper at PPoPP'15 listed above.
- Automatic Distributed Memory Code Generation using the Polyhedral Framework
Uday Bondhugula
IISc Research Report, IISc-CSA-TR-2011-3.
-
Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU
Rajesh Bordawekar, Uday Bondhugula, Ravi Rao
IBM Research Report RC25033, IBM T.J. Watson Research Center, Yorktown Heights, New York, Aug 2010.
-
Believe it or Not! Multicore CPUs can Match GPUs for FLOP-intensive Applications!
Rajesh Bordawekar, Uday Bondhugula, Ravi Rao
IBM Research Report RC24982, IBM T.J. Watson Research Center, Yorktown Heights, New York, Apr 2010.
-
PLUTO: A Practical and Fully Automatic Polyhedral Program Optimization System
Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
OSU Research Report OSU-CISRC-10/07-TR70, Oct 2007.
-
Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences
Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
OSU Research Report OSU-CISRC-5/07-TR43, May 2007.
Ph.D. thesis
Effective Automatic Parallelization and Locality Optimization using the Polyhedral Model [PDF]
Ph.D. thesis, defended Aug 4th, 2008, The Ohio State University, USA.