
Research


My interests are in the design of programming languages/models, compilers, and runtimes for multicores, distributed-memory clusters, and accelerators, with an emphasis on automatic parallelization and high performance. Computational domains of particular interest to me include stencil computations, image processing pipelines, dense linear algebra, and deep learning.

Research Tools / Software

Multicore Computing Lab - my group's page

Publications

Google Scholar profile, DBLP, BibTeX

  1. A Practical Tile Size Selection Model for Affine Loop Nests
    Kumudha Narasimhan, Aravind Acharya, Abhinav Baid, and Uday Bondhugula.
    ACM International Conference on Supercomputing (ICS'21), Jun 2021.

  2. MLIR: Scaling Compiler Infrastructure for Domain-Specific Computation
    Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko.
    ACM International Symposium on Code Generation and Optimization (CGO), 12 pages, Feb 2021.

  3. Effective Loop Fusion in Polyhedral Compilation using Fusion Conflict Graphs
    Aravind Acharya, Uday Bondhugula, Albert Cohen.
    ACM Transactions on Architecture and Code Optimization (TACO), vol 17, issue 4, article no. 26, Sep 2020.

  4. Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems
    Karan Aggarwal and Uday Bondhugula
    ACM Transactions on Parallel Computing, vol 7, number 4, article no. 22, Nov 2020.

  5. An Effective Fusion and Tile Size Model for PolyMage
    Abhinav Jangda and Uday Bondhugula
    ACM Transactions on Programming Languages and Systems (TOPLAS), vol 42, issue 3, article no. 12, 27 pages, Nov 2020.

  6. Bitwidth Customization in Image Processing Pipelines using Interval Analysis and SMT Solvers
    Suresh Purini, Vinamra Benara, Ziaul Chowdhury, Uday Bondhugula.
    ACM SIGPLAN International Conference on Compiler Construction (CC), pages 167-178, Feb 2020.

  7. Optimizing the Linear Fascicle Evaluation Algorithm for Many-Core Systems
    Karan Aggarwal, Uday Bondhugula
    International Conference on Supercomputing (ICS), Jun 2019.

  8. Polyhedral Auto-Transformation with No Integer Linear Programming
    Aravind Acharya, Uday Bondhugula, Albert Cohen
    ACM SIGPLAN PLDI, Jun 2018.

  9. An Effective Fusion and Tile Size Model for Optimizing Image Processing Pipelines
    Abhinav Jangda, Uday Bondhugula
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2018.
    Artifact evaluated (reusable and available).

  10. Optimizing Geometric Multigrid Method Computation using a DSL Approach [slides, benchmarks]
    Vinay Vasista, Kumudha KN, Siddharth Bhat, Uday Bondhugula
    Supercomputing (SC), Nov 2017.

  11. Diamond Tiling: Tiling Techniques to Maximize Parallelism for Stencil Computations
    Uday Bondhugula, Vinayaka Bandishti, Irshad Pananilath
    IEEE Transactions on Parallel and Distributed Systems (TPDS), vol 28, issue 5, pages 1285-1298, May 2017.
    (extended version of SC'12 paper)


  12. A DSL Compiler for Accelerating Image Processing Pipelines on FPGAs
    Nitin Chugh, Vinay Vasista, Suresh Purini, Uday Bondhugula
    IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2016), Sep 2016.

  13. Compiling Affine Loop Nests for a Dynamic Scheduling Runtime on Shared and Distributed Memory
    Roshan Dathathri, Ravi Teja Mullapudi, Uday Bondhugula
    ACM Transactions on Parallel Computing (TOPC), vol 3, issue 2, Jul 2016.

  14. SMO: An Integrated Approach to Intra-Array and Inter-Array Storage Optimization [PDF, Tool]
    Somashekaracharya Bhaskaracharya, Uday Bondhugula, Albert Cohen
    ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2016.

  15. Automatic Storage Optimization for Arrays [Tool]
    Somashekaracharya Bhaskaracharya, Uday Bondhugula, Albert Cohen
    ACM Transactions on Programming Languages and Systems (TOPLAS), vol 38, issue 3, Apr 2016.
    Selected for presentation at ACM SIGPLAN PLDI'16, Jun 2016.

  16. The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests [PDF]
    Uday Bondhugula, Aravind Acharya, Albert Cohen
    ACM Transactions on Programming Languages and Systems (TOPLAS), vol 38, issue 3, Apr 2016.

  17. An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations [PDF, slides]
    Irshad Pananilath, Aravind Acharya, Vinay Vasista, Uday Bondhugula
    ACM Transactions on Architecture and Code Optimization (TACO), volume 12, issue 2, article 14, July 2015.

  18. PolyMage: Automatic Optimization for Image Processing Pipelines [PDF]
    Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
    ASPLOS '15: International Conference on Architectural Support for Programming Languages and Operating Systems, Mar 2015.

  19. PLUTO+: Near-Complete Modeling of Affine Transformations for Parallelism and Locality [PDF]
    Aravind Acharya, Uday Bondhugula
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2015.

  20. Tiling and Optimizing Time-Iterated Computations over Periodic Domains [PDF, slides, code]
    Uday Bondhugula, Vinayaka Bandishti, Albert Cohen, Guillain Potron, Nicolas Vasilache
    IEEE International Conference on Parallel Architectures and Compilation Techniques (PACT 2014), Aug 2014.
    Nominated for the best paper award.

  21. Effective automatic computation placement and data allocation for parallelization of regular programs
    Chandan Reddy, Uday Bondhugula
    ICS '14 Proceedings of the 28th ACM international conference on Supercomputing, Jun 2014.

  22. Automatic data allocation and buffer management for multi-GPU machines
    Thejas Ramashekar, Uday Bondhugula
    ACM Transactions on Architecture and Code Optimization (TACO), Vol 10, No. 4, Article 60, Dec 2013.

  23. Compiling Affine Loop Nests for Distributed-Memory Parallel Architectures [PDF, tool, slides]
    Uday Bondhugula
    ACM/IEEE Supercomputing (SC '13), Nov 2013, Denver, USA.

  24. Generating Efficient Data Movement Code for Heterogeneous Architectures with Distributed-Memory [PDF, Tool]
    Roshan Dathathri, Chandan Reddy, Thejas Ramashekar, Uday Bondhugula
    International Conference on Parallel Architectures and Compilation Techniques (PACT 2013), Sep 2013, Edinburgh, UK.

  25. PolyGLoT: A Polyhedral Loop Transformation Framework for a Graphical Dataflow Language [PDF]
    Somashekaracharya Bhaskaracharya, Uday Bondhugula
    International Conference on Compiler Construction (CC 2013), Mar 2013, Rome, Italy.

  26. Tiling Stencil Computations to Maximize Parallelism [PDF, code, tool]
    Vinayaka Bandishti, Irshad Pananilath, and Uday Bondhugula
    ACM/IEEE Supercomputing, Nov 2012, Utah, USA.
    Note: please refer to the journal extension (TPDS, May 2017, listed above) instead of this paper.

  27. Loop Transformations: Convexity, Pruning, and Optimization [PDF]
    Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, P Sadayappan, and Nicolas Vasilache
    ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2011, Austin, USA.

  28. Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
    Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J Ramanujam, and P Sadayappan
    Supercomputing (SC), 2010, New Orleans, USA.

  29. A Model for Fusion and Code Motion in an Integrated Auto-Parallelizing Compiler [PDF]
    Uday Bondhugula, Oktay Gunluk, Sanjeeb Dash, and L. Renganarayana
    International Conference on Parallel Architectures and Compilation Techniques (PACT), Sep 2010, Vienna, Austria.

  30. Compact Multi-dimensional Kernel Extraction for Register Tiling
    L. Renganarayana, Uday Bondhugula, Salem Derisavi, Alexandre E. Eichenberger, and Kevin O'Brien
    Supercomputing (SC), 2009, Portland, USA.

  31. Compiler-Assisted Dynamic Scheduling for Effective Parallelization of Loop Nests on Multicore Processors [PDF]
    M. Baskaran, N. Vydyanathan, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2009, Raleigh, North Carolina.

  32. Data Layout Transformation for Enhancing Locality on NUCA Chip Multiprocessors
    Qingda Lu, Christophe Alias, Uday Bondhugula, Thomas Henretty, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan, Yongjian Chen, Haibo Lin, and Tin-fook Ngai.
    International Conference on Parallel Architectures and Compilation Techniques (PACT), 2009, Raleigh, USA.

  33. A Practical Automatic Polyhedral Parallelizer and Locality Optimizer [PDF]
    Uday Bondhugula, A. Hartono, J. Ramanujam, P. Sadayappan.
    ACM SIGPLAN Programming Language Design and Implementation (PLDI), Jun 2008, Tucson, Arizona, USA.
    ACM SIGPLAN Most Influential Paper Award in 2018.

  34. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model [PDF]
    Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    International Conference on Compiler Construction (ETAPS CC), Apr 2008, Budapest, Hungary.

  35. A Compiler Framework for Optimization of Affine Loop Nests for GPGPUs [PDF]
    Muthu Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM International Conference on Supercomputing (ICS), Jun 2008, Island of Kos, Greece.

  36. Automatic Data Movement and Computation Mapping for Multi-level Parallel Architectures with Explicitly Managed Memories
    Muthu Baskaran, Uday Bondhugula, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb 2008, Salt Lake City, Utah.

  37. Effective Automatic Parallelization of Stencil Computations [PDF]
    S. Krishnamoorthy, M. Baskaran, Uday Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan.
    ACM SIGPLAN Programming Language Design and Implementation (PLDI), Jun 2007, San Diego, California.

  38. Automatic Mapping of Nested Loops to FPGAs [PDF]
    Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
    ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Mar 2007, San Jose, California.

  39. Hardware/Software Integration for FPGA-based All-Pairs Shortest-Paths [PDF]
    Uday Bondhugula, A. Devulapalli, James Dinan, J. Fernando, Pete Wyckoff, E. Stahlberg, and P. Sadayappan.
    IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '06), Apr 2006, Napa Valley, California.

  40. Parallel FPGA-based All-Pairs Shortest-Paths in a Directed Graph [PDF | talk | HDL code]
    Uday Bondhugula, Ananth Devulapalli, Joseph Fernando, Pete Wyckoff, and P. Sadayappan.
    20th IEEE International Parallel & Distributed Processing Symposium (IPDPS '06), Apr 2006, Rodos, Greece.

  41. High Performance RDMA-based All-to-all Broadcast for InfiniBand Clusters [PDF]
    S. Sur, Uday Bondhugula, A. Mamidala, H.-W. Jin, and D. K. Panda.
    12th IEEE International Conference on High Performance Computing (HiPC '05), Dec 2005, Bangalore, India.

Research Reports

  1. MLIR: A Compiler Infrastructure for the End of Moore's Law
    Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko.
    arXiv:2002.11054, Feb 2020.

  2. High Performance Code Generation in MLIR: An Early Case Study with GEMM
    Uday Bondhugula
    arXiv preprint arXiv:2003.00532, Mar 2020.

  3. Automatic Intra-Array Storage Optimization [PDF]
    Somashekaracharya G Bhaskaracharya, Uday Bondhugula, Albert Cohen
    IISc-CSA-TR-2014-3, Nov 2014.

  4. Handling Negative Coefficients in Automatic Transformation Schedules
    Uday Bondhugula, Albert Cohen
    Technical report, IISc-CSA-TR-1, Feb 2014.
    Superseded by the Pluto+ paper at PPoPP'15 listed above.

  5. Automatic Distributed Memory Code Generation using the Polyhedral Framework
    Uday Bondhugula
    IISc Research Report, IISc-CSA-TR-2011-3.

  6. Can CPUs Match GPUs on Performance with Productivity?: Experiences with Optimizing a FLOP-intensive Application on CPUs and GPU
    Rajesh Bordawekar, Uday Bondhugula, Ravi Rao
    IBM Research Report RC25033, IBM T.J. Watson Research Center, Yorktown Heights, New York, Aug 2010.

  7. Believe it or Not! Multicore CPUs can Match GPUs for FLOP-intensive Applications!
    Rajesh Bordawekar, Uday Bondhugula, Ravi Rao
    IBM Research Report RC24982, IBM TJ Watson Research Center, Yorktown Heights, New York, Apr 2010.

  8. PLUTO: A Practical and Fully Automatic Polyhedral Program Optimization System
    Uday Bondhugula, J. Ramanujam, and P. Sadayappan.
    OSU Research Report OSU-CISRC-10/07-TR70, Oct 2007.

  9. Affine transformations for communication minimal parallelization and locality optimization of arbitrarily nested loop sequences
    Uday Bondhugula, M. Baskaran, S. Krishnamoorthy, J. Ramanujam, A. Rountev, and P. Sadayappan.
    OSU Research Report OSU-CISRC-5/07-TR43, May 2007.

Ph.D. thesis

Effective Automatic Parallelization and Locality Optimization using the Polyhedral Model [PDF]
Ph.D. thesis, Defended Aug 4th, 2008, The Ohio State University, USA.