SeminarsView all Seminars | Download ICal for this event
Scalable and Effective Polyhedral Auto-transformation without using Integer Linear Programming
Series: Ph.D. Colloquium
Speaker: Mr. Aravind Acharya Ph.D Student Dept. of CSA
Date/Time: Jan 08 14:00:00
Location: CSA Seminar Hall (Room No. 254, First Floor)
Faculty Advisor: Prof. Uday Kumar Reddy .B
In recent years, polyhedral auto-transformation frameworks have gained significant interest in general-purpose compilation, because of their ability to find and compose complex loop transformations that extract high performance from modern architectures. These frameworks automatically find loop transformations that either enhance locality, parallelism and minimize latency or a combination of these. Recently, focus has also shifting on developing intermediate representations, like MLIR, where complex loop transformations and data-layout optimizations can be incorporated efficiently in a single common infrastructure. Polyhedral auto-transformation frameworks typically rely on complex Integer Linear Programming (ILP) formulations to find affine loop transformations. However, construction and solving these ILP problems is time consuming which increases compilation time significantly. Secondly, loop fusion heuristics in these auto-transformation frameworks are ad hoc, and modeling loop fusion efficiently would further degrade compilation time. In this thesis, we first relax the ILP formulation in the Pluto algorithm. We show that even though LP relaxation reduces the time complexity of the problem, it does not reduce the compilation time significantly because of the complex construction of constraints. We also observe that due to relaxation, sub-optimal loop transformations that result in significant performance degradation may be obtained. Hence, we propose a new polyhedral auto-transformation framework, called Pluto-lp-dfp, that finds efficient affine loop transformations quickly,while relying on Pluto's cost function. The framework decouples auto-transformation into three components: (1) loop fusion and permutation (2) loop scaling and shifting and (3) loop skewing components. In each phase, we solve a Linear Programming (LP) formulation instead of an ILP, thereby resulting in a polynomial time affine transformation algorithm. We propose a data structure, called fusion conflict graph, that allows us to model loop fusion to work in tandem with loop permutations, loop scaling and loop shifting transformations. We describe three greedy fusion heuristics, namely,max-fuse, typed-fuse and hybrid-fuse, of which, the hybrid-fuse and typed-fuse models incorporate parallelism preserving fusion heuristic without significant compilation time overhead. We also provide a characterization of time-iterated stencils that have tile-wise concurrent start and employ a different fusion heuristic in such programs. In our experiments, we demonstrate that Pluto-lp-dfp framework not only finds loop transformations quickly, resulting in significant improvements in compilation time, but also outperforms state-of-the-art polyhedral auto-parallelizers in terms of execution time of the transformed program. We observe that Pluto-lp-dfp is faster than PoCC and Pluto by a geomean factor of 461x and 2.2x in terms of compilation time. On larger NAS benchmarks, Pluto-lp-dfp was faster than Pluto by 246x. PoCC failed to find a transformation in a reasonable amount of time in these cases. In terms of execution time, the hybrid-fuse variant in Pluto-lp-dfp outperforms PoCC by a geomean factor 1.8x, with over 3x improvements in some cases. We also observe that Pluto-lp-dfp is faster than an improved version of Pluto by a factor of 7%, with a maximum performance improvement of 2x.