Seminars
View all Seminars | Download ICal for this eventScalable and Effective Polyhedral Auto-transformation Without using Integer Linear Programming
Series: Ph.D (Engg.) Thesis Defence - ON-LINE
Speaker: Mr. Aravind Acharya Ph.D Student Dept. of CSA IISc
Date/Time: Aug 28 09:00:00
Location: On Google Meet
Faculty Advisor: Prof. Uday Kumar Reddy B
Abstract:
In recent years, polyhedral auto-transformation frameworks have gained
significant interest in general-purpose compilation, because of their ability
to find and compose complex loop transformations that extract high performance
from modern architectures. These frameworks automatically find loop
transformations that either enhance locality, parallelism and minimize latency
or a combination of these. Recently, focus has also shifting on developing
intermediate representations, like MLIR, where complex loop transformations and
data-layout optimizations can be incorporated efficiently in a single common
infrastructure.
Polyhedral auto-transformation frameworks typically rely on complex Integer
Linear Programming (ILP) formulations to find affine loop transformations.
However, construction and solving these ILP problems is time consuming which
increases compilation time significantly. Secondly, loop fusion heuristics in
these auto-transformation frameworks are ad hoc, and modeling loop fusion
efficiently would further degrade compilation time.
In this thesis, we first relax the ILP formulation in the Pluto algorithm. We
show that even though LP relaxation reduces the time complexity of the problem,
it does not reduce the compilation time significantly because of the complex
construction of constraints. We also observe that due to relaxation,
sub-optimal loop transformations that result in significant performance degradation
may be obtained. Hence, we propose a new polyhedral auto-transformation
framework, called Pluto-lp-dfp, that finds efficient affine loop
transformations quickly, while relying on Plutos cost function. The framework
decouples auto-transformation into three components: (1) loop fusion and
permutation (2) loop scaling and shifting and (3) loop skewing components.
In each phase, we solve a Linear Programming (LP) formulation instead of an ILP,
thereby resulting in a polynomial time affine transformation algorithm. We
propose a data structure, called fusion conflict graph, that allows us to model
loop fusion to work in tandem with loop permutations, loop scaling and loop
shifting transformations. We describe three greedy fusion heuristics,
namely, max-fuse, typed-fuse and hybrid-fuse, of which, the hybrid-fuse and
typed-fuse models incorporate parallelism preserving fusion heuristic without
significant compilation time overhead. We also provide a characterization of
time-iterated stencils that have tile-wise concurrent start and employ a
different fusion heuristic in such programs. In our experiments, we demonstrate
that Pluto-lp-dfp framework not only finds loop transformations quickly,
resulting in significant improvements in compilation time, but also outperforms
state-of-the-art polyhedral auto-parallelizers in terms of execution time of
the transformed program. We observe that Pluto-lp-dfp is faster than PoCC and
Pluto by a geomean factor of 461x and 2.2x in terms of compilation time. On
larger NAS benchmarks, Pluto-lp-dfp was faster than Pluto by 246x. PoCC failed
to find a transformation in a reasonable amount of time in these cases. In
terms of execution time, the hybrid-fuse variant in Pluto-lp-dfp outperforms
PoCC by a geomean factor 1.8x, with over 3x improvements in some cases. We
also observe that Pluto-lp-dfp is faster than an improved version of Pluto by a
factor of 7%, with a maximum performance improvement of 2x.
Link to join online on Google Meet:
https://meet.google.com/ioe-tjuw-ijo
Please note that the meeting will be recorded as per Institute
requirements. By joining the link you are giving your consent to the
recording.
Speaker Bio:
Aravind Acharya submitted his PhD thesis under the supervision of Prof. Uday Kumar Reddy B at the Dept of CSA. He obtained his Masters degree from the same department. Prior to that, he obtained his BE in Computer Science from Sri Jayachamarajendra College of Engineering (SJCE), Mysore. He is current a post-doctoral researcher at NVIDIA, Redmond, USA. His research interests are in the development of optimizing compliers and domain-specific languages for high performance.
Host Faculty: