A Graph Matching Based Integrated Scheduling Framework for Clustered VLIW Processors
Rahul Nagpal and Y.N. Srikant

IISc-CSA-TR-2004-13
(October 2004)


Filed on October 18, 2004


Clustered architecture processors are preferred for consumer electronic
devices because centralized register file architectures scale poorly in
terms of clock rate, chip area and power consumption. Scheduling for
clustered architectures involves spatial concerns (where to schedule) as
well as temporal concerns (when to schedule). Various clustered VLIW
configurations, connectivity types and inter-cluster communication models
present different performance trade-offs to a scheduler. The scheduler
must reconcile two conflicting requirements: exploiting the parallelism
offered by the hardware, and limiting the communication among clusters
to reduce the code size without stretching the overall schedule.

This paper proposes a generic graph matching based framework that resolves
the phase-ordering and fixed-ordering problems associated with scheduling
on a clustered VLIW processor by simultaneously considering various
scheduling alternatives of instructions. The framework provides a mechanism
to exploit the slack of instructions by dynamically varying the cost of
scheduling an instruction using different alternatives to reduce the code
size and inter-cluster communication without stretching the overall schedule.
A better estimate of instruction slack is obtained by first scheduling on
an unclustered base VLIW. We evaluate the framework's effectiveness in
improving the runtime performance of the code without a code size penalty.
We observe improvements of approximately 16% and 28% in performance over
an earlier integrated scheme and a phase-decoupled scheme, respectively,
as well as a reduction in code size.
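
To make the idea of slack-dependent costs concrete, the following is a minimal
sketch of a single scheduling cycle, not the algorithm from the report. It
assumes a hypothetical two-cluster machine with one issue slot per cluster,
and it replaces a real graph matching algorithm with exhaustive search over a
tiny set of alternatives. All names (SLOTS, READY, alternative_cost,
best_matching) and all cost constants are illustrative assumptions.

```python
from itertools import product

# Assumed cost constants, chosen only for illustration.
COMM_PENALTY = 4      # cost of one inter-cluster copy for a zero-slack instruction
DEFER_CRITICAL = 100  # cost of delaying an instruction that has no slack
DEFER_SLACK = 1       # cost of delaying an instruction that still has slack

# Hypothetical machine: two clusters, one ALU slot each in the current cycle.
SLOTS = [("c0", "alu"), ("c1", "alu")]

# Ready instructions: slack in cycles, and the cluster holding each operand.
READY = [
    {"name": "i1", "slack": 0, "operands": ["c0", "c0"]},
    {"name": "i2", "slack": 0, "operands": ["c0", "c1"]},
    {"name": "i3", "slack": 3, "operands": ["c1"]},
]


def alternative_cost(instr, slot):
    """Cost of one scheduling alternative; slot None means 'defer a cycle'."""
    if slot is None:
        return DEFER_CRITICAL if instr["slack"] == 0 else DEFER_SLACK
    cluster, _unit = slot
    # Operands living in another cluster would each need a copy.
    copies = sum(1 for home in instr["operands"] if home != cluster)
    # Communication is weighted more heavily when the instruction has slack,
    # so instructions that can afford to wait prefer waiting over copying.
    return copies * COMM_PENALTY * (1 + instr["slack"])


def best_matching(instrs, slots):
    """Exhaustively find the cheapest assignment of instructions to slots."""
    options = list(slots) + [None]
    best_cost, best = None, None
    for choice in product(options, repeat=len(instrs)):
        used = [s for s in choice if s is not None]
        if len(used) != len(set(used)):
            continue  # a slot can hold at most one instruction
        total = sum(alternative_cost(i, s) for i, s in zip(instrs, choice))
        if best_cost is None or total < best_cost:
            best_cost = total
            best = {i["name"]: s for i, s in zip(instrs, choice)}
    return best_cost, best


total, assignment = best_matching(READY, SLOTS)
print(total, assignment)
```

Under these assumed costs, the two zero-slack instructions occupy both slots
(i2 accepts one inter-cluster copy), while i3, which has slack, is deferred
instead of displacing a critical instruction. The total cost is 5. A
production scheduler would use a polynomial-time matching algorithm rather
than enumeration.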


Please bookmark this technical report as http://aditya.csa.iisc.ernet.in/TR/2004/13/.

[Updated at 2009-10-22T06:42Z]