Efficient Compilation of Stream Programs for Heterogeneous Architectures: A Model-Checking based approach

Rajesh Kumar Thakur, Y.N.Srikant

Abstract :

Stream programming based on the synchronous data flow (SDF) model naturally exposes data, task and pipeline parallelism. Statically scheduling stream programs for homogeneous architectures has been an area of extensive research. With graphic processing units (GPUs) now emerging as general purpose co-processors, scheduling and distribution of these stream programs onto heterogeneous architectures (having both GPUs and CPUs) provides for challenging research. Exploiting this abundant parallelism in hardware, and providing a scalable solution is a hard problem. In this paper we describe a coarse-grained software pipelined scheduling algorithm for stream programs which statically schedules a stream graph onto heterogeneous architectures. We formulate the problem of partitioning the work between the CPU cores and the GPU as a model-checking problem. The partitioning process takes into account the costs of the required buffer layout transformations associated with the partitioning and the distribution of the stream graph. The solution trace result from the model checking provides a map for the distribution of actors across different processors/cores. This solution is then divided into stages, and then a coarse grained software-pipelined code is generated. We use CUDA streams to map these programs synergistically onto the CPU and GPUs. We use a performance model for data transfers to determine the optimal number of CUDA streams on GPUs. Our software-pipelined schedule yields a speedup of upto 55.86X and a geometric mean speedup of 9.62X over a single threaded CPU.