So far, my research has focused on constructing optimal communication strategies for DNN accelerators, as well as high-accuracy yet low-overhead analytical models for industrial communication architectures.
In the course of this research, I have gained significant experience in designing real systems.
Moreover, my work on performance modeling has sharpened my analytical skills.
As machine learning algorithms grow in size and diversity to serve increasingly complex applications, the need to balance computation and communication grows with them.
To tackle this problem efficiently, my future research will focus on the following directions.
Energy-efficient architecture for emerging ML models
As applications grow more complex, increasingly diverse machine learning techniques are emerging.
Graph neural networks (GNNs) and attention-based networks are prominent examples.
Most current research is limited to efficient software implementations of these algorithms.
However, to make these algorithms practical, energy-efficient hardware architectures are also needed.
Since the structures of these emerging models differ fundamentally from the popular multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs), significant effort is needed to design hardware architectures for them.
I plan to leverage my experience constructing energy-efficient MLP and CNN architectures to design novel energy- and area-efficient hardware architectures for these emerging models.
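To make the contrast concrete, the minimal sketch below (a toy example; the graph, sizes, and values are hypothetical) shows the neighbor-aggregation step at the heart of most GNNs. Its memory accesses are sparse and data-dependent, unlike the fixed sliding-window dataflow of a CNN, which is precisely why existing CNN accelerators map poorly onto these models.

```python
import numpy as np

# Hypothetical toy graph: 5 nodes, directed edges as (src, dst) pairs.
# Real workloads have large, sparse, irregular edge lists.
edges = np.array([(0, 1), (1, 2), (2, 0), (3, 4), (4, 1)])
num_nodes, feat_dim = 5, 8
x = np.random.rand(num_nodes, feat_dim)   # per-node feature vectors
w = np.random.rand(feat_dim, feat_dim)    # learned weight matrix

# Aggregation: each node sums features gathered from its in-neighbors.
# The gather addresses depend on graph structure, so memory accesses
# are data-dependent and irregular.
agg = np.zeros_like(x)
for src, dst in edges:
    agg[dst] += x[src]

# Update: a dense matmul plus nonlinearity -- the part that existing
# MLP/CNN accelerators already handle well.
out = np.maximum(agg @ w, 0.0)  # ReLU
print(out.shape)  # (5, 8)
```

The split is instructive for hardware design: the dense update maps naturally onto existing MAC arrays, while the irregular gather/scatter of aggregation is what demands new memory hierarchies and interconnects.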
Energy-efficient communication for 2.5D/3D-integrated DNN accelerator
Intra- and inter-chip communication requirements continue to grow rapidly as applications, computing resources, and storage capacities scale up.
For example, our recent analysis shows that communication accounts for 70% of energy consumption when a deep neural network (DNN) inference is executed on in-memory computing (IMC) platforms.
Moving to higher levels in the architecture hierarchy, the communication overhead between CPU and GPU clusters or chiplets can undermine the benefits of computing and memory technology innovations.
Indeed, the Decadal Plan for Semiconductors lists communication as one of the five seismic shifts and states that a global storage-communication cross-over is expected to happen around 2022 with a tremendous impact on Information and Communication Technologies (https://www.src.org/about/decadal-plan/, Seismic shift 3, page 12).
I intend to construct a multi-scale reconfigurable communication architecture that integrates (1) heterogeneous cores and memory on a chip, and (2) multiple chiplets in a package. The proposed hierarchical and broadly usable architecture will enable both data-intensive and latency-sensitive applications, including DNN training/inference and graph processing. On the one hand, different applications have widely varying and demanding throughput (Tbps) and latency (ns) requirements. On the other hand, the target domains (e.g., autonomous driving, data centers) impose power, energy, and fabrication cost constraints. Hence, the proposed architecture will meet each application's custom requirements without violating the platform's power and fabrication cost constraints.
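One way to state this design goal precisely (a sketch only; the notation below is my own, introduced for illustration) is as a constrained optimization over fabric configurations:

\begin{align*}
\min_{c \in \mathcal{C}} \quad & E(c) \\
\text{s.t.} \quad & T_a(c) \ge T_a^{\min},\; L_a(c) \le L_a^{\max} \quad \forall a \in \mathcal{A},\\
& P(c) \le P_{\mathrm{budget}}, \quad \mathrm{Cost}(c) \le \mathrm{Cost}_{\mathrm{budget}},
\end{align*}

where $\mathcal{C}$ is the set of reconfigurable fabric configurations, $\mathcal{A}$ the set of target applications, $T_a$ and $L_a$ the achieved throughput and latency for application $a$, and $E$, $P$, and $\mathrm{Cost}$ the energy, power, and fabrication cost of configuration $c$.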
Power management for 2.5D/3D designs
2.5D/3D designs have so far proven to be both more energy efficient and more (fabrication) cost effective than monolithic ICs.
Emerging 2.5D/3D systems are also heterogeneous, i.e., processing units of different types and sizes are integrated into a single system.
Different units may require different voltage and frequency levels to achieve the best efficiency in terms of latency, power consumption, etc.
Moreover, the intra-chiplet (i.e., NoC) and inter-chiplet (i.e., NoP) communication in 2.5D systems can execute at different configurations.
The system must therefore choose the optimal combination of voltage and frequency for each processing unit at runtime.
With increasing heterogeneity, the voltage-frequency search space grows combinatorially.
Setting the optimal configuration for all processing elements as well as the networks (NoC and NoP) at runtime is thus challenging.
I intend to tackle this challenging problem by leveraging my prior experience with analytical modeling of NoCs and power management on heterogeneous mobile processors.
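As a minimal illustration of why this search is hard (all unit names, operating points, and power/performance proxies below are hypothetical), the sketch exhaustively enumerates per-unit voltage-frequency pairs under a power budget. The candidate space grows as k^n for n units with k operating points each, which is what rules out exhaustive search at runtime and motivates model-guided power management.

```python
import itertools

# Hypothetical operating points per unit: (voltage [V], frequency [GHz]).
OPERATING_POINTS = [(0.6, 0.8), (0.8, 1.2), (1.0, 1.6)]
UNITS = ["cpu_cluster", "gpu", "noc", "nop"]  # processing units + networks

def power(vf):
    # Toy dynamic-power proxy: P ~ V^2 * f (capacitance folded in).
    v, f = vf
    return v * v * f

def perf(vf):
    # Toy performance proxy: proportional to clock frequency.
    return vf[1]

def best_config(power_budget):
    """Exhaustive search over k**n configurations for n units."""
    best, best_perf = None, -1.0
    for config in itertools.product(OPERATING_POINTS, repeat=len(UNITS)):
        if sum(power(vf) for vf in config) <= power_budget:
            total_perf = sum(perf(vf) for vf in config)
            if total_perf > best_perf:
                best, best_perf = config, total_perf
    return best

print(dict(zip(UNITS, best_config(power_budget=4.0))))
# 3 operating points x 4 units -> 3**4 = 81 candidates; with tens of
# heterogeneous units and finer V/F grids the space explodes, so runtime
# management needs analytical models or learned policies instead.
```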