Scalable Working Set Estimation Method For Chip Multicores Using Tagged Bloom Filter And Its Application

Aparna Mandke, Bharadwaj Amrutur, Y.N.Srikant

Abstract: In chip multicore platforms (CMPs), leakage power consumption of large on-chip caches has already become a major power consuming component of the memory subsystem. Leakage power can be saved by switching off over-allocated ways in associative cache. However, the state-of-the-art heuristics such as average memory latency or cache miss rate fail to achieve near optimal energy savings. This is either due to dispersed nature of large caches or they are not fast enough to respond to changes in working set size (WSS), especially in case of over-provisioning of cache. Hence, we first propose a new kind of bloom filter, which we call it as a ``tagged bloom filter (TBF)''. We implement TBF implicitly in last level cache on a scalable tiled chip multicore platform. TBF is then used to estimate WSS of an application and switch-off over-allocated cache ways in Static and Dynamic Nonuniform Cache Architecture (SNUCA, DNUCA) accordingly. In our implementation of adaptable way SNUCA and DNUCA caches, associativity decision is taken locally by each L2 controller, making it scalable with the number of tiles present on CMP. It gives average of 22% and 23% more EDP savings than average memory latency and cache miss rate heuristics on SNUCA, respectively.