Non-Strict Cache Coherence: Exploiting Data-Race Tolerance in Emerging Applications
Siddhartha V Tambat and Sriram Vajapeyam

IISc-CSA-TR-2001-3
(July 2001)



Software distributed shared memory (DSM) platforms on networks of workstations
tolerate large network latencies by employing one of several weak memory
consistency models. Data-race-tolerant applications, such as Genetic
Algorithms (GAs), Probabilistic Inference, etc., offer an additional degree of
freedom to tolerate network latency: they do not synchronize shared memory
references, and behave correctly when supplied outdated shared data. However,
these algorithms often have a high communication-to-computation ratio and can
flood the network with messages in the presence of large message delays. We
explore the benefits of designing a DSM with non-strict cache coherence for
such applications. We study the performance of controlled asynchronous
implementations of these algorithms via the use of a previously proposed
blocking Global Read memory access primitive. Global Read implements
non-strict cache coherence by guaranteeing to return to the reader a shared
datum value from within a specified staleness range; synchronization
primitives are thereby avoided. Compared with fully asynchronous
implementations, controlled (i.e., partial) asynchrony implemented using
Global Read reduces the amount of computation each process performs on stale
data. This limits the number of shared updates (and hence the network
traffic) it generates. Experiments on an IBM SP2 multicomputer with an
Ethernet interconnect show significant performance improvements for controlled
asynchronous implementations. On a lightly loaded network, most of the GA
benchmarks see 30% to 40% improvement over the best competitor across
configurations ranging from 2 to 16 processors, while two of the Probabilistic
Inference benchmarks see more than 80% improvement on a 2-node configuration.
As the network load increases, the benefits of non-strict coherence and
partial asynchrony increase significantly. Overall, the results indicate that
non-strict cache coherence is significantly beneficial compared with both weak
consistency memory models based on data-race freedom and fully asynchronous
models that provide no coherence guarantees.
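The following minimal sketch illustrates the bounded-staleness idea behind a blocking Global Read. It is not the report's DSM implementation. It models one shared datum in a single address space using threads. It assumes that staleness is measured in iteration numbers: the writer tags each update with its iteration, and a reader at iteration t with staleness bound s blocks only until the available copy comes from iteration t - s or later. The names `StaleBoundedCell`, `write`, `global_read`, and `max_staleness` are hypothetical and chosen for illustration.

```python
import threading
import time
from typing import Any, Tuple


class StaleBoundedCell:
    """Hypothetical model of one shared datum with a bounded-staleness read.

    Assumption: staleness is counted in writer iterations. This is a
    single-process illustration, not a distributed shared memory protocol.
    """

    def __init__(self, value: Any, iteration: int = 0) -> None:
        self._value = value
        self._iteration = iteration
        self._cond = threading.Condition()

    def write(self, value: Any, iteration: int) -> None:
        # Publish a newer version and wake readers waiting for a fresh enough copy.
        # Older or duplicate versions are ignored.
        with self._cond:
            if iteration > self._iteration:
                self._value = value
                self._iteration = iteration
                self._cond.notify_all()

    def global_read(self, reader_iter: int, max_staleness: int) -> Tuple[Any, int]:
        # Accept any version no older than reader_iter - max_staleness.
        # The reader blocks only if the current copy falls outside that range,
        # so no barrier or lock-step synchronization with the writer is needed.
        oldest_ok = reader_iter - max_staleness
        with self._cond:
            self._cond.wait_for(lambda: self._iteration >= oldest_ok)
            return self._value, self._iteration


if __name__ == "__main__":
    cell = StaleBoundedCell(0.0, 0)

    def producer() -> None:
        # Hypothetical writer process producing one update per iteration.
        for it in range(1, 11):
            time.sleep(0.005)
            cell.write(float(it), it)

    worker = threading.Thread(target=producer)
    worker.start()
    # A reader at iteration 8 tolerating 2 iterations of staleness
    # waits until at least version 6 is available.
    value, version = cell.global_read(reader_iter=8, max_staleness=2)
    worker.join()
    print(version >= 6)
```

With `max_staleness=0` the read degenerates to waiting for the reader's own iteration, which resembles a synchronized read. A very large bound never blocks, which resembles a fully asynchronous read. Intermediate bounds correspond to the controlled (partial) asynchrony discussed in the abstract.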


Please bookmark this technical report as http://aditya.csa.iisc.ernet.in/TR/2001/3/.
