BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//project/author//NONSGML v1.0//EN
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTEND:20220211T120000Z
UID:ba2bd34db3a7dd8433f002579de1b0bc-246
DTSTAMP:19700101T120011Z
DESCRIPTION:Improving Reliability and Performance of Datacenter Systems via Coherence
URL;VALUE=URI:https://www.csa.iisc.ac.in/newweb/event/246/improving-reliability-and-performance-of-datacenter-systems-via-coherence/
SUMMARY:Reliability and performance are key metrics for modern datacenter machines.
Co-designing for these introduces delicate trade-off decisions for system
architects. In this talk, I present two works in which we aim to improve both
reliability and performance of modern shared memory hardware in the datacenter
by designing tailored coherence protocols.
In the first work, we aim to combat increased memory system failure rates. We
propose Dvé, a hardware-driven replication mechanism where data blocks are
replicated in 2 different sockets across a cache-coherent NUMA system. Such an
organization has the advantage of offering two independent points of access
to data which enables: (a) strong error correction that can recover from a range
of faults affecting any of the components in the memory, up to and including the
memory controller, and (b) higher performance by providing another nearer point
of memory access. Dvé realizes both of these benefits via Coherent Replication,
a technique that builds on top of existing cache coherence protocols.
Coherent Replication keeps the replicas in sync for reliability and provides coherent
access to the replicas during fault-free operation for performance.
Dvé introduces a unique design point that offers higher reliability and
performance flexibly, on demand.
In the second work, we propose to improve reliability and performance of
function-as-a-service (FaaS) deployments. The FaaS model allows applications to
be decomposed into a workflow of stand-alone functions which are instantiated
and executed on demand in the cloud. The stateless nature of this model forces
functions to store/retrieve data from a remote object store, thereby adding
latency. Our work, Bolt, uses all-hardware memory disaggregation to
build an object store for FaaS applications. Bolt builds on top of the latest
cache-coherent attachment technologies for off-chip memory peripherals, such as Gen-Z,
CXL or NVLink2, to enable an all-hardware solution. It adds an object-granularity
caching mechanism to cache objects in hardware caches at compute nodes, hence
improving performance of FaaS functions. Bolt then adds an inter-node cache 
coherence mechanism that ensures the data in the compute node caches is consistent.
Bolt's coherence ensures reliable operation in such a loosely coupled system by
providing an asynchronous, non-blocking protocol which ensures forward progress 
during partial system failures.
Teams Meeting Link: https://tinyurl.com/AdarshPatilsTalk
DTSTART:20220211T120000Z
END:VEVENT
END:VCALENDAR