BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//project/author//NONSGML v1.0//EN
CALSCALE:GREGORIAN
BEGIN:VEVENT
DTEND:20210111T120000Z
UID:6470f8e2ae0677626eb553b7b4540d43-112
DTSTAMP:19700101T120008Z
DESCRIPTION:Data Science at Scale: Scaling Up by Scaling Down and Out (to Disk)
URL;VALUE=URI:https://www.csa.iisc.ac.in/newweb/event/112/data-science-at-scale-scaling-up-by-scaling-down-and-out-to-disk/
SUMMARY:The standard solution to scaling applications to massive data is scale-out, i.e., use more computers or RAM. This talk presents my work on complementary techniques: scaling down, i.e., shrinking data to fit in RAM, and scaling to disk, i.e., organizing data on disk so that the application can still run fast. I will describe new compact and I/O-efficient data structures and their applications in stream processing, computational biology, and storage.
Concretely, I show how to bridge the gap between the worlds of external memory and stream processing to perform scalable and precise real-time event-detection on massive streams. I show how to shrink genomic and transcriptomic indexes by a factor of two while accelerating queries by an order of magnitude compared to the state-of-the-art tools. I show how to improve file-system random-write performance by an order of magnitude without sacrificing sequential read/write performance.
Teams Meeting Link: https://teams.microsoft.com/l/meetup-join/19%3ameeting_YmExNmNmZjMtODM1Zi00MDUxLWFkNmEtNjdmYThkZWIxNjkx%40thread.v2/0?context=%7b%22Tid%22%3a%226f15cd97-f6a7-41e3-b2c5-ad4193976476%22%2c%22Oid%22%3a%224bcd3d56-e405-4b06-99fb-27742262f261%22%7d
DTSTART:20210111T120000Z
END:VEVENT
END:VCALENDAR