From Chaos to Clarity: Deciphering Cache Inconsistencies in a Distributed Environment

Wednesday, March 20, 2024 - 1:55 pm–2:15 pm

Akashdeep Goel and Prudhviraj Karumanchi, Netflix Inc

Abstract: 

In this session, we'll discuss a distributed caching system used at Netflix for streaming, live, games, ads, etc. in multiple regions on a public cloud. Various components make the system highly resilient: control plane, replication engine, proxy, cache warmer, etc. We will touch upon the replication engine, which processes around 30 million requests per second and keeps response times for 95% of these requests under 2 seconds across regions. We present and deep dive into a problem that almost put us at risk of delaying the global launch of a business-critical initiative. We will walk you through the entire process from start to finish: the debugging journey, our takeaways, and how these debugging techniques are generally applicable to any organization. The talk will focus on how and when problems get introduced, how a simple assumption can break the entire stack, and how these problems can be caught sooner.

Akashdeep Goel, Netflix Inc

Akashdeep Goel is a Senior Software Engineer at Netflix working on distributed systems handling large-scale caching deployments for both streaming and gaming workloads across Netflix. Prior to this, Akashdeep worked on a distributed control plane at Azure CosmosDB (Microsoft) delivering standby and failover infrastructure. Outside of work, he enjoys road trips, playing snooker and exploring different cuisines.

Prudhviraj Karumanchi, Netflix Inc

Prudhviraj Karumanchi is a Staff Software Engineer in Data Platform@Netflix building large-scale distributed storage systems and cloud services. Prudhvi is currently leading the Caching infrastructure at Netflix. Prior to Netflix, Prudhvi worked at large enterprises such as Oracle, NetApp and EMC/Dell building infrastructure for cloud, and contributed to File, Block and Object storage systems.

BibTeX
@conference {295071,
author = {Akashdeep Goel and Prudhviraj Karumanchi},
title = {From Chaos to Clarity: Deciphering Cache Inconsistencies in a Distributed Environment},
year = {2024},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}
