Sharding: Growing Systems from Node-scale to Planet-scale

Monday, March 18, 2024 - 11:50 am–12:35 pm

Adam Mckaig, Stripe

Abstract: 

Sharding is an important part of scaling systems. Most start life without it, rightly preferring the simplicity of the monolith on the single-node database. But sooner or later, most systems need to be split up: the database is too big, the workload is too diverse, the risk (and consequences) of total outages is too dire. But when, what, and how? Sharding can increase cost, latency, and — most perniciously — complexity, so the trade-offs must be considered carefully.

This talk aims to provide SREs with a map to embark on this journey, showing the problems commonly encountered as systems grow in various dimensions, and the sharding patterns which can address each. We then present a highly opinionated Golden Path outlining how these patterns can be combined into the default route from node-scale to planet-scale, along with some traps and anti-patterns to avoid.

Adam Mckaig, Stripe

Adam is a staff engineer at Stripe, where he works on a petabyte-scale document database targeting five nines of availability. Previously he worked on datastores at Datadog and Google, and on other backend systems at the NYT, Bloomberg, and the UN.

BibTeX
@conference {295007,
author = {Adam Mckaig},
title = {Sharding: Growing Systems from Node-scale to Planet-scale},
year = {2024},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}
