Principles of Chaos Engineering

Tuesday, March 14, 2017 - 11:55am12:50pm

Casey Rosenthal, Netflix

Abstract: 

Distributed systems create threats to resilience that are not addressed by classical approaches to development and testing. We’ve passed the point where individual humans can reasonably navigate these systems at scale. As we embrace a world that emphasizes automation and engineering over architecting, we left gaps open in our understanding of complex systems. 

Chaos Engineering is a new discipline within Software Engineering, building confidence in the behavior of distributed systems at scale. SREs and dedicated practitioners adopt Chaos Engineering as a practical tool for improving resiliency. An explicit, empirical approach provides a formal framework for adopting, implementing, and measuring the success of a Chaos Engineering program. Additional best practices define an ideal implementation, establishing the gold standard for this nascent discipline. 

Chaos Engineering isn’t the process of creating chaos, but rather surfacing chaos that is inherent in the behavior of these systems at scale. By focusing on high level business metric, we side step understanding *how* a particular model works in order to identify *whether* it work under realistic, turbulent conditions in production. This fills a gap that arms SREs with a better, holistic understanding of the system’s behavior.

Casey Rosenthal, Netflix

Engineering manager for the Traffic team and the Chaos team at Netflix. Previously an executive manager and senior architect, Casey has managed teams to tackle Big Data, architect solutions to difficult problems, and train others to do the same. He finds opportunities to leverage his experience with distributed systems, artificial intelligence, translating novel algorithms and academia into working models, and selling a vision of the possible to clients and colleagues alike. For fun, Casey models human behavior using personality profiles in Ruby, Erlang, Elixir, Prolog, and Scala.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.

BibTeX
@conference {201834,
author = {Casey Rosenthal},
title = {Principles of Chaos Engineering},
year = {2017},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}

Presentation Video 

Presentation Audio