Defcon: Preventing Overload with Graceful Feature Degradation

Authors: 

Justin J. Meza, Thote Gowda, Ahmed Eid, Tomiwa Ijaware, Dmitry Chernyshev, Yi Yu, Md Nazim Uddin, Rohan Das, Chad Nachiappan, Sari Tran, Shuyang Shi, Tina Luo, David Ke Hong, Sankaralingam Panneerselvam, Hans Ragas, Svetlin Manavski, Weidong Wang, and Francois Richard, Meta Platforms, Inc.

Abstract: 

Every day, billions of people depend on Internet services for communication, commerce, and entertainment. Yet planetary-scale data center infrastructures consisting of millions of servers experience unplanned capacity outages and unexpected demand for resources; how can such infrastructures remain reliable in the face of capacity and workload flux?

In this paper, we introduce Defcon, a system for improving the availability of large-scale, globally distributed Internet services using graceful feature degradation. In response to overload conditions, Defcon enables site operators to gradually disable less-critical features in order to reduce resource demand. Defcon presents a common interface to product developers to define feature knobs that represent degradation capabilities. Defcon automatically tests knobs to understand each knob’s product- and infrastructure-level trade-offs. At Meta, we have used Defcon to improve global product availability in the face of worldwide demand surges as well as large-scale infrastructure failures.
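To make the idea of feature knobs concrete, the following is a minimal sketch of how a knob registry with gradual degradation might look. It is not Defcon's actual interface: the names `Knob`, `KnobRegistry`, `tier`, and `set_level`, the example feature names, and the rule that lower tiers are disabled first are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Knob:
    """A hypothetical feature knob; not Defcon's real data model."""
    name: str
    tier: int            # assumption: tier 1 is least critical and is disabled first
    enabled: bool = True


class KnobRegistry:
    """Hypothetical registry that product code would query before running a feature."""

    def __init__(self) -> None:
        self._knobs: dict[str, Knob] = {}

    def register(self, name: str, tier: int) -> Knob:
        knob = Knob(name=name, tier=tier)
        self._knobs[name] = knob
        return knob

    def is_enabled(self, name: str) -> bool:
        return self._knobs[name].enabled

    def set_level(self, level: int) -> list[str]:
        """Disable every knob with tier <= level and re-enable the rest.

        Level 0 means no degradation. Returns the sorted names of disabled knobs.
        """
        for knob in self._knobs.values():
            knob.enabled = knob.tier > level
        return sorted(k.name for k in self._knobs.values() if not k.enabled)


registry = KnobRegistry()
registry.register("autoplay_video", tier=1)
registry.register("comment_previews", tier=2)
registry.register("feed_ranking", tier=3)

# An operator raises the degradation level step by step under overload.
disabled_at_1 = registry.set_level(1)
disabled_at_2 = registry.set_level(2)
```

In this sketch, raising the level one step at a time mirrors the gradual disabling of less-critical features described in the abstract, and setting the level back to 0 restores every feature.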

OSDI '23 Open Access Sponsored by
King Abdullah University of Science and Technology (KAUST)


BibTeX
@inproceedings {288596,
author = {Justin J. Meza and Thote Gowda and Ahmed Eid and Tomiwa Ijaware and Dmitry Chernyshev and Yi Yu and Md Nazim Uddin and Rohan Das and Chad Nachiappan and Sari Tran and Shuyang Shi and Tina Luo and David Ke Hong and Sankaralingam Panneerselvam and Hans Ragas and Svetlin Manavski and Weidong Wang and Francois Richard},
title = {Defcon: Preventing Overload with Graceful Feature Degradation},
booktitle = {17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)},
year = {2023},
isbn = {978-1-939133-34-2},
address = {Boston, MA},
pages = {607--622},
url = {https://www.usenix.org/conference/osdi23/presentation/meza},
publisher = {USENIX Association},
month = jul
}
