Defence at the Boundary of Acceptable Performance

Tuesday, March 19, 2024 - 4:45 pm–5:30 pm

Andrew Hatch, LinkedIn

Abstract: 

In the 1990s, Jens Rasmussen (Danish system safety, human factors, and cognitive systems engineering researcher) published "Risk Management in a Dynamic Society." The Dynamic Safety Model was a key element of this, used to illustrate how socio-technical organizations cope with pressure from competing economic, workload, and performance forces. We will use this model to demonstrate how it can represent forces acting within large technology organizations, continually pushing the point of operations closer to the Boundary of Acceptable Performance and, as we approach or cross it, how our lives as SREs become negatively impacted.

We will unpack the ruthless nature of forces protecting economic boundaries, manifesting as layoffs and budget cuts, and how pushing people to exhaustion at the workload boundary decreases system safety and, ultimately, profitability. Lastly, we will examine how this model forms the underlying theory behind "chaos engineering" to detect and reinforce risk boundaries through feedback loops to build more resilient systems.

Andrew Hatch, LinkedIn

I have worked in the technology industry for over 25 years, predominantly in Australia, with time spent in India and, for the last three years, in the USA. My experience ranges from small to large-scale projects in multiple roles and industries spanning software engineering, consulting, and operations. In 2020, I migrated to the San Francisco Bay Area to take up a role at LinkedIn as an SRE Manager. Before this, I spent six years working at Australia's biggest online jobs and recruitment platform with the critical role of moving the business into AWS and up-leveling their Platform Engineering and Incident Management practices to support this. Since 2013, I have worked primarily in SRE Management roles and, through this experience, developed a passion for learning and adapting to complex systems and helping teams and organizations learn more from incidents to create better software, more resilient systems, and happier, empowered teams. I am a lifelong surfer and can now be found adapting to the crowds at Santa Cruz in California.

BibTeX
@conference {295059,
author = {Andrew Hatch},
title = {Defence at the Boundary of Acceptable Performance},
year = {2024},
address = {San Francisco, CA},
publisher = {USENIX Association},
month = mar
}
