
SREcon21, October 12–14, 2021, Virtual Event

SREcon Conversations are short, interactive, online discussions about Site Reliability Engineering, hosted on Zoom. SREcon Conversations maintain and celebrate the values, goals, and culture of SREcon.

Join us for SREcon Conversations Asia/Pacific on September 7–9, 2020, from 9:00–10:30 India Standard Time / 13:30–15:00 Australian Eastern Standard Time. See the featured speakers and register today!

Apply for a diversity grant! See the USENIX Grant Overview page for details.

Registration Fees

Standard Rate: US$60 per conversation
USENIX Member Rate: US$0 for all conversations

USENIX members receive free entry to SREcon Conversations Asia/Pacific. Log in to access your discount code before you register.

USENIX Conference Policies

We encourage you to learn more about USENIX’s values and how we put them into practice at our conferences.

Refunds and Cancellations

We are unable to offer refunds, cancellations, or substitutions for any registrations for this event. Please contact the Conference Department at conference@usenix.org with any questions.

Katherine Lim, Innablr

Monday, September 7, 2020
9:00–10:30 India Standard Time /
13:30–15:00 Australian Eastern Standard Time

Katherine Lim is a Senior Engineer at Innablr. An experienced DevOps Engineer, Katherine has built infrastructure-as-code and worked at the intersection of Development and Operations for the last 20 years.

Building Psychological Safety in SRE teams

In 2015, Google published the results of Project Aristotle, a two-year study into what makes a great team. As engineers, we should all improve psychological safety in our teams. Google and others have produced tools to guide teams to understand and improve their effectiveness. While the studies show that the techniques are effective, there are scenarios where I think the guidelines could be harmful. This talk will cover psychological safety and its relevance for SRE teams, what happens in unsafe teams, why it is worth improving, and how people leaders can help create safety. Along with discussing the advantages and pitfalls of each guideline, a framework will be proposed for building psychological safety in diverse SRE teams.

Koon Seng Lim, DBS

Tuesday, September 8, 2020
9:00–10:30 India Standard Time /
13:30–15:00 Australian Eastern Standard Time

Koon Seng Lim received his Honours in Information Systems & Computer Science from the National University of Singapore and his Master's in Electrical Engineering from Columbia University in New York. He joined his first startup, Xbind, Inc., in New York in 1998 and subsequently founded and worked for three other startups in the US before returning to Singapore in 2011. He currently heads the Site Reliability Engineering team for the middle office function in DBS and oversees all non-functional aspects of development and operations. Koon Seng is an active coder and enjoys espousing the virtues of reliability engineering and the quantitative approach to problem solving.

Banking on Batch: An SRE Take on an Old-School Problem

SRE usually focuses on improving site availability and uptime, but why stop there? The majority of financial institutions rely heavily on end-of-day batch runs, which, when not executed properly, could result in SLA breaches, losses (both financial and reputational), and even regulatory penalties.

To provide the right data to the right systems, batches go through an intricate scheme of upstream-downstream systems as well as countless jobs executing in the backend. These make it a challenge for engineers to ensure that batch systems are reliable. In addition to the numerous dependencies between jobs that specify ordering of job execution, there are timing constraints as well as other internal dependencies not visible externally. This makes the daily operation of executing hundreds of thousands of such runs a challenge in its own right. In our organisation, running this requires a small army of operators monitoring, verifying, and intervening if necessary to get runs completed correctly and on time.
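The ordering and timing constraints described above can be illustrated with a small sketch. This is not the speakers' system: the job names, durations, and deadline below are hypothetical, and the sketch assumes that a job can start as soon as all of its upstream jobs have finished. It uses Python's standard-library graphlib module (Python 3.9 or later) to order jobs by dependency and then flags jobs whose earliest possible finish time would miss a deadline.

```python
from graphlib import TopologicalSorter

# Hypothetical end-of-day jobs: each job maps to the set of jobs it depends on.
DEPENDENCIES = {
    "extract_trades": set(),
    "extract_prices": set(),
    "value_positions": {"extract_trades", "extract_prices"},
    "risk_report": {"value_positions"},
    "regulatory_feed": {"value_positions"},
}

# Assumed run time of each job, in minutes (illustrative values only).
DURATION_MIN = {
    "extract_trades": 20,
    "extract_prices": 10,
    "value_positions": 30,
    "risk_report": 15,
    "regulatory_feed": 25,
}


def earliest_finish_times(dependencies, durations):
    """Return the earliest finish time (minutes after batch start) per job.

    Assumes unlimited parallelism: a job starts as soon as all of its
    upstream jobs have finished.
    """
    finish = {}
    # static_order() yields every job after all of its dependencies.
    for job in TopologicalSorter(dependencies).static_order():
        start = max((finish[dep] for dep in dependencies[job]), default=0)
        finish[job] = start + durations[job]
    return finish


def jobs_missing_deadline(finish, deadline_min):
    """Return the jobs whose earliest finish time exceeds the deadline."""
    return sorted(job for job, t in finish.items() if t > deadline_min)


finish = earliest_finish_times(DEPENDENCIES, DURATION_MIN)
late = jobs_missing_deadline(finish, deadline_min=70)
print("Earliest finish times:", finish)
print("Jobs at risk of missing the deadline:", late)
```

With these made-up numbers, only the hypothetical regulatory_feed job would finish after the 70-minute deadline. A real batch estate would also have to account for limited capacity, retries, and dependencies that are not visible externally.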

In this talk, we’ll share with you how we are doing a fresh take on this old-school problem, applying SRE principles of monitoring, automation, and machine learning to improve reliability and observability.

Karthikeyan Selvaraj and Rajesh Ramachandran, PayPal

Wednesday, September 9, 2020
9:00–10:30 India Standard Time /
13:30–15:00 Australian Eastern Standard Time

Karthikeyan Selvaraj has more than 14 years of experience in software product development, has worked as a consultant for companies such as Google, LinkedIn, and Apple, and is currently leading the Data DevOps Platform within PayPal. Karthik is an ardent supporter of open source platforms and a contributor. He has been a speaker at PyCon, SciPy, and SRE conferences.

Rajesh Ramachandran has more than 25 years of experience in Product Engineering and Technical Product Management. He has worked extensively on SaaS and PaaS products within Enterprise organizations such as Trimble, Caterpillar, and PayPal. Currently at PayPal, he is leading the Data Application Lifecycle Product within the Core Data Platform Product Portfolio. He has been a panelist at various Product events in India as well as Volunteer Chair of the Product Leaders Forum in India.

Journey on Embedding SRE in Data Platforms: Insights & Lessons

DevOps philosophies today are mostly associated with service and web-based applications. With the advent of ML and AI, data can now drive innovation and create new markets and revenue. In the data analytics space, the typical DevOps work of building and optimizing code builds and delivery is only one piece of a bigger puzzle.

It becomes significantly more important to include SRE processes such as instrumentation, metrics, and monitoring, with proper change management controls, as part of Data DevOps. In this talk, we share our journey on building the Governed Data DevOps Platform with abstracted technology management and embedded SRE workflows for thousands of our Data Engineers and Scientists. It covers how we were able to successfully run a reliable data platform with zero disruptions.

The talk covers in detail how to:

  • Successfully run a reliable data platform with zero disruptions
  • Embed SRE processes within the Data Development/Analytics process
  • Provide monitoring frameworks to efficiently run Data Platforms
  • Overcome the challenges that were faced in this journey

Event Sponsorship

Become a Sponsor: Sponsorship exposes your brand to highly qualified attendees, funds our diversity and student grants, supports open access to our conference content, and keeps USENIX conferences affordable. USENIX is a 501(c)(3) non-profit organization that relies on sponsor support to fulfill its mission. To learn more, please contact the Sponsorship Department with the conference name in your subject line.

The acceptance of any organization as a sponsor does not imply explicit or implicit approval by USENIX of the donor organization’s values or actions. In addition, sponsorship does not provide any control over conference program content. Questions? Contact the Sponsorship Department.