SouthEast Linux Fest 2024

Troubleshooting Microservice Architectures
06-08, 14:00–14:50 (EST5EDT), System76 (BallroomA)

This talk explores troubleshooting complex systems with hundreds or thousands of services. It addresses what information is needed about each service and how to structure dashboards so the state of the system can be understood quickly. It also covers how to use observability signals such as metrics, logs, traces, and profiles effectively, along with strategies for automating root cause analysis.


In this talk, we’ll explore how to troubleshoot systems containing hundreds or even thousands of services. We’ll discuss the following questions:
What do we need to know about each service?
How should we organize dashboards to quickly understand the current state of the system?
How can we effectively leverage each observability signal, such as metrics, logs, traces, and profiles? We will discuss when to use each of them and the specific questions they can answer.
Are there ways to automate root cause analysis either completely or at least partially?

Nikolay Sivko, Founder & CEO at Coroot, is on a mission to make production troubleshooting easier for developers everywhere. He's deeply enthusiastic about SRE practices, observability, and open source solutions. With over a decade of hands-on experience in the observability field, Nikolay is a seasoned expert who brings practical insights.

Peter Zaitsev, an entrepreneur and co-founder of Percona, Coroot, and FerretDB, is an expert in Open Source strategy and database optimization. Peter also advises numerous open-source startups and co-authored the book "High Performance MySQL."
