Monitoring and Tracing

Distributed Tracing Overview

Distributed tracing is essential for understanding the behavior of complex distributed systems. It lets you follow a single request as it flows through multiple services and pinpoint where latency bottlenecks and failures originate.

Foundational Paper

Dapper: Google’s Distributed Systems Tracing Infrastructure

Dapper is Google’s large-scale distributed systems tracing infrastructure. The paper describing it laid the foundation for modern distributed tracing systems and influenced the design of numerous open source projects.

Dapper Paper

Google’s approach to large-scale distributed systems tracing; essential reading for understanding modern tracing architectures.

The Dapper paper introduces key concepts like trace trees, spans, and sampling strategies that are now standard in distributed tracing systems.
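To make the trace-tree idea concrete, here is a minimal sketch, assuming each span records a trace ID, its own span ID, and its parent's span ID (None for the root). The `Span` class, the `build_trace_tree` and `render` helpers, and the sample spans are hypothetical illustrations, not code from Dapper or any tracing library. Linking spans by parent ID is what turns a flat batch of timing records into the tree of nested calls that a tracing UI displays.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work (hypothetical field names, loosely modeled on Dapper)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_us: int             # start time in microseconds
    duration_us: int          # elapsed time in microseconds
    children: list["Span"] = field(default_factory=list)


def build_trace_tree(spans: list[Span]) -> Span:
    """Link spans that share one trace_id into a tree and return the root."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    if root is None:
        raise ValueError("trace has no root span")
    # Order siblings by start time so the tree reads like a timeline.
    for s in spans:
        s.children.sort(key=lambda c: c.start_us)
    return root


def render(span: Span, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines, one per span."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_us} us)"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Made-up spans from one request, delivered out of order as collectors often see them.
spans = [
    Span("t1", "a", None, "frontend GET /checkout", 0, 900),
    Span("t1", "c", "a", "payment.Charge", 450, 400),
    Span("t1", "b", "a", "inventory.Reserve", 50, 300),
    Span("t1", "d", "c", "db.Insert", 500, 120),
]
print("\n".join(render(build_trace_tree(spans))))
```

Reading the rendered tree shows at a glance that most of the root span's 900 us goes to the payment call, which is the kind of question tracing is meant to answer.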

Open Source Tracing Projects

The following open source projects were directly inspired by Dapper’s design and provide production-ready distributed tracing capabilities:

Zipkin

A distributed tracing system that helps gather timing data for microservice architectures

Apache SkyWalking

An application performance monitoring system for distributed systems, especially designed for microservices, cloud-native, and container-based architectures

Pinpoint

An APM (Application Performance Management) tool for large-scale distributed systems written in Java

Apache HTrace

A tracing framework for use with distributed systems written in Java

Zipkin

Zipkin is one of the most widely adopted distributed tracing systems. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data.
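As a rough illustration of the collection side, the sketch below builds spans in Zipkin's v2 JSON model (`traceId`, `id`, `parentId`, `name`, `kind`, `timestamp` and `duration` in microseconds, `localEndpoint.serviceName`, `tags`) and can POST a batch to a collector's `/api/v2/spans` endpoint. It assumes a Zipkin server on `localhost:9411` (its default port); the helper names `new_id`, `zipkin_span`, and `report` are hypothetical, and in practice an instrumentation library would normally do this for you.

```python
import json
import os
import time
import urllib.request


def new_id(num_bytes: int = 8) -> str:
    """Random lower-case hex ID: 8 bytes -> 16 hex chars, 16 bytes -> 32."""
    return os.urandom(num_bytes).hex()


def zipkin_span(trace_id, span_id, name, service, start_s, end_s,
                parent_id=None, kind="SERVER", tags=None):
    """Build one span dict in Zipkin's v2 JSON model (times in microseconds)."""
    span = {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "kind": kind,
        "timestamp": int(start_s * 1_000_000),
        "duration": max(1, int((end_s - start_s) * 1_000_000)),
        "localEndpoint": {"serviceName": service},
        "tags": tags or {},
    }
    if parent_id is not None:  # root spans simply omit parentId
        span["parentId"] = parent_id
    return span


def report(spans, url="http://localhost:9411/api/v2/spans"):
    """POST a JSON array of spans to a Zipkin collector assumed to be at url."""
    body = json.dumps(spans).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status  # a 2xx status means the collector accepted the batch


if __name__ == "__main__":
    trace_id, root_id = new_id(16), new_id()
    start = time.time()
    root = zipkin_span(trace_id, root_id, "get /checkout", "frontend",
                       start, start + 0.9)
    child = zipkin_span(trace_id, new_id(), "charge", "payment",
                        start + 0.45, start + 0.85, parent_id=root_id)
    print(json.dumps([root, child], indent=2))
    # report([root, child])  # uncomment once a collector is listening on :9411
```

Sending spans in batches, away from the request path, is one way to keep reporting overhead low.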

Apache SkyWalking

Apache SkyWalking is an application performance monitoring system designed for microservices, cloud-native, and container-based architectures. It provides distributed tracing, service mesh telemetry analysis, and metric aggregation.

Pinpoint

Pinpoint is an APM tool developed by Naver for large-scale distributed systems. It’s particularly well-suited for Java-based applications and provides detailed insights into application performance with minimal overhead.

Apache HTrace

HTrace is a tracing framework designed for distributed systems written in Java. It integrates with various Hadoop ecosystem components and provides flexible tracing capabilities.

All of these projects follow the core principles established by Dapper: low overhead, application-level transparency, and scalability to handle the volume of data generated by large distributed systems.

Key Tracing Concepts

When implementing distributed tracing, choose the sampling rate carefully: sample too much and you’ll overwhelm your tracing infrastructure; sample too little and you may miss rare but important failures.

Essential concepts in distributed tracing include:

Traces: The complete journey of a request through the system
Spans: Individual units of work within a trace
Sampling: Strategies for deciding which requests to trace
Context Propagation: Passing trace context across service boundaries
Aggregation: Combining trace data for analysis
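Context propagation is the concept from this list that most often needs hand-written code. The sketch below uses the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags` in lower-case hex, with bit `0x01` of the flags meaning "sampled") as one concrete wire format; Zipkin's B3 headers are another. The `inject` and `extract` helpers are hypothetical, handle only version `00`, and assume header keys have already been lower-cased.

```python
import os
import re

# W3C Trace Context, version 00: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool) -> None:
    """Write the current span's context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Read the caller's context, or return None so the callee starts a new trace."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", "").strip())
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


if __name__ == "__main__":
    # Service A starts a trace and calls service B.
    trace_id, span_a = os.urandom(16).hex(), os.urandom(8).hex()
    outgoing = {}
    inject(outgoing, trace_id, span_a, sampled=True)

    # Service B continues the same trace; its new span's parent is span_a.
    ctx = extract(outgoing)
    print(ctx["trace_id"] == trace_id, ctx["parent_id"] == span_a)  # True True
```

Carrying the sampled flag alongside the IDs lets downstream services honor the root's sampling decision instead of making their own.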

Additional Resources

For more information on monitoring and observability in distributed systems:

High Scalability - Architectures of large-scale internet services
Amazon Builder’s Library - Amazon’s learnings on distributed systems, including monitoring and observability patterns
