Overview

Masar Eagle's observability stack is built on OpenTelemetry, Prometheus, Loki, Jaeger, and Grafana. All services automatically export metrics, traces, and logs through the OpenTelemetry Collector.

Architecture

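The monitoring stack follows this data flow: services export telemetry over OTLP to the OpenTelemetry Collector; metrics end up in Prometheus, traces in Jaeger, and logs in Loki; Grafana queries all three as data sources.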

OpenTelemetry Configuration

Service Instrumentation

All services are instrumented through the ServiceDefaults package (src/aspire/ServiceDefaults/Extensions.cs:34-73):
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(serviceName: serviceName)
        .AddAttributes(new Dictionary<string, object>
        {
            ["service.name"] = serviceName,
            ["deployment.environment"] = builder.Environment.EnvironmentName
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation())
    .WithTracing(tracing => tracing
        .AddSource(builder.Environment.ApplicationName)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddGrpcClientInstrumentation());

Collector Configuration

The OpenTelemetry Collector (src/aspire/otelcollector/config.yaml) accepts OTLP telemetry over both gRPC (port 4317) and HTTP (port 4318):
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

Prometheus Setup

Configuration

Prometheus (src/aspire/prometheus/prometheus.yml) runs in OTLP receiver mode, so metrics are pushed to it over OTLP instead of being scraped:
storage:
  tsdb:
    out_of_order_time_window: 30m

otlp:

scrape_configs:
  # No scrape configs - using OTLP receiver mode

Service Metrics

Each service automatically exports:

ASP.NET Core Metrics

  • HTTP request duration
  • Active requests
  • Failed requests
  • Response sizes

Runtime Metrics

  • GC collections
  • Memory allocations
  • Thread pool usage
  • Exception counts

HTTP Client Metrics

  • Outbound request duration
  • Connection pool stats
  • Failed requests
  • DNS lookup time

Custom Metrics

  • Error counters by type
  • Error duration histogram
  • Business metrics

Custom Error Metrics

The GlobalExceptionMiddleware (src/BuildingBlocks/Common/Middleware/GlobalExceptionMiddleware.cs:17-25) records detailed error metrics:
private static readonly Counter<long> ErrorCounter = ErrorMeter.CreateCounter<long>(
    "errors_total",
    "count",
    "Total number of errors by type and status code");

private static readonly Histogram<double> ErrorDuration = ErrorMeter.CreateHistogram<double>(
    "error_duration_seconds",
    "seconds",
    "Time taken to handle errors");
Tags include:
  • exception_type: Exception class name
  • error_category: ValidationError, AuthenticationError, BusinessLogicError, etc.
  • status_code: HTTP status code
  • service: Service name
  • is_client_error: true/false
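
As a rough illustration of how the tags above could be attached when an error is recorded, here is a minimal sketch built only on System.Diagnostics.Metrics. The meter name `Example.Errors`, the helper names `BuildErrorTags` and `RecordError`, and the rule that 4xx status codes count as client errors are assumptions made for this example; the actual middleware may differ.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter name; the actual ErrorMeter definition is not shown on this page.
var errorMeter = new Meter("Example.Errors");
var errorCounter = errorMeter.CreateCounter<long>(
    "errors_total", "count", "Total number of errors by type and status code");
var errorDuration = errorMeter.CreateHistogram<double>(
    "error_duration_seconds", "seconds", "Time taken to handle errors");

// Builds the five tags listed above for one handled exception.
TagList BuildErrorTags(Exception ex, string category, int statusCode, string service)
{
    var tags = new TagList();
    tags.Add("exception_type", ex.GetType().Name);
    tags.Add("error_category", category);
    tags.Add("status_code", statusCode);
    tags.Add("service", service);
    // Assumption: 4xx responses are treated as client errors.
    tags.Add("is_client_error", statusCode >= 400 && statusCode < 500);
    return tags;
}

// Records one error occurrence and the time spent handling it.
void RecordError(Exception ex, string category, int statusCode, string service, TimeSpan elapsed)
{
    TagList tags = BuildErrorTags(ex, category, statusCode, service);
    errorCounter.Add(1, tags);
    errorDuration.Record(elapsed.TotalSeconds, tags);
}

RecordError(new ArgumentException("bad input"), "ValidationError", 400, "user",
    TimeSpan.FromMilliseconds(12));
```

Building the tag set once and reusing it for both instruments keeps the counter and the histogram queryable by the same labels.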

Grafana Dashboards

Data Sources

Grafana (src/aspire/grafana/config/provisioning/datasources/default.yaml) is pre-configured with three data sources:
datasources:
  - name: Prometheus
    type: prometheus
    url: $PROMETHEUS_ENDPOINT
    uid: PBFA97CFB590B2093
  - name: Loki
    type: loki
    url: $LOKI_ENDPOINT
  - name: Jaeger
    type: jaeger
    url: $JAEGER_ENDPOINT

Pre-installed Dashboards

The platform includes three pre-configured dashboards:

ASP.NET Core

Overall service health, request rates, error rates, and response times

ASP.NET Core Endpoints

Per-endpoint metrics including latency percentiles and throughput

Loki Logs

Log aggregation with filtering, search, and pattern detection

Dashboard Provisioning

Dashboards are loaded automatically through the provisioning config (src/aspire/grafana/config/provisioning/dashboards/default.yaml):
providers:
  - name: Default
    folder: Default
    type: file
    options:
      path: /var/lib/grafana/dashboards

Service Configuration

AppHost Setup

The observability stack is configured in AppHost.cs (src/aspire/AppHost/AppHost.cs:25-88):

1. OpenTelemetry Collector

IResourceBuilder<OpenTelemetryCollectorResource> otelCollector = 
    builder.AddOpenTelemetryCollector("otelcollector", "../otelcollector/config.yaml");

string otelEndpoint = "http://otelcollector:4317";

2. Prometheus

IResourceBuilder<ContainerResource> prometheus = 
    builder.AddContainer("prometheus", "prom/prometheus", "v3.2.1")
    .WithBindMount("../prometheus", "/etc/prometheus", isReadOnly: true)
    .WithArgs("--web.enable-otlp-receiver", "--config.file=/etc/prometheus/prometheus.yml")
    .WithHttpEndpoint(targetPort: 9090, name: "http");

3. Jaeger

IResourceBuilder<ContainerResource> jaeger = 
    builder.AddContainer("jaeger", "jaegertracing/jaeger:latest")
    .WithBindMount("../jaeger/config.yaml", "/jaeger/config.yaml", isReadOnly: true)
    .WithEndpoint(port: 16686, targetPort: 16686, scheme: "http", name: "http", isExternal: true)
    .WithEndpoint(port: 4317, targetPort: 4317, name: "grpc-collector");

4. Grafana

IResourceBuilder<ContainerResource> grafana = 
    builder.AddContainer("grafana", "grafana/grafana")
    .WithBindMount("../grafana/config", "/etc/grafana", isReadOnly: true)
    .WithBindMount("../grafana/dashboards", "/var/lib/grafana/dashboards", isReadOnly: true)
    .WithEnvironment("PROMETHEUS_ENDPOINT", prometheus.GetEndpoint("http"))
    .WithEnvironment("LOKI_ENDPOINT", loki.GetEndpoint("http"))
    .WithEnvironment("JAEGER_ENDPOINT", jaeger.GetEndpoint("http"))
    .WithHttpEndpoint(targetPort: 3000, name: "http");

5. Service OTLP Configuration

Each service gets OTLP configuration:
builder.AddProject<User>(Services.User)
    .WithEnvironment("OTEL_EXPORTER_OTLP_ENDPOINT", otelEndpoint)
    .WithEnvironment("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc")
    .WithEnvironment("OTEL_SERVICE_NAME", "user")
    .WaitFor(otelCollector);

Accessing Dashboards

Grafana

Unified visualization platform

Prometheus

Metrics storage and queries

Jaeger

Distributed tracing UI

Query Examples

PromQL Queries

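# Request rate for the user, trip, and gateway services
rate(http_server_request_duration_seconds_count{service_name=~"user|trip|gateway"}[5m])

# 95th-percentile request latency (per series)
histogram_quantile(0.95,
  rate(http_server_request_duration_seconds_bucket[5m])
)

# Error rate by exception type and service
sum by (exception_type, service) (
  rate(errors_total[5m])
)

# .NET GC heap size for the user service
process_runtime_dotnet_gc_heap_size_bytes{service_name="user"}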

Health Checks

All services expose health check endpoints (src/aspire/ServiceDefaults/Extensions.cs:75-96):
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), ["live"]);

Check a running service with:
curl http://localhost:8080/health
For security, the health check endpoints are exposed only in the Development environment.
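
The sketch below shows one way the registration and the Development-only exposure could fit together in a minimal ASP.NET Core app. It is modeled on the common .NET Aspire ServiceDefaults pattern and is not a copy of Extensions.cs; in particular, the `/alive` route filtered by the `live` tag is an assumption that does not appear on this page.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Diagnostics.HealthChecks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;
using Microsoft.Extensions.Hosting;

var builder = WebApplication.CreateBuilder(args);

// Register a trivial liveness check tagged "live", as in the snippet above.
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), new[] { "live" });

var app = builder.Build();

// Map the endpoints only in Development so they are not reachable elsewhere.
if (app.Environment.IsDevelopment())
{
    // Runs every registered check.
    app.MapHealthChecks("/health");

    // Assumption: a separate liveness route that runs only checks tagged "live".
    app.MapHealthChecks("/alive", new HealthCheckOptions
    {
        Predicate = registration => registration.Tags.Contains("live")
    });
}

app.Run();
```

With this shape, a request to /health outside Development returns 404 because the route is never mapped.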

Best Practices

Use Structured Logging

All logs include structured metadata for better filtering in Grafana

Add Custom Spans

Use ActivitySource to add custom trace spans for business operations
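
A minimal sketch of a custom span follows. The tracing setup shown earlier registers `AddSource(builder.Environment.ApplicationName)`, so the ActivitySource name has to match the application name for spans to be exported; `Example.Trip`, `BookTrip`, and the `trip.id` tag are hypothetical names chosen for illustration. The ActivityListener is included only so the example runs on its own; inside a service the OpenTelemetry SDK plays that role.

```csharp
using System;
using System.Diagnostics;

// Assumption: this must equal the name passed to AddSource(...) in ServiceDefaults.
const string SourceName = "Example.Trip";
var activitySource = new ActivitySource(SourceName);

// With no listener attached, StartActivity returns null. This listener stands in
// for the OpenTelemetry SDK so the sketch can run as a standalone program.
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == SourceName,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllData
};
ActivitySource.AddActivityListener(listener);

// Wraps a hypothetical business operation in a custom span and returns its trace ID.
string BookTrip(string tripId)
{
    using var activity = activitySource.StartActivity("BookTrip");
    activity?.SetTag("trip.id", tripId);
    try
    {
        // ... business logic would run here ...
        activity?.SetStatus(ActivityStatusCode.Ok);
    }
    catch (Exception ex)
    {
        // Mark the span as failed so it stands out in the tracing UI.
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        throw;
    }
    return activity?.TraceId.ToString() ?? string.Empty;
}

string traceId = BookTrip("trip-42");
Console.WriteLine("trace id: " + traceId);
```

The null-conditional calls matter: when the span is not sampled, StartActivity returns null and the business code still runs unchanged.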

Monitor Error Categories

Track error categories (ValidationError, AuthenticationError, etc.) separately

Set SLO Alerts

Configure Grafana alerts for SLO violations (error rate, latency)

Next Steps

Logging

Configure Loki logging and log levels

Troubleshooting

Common issues and debugging strategies
