Overview
Masar Eagle uses a complete observability stack built on OpenTelemetry, Prometheus, Grafana, and Jaeger. All services automatically export metrics, traces, and logs through the OpenTelemetry Collector.
Architecture
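The monitoring stack follows this data flow: services send telemetry over OTLP to the OpenTelemetry Collector; metrics are stored in Prometheus, traces in Jaeger, and logs in Loki; Grafana queries all three as data sources.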
OpenTelemetry Configuration
Service Instrumentation
All services are instrumented through the ServiceDefaults package (src/aspire/ServiceDefaults/Extensions.cs:34-73):
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(serviceName: serviceName)
        .AddAttributes(new Dictionary<string, object>
        {
            ["service.name"] = serviceName,
            ["deployment.environment"] = builder.Environment.EnvironmentName
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddRuntimeInstrumentation())
    .WithTracing(tracing => tracing
        .AddSource(builder.Environment.ApplicationName)
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddGrpcClientInstrumentation());
Collector Configuration
The OpenTelemetry Collector (src/aspire/otelcollector/config.yaml) receives OTLP telemetry over both gRPC (port 4317) and HTTP (port 4318):
Receivers
Exporters
Pipelines
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
Prometheus Setup
Configuration
Prometheus (src/aspire/prometheus/prometheus.yml) runs in OTLP receiver mode:
storage:
  tsdb:
    out_of_order_time_window: 30m
otlp:
scrape_configs:
# No scrape configs - using OTLP receiver mode
Service Metrics
Each service automatically exports:
ASP.NET Core Metrics: HTTP request duration, active requests, failed requests, response sizes
Runtime Metrics: GC collections, memory allocations, thread pool usage, exception counts
HTTP Client Metrics: outbound request duration, connection pool stats, failed requests, DNS lookup time
Custom Metrics: error counters by type, error duration histogram, business metrics
Custom Error Metrics
The GlobalExceptionMiddleware (src/BuildingBlocks/Common/Middleware/GlobalExceptionMiddleware.cs:17-25) records detailed error metrics:
private static readonly Counter<long> ErrorCounter = ErrorMeter.CreateCounter<long>(
    "errors_total",
    "count",
    "Total number of errors by type and status code");
private static readonly Histogram<double> ErrorDuration = ErrorMeter.CreateHistogram<double>(
    "error_duration_seconds",
    "seconds",
    "Time taken to handle errors");
Tags include:
exception_type: Exception class name
error_category: ValidationError, AuthenticationError, BusinessLogicError, etc.
status_code: HTTP status code
service: Service name
is_client_error: true/false
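As a rough illustration of how these tags could be attached, the following self-contained sketch records one error with System.Diagnostics.Metrics and reads it back through a MeterListener. The meter name, the Classify mapping, the UnhandledError fallback category, and the RecordError helper are assumptions made for this example; the actual GlobalExceptionMiddleware logic is not shown here and may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical meter name; the real ErrorMeter definition is not shown in this document.
var errorMeter = new Meter("MasarEagle.Errors.Sketch");
var errorCounter = errorMeter.CreateCounter<long>(
    "errors_total", "count", "Total number of errors by type and status code");
var errorDuration = errorMeter.CreateHistogram<double>(
    "error_duration_seconds", "seconds", "Time taken to handle errors");

// Assumed mapping from exception type to (category, status code); illustrative only.
static (string Category, int StatusCode) Classify(Exception ex) => ex switch
{
    ArgumentException => ("ValidationError", 400),
    UnauthorizedAccessException => ("AuthenticationError", 401),
    InvalidOperationException => ("BusinessLogicError", 422),
    _ => ("UnhandledError", 500), // hypothetical fallback category
};

// Records one error on both instruments with the tag set listed above.
void RecordError(Exception ex, string service, TimeSpan handlingTime)
{
    var (category, statusCode) = Classify(ex);
    var tags = new TagList
    {
        { "exception_type", ex.GetType().Name },
        { "error_category", category },
        { "status_code", statusCode },
        { "service", service },
        { "is_client_error", statusCode is >= 400 and < 500 },
    };
    errorCounter.Add(1, tags);
    errorDuration.Record(handlingTime.TotalSeconds, tags);
}

// Collect counter measurements in-process so the sketch runs without an exporter.
var seen = new List<string>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter == errorMeter) l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    var parts = new List<string> { $"{instrument.Name}={value}" };
    foreach (var tag in tags) parts.Add($"{tag.Key}={tag.Value}");
    seen.Add(string.Join(" ", parts));
});
listener.Start();

RecordError(new ArgumentException("bad input"), "user", TimeSpan.FromMilliseconds(12));
foreach (var line in seen) Console.WriteLine(line);
```

In a real service the OpenTelemetry SDK would take the place of the MeterListener, provided the meter is registered with the metrics pipeline.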
Grafana Dashboards
Data Sources
Grafana (src/aspire/grafana/config/provisioning/datasources/default.yaml) is pre-configured with three data sources:
datasources:
  - name: Prometheus
    type: prometheus
    url: $PROMETHEUS_ENDPOINT
    uid: PBFA97CFB590B2093
  - name: Loki
    type: loki
    url: $LOKI_ENDPOINT
  - name: Jaeger
    type: jaeger
    url: $JAEGER_ENDPOINT
Pre-installed Dashboards
The platform includes three pre-configured dashboards:
ASP.NET Core: Overall service health, request rates, error rates, and response times
ASP.NET Core Endpoints: Per-endpoint metrics including latency percentiles and throughput
Loki Logs: Log aggregation with filtering, search, and pattern detection
Dashboard Provisioning
Dashboards are loaded automatically through the file provider defined in src/aspire/grafana/config/provisioning/dashboards/default.yaml:
providers:
  - name: Default
    folder: Default
    type: file
    options:
      path: /var/lib/grafana/dashboards
Service Configuration
AppHost Setup
The observability stack is configured in AppHost.cs (src/aspire/AppHost/AppHost.cs:25-88):
OpenTelemetry Collector
IResourceBuilder<OpenTelemetryCollectorResource> otelCollector =
    builder.AddOpenTelemetryCollector("otelcollector", "../otelcollector/config.yaml");
string otelEndpoint = "http://otelcollector:4317";
Prometheus
IResourceBuilder<ContainerResource> prometheus =
    builder.AddContainer("prometheus", "prom/prometheus", "v3.2.1")
        .WithBindMount("../prometheus", "/etc/prometheus", isReadOnly: true)
        .WithArgs("--web.enable-otlp-receiver", "--config.file=/etc/prometheus/prometheus.yml")
        .WithHttpEndpoint(targetPort: 9090, name: "http");
Jaeger
IResourceBuilder<ContainerResource> jaeger =
    builder.AddContainer("jaeger", "jaegertracing/jaeger:latest")
        .WithBindMount("../jaeger/config.yaml", "/jaeger/config.yaml", isReadOnly: true)
        .WithEndpoint(port: 16686, targetPort: 16686, scheme: "http", name: "http", isExternal: true)
        .WithEndpoint(port: 4317, targetPort: 4317, name: "grpc-collector");
Grafana
IResourceBuilder<ContainerResource> grafana =
    builder.AddContainer("grafana", "grafana/grafana")
        .WithBindMount("../grafana/config", "/etc/grafana", isReadOnly: true)
        .WithBindMount("../grafana/dashboards", "/var/lib/grafana/dashboards", isReadOnly: true)
        .WithEnvironment("PROMETHEUS_ENDPOINT", prometheus.GetEndpoint("http"))
        .WithEnvironment("LOKI_ENDPOINT", loki.GetEndpoint("http"))
        .WithEnvironment("JAEGER_ENDPOINT", jaeger.GetEndpoint("http"))
        .WithHttpEndpoint(targetPort: 3000, name: "http");
Service OTLP Configuration
Each service gets OTLP configuration:
builder.AddProject<User>(Services.User)
    .WithEnvironment("OTEL_EXPORTER_OTLP_ENDPOINT", otelEndpoint)
    .WithEnvironment("OTEL_EXPORTER_OTLP_PROTOCOL", "grpc")
    .WithEnvironment("OTEL_SERVICE_NAME", "user")
    .WaitFor(otelCollector);
Accessing Dashboards
Grafana: Unified visualization platform
Prometheus: Metrics storage and queries
Jaeger: Distributed tracing UI
Query Examples
PromQL Queries
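# Request rate per service (requests per second over a 5-minute window)
rate(http_server_request_duration_seconds_count{service_name=~"user|trip|gateway"}[5m])

# 95th-percentile request latency
histogram_quantile(0.95,
  rate(http_server_request_duration_seconds_bucket[5m])
)

# Error rate by exception type and service
sum by (exception_type, service) (
  rate(errors_total[5m])
)

# .NET GC heap size for the user service
process_runtime_dotnet_gc_heap_size_bytes{service_name="user"}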
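The latency query depends on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that idea in simplified form to show where a p95 value comes from. It assumes classic (non-native) histograms, at least two buckets sorted by upper bound with +Inf last, and 0 < q <= 1; the sample bucket counts are invented for illustration.

```csharp
using System;

// Estimates quantile q from cumulative buckets (Le = upper bound, Count = cumulative count).
static double HistogramQuantile(double q, (double Le, double Count)[] buckets)
{
    double total = buckets[^1].Count;
    if (total == 0) return double.NaN;

    // Rank of the requested quantile among all observations.
    double rank = q * total;

    // Find the first bucket whose cumulative count reaches the rank.
    int i = 0;
    while (i < buckets.Length - 1 && buckets[i].Count < rank) i++;

    // If the rank falls in the +Inf bucket, the best estimate is the highest finite bound.
    if (i == buckets.Length - 1) return buckets[^2].Le;

    // Interpolate linearly between the bucket's lower and upper bounds.
    double lower = i == 0 ? 0 : buckets[i - 1].Le;
    double below = i == 0 ? 0 : buckets[i - 1].Count;
    double inBucket = buckets[i].Count - below;
    return lower + (buckets[i].Le - lower) * (rank - below) / inBucket;
}

// Invented sample: 100 requests, bucket bounds in seconds.
var buckets = new (double Le, double Count)[]
{
    (0.1, 50),
    (0.5, 90),
    (1.0, 98),
    (double.PositiveInfinity, 100),
};

// Rank 95 falls in the (0.5, 1.0] bucket: 0.5 + 0.5 * (95 - 90) / 8, about 0.8125.
double p95 = HistogramQuantile(0.95, buckets);
Console.WriteLine(p95);
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.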
Health Checks
All services expose health check endpoints (src/aspire/ServiceDefaults/Extensions.cs:75-96):
builder.Services.AddHealthChecks()
    .AddCheck("self", () => HealthCheckResult.Healthy(), ["live"]);
Health Endpoint
Liveness Check
curl http://localhost:8080/health
Health check endpoints are exposed only in the Development environment, for security reasons.
Best Practices
Use Structured Logging: All logs include structured metadata for better filtering in Grafana
Add Custom Spans: Use ActivitySource to add custom trace spans for business operations
Monitor Error Categories: Track error categories (ValidationError, AuthenticationError, etc.) separately
Set SLO Alerts: Configure Grafana alerts for SLO violations (error rate, latency)
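To make the custom span recommendation concrete, here is a minimal sketch that wraps a business operation in an Activity created from an ActivitySource. The source name, the EstimateFare operation, its fare formula, and the tag names are hypothetical. Note that the tracing setup shown earlier registers a source named after builder.Environment.ApplicationName, so a custom source would presumably need to use that name or be registered with its own AddSource call. An ActivityListener stands in for the OpenTelemetry SDK so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical source name; spans are only exported if tracing subscribes to this name.
var activitySource = new ActivitySource("MasarEagle.Trip.Sketch");

// In-process listener so StartActivity returns a real Activity without an exporter.
var finished = new List<Activity>();
using var listener = new ActivityListener
{
    ShouldListenTo = source => source.Name == activitySource.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = activity => finished.Add(activity),
};
ActivitySource.AddActivityListener(listener);

// Hypothetical business operation wrapped in a custom span.
decimal EstimateFare(double distanceKm)
{
    // StartActivity returns null when nobody is listening, hence the null-conditional calls.
    using Activity? activity = activitySource.StartActivity("EstimateFare");
    activity?.SetTag("trip.distance_km", distanceKm);
    try
    {
        if (distanceKm < 0) throw new ArgumentOutOfRangeException(nameof(distanceKm));
        decimal fare = 5m + 2m * (decimal)distanceKm; // invented formula
        activity?.SetTag("trip.fare", fare);
        return fare;
    }
    catch (Exception ex)
    {
        // Mark the span as failed so it stands out in the tracing UI.
        activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
        throw;
    }
}

decimal fare = EstimateFare(10);
Console.WriteLine($"fare={fare}, spans recorded={finished.Count}");
```

Keeping tag names stable and low in cardinality makes the resulting spans easier to filter in Jaeger.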
Next Steps
Logging: Configure Loki logging and log levels
Troubleshooting: Common issues and debugging strategies