Observability Configuration - SGIVU Config Repository

SGIVU implements observability through Spring Boot Actuator, Zipkin distributed tracing, and SLF4J/Logback logging.

Actuator Endpoints

Spring Boot Actuator provides production-ready monitoring and management endpoints.

Endpoint Exposure

management.endpoints.web.exposure.include (string, required)

Comma-separated list of actuator endpoints to expose over HTTP. Different profiles expose different endpoints.

Default Profile (All Services)

management:
  endpoints:
    web:
      exposure:
        include: health, info

By default, services only expose health and info endpoints to minimize the attack surface.

Development Profile

management:
  endpoints:
    web:
      exposure:
        include: "*"
  endpoint:
    health:
      show-details: always

Development profiles expose all actuator endpoints for debugging. Never use "*" in production.

Production Profile

management:
  endpoints:
    web:
      exposure:
        include: health, info, prometheus

Production adds the prometheus endpoint for metrics scraping by monitoring systems.
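The three profiles differ only in the value of this one property. To show how a comma-separated include value reads, here is a minimal Java sketch under simple assumed rules: entries are trimmed, and * matches every endpoint. ExposureSketch is a hypothetical helper for illustration, not Spring Boot's own endpoint filter.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: shows how a comma-separated
// management.endpoints.web.exposure.include value can be interpreted.
// It is an illustration, not Spring Boot's endpoint filter.
final class ExposureSketch {

    private ExposureSketch() {}

    /** Splits "health, info" into {"health", "info"}, trimming whitespace. */
    static Set<String> parseInclude(String include) {
        return Arrays.stream(include.split(","))
                .map(String::trim)
                .filter(id -> !id.isEmpty())
                .collect(Collectors.toSet());
    }

    /** True if the endpoint id is listed explicitly or the list contains "*". */
    static boolean isExposed(String include, String endpointId) {
        Set<String> ids = parseInclude(include);
        return ids.contains("*") || ids.contains(endpointId);
    }
}
```

With the production value health, info, prometheus, isExposed(include, "env") is false, so the env endpoint stays hidden; with the development value *, it is true.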

Exposed Endpoints

Health

GET /actuator/health: service health status and readiness checks

Info

GET /actuator/info: application metadata and build information

Prometheus

GET /actuator/prometheus: metrics in Prometheus format (exposed by the production profile)

All Endpoints

GET /actuator: discovery page listing the endpoints exposed by the active profile (every endpoint in development)

Health Check Configuration

Health Details Visibility

management.endpoint.health.show-details (string, required)

Controls health check detail visibility. Values: never, when-authorized, always.

Default and Production

management:
  endpoint:
    health:
      show-details: never

Services hide health details by default to prevent information leakage: the health endpoint returns only the aggregate status (for example UP or DOWN), without per-component details.

Development

management:
  endpoint:
    health:
      show-details: always

Development profiles show full health details including database connectivity, disk space, and other indicators.
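The three show-details values amount to a small decision, sketched below in Java. ShowDetails is a hypothetical enum for illustration, not a Spring Boot type. In Spring Boot, when-authorized shows details only to authenticated callers (optionally restricted further by management.endpoint.health.roles); the sketch reduces that check to a single boolean.

```java
import java.util.Locale;

// Hypothetical model of management.endpoint.health.show-details;
// the enum name and methods are illustrative, not Spring Boot types.
enum ShowDetails {
    NEVER, WHEN_AUTHORIZED, ALWAYS;

    /** Maps a property value such as "when-authorized" to the enum constant. */
    static ShowDetails fromProperty(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT).replace('-', '_'));
    }

    /** Whether per-component details are included for a given caller. */
    boolean includesDetails(boolean callerAuthorized) {
        return switch (this) {
            case NEVER -> false;
            case WHEN_AUTHORIZED -> callerAuthorized; // simplified authorization check
            case ALWAYS -> true;
        };
    }
}
```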

Example Health Responses

Production (show-details: never)

{
  "status": "UP"
}

Development (show-details: always)

{
  "status": "UP",
  "components": {
    "db": {
      "status": "UP",
      "details": {
        "database": "PostgreSQL",
        "validationQuery": "isValid()"
      }
    },
    "diskSpace": {
      "status": "UP",
      "details": {
        "total": 499963174912,
        "free": 262958080000,
        "threshold": 10485760
      }
    },
    "ping": {
      "status": "UP"
    }
  }
}
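Because show-details: never hides the components, monitoring clients should not depend on the response body. By default Spring Boot answers HTTP 503 when the aggregate status is DOWN or OUT_OF_SERVICE and HTTP 200 otherwise, so the status code alone is enough. A minimal Java 11+ sketch using java.net.http, assuming the caller supplies the service base URL (HealthCheck is a hypothetical name):

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Minimal health checker sketch. The base URL is supplied by the caller;
// SGIVU services may listen on different hosts and ports.
final class HealthCheck {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    private HealthCheck() {}

    /** Default mapping: DOWN/OUT_OF_SERVICE -> 503, UP (and others) -> 200. */
    static boolean isHealthyStatusCode(int statusCode) {
        return statusCode == 200;
    }

    /** Calls GET {baseUrl}/actuator/health and interprets only the HTTP status. */
    static boolean isHealthy(String baseUrl) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<Void> response = CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
        return isHealthyStatusCode(response.statusCode());
    }
}
```

For example, HealthCheck.isHealthy("http://localhost:8080") checks a service running locally on port 8080 (an assumed port).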

Distributed Tracing

SGIVU uses Zipkin for distributed tracing to track requests across microservices.

Tracing Configuration

management.tracing.sampling.probability (number, required)

Fraction of requests to trace, from 0.0 (none) to 1.0 (all). SGIVU sets it to 0.1 (10%), which is also Spring Boot's default, to balance observability with performance.

management.zipkin.tracing.endpoint (string, required)

Zipkin server endpoint (the v2 spans API) for sending trace data.

management:
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans
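To illustrate what probability: 0.1 means at runtime, here is a minimal Java sketch of head-based probability sampling: each new trace draws a random number and is kept when it falls below the configured probability. It is an illustration only; the sampler Micrometer Tracing actually uses depends on the tracer bridge. The decision is normally made once where a trace starts (for SGIVU, typically the Gateway) and propagated to downstream services, which follow it instead of sampling again.

```java
import java.util.Random;

// Illustrative head-based probability sampler (hypothetical class);
// not the sampler implementation used by Micrometer Tracing or Brave.
final class ProbabilitySampler {

    private final double probability;
    private final Random random;

    ProbabilitySampler(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0.0, 1.0]");
        }
        this.probability = probability;
        this.random = random;
    }

    /** Called once per new trace; nextDouble() is in [0, 1), so 1.0 keeps all. */
    boolean sampleNewTrace() {
        return random.nextDouble() < probability;
    }
}
```

Over many requests, a sampler created with 0.1 keeps roughly one trace in ten.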

Sampling Strategy

Why 10% sampling?

Tracing every request can impact performance and generate massive amounts of data. A 10% sampling rate:

Provides sufficient visibility into system behavior
Captures enough traces to detect patterns and issues
Minimizes performance overhead
Reduces storage requirements

Increase the sampling rate to 1.0 (100%) in development for complete visibility.

Service Examples

Gateway Service

management:
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

Auth Service

management:
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

Domain Services (User, Client, Vehicle, Purchase-Sale)

All domain services use identical tracing configuration:

management:
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

Logging Configuration

SGIVU logs through SLF4J with Logback, Spring Boot's default logging backend.

Log Levels

logging.level.root (string)

Root logger level. Set to INFO in all environments for balanced logging.

logging.level.{package} (string)

Package-specific log levels for fine-grained control.

Standard Configuration

logging:
  level:
    root: INFO

Service-Specific Logging

Gateway Service

logging:
  level:
    root: INFO
    com.sgivu.gateway.security: INFO
    com.sgivu.gateway.controller: INFO

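The Gateway explicitly lists its security and controller packages so that authentication/authorization events are captured. Because both entries match the root level (INFO), they currently change nothing; their value is that either package can be raised to DEBUG without affecting the rest of the service.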
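Package-specific levels work by inheritance: a logger uses the level of its closest configured ancestor package, otherwise the root level. A minimal Java sketch of that lookup (LevelResolver is a hypothetical helper, not Logback's API):

```java
import java.util.Map;

// Sketch of Logback-style level inheritance (LevelResolver is hypothetical):
// a logger takes the level of its closest configured ancestor package,
// falling back to the root level.
final class LevelResolver {

    private LevelResolver() {}

    static String effectiveLevel(Map<String, String> levels, String loggerName) {
        String name = loggerName;
        while (true) {
            String level = levels.get(name);
            if (level != null) {
                return level; // exact match for this logger or package
            }
            int lastDot = name.lastIndexOf('.');
            if (lastDot < 0) {
                // No configured ancestor; Spring Boot's default root level is INFO
                return levels.getOrDefault("root", "INFO");
            }
            name = name.substring(0, lastDot); // move up one package segment
        }
    }
}
```

With root: INFO and com.sgivu.gateway.security: DEBUG (an example value, not SGIVU's setting), a logger named com.sgivu.gateway.security.jwt.TokenFilter resolves to DEBUG, while com.sgivu.gateway.controller.AuthController falls back to INFO. Both class names are hypothetical.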

Vehicle Service

logging:
  level:
    root: INFO
    software:
      amazon:
        awssdk: info

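The Vehicle Service pins the AWS SDK logger (software.amazon.awssdk) to info (equivalent to INFO) to keep S3 client logging quiet. Since root is already INFO, the entry matters when root is lowered to DEBUG: the SDK then stays at INFO instead of following root down.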
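The nested YAML under software, amazon, awssdk is another way to write the dotted key logging.level.software.amazon.awssdk. A short Java sketch of that flattening (PropertyFlattener is a hypothetical helper, not Spring Boot's property binder):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: flattens nested YAML-style maps into dotted property keys.
// PropertyFlattener is a hypothetical helper, not a Spring Boot class.
final class PropertyFlattener {

    private PropertyFlattener() {}

    static Map<String, Object> flatten(Map<?, ?> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        flattenInto("", source, result);
        return result;
    }

    private static void flattenInto(String prefix, Map<?, ?> source, Map<String, Object> result) {
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map<?, ?> nested) {
                flattenInto(key, nested, result); // recurse into a nested section
            } else {
                result.put(key, entry.getValue()); // leaf value
            }
        }
    }
}
```

Applied to the Vehicle Service logging block, it yields logging.level.root=INFO and logging.level.software.amazon.awssdk=info.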

Log Level Guide

ERROR

Critical failures requiring immediate attention

WARN

Potentially harmful situations that don’t stop execution

INFO

Important runtime events and milestones

DEBUG

Detailed diagnostic information (development only)
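The levels form a severity order, and a logger set to a level emits events at that level and above. A minimal Java sketch (the Level enum is hypothetical; SLF4J also defines TRACE below DEBUG):

```java
// Hypothetical Level enum, ordered from least to most severe.
enum Level {
    TRACE, DEBUG, INFO, WARN, ERROR;

    /** True if an event at eventLevel passes a logger configured at this level. */
    boolean allows(Level eventLevel) {
        return eventLevel.compareTo(this) >= 0;
    }
}
```

With root set to INFO, Level.INFO.allows(Level.DEBUG) is false, which is why DEBUG output stays hidden unless a package level is lowered.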

Complete Observability Examples

Gateway Service

management:
  endpoints:
    web:
      exposure:
        include: health, info
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

logging:
  level:
    root: INFO
    com.sgivu.gateway.security: INFO
    com.sgivu.gateway.controller: INFO

Auth Service

management:
  endpoints:
    web:
      exposure:
        include: health, info
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

logging:
  level:
    root: INFO

User Service

management:
  endpoints:
    web:
      exposure:
        include: health, info
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

logging:
  level:
    root: INFO

Client Service

management:
  endpoints:
    web:
      exposure:
        include: health, info
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

logging:
  level:
    root: INFO

Vehicle Service

management:
  endpoints:
    web:
      exposure:
        include: health, info
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

logging:
  level:
    root: INFO
    software:
      amazon:
        awssdk: info

Purchase-Sale Service

management:
  endpoints:
    web:
      exposure:
        include: health, info
  endpoint:
    health:
      show-details: never
  tracing:
    sampling:
      probability: 0.1
  zipkin:
    tracing:
      endpoint: http://sgivu-zipkin:9411/api/v2/spans

logging:
  level:
    root: INFO

Observability Stack

Service Instrumentation

Spring Boot Actuator and Micrometer automatically instrument services with metrics, health checks, and trace context propagation.

Trace Collection

Sampled requests generate trace spans that are sent to the Zipkin server at http://sgivu-zipkin:9411.
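Services report spans automatically, but it helps to know what reaches the collector when checking Zipkin by hand. The sketch below builds one span in Zipkin's v2 JSON format (lowercase hex ids, epoch timestamp and duration in microseconds) and posts it to /api/v2/spans, which answers 202 Accepted on success. ZipkinSpanSketch and the example ids and service name are hypothetical, and inputs are assumed not to need JSON escaping.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of the Zipkin v2 JSON accepted at POST /api/v2/spans (hypothetical
// helper for manual checks; Micrometer Tracing reports real spans itself).
final class ZipkinSpanSketch {

    private ZipkinSpanSketch() {}

    /** Builds a one-span JSON array; inputs are assumed to need no escaping. */
    static String spanJson(String traceId, String spanId, String name,
                           String serviceName, long timestampMicros, long durationMicros) {
        return "[{"
                + "\"traceId\":\"" + traceId + "\","
                + "\"id\":\"" + spanId + "\","
                + "\"name\":\"" + name + "\","
                + "\"kind\":\"SERVER\","
                + "\"timestamp\":" + timestampMicros + ","
                + "\"duration\":" + durationMicros + ","
                + "\"localEndpoint\":{\"serviceName\":\"" + serviceName + "\"}"
                + "}]";
    }

    /** Posts the spans and returns the HTTP status (202 on success). */
    static int post(String zipkinSpansUrl, String json) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(zipkinSpansUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
    }
}
```

For example, post("http://localhost:9411/api/v2/spans", spanJson(...)) against a local Zipkin, then search for the service name in the UI.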

Metrics Export

Production services expose Prometheus metrics at /actuator/prometheus for scraping.

Health Monitoring

Orchestrators and load balancers query /actuator/health to determine service availability.

Log Aggregation

Container logs (stdout/stderr) are collected by the container runtime and can be shipped to centralized logging systems.

Monitoring Best Practices

Set up health check alerts

Configure your orchestrator (Kubernetes, Docker Swarm, ECS) to:

Perform health checks on /actuator/health
Set appropriate timeout and interval values
Restart unhealthy containers automatically
Alert on repeated health check failures

Example Kubernetes liveness probe:

livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
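failureThreshold counts consecutive failures: a single successful probe resets the count. A minimal Java sketch of that rule and the rough detection time it implies (ProbeTracker is hypothetical; Kubernetes implements this logic itself):

```java
// Sketch of the consecutive-failure rule behind failureThreshold.
// ProbeTracker is hypothetical, not part of Kubernetes or Spring Boot.
final class ProbeTracker {

    private final int failureThreshold;
    private int consecutiveFailures;

    ProbeTracker(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** Records one probe result; returns true once the threshold is reached. */
    boolean record(boolean probeSucceeded) {
        consecutiveFailures = probeSucceeded ? 0 : consecutiveFailures + 1;
        return consecutiveFailures >= failureThreshold;
    }

    /** Approximate worst case from the service turning unhealthy to restart. */
    static int secondsToRestart(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }
}
```

With periodSeconds: 10 and failureThreshold: 3, an unresponsive service is restarted after roughly 30 seconds. Note that /actuator/health also aggregates dependencies such as the database, so a database outage would restart otherwise healthy pods; Spring Boot's /actuator/health/liveness and /actuator/health/readiness groups (enabled automatically on Kubernetes, or with management.endpoint.health.probes.enabled: true) are generally a better fit for the two probe types.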

Use Zipkin UI for request tracing

Access the Zipkin UI (typically at http://localhost:9411) to:

Search for traces by service name or time range
Visualize request flows across services
Identify performance bottlenecks
Debug distributed transaction failures
Analyze service dependencies
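The UI is backed by Zipkin's HTTP API, so the same searches can be scripted. A minimal Java sketch that builds a query for the most recent traces of one service through GET /api/v2/traces (GET /api/v2/services lists the known service names). ZipkinQuery and the service name sgivu-gateway are assumptions; use the name each service actually reports.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a Zipkin v2 query for recent traces of one service.
// ZipkinQuery is hypothetical; base URL and service name come from the caller.
final class ZipkinQuery {

    private ZipkinQuery() {}

    static URI recentTraces(String zipkinBaseUrl, String serviceName, int limit) {
        String query = "serviceName=" + URLEncoder.encode(serviceName, StandardCharsets.UTF_8)
                + "&limit=" + limit;
        return URI.create(zipkinBaseUrl + "/api/v2/traces?" + query);
    }
}
```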

Adjust sampling rate based on traffic

Low traffic environments (< 100 req/min):

management:
  tracing:
    sampling:
      probability: 1.0  # Trace 100%

Medium traffic environments (100-1000 req/min):

management:
  tracing:
    sampling:
      probability: 0.1  # Trace 10%

High traffic environments (> 1000 req/min):

management:
  tracing:
    sampling:
      probability: 0.01  # Trace 1%
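The three bands above can be captured in a helper that also estimates the resulting trace volume. A minimal Java sketch; the thresholds come from this page and SamplingAdvisor is a hypothetical name:

```java
// Encodes the traffic bands from this page (SamplingAdvisor is hypothetical).
final class SamplingAdvisor {

    private SamplingAdvisor() {}

    /** Suggested management.tracing.sampling.probability for a request rate. */
    static double suggestedProbability(double requestsPerMinute) {
        if (requestsPerMinute < 100) {
            return 1.0;  // low traffic: trace everything
        }
        if (requestsPerMinute <= 1000) {
            return 0.1;  // medium traffic
        }
        return 0.01;     // high traffic
    }

    /** Expected traces sent to Zipkin per minute at a given probability. */
    static double expectedTracesPerMinute(double requestsPerMinute, double probability) {
        return requestsPerMinute * probability;
    }
}
```

The bands keep trace volume in a similar range: 50 req/min at 1.0, 500 req/min at 0.1, and 5,000 req/min at 0.01 each produce about 50 traces per minute.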

Export Prometheus metrics to Grafana

Deploy Prometheus to scrape /actuator/prometheus endpoints
Configure Prometheus as a data source in Grafana
Import Spring Boot dashboards from Grafana.com
Create custom dashboards for business metrics
Set up alerting rules in Prometheus/Alertmanager
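Prometheus scrapes /actuator/prometheus as plain text, one sample per line: a metric name, optional labels in braces, and a value, with # HELP and # TYPE comment lines in between. A simplified Java reader for one line can help with spot checks before Grafana is wired up. PrometheusLine is hypothetical, and the parser deliberately skips edge cases: commas, braces or escaped quotes inside label values are not handled, and an optional trailing timestamp is ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified reader for one line of the Prometheus text format
// (hypothetical helper; not a full exposition-format parser).
final class PrometheusLine {

    record Sample(String name, Map<String, String> labels, double value) {}

    private PrometheusLine() {}

    static Optional<Sample> parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Optional.empty(); // blank line or # HELP / # TYPE comment
        }
        Map<String, String> labels = new LinkedHashMap<>();
        String name;
        String rest;
        int open = trimmed.indexOf('{');
        if (open >= 0) {
            int close = trimmed.indexOf('}', open);
            if (close < 0) {
                return Optional.empty(); // malformed label set
            }
            name = trimmed.substring(0, open);
            for (String pair : trimmed.substring(open + 1, close).split(",")) {
                int eq = pair.indexOf('=');
                if (eq > 0) { // skips empty entries, e.g. from a trailing comma
                    String value = pair.substring(eq + 1).trim();
                    labels.put(pair.substring(0, eq).trim(), value.replaceAll("^\"|\"$", ""));
                }
            }
            rest = trimmed.substring(close + 1).trim();
        } else {
            int space = trimmed.indexOf(' ');
            if (space < 0) {
                return Optional.empty(); // no value present
            }
            name = trimmed.substring(0, space);
            rest = trimmed.substring(space + 1).trim();
        }
        // The first token is the sample value; a second token would be a timestamp.
        double value = Double.parseDouble(rest.split("\\s+")[0]);
        return Optional.of(new Sample(name, labels, value));
    }
}
```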

Implement structured logging

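Log with consistent key=value context, and preferably in a structured format such as JSON, so centralized logging systems can index individual fields:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

class UserService {
    private static final Logger log = LoggerFactory.getLogger(UserService.class);

    void onUserCreated(User user) {
        // Good: parameterized message with key=value context
        log.info("User created: userId={}, email={}", user.getId(), user.getEmail());
    }

    void onLogin(User user) {
        // Better: MDC adds userId to every log line written in this scope
        MDC.put("userId", String.valueOf(user.getId()));
        try {
            log.info("User login successful");
        } finally {
            MDC.remove("userId"); // don't leak context into later work on this thread
        }
    }
}

Here User is the service's own entity type. Consistent keys let a centralized logging system filter and aggregate by field, for example every event for one userId. MDC values appear in the output only if the log pattern (for example %X{userId}) or the JSON encoder includes them.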
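Fully structured output writes each event as one JSON object, so fields such as userId become searchable without parsing message text. A minimal Java sketch of that output shape (JsonLogLine is hypothetical). In a real service the encoder does this work: Spring Boot 3.4 and later support it through logging.structured.format.console (for example ecs or logstash), and older versions typically use a Logback JSON encoder such as logstash-logback-encoder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of rendering one log event as a JSON line.
// JsonLogLine is hypothetical; real services use a Logback JSON encoder.
final class JsonLogLine {

    private JsonLogLine() {}

    static String format(String level, String message, Map<String, String> context) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("level", level);
        fields.put("message", message);
        fields.putAll(context); // e.g. MDC entries such as userId
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            json.append('"').append(escape(field.getKey())).append("\":\"")
                .append(escape(field.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    /** Escapes quotes, backslashes and control characters as JSON requires. */
    private static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }
}
```

With userId=42 in the context, format("INFO", "User login successful", context) produces {"level":"INFO","message":"User login successful","userId":"42"}.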

Troubleshooting

Health check endpoint returns 404

Causes:

Actuator dependency not included
Endpoints not exposed in configuration
Context path misconfiguration

Solutions:

Verify spring-boot-starter-actuator is in dependencies
Check management.endpoints.web.exposure.include configuration
Try full path: /actuator/health (not just /health)

No traces appearing in Zipkin

Causes:

Zipkin server unreachable
Sampling probability too low
Micrometer tracing not configured

Solutions:

Verify Zipkin is running: curl http://sgivu-zipkin:9411/health
Increase sampling to 1.0 for testing
Check service logs for Zipkin connection errors
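Verify the tracing dependencies: a Micrometer Tracing bridge (for example micrometer-tracing-bridge-brave) plus a Zipkin reporter (for example zipkin-reporter-brave); spring-cloud-starter-zipkin belongs to Spring Cloud Sleuth and does not use the management.tracing.* properties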

Too many actuator endpoints exposed in production

Risk: Information disclosure and potential security vulnerabilities.

Solution: Ensure the production profile uses restricted endpoint exposure:

management:
  endpoints:
    web:
      exposure:
        include: health, info, prometheus

Never use include: "*" in production.

High CPU usage from tracing

Cause: Sampling probability too high for the traffic volume.

Solution: Reduce the sampling probability:

management:
  tracing:
    sampling:
      probability: 0.01  # 1% instead of 10%

Monitor CPU usage after adjustment.

Related Configuration

Spring Configuration: server and application settings

Environment Variables: observability-related environment variables

Deployment: Zipkin and monitoring stack deployment

Service Architecture: how observability fits into the system
