
Overview

KloudMate Agent is an OpenTelemetry Collector distribution that extends the upstream collector with automated deployment, remote configuration management, and lifecycle orchestration. The agent architecture separates concerns between agent management and collector execution, enabling dynamic configuration updates without manual intervention.

[Figure: Agent Lifecycle]

Core Design Principles

The KloudMate Agent architecture is built on several key principles:

Separation of Concerns

The Agent manages lifecycle and configuration while the Collector handles telemetry processing

Remote Configuration

Configuration updates are pulled from remote APIs, eliminating the need for SSH access

Graceful Lifecycle

Atomic configuration updates with zero-downtime collector restarts

Multi-Platform

Unified architecture supports Linux, Docker, Kubernetes, and Windows deployments

System Components

The KloudMate Agent consists of several interconnected components:

Agent Layer

The agent layer is responsible for:
  • Lifecycle Management: Starting, stopping, and restarting the OpenTelemetry Collector
  • Configuration Watching: Periodically checking for configuration updates from remote APIs
  • State Tracking: Monitoring agent and collector status for health reporting
  • Service Integration: Running as a system service (systemd, Docker, Kubernetes)
internal/agent/agent.go
type Agent struct {
    cfg            *config.Config
    logger         *zap.SugaredLogger
    collector      *otelcol.Collector
    updater        *updater.ConfigUpdater
    shutdownSignal chan struct{}
    wg             sync.WaitGroup
    collectorMu    sync.Mutex
    isRunning      atomic.Bool
    collectorError string
    version        string
}

Collector Layer

The collector layer uses the upstream OpenTelemetry Collector with a curated set of components:
  • Receivers: Collect telemetry data from various sources (host metrics, logs, traces)
  • Processors: Transform, filter, and enrich telemetry data
  • Exporters: Send telemetry to backends (OTLP endpoints)
  • Extensions: Provide additional functionality (health checks, pprof)

Configuration Updater

The updater component handles remote configuration synchronization:
  • Polls remote API at configurable intervals (default: 60 seconds)
  • Compares local and remote configurations
  • Triggers collector restart when configuration changes
  • Reports agent and collector status to the API
internal/updater/updater.go
type ConfigUpdater struct {
    cfg        *config.Config
    logger     *zap.SugaredLogger
    client     *http.Client
    configPath string
}
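
The poll-and-compare step listed above could be sketched as follows. This is a minimal illustration, not the KloudMate implementation: it assumes the remote API returns the collector configuration as the raw response body and that change detection is a whitespace-trimmed byte comparison. The helper names `fetchRemoteConfig` and `configChanged`, and the use of an `Authorization` header, are hypothetical.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
)

// fetchRemoteConfig downloads the remote configuration body.
// Hypothetical helper: the real endpoint path, headers, and payload
// format are not shown in the source and are assumed here.
func fetchRemoteConfig(ctx context.Context, client *http.Client, url, apiKey string) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Authorization", apiKey) // assumed auth header
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("fetch config: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("read body: %w", err)
	}
	return body, nil
}

// configChanged reports whether the remote configuration differs from the
// local one, ignoring leading and trailing whitespace.
func configChanged(local, remote []byte) bool {
	return !bytes.Equal(bytes.TrimSpace(local), bytes.TrimSpace(remote))
}

func main() {
	local := []byte("receivers:\n  hostmetrics: {}\n")
	remote := []byte("receivers:\n  hostmetrics: {}\n  otlp: {}\n")
	fmt.Println("restart needed:", configChanged(local, remote))
}
```

A byte comparison is the simplest possible check; a real updater might instead compare hashes or version identifiers returned by the API.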

Deployment Modes

KloudMate Agent supports multiple deployment modes, each optimized for specific environments:
Host Agent: Runs as a system service on Linux or Windows hosts and collects host-level metrics, logs, and traces. Use cases:
  • Bare metal servers
  • Virtual machines
  • Traditional infrastructure
See Host Agent Architecture for details.

Communication Flow

1. Agent Initialization

The agent starts and loads its configuration from environment variables, config files, or CLI flags.
cmd/kmagent/main.go
program := &Program{
    logger:  sugar,
    cfg:     &config.Config{},
    wg:      wg,
    version: version,
}

2. Collector Startup

The agent creates and starts an OpenTelemetry Collector instance with the current configuration.
internal/agent/agent.go
collector, err := NewCollector(a.cfg)
if err != nil {
    return fmt.Errorf("failed to create new collector instance: %w", err)
}

3. Configuration Watching

The agent periodically polls the remote API for configuration updates, sending status information.
internal/agent/agent.go
ticker := time.NewTicker(time.Duration(a.cfg.ConfigCheckInterval) * time.Second)
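
A ticker like the one above is typically consumed in a `select` loop that also watches for shutdown. The sketch below assumes a stop channel in the spirit of the `shutdownSignal chan struct{}` field on the Agent struct; how the source actually wires that channel is not shown, and `runConfigWatcher` and its `check` callback are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// runConfigWatcher calls check on every tick until stop is closed and
// returns the number of checks performed. Hypothetical helper.
func runConfigWatcher(stop <-chan struct{}, interval time.Duration, check func() error) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop() // release ticker resources on exit

	checks := 0
	for {
		// Give shutdown priority over a tick that may already be pending.
		select {
		case <-stop:
			return checks
		default:
		}

		select {
		case <-stop:
			return checks
		case <-ticker.C:
			checks++
			if err := check(); err != nil {
				// A failed check is logged and retried on the next tick.
				fmt.Println("config check failed:", err)
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	time.AfterFunc(55*time.Millisecond, func() { close(stop) })

	n := runConfigWatcher(stop, 10*time.Millisecond, func() error { return nil })
	fmt.Println("checks performed:", n)
}
```

The non-blocking check at the top of the loop means a shutdown request is never delayed by a tick that became ready at the same moment.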

4. Dynamic Updates

When a configuration change is detected, the agent stops the current collector, updates the config file, and starts a new collector instance.
internal/agent/agent.go
a.stopCollectorInstance()
a.wg.Add(1)
go func() {
    defer a.wg.Done()
    if err := a.manageCollectorLifecycle(agentCtx); err != nil {
        a.collectorError = err.Error()
    }
}()

State Management

The agent maintains state using atomic operations and mutexes to ensure thread-safe access:
// Atomic boolean for running state
isRunning atomic.Bool

// Mutex-protected collector reference
collectorMu sync.Mutex
collector   *otelcol.Collector

// Error tracking
collectorError string
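
To illustrate how these fields can be read and written safely from several goroutines, here is a small sketch. It assumes the error string is guarded by a mutex, which the excerpt above does not confirm; `collectorState`, `setError`, and `status` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// collectorState is a hypothetical stand-in for the agent's state fields.
type collectorState struct {
	isRunning atomic.Bool // lock-free flag, safe to read from any goroutine

	mu        sync.Mutex // guards lastError
	lastError string
}

// setError records the most recent collector error message.
func (s *collectorState) setError(msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastError = msg
}

// status returns a snapshot suitable for health reporting.
func (s *collectorState) status() (running bool, lastError string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.isRunning.Load(), s.lastError
}

func main() {
	var s collectorState
	s.isRunning.Store(true)

	// Several goroutines report errors concurrently without a data race.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			s.setError(fmt.Sprintf("collector exited (attempt %d)", n))
		}(i)
	}
	wg.Wait()

	running, lastErr := s.status()
	fmt.Println("running:", running, "has error:", lastErr != "")
}
```

The atomic flag suits a value that is read far more often than it is written, while the mutex covers the string, which cannot be updated atomically.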

Configuration Sources

The agent supports multiple configuration sources with a priority hierarchy:
Environment variables: highest priority. Used for runtime configuration:
  • KM_API_KEY: Authentication key for remote endpoints
  • KM_COLLECTOR_ENDPOINT: OpenTelemetry exporter endpoint
  • KM_CONFIG_CHECK_INTERVAL: Interval for configuration polling
  • KM_UPDATE_ENDPOINT: Remote configuration API endpoint
CLI flags: command-line arguments override defaults:
  • --config: Path to collector configuration file
  • --api-key: API key for authentication
  • --collector-endpoint: Exporter endpoint
  • --config-check-interval: Update check interval
Configuration file: YAML configuration file (platform-specific paths):
  • Linux: /etc/kmagent/config.yaml
  • Windows: <executable-dir>/config.yaml
  • Docker: /etc/kmagent/config.yaml
Remote configuration: configuration pulled from the KloudMate API:
  • Dynamic updates without manual restarts
  • Centralized configuration management
  • Version-specific configurations
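
A priority hierarchy like this is often resolved by taking the first non-empty value from an ordered list of sources. The sketch below assumes the order environment variable, CLI flag, config file, built-in default; the relative order of flags and file is inferred from the listing above rather than stated by it, and `resolveSetting` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
)

// firstNonEmpty returns the first non-empty candidate, or "" if all are empty.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if c != "" {
			return c
		}
	}
	return ""
}

// resolveSetting applies the assumed priority order:
// environment variable, CLI flag, config file, built-in default.
func resolveSetting(envValue, flagValue, fileValue, defaultValue string) string {
	return firstNonEmpty(envValue, flagValue, fileValue, defaultValue)
}

func main() {
	// The flag is unset here, and the file value "30" is a made-up example.
	// The default of 60 seconds matches the documented polling default.
	interval := resolveSetting(os.Getenv("KM_CONFIG_CHECK_INTERVAL"), "", "30", "60")
	fmt.Println("config check interval (seconds):", interval)
}
```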

Security Considerations

The agent communicates with remote APIs for configuration updates. Ensure proper authentication and network security:
  • Always use HTTPS endpoints for KM_COLLECTOR_ENDPOINT and KM_UPDATE_ENDPOINT
  • Protect your KM_API_KEY; it is used to authenticate the agent
  • Configuration files may contain sensitive data, so restrict their file permissions
  • In Kubernetes, ConfigMaps are managed remotely, so avoid manual edits
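
The HTTPS recommendation can be enforced at startup with a small validation step. This sketch assumes both endpoints are configured as full URLs including a scheme; `requireHTTPS` is a hypothetical helper and the example hostnames are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
)

// requireHTTPS returns an error unless raw is an absolute https:// URL
// with a host. The name argument is only used in error messages.
func requireHTTPS(name, raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("%s: invalid URL: %w", name, err)
	}
	if u.Scheme != "https" || u.Host == "" {
		return fmt.Errorf("%s: must be an https:// URL, got %q", name, raw)
	}
	return nil
}

func main() {
	candidates := []string{
		"https://otel.example.com:4318",
		"http://otel.example.com:4318",
	}
	for _, raw := range candidates {
		if err := requireHTTPS("KM_COLLECTOR_ENDPOINT", raw); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", raw)
	}
}
```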

Observability

The agent provides multiple observability mechanisms:
  • Structured Logging: JSON-formatted logs with configurable levels
  • Status Reporting: Agent and collector status sent to remote API
  • Error Tracking: Collector errors captured and reported
  • Health Checks: Built-in health check extensions in the collector
internal/agent/agent.go
a.logger.Infow("collector configuration updated", 
    "configPath", a.cfg.OtelConfigPath)

Next Steps

Host Agent

Learn about the host agent architecture and lifecycle management

Kubernetes Agent

Understand the Kubernetes agent deployment model

Collector Lifecycle

Deep dive into collector lifecycle and restart mechanisms

Configuration

Configure your agent for different scenarios
