
Overview

The KloudMate Kubernetes Agent runs as both a DaemonSet (for node-level monitoring) and a Deployment (for cluster-level monitoring) within Kubernetes clusters. This dual-deployment architecture enables comprehensive observability across nodes, pods, and cluster resources.

[Diagram: Kubernetes Agent Architecture]

Deployment Architecture

The Kubernetes agent uses a two-component architecture:

DaemonSet Agent

Runs on every node to collect:
  • Node-level metrics (CPU, memory, disk, network)
  • Container metrics via cAdvisor
  • Host logs and container logs
  • eBPF-based network monitoring

Deployment Agent

Runs as a single replica to collect:
  • Cluster-level metrics
  • Kubernetes events
  • API server metrics
  • Custom resource monitoring
The eBPF receiver is automatically disabled in Deployment mode as it requires host-level access available only to DaemonSet pods.

Agent Components

The Kubernetes agent is structurally different from the host agent, with simplified lifecycle management:
internal/k8sagent/agent.go
type K8sAgent struct {
    Cfg       *K8sConfig
    Logger    *zap.SugaredLogger
    Collector *otelcol.Collector
    K8sClient *kubernetes.Clientset
    
    collectorMu     sync.Mutex
    wg              sync.WaitGroup
    collectorCtx    context.Context
    collectorCancel context.CancelFunc
    stopCh          chan struct{}
    AgentInfo       AgentInfo
}

Configuration Structure

internal/k8sagent/agent.go
type K8sConfig struct {
    APIKey              string `env:"KM_API_KEY"`
    CollectorEndpoint   string `env:"KM_COLLECTOR_ENDPOINT"`
    ConfigCheckInterval string `env:"KM_CONFIG_CHECK_INTERVAL"`
    DeploymentMode      string `env:"DEPLOYMENT_MODE"`
    ConfigMapName       string `env:"CONFIGMAP_NAME"`
    PodNamespace        string `env:"POD_NAMESPACE"`
}

Agent Initialization

1

Logger Setup

The agent initializes a production-grade logger with configurable log levels.
internal/k8sagent/agent.go
zapCfg := zap.NewProductionConfig()
zapCfg.Level = zap.NewAtomicLevelAt(kmlogger.ParseLogLevel())
zapLogger, err := zapCfg.Build()
if err != nil {
    return nil, fmt.Errorf("failed to build logger: %w", err)
}
logger := zapLogger.Sugar()
2

Configuration Loading

Configuration is loaded from environment variables injected by Kubernetes.
internal/k8sagent/agent.go
cfg := NewK8sConfig()

// Normalize the deployment mode; anything other than DAEMONSET
// falls back to DEPLOYMENT.
if strings.ToUpper(cfg.DeploymentMode) == "DAEMONSET" {
    cfg.DeploymentMode = "DAEMONSET"
} else {
    cfg.DeploymentMode = "DEPLOYMENT"
}
3

Kubernetes Client Creation

The agent creates a Kubernetes client using in-cluster configuration.
internal/k8sagent/agent.go
kubecfg, err := rest.InClusterConfig()
if err != nil {
    return nil, fmt.Errorf("failed to load in-cluster config: %w", err)
}

k8sClient, err := kubernetes.NewForConfig(kubecfg)
if err != nil {
    return nil, fmt.Errorf("failed to create kubernetes client: %w", err)
}
4

Version Information

Agent version information is set as environment variables for use by processors.
internal/k8sagent/agent.go
func (r *AgentInfo) setEnvForAgentVersion() {
    os.Setenv("KM_AGENT_VERSION", r.Version)
}

agent.AgentInfo.setEnvForAgentVersion()
agent.AgentInfo.CollectorVersion = version.GetCollectorVersion()

Collector Lifecycle

Unlike the host agent, the Kubernetes agent has a simpler lifecycle without remote configuration updates:

Startup Sequence

internal/k8sagent/agent.go
func (km *K8sAgent) StartAgent(ctx context.Context) error {
    km.Logger.Infow("starting kubernetes agent",
        "version", km.AgentInfo.Version,
        "commitSHA", km.AgentInfo.CommitSHA,
        "collectorVersion", km.AgentInfo.CollectorVersion,
    )
    return km.Start(ctx)
}

func (a *K8sAgent) Start(ctx context.Context) error {
    if err := a.startInternalCollector(); err != nil {
        return fmt.Errorf("failed to start collector: %w", err)
    }
    a.Logger.Info("collector agent started")
    return nil
}

Collector Creation

The collector is created with deployment-mode-specific component filtering:
internal/k8sagent/collector.go
func (a *K8sAgent) startInternalCollector() error {
    a.collectorMu.Lock()
    defer a.collectorMu.Unlock()
    
    collectorSettings := shared.CollectorInfoFactory(a.otelConfigPath())
    
    if a.Cfg.DeploymentMode == "DEPLOYMENT" {
        factories, err := collectorSettings.Factories()
        if err == nil {
            // eBPF receiver cannot run in deployment mode
            for typeName := range factories.Receivers {
                if typeName.String() == "ebpfreceiver" {
                    delete(factories.Receivers, typeName)
                }
            }
            collectorSettings.Factories = func() (otelcol.Factories, error) {
                return factories, nil
            }
        }
    }
    
    // Create context for this collector instance
    a.collectorCtx, a.collectorCancel = context.WithCancel(context.Background())
    
    collector, err := otelcol.NewCollector(collectorSettings)
    if err != nil {
        a.collectorCancel()
        return fmt.Errorf("failed to create new collector: %w", err)
    }
    a.Collector = collector
    
    // Start collector in goroutine
    a.wg.Add(1)
    go func(col *otelcol.Collector, ctx context.Context) {
        defer a.wg.Done()
        
        runErr := col.Run(ctx)
        
        a.collectorMu.Lock()
        if a.Collector == col {
            a.Collector = nil
        }
        a.collectorMu.Unlock()
        
        if runErr != nil {
            a.Logger.Errorw("collector exited with error", "error", runErr)
        }
    }(a.Collector, a.collectorCtx)
    
    return nil
}

Configuration Paths

The agent uses different configuration files based on deployment mode:
internal/k8sagent/agent.go
func (c *K8sAgent) otelConfigPath() string {
    daemonsetURI := "/etc/kmagent/agent-daemonset.yaml"
    deploymentURI := "/etc/kmagent/agent-deployment.yaml"
    
    if c.Cfg.DeploymentMode == "DAEMONSET" {
        return daemonsetURI
    } else {
        return deploymentURI
    }
}
Configuration files are mounted from ConfigMaps and should not be modified manually. Use the KloudMate web interface for configuration changes.

Graceful Shutdown

The Kubernetes agent implements a multi-stage shutdown process:
1

Signal Handler

The main function sets up signal handlers for SIGINT and SIGTERM.
cmd/kubeagent/main.go
func handleSignals(cancelFunc context.CancelFunc, agent *k8sagent.K8sAgent) {
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
    
    go func() {
        sig := <-sigChan
        agent.Logger.Warnf("Received signal %s, initiating shutdown...", sig)
        cancelFunc()
        agent.Stop()
    }()
}
2

Collector Shutdown

The collector is gracefully stopped with a timeout.
internal/k8sagent/collector.go
func (a *K8sAgent) stopInternalCollector() {
    a.collectorMu.Lock()
    defer a.collectorMu.Unlock()
    
    if a.Collector == nil {
        return
    }
    
    // Cancel collector context
    if a.collectorCancel != nil {
        a.collectorCancel()
    }
    
    // Shutdown with timeout
    shutdownCtx, shutdownCancel := context.WithTimeout(
        context.Background(), 10*time.Second)
    defer shutdownCancel()
    
    done := make(chan struct{})
    go func() {
        a.Collector.Shutdown()
        close(done)
    }()
    
    select {
    case <-done:
        a.Logger.Info("collector instance stopped successfully")
    case <-shutdownCtx.Done():
        a.Logger.Warnw("collector shutdown timed out", "timeout", "10s")
    }
    
    a.Collector = nil
}
3

Wait for Goroutines

The agent waits for all goroutines to complete before exiting.
internal/k8sagent/agent.go
func (a *K8sAgent) Stop() {
    a.Logger.Info("stopping collector agent")
    close(a.stopCh)
    // Stop the collector first so the goroutine running col.Run can
    // exit, then wait for it; waiting before cancelling the collector
    // context would block indefinitely.
    a.stopInternalCollector()
    a.wg.Wait()
    a.Logger.Info("collector agent stopped")
}

Configuration Management

The Kubernetes agent receives configuration through ConfigMaps:
Important: Manually updating ConfigMaps for DaemonSet or Deployment agents is not recommended. Local changes may be overwritten by configuration updates pushed from the KloudMate APIs. Always use the KloudMate Agent Config Editor (the web-based YAML editor) so configurations stay synchronized and persisted.

ConfigMap Structure

apiVersion: v1
kind: ConfigMap
metadata:
  name: km-agent-daemonset-config
  namespace: km-agent
data:
  agent-daemonset.yaml: |
    receivers:
      hostmetrics:
        collection_interval: 30s
        scrapers:
          cpu:
          memory:
          disk:
          network:
    # ... rest of configuration

Environment Variables

Key environment variables are injected via Kubernetes:
env:
  - name: KM_API_KEY
    valueFrom:
      secretKeyRef:
        name: km-agent-secret
        key: api-key
  - name: KM_COLLECTOR_ENDPOINT
    value: "https://otel.kloudmate.com:4318"
  - name: DEPLOYMENT_MODE
    value: "DAEMONSET"
  - name: CONFIGMAP_NAME
    value: "km-agent-daemonset-config"
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

Service Account and RBAC

The Kubernetes agent requires specific permissions to monitor cluster resources:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: km-agent
  namespace: km-agent
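A ClusterRole and binding granting read access to the resources described on this page (nodes, pods, events, workloads) would typically accompany this ServiceAccount. The resource list below is an illustrative assumption; consult the Helm chart for the authoritative RBAC manifests:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: km-agent
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/stats", "pods", "events", "namespaces", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "replicasets", "statefulsets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: km-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: km-agent
subjects:
  - kind: ServiceAccount
    name: km-agent
    namespace: km-agent
```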

Node Selection and Tolerations

The DaemonSet agent must run on all nodes, including those with taints:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: km-agent-daemonset
spec:
  template:
    spec:
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      # Custom tolerations can be added via Helm values
When installing on nodes with custom taints, specify tolerations using Helm parameters:
helm install kloudmate-release kloudmate/km-kube-agent \
  --set tolerations[0].key="env" \
  --set tolerations[0].operator="Equal" \
  --set tolerations[0].value="production" \
  --set tolerations[0].effect="NoSchedule"
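The same tolerations can be expressed in a Helm values file, with field names taken directly from the `--set` flags above:

```yaml
# values.yaml
tolerations:
  - key: "env"
    operator: "Equal"
    value: "production"
    effect: "NoSchedule"
```

and applied with `helm install kloudmate-release kloudmate/km-kube-agent -f values.yaml`.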

GKE Private Clusters

For private GKE clusters, a firewall rule is required for webhook admission:
GKE Private Cluster Configuration: allow the control plane (master) nodes access to port 9443/tcp on worker nodes so the admission webhook can function. See the GKE documentation for adding firewall rules.

Health Monitoring

The agent provides health endpoints for Kubernetes probes:
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: km-agent
      livenessProbe:
        httpGet:
          path: /healthz
          port: 13133
        initialDelaySeconds: 30
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /readyz
          port: 13133
        initialDelaySeconds: 10
        periodSeconds: 5

Resource Requirements

Recommended resource requests and limits for typical workloads:
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Monitoring Multiple Namespaces

The agent can monitor specific namespaces:
helm install kloudmate-release kloudmate/km-kube-agent \
  --set "monitoredNamespaces={bookinfo,mongodb,cassandra}" \
  --set featuresEnabled.apm=true
Namespaces should be comma-separated. The agent will automatically discover services and pods in these namespaces.

Main Entry Point

The Kubernetes agent has a simplified main function:
cmd/kubeagent/main.go
var (
    version = "0.1.0"
    commit  = "none"
)

func main() {
    appCtx, cancelAppCtx := context.WithCancel(context.Background())
    defer cancelAppCtx()
    
    agent, err := k8sagent.NewK8sAgent(
        &k8sagent.AgentInfo{Version: version, CommitSHA: commit})
    if err != nil {
        log.Fatal(err)
    }
    
    handleSignals(cancelAppCtx, agent)
    
    if err = agent.StartAgent(appCtx); err != nil {
        agent.Logger.Errorf("agent could not be started: %s", err.Error())
    }
    
    agent.AwaitShutdown()
}

Next Steps

Collector Lifecycle

Understand collector lifecycle management in Kubernetes

Deployment Guide

Deploy the agent to your Kubernetes cluster

Configuration

Configure the Kubernetes agent

Troubleshooting

Resolve common Kubernetes deployment issues
