Overview

The OpenTelemetry Collector configuration defines how telemetry data is received, processed, and exported. KloudMate Agent includes platform-optimized configurations for host monitoring, Docker containers, and Kubernetes clusters.
All collector configurations follow the standard OpenTelemetry Collector configuration format.

Configuration Templates

KloudMate Agent ships with three pre-configured templates:

Host Mode

System monitoring for bare metal and VMs

Docker Mode

Container metrics and log collection

Kubernetes Mode

Pod, container, and cluster monitoring

Configuration Structure

Every collector configuration has four main sections:
Basic Structure
receivers:
  # Components that receive telemetry data
  
processors:
  # Components that transform telemetry data
  
exporters:
  # Components that send telemetry data
  
service:
  # Defines pipelines connecting receivers, processors, and exporters

Host Mode Configuration

Optimized for monitoring physical servers and virtual machines:
~/workspace/source/configs/host-col-config.yaml
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      load:
        cpu_average: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      filesystem:
        metrics:
          system.filesystem.usage:
            enabled: true
          system.filesystem.utilization:
            enabled: true
      disk: {}
      network: {}            
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

Host Receivers

Collects system-level metrics from the host.
collection_interval (duration, default: "60s"): How often to collect metrics from the system.
scrapers (object, required): Individual metric scrapers to enable:
cpu: CPU utilization and frequency
  • system.cpu.utilization: Per-core and total CPU usage
  • system.cpu.frequency: Current CPU frequency
  • system.cpu.logical.count: Number of logical CPUs
load: System load averages
  • system.cpu.load_average.1m: 1-minute load average
  • system.cpu.load_average.5m: 5-minute load average
  • system.cpu.load_average.15m: 15-minute load average
memory: RAM utilization
  • system.memory.usage: Memory used by state (used, cached, free, etc.)
  • system.memory.utilization: Memory utilization percentage
filesystem: Disk usage
  • system.filesystem.usage: Bytes used per mount point
  • system.filesystem.utilization: Disk utilization percentage
disk: Disk I/O
  • system.disk.io: Read/write bytes
  • system.disk.operations: Read/write operations
network: Network I/O
  • system.network.io: Bytes transmitted/received
  • system.network.packets: Packets transmitted/received
  • system.network.errors: Network errors
The otlp receiver accepts telemetry from instrumented applications via the OTLP protocol.
protocols.grpc.endpoint (string, default: "0.0.0.0:4317"): gRPC endpoint for the OTLP protocol.
protocols.http.endpoint (string, default: "0.0.0.0:4318"): HTTP endpoint for the OTLP protocol.
Use cases:
  • Receiving traces from OpenTelemetry SDKs
  • Receiving metrics from application exporters
  • Receiving logs from log forwarders
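As an example, an application instrumented with an OpenTelemetry SDK can be pointed at these endpoints using the standard SDK environment variables (a sketch; the endpoint assumes the collector runs on the same host, and the service name is illustrative):

```shell
# Standard OpenTelemetry SDK environment variables; any OTel SDK reads these.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"   # the collector's OTLP/HTTP receiver
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_SERVICE_NAME="my-service"                        # illustrative service name
```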

Host Processors

~/workspace/source/configs/host-col-config.yaml:44-102
processors:
  batch:
    send_batch_size: 10000
    timeout: 10s
  resourcedetection:
    detectors: [system]
    system:
      resource_attributes:
        host.name:
          enabled: true
        host.id:
          enabled: true
        os.type:
          enabled: false
  resource:
    attributes:
      - key: service.name
        action: insert
        from_attribute: host.name
  cumulativetodelta:
    include:
      match_type: strict
      metrics:
        - system.network.io
        - system.disk.io
        - system.disk.operations.rate
        - system.network.packets.rate
        - system.network.errors.rate
        - system.network.dropped.rate
  deltatorate:
    metrics:
      - system.network.io
      - system.disk.io
      - system.disk.operations.rate
      - system.network.packets.rate
      - system.network.errors.rate
      - system.network.dropped.rate
The batch processor buffers telemetry data before sending to reduce network overhead.
send_batch_size (integer, default: "8192"): Number of spans/metrics/logs to batch before sending.
timeout (duration, default: "200ms"): Maximum time to wait before sending a batch, even if incomplete.
Optimization: Larger batch sizes reduce network overhead but increase memory usage and latency.
The resourcedetection processor automatically detects resource attributes from the environment and adds them to telemetry.
detectors (array, required): List of detection methods, e.g. system, env, docker, gcp, ec2, ecs, azure.
Detected attributes:
  • host.name: Hostname from OS
  • host.id: Unique host identifier
  • host.arch: CPU architecture
  • os.type: Operating system type
The resource processor adds, updates, or deletes resource attributes.
resource:
  attributes:
    - key: service.name
      action: insert
      from_attribute: host.name
    - key: environment
      action: insert
      value: production
Actions:
  • insert: Add attribute if not exists
  • update: Update attribute if exists
  • upsert: Insert or update
  • delete: Remove attribute
The cumulativetodelta processor converts cumulative metrics (counters) to delta values for rate calculation.
include.metrics (array, required): List of metric names to convert.
Required for: Network I/O, disk I/O, and other counter metrics that need rate calculation.
The deltatorate processor converts delta values to per-second rates. Example: "bytes transferred in the last minute" becomes "bytes per second".
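A small numeric sketch of what this two-step chain computes, assuming a 60-second collection interval (the sample values are illustrative):

```shell
# Cumulative network counter sampled every 60s (bytes)
t0=0; t1=6000; t2=18000

# cumulativetodelta: difference between consecutive samples
delta1=$((t1 - t0))    # 6000 bytes in the first interval
delta2=$((t2 - t1))    # 12000 bytes in the second interval

# deltatorate: divide each delta by the interval length
rate1=$((delta1 / 60)) # 100 bytes/s
rate2=$((delta2 / 60)) # 200 bytes/s
echo "$rate1 $rate2"   # prints "100 200"
```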
The transform processors perform advanced metric manipulation using the OpenTelemetry Transformation Language (OTTL).
transform/ratecalculation/copymetric:
  error_mode: ignore
  metric_statements:
    - context: metric
      statements:
        - copy_metric(name="system.network.io.rate") where metric.name == "system.network.io"
See full implementation in ~/workspace/source/configs/host-col-config.yaml:81-102

Host Exporters

~/workspace/source/configs/host-col-config.yaml:32-42
exporters:
  debug:
    verbosity: basic
  otlphttp:
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000
    endpoint: ${env:KM_COLLECTOR_ENDPOINT}
    headers:
        Authorization: ${env:KM_API_KEY}
The debug exporter logs telemetry data to the console for debugging.
verbosity (string, default: "basic"): Logging verbosity: basic, normal, detailed.
Disable in production to avoid the performance impact.
The otlphttp exporter sends telemetry to the KloudMate backend via the OTLP/HTTP protocol.
endpoint (string, required): Backend endpoint URL. Uses the ${env:KM_COLLECTOR_ENDPOINT} environment variable.
headers (object): HTTP headers sent with every request. Typically includes Authorization for the API key.
sending_queue.enabled (boolean, default: "true"): Enable the queue so failed sends are retried.
sending_queue.num_consumers (integer, default: "10"): Number of parallel workers sending data.
sending_queue.queue_size (integer, default: "5000"): Maximum number of batches to queue before dropping.
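A fuller otlphttp sketch that also sets explicit retry behavior (retry_on_failure is a standard OTLP exporter option; the values shown are illustrative, not KloudMate defaults):

```yaml
exporters:
  otlphttp:
    endpoint: ${env:KM_COLLECTOR_ENDPOINT}
    headers:
      Authorization: ${env:KM_API_KEY}
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 10000
    retry_on_failure:
      enabled: true
      initial_interval: 5s    # wait before the first retry
      max_interval: 30s       # cap on backoff between retries
      max_elapsed_time: 300s  # give up after this total time
```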

Host Pipelines

~/workspace/source/configs/host-col-config.yaml:107-124
service:
  telemetry:
    metrics:
      level: none
  pipelines:
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [resourcedetection, resource, cumulativetodelta, deltatorate, transform/ratecalculation/sumtogauge, transform/ratecalculation/copymetric, batch]
      exporters: [debug, otlphttp]
    logs:
      receivers: [otlp]
      processors: [resourcedetection, resource, batch]
      exporters: [debug, otlphttp]
    traces:
      receivers: [otlp]
      processors: [resourcedetection, resource, batch]
      exporters: [debug, otlphttp]

Docker Mode Configuration

Optimized for monitoring Docker containers and host system:
~/workspace/source/configs/docker-col-config.yaml:22-25
receivers:
  docker_stats:
    endpoint: "unix:///var/run/docker.sock"
    collection_interval: 60s
    timeout: 10s

Docker-Specific Receivers

The docker_stats receiver collects metrics from running Docker containers via the Docker API.
endpoint (string, required): Docker daemon socket endpoint. Default: unix:///var/run/docker.sock
collection_interval (duration, default: "60s"): How often to collect container metrics.
timeout (duration, default: "10s"): Timeout for Docker API requests.
Collected metrics:
  • container.cpu.usage.total: Container CPU usage
  • container.memory.usage.total: Container memory usage
  • container.memory.usage.limit: Container memory limit
  • container.network.io.usage.rx_bytes: Network bytes received
  • container.network.io.usage.tx_bytes: Network bytes transmitted
  • container.blockio.io_service_bytes_recursive: Disk I/O
Requires Docker socket to be mounted: -v /var/run/docker.sock:/var/run/docker.sock:ro
The filelog receiver collects logs from container log files and system logs.
~/workspace/source/configs/docker-col-config.yaml:3-20
filelog:
  include:
    - /var/log/**/*.log
    - /var/lib/docker/containers/**/*.log*
    - ${env:FILELOG_PATHS}
  exclude:
    - /var/log/kmagent*/**/*.log
    - '**/*.gz'
    - '**/*.zip'
    - '**/*.tar'
    - '**/*.xz'
    - '**/.*'
    - '**/*.tmp'
    - '**/*~'
  include_file_name_resolved: true
  include_file_path: true
  include_file_path_resolved: true
  max_log_size: "1MiB"
include (array, required): Glob patterns for log files to monitor.
exclude (array): Glob patterns for log files to ignore.
max_log_size (string, default: "1MiB"): Maximum size of a single log entry.
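Docker writes container logs as JSON lines, so a parsing operator is typically attached to the receiver. A minimal sketch using the standard filelog json_parser and move operators (the timestamp layout matches Docker's default JSON log format):

```yaml
filelog:
  include:
    - /var/lib/docker/containers/**/*.log
  operators:
    - type: json_parser        # parse {"log": "...", "stream": "...", "time": "..."}
      parse_from: body
      timestamp:
        parse_from: attributes.time
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
    - type: move               # promote the raw message to the log body
      from: attributes.log
      to: body
```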
The hostmetrics receiver collects host-level metrics from the Docker host.
~/workspace/source/configs/docker-col-config.yaml:27-85
hostmetrics:
  collection_interval: 60s
  root_path: /hostfs
  scrapers:
    cpu:
    memory:
    disk:
    network:
    filesystem:
      exclude_fs_types:
        fs_types:
          - autofs
          - binfmt_misc
          - overlay
          - proc
          - sysfs
      exclude_mount_points:
        mount_points:
          - /dev/*
          - /sys/*
          - /var/lib/docker/*
root_path (string, default: "/"): Root filesystem path. In Docker, the host filesystem is typically mounted at /hostfs.
Requires host filesystem mount: -v /:/hostfs:ro
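In Docker Compose terms, the two mounts the Docker-mode receivers rely on look like this (a sketch; the service and image names are illustrative assumptions):

```yaml
services:
  kmagent:
    image: kloudmate/kmagent:latest                     # illustrative image name
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro    # for docker_stats
      - /:/hostfs:ro                                    # for hostmetrics root_path
    environment:
      KM_COLLECTOR_ENDPOINT: https://otel.kloudmate.com:4318
      KM_API_KEY: your-api-key
```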

Docker Pipelines

~/workspace/source/configs/docker-col-config.yaml:162-170
service:
  pipelines:
    metrics:
      receivers: [hostmetrics, docker_stats]
      processors: [resourcedetection, resource, cumulativetodelta, deltatorate, transform/ratecalculation/sumtogauge, transform/ratecalculation/copymetric, batch]
      exporters: [otlphttp, debug]
    logs:
      receivers: [filelog]
      processors: [resourcedetection, resource, batch]
      exporters: [otlphttp, debug]

Kubernetes Mode Configuration

Optimized for Kubernetes cluster monitoring via DaemonSet:

Kubernetes Receivers

The kubeletstats receiver collects metrics from the Kubelet Stats API.
~/workspace/source/configs/daemonset-col-config.yaml:511-543
kubeletstats:
  auth_type: serviceAccount
  collection_interval: 30s
  endpoint: ${env:KM_NODE_NAME}:10250
  extra_metadata_labels:
    - container.id
  insecure_skip_verify: true
  k8s_api_config:
    auth_type: serviceAccount
  metric_groups:
    - volume
    - node
    - pod
    - container
  metrics:
    k8s.container.cpu_limit_utilization:
      enabled: true
    k8s.container.cpu_request_utilization:
      enabled: true
    k8s.container.memory_limit_utilization:
      enabled: true
    k8s.container.memory_request_utilization:
      enabled: true
auth_type (string, default: "serviceAccount"): Authentication method for the Kubelet API: serviceAccount, tls, none.
endpoint (string, required): Kubelet endpoint. Uses ${env:KM_NODE_NAME}:10250 to connect to the local node’s Kubelet.
metric_groups (array, required): Groups of metrics to collect: node, pod, container, volume.
Collected metrics:
  • k8s.node.cpu.utilization: Node CPU usage
  • k8s.node.memory.usage: Node memory usage
  • k8s.pod.cpu.utilization: Pod CPU usage
  • k8s.pod.memory.usage: Pod memory usage
  • k8s.container.cpu.utilization: Container CPU usage
  • k8s.container.memory.usage: Container memory usage
  • k8s.volume.available: Volume available bytes
  • k8s.volume.capacity: Volume total capacity
The filelog/containers receiver collects logs from Kubernetes pod containers.
~/workspace/source/configs/daemonset-col-config.yaml:248-257
filelog/containers:
  exclude:
    - /**/*.gz
    - /var/log/pods/km-agent_*/**/*.log
    - ${env:KM_XLOG_PATHS:-/___no_exclude___}
  include:
    - /var/log/pods/*/*/*.log
  include_file_name_resolved: true
  include_file_path: true
  include_file_path_resolved: true
  max_log_size: 1MiB
include (array, required): Glob patterns for pod log files. Default: /var/log/pods/*/*/*.log
exclude (array): Patterns to exclude (the agent’s own logs, compressed files, etc.).
Log operators: The receiver includes 20+ operators for parsing JSON logs and extracting timestamps, log levels, trace context, and more. See the full implementation in ~/workspace/source/configs/daemonset-col-config.yaml:259-424
The hostmetrics receiver collects host-level metrics from Kubernetes nodes.
~/workspace/source/configs/daemonset-col-config.yaml:425-506
hostmetrics:
  collection_interval: 60s
  scrapers:
    cpu:
    disk:
    filesystem:
      exclude_fs_types:
        fs_types:
          - overlay
          - proc
          - sysfs
      exclude_mount_points:
        mount_points:
          - /var/lib/kubelet/*
    load:
    memory:
    network:
    paging:
    process:
    processes:
    system:

Kubernetes Processors

The k8sattributes processor adds Kubernetes-specific attributes to telemetry data.
~/workspace/source/configs/daemonset-col-config.yaml:80-108
k8sattributes:
  auth_type: serviceAccount
  passthrough: false
  filter:
    node_from_env_var: KM_NODE_NAME
  extract:
    metadata:
      - k8s.pod.name
      - k8s.pod.uid
      - k8s.deployment.name
      - k8s.namespace.name
      - k8s.node.name
      - k8s.pod.start_time
      - k8s.statefulset.uid
      - k8s.replicaset.uid
      - k8s.daemonset.uid
      - k8s.deployment.uid
      - k8s.job.uid
      - k8s.pod.ip
  pod_association:
    - sources:
        - from: resource_attribute
          name: k8s.pod.uid
auth_type (string, default: "serviceAccount"): Kubernetes API authentication method.
filter.node_from_env_var (string): Only process pods running on the node named in this environment variable (for DaemonSet deployments).
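In a DaemonSet manifest, KM_NODE_NAME is typically populated via the Kubernetes downward API, so each agent pod sees its own node name (a standard pattern; the variable name matches the config above):

```yaml
env:
  - name: KM_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName   # the name of the node the pod is scheduled on
```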
extract.metadata (array, required): List of Kubernetes metadata attributes to extract and add to telemetry.
Extracted attributes:
  • Pod: name, UID, namespace, IP, start time
  • Owner: Deployment, DaemonSet, StatefulSet, ReplicaSet, Job
  • Node: name
  • Container: name
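Pod association can also fall back through several sources when the pod UID is missing; a sketch extending the configuration above (the source names are standard k8sattributes options):

```yaml
pod_association:
  - sources:
      - from: resource_attribute
        name: k8s.pod.uid          # preferred: exact pod UID
  - sources:
      - from: resource_attribute
        name: k8s.pod.ip           # fall back to the pod IP
  - sources:
      - from: connection           # last resort: the peer address of the connection
```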
The attributes processors add or modify specific attributes.
~/workspace/source/configs/daemonset-col-config.yaml:10-46
attributes/metrics:
  actions:
    - key: k8s.cluster.name
      value: ${env:KM_CLUSTER_NAME}
      action: insert

attributes/logs:
  actions:
    - key: k8s.cluster.name
      from_attribute: k8s.cluster.name
      action: upsert
    - key: service.namespace
      from_attribute: k8s.namespace.name
      action: upsert
Common use cases:
  • Add cluster name to all metrics
  • Copy Kubernetes namespace to service.namespace
  • Set service instance ID from pod UID
The metricstransform processor performs advanced metric manipulation for Kubernetes-specific calculations.
~/workspace/source/configs/daemonset-col-config.yaml:109-154
metricstransform/system:
  transforms:
    - action: insert
      experimental_match_labels:
        os.type: linux
      include: system.memory.utilization
      match_type: strict
      new_name: system.memory.utilization.consumed
      operations:
        - action: aggregate_label_values
          aggregated_values:
            - used
            - cached
          aggregation_type: sum
          label: state
          new_value: consumed
Example: Combines “used” and “cached” memory states into a single “consumed” metric for Linux systems.
The transform processors use the OpenTelemetry Transformation Language (OTTL) for complex log and metric processing.
~/workspace/source/configs/daemonset-col-config.yaml:193-206
transform/addservicename:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - set(attributes["service.name"], resource.attributes["k8s.container.name"]) where resource.attributes["k8s.container.name"] != nil
Common transformations:
  • Extract service name from container name
  • Copy attributes between resource and log/metric contexts
  • Delete temporary attributes
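The last case can be sketched with the standard OTTL delete_key function (the attribute name is illustrative):

```yaml
transform/cleanup:
  error_mode: ignore
  log_statements:
    - context: log
      statements:
        - delete_key(attributes, "tmp.raw_line") where attributes["tmp.raw_line"] != nil
```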
The groupbyattrs processor groups logs by specific attributes for better correlation.
~/workspace/source/configs/daemonset-col-config.yaml:77-79
groupbyattrs/filelog:
  keys:
    - k8s.pod.uid
Use case: Groups logs from the same pod together for better trace correlation.

Kubernetes Pipelines

~/workspace/source/configs/daemonset-col-config.yaml:544-602
service:
  pipelines:
    logs/containers:
      receivers:
        - filelog/containers
      processors:
        - resource
        - resource/add_node_name
        - resource/cluster
        - attributes/logs
        - groupbyattrs/filelog
        - k8sattributes
        - transform/addservicename
        - transform/copyservicefromlogattributes
        - batch
      exporters:
        - otlphttp
    
    metrics/hostmetrics:
      receivers:
        - hostmetrics
      processors:
        - resourcedetection
        - resource
        - resource/hostmetrics
        - resource/cluster
        - k8sattributes
        - transform/ostype
        - attributes/logs
        - metricstransform/system
        - transform/deleteostype
        - attributes/metrics
        - transform/ratecalculation/copymetric
        - cumulativetodelta
        - deltatorate
        - transform/ratecalculation/sumtogauge
        - batch
      exporters:
        - otlphttp
    
    metrics/kubeletstats:
      receivers:
        - kubeletstats
      processors:
        - resourcedetection
        - resource/add_node_name
        - resource
        - k8sattributes
        - resource/cluster
        - attributes/logs
        - transform/ratecalculation/copymetric
        - transform/ratecalculation/sumtogauge
        - attributes/metrics
        - batch
      exporters:
        - otlphttp

Advanced Configuration

Custom Processors

You can add custom processors to existing pipelines:
processors:
  probabilistic_sampler:
    sampling_percentage: 10  # Sample 10% of traces

service:
  pipelines:
    traces:
      processors: [resourcedetection, probabilistic_sampler, batch]  # batch goes last

Resource Detection

Enable additional resource detectors:
Extended Resource Detection
processors:
  resourcedetection:
    detectors:
      - env        # Environment variables
      - system     # Host information
      - docker     # Docker container info
      - gcp        # Google Cloud Platform
      - ec2        # AWS EC2
      - ecs        # AWS ECS
      - azure      # Azure
    timeout: 5s
    override: false

Multi-Backend Exporters

Send telemetry to multiple backends:
Multiple Exporters
exporters:
  otlphttp/kloudmate:
    endpoint: ${env:KM_COLLECTOR_ENDPOINT}
    headers:
      Authorization: ${env:KM_API_KEY}
  
  otlphttp/backup:
    endpoint: https://backup-collector.company.com:4318
    headers:
      Authorization: ${env:BACKUP_API_KEY}
  
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: km_agent

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [otlphttp/kloudmate, otlphttp/backup, prometheus]

Memory Limiter

Prevent OOM by limiting memory usage:
Memory Protection
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128

service:
  pipelines:
    metrics:
      processors: [memory_limiter, resourcedetection, batch]  # memory_limiter first, batch last

Configuration Validation

Validate your configuration before deploying:
1

Syntax Validation

Check YAML syntax:
yamllint config.yaml
2

Schema Validation

Test configuration loading:
kmagent start --config=/path/to/config.yaml --dry-run
3

Component Validation

Verify all referenced components are available:
otelcol validate --config=/path/to/config.yaml
Always validate configurations in a test environment before deploying to production.

Environment Variable Reference

All collector configurations use these environment variables:
# Exporter endpoint
export KM_COLLECTOR_ENDPOINT="https://otel.kloudmate.com:4318"

# API key for authentication
export KM_API_KEY="your-api-key"

Troubleshooting

Symptom: Collector fails to start with configuration errorCommon causes:
  • Invalid YAML syntax
  • Referenced component not available in distribution
  • Required environment variables not set
  • Exporter endpoint unreachable
Solution:
# Check logs
journalctl -u kmagent -n 50 --no-pager

# Validate YAML
yamllint /etc/kmagent/config.yaml

# Test connectivity
curl -v ${KM_COLLECTOR_ENDPOINT}
Symptom: Collector runs but no data appears in KloudMateCommon causes:
  • Receivers not configured correctly
  • Pipeline not connecting receivers to exporters
  • Batch processor timeout too long
  • Network/firewall blocking exporter
Solution:
# Enable debug exporter temporarily
# Add to exporters section:
exporters:
  debug:
    verbosity: detailed

# Add to pipeline:
service:
  pipelines:
    metrics:
      exporters: [debug, otlphttp]

# Check output
journalctl -u kmagent -f | grep -A 5 "ResourceMetrics"
Symptom: Collector consuming excessive memoryCommon causes:
  • Batch size too large
  • Queue size too large
  • No memory limiter configured
  • Too many metrics/logs being collected
Solution:
# Add memory limiter
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

# Reduce batch size
processors:
  batch:
    send_batch_size: 1000
    timeout: 5s

# Reduce queue size
exporters:
  otlphttp:
    sending_queue:
      queue_size: 1000
Symptom: Metrics/logs missing Kubernetes metadataCommon causes:
  • k8sattributes processor not in pipeline
  • ServiceAccount lacks required permissions
  • Pod association not configured
  • Wrong node filter in DaemonSet
Solution:
# Check ServiceAccount permissions
kubectl describe serviceaccount km-agent -n km-agent

# Check ClusterRole
kubectl describe clusterrole km-agent

# Verify k8sattributes processor
kubectl logs -n km-agent ds/km-agent | grep k8sattributes

Best Practices

Start Simple

Begin with minimal configuration and add components incrementally.

Use Memory Limiter

Always configure memory_limiter processor to prevent OOM.

Batch Aggressively

Use larger batch sizes (5000-10000) to reduce network overhead.

Monitor Collector

Enable collector’s own telemetry to monitor performance.

Test Changes

Always validate configuration changes in non-production first.

Version Control

Store configurations in Git for change tracking.

Next Steps

Agent Configuration

Configure agent behavior and authentication

Configuration Structure

Understand configuration architecture
