This guide helps you diagnose and resolve common issues with the KloudMate Agent across all supported platforms.

Quick Diagnostics

Check Status

Verify the agent is running

Review Logs

Examine recent log entries

Test Connectivity

Verify network access to endpoints

Installation Issues

Linux: Package Installation Fails

If dpkg or rpm reports missing dependencies:
Debian/Ubuntu:
scripts/install_linux.sh
sudo apt-get install -f -y
RHEL/CentOS:
scripts/install_linux.sh
sudo dnf install -y kmagent
# or
sudo yum install -y kmagent
On Debian/Ubuntu, the installation script does this automatically:
scripts/install_linux.sh
if ! sudo KM_API_KEY="$KM_API_KEY" KM_COLLECTOR_ENDPOINT="$KM_COLLECTOR_ENDPOINT" dpkg -i "$TMP_PACKAGE"; then
  echo "⚠️  Fixing dependencies with apt-get..."
  sudo apt-get install -f -y
fi
The installation script exits unless both KM_API_KEY and KM_COLLECTOR_ENDPOINT are set:
scripts/install_linux.sh
if [ -z "$KM_API_KEY" ] || [ -z "$KM_COLLECTOR_ENDPOINT" ]; then
  echo "❌ KM_API_KEY and KM_COLLECTOR_ENDPOINT must be set as environment variables"
  exit 1
fi
Solution:
export KM_API_KEY="your-api-key"
export KM_COLLECTOR_ENDPOINT="https://otel.kloudmate.com:4318"
bash -c "$(curl -L https://cdn.kloudmate.com/scripts/install_linux.sh)"
If the GitHub API is unreachable, you will see:
Error: Could not fetch latest release information from GitHub.
Causes:
  • Network connectivity issues
  • GitHub API rate limiting
  • Firewall blocking GitHub access
Solution:
# Test GitHub connectivity
curl -s https://api.github.com/repos/kloudmate/km-agent/releases/latest

# If rate limited, wait and retry
# Or download the package manually from the releases page
# (replace 1.0.0 with the version you need)
wget https://github.com/kloudmate/km-agent/releases/download/v1.0.0/kmagent_1.0.0_amd64.deb
sudo dpkg -i kmagent_1.0.0_amd64.deb

Docker: Container Fails to Start

If Docker is not installed, the installation script exits with an error:
scripts/install_docker.sh
if ! command -v docker &> /dev/null || ! docker --version &> /dev/null; then
  echo -e "\nDocker is not installed on the system"
  echo -e "\nPlease install docker first: https://docs.docker.com/engine/install/\n"
  exit 1
fi
Solution: Install Docker first
# Ubuntu/Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Verify installation
docker --version
If you see “permission denied while trying to connect to the Docker daemon”:
Solution:
# Add user to docker group
sudo usermod -aG docker $USER

# Log out and back in, or run:
newgrp docker

# Verify
docker ps
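The agent container uses host networking, so it shares the host's ports and can conflict with services already listening on them. Check for conflicts:
# Check if port 13133 is in use (on systems without netstat: sudo ss -tlnp | grep 13133)
sudo netstat -tlnp | grep 13133

# Remove a previous km-agent container that is still holding the port
docker stop km-agent
docker rm km-agent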

Kubernetes: Pods Not Starting

If pods are stuck in ImagePullBackOff:
kubectl describe pod -n km-agent <pod-name>
Common causes:
  • Network issues pulling from ghcr.io
  • Image tag doesn’t exist
  • Authentication required
Solution:
# Verify image exists
docker pull ghcr.io/kloudmate/km-kube-agent:latest

# Check pod events
kubectl get events -n km-agent --sort-by='.lastTimestamp'

# Restart the DaemonSet pods so the image pull is retried
kubectl rollout restart daemonset/km-agent -n km-agent
If pods continuously restart:
Check logs:
kubectl logs -n km-agent <pod-name> --previous
Common issues:
  • Missing API_KEY configuration
  • Invalid collector endpoint
  • Resource limits too low
Solution:
# Verify Helm values
helm get values kloudmate-release -n km-agent

# Check for missing values
kubectl get configmap -n km-agent km-agent-configmap-daemonset -o yaml

# Update configuration
helm upgrade kloudmate-release kloudmate/km-kube-agent -n km-agent \
  --set API_KEY="your-api-key" \
  --set COLLECTOR_ENDPOINT="https://otel.kloudmate.com:4318"
If you see errors saying the Instrumentation CRD was not found:
Solution:
README.md
# Install CRD first
kubectl apply -f https://raw.githubusercontent.com/kloudmate/km-agent/refs/heads/develop/deployment/helm/km-kube-agent/crds/crd-otel-instrumentation.yaml

# Then install the agent
helm install kloudmate-release kloudmate/km-kube-agent --namespace km-agent --create-namespace
On private GKE clusters, the control plane may be unable to reach the webhook on port 9443, so webhook calls fail. Allow that traffic with a firewall rule:
README.md
# Add firewall rule for port 9443
gcloud compute firewall-rules create allow-webhook \
  --allow tcp:9443 \
  --source-ranges <master-cidr> \
  --target-tags <node-tag>
See GKE documentation for details.
If DaemonSet pods aren’t running on all nodes:
Check node taints:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Add tolerations:
README.md
helm install kloudmate-release kloudmate/km-kube-agent --namespace km-agent --create-namespace \
  --set tolerations[0].key="env" \
  --set tolerations[0].operator="Equal" \
  --set tolerations[0].value="production" \
  --set tolerations[0].effect="NoSchedule"

Runtime Issues

Agent Not Starting

1. Check Service Status

Linux:
sudo systemctl status kmagent
sudo journalctl -u kmagent -n 50
Docker:
docker ps -a -f name=km-agent
docker logs km-agent --tail 50
Kubernetes:
kubectl get pods -n km-agent
kubectl logs -n km-agent <pod-name> --tail 50

2. Verify Configuration

Check that required configuration is present:
# Linux
cat /etc/kmagent/config.yaml

# Docker
docker inspect km-agent | grep -A 10 Env

# Kubernetes
kubectl describe configmap -n km-agent km-agent-configmap-daemonset

3. Check File Permissions

Linux only:
# Binary should be executable
ls -la /usr/bin/kmagent

# Config should be readable
ls -la /etc/kmagent/config.yaml

Collector Continuously Restarting

The agent manages the collector lifecycle and will restart it if configuration changes or errors occur.
internal/agent/agent.go
runErr := collector.Run(ctx)
if runErr != nil {
    a.collectorError = runErr.Error()
    a.logger.Errorw("collector run loop exited with error", "error", runErr)
} else {
    a.collectorError = ""
    a.logger.Info("collector run loop exited normally")
}
Diagnose:
sudo journalctl -u kmagent | grep "collector run loop exited with error"
Common causes:
  • Invalid collector configuration
  • Port conflicts
  • Permission issues accessing resources
  • Invalid receiver configurations

Configuration Updates Not Applied

The agent checks for configuration updates periodically:
internal/agent/agent.go
if a.cfg.ConfigCheckInterval <= 0 {
    a.logger.Debug("config check interval not set, skipping update checks")
    return
}
If ConfigCheckInterval is 0 or negative, configuration updates are disabled.
Verify update settings:
# Check environment or config file
grep -i config_check /etc/kmagent/config.yaml
Monitor update checks:
# Look for these log messages
grep "config update checker started" /var/log/syslog
grep "checking for configuration updates" /var/log/syslog
grep "configuration changed, restarting collector" /var/log/syslog

Connectivity Issues

Cannot Reach Collector Endpoint

1. Test Network Connectivity

# Test HTTPS connectivity
curl -v https://otel.kloudmate.com:4318

# Test with API key
curl -v -H "Authorization: your-api-key" https://otel.kloudmate.com:4318

2. Check Firewall Rules

Ensure outbound HTTPS (443) and port 4318 are allowed:
# Check iptables (Linux)
sudo iptables -L OUTPUT -v -n

# Check firewalld (RHEL/CentOS)
sudo firewall-cmd --list-all

3. Verify DNS Resolution

nslookup otel.kloudmate.com
dig otel.kloudmate.com

4. Check Proxy Settings

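If the host is behind a proxy, set the standard proxy variables. Variables exported in a shell apply only to commands started from that shell, such as the curl tests above. The agent service (systemd unit, container, or pod) must receive them through its own configuration: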
export HTTPS_PROXY=http://proxy.example.com:8080
export NO_PROXY=localhost,127.0.0.1

Configuration Update API Unreachable

The agent reports status to the update endpoint:
internal/updater/updater.go
req, err := http.NewRequestWithContext(reqCtx, "POST", u.cfg.ConfigUpdateURL, bytes.NewBuffer(jsonData))
Test connectivity:
curl -X POST https://api.kloudmate.com/agents/config-check \
  -H "Authorization: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"hostname":"test","platform":"linux"}'
Look for these errors in the agent logs:
ERROR failed to fetch config updates after retries
ERROR config update API returned non-OK status: 401
ERROR Periodic config check failed

Performance Issues

High CPU Usage

Reduce the collection scope (for example, fewer scrapers or a longer collection_interval), or send larger, less frequent batches:
processors:
  batch:
    send_batch_size: 10000
    timeout: 10s
On Kubernetes, you can also give the agent more headroom by raising its resource limits:
helm upgrade kloudmate-release kloudmate/km-kube-agent -n km-agent \
  --set resources.limits.cpu=1000m \
  --set resources.limits.memory=1Gi

High Memory Usage

Check current usage:
docker stats km-agent --no-stream
Possible causes:
  • Large batch sizes
  • Too many receivers enabled
  • Memory leaks (upgrade to latest version)

Data Collection Issues

No Metrics/Logs Appearing

1. Verify Receivers Are Enabled

Check collector configuration:
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]

2. Check Pipeline Configuration

Ensure receivers are included in service pipelines:
# Linux
grep -A 20 "service:" /etc/kmagent/config.yaml

3. Verify Exporter Configuration

exporters:
  otlphttp:
    endpoint: ${env:KM_COLLECTOR_ENDPOINT}
    headers:
      Authorization: ${env:KM_API_KEY}

4. Test Data Export

Check collector logs for export errors:
grep -i "export" /var/log/kmagent.log
grep -i "error" /var/log/kmagent.log

Kubernetes Metrics Missing

For Kubernetes deployments, ensure both DaemonSet and Deployment are running:
kubectl get pods -n km-agent -o wide
The DaemonSet collects node-level metrics, while the Deployment collects cluster-level metrics.

Uninstallation Issues

Files Remain After Uninstall

The uninstall script cleans up residual files:
scripts/uninstall_linux.sh
sudo rm -f /usr/local/bin/kmagent
sudo rm -f /usr/bin/kmagent
sudo rm -rf /etc/kmagent/
sudo rm -rf /var/log/kmagent/
sudo rm -rf /var/lib/kmagent/
Verify cleanup:
find / -name '*kmagent*' 2>/dev/null
systemctl list-units | grep kmagent

Docker Container Won’t Remove

The Docker installation script removes any existing km-agent container:
scripts/install_docker.sh
if [ "$(docker ps -aq -f name=km-agent)" ]; then
    docker stop km-agent
    docker rm km-agent
fi
If the container still cannot be removed, force its removal and delete the image:
docker rm -f km-agent
docker rmi ghcr.io/kloudmate/km-agent:latest

Getting Help

GitHub Issues

Report bugs and request features

Email Support

Community Slack

Get help from the community

Documentation

Comprehensive guides and references

Information to Include

When reporting issues, include:
1. Environment Details

  • Platform (Linux, Docker, Kubernetes, Windows)
  • OS version and distribution
  • Agent version
  • Collector version

2. Logs

# Linux
sudo journalctl -u kmagent -n 100 > agent-logs.txt

# Docker
docker logs km-agent > agent-logs.txt 2>&1

# Kubernetes
kubectl logs -n km-agent <pod-name> > agent-logs.txt

3. Configuration

Redact sensitive information (API keys and Authorization headers) before sharing:
sed -E 's/(KM_API_KEY|Authorization):.*/\1: [REDACTED]/' /etc/kmagent/config.yaml

4. Steps to Reproduce

Provide clear, detailed steps to reproduce the issue
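For larger configurations, a line-based redactor is easier to extend than a one-off sed expression. This Go sketch masks the values of keys that commonly hold credentials and keeps indentation and key names, so the structure stays reviewable. The key list (`KM_API_KEY`, `api_key`, `Authorization`, matched case-insensitively) is an assumption. Extend it for your configuration and review the output before sharing.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// secretLine matches YAML keys that commonly carry credentials (assumed list).
var secretLine = regexp.MustCompile(`(?i)^(\s*-?\s*(?:km_api_key|api_key|authorization)\s*:\s*).*$`)

// redact replaces the value of a matching key with [REDACTED].
func redact(line string) string {
	return secretLine.ReplaceAllString(line, "${1}[REDACTED]")
}

func main() {
	path := "/etc/kmagent/config.yaml"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	f, err := os.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fmt.Println(redact(sc.Text()))
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Usage: `go run redact.go /etc/kmagent/config.yaml > config-redacted.yaml`, then attach the redacted file to your report.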

Next Steps

Monitoring

Set up monitoring and health checks

Upgrading

Upgrade to the latest version
