This guide helps you diagnose and resolve common issues with the KloudMate Agent across all supported platforms.
Quick Diagnostics
Check Status: verify that the agent is running.
Review Logs: examine recent log entries.
Test Connectivity: verify network access to the KloudMate endpoints.
Installation Issues
Linux: Package Installation Fails
If dpkg or rpm reports missing dependencies, let the package manager resolve them:
# Debian/Ubuntu
sudo apt-get install -f -y
# RHEL/CentOS
sudo dnf install -y kmagent
# or
sudo yum install -y kmagent
The installation script does this automatically when dpkg fails:
if ! sudo KM_API_KEY="$KM_API_KEY" KM_COLLECTOR_ENDPOINT="$KM_COLLECTOR_ENDPOINT" dpkg -i "$TMP_PACKAGE"; then
    echo "⚠️ Fixing dependencies with apt-get..."
    sudo apt-get install -f -y
fi
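Choosing between the fixes above can be scripted. This is a sketch, not part of the KloudMate installer: dep_fix_command is a hypothetical helper that reads the ID and ID_LIKE fields of /etc/os-release and prints whichever of the commands above applies.

```bash
# Hypothetical helper (not part of the KloudMate installer): print the
# dependency-repair command from this guide that matches an os-release file.
dep_fix_command() {
    local os_release="${1:-/etc/os-release}" ids
    # Source the file in a subshell so ID/ID_LIKE do not leak into the caller.
    ids=$(. "$os_release" && printf '%s %s' "${ID:-}" "${ID_LIKE:-}") || return 1
    case " $ids " in
        *" debian "*|*" ubuntu "*)
            echo "sudo apt-get install -f -y" ;;
        *" rhel "*|*" centos "*|*" fedora "*)
            # Prefer dnf, falling back to yum on older releases.
            if command -v dnf >/dev/null 2>&1; then
                echo "sudo dnf install -y kmagent"
            else
                echo "sudo yum install -y kmagent"
            fi ;;
        *)
            echo "unsupported distribution: $ids" >&2
            return 1 ;;
    esac
}
# Usage: dep_fix_command   # prints the command to run on this host
```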
API Key or Endpoint Not Set
The installer requires both KM_API_KEY and KM_COLLECTOR_ENDPOINT and exits if either is empty:
if [ -z "$KM_API_KEY" ] || [ -z "$KM_COLLECTOR_ENDPOINT" ]; then
    echo "❌ KM_API_KEY and KM_COLLECTOR_ENDPOINT must be set as environment variables"
    exit 1
fi
Solution: export both variables, then run the installer:
export KM_API_KEY="your-api-key"
export KM_COLLECTOR_ENDPOINT="https://otel.kloudmate.com:4318"
bash -c "$(curl -L https://cdn.kloudmate.com/scripts/install_linux.sh)"
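A quick preflight check can catch the most common mistakes before the installer runs. km_preflight is a hypothetical helper, not part of the install script; it relies on the fact that child processes such as bash -c only see exported variables.

```bash
# Hypothetical preflight for the Linux installer: both variables must be
# exported (not just set) and the endpoint should be an http(s) URL.
km_preflight() {
    # A child shell only sees exported variables, so this also catches
    # "set but not exported", which the installer would not see either.
    if ! sh -c '[ -n "$KM_API_KEY" ] && [ -n "$KM_COLLECTOR_ENDPOINT" ]'; then
        echo "KM_API_KEY and KM_COLLECTOR_ENDPOINT must both be exported" >&2
        return 1
    fi
    case "$KM_COLLECTOR_ENDPOINT" in
        http://*|https://*) ;;
        *)
            echo "KM_COLLECTOR_ENDPOINT must start with http:// or https://" >&2
            return 1 ;;
    esac
    # Stray whitespace usually comes from copy-paste.
    case "$KM_API_KEY$KM_COLLECTOR_ENDPOINT" in
        *[[:space:]]*)
            echo "values must not contain whitespace" >&2
            return 1 ;;
    esac
}
# Usage: km_preflight && bash -c "$(curl -L https://cdn.kloudmate.com/scripts/install_linux.sh)"
```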
Unable to Fetch Latest Release
If the GitHub API is unreachable, the installer fails with:
❌ Error: Could not fetch latest release information from GitHub.
Causes:
Network connectivity issues
GitHub API rate limiting
Firewall blocking GitHub access
Solution:
# Test GitHub connectivity
curl -s https://api.github.com/repos/kloudmate/km-agent/releases/latest
# If rate limited, wait and retry
# Or download the package manually from the releases page (replace 1.0.0 with the version you need)
wget https://github.com/kloudmate/km-agent/releases/download/v1.0.0/kmagent_1.0.0_amd64.deb
sudo dpkg -i kmagent_1.0.0_amd64.deb
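When the API does respond, the download URL for the latest release can be derived from the same response instead of typing the version by hand. This sketch assumes the .deb asset follows the kmagent_<version>_<arch>.deb pattern from the example above; deb_url_from_release_json is a hypothetical name, and it extracts the tag with sed rather than a JSON parser, so treat it as a convenience only.

```bash
# Hypothetical helper: read GitHub "latest release" JSON on stdin and print a
# .deb download URL. The asset name kmagent_<version>_<arch>.deb is assumed
# from the manual-download example, not read from the release metadata.
deb_url_from_release_json() {
    local arch="${1:-amd64}" tag
    # Pull the first "tag_name": "..." value without requiring jq.
    tag=$(sed -n 's/.*"tag_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
    if [ -z "$tag" ]; then
        echo "no tag_name in response (rate limited or unreachable?)" >&2
        return 1
    fi
    echo "https://github.com/kloudmate/km-agent/releases/download/${tag}/kmagent_${tag#v}_${arch}.deb"
}
# Usage:
#   curl -s https://api.github.com/repos/kloudmate/km-agent/releases/latest |
#       deb_url_from_release_json "$(dpkg --print-architecture)"
```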
Docker: Container Fails to Start
The Docker install script (scripts/install_docker.sh) exits early if Docker is not installed:
if ! command -v docker &> /dev/null || ! docker --version &> /dev/null; then
    echo -e "\nDocker is not installed on the system"
    echo -e "\nPlease install docker first: https://docs.docker.com/engine/install/\n"
    exit 1
fi
Solution: install Docker first.
# Ubuntu/Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Verify installation
docker --version
If you see “permission denied while trying to connect to Docker daemon”:
Solution:
# Add user to docker group
sudo usermod -aG docker $USER
# Log out and back in, or run:
newgrp docker
# Verify
docker ps
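The agent runs with host networking, so its ports can conflict with services already running on the host. Check for conflicts:
# Check if port 13133 is in use
sudo netstat -tlnp | grep 13133
# On systems without netstat, use ss instead
sudo ss -tlnp | grep 13133
# Stop and remove an existing km-agent container
docker stop km-agent
docker rm km-agent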
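If neither netstat nor ss is installed, a connection attempt is another way to see whether something already listens on the health-check port. port_in_use is a hypothetical helper; it only detects listeners reachable on the address you pass (127.0.0.1 by default), so a service bound to a single external interface may not show up.

```bash
# Hypothetical helper: succeed if something accepts TCP connections on
# HOST:PORT (default 127.0.0.1:13133). It uses bash's /dev/tcp, so it works
# where neither netstat nor ss is installed.
port_in_use() {
    local host="${1:-127.0.0.1}" port="${2:-13133}"
    # Give up after 2 seconds; connection errors are silenced.
    timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
# Usage:
#   if port_in_use 127.0.0.1 13133; then echo "port 13133 is already taken"; fi
```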
Kubernetes: Pods Not Starting
If pods are stuck in ImagePullBackOff:
kubectl describe pod -n km-agent <pod-name>
Common causes:
Network issues pulling from ghcr.io
Image tag doesn’t exist
Authentication required
Solution:
# Verify image exists
docker pull ghcr.io/kloudmate/km-kube-agent:latest
# Check pod events
kubectl get events -n km-agent --sort-by='.lastTimestamp'
# Restart the DaemonSet pods so the image pull is retried
kubectl rollout restart daemonset/km-agent -n km-agent
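To spot affected pods without reading the full table, the output of kubectl get pods --no-headers can be filtered. unhealthy_pods is a hypothetical helper that assumes the default column order (NAME, READY, STATUS, RESTARTS, AGE).

```bash
# Hypothetical filter: read `kubectl get pods --no-headers` output on stdin and
# print NAME, STATUS and READY for pods that are not healthy.
unhealthy_pods() {
    awk '{
        split($2, r, "/")   # READY is "ready/total"
        # Anything other than Running/Completed, or Running but not fully ready.
        if (($3 != "Running" && $3 != "Completed") || ($3 == "Running" && r[1] != r[2]))
            print $1, $3, $2
    }'
}
# Usage: kubectl get pods -n km-agent --no-headers | unhealthy_pods
```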
If pods continuously restart (CrashLoopBackOff), check the logs of the previous container:
kubectl logs -n km-agent <pod-name> --previous
Common issues:
Missing API_KEY configuration
Invalid collector endpoint
Resource limits too low
Solution:
# Verify Helm values
helm get values kloudmate-release -n km-agent
# Check for missing values
kubectl get configmap -n km-agent km-agent-configmap-daemonset -o yaml
# Update configuration
helm upgrade kloudmate-release kloudmate/km-kube-agent -n km-agent \
--set API_KEY="your-api-key" \
--set COLLECTOR_ENDPOINT="https://otel.kloudmate.com:4318"
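To fetch the previous logs of every pod that keeps restarting, the RESTARTS column can drive a loop. restarting_pods is a hypothetical filter; it assumes the default kubectl get pods column layout, and its default threshold of 3 restarts is arbitrary.

```bash
# Hypothetical filter: read `kubectl get pods --no-headers` output on stdin and
# print NAME and RESTARTS for pods restarted more than THRESHOLD times.
restarting_pods() {
    local threshold="${1:-3}"
    # RESTARTS is the 4th column; newer kubectl versions append "(30s ago)"
    # as extra fields after it, which does not shift the column.
    awk -v max="$threshold" '($4 + 0) > (max + 0) { print $1, $4 }'
}
# Usage: dump the previous container's logs for each restarting pod.
#   kubectl get pods -n km-agent --no-headers | restarting_pods 3 |
#       while read -r pod _; do
#           kubectl logs -n km-agent "$pod" --previous --tail 50
#       done
```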
If you see errors about the Instrumentation CRD not being found:
Solution:
# Install CRD first
kubectl apply -f https://raw.githubusercontent.com/kloudmate/km-agent/refs/heads/develop/deployment/helm/km-kube-agent/crds/crd-otel-instrumentation.yaml
# Then install the agent
helm install kloudmate-release kloudmate/km-kube-agent --namespace km-agent --create-namespace
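In automation, the CRD can be applied only when it is missing, and the script can wait until the API server has registered it before installing the chart. The CRD name instrumentations.opentelemetry.io is an assumption (it is the name of the OpenTelemetry Operator's Instrumentation resource); confirm it with kubectl get crd before relying on this sketch. ensure_instrumentation_crd is a hypothetical helper.

```bash
# Hypothetical helper: apply the Instrumentation CRD if it is missing and wait
# until it is established. The CRD name below is an assumption.
CRD_URL="https://raw.githubusercontent.com/kloudmate/km-agent/refs/heads/develop/deployment/helm/km-kube-agent/crds/crd-otel-instrumentation.yaml"

ensure_instrumentation_crd() {
    if kubectl get crd instrumentations.opentelemetry.io >/dev/null 2>&1; then
        echo "Instrumentation CRD already installed"
        return 0
    fi
    kubectl apply -f "$CRD_URL" &&
        kubectl wait --for condition=established --timeout=60s \
            crd/instrumentations.opentelemetry.io
}
# Usage: ensure_instrumentation_crd && helm install kloudmate-release ...
```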
GKE Private Cluster Webhook Issues
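For private GKE clusters, webhook calls from the control plane can fail because GKE's default firewall rules only let the control plane reach nodes on ports 443 and 10250. Allow the webhook port:
# Add firewall rule for port 9443
gcloud compute firewall-rules create allow-webhook \
--allow tcp:9443 \
--source-ranges <master-cidr> \
--target-tags <node-tag>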
See GKE documentation for details.
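The control-plane CIDR can be looked up instead of copied by hand. This sketch assumes a private cluster (so privateClusterConfig.masterIpv4CidrBlock is populated) on the default VPC network unless a fourth argument names another one; allow_webhook_port is a hypothetical helper.

```bash
# Hypothetical helper: look up the private control-plane CIDR of a GKE cluster
# and allow TCP 9443 from it to nodes carrying NODE_TAG.
allow_webhook_port() {
    local cluster="$1" location="$2" node_tag="$3" network="${4:-default}" master_cidr
    master_cidr=$(gcloud container clusters describe "$cluster" \
        --location "$location" \
        --format='value(privateClusterConfig.masterIpv4CidrBlock)') || return 1
    if [ -z "$master_cidr" ]; then
        echo "no private control-plane CIDR found for $cluster" >&2
        return 1
    fi
    gcloud compute firewall-rules create allow-webhook \
        --network "$network" \
        --direction INGRESS \
        --allow tcp:9443 \
        --source-ranges "$master_cidr" \
        --target-tags "$node_tag"
}
# Usage: allow_webhook_port my-cluster us-central1 my-node-tag
```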
Pods Not Scheduled on Tainted Nodes
If DaemonSet pods aren’t running on all nodes, check the node taints:
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Then add matching tolerations:
helm install kloudmate-release kloudmate/km-kube-agent --namespace km-agent --create-namespace \
--set tolerations[0].key="env" \
--set tolerations[0].operator="Equal" \
--set tolerations[0].value="production" \
--set tolerations[0].effect="NoSchedule"
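With several taints, the flags above can be generated from taints written as key=value:Effect, the form kubectl taint uses. toleration_flags is a hypothetical helper; it always uses operator Equal and emits --set=... so that each flag and its value form a single argument.

```bash
# Hypothetical helper: turn taints written as key=value:Effect into the Helm
# toleration flags shown above, one argument per line.
toleration_flags() {
    local i=0 taint key rest value effect
    for taint in "$@"; do
        case "$taint" in
            *=*:*) ;;
            *) echo "expected key=value:Effect, got: $taint" >&2; return 1 ;;
        esac
        key="${taint%%=*}"
        rest="${taint#*=}"
        value="${rest%%:*}"
        effect="${rest#*:}"
        # --set=... keeps flag and value together in one argument.
        printf '%s\n' "--set=tolerations[$i].key=$key" \
            "--set=tolerations[$i].operator=Equal" \
            "--set=tolerations[$i].value=$value" \
            "--set=tolerations[$i].effect=$effect"
        i=$((i + 1))
    done
}
# Usage (bash):
#   mapfile -t flags < <(toleration_flags env=production:NoSchedule)
#   helm install kloudmate-release kloudmate/km-kube-agent --namespace km-agent \
#       --create-namespace "${flags[@]}"
```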
Runtime Issues
Agent Not Starting
Check Service Status
# Linux
sudo systemctl status kmagent
sudo journalctl -u kmagent -n 50
# Docker
docker ps -a -f name=km-agent
docker logs km-agent --tail 50
# Kubernetes
kubectl get pods -n km-agent
kubectl logs -n km-agent <pod-name> --tail 50
Verify Configuration
Check that the required configuration is present:
# Linux
cat /etc/kmagent/config.yaml
# Docker
docker inspect km-agent | grep -A 10 Env
# Kubernetes
kubectl describe configmap -n km-agent km-agent-configmap-daemonset
Check File Permissions
Linux only:
# Binary should be executable
ls -la /usr/bin/kmagent
# Config should be readable
ls -la /etc/kmagent/config.yaml
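Both permission checks can be combined into one pass with a clear verdict. check_kmagent_files is a hypothetical helper; it tests access for the user running it, so run it as the account the service uses for a meaningful result.

```bash
# Hypothetical helper: report whether the agent binary is executable and the
# config file is readable; returns non-zero if either check fails.
check_kmagent_files() {
    local bin="${1:-/usr/bin/kmagent}" cfg="${2:-/etc/kmagent/config.yaml}" rc=0
    if [ -x "$bin" ]; then
        echo "ok: $bin is executable"
    else
        echo "FAIL: $bin is missing or not executable"; rc=1
    fi
    if [ -r "$cfg" ]; then
        echo "ok: $cfg is readable"
    else
        echo "FAIL: $cfg is missing or not readable"; rc=1
    fi
    return "$rc"
}
# Usage: check_kmagent_files
```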
Collector Continuously Restarting
The agent manages the collector lifecycle and will restart it if configuration changes or errors occur.
runErr := collector.Run(ctx)
if runErr != nil {
	a.collectorError = runErr.Error()
	a.logger.Errorw("collector run loop exited with error", "error", runErr)
} else {
	a.collectorError = ""
	a.logger.Info("collector run loop exited normally")
}
Diagnose:
sudo journalctl -u kmagent | grep "collector run loop exited with error"
Common causes:
Invalid collector configuration
Port conflicts
Permission issues accessing resources
Invalid receiver configurations
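To see whether the collector fails repeatedly rather than once, count the error message over a time window. collector_failures_since is a hypothetical helper; it assumes the agent logs to the journal under the kmagent unit, as in the command above.

```bash
# Hypothetical helper: count collector run-loop failures logged by the
# kmagent unit since a given time (any value journalctl --since accepts).
collector_failures_since() {
    local since="${1:-1 hour ago}"
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
    # function's exit status at 0 in that case.
    journalctl -u kmagent --since "$since" --no-pager 2>/dev/null |
        grep -c "collector run loop exited with error" || true
}
# Usage (from a root shell, or as a member of the systemd-journal group):
#   collector_failures_since "2 hours ago"
```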
Configuration Updates Not Applied
The agent checks for configuration updates periodically:
if a.cfg.ConfigCheckInterval <= 0 {
	a.logger.Debug("config check interval not set, skipping update checks")
	return
}
If ConfigCheckInterval is 0 or negative, the agent logs this debug message and returns without scheduling any checks, so configuration updates are disabled.
Verify update settings:
# Check environment or config file
grep -i config_check /etc/kmagent/config.yaml
Monitor update checks:
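# Look for these log messages (on RHEL-based systems the file is /var/log/messages; with journald, pipe sudo journalctl -u kmagent into grep instead)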
grep "config update checker started" /var/log/syslog
grep "checking for configuration updates" /var/log/syslog
grep "configuration changed, restarting collector" /var/log/syslog
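The messages above can be tallied in one pass to see how far the update cycle gets. config_update_summary is a hypothetical helper reading log lines on stdin; the "config check interval not set" message is logged at debug level, so a zero count for it does not prove that updates are enabled.

```bash
# Hypothetical helper: read agent log lines on stdin and count each stage of
# the config-update cycle, using the log messages quoted in this guide.
config_update_summary() {
    awk '
        /config update checker started/               { started++ }
        /checking for configuration updates/          { checks++ }
        /configuration changed, restarting collector/ { changes++ }
        /config check interval not set/               { disabled++ }
        END {
            printf "started=%d checks=%d changes=%d disabled=%d\n", started, checks, changes, disabled
        }'
}
# Usage: sudo journalctl -u kmagent --since today | config_update_summary
```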
Connectivity Issues
Cannot Reach Collector Endpoint
Test Network Connectivity
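# Test HTTPS connectivity. Any HTTP status (even 404) means the endpoint is reachable;
# a timeout or "connection refused" points to a network or firewall problem.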
curl -v https://otel.kloudmate.com:4318
# Test with API key
curl -v -H "Authorization: your-api-key" https://otel.kloudmate.com:4318
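When curl hangs, a plain TCP connection test helps separate network problems from HTTP-level ones. endpoint_host_port and check_endpoint_tcp are hypothetical helpers; the TCP check connects directly and ignores HTTPS_PROXY, so behind a proxy a failure here is expected.

```bash
# Hypothetical helper: split an endpoint URL into "host port", deriving the
# port from the scheme (443 for https, 80 for http) when none is given.
endpoint_host_port() {
    local url="$1" scheme rest hostport host port
    scheme="${url%%://*}"
    rest="${url#*://}"
    hostport="${rest%%/*}"
    host="${hostport%%:*}"
    port="${hostport##*:}"
    if [ "$port" = "$hostport" ]; then
        case "$scheme" in
            https) port=443 ;;
            http) port=80 ;;
            *) echo "cannot infer port for: $url" >&2; return 1 ;;
        esac
    fi
    echo "$host $port"
}

# Open a TCP connection to the endpoint with a 5-second limit (no proxy).
check_endpoint_tcp() {
    local hp
    hp=$(endpoint_host_port "$1") || return 1
    if timeout 5 bash -c "exec 3<>/dev/tcp/${hp% *}/${hp#* }" 2>/dev/null; then
        echo "TCP connection to ${hp% *}:${hp#* } succeeded"
    else
        echo "TCP connection to ${hp% *}:${hp#* } failed" >&2
        return 1
    fi
}
# Usage: check_endpoint_tcp "$KM_COLLECTOR_ENDPOINT"
```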
Check Firewall Rules
Ensure outbound HTTPS (443) and port 4318 are allowed:
# Check iptables (Linux)
sudo iptables -L OUTPUT -v -n
# Check firewalld (RHEL/CentOS)
sudo firewall-cmd --list-all
Verify DNS Resolution
nslookup otel.kloudmate.com
dig otel.kloudmate.com
Check Proxy Settings
If you are behind a proxy:
export HTTPS_PROXY=http://proxy.example.com:8080
export NO_PROXY=localhost,127.0.0.1
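Variables exported in a shell only reach commands started from that shell, not the systemd-managed service. One way to give the service the same settings is a systemd drop-in file. This sketch assumes the unit is named kmagent.service (as in the systemctl commands in this guide) and that the agent honors the standard proxy variables; write_proxy_dropin is a hypothetical helper.

```bash
# Hypothetical helper: write a systemd drop-in that sets the proxy variables
# for the kmagent service. DIR defaults to the unit's drop-in directory.
write_proxy_dropin() {
    local proxy="$1" no_proxy="${2:-localhost,127.0.0.1}"
    local dir="${3:-/etc/systemd/system/kmagent.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/proxy.conf" <<EOF
[Service]
Environment="HTTPS_PROXY=$proxy"
Environment="NO_PROXY=$no_proxy"
EOF
}
# Usage (from a root shell), then reload and restart the service:
#   write_proxy_dropin http://proxy.example.com:8080
#   systemctl daemon-reload && systemctl restart kmagent
```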
Configuration Update API Unreachable
The agent reports status to the configuration update endpoint (internal/updater/updater.go):
req, err := http.NewRequestWithContext(reqCtx, "POST", u.cfg.ConfigUpdateURL, bytes.NewBuffer(jsonData))
Test connectivity:
curl -X POST https://api.kloudmate.com/agents/config-check \
-H "Authorization: your-api-key" \
-H "Content-Type: application/json" \
-d '{"hostname":"test","platform":"linux"}'
Look for these errors in the agent logs (a 401 status usually means the API key is missing or invalid):
ERROR failed to fetch config updates after retries
ERROR config update API returned non-OK status: 401
ERROR Periodic config check failed
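The status code of the test request above maps fairly directly onto these errors. explain_status and check_config_api are hypothetical helpers; the mapping reflects general HTTP semantics, not documented KloudMate error codes, and check_config_api assumes KM_API_KEY is exported.

```bash
# Hypothetical helper: map an HTTP status code (as printed by
# curl -w '%{http_code}') to a likely cause. curl prints 000 when no
# HTTP response was received at all.
explain_status() {
    case "$1" in
        000) echo "no response: check DNS, firewall, and proxy settings" ;;
        2??) echo "ok" ;;
        401|403) echo "rejected: check that the API key is correct" ;;
        404) echo "not found: check the endpoint URL" ;;
        5??) echo "server error: retry later" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# Send the same test request as above and print a one-line diagnosis.
check_config_api() {
    local code
    code=$(curl -s -o /dev/null -w '%{http_code}' -X POST \
        https://api.kloudmate.com/agents/config-check \
        -H "Authorization: $KM_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"hostname":"test","platform":"linux"}') || true
    echo "HTTP $code: $(explain_status "$code")"
}
```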
High CPU Usage
Reduce the collection scope (fewer receivers or scrapers, or a longer collection_interval) or send larger batches less often:
processors:
  batch:
    send_batch_size: 10000
    timeout: 10s
Increase Resource Limits (Kubernetes)
helm upgrade kloudmate-release kloudmate/km-kube-agent -n km-agent \
--set resources.limits.cpu=1000m \
--set resources.limits.memory=1Gi
High Memory Usage
Check current usage:
docker stats km-agent --no-stream
Possible causes:
Large batch sizes
Too many receivers enabled
Memory leaks (upgrade to latest version)
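A single docker stats sample can be turned into a pass/fail check, for example in a cron job. check_agent_usage is a hypothetical helper and its 80% default is arbitrary; CPU percentages above 100% are normal on multi-core hosts. On Kubernetes, kubectl top pods -n km-agent gives similar numbers if metrics-server is installed.

```bash
# Hypothetical helper: take one docker stats sample for km-agent and fail
# when CPU or memory percentage exceeds LIMIT (default 80).
check_agent_usage() {
    local limit="${1:-80}"
    docker stats km-agent --no-stream --format '{{.CPUPerc}} {{.MemPerc}}' |
        awk -v limit="$limit" '{
            cpu = $1; mem = $2
            sub(/%$/, "", cpu); sub(/%$/, "", mem)   # strip trailing "%"
            printf "cpu=%s%% mem=%s%%\n", cpu, mem
            if (cpu + 0 > limit + 0 || mem + 0 > limit + 0) {
                print "WARN: above " limit "%"
                exit 1
            }
        }'
}
# Usage: check_agent_usage 80 || echo "km-agent is using too many resources"
```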
Data Collection Issues
No Metrics/Logs Appearing
Verify Receivers Are Enabled
Check the collector configuration:
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
Check Pipeline Configuration
Ensure the receivers are referenced in the service pipelines:
# Linux
grep -A 20 "service:" /etc/kmagent/config.yaml
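For a scripted check that a receiver appears in a pipeline, a pattern match on flow-style receivers: [...] lists is often enough. receiver_in_pipeline is a hypothetical helper and only a heuristic: it does not parse YAML and will miss block-style lists.

```bash
# Hypothetical heuristic (not a YAML parser): succeed if RECEIVER appears as a
# whole entry in some flow-style "receivers: [ ... ]" list in the config.
receiver_in_pipeline() {
    local receiver="$1" cfg="${2:-/etc/kmagent/config.yaml}"
    # Entry must follow "[" or a space/comma and be followed by "," or "]".
    grep -Eq "receivers:[[:space:]]*\[([^]]*[[:space:],])?${receiver}[[:space:]]*[],]" "$cfg"
}
# Usage: receiver_in_pipeline hostmetrics || echo "hostmetrics is not in any pipeline"
```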
Verify Exporter Configuration
exporters:
  otlphttp:
    endpoint: ${env:KM_COLLECTOR_ENDPOINT}
    headers:
      Authorization: ${env:KM_API_KEY}
Test Data Export
Check the collector logs for export errors:
grep -i "export" /var/log/kmagent.log
grep -i "error" /var/log/kmagent.log
Kubernetes Metrics Missing
For Kubernetes deployments, ensure both DaemonSet and Deployment are running:
kubectl get pods -n km-agent -o wide
The DaemonSet collects node-level metrics, while the Deployment collects cluster-level metrics.
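Both workloads can be checked in one step. km_workloads_ready is a hypothetical helper that assumes the default kubectl column layout and that the namespace contains only the agent's DaemonSet and Deployment.

```bash
# Hypothetical check: succeed only if the namespace has a DaemonSet and a
# Deployment and every one of them has all desired pods ready.
km_workloads_ready() {
    local ns="${1:-km-agent}" ds deploy
    ds=$(kubectl get daemonsets -n "$ns" --no-headers 2>/dev/null)
    deploy=$(kubectl get deployments -n "$ns" --no-headers 2>/dev/null)
    if [ -z "$ds" ]; then echo "no DaemonSet found in $ns"; return 1; fi
    if [ -z "$deploy" ]; then echo "no Deployment found in $ns"; return 1; fi
    # DaemonSet columns: NAME DESIRED CURRENT READY ...
    printf '%s\n' "$ds" | awk '$2 != $4 {
        print "DaemonSet " $1 ": " $4 "/" $2 " ready"; bad = 1
    } END { exit bad }' || return 1
    # Deployment columns: NAME READY(ready/desired) ...
    printf '%s\n' "$deploy" | awk '{ split($2, r, "/") } r[1] != r[2] {
        print "Deployment " $1 ": " $2 " ready"; bad = 1
    } END { exit bad }' || return 1
    echo "all workloads in $ns ready"
}
```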
Uninstallation Issues
Files Remain After Uninstall
The uninstall script (scripts/uninstall_linux.sh) removes residual files:
sudo rm -f /usr/local/bin/kmagent
sudo rm -f /usr/bin/kmagent
sudo rm -rf /etc/kmagent/
sudo rm -rf /var/log/kmagent/
sudo rm -rf /var/lib/kmagent/
Verify cleanup:
sudo find / -name '*kmagent*' 2> /dev/null
systemctl list-units | grep kmagent
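A narrower check looks only at the paths the uninstall script removes. kmagent_leftovers is a hypothetical helper; the optional root prefix exists so the check can be tried against a scratch directory.

```bash
# Hypothetical helper: print any of the paths removed by the uninstall script
# that still exist; returns 1 if anything is left over.
kmagent_leftovers() {
    local root="${1:-}" p status=0
    for p in /usr/local/bin/kmagent /usr/bin/kmagent /etc/kmagent \
             /var/log/kmagent /var/lib/kmagent; do
        if [ -e "$root$p" ]; then
            echo "$root$p"
            status=1
        fi
    done
    return "$status"
}
# Usage: kmagent_leftovers && echo "clean"
```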
Docker Container Won’t Remove
The Docker install script (scripts/install_docker.sh) removes an existing km-agent container before starting a new one:
if [ "$(docker ps -aq -f name=km-agent)" ]; then
    docker stop km-agent
    docker rm km-agent
fi
Force removal:
docker rm -f km-agent
docker rmi ghcr.io/kloudmate/km-agent:latest
Getting Help
GitHub Issues: report bugs and request features.
Community Slack: get help from the community.
Documentation: comprehensive guides and references.
When reporting issues, include:
Environment Details
Platform (Linux, Docker, Kubernetes, Windows)
OS version and distribution
Agent version
Collector version
Logs
# Linux
sudo journalctl -u kmagent -n 100 > agent-logs.txt
# Docker
docker logs km-agent > agent-logs.txt 2>&1
# Kubernetes
kubectl logs -n km-agent <pod-name> > agent-logs.txt
Configuration
Redact sensitive information (API keys) before sharing:
sed 's/KM_API_KEY:.*/KM_API_KEY: [REDACTED]/' /etc/kmagent/config.yaml
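The sed command above only masks text following KM_API_KEY:, while a key pasted directly into the exporter configuration would sit under an Authorization header. redact_config is a hypothetical, broader filter; it leaves ${env:...} references alone, since they contain no secret, and masks the values of a few common key fields.

```bash
# Hypothetical redaction filter: mask values of Authorization headers and
# common API-key fields, but keep ${env:...} references unchanged.
redact_config() {
    sed -E \
        -e '/\$\{env:[A-Za-z_]+\}/b' \
        -e 's/^([[:space:]]*(Authorization|KM_API_KEY|API_KEY|api_key)[[:space:]]*:).*/\1 [REDACTED]/'
}
# Usage: sudo cat /etc/kmagent/config.yaml | redact_config > config-redacted.yaml
```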
Steps to Reproduce
Provide clear, detailed steps to reproduce the issue
Next Steps
Monitoring: set up monitoring and health checks.
Upgrading: upgrade to the latest version.