Clanker provides comprehensive monitoring capabilities to track cluster health and resource utilization.

Overview

Monitoring features include:
  • Node metrics: CPU and memory usage per node
  • Pod metrics: Resource consumption by pods
  • Cluster statistics: Aggregate cluster-wide metrics
  • Container metrics: Per-container resource usage
  • Logs access: View pod and container logs
Metrics require the Kubernetes Metrics Server to be installed in your cluster.

Node metrics

View resource usage for cluster nodes:
# Get metrics for all nodes
clanker k8s stats nodes

# Sort by CPU usage
clanker k8s stats nodes --sort-by cpu

# Sort by memory usage
clanker k8s stats nodes --sort-by memory

# JSON output
clanker k8s stats nodes -o json
Example output:
NAME                                       CPU      CPU%      MEMORY    MEM%
ip-10-0-1-100.us-west-2.compute.internal   245m     12.3%     1456Mi    18.2%
ip-10-0-1-101.us-west-2.compute.internal   189m     9.5%      1123Mi    14.0%
ip-10-0-1-102.us-west-2.compute.internal   312m     15.6%     1789Mi    22.4%
Stats options

--sort-by (string): Sort results by cpu or memory
-o, --output (string, default: "table"): Output format: table, json, or yaml

Pod metrics

Monitor resource consumption at the pod level:
# Get metrics for pods in default namespace
clanker k8s stats pods

# Get metrics for specific namespace
clanker k8s stats pods -n kube-system

# Get metrics for all namespaces
clanker k8s stats pods -A

# Sort by memory usage
clanker k8s stats pods --sort-by memory
Example output:
NAME                                CPU        MEMORY
nginx-7c6c8f9f5d-4xkzp              5m         32Mi
api-server-5d7b9c8d4f-8hjkl         42m        128Mi
redis-master-0                      8m         64Mi
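
Sorting with --sort-by requires comparing Kubernetes resource quantities numerically rather than lexically (as strings, "5m" sorts after "42m"). A minimal sketch of such a converter — an illustration, not clanker's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuMillicores converts a CPU quantity as printed by kubectl top
// ("42m", or a bare core count like "1") into millicores.
func cpuMillicores(q string) (int64, error) {
	if strings.HasSuffix(q, "m") {
		return strconv.ParseInt(strings.TrimSuffix(q, "m"), 10, 64)
	}
	cores, err := strconv.ParseInt(q, 10, 64)
	return cores * 1000, err
}

// memoryMi converts a memory quantity ("128Mi", "2Gi") into MiB.
func memoryMi(q string) (int64, error) {
	switch {
	case strings.HasSuffix(q, "Gi"):
		n, err := strconv.ParseInt(strings.TrimSuffix(q, "Gi"), 10, 64)
		return n * 1024, err
	case strings.HasSuffix(q, "Mi"):
		return strconv.ParseInt(strings.TrimSuffix(q, "Mi"), 10, 64)
	}
	return 0, fmt.Errorf("unsupported unit in %q", q)
}

func main() {
	cpu, _ := cpuMillicores("42m")
	mem, _ := memoryMi("2Gi")
	fmt.Println(cpu, mem) // 42 2048
}
```

Once every row is normalized to millicores or MiB, an ordinary numeric sort produces the ordering shown above.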

Specific pod metrics

View detailed metrics for a single pod:
# Get pod metrics
clanker k8s stats pod nginx-7c6c8f9f5d-4xkzp

# Include container-level metrics
clanker k8s stats pod nginx-7c6c8f9f5d-4xkzp --containers

# Different namespace
clanker k8s stats pod coredns-5d78c9869d-abc12 -n kube-system
Example with containers:
Pod: default/nginx-7c6c8f9f5d-4xkzp
  CPU: 5m
  Memory: 32Mi

Containers:
  nginx: CPU 5m, Memory 32Mi

Cluster-wide metrics

Get aggregated statistics for the entire cluster:
# View cluster totals
clanker k8s stats cluster

# JSON output for automation
clanker k8s stats cluster -o json
Example output:
Cluster Metrics:
  Nodes: 3 (Ready: 3)
  CPU: 746m / 6000m (12.4%)
  Memory: 4368Mi / 24000Mi (18.2%)

Node Details:
NAME                            CPU      CPU%    MEMORY   MEM%
ip-10-0-1-100                   245m     12.3%   1456Mi   18.2%
ip-10-0-1-101                   189m     9.5%    1123Mi   14.0%
ip-10-0-1-102                   312m     15.6%   1789Mi   22.4%
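
The cluster-wide percentages are simply total usage divided by total capacity across nodes. A sketch of that arithmetic, assuming each node in the example has 2000m of allocatable CPU (the per-node capacity is not shown in the output above):

```go
package main

import "fmt"

// clusterPercent aggregates per-node usage and capacity (both in the
// same unit, e.g. millicores) into a utilization percentage.
func clusterPercent(used, capacity []int64) float64 {
	var u, c int64
	for _, v := range used {
		u += v
	}
	for _, v := range capacity {
		c += v
	}
	if c == 0 {
		return 0
	}
	return 100 * float64(u) / float64(c)
}

func main() {
	// The three nodes from the example output, 2000m capacity each.
	used := []int64{245, 189, 312}
	capacities := []int64{2000, 2000, 2000}
	fmt.Printf("CPU: %.1f%%\n", clusterPercent(used, capacities)) // CPU: 12.4%
}
```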

Viewing logs

Access pod and container logs:
# Get recent logs (default: 100 lines)
clanker k8s logs nginx-7c6c8f9f5d-4xkzp

# Specify number of lines
clanker k8s logs api-server-abc123 --tail 50

# Follow logs (stream)
clanker k8s logs api-server-abc123 -f

# Logs from specific container
clanker k8s logs multi-container-pod -c sidecar

# Previous terminated container
clanker k8s logs crashing-pod -p

# Logs since duration
clanker k8s logs api-server-abc123 --since 1h

# Include timestamps
clanker k8s logs api-server-abc123 --timestamps

# All containers in pod
clanker k8s logs multi-container-pod --all-containers

Log options

-c, --container (string): Container name (for multi-container pods)
-f, --follow (boolean): Stream logs in real-time
-p, --previous (boolean): Show logs from previous container instance
--tail (integer, default: 100): Number of lines to show from end of logs
--since (string): Show logs since duration (e.g., 1h, 30m, 10s)
--timestamps (boolean): Include timestamps in output
--all-containers (boolean): Show logs from all containers in the pod

Cluster resources

Retrieve comprehensive resource information:
# Get all resources from specific cluster
clanker k8s resources --cluster my-cluster

# YAML output
clanker k8s resources --cluster my-cluster -o yaml

# Get resources from all EKS clusters in region
clanker k8s resources
This fetches:
  • Nodes
  • Pods
  • Services
  • Persistent Volumes
  • ConfigMaps

Metrics implementation

Metrics are collected via the Kubernetes Metrics Server:
cmd/k8s.go:1552
func runStatsNodes(cmd *cobra.Command, args []string) error {
	ctx := context.Background()

	// Run kubectl top nodes
	kubectlArgs := []string{"top", "nodes"}
	kubectlCmd := exec.CommandContext(ctx, "kubectl", kubectlArgs...)
	output, err := kubectlCmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("failed to get node metrics: %w\n%s", err, string(output))
	}

	if k8sOutputFormat == "json" || k8sOutputFormat == "yaml" {
		metrics := parseNodeMetricsOutput(string(output))
		formatted, _ := json.MarshalIndent(metrics, "", "  ")
		fmt.Println(string(formatted))
	} else {
		fmt.Print(string(output))
	}

	return nil
}
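parseNodeMetricsOutput is called above but not shown. A plausible sketch of what it does — this is an assumption for illustration, not clanker's actual source — is to split the kubectl table on whitespace, skipping the header row:

```go
package main

import (
	"fmt"
	"strings"
)

// NodeMetrics mirrors one row of `kubectl top nodes` output.
type NodeMetrics struct {
	Name, CPU, CPUPercent, Memory, MemoryPercent string
}

// parseNodeMetricsOutput converts the kubectl top table into structs
// suitable for JSON or YAML encoding. Header and malformed rows are
// skipped.
func parseNodeMetricsOutput(out string) []NodeMetrics {
	var nodes []NodeMetrics
	for i, line := range strings.Split(strings.TrimSpace(out), "\n") {
		fields := strings.Fields(line)
		if i == 0 || len(fields) < 5 {
			continue // header or malformed row
		}
		nodes = append(nodes, NodeMetrics{
			Name: fields[0], CPU: fields[1], CPUPercent: fields[2],
			Memory: fields[3], MemoryPercent: fields[4],
		})
	}
	return nodes
}

func main() {
	sample := `NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-10-0-1-100   245m         12%    1456Mi          18%`
	for _, n := range parseNodeMetricsOutput(sample) {
		fmt.Printf("%s: cpu=%s mem=%s\n", n.Name, n.CPU, n.Memory)
	}
}
```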
Telemetry subsystem handles advanced metrics queries:
internal/k8s/telemetry/telemetry.go:111
func (s *SubAgent) handleClusterMetrics(ctx context.Context, opts QueryOptions) (*Response, error) {
	result, err := s.metrics.GetClusterMetrics(ctx)
	if err != nil {
		return &Response{
			Type:    ResponseTypeError,
			Message: fmt.Sprintf("Failed to get cluster metrics: %v", err),
			Error:   err,
		}, nil
	}

	return &Response{
		Type: ResponseTypeResult,
		Data: result,
		Message: fmt.Sprintf("Cluster metrics: %d nodes, CPU %s/%s (%.1f%%), Memory %s/%s (%.1f%%)",
			result.NodeCount, result.UsedCPU, result.TotalCPU, result.CPUPercent,
			result.UsedMemory, result.TotalMemory, result.MemoryPercent),
	}, nil
}

Installing Metrics Server

If metrics are unavailable, install the Metrics Server:
# Apply Metrics Server manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify installation
kubectl get deployment metrics-server -n kube-system

# Wait for metrics to be available
kubectl top nodes
The Metrics Server is not installed by default on EKS, so this step is typically required there; GKE clusters ship with it preinstalled.

Monitoring best practices

  • Set up alerts: Use metrics to establish baseline performance and alert on anomalies.
  • Monitor trends: Track resource usage over time to identify capacity planning needs.
  • Check logs regularly: Review application logs for errors and warnings.
  • Use namespaces: Organize workloads by namespace for easier monitoring.
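
One way to act on the alerting advice is to feed the JSON from `clanker k8s stats cluster -o json` into a small threshold checker. The field names below are assumed from the telemetry struct shown earlier (CPUPercent, MemoryPercent); the actual JSON keys are not confirmed by this document:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClusterMetrics models the assumed shape of
// `clanker k8s stats cluster -o json` output.
type ClusterMetrics struct {
	NodeCount     int     `json:"NodeCount"`
	CPUPercent    float64 `json:"CPUPercent"`
	MemoryPercent float64 `json:"MemoryPercent"`
}

// checkThresholds returns an alert message for each utilization
// figure above limit.
func checkThresholds(m ClusterMetrics, limit float64) []string {
	var alerts []string
	if m.CPUPercent > limit {
		alerts = append(alerts, fmt.Sprintf("CPU at %.1f%% (limit %.0f%%)", m.CPUPercent, limit))
	}
	if m.MemoryPercent > limit {
		alerts = append(alerts, fmt.Sprintf("memory at %.1f%% (limit %.0f%%)", m.MemoryPercent, limit))
	}
	return alerts
}

func main() {
	raw := []byte(`{"NodeCount":3,"CPUPercent":12.4,"MemoryPercent":86.0}`)
	var m ClusterMetrics
	if err := json.Unmarshal(raw, &m); err != nil {
		panic(err)
	}
	for _, a := range checkThresholds(m, 80) {
		fmt.Println("ALERT:", a)
	}
}
```

Run on a schedule (cron, a CI job, or a controller), this kind of check gives you a baseline alert without a full Prometheus deployment.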

Advanced monitoring

Prometheus and Grafana

For production monitoring, integrate with Prometheus:
# Install Prometheus using Helm
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack

# Access Grafana dashboard
kubectl port-forward svc/prometheus-grafana 3000:80

CloudWatch Container Insights (EKS)

Enable Container Insights for EKS clusters:
# Install the CloudWatch Observability add-on (includes Container Insights)
aws eks create-addon --cluster-name my-cluster --addon-name amazon-cloudwatch-observability

# View metrics in AWS CloudWatch console

Google Cloud Monitoring (GKE)

GKE clusters automatically integrate with Google Cloud Monitoring:
# Query recent cluster log entries from the CLI
gcloud logging read "resource.type=k8s_cluster" --limit 50

Troubleshooting metrics

Metrics not available

If kubectl top returns an error:
# Check Metrics Server status
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

# View Metrics Server logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Restart Metrics Server
kubectl rollout restart deployment metrics-server -n kube-system

High resource usage

Investigate resource-intensive workloads:
# Find top consumers
clanker k8s stats pods -A --sort-by memory

# Check specific pod
clanker k8s stats pod high-memory-pod --containers

# View logs for issues
clanker k8s logs high-memory-pod --tail 100

Next steps

  • Ask mode: Query metrics with natural language
  • Cluster management: Scale clusters based on metrics
