Monitor your Kubernetes cluster with a complete observability stack featuring Prometheus for metrics, Grafana for visualization, and Loki for log aggregation.
Prometheus and Grafana
Prerequisites
Running Kubernetes cluster
Helm installed
kubectl configured
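Before installing anything, it can help to confirm these prerequisites in one step. This is a minimal sketch. `check_prereqs` is a hypothetical helper name. It only checks that `kubectl` and `helm` are on PATH and that `kubectl cluster-info` can reach the API server.

```shell
#!/usr/bin/env bash
# Sketch: verify the prerequisites above (hypothetical helper, not part of any tool).
# Returns non-zero if kubectl or helm is missing or the cluster is unreachable.
check_prereqs() {
  local tool missing=0
  for tool in kubectl helm; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] || return 1

  # cluster-info fails when there is no current context or the API server is down.
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "kubectl cannot reach the cluster" >&2
    return 1
  fi

  # Print the Helm client version as a final sanity check.
  helm version --short
}
```

Usage: `check_prereqs && echo "ready to install"`.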
Installing Lens (Optional)
Lens provides a desktop GUI for Kubernetes cluster management:
wget https://api.k8slens.dev/binaries/Lens-5.3.3-latest.20211223.1.amd64.deb
sudo dpkg -i Lens-5.3.3-latest.20211223.1.amd64.deb
Copy your kubeconfig file from the manager node to ~/.kube/config on your workstation, then launch Lens with the lens command.
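The kubeconfig copy can be scripted as shown below. This is a sketch under stated assumptions. `copy_kubeconfig` is a hypothetical helper, and the manager address (for example `user@10.0.0.10`) is a placeholder. The remote path `.kube/config` is resolved relative to the remote user's home directory. On kubeadm clusters the admin kubeconfig also lives at `/etc/kubernetes/admin.conf`, which is readable only by root.

```shell
#!/usr/bin/env bash
# Sketch: copy the kubeconfig from the manager node and restrict its permissions.
# copy_kubeconfig is a hypothetical helper; MANAGER (user@host) is a placeholder.
copy_kubeconfig() {
  local manager="$1" dest="${2:-$HOME/.kube/config}"
  [ -n "$manager" ] || { echo "usage: copy_kubeconfig user@host [dest]" >&2; return 1; }
  mkdir -p "$(dirname "$dest")"
  # Relative remote path: resolved against the remote user's home directory.
  scp "${manager}:.kube/config" "$dest" || return 1
  # The file holds cluster credentials, so keep it readable by you only.
  chmod 600 "$dest"
}
```

Usage: `copy_kubeconfig user@10.0.0.10 && kubectl get nodes`.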
Install Prometheus Stack with Helm
Install kubectl (if needed)
sudo snap install kubectl --classic
Install Helm
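curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
sudo apt-get install apt-transport-https --yes
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list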
sudo apt-get update
sudo apt-get install helm
Add Prometheus Repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Create Namespace
kubectl create ns monitoring
Install kube-prometheus-stack
helm install prometheus --namespace monitoring prometheus-community/kube-prometheus-stack
Verify Installation
kubectl get pods -n monitoring
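kube-prometheus-stack creates several Deployments (Grafana, the operator, kube-state-metrics), StatefulSets for Prometheus and Alertmanager, and a node-exporter DaemonSet. Pods can therefore take a minute or two to settle. The sketch below waits on the Deployments and then lists any pod that is not yet Running. `wait_for_monitoring` is a hypothetical helper, and the 300s timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: wait for the monitoring stack to come up (hypothetical helper).
wait_for_monitoring() {
  local ns="${1:-monitoring}" timeout="${2:-300s}"
  # Deployments (Grafana, operator, kube-state-metrics) report an Available condition.
  kubectl wait --for=condition=Available deployment --all -n "$ns" --timeout="$timeout" || return 1
  # List pods that are not Running; completed (Succeeded) job pods also show up here.
  kubectl get pods -n "$ns" --field-selector='status.phase!=Running'
}
```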
Accessing Grafana
Check Grafana Service
kubectl get svc -n monitoring
Port Forward to Grafana
kubectl port-forward -n monitoring service/prometheus-grafana 3000:80
Access Dashboard
Open your browser and navigate to http://localhost:3000. Default credentials:
Username: admin
Password: prom-operator
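The default password may have been overridden in the chart values. Rather than relying on it, you can read the credentials from the Grafana Secret, the same way the Loki section below does for `loki-grafana`. This sketch assumes the release name `prometheus` in namespace `monitoring`, which gives a Secret named `prometheus-grafana` with `admin-user` and `admin-password` keys. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read Grafana admin credentials from the chart's Secret (hypothetical helpers).
# Assumes release "prometheus" in namespace "monitoring" -> Secret "prometheus-grafana".
grafana_secret_field() {
  local field="$1" ns="${2:-monitoring}" secret="${3:-prometheus-grafana}"
  kubectl get secret "$secret" -n "$ns" -o jsonpath="{.data.${field}}" | base64 --decode
}

grafana_admin_user()     { grafana_secret_field admin-user "$@"; }
grafana_admin_password() { grafana_secret_field admin-password "$@"; }
```

Usage: `grafana_admin_password; echo`.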
The kube-prometheus-stack includes pre-configured dashboards for:
Cluster overview
Node metrics
Pod metrics
Persistent volumes
Kubernetes API server
Loki for Log Management
Loki is a log aggregation system designed to work seamlessly with Grafana.
Installation
Add Grafana Repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm search repo loki
Generate Custom Values
helm show values grafana/loki-stack > loki-values.yaml
Configure Loki Values
Edit loki-values.yaml with the following key configurations:
loki:
  enabled: true
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  livenessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45

promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push

grafana:
  enabled: true
  sidecar:
    datasources:
      enabled: true
      maxLines: 1000
  image:
    tag: 10.3.3
  service:
    type: NodePort

prometheus:
  enabled: false

fluent-bit:
  enabled: false

filebeat:
  enabled: false

logstash:
  enabled: false
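Indentation mistakes in a values file are easy to make. One way to catch them before touching the cluster is to render the chart locally with `helm template`. The sketch below does that. `render_loki_values` is a hypothetical helper, and it assumes the `grafana` repository has already been added as above.

```shell
#!/usr/bin/env bash
# Sketch: render loki-stack locally with the edited values (hypothetical helper).
# helm template does not contact the cluster; it fails on invalid YAML or templates.
render_loki_values() {
  local values="${1:-loki-values.yaml}"
  [ -f "$values" ] || { echo "values file not found: $values" >&2; return 1; }
  helm template loki grafana/loki-stack --values "$values" -n grafana-loki > /dev/null
}
```

Usage: `render_loki_values && echo "values render cleanly"`.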
Deploy Loki Stack
helm upgrade --install --values loki-values.yaml loki grafana/loki-stack -n grafana-loki --create-namespace
Verify Installation
kubectl get pods -n grafana-loki
Deploy Log Generator
Create a test application to generate logs:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-generator
  namespace: default
  labels:
    app: log-generator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-generator
  template:
    metadata:
      labels:
        app: log-generator
    spec:
      containers:
        - name: log-generator
          image: busybox
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args:
            - >
              while true; do
              ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ");
              echo "{\"timestamp\":\"${ts}\",\"level\":\"info\",\"message\":\"Hello from log-generator! Testing Loki JSON logs.\"}";
              sleep 5;
              done
          resources:
            limits:
              cpu: "100m"
              memory: "64Mi"
            requests:
              cpu: "50m"
              memory: "32Mi"
---
# Optional: the container only writes to stdout and listens on no port,
# so this Service just gives the app a stable name.
apiVersion: v1
kind: Service
metadata:
  name: log-generator
  namespace: default
  labels:
    app: log-generator
spec:
  selector:
    app: log-generator
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
EOF
Accessing Loki Grafana
Get NodePort
kubectl get svc loki-grafana -n grafana-loki -o jsonpath="{.spec.ports[0].nodePort}"
Get Credentials
# Username
kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-user}" | base64 --decode
# Password
kubectl get secret loki-grafana -n grafana-loki -o jsonpath="{.data.admin-password}" | base64 --decode
Access Dashboard
http://<NODE-IP>:<NODE-PORT>
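The two placeholders can be filled in automatically. The sketch below takes the first node's `InternalIP` and the `loki-grafana` NodePort and prints the URL. Using the first node and its internal address are both assumptions: on cloud clusters you may need an `ExternalIP` or a reachable node instead. `loki_grafana_url` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: build the Loki Grafana URL from a node IP and the NodePort (hypothetical helper).
# Assumes the first node's InternalIP is reachable from your workstation.
loki_grafana_url() {
  local ns="${1:-grafana-loki}" node_ip node_port
  node_ip="$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"
  node_port="$(kubectl get svc loki-grafana -n "$ns" -o jsonpath='{.spec.ports[0].nodePort}')"
  echo "http://${node_ip}:${node_port}"
}
```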
LogQL Query Examples
Loki uses LogQL for querying logs:
Namespace Logs
Label Selector
Filter with Text
Exclude Text
Container Logs
Event Logs
Node Logs
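The same kinds of LogQL query can be tried outside Grafana against Loki's HTTP API, which helps when debugging label names. This is a sketch under several assumptions:

- Loki is port-forwarded to localhost:3100.
- The label names (`namespace`, `app`, `container`, `node_name`) are the ones Promtail's default Kubernetes scrape config usually attaches. Check the label browser in Grafana Explore for your setup.
- `logql` is a hypothetical helper.

No event-log query is included. Kubernetes events are API objects rather than container logs, so pod-log scraping does not collect them without an additional event exporter.

```shell
#!/usr/bin/env bash
# Sketch: run LogQL queries against Loki's HTTP API (hypothetical helper).
# Assumes Loki is reachable at LOKI_URL, e.g. after:
#   kubectl port-forward -n grafana-loki svc/loki 3100:3100
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# logql QUERY [LIMIT]: query the API's default range (the last hour) and print raw JSON.
logql() {
  curl -G -s "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-20}"
}

# Example queries; label names assume Promtail's default Kubernetes labels.
# Namespace logs:    logql '{namespace="default"}'
# Label selector:    logql '{app="log-generator"}'
# Filter with text:  logql '{app="log-generator"} |= "Hello"'
# Exclude text:      logql '{namespace="default"} != "healthz"'
# Container logs:    logql '{namespace="default", container="log-generator"}'
# Node logs:         logql '{node_name="worker-1"}'   # worker-1 is a placeholder
# Parse JSON fields: logql '{app="log-generator"} | json | level="info"'
```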
Grafana Dashboards
Pre-installed Dashboards
The kube-prometheus-stack includes:
Kubernetes / Compute Resources / Cluster: Overall cluster metrics
Kubernetes / Compute Resources / Namespace (Pods): Per-namespace pod metrics
Kubernetes / Compute Resources / Node (Pods): Per-node metrics
Node Exporter / Nodes: Detailed node statistics
Creating Custom Dashboards
Navigate to Dashboards
In Grafana, click Dashboards → New Dashboard
Add Panel
Click Add new panel
Write PromQL Query
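Example queries:
# CPU usage by pod (container!="" skips the pod-level cgroup series, which would double-count)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod)
# Memory usage by namespace (working set excludes reclaimable page cache, as kubectl top does)
sum(container_memory_working_set_bytes{container!=""}) by (namespace)
# Pod restart count
kube_pod_container_status_restarts_total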
Configure Visualization
Select a visualization type (Time series, Gauge, Table, etc.) and customize it
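Before a query goes into a panel, it can be evaluated directly against the Prometheus HTTP API. An empty result there usually means a metric or label name does not match. The sketch below assumes Prometheus is port-forwarded to localhost:9090. With the release name `prometheus`, the chart typically names the service `prometheus-kube-prometheus-prometheus`; confirm with `kubectl get svc -n monitoring`. `promql` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: evaluate a PromQL expression via the Prometheus HTTP API (hypothetical helper).
# Assumes Prometheus is reachable at PROM_URL, e.g. after:
#   kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
PROM_URL="${PROM_URL:-http://localhost:9090}"

# promql EXPR: run an instant query and print the raw JSON response.
promql() {
  curl -G -s "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

Usage: `promql 'kube_pod_container_status_restarts_total'`.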
Alerting with Prometheus
Create PrometheusRule
Save the following as high-cpu-alert.yaml. The release: prometheus label matters: by default, kube-prometheus-stack only loads PrometheusRules labeled with its Helm release name.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: high-cpu-usage
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: cpu_alerts
      interval: 30s
      rules:
        - alert: HighCPUUsage
          expr: sum(rate(container_cpu_usage_seconds_total{container!=""}[5m])) by (pod) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage detected"
            description: "Pod {{ $labels.pod }} is using more than 0.8 CPU cores"
Apply it:
kubectl apply -f high-cpu-alert.yaml
When to Use Loki
Choose Loki when:
You want a simple, scalable, and cost-effective logging solution (Loki indexes only log labels, not the full log text)
You’re operating in a cloud-native environment
You’re already using Grafana and Prometheus
Your log analysis needs are straightforward
You need to correlate logs with metrics in a single interface
Best Practices
Set appropriate retention policies for metrics and logs
Use persistent volumes for Prometheus and Loki data
Configure resource limits for monitoring components
Create alerts for critical metrics
Regularly review and clean up unused dashboards
Use label selectors efficiently to optimize queries
Enable authentication and authorization for Grafana
Export important dashboards as code for version control
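Two of these practices, retention and resource limits, can be applied to the existing release with `helm upgrade`. This is a sketch under stated assumptions. `tune_prometheus` is a hypothetical helper, and the values (15d retention, 2Gi memory limit) are illustrative rather than recommendations. `--reuse-values` keeps whatever was set at install time.

```shell
#!/usr/bin/env bash
# Sketch: set Prometheus retention and a memory limit on the existing release (hypothetical helper).
# The defaults below are illustrative; size them for your cluster.
tune_prometheus() {
  local retention="${1:-15d}" mem_limit="${2:-2Gi}"
  helm upgrade prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --reuse-values \
    --set "prometheus.prometheusSpec.retention=${retention}" \
    --set "prometheus.prometheusSpec.resources.limits.memory=${mem_limit}"
}
```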
Monitoring systems can consume significant resources. Always set resource limits and monitor the monitoring stack itself.
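For a quick look at the stack's own footprint, `kubectl top` can list pods in both monitoring namespaces sorted by memory. Note that this needs the metrics-server add-on, which kube-prometheus-stack does not install. `monitoring_footprint` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: show resource usage of the monitoring stack itself (hypothetical helper).
# Requires metrics-server in the cluster for kubectl top to work.
monitoring_footprint() {
  local ns
  for ns in monitoring grafana-loki; do
    echo "== ${ns} =="
    kubectl top pods -n "$ns" --sort-by=memory || return 1
  done
}
```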