---
title: "Monitoring Your Kubernetes Cluster with Prometheus and Grafana"
description: "A comprehensive guide to setting up a robust monitoring solution for your Kubernetes cluster using Prometheus and Grafana."
pubDate: "2023-09-25"
heroImage: "/blog/images/posts/prometheus-dashboard.svg"
category: "Monitoring"
tags: ["kubernetes", "prometheus", "grafana", "monitoring", "observability"]
draft: false
---

# Monitoring Your Kubernetes Cluster with Prometheus and Grafana

In today's complex Kubernetes environments, a robust monitoring solution is essential, not just nice to have. This guide walks you through setting up Prometheus and Grafana to monitor K3s or any other Kubernetes cluster.

## Why Prometheus and Grafana?

- **Prometheus**: An open-source systems monitoring and alerting toolkit that collects and stores metrics as time series data
- **Grafana**: A multi-platform open-source analytics and interactive visualization web application that provides charts, graphs, and alerts when connected to supported data sources

Together, they form a powerful monitoring stack that provides insights into your cluster's health and performance.

## Prerequisites

Before we begin, ensure you have:

- A running Kubernetes cluster (this guide uses K3s)
- `kubectl` configured to communicate with your cluster
- Helm 3 installed

## Installation using Helm

The easiest way to install Prometheus and Grafana is using the kube-prometheus-stack Helm chart, which includes:

- Prometheus Operator
- Prometheus instance
- Alertmanager
- Grafana
- Node Exporter
- Kube State Metrics

Let's create a namespace and install the stack:

```bash
# Create a dedicated namespace
kubectl create namespace monitoring

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=your-strong-password
```

Replace `your-strong-password` with a secure password for the Grafana admin user.
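
A password passed with `--set` ends up in your shell history. As a hedged alternative, the bundled Grafana chart can read the admin credentials from an existing Secret. The sketch below assumes a Secret named `grafana-admin-credentials` (a hypothetical name) already exists in the `monitoring` namespace with the keys `admin-user` and `admin-password`; the file name is also only an illustration.

```yaml
# values-grafana-admin.yaml (hypothetical file name)
grafana:
  admin:
    # Assumption: this Secret was created beforehand in the monitoring namespace
    existingSecret: grafana-admin-credentials
    # Keys inside the Secret that hold the username and password
    userKey: admin-user
    passwordKey: admin-password
```

You would pass this file with `-f values-grafana-admin.yaml` in place of the `--set grafana.adminPassword=...` flag.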

## Accessing the Dashboards

By default, the services are not exposed outside the cluster. To access them, you can use port-forwarding:

### Grafana

```bash
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```

Then access Grafana at http://localhost:3000 with username `admin` and the password you specified during installation.

### Prometheus

```bash
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
```

Access the Prometheus UI at http://localhost:9090.

## Setting Up Ingress (Optional)

For production environments, you'll want to set up proper ingress. Here's an example using the NGINX ingress controller:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-grafana
                port:
                  number: 80
  tls:
    - hosts:
        - grafana.example.com
      secretName: grafana-tls
```

Apply this with `kubectl apply -f ingress.yaml` after replacing `grafana.example.com` with your domain. The `grafana-tls` Secret referenced under `tls` must exist in the `monitoring` namespace and contain the certificate for that domain.
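
If you would rather keep the ingress in the Helm release than in a separate manifest, the Grafana subchart exposes ingress settings of its own. The fragment below is a sketch under that assumption, reusing the host and Secret names from the manifest above; `ingressClassName` is assumed to be supported by your chart version, so check the chart's default values before relying on it.

```yaml
grafana:
  ingress:
    enabled: true
    # Assumption: supported by recent versions of the Grafana chart
    ingressClassName: nginx
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    hosts:
      - grafana.example.com
    path: /
    tls:
      - secretName: grafana-tls
        hosts:
          - grafana.example.com
```

Use one approach or the other; applying both would create two Ingress objects for the same host.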

## Important Dashboards for Kubernetes

The kube-prometheus-stack chart provisions several Kubernetes dashboards in Grafana out of the box, but here are some additional ones worth importing:

1. **Kubernetes Cluster Overview** (ID: 10856)
2. **Node Exporter Full** (ID: 1860)
3. **Kubernetes Resource Requests** (ID: 13770)

To import a dashboard:

1. Go to the Grafana UI
2. Click the "+" icon in the sidebar (in recent Grafana versions, open "Dashboards" and click "New" instead)
3. Select "Import"
4. Enter the dashboard ID
5. Click "Load"
6. Select the Prometheus data source
7. Click "Import"
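
The import steps above can also be expressed declaratively, so the dashboards survive a reinstall. This is a sketch using the Grafana subchart's `dashboardProviders` and `dashboards` values; the provider name `default`, the key `node-exporter-full`, and the pinned `revision` are illustrative assumptions, and the data source name `Prometheus` is assumed to match the one the stack creates.

```yaml
grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          folder: ""
          type: file
          disableDeletion: false
          editable: true
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    # Must match the provider name above
    default:
      node-exporter-full:        # hypothetical key; becomes the file name
        gnetId: 1860             # Node Exporter Full, from the list above
        revision: 1              # assumption: pin the revision you have verified
        datasource: Prometheus
```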

## Setting Up Alerts

Let's set up a basic alert for node CPU usage:

1. In Grafana, go to Alerting > Alert Rules
2. Click "New Alert Rule"
3. Configure the query: `instance:node_cpu_utilisation:rate5m` (leave the threshold out of the query; it belongs in the condition)
4. Set the condition to: `IS ABOVE 0.8` (80% CPU utilisation)
5. Set evaluation interval: `1m`
6. Set "For": `5m` (the alert fires only if the condition stays true for 5 minutes)
7. Add labels and annotations as needed
8. Save the rule
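
The same alert can also be defined in the cluster as a `PrometheusRule`, which Prometheus evaluates and routes through Alertmanager. This is a minimal sketch: the resource name, alert name, and severity label are hypothetical, and the `release: prometheus` label assumes the chart's default rule selector matches on the Helm release name used earlier.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-cpu-alerts            # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus            # assumption: matches the default rule selector
spec:
  groups:
    - name: node-cpu
      interval: 1m                 # evaluation interval
      rules:
        - alert: NodeHighCpuUsage  # hypothetical alert name
          # In a Prometheus rule the threshold is part of the expression
          expr: instance:node_cpu_utilisation:rate5m > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "CPU utilisation above 80% on {{ $labels.instance }}"
```

Keeping rules as manifests means they can be reviewed and versioned alongside the rest of your cluster configuration.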

## Best Practices

1. **Resource Limits**: Set appropriate resource requests and limits for Prometheus and Grafana
2. **Retention Period**: Configure the retention period based on your storage capacity
3. **Persistent Storage**: Use persistent volumes for Prometheus data
4. **Federation**: For large clusters, consider Prometheus federation
5. **Custom Metrics**: Set up custom metrics for your applications using client libraries
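
For the custom metrics point, an application that exposes a `/metrics` endpoint still has to be discovered by Prometheus. With the Prometheus Operator this is usually done with a `ServiceMonitor`. The sketch below assumes a hypothetical Service labeled `app: my-app` in the `default` namespace with a named port `metrics`; the `release: prometheus` label again assumes the chart's default selector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus        # assumption: matches the default ServiceMonitor selector
spec:
  namespaceSelector:
    matchNames:
      - default                # namespace where the application's Service lives
  selector:
    matchLabels:
      app: my-app              # hypothetical Service label
  endpoints:
    - port: metrics            # named port on the Service
      path: /metrics
      interval: 30s
```

Once applied, the application should appear on the Prometheus targets page.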

## Advanced Configuration

For a production environment, you'll want to customize the Helm values. Create a `values.yaml` file:

```yaml
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
        cpu: 1000m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

grafana:
  persistence:
    enabled: true
    size: 10Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 200m
```

Replace `standard` with a storage class that exists in your cluster (K3s ships with `local-path` by default; run `kubectl get storageclass` to check).

Then update your Helm release. Values passed with `--set` at install time are not carried over when you supply a new values file, so `--reuse-values` is added here to keep the Grafana admin password:

```bash
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --reuse-values \
  -f values.yaml
```
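
Time-based retention alone does not protect the volume from filling up if ingestion grows. As a hedged addition to the `values.yaml` above, the Prometheus spec also accepts a size-based limit; the `45GB` figure is only an illustration chosen to sit below the 50Gi volume requested earlier.

```yaml
prometheus:
  prometheusSpec:
    retention: 15d
    # Assumption: leaves headroom below the 50Gi volume for the WAL and compaction
    retentionSize: 45GB
```

Prometheus removes the oldest blocks once either limit is reached.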

## Troubleshooting

### Common Issues

1. **Insufficient Resources**: If pods are crashing, check if they have enough resources allocated
2. **Connectivity Issues**: Ensure services can communicate with each other
3. **Data Retention**: If Prometheus is losing data, check the storage configuration
4. **Target Scraping**: If metrics aren't appearing, check Prometheus targets status

### Useful Commands

```bash
# Check pod status
kubectl get pods -n monitoring

# Check Prometheus targets
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
# Then visit http://localhost:9090/targets

# View Prometheus server logs (pods are labeled by the Prometheus Operator)
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -c prometheus

# View Prometheus Operator logs (name assumes the release is called "prometheus";
# confirm with: kubectl get deploy -n monitoring)
kubectl logs -n monitoring deploy/prometheus-kube-prometheus-operator

# View Grafana logs
kubectl logs -n monitoring deploy/prometheus-grafana -c grafana
```

## Conclusion

You now have a robust monitoring solution for your Kubernetes cluster. With Prometheus collecting metrics and Grafana visualizing them, you'll have deep insights into your cluster's performance and health.

In future articles, we'll explore more advanced topics like custom exporters, alert integrations, and high availability setups for your monitoring stack.