---
title: Setting Up Prometheus Monitoring in Kubernetes
description: A comprehensive guide to implementing Prometheus monitoring in your Kubernetes cluster
pubDate: 2025-04-19
heroImage: /blog/images/posts/prometheusk8.png
category: devops
tags:
  - kubernetes
  - monitoring
  - prometheus
  - grafana
  - observability
readTime: 9 min read
---

# Setting Up Prometheus Monitoring in Kubernetes

Effective monitoring is crucial for maintaining a healthy Kubernetes environment. Prometheus has become the de facto standard for metrics collection and alerting in cloud-native environments. This guide will walk you through setting up a complete Prometheus monitoring stack in your Kubernetes cluster.

## Why Prometheus?

Prometheus offers several advantages for Kubernetes monitoring:

- **Pull-based architecture**: Prometheus scrapes metrics from HTTP endpoints that targets expose, so targets need no configuration or credentials for pushing data to the monitoring server
- **Powerful query language (PromQL)**: Filter, aggregate, and compute rates over time series by label
- **Service discovery**: Automatically finds scrape targets through the Kubernetes API as pods and services come and go
- **Rich ecosystem**: Wide range of exporters and integrations
- **CNCF graduated project**: Strong community and vendor support

## Components of the Monitoring Stack

We'll set up a complete monitoring stack consisting of:

1. **Prometheus**: Core metrics collection and storage
2. **Alertmanager**: Handles alerts and notifications
3. **Grafana**: Visualization and dashboards
4. **Node Exporter**: Collects host-level metrics
5. **kube-state-metrics**: Collects Kubernetes state metrics
6. **Prometheus Operator**: Simplifies Prometheus management in Kubernetes

## Prerequisites

- A running Kubernetes cluster (K3s, EKS, GKE, etc.)
- kubectl configured to access your cluster
- Helm 3 installed

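Before starting, it can help to confirm the command-line tools are actually on your `PATH`. The following is a minimal sketch; the function name `check_tools` is made up for this example and is not part of kubectl or Helm.

```bash
# Return 0 if every named command is on PATH; otherwise print the
# missing ones to stderr and return 1.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_tools kubectl helm && kubectl cluster-info
```

If both tools are found, `kubectl cluster-info` then confirms that kubectl can reach the cluster.
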
## Installation Using Helm

The easiest way to deploy Prometheus is using the `kube-prometheus-stack` Helm chart, which includes all the components mentioned above.

### 1. Add the Prometheus Community Helm Repository

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```

### 2. Create a Namespace for Monitoring

```bash
kubectl create namespace monitoring
```

### 3. Configure Values

Create a `values.yaml` file with your custom configuration. Replace `standard` with a storage class that exists in your cluster (`kubectl get storageclass` lists them):

```yaml
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 2Gi
        cpu: 500m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  persistence:
    enabled: true
    storageClassName: standard
    size: 10Gi
  adminPassword: "prom-operator"  # Change this!

nodeExporter:
  enabled: true

kubeStateMetrics:
  enabled: true
```

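To sanity-check the 20Gi volume against the 15d retention, the Prometheus storage documentation gives a rough formula: retention in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies it with shell arithmetic. The function name and the ingestion rate of 10,000 samples/s are assumptions for illustration, not measurements.

```bash
# Estimate Prometheus disk usage in GiB (rounded up) from
#   retention_seconds * samples_per_second * bytes_per_sample
# Arguments: retention in days, ingested samples/s, bytes per sample.
estimate_storage_gib() {
  retention_days=$1
  samples_per_sec=$2
  bytes_per_sample=$3
  gib=$((1024 * 1024 * 1024))
  bytes=$((retention_days * 86400 * samples_per_sec * bytes_per_sample))
  echo $(( (bytes + gib - 1) / gib ))
}

# 15d retention, an assumed 10,000 samples/s, 2 bytes per sample
estimate_storage_gib 15 10000 2   # prints 25
```

At that assumed rate the estimate exceeds the 20Gi request, so you would either enlarge the volume or shorten retention. Once Prometheus is running, the real ingestion rate can be read with the query `rate(prometheus_tsdb_head_samples_appended_total[5m])`.
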
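### 4. Install the Helm Chart

```bash
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values.yaml
```

The release name `prometheus` matters: the Service names and the `release: prometheus` label used later in this guide are derived from it.
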
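If you expect to re-run the deployment after editing `values.yaml`, `helm upgrade --install` is a common alternative because it installs the release when it is absent and upgrades it otherwise. A minimal sketch follows; the wrapper name `deploy_stack` and the 10-minute timeout are arbitrary choices for this example.

```bash
# Install the chart, or upgrade it if the release already exists.
# --wait blocks until the release's resources report ready or the timeout expires.
deploy_stack() {
  release=${1:-prometheus}
  namespace=${2:-monitoring}
  helm upgrade --install "$release" prometheus-community/kube-prometheus-stack \
    --namespace "$namespace" \
    --values values.yaml \
    --wait --timeout 10m
}

# Usage: deploy_stack            (release "prometheus", namespace "monitoring")
```
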
### 5. Verify the Installation

Check that all the pods are running:

```bash
kubectl get pods -n monitoring
```

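Pods can take a few minutes to start, so a small polling loop can save repeated manual checks. This is a sketch under stated assumptions: the function names, the 5-second interval, and the default of 60 attempts are all invented for this example. It only looks at the pod phase, so a container that is crash-looping inside a `Running` pod will not be caught; check the `READY` column for that.

```bash
# Print pods in the namespace whose phase is neither Running nor Succeeded.
# Empty output means nothing is stuck in Pending, Failed, or Unknown.
unhealthy_pods() {
  kubectl get pods -n "${1:-monitoring}" --no-headers \
    --field-selector=status.phase!=Running,status.phase!=Succeeded
}

# Poll every 5 seconds, up to $2 attempts, until no unhealthy pods remain.
wait_for_pods() {
  ns=${1:-monitoring}
  attempts=${2:-60}
  while [ "$attempts" -gt 0 ]; do
    [ -z "$(unhealthy_pods "$ns" 2>/dev/null)" ] && return 0
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}

# Usage: wait_for_pods monitoring && echo "all pods started"
```
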
## Accessing the UIs

By default, the components are not exposed outside the cluster. You can use port-forwarding to access them. Service names depend on the Helm release name; if a command below fails, list the Services with `kubectl get svc -n monitoring`.

### Prometheus UI

```bash
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
```

Then access Prometheus at http://localhost:9090

### Grafana

```bash
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```

Then access Grafana at http://localhost:3000 (username `admin`, password as set in `adminPassword`; `prom-operator` in the example values)

### Alertmanager

```bash
kubectl port-forward -n monitoring svc/alertmanager-operated 9093:9093
```

Then access Alertmanager at http://localhost:9093

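With the port-forwards running in other terminals, each component's health endpoint can be probed from the command line. The sketch below assumes the local ports used above; the function names `probe` and `check_uis` are hypothetical.

```bash
# Probe a health endpoint; prints "ok <url>" or "FAIL <url>".
probe() {
  if curl -fsS --max-time 5 -o /dev/null "$1"; then
    echo "ok   $1"
  else
    echo "FAIL $1"
    return 1
  fi
}

# Probe all three UIs; returns non-zero if any of them fails.
check_uis() {
  rc=0
  probe http://localhost:9090/-/ready || rc=1     # Prometheus
  probe http://localhost:3000/api/health || rc=1  # Grafana
  probe http://localhost:9093/-/ready || rc=1     # Alertmanager
  return "$rc"
}

# Usage: check_uis
```
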
## For Production: Exposing Services

For production environments, you'll want to set up proper ingress. Here's an example using a basic Ingress resource. It assumes the NGINX Ingress Controller, since the annotation is specific to it, and the `ssl-redirect` annotation only takes effect once the Ingress has a `tls` section. Prometheus and Alertmanager do not enable authentication by default, so restrict access before exposing them.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  rules:
    - host: prometheus.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-operated
                port:
                  number: 9090
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-grafana
                port:
                  number: 80
    - host: alertmanager.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: alertmanager-operated
                port:
                  number: 9093
```

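Before DNS records exist for the example hosts, you can still test routing by telling curl which address to use for a hostname. This is a minimal sketch: `probe_ingress` is a made-up helper, and the IP address in the usage line is a documentation placeholder that you would replace with your ingress controller's address.

```bash
# Request a path through the ingress controller as if DNS already pointed at it,
# and print only the HTTP status code.
# $1 = ingress IP (you supply this), $2 = host, $3 = path (default "/").
probe_ingress() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    --resolve "$2:80:$1" "http://$2${3:-/}"
}

# Usage: probe_ingress 203.0.113.10 grafana.example.com /api/health
```

A `200` suggests the rule routes to the backend; a redirect status suggests the controller is sending the request to HTTPS.
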
## Configuring Alerting

### 1. Set Up Alert Rules

Alert rules can be created using the PrometheusRule custom resource. The `release: prometheus` label must match the Helm release name, because by default the chart configures Prometheus to load only rules that carry it. The expression relies on a recording rule shipped with the chart's default node rules; recent chart versions name it `instance:node_cpu_utilisation:rate5m` (older releases used `rate1m`), so check the Rules page in the Prometheus UI if the alert never evaluates.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
    - name: node.rules
      rules:
        - alert: HighNodeCPU
          expr: instance:node_cpu_utilisation:rate5m > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage on {{ $labels.instance }}"
            description: "CPU usage is above 80% for 5 minutes on node {{ $labels.instance }}"
```

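It is worth checking that the expression returns data and that Prometheus has actually loaded the rule. The sketch below uses the Prometheus HTTP API through the port-forward on `localhost:9090`. The function names and the `PROM_URL` variable are assumptions for this example, and the `grep` is a crude text match on the JSON rather than a real parser.

```bash
# Run an instant PromQL query against the Prometheus HTTP API and print
# the raw JSON response.
promql() {
  curl -fsS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Succeed if a rule with the given name appears in the loaded rule groups.
rule_loaded() {
  curl -fsS "${PROM_URL:-http://localhost:9090}/api/v1/rules" \
    | grep -q "\"name\":\"$1\""
}

# Usage:
#   promql 'instance:node_cpu_utilisation:rate5m'
#   rule_loaded HighNodeCPU && echo "rule is loaded"
```

An empty `result` array from the first call means the recording rule does not exist under that name in your chart version.
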
### 2. Configure Alert Receivers

Configure Alertmanager to send notifications by creating a Secret with your configuration. The Secret must be named `alertmanager-<name of the Alertmanager resource>`; with the release name `prometheus` that is the name shown below, and `kubectl get alertmanager -n monitoring` confirms the resource name. The chart also renders this Secret from the `alertmanager.config` value, so a later `helm upgrade` may overwrite manual changes; putting the same configuration under `alertmanager.config` in `values.yaml` avoids that.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

    route:
      group_by: ['job', 'alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack-notifications'
      routes:
        - receiver: 'slack-notifications'
          matchers:
            - severity =~ "warning|critical"

    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - channel: '#alerts'
            send_resolved: true
            title: '{{ template "slack.default.title" . }}'
            text: '{{ template "slack.default.text" . }}'
type: Opaque
```

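To confirm that notifications reach Slack without waiting for a real incident, you can post a synthetic alert to the Alertmanager v2 API through the port-forward on `localhost:9093`. This is a sketch only: the function names, the `AM_URL` variable, and the alert name `TestAlert` are invented for this example.

```bash
# Build a one-element JSON array in the shape the Alertmanager v2 API accepts.
# $1 = alert name (default TestAlert), $2 = severity (default warning).
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Manual test alert"}}]' \
    "${1:-TestAlert}" "${2:-warning}"
}

# POST the payload to Alertmanager, reading the request body from stdin.
send_test_alert() {
  test_alert_payload "$@" | curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data @- "${AM_URL:-http://localhost:9093}/api/v2/alerts"
}

# Usage: send_test_alert TestAlert warning
```

Because the route matches `severity` values of `warning` or `critical`, the test alert should show up in `#alerts` after the `group_wait` interval if the webhook URL is valid.
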
## Custom Dashboards

Grafana comes pre-configured with several useful dashboards, but you can import more from [Grafana.com](https://grafana.com/grafana/dashboards/).

Some recommended dashboard IDs to import:
- 1860: Node Exporter Full
- 12740: Kubernetes Monitoring
- 13332: Prometheus Stats

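## Troubleshooting

### Common Issues

1. **Insufficient Resources**: Prometheus can be resource-intensive. Raise the memory limit in `values.yaml` if pods are being OOMKilled.
2. **Storage Issues**: Ensure your storage class exists and supports the access modes you've configured; a PersistentVolumeClaim stuck in `Pending` usually points here.
3. **ServiceMonitor not working**: Check that the ServiceMonitor's `selector` matches the labels on your Service, and that the ServiceMonitor itself carries the `release: prometheus` label, which the chart's default configuration requires before Prometheus picks it up.
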
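The first and third issues can be checked from the command line. The helpers below are a sketch with hypothetical function names. They read the last termination reason of each container and the selector that Prometheus uses to discover ServiceMonitors.

```bash
# Print "<pod> <reasons>" where reasons are the last termination reasons
# of the pod's containers (empty if none has been terminated).
last_terminations() {
  kubectl get pods -n "${1:-monitoring}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}

# Print the names of pods with a container that was last killed for exceeding memory.
oom_killed_pods() {
  last_terminations "$1" | awk '/OOMKilled/ {print $1}'
}

# Show which labels a ServiceMonitor needs for Prometheus to select it.
servicemonitor_selector() {
  kubectl get prometheus -n "${1:-monitoring}" \
    -o jsonpath='{.items[*].spec.serviceMonitorSelector}'
}

# Usage: oom_killed_pods monitoring; servicemonitor_selector monitoring
```
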
## Conclusion

You now have a fully functional Prometheus monitoring stack for your Kubernetes cluster. This setup provides comprehensive metrics collection, visualization, and alerting capabilities essential for maintaining a healthy and performant cluster.

In future articles, we'll explore advanced topics like custom exporters, recording rules for performance, and integrating with other observability tools like Loki for logs and Tempo for traces.