---
title: "Setting Up Prometheus Monitoring in Kubernetes"
description: "A comprehensive guide to implementing Prometheus monitoring in your Kubernetes cluster"
pubDate: 2025-04-19
heroImage: /blog/images/posts/prometheusk8.png
category: devops
tags:
  - kubernetes
  - monitoring
  - prometheus
  - grafana
  - observability
readTime: "9 min read"
---

# Setting Up Prometheus Monitoring in Kubernetes

Effective monitoring is crucial for maintaining a healthy Kubernetes environment. Prometheus has become the de facto standard for metrics collection and alerting in cloud-native environments. This guide will walk you through setting up a complete Prometheus monitoring stack in your Kubernetes cluster.

## Why Prometheus?

Prometheus offers several advantages for Kubernetes monitoring:

- **Pull-based architecture**: Simplifies configuration and security
- **Powerful query language (PromQL)**: Enables flexible data analysis (see the example queries after this list)
- **Service discovery**: Automatically finds targets in dynamic environments
- **Rich ecosystem**: Wide range of exporters and integrations
- **CNCF graduated project**: Strong community and vendor support
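
To give a feel for PromQL, here are a few queries you could run once the stack below is installed. They assume the standard metric names exposed by Node Exporter, kube-state-metrics, and the kubelet's cAdvisor endpoint:

```promql
# Per-node CPU utilisation (%), averaged over the last 5 minutes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Pods reported in a phase other than Running
sum by (namespace, pod) (kube_pod_status_phase{phase!="Running"}) > 0

# Memory working set per namespace
sum by (namespace) (container_memory_working_set_bytes{container!=""})
```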

## Components of the Monitoring Stack

We'll set up a complete monitoring stack consisting of:

1. **Prometheus**: Core metrics collection and storage
2. **Alertmanager**: Handles alerts and notifications
3. **Grafana**: Visualization and dashboards
4. **Node Exporter**: Collects host-level metrics
5. **kube-state-metrics**: Collects Kubernetes state metrics
6. **Prometheus Operator**: Simplifies Prometheus management in Kubernetes

## Prerequisites

- A running Kubernetes cluster (K3s, EKS, GKE, etc.)
- `kubectl` configured to access your cluster
- Helm 3 installed
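
A quick sanity check before installing (these commands only confirm that cluster access and Helm are working):

```bash
kubectl get nodes
helm version
```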

## Installation Using Helm

The easiest way to deploy Prometheus is with the `kube-prometheus-stack` Helm chart, which includes all of the components mentioned above.

### 1. Add the Prometheus Community Helm Repository

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```

### 2. Create a Namespace for Monitoring

```bash
kubectl create namespace monitoring
```

### 3. Configure Values

Create a `values.yaml` file with your custom configuration:

```yaml
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        memory: 256Mi
        cpu: 100m
      limits:
        memory: 2Gi
        cpu: 500m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  persistence:
    enabled: true
    storageClassName: standard
    size: 10Gi
  adminPassword: "prom-operator"  # Change this!

nodeExporter:
  enabled: true

kubeStateMetrics:
  enabled: true
```

### 4. Install the Helm Chart

```bash
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values values.yaml
```

### 5. Verify the Installation

Check that all the pods are running:

```bash
kubectl get pods -n monitoring
```
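
It can also help to confirm that the chart created the services and CRDs the rest of this guide relies on; exact service names can vary slightly between chart versions:

```bash
# Services used later for port-forwarding and Ingress
kubectl get svc -n monitoring

# Prometheus Operator CRDs (ServiceMonitor, PrometheusRule, etc.)
kubectl get crd | grep monitoring.coreos.com
```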

## Accessing the UIs

By default, the components don't have external access. You can use port-forwarding to access them:

### Prometheus UI

```bash
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
```

Then access Prometheus at http://localhost:9090

### Grafana

```bash
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```

Then access Grafana at http://localhost:3000 (default credentials: admin / prom-operator)

### Alertmanager

```bash
kubectl port-forward -n monitoring svc/prometheus-alertmanager 9093:9093
```

Then access Alertmanager at http://localhost:9093

## For Production: Exposing Services

For production environments, you'll want to set up proper ingress. Here's an example using a basic Ingress resource:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  rules:
  - host: prometheus.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-operated
            port:
              number: 9090
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80
  - host: alertmanager.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-alertmanager
            port:
              number: 9093
```

## Configuring Alerting

### 1. Set Up Alert Rules

Alert rules can be created using the `PrometheusRule` custom resource:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: node.rules
    rules:
    - alert: HighNodeCPU
      expr: instance:node_cpu_utilisation:rate1m > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High CPU usage on {{ $labels.instance }}"
        description: "CPU usage is above 80% for 5 minutes on node {{ $labels.instance }}"
```

### 2. Configure Alert Receivers

Configure Alertmanager to send notifications by creating a Secret with your configuration:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-prometheus-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

    route:
      group_by: ['job', 'alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'slack-notifications'
      routes:
      - receiver: 'slack-notifications'
        matchers:
          - severity =~ "warning|critical"

    receivers:
    - name: 'slack-notifications'
      slack_configs:
      - channel: '#alerts'
        send_resolved: true
        title: '{{ template "slack.default.title" . }}'
        text: '{{ template "slack.default.text" . }}'
type: Opaque
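```

Apply it like any other manifest (again, the file name is just an example); the operator-managed Alertmanager mounts its configuration from this Secret, provided the Secret name matches what the operator expects for your release:

```bash
kubectl apply -f alertmanager-config.yaml
```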

## Custom Dashboards

Grafana comes pre-configured with several useful dashboards, but you can import more from Grafana.com.

Some recommended dashboard IDs to import (one way to provision them declaratively is sketched after this list):

- **1860**: Node Exporter Full
- **12740**: Kubernetes Monitoring
- **13332**: Prometheus Stats
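
You can import these through the Grafana UI (Dashboards > Import), or provision them from `values.yaml`. The sketch below uses the Grafana sub-chart's dashboard provisioning; treat the exact keys as an assumption to verify against the chart version you're running:

```yaml
grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          folder: ""
          type: file
          disableDeletion: false
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      node-exporter-full:
        gnetId: 1860            # Grafana.com dashboard ID
        datasource: Prometheus
      kubernetes-monitoring:
        gnetId: 12740
        datasource: Prometheus
```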

## Troubleshooting

### Common Issues

1. **Insufficient Resources**: Prometheus can be resource-intensive. Adjust resource limits if pods are being OOMKilled.
2. **Storage Issues**: Ensure your storage class supports the access modes you've configured.
3. **ServiceMonitor not working**: Check that the label selectors match your services (see the example ServiceMonitor after this list).
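
For reference, a minimal `ServiceMonitor` might look like this. The `release: prometheus` label matches the selector kube-prometheus-stack configures by default for a release named `prometheus`; the app name, namespace, and port name are placeholders for your own Service:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                  # hypothetical application name
  namespace: monitoring
  labels:
    release: prometheus         # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app               # must match the labels on the target Service
  namespaceSelector:
    matchNames:
      - default                 # namespace where the Service lives
  endpoints:
    - port: metrics             # name of the Service port exposing /metrics
      interval: 30s
```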

## Conclusion

You now have a fully functional Prometheus monitoring stack for your Kubernetes cluster. This setup provides comprehensive metrics collection, visualization, and alerting capabilities essential for maintaining a healthy and performant cluster.

In future articles, we'll explore advanced topics like custom exporters, recording rules for performance, and integrating with other observability tools like Loki for logs and Tempo for traces.