---
title: "Monitoring Your Kubernetes Cluster with Prometheus and Grafana"
description: "A comprehensive guide to setting up a robust monitoring solution for your Kubernetes cluster using Prometheus and Grafana."
pubDate: "2023-09-25"
heroImage: "/blog/images/posts/prometheus-dashboard.svg"
category: "Monitoring"
tags: ["kubernetes", "prometheus", "grafana", "monitoring", "observability"]
draft: false
---

# Monitoring Your Kubernetes Cluster with Prometheus and Grafana

A robust monitoring solution is essential in any non-trivial Kubernetes environment. This guide walks through setting up Prometheus and Grafana to monitor K3s or any other Kubernetes cluster.

## Why Prometheus and Grafana?

- **Prometheus**: An open-source systems monitoring and alerting toolkit that collects and stores metrics as time series data
- **Grafana**: A multi-platform open-source analytics and interactive visualization web application that provides charts, graphs, and alerts when connected to supported data sources

Together, they form a powerful monitoring stack that provides insights into your cluster's health and performance.

## Prerequisites

Before we begin, ensure you have:

- A running Kubernetes cluster (this guide uses K3s)
- `kubectl` configured to communicate with your cluster
- Helm 3 installed

## Installation using Helm

The easiest way to install Prometheus and Grafana is using the kube-prometheus-stack Helm chart, which includes:

- Prometheus Operator
- Prometheus instance
- Alertmanager
- Grafana
- Node Exporter
- Kube State Metrics

Let's create a namespace and install the stack:

```bash
# Create a dedicated namespace
kubectl create namespace monitoring

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword=your-strong-password
```

Replace `your-strong-password` with a secure password for the Grafana admin user.
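Before moving on, it can help to confirm that the release deployed and that its pods are ready. The following is a minimal sketch, assuming the release name `prometheus` and the namespace `monitoring` from the commands above; the function name `check_stack` and the 300-second timeout are illustrative choices, not part of the chart:

```bash
# Hypothetical helper: confirm the Helm release deployed and wait for its pods.
# Assumes the release name "prometheus" and namespace "monitoring" used above.
check_stack() {
  local release="${1:-prometheus}" namespace="${2:-monitoring}"
  # Show the Helm release status (deployed, failed, pending-install, ...)
  helm status "$release" --namespace "$namespace" || return 1
  # Block until every pod in the namespace reports Ready, or time out
  kubectl wait --for=condition=Ready pods --all \
    --namespace "$namespace" --timeout=300s
}
```

Run it as `check_stack`, or pass a different release and namespace as arguments. If completed Job pods remain in the namespace they never become Ready, so the wait may time out even though the stack is healthy.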

## Accessing the Dashboards

By default, the services are not exposed outside the cluster. To access them, you can use port-forwarding:

### Grafana

```bash
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```

Then access Grafana at http://localhost:3000 with username `admin` and the password you specified during installation.
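If you need to recover the admin password later, the chart keeps it in a Kubernetes Secret. A minimal sketch, assuming the Secret is named after the release with a `-grafana` suffix (`prometheus-grafana` here) and stores the password under the `admin-password` key; confirm the name with `kubectl get secrets -n monitoring`:

```bash
# Hypothetical helper: print the Grafana admin password stored by the chart.
# Assumes a Secret named "RELEASE-grafana" with an "admin-password" key.
grafana_admin_password() {
  local release="${1:-prometheus}" namespace="${2:-monitoring}"
  # Secret values are base64-encoded, so decode before printing
  kubectl get secret "${release}-grafana" --namespace "$namespace" \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}
```

Calling `grafana_admin_password` prints the decoded password to the terminal, so avoid running it in shared sessions or logged shells.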

### Prometheus

```bash
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
```

Access the Prometheus UI at http://localhost:9090.
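The same port-forward also exposes the Prometheus HTTP API, which is handy for quick checks from a terminal. A minimal sketch, assuming the port-forward above is running on `localhost:9090`; the function name `prom_query` is illustrative:

```bash
# Hypothetical helper: run an instant PromQL query through the HTTP API.
# Assumes the port-forward above is active on localhost:9090.
prom_query() {
  local query="$1" base_url="${2:-http://localhost:9090}"
  # -G sends the URL-encoded query as a GET parameter; -f fails on HTTP errors
  curl -sfG "${base_url}/api/v1/query" --data-urlencode "query=${query}"
}
```

For example, `prom_query 'up'` returns a JSON document listing every scrape target, with a value of `1` for targets that were scraped successfully and `0` for those that were not.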

## Setting Up Ingress (Optional)

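For production environments, you'll want to expose Grafana through an Ingress instead of port-forwarding. The example below assumes the NGINX Ingress Controller is installed in the cluster; K3s ships with Traefik by default, so adjust the ingress class and annotations if you use that instead.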

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  rules:
  - host: grafana.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80
  tls:
  - hosts:
    - grafana.example.com
    secretName: grafana-tls
```

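Save the manifest as `ingress.yaml`, replace `grafana.example.com` with your domain, and apply it with `kubectl apply -f ingress.yaml`. The `grafana-tls` Secret referenced under `tls` must already exist in the `monitoring` namespace and hold a certificate for that domain.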
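The Ingress refers to a TLS Secret called `grafana-tls`. One way to create it from an existing certificate and key is sketched below; the function name and the file paths you pass in are placeholders, and the sketch assumes you already hold a certificate for your domain:

```bash
# Hypothetical helper: create the TLS Secret referenced by the Ingress.
# Assumes you already have a certificate and private key for your domain;
# the file paths passed in are placeholders.
create_grafana_tls_secret() {
  local cert_file="$1" key_file="$2" namespace="${3:-monitoring}"
  kubectl create secret tls grafana-tls \
    --cert="$cert_file" --key="$key_file" --namespace "$namespace"
}
```

A call such as `create_grafana_tls_secret ./tls.crt ./tls.key` creates the Secret in the `monitoring` namespace. If a tool such as cert-manager issues certificates in your cluster, it would normally manage this Secret for you instead.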

## Important Dashboards for Kubernetes

The kube-prometheus-stack chart provisions a set of Kubernetes dashboards in Grafana out of the box. The community dashboards below are also worth importing:

1. **Kubernetes Cluster Overview** (ID: 10856)
2. **Node Exporter Full** (ID: 1860)
3. **Kubernetes Resource Requests** (ID: 13770)

To import a dashboard:

1. Open the Grafana UI
2. Click the "+" icon in the sidebar (in recent Grafana versions, open "Dashboards" and click "New")
3. Select "Import"
4. Enter the dashboard ID
5. Click "Load"
6. Select the Prometheus data source
7. Click "Import"
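As an alternative to the UI steps above, the Grafana deployment in this chart runs a sidecar that loads dashboards from labeled ConfigMaps. The sketch below assumes the chart's default sidecar label (`grafana_dashboard` with value `1`) and a dashboard JSON file you have already downloaded; the function name, ConfigMap name, and file path are placeholders:

```bash
# Hypothetical helper: publish a dashboard JSON file as a ConfigMap that the
# Grafana sidecar can pick up. Assumes the chart's default sidecar label
# (grafana_dashboard=1); check your Helm values if you changed it.
load_dashboard_configmap() {
  local name="$1" json_file="$2" namespace="${3:-monitoring}"
  kubectl create configmap "$name" --from-file="$json_file" \
    --namespace "$namespace" || return 1
  # The sidecar watches for ConfigMaps carrying this label
  kubectl label configmap "$name" grafana_dashboard=1 --namespace "$namespace"
}
```

For example, `load_dashboard_configmap node-exporter-full ./node-exporter-full.json`. Dashboards exported for UI import may contain data source placeholders such as `${DS_PROMETHEUS}`, which the import dialog normally resolves and which may need replacing by hand with this approach.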

## Setting Up Alerts

Let's set up a basic alert for node CPU usage:

1. In Grafana, go to Alerting > Alert Rules
2. Click "New Alert Rule"
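3. Configure the query: `instance:node_cpu_utilisation:rate5m` (leave the threshold out of the query so the condition in the next step evaluates the raw value)
4. Set the condition to: `IS ABOVE 0.8` (80% CPU utilization)
5. Set evaluation interval: `1m`
6. Set "For": `5m` (the alert fires only after the condition has held for 5 minutes)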
7. Add labels and annotations as needed
8. Save the rule
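The same threshold can also be expressed as a Prometheus-side alert through the Prometheus Operator's `PrometheusRule` resource, which routes through Alertmanager rather than Grafana. This is a different mechanism from the Grafana rule described above, shown only as a sketch: it assumes the chart's default rule selector (which matches the label `release` set to the release name), and the rule name, alert name, and severity label are placeholders:

```bash
# Hypothetical helper: print a PrometheusRule equivalent to the Grafana alert.
# Assumes the chart's default rule selector, which matches the label
# "release" set to the Helm release name; rule and alert names are placeholders.
node_cpu_rule_manifest() {
  local release="${1:-prometheus}" namespace="${2:-monitoring}"
  cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-cpu-high
  namespace: ${namespace}
  labels:
    release: ${release}
spec:
  groups:
  - name: node-cpu
    rules:
    - alert: NodeCPUHigh
      expr: instance:node_cpu_utilisation:rate5m > 0.8
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "CPU utilisation above 80% on {{ \$labels.instance }}"
EOF
}
```

Review the output of `node_cpu_rule_manifest` first, then pipe it to `kubectl apply -f -` to create the rule. Here the threshold belongs in the expression itself, because Prometheus fires an alert for every series the expression returns.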

## Best Practices

1. **Resource Limits**: Set appropriate resource requests and limits for Prometheus and Grafana
2. **Retention Period**: Configure the retention period based on your storage capacity
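3. **Persistent Storage**: Use persistent volumes for Prometheus data; without a storage spec the data lives in an `emptyDir` volume and is lost when the pod is rescheduled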
4. **Federation**: For large clusters, consider Prometheus federation
5. **Custom Metrics**: Set up custom metrics for your applications using client libraries
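To size retention against storage, the Prometheus documentation gives a rough formula: disk space = retention time in seconds × ingested samples per second × bytes per sample, with samples averaging about 1 to 2 bytes each. A small sketch of that arithmetic follows; the function name is illustrative, and the default of 2 bytes per sample is a deliberately conservative assumption:

```bash
# Rough disk estimate for Prometheus, rounded up to whole GiB:
#   bytes = retention_seconds * samples_per_second * bytes_per_sample
# bytes_per_sample defaults to 2, a conservative assumption.
estimate_prometheus_storage_gib() {
  local retention_days="$1" samples_per_second="$2" bytes_per_sample="${3:-2}"
  local gib=$((1024 * 1024 * 1024))
  local bytes=$((retention_days * 86400 * samples_per_second * bytes_per_sample))
  # Integer ceiling division
  echo $(( (bytes + gib - 1) / gib ))
}

estimate_prometheus_storage_gib 15 10000   # 15 days at 10,000 samples/s, prints 25
```

The ingestion rate of a running server can be read with the query `rate(prometheus_tsdb_head_samples_appended_total[5m])`. Treat the result as a lower bound and leave headroom for the write-ahead log and compaction.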

## Advanced Configuration

For a production environment, you'll want to customize the Helm values. Create a `values.yaml` file:

```yaml
prometheus:
  prometheusSpec:
    retention: 15d
    resources:
      requests:
        memory: 2Gi
        cpu: 500m
      limits:
        memory: 4Gi
        cpu: 1000m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard  # use a class listed by `kubectl get storageclass` (K3s default: local-path)
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

grafana:
  persistence:
    enabled: true
    size: 10Gi
  resources:
    requests:
      memory: 256Mi
      cpu: 100m
    limits:
      memory: 512Mi
      cpu: 200m
```

Then update your Helm release:

```bash
helm upgrade prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f values.yaml
```

## Troubleshooting

### Common Issues

1. **Insufficient Resources**: If pods crash or are OOM-killed, compare their resource requests and limits with actual usage
2. **Connectivity Issues**: Ensure services can reach each other; check network policies and service names if Grafana cannot query Prometheus
3. **Data Retention**: If Prometheus loses data after a restart, check that persistent storage is configured and that its volume claim is bound
4. **Target Scraping**: If metrics aren't appearing, check the status of the Prometheus targets

### Useful Commands

```bash
# Check pod status
kubectl get pods -n monitoring

# Check Prometheus targets
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
# Then visit http://localhost:9090/targets

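# View Prometheus server logs (pods are selected by the label the operator sets)
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus -c prometheus

# View Prometheus Operator logs (name assumes the release is called "prometheus";
# confirm with `kubectl get deploy -n monitoring`)
kubectl logs -n monitoring deploy/prometheus-kube-prometheus-operator

# View Grafana logs (the pod also runs sidecar containers, so name the container)
kubectl logs -n monitoring deploy/prometheus-grafana -c grafana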
```
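For the resource and scheduling problems listed under Common Issues, recent events and live usage figures are often the quickest pointers. A minimal sketch, assuming the `monitoring` namespace; the function name is illustrative, and `kubectl top` relies on metrics-server, which K3s ships by default:

```bash
# Hypothetical helper: gather quick diagnostics for the monitoring namespace.
# "kubectl top" needs metrics-server, which K3s ships by default.
monitoring_diagnostics() {
  local namespace="${1:-monitoring}"
  # Recent events, oldest first (scheduling failures, failed mounts, probe errors)
  kubectl get events --namespace "$namespace" --sort-by=.lastTimestamp
  # Current CPU and memory use per pod, to compare against requests and limits
  kubectl top pods --namespace "$namespace"
}
```

Run `monitoring_diagnostics` and read the last few events first, since the list is sorted from oldest to newest.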

## Conclusion

You now have a robust monitoring solution for your Kubernetes cluster. With Prometheus collecting metrics and Grafana visualizing them, you'll have deep insights into your cluster's performance and health.

In future articles, we'll explore more advanced topics like custom exporters, alert integrations, and high availability setups for your monitoring stack.