We are going to customise Prometheus alerts by using external labels.
The Problem: One Prometheus Instance per Kubernetes Cluster
I’ve recently deployed a second Kubernetes cluster into the homelab environment and realised that, if both clusters send alerts to the same Slack channel, I can’t tell which environment an alert comes from. I therefore need a way to identify the cluster that fires an alert, ideally by passing the cluster name to Alertmanager.
The Solution: External Labels
Starting with Prometheus 2.27, it is possible to expand environment variables in external labels. If the expand-external-labels feature is enabled, Prometheus replaces ${var} or $var in the external_labels values with the values of the corresponding environment variables. According to the documentation, references to undefined variables are replaced by the empty string.
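As a minimal sketch, assuming the feature flag is enabled, both reference forms can be used in the same global block. The environment label and the ENVIRONMENT variable below are hypothetical and only illustrate the unbraced form; if ENVIRONMENT is not set in the container, that label value becomes an empty string.

```yaml
# Sketch only: requires Prometheus 2.27+ started with
# --enable-feature=expand-external-labels.
global:
  external_labels:
    # Braced form: replaced with the value of the CLUSTER_NAME environment variable.
    cluster: ${CLUSTER_NAME}
    # Unbraced form: ENVIRONMENT is a hypothetical variable used for illustration;
    # if it is undefined, the value is replaced by the empty string.
    environment: $ENVIRONMENT
```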
Pre-requisites
We are using our Kubernetes homelab to configure Prometheus and Alertmanager.
Download Files from GitHub
Prometheus and Alertmanager configuration files used in this article are hosted on GitHub. Clone the following repository:
$ git clone https://github.com/lisenet/kubernetes-homelab.git
Note that this homelab project is under development, so please refer to GitHub for the latest version of the source code.
Use External Labels with Prometheus Alerts
Create Prometheus Secret to Store Cluster Name
Create a secret called prometheus-cluster-name that contains the cluster name the Prometheus instance is running in.
$ kubectl -n monitoring create secret generic \
  prometheus-cluster-name --from-literal=CLUSTER_NAME=kubernetes-homelab
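If you would rather keep the secret as a manifest alongside the other homelab files, a roughly equivalent declarative version might look like the sketch below. It assumes the same name, namespace and key as the command above; stringData lets Kubernetes handle the base64 encoding for you.

```yaml
# Sketch of a declarative equivalent of the kubectl command above.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-cluster-name
  namespace: monitoring
type: Opaque
stringData:
  # Exposed to the Prometheus container as the CLUSTER_NAME environment variable.
  CLUSTER_NAME: kubernetes-homelab
```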
Update Prometheus Deployment Configuration
Edit prometheus-deployment.yml, enable the expand-external-labels feature and instruct Prometheus to read environment variables from the prometheus-cluster-name secret:
---
apiVersion: apps/v1
kind: Deployment
[...]
      containers:
        - name: prometheus
          image: prom/prometheus:v2.29.0
          imagePullPolicy: IfNotPresent
          args:
            - "--storage.tsdb.retention.time=28d"
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--enable-feature=expand-external-labels"
          envFrom:
            - secretRef:
                name: prometheus-cluster-name
          ports:
            - containerPort: 9090
              protocol: TCP
[...]
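envFrom imports every key in the secret as an environment variable. If you would prefer to import only the one key, a sketch using env with secretKeyRef could replace the envFrom block, assuming the same secret name and key as above:

```yaml
      containers:
        - name: prometheus
          # [...] image, args and ports as above
          # Import only the CLUSTER_NAME key instead of every key in the secret.
          env:
            - name: CLUSTER_NAME
              valueFrom:
                secretKeyRef:
                  name: prometheus-cluster-name
                  key: CLUSTER_NAME
```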
Update Prometheus ConfigMap
Edit prometheus-config-map.yml, add external labels to the global Prometheus configuration and specify an alert relabel configuration:
---
apiVersion: v1
kind: ConfigMap
[...]
  prometheus.yml: |-
    global:
      evaluation_interval: 60s
      scrape_interval: 15s
      scrape_timeout: 10s
      external_labels:
        cluster: ${CLUSTER_NAME}
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alert_relabel_configs:
        - source_labels: [cluster]
          action: replace
          regex: (.*)
          replacement: "$1"
          target_label: cluster
      alertmanagers:
        - static_configs:
            - targets:
                - 'alertmanager.monitoring.svc:9093'
[...]
Note that alert relabeling is applied to alerts before they are sent to Alertmanager. For each configured alert rule, add a blank cluster label:
data:
  prometheus.rules: |-
    groups:
      - name: node.alerts
        rules:
          - alert: KubernetesHostHighCPUUsage
            expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
            for: 15m
            labels:
              severity: warning
              context: node
              cluster:
            annotations:
              summary: High load on node
              description: "Node {{ $labels.instance }} has more than 90% CPU load"
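To illustrate the same pattern for a further rule, here is a hypothetical memory alert (not taken from the homelab repository) that also carries the blank cluster label. The metric names are standard node_exporter metrics; the alert name, the 90% threshold and the 15m duration are assumptions chosen for illustration.

```yaml
    groups:
      - name: node.alerts
        rules:
          # [...] existing rules
          # Hypothetical additional rule following the same pattern.
          - alert: KubernetesHostHighMemoryUsage
            expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
            for: 15m
            labels:
              severity: warning
              context: node
              cluster:
            annotations:
              summary: High memory usage on node
              description: "Node {{ $labels.instance }} has more than 90% memory usage"
```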
Apply Changes to Prometheus
$ kubectl apply -f ./kubernetes-homelab/prometheus/
Configure Alertmanager Slack Receiver
Edit alertmanager-config-map.yml and customise the Slack receiver to include the cluster name in notifications:
---
apiVersion: v1
kind: ConfigMap
[...]
    receivers:
      - name: 'slack_homelab'
        slack_configs:
          - api_url: https://hooks.slack.com/services/XYZXYZXYZ/ABCABCABC/1234567890
            channel: '#homelab'
            send_resolved: true
            title: "[{{ .Status | toUpper }}] {{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
            text: |-
              {{ range .Alerts }}*Description:* {{ .Annotations.description }}
              *Context:* {{ .Labels.context }}
              *Cluster:* {{ .Labels.cluster }}
              *Severity:* {{ .Labels.severity }}
              {{ end }}
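Since alerts from several clusters now arrive at the same receiver, it may also help to add the cluster label to the route's grouping, so that the same alert firing in two clusters produces two separate notifications rather than one combined message. A minimal sketch, assuming slack_homelab is the default receiver; the choice of alertname as the other grouping label is an assumption:

```yaml
    route:
      receiver: 'slack_homelab'
      # Alerts with the same alertname but a different cluster label end up in
      # separate groups, and therefore in separate Slack notifications.
      group_by: ['alertname', 'cluster']
```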
Apply changes to Alertmanager:
$ kubectl apply -f ./kubernetes-homelab/alertmanager/
When a new alert arrives, it should now contain the cluster name.
References
https://prometheus.io/docs/prometheus/2.29/feature_flags/
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#alert_relabel_configs
Just configure prometheus.yml like this:
```yaml
global:
  …
  external_labels:
    cluster: kubernetes-homelab
```
Done!
I manage more than one cluster, I’m afraid, and they aren’t all called “kubernetes-homelab”. But I see your point about hardcoding it if you only have one cluster.
If you have to add a cluster label to your Prometheus rules anyway, then templating the rules is easier than all the rest.
Whatever works best for you! There are many ways to skin a cat.