I have several devices that output a metric. If the metric is above a threshold (different for each device), I alert. At the end of my alert_rules.yml file I have a catch-all rule that alerts for any device outputting a metric value of >1. Its purpose is to find devices that should have their own individual rules added earlier in the rules file (devices may come and go).

The problem is that all devices trigger this >1 identify rule, even the ones that have their own rules defined above it (as expected). How can I ensure that a device triggers either its own rule (with its custom limit) or the identify rule, but not both?

1 Answer

You have two general ways to do that:

  1. Exclude devices that already have their own rule from the catch-all rule.

Depending on how your system is structured, this may be easy, difficult, or sometimes almost unachievable, but without an example of your rule it's hard to give more details. In the simplest case, the expression for the last alert will look like this:

my_metric{device !~ "device1|device2|device3"} > 1
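
For context, here is a minimal sketch of how such an expression might sit in a rules file. The group name, the threshold of 5, and the device names are placeholders rather than values from the question. Note that Prometheus regex matchers are fully anchored, so `device1` in the pattern does not also exclude a device called `device10`.

```yaml
groups:
  - name: device-alerts            # hypothetical group name
    rules:
      # Per-device rule with its own limit (5 is an assumed threshold).
      - alert: device1-threshold
        expr: my_metric{device="device1"} > 5
      # Catch-all: only evaluates devices not listed in the regex,
      # so device1..device3 can never trigger it.
      - alert: catch-em-all
        expr: my_metric{device!~"device1|device2|device3"} > 1
```

The trade-off is that the exclusion list has to be updated every time a per-device rule is added or removed.
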

  2. Enable an <inhibit_rule> that mutes the catch-all alert for devices that already have their own alert firing.

As the documentation explains:

An inhibition rule mutes an alert (target) matching a set of matchers when an alert (source) exists that matches another set of matchers.

So in your case it would be something like this in the Alertmanager configuration:

inhibit_rules:
  - source_matchers: [alertname=~"device.*-threshold"]
    target_matchers: [alertname="catch-em-all"]
    equal: [device]

This mutes the alert named catch-em-all whenever an alert whose name matches the regex pattern device.*-threshold is already firing with the same value of the device label. Adjust the names to match yours before using it.
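
As a rough sketch, the alerting rules on the Prometheus side that this inhibit rule would pair with could look like the following. The group name and the threshold of 5 are assumptions for illustration. Both alerts need to carry the `device` label for `equal: [device]` to match them up, which they do here because a comparison against a scalar keeps the labels of the metric.

```yaml
groups:
  - name: device-alerts            # hypothetical group name
    rules:
      # Source alert: name matches the regex device.*-threshold.
      - alert: device1-threshold
        expr: my_metric{device="device1"} > 5   # assumed limit
      # Target alert: left unfiltered, muted per device by Alertmanager.
      - alert: catch-em-all
        expr: my_metric > 1
```

Keep in mind that inhibition only applies while the source alert is firing. With the assumed limit above, a value of 3 on device1 would not fire device1-threshold, so catch-em-all would still notify for that device.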

markalex