monitoring: provisioning alerts fixes

Administrator requested to merge monitoring/provisioning-alerts-fixes into master

Created by: bobheadxi

follow-up to https://github.com/sourcegraph/sourcegraph/pull/11082

  • update solutions to account for the dual bounds
  • improve the Prometheus alert description by rendering it for one bound specifically (instead of both), so that dual-bound alerts produce the following output:
  - record: alert_count
    labels:
      description: 'frontend: 80%+ container memory usage (1d average) by instance (not available on server)'
      level: warning
      name: provisioning_container_memory_usage_1d_high
      service_name: frontend
    expr: |-
      clamp_max(clamp_min(floor(
      ((((avg_over_time(cadvisor_container_memory_usage_percentage_total{name=~".*frontend.*",name!~".*(_POD_|_jaeger-agent_).*"}[1d])) / 80) OR on() vector(0)) >= 0) OR on() vector(1)
      ), 0), 1) OR on() vector(1)
  - record: alert_count
    labels:
      description: 'frontend: less than 30% container memory usage (1d average) by instance (not available on server)'
      level: warning
      name: provisioning_container_memory_usage_1d_low
      service_name: frontend
    expr: |-
      clamp_max(clamp_min(floor(
      (((30 / clamp_min(avg_over_time(cadvisor_container_memory_usage_percentage_total{name=~".*frontend.*",name!~".*(_POD_|_jaeger-agent_).*"}[1d]), 0.0000001)) OR on() vector(0)) >= 0) OR on() vector(1)
      ), 0), 1) OR on() vector(1)
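
To sanity-check the two expressions above, here is a minimal Python sketch of the per-sample arithmetic in each rule. The function names are hypothetical and are not part of the monitoring generator. The sketch models only the `clamp_max(clamp_min(floor(...), 0), 1)` part and assumes a usage sample is present, so the `OR on() vector(...)` fallbacks for missing series are not modeled.

```python
import math


def clamp(value, lower, upper):
    """Restrict value to the closed range [lower, upper]."""
    return max(lower, min(upper, value))


def high_bound_alert(usage_pct, threshold=80.0):
    """Hypothetical model of the `_high` rule: floor(usage / threshold), clamped to 0..1."""
    return clamp(math.floor(usage_pct / threshold), 0, 1)


def low_bound_alert(usage_pct, threshold=30.0, epsilon=0.0000001):
    """Hypothetical model of the `_low` rule: floor(threshold / usage), clamped to 0..1.

    Usage is first raised to at least epsilon, mirroring the inner clamp_min
    that guards against division by zero.
    """
    return clamp(math.floor(threshold / max(usage_pct, epsilon)), 0, 1)


if __name__ == "__main__":
    for usage in (0.0, 25.0, 50.0, 85.0):
        print(usage, high_bound_alert(usage), low_bound_alert(usage))
```

In this model each rule yields 1 only when its own bound is crossed: 85% usage sets only the high rule, 25% sets only the low rule, and 50% sets neither, which is what lets each `alert_count` record carry a description for a single bound.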
