monitoring: provisioning alerts fixes
Created by: bobheadxi
follow-up to https://github.com/sourcegraph/sourcegraph/pull/11082
- update solutions to account for the dual bounds
- improve the Prometheus alert description by rendering it for one bound specifically (instead of both), so that dual-bound alerts produce the following output:
- record: alert_count
  labels:
    description: 'frontend: 80%+ container memory usage (1d average) by instance (not available on server)'
    level: warning
    name: provisioning_container_memory_usage_1d_high
    service_name: frontend
  expr: |-
    clamp_max(clamp_min(floor(
    ((((avg_over_time(cadvisor_container_memory_usage_percentage_total{name=~".*frontend.*",name!~".*(_POD_|_jaeger-agent_).*"}[1d])) / 80) OR on() vector(0)) >= 0) OR on() vector(1)
    ), 0), 1) OR on() vector(1)
- record: alert_count
  labels:
    description: 'frontend: less than 30% container memory usage (1d average) by instance (not available on server)'
    level: warning
    name: provisioning_container_memory_usage_1d_low
    service_name: frontend
  expr: |-
    clamp_max(clamp_min(floor(
    (((30 / clamp_min(avg_over_time(cadvisor_container_memory_usage_percentage_total{name=~".*frontend.*",name!~".*(_POD_|_jaeger-agent_).*"}[1d]), 0.0000001)) OR on() vector(0)) >= 0) OR on() vector(1)
    ), 0), 1) OR on() vector(1)
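
For reviewers who want to sanity-check the arithmetic, here is a rough Python sketch of how I read the two expressions above for a single sample. It is not part of the generator; the function names `alert_count_high` and `alert_count_low` are hypothetical, `None` stands in for "no series returned", and the handling of NaN and negative values follows my understanding of PromQL's filtering comparison (`>= 0` drops the sample, so `OR on() vector(1)` takes over).

```python
import math
from typing import Optional


def _to_alert_count(ratio: float) -> int:
    """Mimic `(... >= 0) OR on() vector(1)` followed by floor and clamping to [0, 1]."""
    # NaN and negative ratios fail the `>= 0` filter, so vector(1) is substituted.
    if not ratio >= 0:
        ratio = 1.0
    return int(min(max(math.floor(ratio), 0), 1))


def alert_count_high(usage: Optional[float], threshold: float = 80.0) -> int:
    """Upper bound: 1 once usage reaches the threshold (usage / threshold >= 1)."""
    # A missing series falls through to `OR on() vector(0)`, i.e. not firing.
    ratio = 0.0 if usage is None else usage / threshold
    return _to_alert_count(ratio)


def alert_count_low(usage: Optional[float], threshold: float = 30.0) -> int:
    """Lower bound: 1 once usage drops to the threshold (threshold / usage >= 1)."""
    if usage is None:
        return _to_alert_count(0.0)
    # clamp_min(..., 0.0000001) guards against division by zero.
    denominator = usage if math.isnan(usage) else max(usage, 0.0000001)
    return _to_alert_count(threshold / denominator)


if __name__ == "__main__":
    for sample in (None, 10.0, 50.0, 85.0):
        print(sample, alert_count_high(sample), alert_count_low(sample))
```

Under this reading, inverting the ratio for the lower bound is what lets both bounds share the same floor-and-clamp wrapper while being rendered as separate alerts with their own descriptions.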