monitoring: provisioning alerts not consistently applied
Created by: bobheadxi
- Sourcegraph version: dot-com, k8s
- Platform information:
Steps to reproduce:
- go to https://sourcegraph.com/-/debug/grafana/d/syntect-server/syntect-server?panelId=2&fullscreen&orgId=1, see provisioning alert there (currently named `80%+ or less than 30% container cpu usage total (1d average) across all cores by instance`)
- go to https://sourcegraph.com/-/debug/grafana/d/frontend/frontend?panelId=2&fullscreen&orgId=1, no such alert is defined for `frontend`
Expected behavior:
https://github.com/sourcegraph/sourcegraph/pull/11082 adds provisioning warnings like the one described in step 1 above, which every service with a provisioning dashboard should have. Each service should get 6 alerts: for each of memory and cpu, 1x 5m alert and 2x 1d alerts.
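
As a rough sanity check on the expected numbers, the arithmetic can be sketched like this. The function and constant names are hypothetical and only encode the layout described above (two resources, one 5m alert and two 1d alerts each); they are not taken from the monitoring generator.

```python
# Alert layout as described in this issue (assumption: no other provisioning alerts exist).
RESOURCES = ("memory", "cpu")
ALERTS_PER_RESOURCE = {"5m": 1, "1d": 2}


def alerts_per_service() -> int:
    """Number of provisioning alerts one service should define."""
    return len(RESOURCES) * sum(ALERTS_PER_RESOURCE.values())


def services_covered(defined_alerts: int) -> float:
    """How many services' worth of provisioning alerts a given count represents."""
    return defined_alerts / alerts_per_service()


print(alerts_per_service())    # 6
print(services_covered(84))    # 14.0
```

A count that is not a multiple of 6 would suggest that at least one service has only some of its provisioning alerts defined.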
Actual behavior:
- in dot-com, it seems quite a few containers don't have this alert
  - ~20 `provisioning.*` alerts defined: https://sourcegraph.com/-/debug/grafana/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Prometheus%22,%7B%22expr%22:%22count(alert_count%7Bname%3D~%5C%22provisioning.*%5C%22%7D)%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D (this seems to fluctuate)
- in k8s, it seems more do, but a few still don't
- sourcegraph.sgdev.org (a `sourcegraph/server` deployment from what I understand) has more provisioning alerts defined than both dot-com and k8s
  - 84 `provisioning.*` alerts defined: https://sourcegraph.sgdev.org/-/debug/grafana/explore?orgId=1&left=%5B%22now-1h%22,%22now%22,%22Prometheus%22,%7B%22expr%22:%22count(alert_count%7Bname%3D~%5C%22provisioning.*%5C%22%7D)%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D (this seems like it should be the right number, at 6 alerts per service)
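
For anyone who wants to rerun the count without clicking through, the query behind the Explore links above can be recovered from the URL itself. This is a minimal sketch that assumes the `left` parameter is a URL-encoded JSON array, as it appears in the links in this issue; `explore_expr` is a hypothetical helper name.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The dot-com Explore link from this issue, split across lines for readability.
EXPLORE_URL = (
    "https://sourcegraph.com/-/debug/grafana/explore?orgId=1&left="
    "%5B%22now-1h%22,%22now%22,%22Prometheus%22,"
    "%7B%22expr%22:%22count(alert_count%7Bname%3D~%5C%22provisioning.*%5C%22%7D)%22%7D,"
    "%7B%22mode%22:%22Metrics%22%7D,"
    "%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D"
)


def explore_expr(url: str) -> str:
    """Extract the query expression from a Grafana Explore URL's `left` parameter."""
    params = parse_qs(urlsplit(url).query)
    left = json.loads(params["left"][0])  # percent-decoding is done by parse_qs
    for item in left:
        if isinstance(item, dict) and "expr" in item:
            return item["expr"]
    raise ValueError("no 'expr' entry found in the 'left' parameter")


print(explore_expr(EXPLORE_URL))
# count(alert_count{name=~"provisioning.*"})
```

The same expression is used in both links, so the ~20 vs 84 difference comes from the deployments rather than from the query.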
cc @slimsag