monitoring: migrate out-of-band alerts to the generator
Created by: bobheadxi
Migrate all out-of-band alert rules defined by deploy-sourcegraph to the generator, so that each alert now has a panel on a service dashboard like all our other alerts, with a few changes:
- alerts that previously spanned multiple services are now defined per-service, as per the monitoring pillars (enforced by the generator), since a multi-service alert would require multi-service dashboards
- k8s monitoring now lives in a new group (currently only pod availability)
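To illustrate the per-service split described above, here is a minimal sketch, not the generator's actual API. The `Observable` type, the `podsAvailable` and `perService` helpers, the observable name, and the PromQL query are all hypothetical stand-ins; the idea is only that one logical alert is expanded into one definition per service so each can own a panel on that service's dashboard.

```go
package main

import "fmt"

// Observable is a hypothetical, simplified stand-in for a generator alert
// definition. The generator's real type is not shown in this PR description.
type Observable struct {
	Name        string
	Description string
	Query       string // PromQL backing both the dashboard panel and the alert
}

// podsAvailable returns an observable scoped to exactly one service.
// The name and query are illustrative only, not the rule that was migrated.
func podsAvailable(service string) Observable {
	return Observable{
		Name:        "pods_available_percentage",
		Description: "percentage of pods available",
		Query: fmt.Sprintf(
			`sum by(app) (up{app=~".*%[1]s"}) / count by(app) (up{app=~".*%[1]s"}) * 100`,
			service,
		),
	}
}

// perService expands one logical alert into one observable per service,
// keyed by service name, so each can sit on its own service dashboard.
func perService(services []string, build func(string) Observable) map[string]Observable {
	out := make(map[string]Observable, len(services))
	for _, s := range services {
		out[s] = build(s)
	}
	return out
}

func main() {
	for svc, o := range perService([]string{"frontend", "searcher"}, podsAvailable) {
		fmt.Printf("%s: %s\n  %s\n", svc, o.Name, o.Query)
	}
}
```

Passing the builder as a function keeps the per-service expansion generic, so the same helper could be reused for any alert that used to cover several services at once.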
And a few additions:
- go_gc_duration to go along with go_goroutines, plus a new "golang runtime monitoring" group for relevant services
- an entire prometheus + alertmanager dashboard (to go with prometheus_metrics_bloat)
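As a rough sketch of how a shared "golang runtime monitoring" group might be reused across services, the snippet below builds the same two observables for any job name. The `Group` and `Observable` types, the `golangRuntimeGroup` helper, and the exact queries are assumptions made for illustration; `go_goroutines` and `go_gc_duration_seconds` are the standard metrics exported by the Prometheus Go client, but the aggregation used in the real dashboards may differ.

```go
package main

import "fmt"

// Observable and Group are hypothetical, simplified stand-ins for the
// generator's types; they exist only to make this sketch self-contained.
type Observable struct {
	Name        string
	Description string
	Query       string
}

type Group struct {
	Title       string
	Observables []Observable
}

// golangRuntimeGroup returns the shared runtime group for a single job.
// The queries are illustrative assumptions, not the migrated definitions.
func golangRuntimeGroup(job string) Group {
	return Group{
		Title: "Golang runtime monitoring",
		Observables: []Observable{
			{
				Name:        "go_goroutines",
				Description: "maximum active goroutines",
				Query:       fmt.Sprintf(`max by(instance) (go_goroutines{job=~".*%s"})`, job),
			},
			{
				Name:        "go_gc_duration",
				Description: "maximum go garbage collection duration",
				// go_gc_duration_seconds is the metric name exposed by the Go client.
				Query: fmt.Sprintf(`max by(instance) (go_gc_duration_seconds{job=~".*%s"})`, job),
			},
		},
	}
}

func main() {
	for _, job := range []string{"frontend", "repo-updater"} {
		g := golangRuntimeGroup(job)
		fmt.Println(job, "-", g.Title)
		for _, o := range g.Observables {
			fmt.Printf("  %s: %s\n", o.Name, o.Query)
		}
	}
}
```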
And a few exceptions:
- gitserver alerts were not migrated (they seem to be captured by existing gitserver observables)
- k8s node disk space remaining alerts were not migrated (they seem nonfunctional; see thread)
Closes https://github.com/sourcegraph/sourcegraph/issues/12117
Try it out
- kubectl port-forward svc/prometheus 9090:30090 -n prod
- Check out monitoring/migrate-out-of-band-alerts
- go generate ./monitoring
- ./dev/grafana.sh
- http://localhost:3370
TODOs
- A few alerts use `for`, so I'm going to go complete #12336 (closed) first and add the appropriate `for` parameters here
- Write up possible solutions for alerts that might have some
- Improve panel options where possible