
monitoring: migrate out-of-band alerts to the generator

Created by: bobheadxi

Migrate all out-of-band alert rules defined by deploy-sourcegraph to the generator, such that each alert now has a panel as part of a service dashboard like all our other alerts, with a few changes:

  • alerts that previously spanned multiple services are now defined per-service, as per the monitoring pillars (enforced by the generator), since a multi-service alert would require a multi-service dashboard
  • k8s monitoring now lives in a new group (currently only pod availability)

And a few additions:

  • go_gc_duration to go along with go_goroutines, plus a new "golang runtime monitoring" group for relevant services (a sketch follows these lists)
  • an entire prometheus + alertmanager dashboard (to go with prometheus_metrics_bloat)

And a few exceptions:

  • gitserver alerts were not migrated (seems captured by existing gitserver observables)
  • k8s node disk space remaining alerts were not migrated (seems nonfunctional - see thread)
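
As an illustration of what a migrated alert looks like once it lives in the generator, here is a minimal sketch of the new "golang runtime monitoring" group for a single service. The Group/Observable/Alert types below are illustrative stand-ins rather than the generator's real definitions (which live under monitoring/ and differ in detail), and the gitserver job matcher and thresholds are made up for the example:

```go
package main

import "fmt"

// Illustrative stand-ins for the generator's types; the real definitions
// live under monitoring/ and differ in detail.
type Alert struct{ GreaterOrEqual float64 }

type Observable struct {
	Name        string
	Description string
	Query       string // PromQL; drives both the dashboard panel and the alert rule
	Warning     Alert
}

type Group struct {
	Title       string
	Observables []Observable
}

func main() {
	// Sketch of the new "golang runtime monitoring" group for one service
	// (gitserver is just an example service; thresholds are made up).
	golang := Group{
		Title: "Golang runtime monitoring",
		Observables: []Observable{
			{
				Name:        "go_goroutines",
				Description: "maximum active goroutines",
				Query:       `max by(instance) (go_goroutines{job=~".*gitserver"})`,
				Warning:     Alert{GreaterOrEqual: 10000},
			},
			{
				Name:        "go_gc_duration_seconds",
				Description: "maximum go garbage collection duration",
				Query:       `max by(instance) (go_gc_duration_seconds{job=~".*gitserver"})`,
				Warning:     Alert{GreaterOrEqual: 2},
			},
		},
	}
	fmt.Printf("group %q defines %d observables\n", golang.Title, len(golang.Observables))
}
```

Each observable drives both a panel on the service's dashboard and a generated alert rule, which is what keeps every migrated alert visible to site admins.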

Closes https://github.com/sourcegraph/sourcegraph/issues/12117

Try it out

  1. kubectl port-forward svc/prometheus 9090:30090 -n prod
  2. Check out monitoring/migrate-out-of-band-alerts
  3. go generate ./monitoring
  4. ./dev/grafana.sh
  5. http://localhost:3370

TODOs

  • A few alerts use a Prometheus `for` clause, so I'm going to complete #12336 first and then add the appropriate `for` parameters here (see the sketch after this list)
  • Write up possible solutions for alerts that might have some
  • Improve panel options where possible
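
For reference, a Prometheus `for` clause makes an alert fire only once its condition has held continuously for the given duration. Below is a purely hypothetical sketch of how such a parameter might attach to an alert definition after #12336; the `For` field is invented for illustration and says nothing about the actual API:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical: a threshold paired with a Prometheus-style `for` duration,
// meaning the condition must hold this long before the alert fires.
// The real API shape is up to #12336.
type Alert struct {
	GreaterOrEqual float64
	For            time.Duration
}

func main() {
	a := Alert{GreaterOrEqual: 10000, For: 10 * time.Minute}
	fmt.Printf("fire when the value stays >= %.0f for %s\n", a.GreaterOrEqual, a.For)
}
```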

Merge request reports

Approval is optional

Merged (Aug 27, 2025 7:49am UTC)

Merge details

  • Changes merged into master with 54800c2a.
  • Deleted the source branch.

Activity

  • Created by: pecigonzalo

    previously multi-service alerts are now defined per-service as per monitoring pillars

    Is this the 2nd pillar? I don't think it applies here; I believe this was also closely related to defining alerts in Grafana and the mapping between alerts and dashboards (cc/ @slimsag). Unless we need different thresholds for each, I don't think we need to define per-service alerts/recording rules. wdyt?

    Related: https://www.robustperception.io/undoing-the-benefits-of-labels

  • Created by: bobheadxi

    @pecigonzalo to have alerts span multiple services, we'd need a multi-service dashboard. We could forgo the dashboard entirely and make the generator capable of creating alerts without dashboards at all, but I'm not sure how I feel about that - it is not explicitly stated in the pillars, but I like the idea that each alert is associated with a specific panel for a specific service so that it is useful to site admins.

    The monitoring dashboards here are also similar to our container monitoring/provisioning monitoring dashboards, where each service gets its own dashboard

  • Created by: pecigonzalo

    to have alerts span multiple services, we'd need a multi-service dashboard.

    Is this a rule set by our current generator? As I interpreted the pillar, it defines that a graph should have an associated alert, but not that an alert necessarily needs a graph.

    If we want to create a dashboard per service that shows service alerts, we should not need to split alerts into multiple alert rules; I believe we could use one of the labels to filter on each graph.
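
    To make the label-based alternative concrete, here is a minimal editorial sketch (the queries are illustrative, not the actual rules) contrasting per-service rules with a single rule that keeps the job label so one graph could be filtered per service:

```go
package main

import "fmt"

func main() {
	// (a) Per-service rules, as the generator currently requires:
	// one query (and one dashboard panel) per service.
	perService := map[string]string{
		"frontend":  `sum(go_goroutines{job="frontend"})`,
		"gitserver": `sum(go_goroutines{job="gitserver"})`,
	}

	// (b) A single rule that keeps the job label, so one graph could be
	// filtered per service instead of duplicating the rule.
	singleRule := `sum by(job) (go_goroutines)`

	for svc, query := range perService {
		fmt.Printf("per-service rule (%s): %s\n", svc, query)
	}
	fmt.Println("single labelled rule:", singleRule)
}
```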

  • Created by: bobheadxi

    As I interpreted the pillar, it defines that a graph should have an associated alert, but not that an alert necessarily needs a graph.

    I think requiring a dashboard per alert makes sense as well, otherwise what we are alerting on is essentially invisible. For example, our disk space alerts have been nonfunctional without an obvious way to realize that (you can only tell what we are alerting on by going into config/prometheus itself and then diving into the specific metric).

    Would like to hear what @slimsag thinks about this and defer to his judgement :)

    If we want to create a dashboard per service that shows service alerts, we should not need to split alerts into multiple alert rules; we should be able to use one of the labels to filter on each graph.

    This is definitely an improvement that could be made! However, I think this is an implementation detail that is not within the scope of this PR to fix (since such an improvement would apply to many alerts we have) - I'm focusing on simply moving the alerts here for now

  • Created by: pecigonzalo

    I think requiring a dashboard per alert makes sense as well

    I agree that we should link to a relevant dashboard, graph, and/or document; my comment is about what the monitoring pillars require as the reason why we are splitting into an alert rule per service.

    This is definitely an improvement that could be made! However, I think this is an implementation detail that is not within the scope of this PR to fix (since such an improvement would apply to many alerts we have) - I'm focusing on simply moving the alerts here for now

    I understood from "previously multi-service alerts are now defined per-service as per monitoring pillars, since a multi-service alert would require multi-service dashboards" that we were implementing that split into multiple per-service alerts here.

  • Created by: bobheadxi

    Clarified in meeting that this is because of requirements imposed by the generator

  • Created by: slimsag

    Sounds like the confusion here was cleared up? If not let me know.

    You can find why I believe we should only allow 1 dashboard per service here: https://about.sourcegraph.com/handbook/engineering/distribution/observability/monitoring_pillars#faq-why-can-t-i-create-an-ad-hoc-dashboard-e-g-an-http-dashboard

  • Created by: bobheadxi

    Checked each dashboard using the steps described in the PR

  • Created by: codecov[bot]

    Codecov Report

    Merging #12391 into master will decrease coverage by 0.03%. The diff coverage is 71.15%.

    @@            Coverage Diff             @@
    ##           master   #12391      +/-   ##
    ==========================================
    - Coverage   50.61%   50.58%   -0.04%     
    ==========================================
      Files        1423     1417       -6     
      Lines       80436    79283    -1153     
      Branches     6838     6552     -286     
    ==========================================
    - Hits        40711    40103     -608     
    + Misses      36169    35728     -441     
    + Partials     3556     3452     -104     
    Flag           Coverage Δ
    #go            51.97% <71.27%> (-0.39%) :arrow_down:
    #integration   24.25% <0.00%> (+0.10%) :arrow_up:
    #storybook     12.63% <0.00%> (?)
    #typescript    46.85% <0.00%> (+0.86%) :arrow_up:
    #unit          47.20% <71.15%> (-0.39%) :arrow_down:

    Flags with carried forward coverage won't be shown.

    Impacted Files                                       Coverage Δ
    browser/src/shared/code-hosts/github/codeHost.ts     60.40% <0.00%> (-0.41%) :arrow_down:
    cmd/frontend/backend/site_admin.go                     0.00% <0.00%> (ø)
    cmd/frontend/graphqlbackend/campaigns.go               0.00% <0.00%> (ø)
    cmd/frontend/graphqlbackend/json.go                   47.36% <0.00%> (-5.58%) :arrow_down:
    cmd/frontend/graphqlbackend/saved_searches.go         33.73% <0.00%> (ø)
    cmd/frontend/graphqlbackend/search.go                 66.27% <0.00%> (-0.68%) :arrow_down:
    cmd/frontend/graphqlbackend/search_alert.go           28.36% <0.00%> (-0.79%) :arrow_down:
    cmd/frontend/graphqlbackend/search_results.go         43.55% <0.00%> (+0.16%) :arrow_up:
    cmd/frontend/internal/app/debugproxies/scanner.go     38.20% <0.00%> (-1.34%) :arrow_down:
    cmd/query-runner/email.go                              0.00% <0.00%> (ø)
    ... and 206 more