monitoring: migrate out-of-band alerts to the generator

Administrator requested to merge monitoring/migrate-out-of-band-alerts into master

Created by: bobheadxi

Migrate all out-of-band alert rules defined by deploy-sourcegraph to the generator, so that each alert now has a panel on a service dashboard, like all our other alerts. This comes with a few changes:

  • alerts that previously covered multiple services are now defined per service, as the monitoring pillars require (and the generator enforces), since a multi-service alert would require multi-service dashboards
  • k8s monitoring now lives in a new group (currently only pod availability)
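As a rough illustration of the per-service split described above, here is a minimal, self-contained sketch. It does not use the generator's actual API: the Observable type, the perService helper, the job label matcher, and the use of the up metric as a placeholder are all assumptions made for the example.

```go
package main

import "fmt"

// Observable is a hypothetical stand-in for a generator observable:
// one alert together with the dashboard panel that displays it.
type Observable struct {
	Service string // service dashboard the panel belongs to
	Name    string // alert name, unique within the service
	Query   string // Prometheus expression backing both panel and alert
}

// perService expands what used to be one multi-service alert into one
// observable per service. Scoping by a "job" label is an assumption.
func perService(name, metric string, services []string) []Observable {
	out := make([]Observable, 0, len(services))
	for _, svc := range services {
		out = append(out, Observable{
			Service: svc,
			Name:    name,
			Query:   fmt.Sprintf("%s{job=%q}", metric, svc),
		})
	}
	return out
}

func main() {
	// "up" is only a placeholder metric for this sketch.
	for _, o := range perService("pods_available", "up", []string{"frontend", "searcher"}) {
		fmt.Printf("%s/%s: %s\n", o.Service, o.Name, o.Query)
	}
}
```

Each resulting observable belongs to exactly one service, so it can be rendered on that service's dashboard without needing a multi-service dashboard.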

And a few additions:

  • go_gc_duration to go along with go_goroutines, plus a new "golang runtime monitoring" group for relevant services
  • an entire prometheus + alertmanager dashboard (to go with prometheus_metrics_bloat)
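The shared "golang runtime monitoring" group could be modeled roughly as follows. This is only a sketch under stated assumptions: the Panel and Group types and the golangRuntimeGroup helper are hypothetical, the queries are illustrative rather than the ones used in this change, and go_gc_duration_seconds is assumed to be the underlying metric exposed by the Prometheus Go client.

```go
package main

import "fmt"

// Panel and Group are hypothetical types for this sketch; they are not
// the generator's real definitions.
type Panel struct {
	Name  string
	Query string
}

type Group struct {
	Title  string
	Panels []Panel
}

// golangRuntimeGroup builds the same runtime panels for any Go service,
// scoped by an assumed "job" label.
func golangRuntimeGroup(job string) Group {
	return Group{
		Title: "Golang runtime monitoring",
		Panels: []Panel{
			{
				Name:  "go_goroutines",
				Query: fmt.Sprintf("max by (instance) (go_goroutines{job=%q})", job),
			},
			{
				Name:  "go_gc_duration_seconds",
				Query: fmt.Sprintf("max by (instance) (go_gc_duration_seconds{job=%q})", job),
			},
		},
	}
}

func main() {
	g := golangRuntimeGroup("frontend")
	fmt.Println(g.Title)
	for _, p := range g.Panels {
		fmt.Printf("  %s: %s\n", p.Name, p.Query)
	}
}
```

Defining the group once and parameterizing it by service keeps the runtime panels consistent across every dashboard that includes them.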

And a few exceptions:

  • gitserver alerts were not migrated (they seem to be captured by existing gitserver observables)
  • k8s node disk space remaining alerts were not migrated (they seem nonfunctional, see thread)

Closes https://github.com/sourcegraph/sourcegraph/issues/12117

Try it out

  1. kubectl port-forward svc/prometheus 9090:30090 -n prod
  2. Check out monitoring/migrate-out-of-band-alerts
  3. go generate ./monitoring
  4. ./dev/grafana.sh
  5. Open http://localhost:3370

TODOs

  • A few alerts use for, so I'm going to complete #12336 first and then add the appropriate for parameters here
  • Write up possible solutions for the alerts where some can be suggested
  • Improve panel options where possible
