dot-com: too many alerts when sourcegraph.com goes down

Created by: bobheadxi

Earlier today, Sourcegraph.com went down (thread, bug, fix). Within a span of 30 minutes (3pm to 3:30pm GMT+8), I ack'd about 25 alerts in OpsGenie:

  • 11 site24x7 alerts
  • 8 blackbox alerts
  • 6 built-in alerts

That is a lot of alerts for a single outage. I think we should either work out some kind of dedup strategy or remove some of the alerts. Reducing site24x7 alerts or fine-tuning blackbox alerts (https://github.com/sourcegraph/sourcegraph/issues/13627) would probably help a lot right off the bat.
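
To make the dedup idea concrete, here is a rough sketch of one possible strategy: collapse alerts that share a target and fire within a fixed window of the first one into a single page. Everything in it is an assumption for illustration, not how OpsGenie or our alerting is actually configured: the `Alert` shape, the `target` grouping key, the 30-minute window, and the synthetic timestamps in `simulate_outage` (only the per-source counts come from the numbers above).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    source: str         # e.g. "site24x7", "blackbox", "built-in"
    target: str         # hypothetical grouping key, e.g. "sourcegraph.com"
    fired_at: datetime


def dedup(alerts, window=timedelta(minutes=30)):
    """Group alerts per target; a group accepts alerts that fire
    within `window` of the first alert in that group."""
    groups = []       # each inner list would become a single page
    open_group = {}   # target -> most recent group for that target
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        group = open_group.get(alert.target)
        if group is not None and alert.fired_at - group[0].fired_at <= window:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            open_group[alert.target] = group
    return groups


def simulate_outage():
    """Synthetic data: 25 alerts, one per minute, using the per-source
    counts from this issue. The date and spacing are placeholders."""
    counts = {"site24x7": 11, "blackbox": 8, "built-in": 6}
    start = datetime(2020, 1, 1, 15, 0)  # arbitrary placeholder date
    sources = [name for name, n in counts.items() for _ in range(n)]
    return [
        Alert(source=name, target="sourcegraph.com",
              fired_at=start + timedelta(minutes=i))
        for i, name in enumerate(sources)
    ]


if __name__ == "__main__":
    pages = dedup(simulate_outage())
    print(f"{len(pages)} page(s) for {sum(len(p) for p in pages)} alerts")
```

Keying on the target rather than the alert source is the design choice that matters here: all three sources were reporting the same outage, so grouping per source would still produce several pages for one incident.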