
executors: alert when no jobs processed but queue > 0

Warren Gifford requested to merge nsc/executor-progress-alert into main

Created by: Strum355

Adds an alert that fires when, over a certain time period, the executor queue size is greater than 0 and the number of completed or errored job attempts is not increasing. In that case, we likely have some issue causing executors to stall. This is complementary to https://github.com/sourcegraph/sourcegraph/pull/38767 (which needs to be improved to handle gaps, as per https://sourcegraph.slack.com/archives/C07KZF47K/p1660661668690719).
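For reviewers who want the condition spelled out, here is a minimal Python sketch of the logic described above. It is not the query added in this PR: the `Sample` type, the function names, and the `for_intervals` parameter are hypothetical, and it assumes the metrics are sampled at a fixed interval with a cumulative counter of completed plus errored job attempts. Counter resets are not handled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One scrape of the (hypothetical) executor metrics."""
    queue_size: int       # jobs currently waiting in the executor queue
    processed_total: int  # cumulative completed + errored job attempts


def is_stalled(prev: Sample, curr: Sample) -> bool:
    """Raw alert case: work is queued but the processed counter did not increase."""
    return curr.queue_size > 0 and curr.processed_total <= prev.processed_total


def should_alert(samples: Sequence[Sample], for_intervals: int) -> bool:
    """Apply the duration threshold: the raw condition must hold for the
    last `for_intervals` consecutive sampling intervals."""
    if for_intervals <= 0 or len(samples) < for_intervals + 1:
        return False  # not enough history to decide
    recent = samples[-(for_intervals + 1):]
    return all(is_stalled(a, b) for a, b in zip(recent, recent[1:]))
```

Separating `is_stalled` from `should_alert` mirrors the distinction between the raw alert case and the alert after the duration threshold is applied.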

Closes https://github.com/sourcegraph/sourcegraph/issues/40409

In the graph below, the orange line represents the alert case (before the alert duration threshold is applied), the yellow line is the queue size, and the blue line is the number of in-flight jobs.

image

Test plan

Ran the queries against dotcom metrics data, as shown in the graph above.
