
codeintel: alert when all executor jobs are failing

Merged Warren Gifford requested to merge nsc-ef/executor-errors-over-time into main

Created by: Strum355

Creates an alert on the executor error rate that fires when the error rate is 100%, which indicates some global misconfiguration (as happened before with src-cli related issues).

The alert is a bit special in that it uses a different query from the panel, one based on the last_over_time aggregate. We do this because we don't want the alert to mark itself as resolved if there happens to be a period within the defined window where there are no auto-indexing jobs (i.e. when the error rate is "technically" < 100%).

The screenshot below illustrates how the alert query maintains the last value over a predefined window, so that if no executor jobs are processing but the error rate was 100% before, we continue alerting, since the absence of running jobs does not imply the issue is resolved.

image
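
For reviewers who want to reason about the behaviour without the dashboard, here is a minimal Python sketch of the idea. It is not the query added in this PR and not how Prometheus evaluates ranges internally: the step-indexed series, the `last_over_time` helper, the window of 4 steps, and the use of `None` for "no jobs ran, so no error rate sample" are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def last_over_time(samples: Sequence[Optional[float]], window: int) -> List[Optional[float]]:
    """Simplified model: for each step, return the most recent non-missing
    sample within the trailing `window` steps (current step included),
    or None if every sample in that window is missing."""
    held: List[Optional[float]] = []
    for i in range(len(samples)):
        value = None
        # Walk backwards from the current step, at most `window` steps.
        for j in range(i, max(i - window, -1), -1):
            if samples[j] is not None:
                value = samples[j]
                break
        held.append(value)
    return held


def is_firing(rate: Optional[float]) -> bool:
    """Alert condition: a sample exists and the error rate is 100%."""
    return rate is not None and rate >= 100


# Hypothetical error rate (%) per evaluation step. None means no executor
# jobs were processed in that step, so there is no sample at all.
error_rate = [100, 100, None, None, None, None, 20]

# Alerting directly on the raw series resolves as soon as jobs stop.
naive_firing = [is_firing(v) for v in error_rate]

# Alerting on the held value keeps firing until the window runs out
# or a new sample below 100% arrives.
held_rate = last_over_time(error_rate, window=4)
held_firing = [is_firing(v) for v in held_rate]

print("naive:", naive_firing)
print("held: ", held_firing)
```

In this sketch the raw series stops firing at the first step with no jobs, while the held series keeps firing for as long as a 100% sample is still inside the window. The window length therefore bounds how long the alert can outlive the last observed failure.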

Closes https://github.com/sourcegraph/sourcegraph/issues/30494

Test plan

Only modifies dashboards/alerts, n/a
