monitoring(repo-updater): add duration on critical rate limit alerts, update next steps

Warren Gifford requested to merge monitoring-update-ratelimit-critical-alerts into main

Created by: bobheadxi

For https://github.com/sourcegraph/sourcegraph/issues/36434#issuecomment-1148000738

  1. @michaellzc noted that the pod restart advice is not great:

Restarting the pod doesn't guarantee a new public IP, and it's possible customers are using a public gateway (e.g. Cloud NAT) with VM without a public IP address

  2. It was also noted that this alert comes up quite frequently. I think that on larger instances running up against the limit is quite common and even expected, and Sourcegraph should for the most part continue working even if it exhausts its rate limits. It becomes a critical issue only if the rate limit is exhausted again immediately after a rate limit reset. We can detect this by checking whether the remaining rate limit stays below the threshold for ~most of a window.
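
To illustrate the detection idea in the second point, here is a minimal sketch in Python. It is not the alert definition changed by this PR: the function name, the threshold of 250, and the 0.9 "most of a window" fraction are all hypothetical, and it assumes the remaining rate limit is sampled at even intervals across one rate limit window.

```python
from typing import Sequence


def exhausted_for_most_of_window(
    remaining_samples: Sequence[int],
    threshold: int,
    min_fraction: float = 0.9,  # assumed meaning of "~most of a window"
) -> bool:
    """Return True if the remaining rate limit was below `threshold`
    for at least `min_fraction` of the evenly spaced samples."""
    if not remaining_samples:
        # No data for the window: do not treat it as critical.
        return False
    below = sum(1 for remaining in remaining_samples if remaining < threshold)
    return below / len(remaining_samples) >= min_fraction


# A busy instance that only hits the limit late in the window: not critical.
busy = [5000, 3000, 900, 50, 0, 0]
# An instance that is exhausted again right after the reset: critical.
stuck = [0, 0, 10, 0, 0, 0]

print(exhausted_for_most_of_window(busy, threshold=250))   # False
print(exhausted_for_most_of_window(stuck, threshold=250))  # True
```

In an alerting system the same effect would typically come from requiring the condition to hold for a duration before firing, rather than from counting samples by hand.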

Test plan

n/a
