Massively reduced the number of code host API requests Sourcegraph performs

Administrator requested to merge sg/reduce-rate-limit-consumption into master

Created by: slimsag

TL;DR

Prior to this change, every search result you viewed triggered 4 code host API requests, which led to us running out of our rate limit rapidly.

After this change, we massively reduce the number of code host API requests we perform, because it turned out that in 99% of cases (not just when searching) we were making them for literally no reason due to unfiled technical debt.

IMPORTANT: This change does NOT change any user-facing behavior. It only prevents us from doing needless work. The behavior observed in all cases before and after this change is identical.

Detailed explanation

Repository revision resolutions (ResolveRev and GetCommit) occur very often in our codebase. For example, every search result you view makes two of these calls.

Prior to this change, each revision resolution would consume two code host API requests (which are heavily rate limited), because each one must perform a repo-updater /repo-lookup request to determine the repository's remote URL for updating the gitserver repo mirror. Two resolutions per search result at two requests each gives the 4 requests mentioned above.

We already knew this was costly, and in fact that is why there is a CachedGitRepo variant of GitRepo in this same file. Basically, if you don't need the remote URL we can spare ourselves tons of work and, most importantly, code host API reservations.
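To illustrate the difference, here is a minimal sketch, not the actual Sourcegraph code. All names in it (`fakeRepoUpdater`, `repoHandle`, `newRepoWithURL`, `newCachedRepo`, `resolveRev`) are hypothetical, and each simulated lookup stands in for the rate-limited code host requests that a real /repo-lookup would trigger.

```go
package main

import "fmt"

// fakeRepoUpdater is a hypothetical stand-in for repo-updater. Every lookup
// represents work that would consume code host API requests.
type fakeRepoUpdater struct{ lookups int }

func (u *fakeRepoUpdater) lookupRemoteURL(name string) string {
	u.lookups++
	return "https://codehost.example/" + name + ".git"
}

// repoHandle is a hypothetical repo handle. remoteURL is empty for the
// cached variant, which never asks repo-updater for it.
type repoHandle struct {
	name      string
	remoteURL string
}

// newRepoWithURL pays for a remote URL lookup up front.
func newRepoWithURL(u *fakeRepoUpdater, name string) repoHandle {
	return repoHandle{name: name, remoteURL: u.lookupRemoteURL(name)}
}

// newCachedRepo skips the lookup entirely.
func newCachedRepo(name string) repoHandle {
	return repoHandle{name: name}
}

// resolveRev only consults revisions the mirror already knows, so it never
// needs the remote URL carried by the handle.
func resolveRev(r repoHandle, known map[string]string, rev string) (string, bool) {
	commit, ok := known[rev]
	return commit, ok
}

func main() {
	u := &fakeRepoUpdater{}
	known := map[string]string{"master": "abc123"}

	// Two resolutions with the URL-carrying handle: two lookups.
	for i := 0; i < 2; i++ {
		resolveRev(newRepoWithURL(u, "org/repo"), known, "master")
	}
	fmt.Println("lookups with URL-carrying handle:", u.lookups)

	// Two resolutions with the cached handle: no additional lookups.
	before := u.lookups
	for i := 0; i < 2; i++ {
		resolveRev(newCachedRepo("org/repo"), known, "master")
	}
	fmt.Println("lookups with cached handle:", u.lookups-before)
}
```

In this sketch the resolved commit is the same for both handles; only the number of lookups differs.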

But revision resolution wasn't able to use CachedGitRepo because we operated under the assumption that when resolving a revision you do want the repository to update. For example, when resolving master or mybranch you want the repository updated to reflect the latest state of those branches.

Times changed, repo-updater changed how we did things, and we started operating with more explicit requests to update repositories via repoupdater.DefaultClient.EnqueueRepoUpdate.

At some point, gitserver stopped being responsible for updating repositories unless the rev in question was unknown to it. That is, if mybranch is already known to gitserver, it may be stale until repo-updater updates it, but if mybranch is unknown, gitserver will explicitly update the repository. You can see this logic clearly in gitserver's ensureRevision function.
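The rule described above can be sketched roughly as follows. This is an illustrative model under stated assumptions, not gitserver's real ensureRevision: the `mirror` type, its `fetch` method, and `ensureRev` are hypothetical names, and the remote is modeled as a plain map.

```go
package main

import "fmt"

// mirror is a hypothetical model of a gitserver repo mirror: the revs it
// currently knows, plus a counter of how often it updated from the remote.
type mirror struct {
	revs    map[string]string
	fetches int
}

// fetch simulates updating the mirror from the remote.
func (m *mirror) fetch(remote map[string]string) {
	m.fetches++
	for rev, commit := range remote {
		m.revs[rev] = commit
	}
}

// ensureRev returns the commit for rev. It updates the mirror only when the
// rev is unknown; a known rev is returned as-is, even if it is stale.
func (m *mirror) ensureRev(rev string, remote map[string]string) (string, bool) {
	if commit, ok := m.revs[rev]; ok {
		return commit, true
	}
	m.fetch(remote)
	commit, ok := m.revs[rev]
	return commit, ok
}

func main() {
	remote := map[string]string{"master": "def456", "mybranch": "789abc"}
	m := &mirror{revs: map[string]string{"master": "abc123"}}

	c, _ := m.ensureRev("master", remote)
	fmt.Println(c, m.fetches) // prints "abc123 0": known rev, no update

	c, _ = m.ensureRev("mybranch", remote)
	fmt.Println(c, m.fetches) // prints "789abc 1": unknown rev forces an update
}
```

Under this model, resolving a known rev never needs the remote at all, which is why fetching the remote URL beforehand is wasted work in the common case.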

What this means is that, although we always use a GitRepo for ResolveRev and GetCommit requests, in 99% of cases gitserver already has the rev and will not update it. Yet we still ask repo-updater to do all of that work and make those code host requests for no reason.

After this change, the user-observed behavior remains exactly the same: gitserver doesn't update revs it is already aware of (but note that our UI does ask it to in specific locations via EnqueueRepoUpdate). We just don't ask repo-updater to do all that work for nothing anymore.

Helps #2618
