Core Services: 3.6 Tracking Issue
Created by: keegancsmith
Goal
The 99th percentile latency of the search query “print” at 42 req/s with 20k repos is under 2s.
Availability
- @mrnugget: Not working July 8th-10th
- @tsenart: All month
- @keegancsmith: All month
Dependencies
We want to avoid conflicting with the Search team's work (https://github.com/sourcegraph/sourcegraph/issues/4582).
Completed
Our main focus in this iteration was to load test the large-scale instance at k8s.sgdev.org and to analyze profiling data to spot bottlenecks and possible improvements.
The first major theme we found was large, needless allocations in the GraphQL layer, caused by multiple parts of the code relying on a list of all repositories. These allocations caused out-of-memory errors and GC pressure, and increased request latency.
Solving this issue by reducing the number of allocations and changing what we allocate took up a large part of this iteration.
After that, we focused on the next component, Zoekt, by profiling it and trying to understand its behavior under load.
We now have multiple ideas for improving Zoekt's performance, including horizontal scaling.
- Start by adding a benchmark for the (r *searchResolver) doResults method in the GraphQL layer
- Use *searchbackend.Zoekt.DisableCache = true in BenchmarkSearchResults
- Create a database-hitting integration version of BenchmarkSearchResults
- Add a backend.ListRepoIDs method to be used in resolveRepositories. This could use a roaring bitset to encode the IDs in a compressed way. Update after investigation on June 28th: we cannot use only the repository IDs, because many places in the search code path, including gitserver, use the repository name.
- Make externalrepospec a value instead of a pointer in types.Repo
- Avoid allocating the repositoryRevision slice if all revisions are empty (@tsenart)
- Before search results are returned from doResults, the repository IDs need to be turned into repositories ("hydrated"). This requires a backend.LoadByIDs. Update after investigation on June 28th: the easier, "dumb" approach (dumb because it incurs N+1 queries) is to load "minimal repositories" with only the ID, Name and ExternalRepo fields. Whenever a method on *repositoryResolver other than ID, Name, etc. is called, we load the remaining information from the database and fill in *repositoryResolver.Repo.
- Dogfood traces for common paths so that they are useful on their own (without having to jump straight to the logs)
- Clone status indicator #4120 (closed), #3413 (closed)
- Fix the status indicator to show the actual cloning status (#4591)
- Follow up on #4591 by investigating the possible performance improvements described here
- Remove the feature flag and enable the feature by default
- Possible follow-up on #4685: delete all existing repoupdater migrations now that they have shipped in two versions of Sourcegraph (#4886)
- Adding a new external service that syncs slowly results in a 504 Gateway Timeout (#4511 (closed))
Won't do
- Abandoned because, for now, it wouldn't bring us much closer to our goal than the other ideas we are pursuing. We might still do this in the future:
  - Fix the ACL layer to work with repository IDs instead of names (https://github.com/sourcegraph/sourcegraph/issues/4812)
  - Add support for repository IDs to Zoekt:
    - Record the Sourcegraph ID in Zoekt shards
    - Support specifying a set of Sourcegraph IDs in zoekt/query.Q
  - IDs need to be stable for "restoring of previously deleted repos" to work in repo-updater/Zoekt
Backlogged
- Repositories can end up in a bad state and harm the overall system #4565 (closed)
- Introduce some form of rate limiting in the GraphQL API to prevent individual clients from slowing down requests for everyone
- Refactor the "scatter/gather" pattern to remove duplication in gitserver.Client (see comment)
- Refactor the repoupdater.Client.do method to use the same interface as the bitbucketserver.Client.send method (see comment here)
Moved to 3.7
- Determine a resource allocation "equation" and update the docs so admins can provision resources accordingly for their Sourcegraph installation (e.g. https://confluence.atlassian.com/bitbucketserver/scaling-bitbucket-server-776640073.html)
  - Note: We don't know enough yet to do this.
- Experiment with more efficient posting lists in Zoekt (https://github.com/sourcegraph/zoekt/pull/10)
  - Note: Partially done.