Code Insights handles large monorepos
Created by: Joelkw
Problem to solve
Code Insights is a powerful tool that all our customers want to use, but those with large monorepos often hit scale issues. These come either from the size of an individual repo's result set or from the volume of commits and historical data that must be backfilled, and they render the feature unstable.
Measure of success
We will know we've solved this problem when Code Insights runs successfully for the customers below. "Successfully" means it returns correct results in the same order of magnitude of time as it takes on thousands of smaller repos. We're actively working with the first of these customers now, so we are confident we'll be able to test this with them.
Solution summary
We will approach this problem first with a couple of key known improvements:
- Enabling streaming search (Slite writeup of work done so far). This will be stabilized and moved from experimental and feature-flagged to available.
- Improving the stability of, and reducing the load on, the interconnected components of our backend (worker, job queue, gitserver).
Some prep work for this item has already begun, as we're aiming to wrap this up in May. A few relevant issues (this list is only sporadically kept up to date, as we primarily track this work in two-week iterations):
- https://github.com/sourcegraph/sourcegraph/issues/33294
- https://github.com/sourcegraph/sourcegraph/issues/32969
- https://github.com/sourcegraph/sourcegraph/issues/25062
- https://github.com/sourcegraph/sourcegraph/issues/33925
- https://github.com/sourcegraph/sourcegraph/issues/33290
What specific customers are we iterating on the problem and solution with?
- https://github.com/sourcegraph/accounts/issues/574
- https://github.com/sourcegraph/accounts/issues/578
- https://github.com/sourcegraph/accounts/issues/284
Impact on use cases
This enables Code Insights to serve customers with large monorepos. Because Code Insights is a major component of all five of our use cases, it dramatically improves our ability to serve each of them for these customers.
Delivery plan
- Test initial experimental implementation with https://github.com/sourcegraph/accounts/issues/574
- Evaluate additional improvements that need to be made (if need be)
- Make streaming search-powered Code Insights generally available for large monorepo customers