codeintel: Determine nearest upload efficiently in query path
Created by: efritz
This PR should significantly reduce the cost of the find nearest upload operation during the prologue of every code intelligence request. This PR closes #14486.
What does the current version do?
On each definitions, references, hover, and diagnostics request, the precise code intelligence has to figure out which processed uploads can service the request. The set of uploads that can be used are the closest (by commit distance) upload with an overlapping root and the requested indexer (if supplied).
On each processed upload, we mark the repository as "dirty", which communicates to a periodic routine that that repository's commit graph should be pulled, and the set of visible uploads from each commit should be updated and stored in postgres for fast lookup.
In the query path, we can get a very fast lookup if we know about the requested commit. This means that a user can browse the current head before the CI has uploaded an index for it - and we don't have the indexed information to be able to determine the closest upload for this commit. As it is now, we mark the repository as dirty and block until its commit graph has been re-decorated.
This is an expensive operation in a latency-sensitive path which necessarily obtains a global lock. We need to do much better here.
What does the new version do?
Instead of blocking on refreshing the entire commit graph, we can solve a smaller problem more efficiently, and simply mark the repository as dirty to refresh it for subsequent requests.
Instead of pulling the entire commit graph back, we only pull a set of n commits on the ancestor of the currently requested commit. This is likely (but not guaranteed) to contain a commit we have already seen. Then, we can use the upload visibility information we have indexed for those commits, and propagate them forward through the commit graph rooted at the target commit.
This is an approximation that has a lot of arbitrarily-chosen defaults for some settings knobs. This change contains a few todos that should be done sooner than later, but are not blockers (make settings knobs more easily tunable).
How to review
By commit, as follows:
Add migration adds a migration to add the columns ancestor_visible
and overwritten
to the lsif_nearests_uploads
table. These columns represent, respectively, whether or not the commit is visible by looking at ancestors (rather than descendants only), and whether or not the commit was visible during the graph traversal, but there is a closer upload in the opposite direction.
Add new fields to uploadMeta adds the new columns to the UploadMeta
struct in the codeintel store. Adds the condition that overwritten = false
to all existing queries of that table.
Add FindClosestDumpsFromGraphFragment adds the method FindClosestDumpsFromGraphFragment
to the codeintel store. This is similar to FindClosestDumps
, which uses the contents of lsif_nearest_uploads
to determine the set of visible uploads for a given commit. The latter method only works if the commit has been indexed; otherwise the method will always return an empty list as it's disconnected from any known commit. The new method tries to solve a simpler problem more efficiently. Given a graph containing your target commit, we determine which of the commits in that graph has visibility information. Then, we propagate the visibility information down the graph, which is rooted at the target commit. This can be done much faster than updating the graph graph, which requires taking a global lock and an expensive postgres write.
Tests in this change are updated so that every graph that works for FindClosestDumps
will also work for FindClosestDumpsFromGraphFragment
. An additional test for determining the nearest upload from a partial graph is also added.
This change also heavily changes calculateVisibleUploads
to retain additional information used to update the nearest uploads table. This method should be read on its own rather than comparing it to the previous version - it's been cleaned up so its logic flows clearly and linearly.
Add options to CommitGraph adds an options field to the CommitGraph
method of the codeintel gitserver package. This allows us to optionally specify a root commit and a limit. Exiting callers in the commits package are updated. An additional test is added for parseParent sanity to ensure the documented behavior is what it does.
Update API to take a gitserver client instead of a commit updater removes a (soon to be) unused commitUpdater field and adds a (soon to be used) gitserver client field. This change was extracted so the noise is separately reviewable.
Update FindClosestDumps in API layer updates how the nearest upload is determined in the query path. Instead of waiting on the repository to be updated, we call the new methods updated in the codeintel gitserver package and the codeintel store. See the inline documentation for details.
drive-by: Fix assertion Replace a fmt.Printf
with a t.Fatal
in a test.