codeintel: Faster online find closest commit operations
Created by: efritz
Check out this graph:
We can do better. FindClosestCommit will try to determine if we know about this commit (by virtue of being in the lsif_closest_commits table), and if so which bundles we should use to answer subsequent queries.
This table is updated each time the worker processes a new bundle. The worker queries gitserver for the complete commit graph, runs an O(n) algorithm to determine which bundles are visible from each commit, and writes the data back to Postgres. Doing this offline is fine.
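As a rough illustration of the offline pass described above, the sketch below walks a commit graph once in topological order and records, for each commit, the bundles attached to its nearest ancestor that has an upload. It is a simplification under stated assumptions, not the worker's actual code: the names (visible_bundles, parents, uploads) are hypothetical, only ancestors are considered, and ties between equally distant uploads are broken arbitrarily.

```python
from graphlib import TopologicalSorter


def visible_bundles(parents, uploads):
    """Map each commit to (distance, bundle ids) of its nearest ancestor with an upload.

    parents: dict of commit -> list of parent commits (hypothetical shape).
    uploads: dict of commit -> list of bundle ids uploaded for that commit.
    Commits with no uploaded ancestor are left out of the result.
    """
    visible = {}
    # static_order() yields every parent before its children, so each commit
    # and each parent edge is inspected once.
    for commit in TopologicalSorter(parents).static_order():
        if commit in uploads:
            visible[commit] = (0, list(uploads[commit]))
            continue
        best = None
        for parent in parents.get(commit, []):
            candidate = visible.get(parent)
            if candidate is not None and (best is None or candidate[0] + 1 < best[0]):
                best = (candidate[0] + 1, candidate[1])
        if best is not None:
            visible[commit] = best
    return visible
```

The cost is linear in the number of commits plus parent edges, which is acceptable for a background job but not for a request path.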
We also update this table when a user navigates to a commit that we haven't seen yet (possibly gitserver has updated, but no index has been uploaded since, so our view of the commit graph is stale). In this case we still want to provide code intelligence, so we run the same algorithm as above. This gives us the correct results but, according to the graph above, not in an efficient manner.
I believe this may be one of the several culprits responsible for the behavior indicated in https://github.com/sourcegraph/sourcegraph/issues/13733.
My idea is to mark the repository as dirty so it will get updated (for subsequent requests), then fetch the minimal amount of information from postgres/gitserver to answer only this request, but not serialize the results anywhere. I think we can narrow the focus with a few observations:
- This can only happen for commits that are newer than any uploaded index; otherwise, the last upload would have updated the repository to include this commit.
Here's the information we have:
- An (outdated) commit graph in postgres
- The set of bundles that should be queried for each commit in that graph
- The (updated) commit graph in gitserver
I think all we need to do to get the proper set of visible bundles is to query from the commit currently being viewed up to any commit that is in postgres. We then return the results of the closest commit in postgres.
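The walk described above could look roughly like the following breadth-first search. This is a sketch, not the implementation: closest_known_commit, fresh_parents (the up-to-date graph from gitserver, as a mapping from commit to parents) and known_bundles (the per-commit bundle sets already in postgres) are hypothetical names, and the max_depth bound is an assumption added to keep the walk cheap.

```python
from collections import deque


def closest_known_commit(start, fresh_parents, known_bundles, max_depth=100):
    """Walk ancestors of start until reaching a commit already known to postgres.

    Returns (commit, bundles) for the nearest known ancestor, or None if none
    is found within max_depth parent hops.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        commit, depth = queue.popleft()
        if commit in known_bundles:
            # Breadth-first order guarantees no known ancestor is closer.
            return commit, known_bundles[commit]
        if depth == max_depth:
            continue
        for parent in fresh_parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None
```

Nothing is written back here. The result answers only the current request, and the repository would be marked dirty separately so that the full recalculation happens offline.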
Outstanding questions:
- What git command will give us this? (I think git rev-list --use-bitmap-index gives us what we want)
- Do we need to add additional data in postgres (dates?) to construct the git command? (Can we bound git rev-list to a set number, or can we use other data to reduce the output we need to consider? Remember this will be a command issued in gitserver, so we can't easily do interactive things.)
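One possible shape for the bounded query, sketched under assumptions: --max-count, --parents and --use-bitmap-index are existing git rev-list options, but whether the bitmap index is actually used for a bounded traversal like this is not verified here. The helper names and the default limit of 50 are hypothetical. The parser turns the --parents output (one commit per line, followed by its parents) into a commit-to-parents mapping that can then be searched for a commit already in postgres.

```python
def rev_list_args(commit, limit=50):
    """Build arguments for a single, non-interactive rev-list invocation."""
    return [
        "git", "rev-list",
        "--use-bitmap-index",     # suggested in the issue; effect with a limit is unverified
        "--parents",              # print each commit followed by its parents
        f"--max-count={limit}",   # hypothetical bound on the number of commits returned
        commit,
    ]


def parse_parents(output):
    """Parse `git rev-list --parents` output into commit -> list of parents."""
    graph = {}
    for line in output.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph
```

Because the whole query is one command with a fixed bound, it fits the constraint that gitserver cannot be driven interactively. If no known commit appears within the limit, the caller would have to fall back to the full recalculation.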
Additional work:
- I don't think this request should block even on the fallback case, and it should return an error response that the code intel extension can somehow respond to (show an indicator that something is being calculated). This should only affect the first request for a commit we haven't seen, which shouldn't be back-to-back for the same user.