codeintel: More intelligently read from result chunks (!19010)

Warren Gifford requested to merge ef/18620 into main Mar 10, 2021

Created by: efritz

We take the following steps to resolve locations from a set of result ids for a particular index:

1. For each id, find the index of the result chunk in which it is stored.
2. Deduplicate the set of indexes, keeping them in order of first occurrence.
3. Open, in batches, every result chunk whose index is in this set and look up the relevant ids in each one; construct a mapping of result id -> document path -> range ids.
4. Iterate the map, skipping the first $offset results and keeping the next $limit results (by some order).
5. Gather and deduplicate the document paths from this map.
6. In batches, load the set of documents and resolve each range id to an actual extent within its document.
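
The order-preserving deduplication of chunk indexes described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `dedupePreservingOrder` and the use of plain `int` indexes are assumptions made for the example.

```go
package main

import "fmt"

// dedupePreservingOrder returns the distinct values of indexes, ordered by
// the position at which each value first occurs in the input.
// (Hypothetical helper; the name and signature are not taken from the PR.)
func dedupePreservingOrder(indexes []int) []int {
	seen := make(map[int]struct{}, len(indexes))
	unique := make([]int, 0, len(indexes))

	for _, index := range indexes {
		if _, ok := seen[index]; ok {
			continue // already recorded at its first occurrence
		}
		seen[index] = struct{}{}
		unique = append(unique, index)
	}

	return unique
}

func main() {
	// Result ids that hash to chunks 3, 1, 3, 2, 1 require opening only
	// chunks 3, 1, and 2, in that order.
	fmt.Println(dedupePreservingOrder([]int{3, 1, 3, 2, 1})) // prints [3 1 2]
}
```

Keeping first-occurrence order means the chunks are opened in the same sequence as the result ids that first referenced them.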

Note: (by some order) used to mean by result id in the order the ids were given, then by path lexicographically. As a result, each page ended up referencing a near-random scattering of documents to open.

This PR changes that order to be by path lexicographically, then by result id. This causes us to open fewer documents per page and open the same document fewer times in the same result set. Fixes https://github.com/sourcegraph/sourcegraph/issues/18620.
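
To illustrate why the new ordering helps, here is a small sketch that sorts (result id, path) pairs by path and then by result id, pages through them, and counts how many distinct documents each page touches. The `location` type, the helper names, and the sample data are all hypothetical, and comparing result ids as strings is an assumption made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// location pairs a result id with the path of a document that contains it.
// (Hypothetical type for illustration only.)
type location struct {
	ResultID string
	Path     string
}

// sortByPathThenID orders locations by path lexicographically, then by
// result id (compared as strings here, which is an assumption).
func sortByPathThenID(locs []location) {
	sort.Slice(locs, func(i, j int) bool {
		if locs[i].Path != locs[j].Path {
			return locs[i].Path < locs[j].Path
		}
		return locs[i].ResultID < locs[j].ResultID
	})
}

// page skips the first offset locations and returns at most limit of the rest.
func page(locs []location, offset, limit int) []location {
	if offset < 0 {
		offset = 0
	}
	if offset > len(locs) {
		offset = len(locs)
	}
	end := offset + limit
	if end > len(locs) {
		end = len(locs)
	}
	return locs[offset:end]
}

// distinctPaths counts how many different documents a page would open.
func distinctPaths(locs []location) int {
	paths := map[string]struct{}{}
	for _, loc := range locs {
		paths[loc.Path] = struct{}{}
	}
	return len(paths)
}

func main() {
	// Listed in the old order: by result id as given, then by path.
	locs := []location{
		{"r1", "a.go"}, {"r1", "c.go"},
		{"r2", "a.go"}, {"r2", "b.go"},
		{"r3", "b.go"}, {"r3", "c.go"},
	}

	const limit = 2
	for offset := 0; offset < len(locs); offset += limit {
		fmt.Println("old order, documents on page:", distinctPaths(page(locs, offset, limit)))
	}

	sortByPathThenID(locs)
	for offset := 0; offset < len(locs); offset += limit {
		fmt.Println("new order, documents on page:", distinctPaths(page(locs, offset, limit)))
	}
}
```

With this sample data, every page in the old order spans two documents, while every page in the new order stays within a single document, so each document is opened once across the whole result set.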

