Search backend: a better multiline type
Created by: camdencheek
This replaces `MultilineMatch` with a slightly more general `HunkMatch`.
One of the design limitations of `MultilineMatch` is that it requires duplicating the content of the full line for every matched range. One of the things that is nice about `LineMatch` is that it can match the same line many times, but only send the content once. When we implemented `MultilineMatch`, this was known, but I naively thought it would be fine since we don't send that many results with a bunch of matches per line.
I was wrong. The thing is, very long lines also tend to contain very many matches. This means that as a line gets longer, we send a larger line more times. This quadratic (n^2) behavior causes us to hit our max payload size limit for very long lines, because each one is sent once per match.
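To make the growth concrete, here is a back-of-the-envelope sketch in Go. The function names and the match density of one match per 100 bytes are assumptions chosen for illustration; they are not measurements from production.

```go
package main

import "fmt"

// duplicatedBytes estimates the content bytes sent when the full line is
// repeated once for every match on it.
func duplicatedBytes(lineLen, matchCount int) int {
	return lineLen * matchCount
}

// sharedBytes estimates the content bytes sent when the line is sent once,
// no matter how many matches it contains.
func sharedBytes(lineLen, matchCount int) int {
	if matchCount == 0 {
		return 0
	}
	return lineLen
}

func main() {
	// Hypothetical density: one match per 100 bytes of line content.
	const bytesPerMatch = 100
	for _, lineLen := range []int{1000, 10000, 100000} {
		matches := lineLen / bytesPerMatch
		fmt.Printf("line=%d matches=%d duplicated=%d shared=%d\n",
			lineLen, matches,
			duplicatedBytes(lineLen, matches),
			sharedBytes(lineLen, matches))
	}
}
```

Under that assumed density, a line that is 10x longer costs 100x more bytes when its content is duplicated per match, but only 10x more when the content is sent once.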
So, instead, we get `HunkMatch`. A `HunkMatch` is like a `MultilineMatch`, except it can contain any number of matched ranges (which are allowed to cross line boundaries) within that hunk.
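As a rough illustration of that shape, the Go sketch below stores the hunk content once alongside a list of ranges. The field names (`Content`, `ContentStart`, `Ranges`), the `Location` and `Range` helper types, and the `MatchedStrings` and `exampleHunk` helpers are assumptions made for this example; they are not copied from the diff.

```go
package main

import "fmt"

// Location is a position in a file. All fields are zero-based (assumed).
type Location struct {
	Offset int // byte offset from the start of the file
	Line   int
	Column int // byte column within the line
}

// Range is a half-open span [Start, End) that may cross line boundaries.
type Range struct {
	Start Location
	End   Location
}

// HunkMatch holds a run of full lines exactly once, plus every matched
// range that falls inside those lines.
type HunkMatch struct {
	Content      string   // full text of the lines covered by the hunk
	ContentStart Location // where Content begins in the file
	Ranges       []Range  // matched ranges, in file coordinates
}

// MatchedStrings reconstructs the exact matched text of every range by
// translating file offsets into offsets within Content.
func (h HunkMatch) MatchedStrings() []string {
	out := make([]string, 0, len(h.Ranges))
	for _, r := range h.Ranges {
		start := r.Start.Offset - h.ContentStart.Offset
		end := r.End.Offset - h.ContentStart.Offset
		out = append(out, h.Content[start:end])
	}
	return out
}

// exampleHunk covers lines 1 and 2 of the hypothetical file
// "first line\nfoo bar\nbaz foo\nlast\n".
func exampleHunk() HunkMatch {
	return HunkMatch{
		Content:      "foo bar\nbaz foo\n",
		ContentStart: Location{Offset: 11, Line: 1, Column: 0},
		Ranges: []Range{
			{Start: Location{11, 1, 0}, End: Location{14, 1, 3}}, // "foo"
			{Start: Location{15, 1, 4}, End: Location{22, 2, 3}}, // "bar\nbaz"
			{Start: Location{23, 2, 4}, End: Location{26, 2, 7}}, // "foo"
		},
	}
}

func main() {
	for _, s := range exampleHunk().MatchedStrings() {
		fmt.Printf("%q\n", s)
	}
}
```

The two lines of content appear once even though three ranges refer to them, and the middle range spans the newline between the lines.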
This satisfies all the following design constraints:
- We can reconstruct the exact matched text for each range
- We have the extended line contents of each match available for consumers like the `src` CLI
- (NEW) We never need to send content more than once
- All of our backends return the information needed to create this type (after the last couple of buildup PRs)
- This type can be losslessly and efficiently converted to `[]*LineMatch` to maintain compatibility with the current API contract
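The last constraint can be sketched as follows. This is a minimal illustration, not the conversion implemented in this PR: the `LineMatch` fields (`Preview`, `LineNumber`, `OffsetAndLengths`), the `HunkMatch` fields, and the `AsLineMatches` and `exampleHunk` names are all assumed for the example. It splits each range at line boundaries and records one (offset, length) pair per line the range touches.

```go
package main

import (
	"fmt"
	"strings"
)

// Location and Range use byte offsets from the start of the file (assumed).
type Location struct{ Offset, Line, Column int }
type Range struct{ Start, End Location } // half-open: [Start, End)

type HunkMatch struct {
	Content      string
	ContentStart Location
	Ranges       []Range
}

// LineMatch is a hypothetical stand-in for the per-line result type.
type LineMatch struct {
	Preview          string   // line content, without the trailing newline
	LineNumber       int      // zero-based line number in the file
	OffsetAndLengths [][2]int // (byte offset in line, byte length) per match
}

// AsLineMatches emits one LineMatch per line in the hunk. A range that
// crosses a newline contributes one segment to each line it touches.
func (h HunkMatch) AsLineMatches() []*LineMatch {
	lines := strings.Split(strings.TrimSuffix(h.Content, "\n"), "\n")
	out := make([]*LineMatch, 0, len(lines))
	lineStart := h.ContentStart.Offset // file offset of the current line
	for i, line := range lines {
		lineEnd := lineStart + len(line) // file offset of the newline
		lm := &LineMatch{Preview: line, LineNumber: h.ContentStart.Line + i}
		for _, r := range h.Ranges {
			if r.End.Offset <= lineStart || r.Start.Offset >= lineEnd {
				continue // range does not overlap this line's text
			}
			start, end := r.Start.Offset, r.End.Offset
			if start < lineStart {
				start = lineStart // clamp to the start of the line
			}
			if end > lineEnd {
				end = lineEnd // clamp to the end of the line
			}
			lm.OffsetAndLengths = append(lm.OffsetAndLengths,
				[2]int{start - lineStart, end - start})
		}
		out = append(out, lm)
		lineStart = lineEnd + 1 // skip the newline
	}
	return out
}

// exampleHunk covers lines 1 and 2 of the hypothetical file
// "first line\nfoo bar\nbaz foo\nlast\n".
func exampleHunk() HunkMatch {
	return HunkMatch{
		Content:      "foo bar\nbaz foo\n",
		ContentStart: Location{Offset: 11, Line: 1},
		Ranges: []Range{
			{Start: Location{Offset: 11}, End: Location{Offset: 14}}, // "foo"
			{Start: Location{Offset: 15}, End: Location{Offset: 22}}, // "bar\nbaz"
			{Start: Location{Offset: 23}, End: Location{Offset: 26}}, // "foo"
		},
	}
}

func main() {
	for _, lm := range exampleHunk().AsLineMatches() {
		fmt.Printf("%d %q %v\n", lm.LineNumber, lm.Preview, lm.OffsetAndLengths)
	}
}
```

In this sketch the multiline range is split into one segment at the end of the first line and one at the start of the second, so each line's preview is still emitted only once.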
As a bonus, this structure should make it much easier to implement streamed highlighting since the matches are already grouped into lines, so consumers can just request the decorated version of the range represented by the hunk.
Stacked on #36119
Test plan
Updated all tests and double checked output. Manually tested a few queries (in particular multiline queries). Ran backend integration tests. Added a couple of unit tests.