Search backend: a better multiline type (!36124) · Merge requests · Administrator / sourcegraph

Administrator requested to merge backend-integration/cc/hunks into main May 26, 2022

Created by: camdencheek

This replaces MultilineMatch with a slightly more general HunkMatch.

One of the design limitations of MultilineMatch is that it requires duplicating the content of the full line for every matched range. A nice property of LineMatch is that it can record many matches on the same line but send the line's content only once. We knew about this trade-off when we implemented MultilineMatch, but I naively thought it would be fine because we rarely send results with many matches per line.
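To make the duplication concrete, here is a minimal sketch using simplified, hypothetical versions of the two types. The field names and the Range type are assumptions for illustration, not Sourcegraph's actual definitions. For one line with three matches, LineMatch stores the line once, while MultilineMatch needs three values that each carry the full line.

```go
package main

import "fmt"

// Range is a hypothetical half-open byte range [Start, End) into a preview.
type Range struct{ Start, End int }

// LineMatch (simplified, hypothetical fields) stores one line's content once,
// plus every matched range on that line.
type LineMatch struct {
	Preview    string
	LineNumber int
	Ranges     []Range
}

// MultilineMatch (simplified, hypothetical fields) holds exactly one range,
// so a line with k matches needs k values, each carrying the full line.
type MultilineMatch struct {
	Preview   string
	StartLine int
	Range     Range
}

// contentBytesMultiline sums the preview bytes a slice of MultilineMatches carries.
func contentBytesMultiline(ms []MultilineMatch) int {
	n := 0
	for _, m := range ms {
		n += len(m.Preview)
	}
	return n
}

func main() {
	line := "foo bar foo baz foo"
	ranges := []Range{{0, 3}, {8, 11}, {16, 19}} // the three "foo" matches

	lm := LineMatch{Preview: line, LineNumber: 0, Ranges: ranges}

	var mms []MultilineMatch
	for _, r := range ranges {
		mms = append(mms, MultilineMatch{Preview: line, StartLine: 0, Range: r})
	}

	fmt.Println(len(lm.Preview), contentBytesMultiline(mms)) // 19 57
}
```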

I was wrong. Very long lines also tend to have very many matches. So as a line gets longer, it is both bigger and more likely to be matched many times, and we re-send the whole line once per match. This O(n^2) behavior makes us hit our maximum payload size for very long lines, because each one is sent so many times.
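The quadratic growth can be checked with a little arithmetic. This sketch assumes matches are spread at a constant density along the line; the one-match-per-100-bytes figure is an illustrative assumption, not a measurement, and only preview content bytes are counted.

```go
package main

import "fmt"

// bytesPerMatch is an assumed match density for illustration only:
// one match per this many bytes of line content.
const bytesPerMatch = 100

// duplicatedBytes models MultilineMatch: every match re-sends the whole line.
func duplicatedBytes(lineLen, matches int) int { return matches * lineLen }

// sharedBytes models LineMatch/HunkMatch: the line is sent once,
// no matter how many matches it has.
func sharedBytes(lineLen int) int { return lineLen }

func main() {
	for _, lineLen := range []int{1_000, 10_000, 100_000} {
		matches := lineLen / bytesPerMatch
		fmt.Printf("line=%7d bytes  matches=%4d  duplicated=%11d  shared=%7d\n",
			lineLen, matches, duplicatedBytes(lineLen, matches), sharedBytes(lineLen))
	}
}
```

Under this assumption, each 10x increase in line length multiplies the duplicated bytes by 100x (1,000,000 to 100,000,000 between the last two rows), while the shared encoding grows only 10x.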

So instead, we get HunkMatch. A HunkMatch is like a MultilineMatch, except that it carries the content of its hunk (a run of contiguous lines) once and can contain any number of matched ranges within that hunk. Those ranges are allowed to cross line boundaries.
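A minimal sketch of the idea, assuming a simplified, hypothetical shape for HunkMatch (Content, StartLine, and byte-offset Ranges relative to Content are illustrative names, not the real field names). Because the hunk's lines are stored once and every range points into them, the exact matched text of each range, including one that spans a newline, is just a substring.

```go
package main

import "fmt"

// Range is a hypothetical half-open byte range [Start, End) relative to the
// start of the hunk's Content. A range may include '\n', i.e. span lines.
type Range struct{ Start, End int }

// HunkMatch is a simplified, hypothetical shape: the hunk's full lines are
// stored once in Content, and every matched range points into it.
type HunkMatch struct {
	Content   string // one or more complete, contiguous lines
	StartLine int    // 0-based line number of the first line in Content
	Ranges    []Range
}

// MatchedText reconstructs the exact text of each matched range.
func (h HunkMatch) MatchedText() []string {
	out := make([]string, 0, len(h.Ranges))
	for _, r := range h.Ranges {
		out = append(out, h.Content[r.Start:r.End])
	}
	return out
}

func main() {
	h := HunkMatch{
		Content:   "func a() {\n\treturn 1\n}",
		StartLine: 41,
		Ranges:    []Range{{0, 4}, {9, 18}}, // second range crosses a line boundary
	}
	fmt.Printf("%q\n", h.MatchedText()) // ["func" "{\n\treturn"]
}
```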

This satisfies all the following design constraints:

We can reconstruct the exact matched text for each range
The full contents of every line touched by a match are available for consumers like the src CLI
(NEW) We never need to send content more than once
All of our backends return the information needed to create this type (after the last couple of buildup PRs)
This type can be losslessly and efficiently converted to []*LineMatch to maintain compatibility with the current API contract
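The last constraint can be illustrated with a sketch of a HunkMatch to []*LineMatch conversion. Everything here is hypothetical and simplified (field names, the Offsets representation, 0-based lines), and it is not the MR's actual code. Since a LineMatch can only express ranges within one line, a range that crosses line boundaries is split into one piece per line, and pieces are grouped so each line's content still appears only once.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Range is a hypothetical half-open byte range [Start, End) into Content.
type Range struct{ Start, End int }

// HunkMatch is a simplified, hypothetical stand-in for the new type.
type HunkMatch struct {
	Content   string // complete, contiguous lines joined by '\n'
	StartLine int    // 0-based line number of the first line in Content
	Ranges    []Range
}

// LineMatch is a simplified, hypothetical stand-in for the legacy type:
// one line's content plus [start, end) column offsets on that line.
type LineMatch struct {
	Preview    string
	LineNumber int
	Offsets    [][2]int
}

// AsLineMatches splits each range at line boundaries and groups the pieces
// by line, emitting one LineMatch per line that has at least one piece.
func (h HunkMatch) AsLineMatches() []*LineMatch {
	lines := strings.Split(h.Content, "\n")
	starts := make([]int, len(lines)) // byte offset of each line in Content
	for i := 1; i < len(lines); i++ {
		starts[i] = starts[i-1] + len(lines[i-1]) + 1 // +1 for the '\n'
	}
	// lineOf finds the line containing a byte offset by binary search.
	lineOf := func(off int) int {
		return sort.Search(len(starts), func(i int) bool { return starts[i] > off }) - 1
	}

	byLine := make([]*LineMatch, len(lines))
	for _, r := range h.Ranges {
		first, last := lineOf(r.Start), lineOf(r.End)
		for l := first; l <= last; l++ {
			lo, hi := 0, len(lines[l])
			if l == first {
				lo = r.Start - starts[l]
			}
			if l == last {
				hi = r.End - starts[l]
			}
			if lo == hi && r.Start != r.End {
				continue // skip empty pieces created by a leading/trailing '\n'
			}
			if byLine[l] == nil {
				byLine[l] = &LineMatch{Preview: lines[l], LineNumber: h.StartLine + l}
			}
			byLine[l].Offsets = append(byLine[l].Offsets, [2]int{lo, hi})
		}
	}

	var out []*LineMatch
	for _, lm := range byLine {
		if lm != nil {
			out = append(out, lm)
		}
	}
	return out
}

func main() {
	h := HunkMatch{
		Content:   "a foo\nbar baz\nqux",
		StartLine: 10,
		Ranges:    []Range{{2, 9}, {10, 13}}, // "foo\nbar" and "baz"
	}
	for _, lm := range h.AsLineMatches() {
		fmt.Println(lm.LineNumber, lm.Preview, lm.Offsets)
	}
	// 10 a foo [[2 5]]
	// 11 bar baz [[0 3] [4 7]]
}
```

Precomputing line start offsets and using sort.Search keeps each offset lookup at O(log n) instead of rescanning the hunk content for every range.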

As a bonus, this structure should make streamed highlighting much easier to implement: the matches are already grouped into lines, so consumers can just request the decorated version of the line range represented by the hunk.
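A rough sketch of that consumer-side step, assuming a hypothetical highlighter that has already produced one decorated string per file line (the highlightedLines input and the HTML-like markup are assumptions, not a real Sourcegraph API). The hunk's start line and line count are enough to pick out its decorated lines.

```go
package main

import (
	"fmt"
	"strings"
)

// HunkMatch keeps only the fields this sketch needs (hypothetical names).
type HunkMatch struct {
	Content   string // complete, contiguous lines joined by '\n'
	StartLine int    // 0-based line number of the first line in Content
}

// LineRange returns the half-open range [start, end) of file lines the hunk covers.
func (h HunkMatch) LineRange() (start, end int) {
	n := strings.Count(h.Content, "\n") + 1
	return h.StartLine, h.StartLine + n
}

// decoratedHunk picks the hunk's lines out of a file that has already been
// highlighted line by line (highlightedLines is a hypothetical input).
func decoratedHunk(h HunkMatch, highlightedLines []string) []string {
	start, end := h.LineRange()
	return highlightedLines[start:end]
}

func main() {
	highlighted := []string{"<span>l0</span>", "<span>l1</span>", "<span>l2</span>", "<span>l3</span>"}
	h := HunkMatch{Content: "l1\nl2", StartLine: 1}
	fmt.Println(decoratedHunk(h, highlighted)) // [<span>l1</span> <span>l2</span>]
}
```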

Stacked on #36119

Test plan

Updated all tests and double-checked the output. Manually tested a few queries (in particular multiline queries). Ran backend integration tests. Added a couple of unit tests.
