Search backend: remove plan expansion of `file:contains.content()`
Created by: camdencheek
This modifies the evaluation of the `file:contains.content()` predicate so that it is no longer expanded ahead of time. In the past, expansion worked correctly in very few cases: the number of files we expanded into was enormous, and each one had to be scoped in the query by a repo, which made for extremely complex `OR` queries that would just cause stack overflows when we tried to process them.
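To make the scaling problem concrete, here is a minimal Go sketch of what expanding ahead of time amounts to. It assumes a pre-search has already returned every (repo, path) pair whose content matches the predicate; the `fileMatch` type and `expandContains` function are hypothetical illustrations, not the actual Sourcegraph code. Each matching file becomes its own repo- and file-scoped clause, so the query grows linearly with the number of matching files.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fileMatch is a hypothetical (repo, path) pair returned by a pre-search
// for the file:contains.content() pattern.
type fileMatch struct {
	Repo string
	Path string
}

// expandContains sketches plan expansion: every matching file becomes its
// own repo- and file-scoped clause carrying the rest of the user's query,
// and all clauses are ORed together.
func expandContains(matches []fileMatch, rest string) string {
	clauses := make([]string, 0, len(matches))
	for _, m := range matches {
		clauses = append(clauses, fmt.Sprintf("(repo:^%s$ file:^%s$ %s)",
			regexp.QuoteMeta(m.Repo), regexp.QuoteMeta(m.Path), rest))
	}
	return strings.Join(clauses, " OR ")
}

func main() {
	var matches []fileMatch
	for i := 0; i < 10000; i++ {
		matches = append(matches, fileMatch{
			Repo: "github.com/org/repo",
			Path: fmt.Sprintf("dir/file%d.go", i),
		})
	}
	q := expandContains(matches, "def")
	// Count the OR'd clauses in the expanded query.
	fmt.Println(strings.Count(q, " OR ") + 1) // prints 10000
}
```

Ten thousand matching files yield ten thousand OR'd clauses, all of which the query processing code then has to parse and evaluate.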
There are two cases where we support `file:contains.content()`:
- Text search
- Diff search
Text search is implemented in a very efficient manner. For a user input like `file:contains(abc) def`, we execute the search as if the user had searched for `abc and def`, then we filter out the ranges that matched `abc` (but keep any that match `def`). This lets us take full advantage of our existing, efficient `AND`/`OR` machinery.
Diff search is implemented in an extremely inefficient manner. For each result that comes through, we execute an unindexed search on the files matched in the diff at the matched commit and ensure that they contain all the patterns specified by the `file:contains.content()` predicate. This is slow, but diff search is also slow, and I expect that the `file:contains.content()` feature is hardly used with diff search, if at all, so I think that is acceptable. I don't want to put the effort into supporting this natively in diff search right now.
Stacked on https://github.com/sourcegraph/sourcegraph/pull/39383
This is the last predicate that used the query expansion machinery, so I will remove that in the next PR.
Test plan
Added tests; the backend integration test changes reflect the changes in behavior.