
Support or-expressions for repo: and file: parameters

Created by: rvantonder

This is a subtask of #10623 (closed).

Currently we support the and/or operators only on search patterns, i.e., for content. The search query evaluator enforces this constraint when processing an and/or query. It separates parameters (strings of the form field:value, for a recognized field) from patterns (all other strings) using PartitionSearchPattern, which ensures that no and/or operators are applied to parameters.

When we extend support to file: or repo: patterns, we need to consider at least the following syntactic forms. The list is not exhaustive, but it demonstrates the issue and its scope:

  • (repo:foo or repo:bar) some pattern
  • (repo:foo or repo:bar or repo:foobar) some pattern
  • (repo:foo or (repo:bar or (file:foobar or file:baz))) some pattern

This issue's objective is to incrementally support or-expressions on repo: and file: filters, starting with simple cases that can be rewritten to an equivalent regex.

Task:

  • Objective 1: rewrite simple repo:foo or repo:bar expressions to the regex-equivalent query repo:foo|bar, and likewise for file:
  • Objective 2: onboard developers onto the new parser code, the associated query visitor/transformer operations, and query evaluation.

Background:

Our code currently has no native support for evaluating or-expressions over parameters, only over content (as described above). That evaluation happens in evaluatePatternExpression, once per expression. Under the hood, this works as follows: an expression like

repo:foo any:other field:value (bar or baz) is effectively translated to (repo:foo any:other field:value bar) or (repo:foo any:other field:value baz). That is, a distributive property holds: an or-expression over patterns can be expanded by copying all existing parameters into each alternative.
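The following minimal Go sketch makes the distributive expansion concrete. It works on plain strings, and the helper name distribute is hypothetical. It is not the evaluator's actual code. It only illustrates copying the shared parameters into each pattern alternative.

```go
package main

import (
	"fmt"
	"strings"
)

// distribute is a hypothetical helper (not the evaluator's real code): it copies
// the shared parameters into every alternative of an or-expression over
// patterns, producing one parameters-plus-pattern query per alternative.
func distribute(params []string, patterns []string) []string {
	queries := make([]string, 0, len(patterns))
	for _, p := range patterns {
		// Each alternative gets its own copy of all parameters.
		parts := append(append([]string{}, params...), p)
		queries = append(queries, "("+strings.Join(parts, " ")+")")
	}
	return queries
}

func main() {
	params := []string{"repo:foo", "any:other", "field:value"}
	fmt.Println(strings.Join(distribute(params, []string{"bar", "baz"}), " or "))
	// (repo:foo any:other field:value bar) or (repo:foo any:other field:value baz)
}
```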

Suppose we support a query like (repo:foo or repo:bar) (foobar or baz). If we leave the parameter part as-is and do our partitioning/expansion, we end up with:

(repo:foo or repo:bar) foobar or (repo:foo or repo:bar) baz

Supposing we added code to rewrite this query (such code does not currently exist in Sourcegraph), we could expand the above to:

(repo:foo foobar) or (repo:bar foobar) or (repo:foo baz) or (repo:bar baz)
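Here is a rough sketch of this expansion. It assumes each repo alternative and each pattern alternative is available as a plain string, and the expand helper is hypothetical. Every pair becomes one conjunctive query.

```go
package main

import "fmt"

// expand is a hypothetical helper: it pairs every repo alternative with every
// pattern alternative and returns one conjunctive query per pair. Each query
// would be evaluated on its own and the results unioned.
func expand(repos, patterns []string) []string {
	queries := make([]string, 0, len(repos)*len(patterns))
	for _, pattern := range patterns {
		for _, repo := range repos {
			queries = append(queries, fmt.Sprintf("(repo:%s %s)", repo, pattern))
		}
	}
	return queries
}

func main() {
	for i, q := range expand([]string{"foo", "bar"}, []string{"foobar", "baz"}) {
		if i > 0 {
			fmt.Print(" or ")
		}
		fmt.Print(q)
	}
	fmt.Println()
	// (repo:foo foobar) or (repo:bar foobar) or (repo:foo baz) or (repo:bar baz)
}
```

The number of queries, and therefore evaluations, grows with the product of the alternatives.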

We can already run such a query today with multiple invocations of evaluatePatternExpression, taking the union of the results, and it would work. This is a good general strategy, but we can do better in simpler cases: namely, when we can rewrite queries like repo:foo or repo:bar to our already-supported regex syntax, repo:"foo|bar". For example, we could skip straight to this form for the query (repo:foo or repo:bar) (foobar or baz), which becomes:

repo:"foo|bar" (foobar or baz). We can feed this directly to the evaluator as-is, and PartitionSearchPattern will work as before without any extra work.
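The rewrite is sound because a repo name matches foo|bar exactly when it matches foo or matches bar. The Go sketch below checks this with the standard regexp package, which uses RE2 syntax. It assumes, for illustration, that repo: values behave like such regexes. The equivalent helper and the sample repo names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// equivalent reports whether matching name against the joined regex "a|b"
// agrees with matching a or b separately. Alternation has the lowest
// precedence in RE2 syntax, so joining two valid patterns (without inline
// flags such as (?i)) with | yields their union.
func equivalent(a, b, name string) bool {
	joined := regexp.MustCompile(a + "|" + b).MatchString(name)
	separate := regexp.MustCompile(a).MatchString(name) || regexp.MustCompile(b).MatchString(name)
	return joined == separate
}

func main() {
	names := []string{"github.com/org/foo", "github.com/org/bar", "github.com/org/baz"}
	for _, name := range names {
		// Both checks should print true for every name, including anchored values.
		fmt.Println(name, equivalent("foo", "bar", name), equivalent("^github.com/org/foo$", "bar$", name))
	}
}
```

One caveat applies. An inline flag such as (?i) at the start of a value applies to the rest of the enclosing group, so naively joining (?i)foo with bar would also make bar case-insensitive. Wrapping each value in a non-capturing group, (?:...), avoids that.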

Part 1:

Identify simple cases of or-expressions applied to repo: and rewrite them to the equivalent regexp value. The rewrite should handle at least the following expressions:

  • (repo:foo or repo:bar) => repo:"foo|bar"
  • (repo:foo or repo:bar or repo:baz or ...) => repo:"foo|bar|baz|..."

wherever they occur in the query. That is, we should be able to handle:

  • pattern pattern (repo:foo or (repo:bar or repo:baz)) => pattern pattern (repo:foo or (repo:bar|baz))

Applying the function again to the result then reduces it further:

pattern pattern (repo:foo or (repo:bar|baz)) => pattern pattern (repo:foo|bar|baz)
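The successive-application idea can be sketched as a one-step reducer run to a fixed point. The tree types (Node, Parameter, Pattern, Operator) and the helpers mergeable, reduceOnce and reduce are hypothetical simplifications for illustration. They are not the existing parser API.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified query tree for illustration; the real parser's types differ.
type Node interface{ String() string }
type Parameter struct{ Field, Value string } // field:value leaf, e.g. repo:foo
type Pattern struct{ Value string }          // search-pattern leaf, e.g. foobar
type Operator struct {                       // "and" or "or" node
	Kind     string
	Operands []Node
}

func (p Parameter) String() string { return p.Field + ":" + p.Value }
func (p Pattern) String() string { return p.Value }
func (o Operator) String() string {
	parts := make([]string, len(o.Operands))
	for i, n := range o.Operands {
		parts[i] = n.String()
	}
	sep := " " // and-operands are juxtaposed
	if o.Kind == "or" {
		sep = " or "
	}
	return "(" + strings.Join(parts, sep) + ")"
}

// mergeable returns the shared field when every operand is a repo: or file:
// parameter and all operands use that same field.
func mergeable(operands []Node) (string, bool) {
	field := ""
	for _, n := range operands {
		p, ok := n.(Parameter)
		if !ok || (p.Field != "repo" && p.Field != "file") || (field != "" && p.Field != field) {
			return "", false
		}
		field = p.Field
	}
	return field, field != ""
}

// reduceOnce performs one reduction step. An or-node whose operands are all
// same-field parameters collapses into one parameter with an alternation value.
// The node is checked before its children are rewritten, so a parent that only
// becomes mergeable after its children are reduced waits for the next step.
func reduceOnce(n Node) (Node, bool) {
	op, ok := n.(Operator)
	if !ok {
		return n, false
	}
	if field, ok := mergeable(op.Operands); ok && op.Kind == "or" {
		values := make([]string, len(op.Operands))
		for i, o := range op.Operands {
			values[i] = o.(Parameter).Value
		}
		return Parameter{field, strings.Join(values, "|")}, true
	}
	changed := false
	operands := make([]Node, len(op.Operands))
	for i, o := range op.Operands {
		var c bool
		operands[i], c = reduceOnce(o)
		changed = changed || c
	}
	return Operator{op.Kind, operands}, changed
}

// reduce applies reduceOnce until nothing changes; every step is a full traversal.
func reduce(n Node) Node {
	for {
		next, changed := reduceOnce(n)
		if !changed {
			return next
		}
		n = next
	}
}

func main() {
	// pattern pattern (repo:foo or (repo:bar or repo:baz))
	q := Operator{"and", []Node{Pattern{"pattern"}, Pattern{"pattern"},
		Operator{"or", []Node{Parameter{"repo", "foo"},
			Operator{"or", []Node{Parameter{"repo", "bar"}, Parameter{"repo", "baz"}}}}}}}
	step, _ := reduceOnce(q)
	fmt.Println(step)      // (pattern pattern (repo:foo or repo:bar|baz))
	fmt.Println(reduce(q)) // (pattern pattern repo:foo|bar|baz)
}
```

This query needs two reducing traversals plus a final one that detects no change. That cost is what a single-traversal design avoids.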

It may be assumed that every input query is unambiguously parenthesized. Cases where we detect ambiguity, or where we cannot identify one of the cases above, should not be rewritten; we can return an error instead.
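For the flat case, rejecting rather than guessing might look like the minimal sketch below. It uses a hypothetical, simplified Parameter type rather than the real query parser's types. All operands must share the same repo: or file: field, and their values are joined with |. Anything else returns an error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Parameter is a hypothetical, simplified stand-in for a field:value
// parameter; it is not the real query parser's type.
type Parameter struct {
	Field, Value string
}

// mergeOr rewrites a flat or-expression of same-field parameters, such as
// (repo:foo or repo:bar or repo:baz), into a single parameter whose value is
// the regex alternation of the operand values: repo:"foo|bar|baz".
// Operands with mixed or unsupported fields are rejected with an error.
func mergeOr(operands []Parameter) (Parameter, error) {
	if len(operands) == 0 {
		return Parameter{}, errors.New("empty or-expression")
	}
	field := operands[0].Field
	if field != "repo" && field != "file" {
		return Parameter{}, fmt.Errorf("unsupported field %q", field)
	}
	values := make([]string, 0, len(operands))
	for _, op := range operands {
		if op.Field != field {
			return Parameter{}, fmt.Errorf("cannot merge %s: and %s: values", field, op.Field)
		}
		values = append(values, op.Value)
	}
	return Parameter{Field: field, Value: strings.Join(values, "|")}, nil
}

func main() {
	p, err := mergeOr([]Parameter{{"repo", "foo"}, {"repo", "bar"}, {"repo", "baz"}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s:%q\n", p.Field, p.Value) // repo:"foo|bar|baz"
}
```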

Implementation guidance:

The query transformation should be implemented using the visitor (example uses) or mapper (example uses) functions, as appropriate. My best intuition right now is to start with a custom visitor that traverses the query, rebuilding it while visiting each expression and performing the regexp rewrite where it can. Doing it this way means we traverse the tree only once, instead of once per reduction step. See how and/or queries are reduced for a similar idea.
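Here is a sketch of the single-traversal approach, again on a hypothetical simplified tree rather than the real visitor/mapper API. Children are rewritten before their parent (post-order), so nested or-expressions are already collapsed when the parent is visited. The whole reduction therefore happens in one pass. The rewrite helper is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified query tree for illustration; the real parser's types differ.
type Node interface{ String() string }
type Parameter struct{ Field, Value string } // field:value leaf, e.g. repo:foo
type Pattern struct{ Value string }          // search-pattern leaf, e.g. foobar
type Operator struct {                       // "and" or "or" node
	Kind     string
	Operands []Node
}

func (p Parameter) String() string { return p.Field + ":" + p.Value }
func (p Pattern) String() string { return p.Value }
func (o Operator) String() string {
	parts := make([]string, len(o.Operands))
	for i, n := range o.Operands {
		parts[i] = n.String()
	}
	sep := " " // and-operands are juxtaposed
	if o.Kind == "or" {
		sep = " or "
	}
	return "(" + strings.Join(parts, sep) + ")"
}

// mergeable returns the shared field when every operand is a repo: or file:
// parameter and all operands use that same field.
func mergeable(operands []Node) (string, bool) {
	field := ""
	for _, n := range operands {
		p, ok := n.(Parameter)
		if !ok || (p.Field != "repo" && p.Field != "file") || (field != "" && p.Field != field) {
			return "", false
		}
		field = p.Field
	}
	return field, field != ""
}

// rewrite rebuilds the tree in one post-order traversal: children are rewritten
// first, so an or-node can be merged as soon as it is visited.
func rewrite(n Node) Node {
	op, ok := n.(Operator)
	if !ok {
		return n // leaves are returned unchanged
	}
	operands := make([]Node, len(op.Operands))
	for i, o := range op.Operands {
		operands[i] = rewrite(o)
	}
	if op.Kind == "or" {
		if field, ok := mergeable(operands); ok {
			values := make([]string, len(operands))
			for i, o := range operands {
				values[i] = o.(Parameter).Value
			}
			return Parameter{field, strings.Join(values, "|")}
		}
	}
	// Not reducible (patterns, mixed fields, or an and-scoped filter under the
	// or): keep the node; a caller could report an error here instead.
	return Operator{op.Kind, operands}
}

func main() {
	// pattern pattern (repo:foo or (repo:bar or repo:baz))
	nested := Operator{"and", []Node{Pattern{"pattern"}, Pattern{"pattern"},
		Operator{"or", []Node{Parameter{"repo", "foo"},
			Operator{"or", []Node{Parameter{"repo", "bar"}, Parameter{"repo", "baz"}}}}}}}
	fmt.Println(rewrite(nested)) // (pattern pattern repo:foo|bar|baz)

	// (repo:foo or (repo:bar file:baz)): file:baz is scoped to repo:bar only.
	scoped := Operator{"or", []Node{Parameter{"repo", "foo"},
		Operator{"and", []Node{Parameter{"repo", "bar"}, Parameter{"file", "baz"}}}}}
	fmt.Println(rewrite(scoped)) // (repo:foo or (repo:bar file:baz))
}
```

Leaving unreducible or-nodes in place keeps the transformer safe to run over any query. Whether such nodes should instead produce an error is a policy choice for the caller.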

The visitor interface may or may not need to be expanded for this; we'll play that by ear.

Implement the query transformer as a standalone function and unit test it. We will hold off applying this transformation for now in the main and/or parsing code, because the surrounding code is currently in flux and being integration tested, and I'd like to isolate sources of failure.

Follow-up tasks:

There are syntactic cases where we cannot reduce or-expressions over repos to an equivalent regexp. For example:

(repo:foo or (repo:bar file:baz))

cannot be reduced to any form of repo:foo|bar, because file:baz is scoped only to repo:bar. To handle this or-expression generically, we would have to follow the general strategy above: search both sides of the or-expression using a function similar to evaluatePatternExpression. We will focus on this later. The important point right now is that we can handle many cases fairly straightforwardly by translating them to a regexp.

Separately, and-expressions are a different case altogether, so we treat them as out of scope.