Skip to content

search: add DNF conversion for search expressions

Administrator requested to merge rvt/hierarchical-search-proto into master

Created by: rvantonder

First main part of search expressions: adds a way to expand scoped/nested queries for evaluation. Copying the doc comment:

dnf returns the Disjunctive Normal Form of a query (a flat sequence of
or-expressions) by applying the distributive property on (possibly nested)
or-expressions. For example, the query:

(repo:a (file:b OR file:c))
in DNF becomes:
(repo:a file:b) OR (repo:a file:c)

Using the DNF expression makes it easy to support general nested queries that
imply scope, like the one above: We simply evaluate all disjuncts and union
the results. Note that various optimizations are possible
during evaluation, but those are separate query pre- or postprocessing steps
separate from this general transformation.

dnf isn't used or validated by the search code yet: I need to still make changes for evaluating these queries (next PRs).


Some notes:

Why DNF?

Queries can be evaluated in various ways (e.g., directly without DNF expansion, or by doing CNF expansion, etc.) depending how a system or engine is architected, size of the queries, and what is considered 'typical' for the domain of inputs. My assessment is that using DNF is the most appropriate choice based on our current search architecture right now--it's simple to evaluate, the sorts queries we expect will remain efficient, and the expressions are compatible with partial evaluation (i.e., for streaming). The foremost reason I see why we can't evaluate queries with scope directly is because we have both an indexed and unindexed code path, which can be toggled--evaluating directly would mean bookkeeping things like whether a repo is indexed or not as we evaluate expressions for files or content, and then calling those endpoints, which would substantially complicate the top-level evaluator logic. When we use a general DNF form it acts as a fallback that always works, and can rely on some properties of our regex support to simplify certain or-expressions directly, which would avoid DNF expansion in many cases:

What about regex OR support which we already have?

We can exploit the fact that file:b OR file:c is equivalent to regex repo:b|c, but that is something we can process before calling this function. We will always need a general way to evaluate scoped/nested expressions because not all params have an exploitable property like a shared regex expression. For example, this plausible query cannot be simplified in that way:

repo:a (file:b OR "match me") => repo:a file:b OR repo:a "match me"

another example is:

repo:a (lang:ts OR lang:go)

which will work out of the box, without needing to, for example, first convert these to file: equivalents and then construct a regex (which would be a nice optimization, but it is tedious to go after nice optimizations before supporting the general capability).

Merge request reports

Loading