Skip to content

indexed-search: Performance improvements for concatenated regex queries

Created by: keegancsmith

A customer reported indexed-search running slower than expected for regexes of the form (AX|AY|AZ) (or groups of constant strings). We suggested adjusting the regex to A(X|Y|Z). This did indeed improve the performance of the query, but it is still slower than expected. As such, we should investigate potential performance improvements for evaluating these sort of queries.

Context

Zoekt's matching engine transforms regex's into a tree of substring queries. These substring queries can efficiently consult zoekt's index to find potentially matching documents. So a regex like foo(bar|baz) will become the query (AND (content "foo") (OR (content "bar") (content "baz"))). My guess at the performance issue is that (OR ...) could be more efficiently rule out poor matches.

Ideas

  1. Investigate how to more efficiently evaluate / reduce the work done for (OR ...) in zoekt's matchtree evaluater.
  2. More generally see if we can more efficiently consult the zoekt index for regular expressions with groups.
  3. Translate (X|Y|Z) where X, Y and Z are strings into just substring search. This avoids the cost of running the regex engine in this common case.