Skip to content

new diff/commit search: improve performance for case-insensitive search

Administrator requested to merge cc/ignore-case-perf into main

Created by: camdencheek

Go's implementation of case-insensitive search is quite slow. In searcher, we transform both the regexp pattern and the text to be searched (using an optimized implementation in assembly) before doing the search. The same performance issue affects diff/commit search.

Fixes #25177 (closed)

This PR:

  • Moves bytesToLowerASCII and lowerRegexpASCII into a new, shared internal/search/casetransform package
  • Modifies the gitserver protocol regex matchers to take an expression and a case-sensitive toggle, allowing gitserver to use the optimized version of case-insensitive search
  • Mints a casetransform.Regexp type that is a light wrapper around regexp.Regexp that encodes the requirements of the optimization into its method calls (no need to remember whether the regex pattern has already been lower-cased, or whether we need to lower-case the input before matching with the compiled regex)
  • Creates some benchmarks to test that the optimization works similarly well for diff/commit search as it does for searcher

Anecdotal real-life results: type:diff repo:github.com/sourcegraph/sourcegraph$ camden takes 10 seconds with the optimization on my machine vs 16 seconds without

Benchmark results (more significant because generating/parsing the diff is not included):

goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/internal/gitserver/search
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkDiffSearchCaseInsensitiveOptimization/small_diff/with_optimization-16            340771              3358 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/small_diff/without_optimization-16         157194              7487 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/many_matches/with_optimization-16                 9270            120307 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/many_matches/without_optimization-16              1674            716881 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/few_matches/with_optimization-16                 19105             59558 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/few_matches/without_optimization-16               2677            428236 ns/op
PASS
ok      github.com/sourcegraph/sourcegraph/internal/gitserver/search    9.006s

Merge request reports

Loading