new diff/commit search: improve performance for case-insensitive search
Created by: camdencheek
Go's implementation of case-insensitive regexp search is quite slow. In searcher, we work around this by lower-casing both the regexp pattern and the text to be searched (using an optimized implementation in assembly) before doing the search. The same performance issue affects diff/commit search.
Fixes #25177 (closed)
This PR:
- Moves `bytesToLowerASCII` and `lowerRegexpASCII` into a new, shared `internal/search/casetransform` package
- Modifies the gitserver protocol regex matchers to take an expression and a case-sensitive toggle, allowing gitserver to use the optimized version of case-insensitive search
- Mints a `casetransform.Regexp` type that is a light wrapper around `regexp.Regexp` that encodes the requirements of the optimization into its method calls (no need to remember whether the regex pattern has already been lower-cased, or whether we need to lower-case the input before matching with the compiled regex)
- Creates some benchmarks to test that the optimization works similarly well for diff/commit search as it does for searcher
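For readers unfamiliar with the trick, here is a minimal sketch of what such a wrapper could look like. It is not the code in this PR: the `Regexp`, `Compile`, `Match`, `lowerLiterals`, and `isASCII` names below are hypothetical, the input lowering is a plain Go loop standing in for the assembly routine, and the pattern is rewritten by parsing with `syntax.FoldCase` and then lower-casing ASCII literals, which may differ from how `lowerRegexpASCII` does it.

```go
package main

import (
	"regexp"
	"regexp/syntax"
)

// Regexp is a hypothetical stand-in for a casetransform-style wrapper.
// It remembers whether the pattern was lower-cased so callers cannot
// forget to lower-case the input before matching.
type Regexp struct {
	re         *regexp.Regexp
	ignoreCase bool
}

// Compile compiles expr. When ignoreCase is true, ASCII literals in the
// pattern are lower-cased up front instead of relying on (?i).
func Compile(expr string, ignoreCase bool) (*Regexp, error) {
	if !ignoreCase {
		re, err := regexp.Compile(expr)
		if err != nil {
			return nil, err
		}
		return &Regexp{re: re}, nil
	}
	// Parsing with FoldCase folds character classes for us and marks
	// literals as case-insensitive; lowerLiterals then removes that mark.
	tree, err := syntax.Parse(expr, syntax.Perl|syntax.FoldCase)
	if err != nil {
		return nil, err
	}
	lowerLiterals(tree)
	re, err := regexp.Compile(tree.String())
	if err != nil {
		return nil, err
	}
	return &Regexp{re: re, ignoreCase: true}, nil
}

// lowerLiterals turns case-folded ASCII literals into plain lower-case
// literals. Literals containing non-ASCII runes are left untouched, so
// they keep Go's (slower) case folding.
func lowerLiterals(re *syntax.Regexp) {
	if re.Op == syntax.OpLiteral && re.Flags&syntax.FoldCase != 0 && isASCII(re.Rune) {
		for i, r := range re.Rune {
			if 'A' <= r && r <= 'Z' {
				re.Rune[i] = r + ('a' - 'A')
			}
		}
		re.Flags &^= syntax.FoldCase
	}
	for _, sub := range re.Sub {
		lowerLiterals(sub)
	}
}

func isASCII(rs []rune) bool {
	for _, r := range rs {
		if r > 0x7F {
			return false
		}
	}
	return true
}

// Match reports whether text matches. In case-insensitive mode it matches
// against an ASCII-lower-cased copy, leaving the caller's slice unchanged.
func (r *Regexp) Match(text []byte) bool {
	if !r.ignoreCase {
		return r.re.Match(text)
	}
	lowered := make([]byte, len(text))
	for i, c := range text {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		lowered[i] = c
	}
	return r.re.Match(lowered)
}
```

With this shape, a caller writes `re, err := Compile("Camden", true)` followed by `re.Match(diffBytes)` and never has to track whether either side has been lower-cased. The sketch is ASCII-only: case-insensitive matching of non-ASCII letters in the input is only preserved for literals that themselves contain non-ASCII runes.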
Anecdotal real-life results:
`type:diff repo:github.com/sourcegraph/sourcegraph$ camden` takes 10 seconds with the optimization on my machine vs 16 seconds without
Benchmark results (more significant because generating/parsing the diff is not included):
goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/internal/gitserver/search
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkDiffSearchCaseInsensitiveOptimization/small_diff/with_optimization-16 340771 3358 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/small_diff/without_optimization-16 157194 7487 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/many_matches/with_optimization-16 9270 120307 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/many_matches/without_optimization-16 1674 716881 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/few_matches/with_optimization-16 19105 59558 ns/op
BenchmarkDiffSearchCaseInsensitiveOptimization/large_diff/few_matches/without_optimization-16 2677 428236 ns/op
PASS
ok github.com/sourcegraph/sourcegraph/internal/gitserver/search 9.006s
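For anyone who wants to reproduce the general effect outside this repo, the comparison can be sketched roughly as follows. This is not the benchmark added in this PR: `toLowerASCII`, `compareCaseInsensitive`, and `sampleDiff` are hypothetical names, the input is synthetic, and the lowering is a plain Go loop rather than the assembly routine, so absolute numbers will differ from those above.

```go
package main

import (
	"bytes"
	"regexp"
	"testing"
)

// toLowerASCII writes an ASCII-lower-cased copy of src into dst.
// dst must be at least as long as src.
func toLowerASCII(dst, src []byte) {
	for i, c := range src {
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		dst[i] = c
	}
}

// sampleDiff builds a synthetic diff-like input with no matches for "camden".
func sampleDiff(lines int) []byte {
	return bytes.Repeat([]byte("+ func Example() {} // nothing to see here\n"), lines)
}

// compareCaseInsensitive times both strategies on the same text:
// Go's built-in (?i) folding versus lower-casing the text and matching
// an already lower-case pattern.
func compareCaseInsensitive(text []byte) (with, without testing.BenchmarkResult) {
	folded := regexp.MustCompile(`(?i)camden`)
	lowered := regexp.MustCompile(`camden`)
	buf := make([]byte, len(text))

	without = testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			folded.Match(text)
		}
	})
	with = testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			// The cost of lower-casing the input is included on purpose.
			toLowerASCII(buf, text)
			lowered.Match(buf)
		}
	})
	return with, without
}
```

Calling `compareCaseInsensitive(sampleDiff(1000))` and printing `with.NsPerOp()` and `without.NsPerOp()` gives a rough side-by-side comparison on your own machine.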