Symbols perf: use sqlite3
Created by: chrismwendt
Background
Previously, the symbols service was pretty slow on big repositories (linear in the repository size). See https://github.com/sourcegraph/sourcegraph/issues/2257 for analysis.
GIF of kubernetes/kubernetes code intel before this PR
This vastly improves the speed of searches (it uses sqlite3 and indexes the symbol name column for nearly instant queries). This has no effect on the speed of initial indexing of symbols.
GIF of kubernetes/kubernetes code intel after this PR
Benchmark results
$ env CTAGS_COMMAND=$PWD/cmd/symbols/universal-ctags-dev go test -v -bench=. -test.run=sdfsfd ./cmd/symbols/internal/symbols
goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/cmd/symbols/internal/symbols
BenchmarkSearch/indexing_go-langserver@391-8 10 140234483 ns/op
BenchmarkSearch/indexing_moby@6e5-8 1 7930319865 ns/op
BenchmarkSearch/searching_go-langserver@391_^sortedImportRecord$-8 30000 44872 ns/op
BenchmarkSearch/searching_go-langserver@391_1234doesnotexist1234-8 5000 281762 ns/op
BenchmarkSearch/searching_moby@6e5_^fsCache$-8 20000 80045 ns/op
BenchmarkSearch/searching_moby@6e5_1234doesnotexist1234-8 30 46515624 ns/op
PASS
ok github.com/sourcegraph/sourcegraph/cmd/symbols/internal/symbols 36.963s
I forced the non exact regexes to be run over all symbols by using 1234doesnotexist1234
as the regex. Exact matches (^...$
) are much faster and scale better than non exact matches that don't exist.
The current implementation takes ~40ms to search for 1234doesnotexist1234
in moby/moby, and takes ~46ms using sqlite3, which I think is acceptable.
Memory usage improvements
This also reduces the memory consumption (it streams symbols from the ctags process to sqlite3 without needing to build up a giant []Symbol
in Go). This should reduce the frequency of the symbols service getting OOM killed (cc @ggilmore, just FYI).
- Memory usage while indexing kubernetes/kubernetes before this PR: ~200 MB
- Memory usage while indexing kubernetes/kubernetes after this PR: ~26 MB
TypeScript build script
This also uses TypeScript for a build script. It's much safer than shell, and I'm curious if anyone has thoughts on this.
Review
This is one of the largest changes I've made, and I'd be happy to do a video chat
Here are the areas I'd like review on (feel free to review anything, these are just my guesses for what you will want to look at):
- @keegancsmith
- Correct usage of concurrency primitives, especially in the functions that return channels for streaming symbols (I'm pretty new to this area of Go, so there is a high probability I've done things that are plain wrong)
- Correct usage of sqlite3, sqlf, and sqlx
- Database schema
- @slimsag
- Dev build system changes (mostly
build.ts
, some changes to Procfile and other shell scripts) - General Go idioms, control flow, organization of functions, methods, etc.
- Dev build system changes (mostly
- @felixfbecker
- Usage of ts-node,
./dev/ts-script
, andcmd/symbols/build.ts
- Usage of ts-node,
- Anyone: am I missing anything that you'd expect to see?
I'm hoping to address feedback until both @slimsag and @keegancsmith are confident that this is a robust implementation and are happy with the result before merging.
TODO
-
Add teststhey already exist -
Add a few more comments, minor refactoring -
Have a few people run this branch locally to see what parts of the build break (missing libs, missing sqlite3, etc.), since my machine is dirty (both macOS and Linux)
Issue references
Resolves https://github.com/sourcegraph/sourcegraph/issues/2257 Fixes https://github.com/sourcegraph/sourcegraph/issues/1625