codeintel: Reduce memory when calculating commit graph
Created by: efritz
This is an effort to reduce memory pressure as described in https://github.com/sourcegraph/sourcegraph/issues/16030.
What changed?
We join lsif_nearest_uploads with lsif_uploads to bring back the root and indexer values of each upload to determine which uploads can "shadow" another's visibility on the git commit graph. We bring these fields back with full information, although we only need to determine if they are the same or not.
We instead bring back an md5 hash of those two fields. If the root and indexer fields are the same, the hash will also be the same. And, with acceptable likelihood, two different hashes have a different root or indexer.
This saves us many duplicate copies of the same strings in memory, which was causing a lot of pressure.
Results
TL;DR: This change will make the graph update procedure 2.5x slower, but will take only 1/4 of the memory.
Before:
$ go test -run='^BenchmarkCalculateVisibleUploads$' -bench=. -benchmem -v
goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore
BenchmarkCalculateVisibleUploads
BenchmarkCalculateVisibleUploads-16 1 3466152664 ns/op 415819920 B/op 63756 allocs/op
PASS
ok github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore 3.726s
After:
$ go test -run='^BenchmarkCalculateVisibleUploads$' -bench=. -benchmem -v
goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore
BenchmarkCalculateVisibleUploads
BenchmarkCalculateVisibleUploads-16 1 8871662674 ns/op 107866832 B/op 63830 allocs/op
PASS
ok github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore 9.182s
Benchmark test (run on main@HEAD)
func BenchmarkCalculateVisibleUploads(b *testing.B) {
var (
numCommits = 5000
numUploads = 2000
numRoots = 500
)
var commits []string
for i := 0; i < numCommits; i++ {
commits = append(commits, fmt.Sprintf("%40d", i))
}
graph := map[string][]string{}
for i, commit := range commits {
if i == 0 {
graph[commit] = nil
} else {
graph[commit] = commits[i-1 : i]
}
}
var roots []string
for i := 0; i < numRoots; i++ {
roots = append(roots, fmt.Sprintf("sub%d/", i))
}
uploads := map[string][]UploadMeta{}
for i := 0; i < numUploads; i++ {
x := fmt.Sprintf("%40d", int((float64(i*numUploads) / float64(numCommits))))
uploads[x] = append(uploads[x], UploadMeta{
UploadID: i,
Root: roots[i%numRoots],
Indexer: "lsif-go",
})
}
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
_, _ = calculateVisibleUploads(graph, uploads)
}
}