codeintel: Reduce memory when calculating commit graph

Warren Gifford requested to merge ef/commit-graph-mem-optimization into main

Created by: efritz

This is an effort to reduce memory pressure as described in https://github.com/sourcegraph/sourcegraph/issues/16030.

What changed?

We join lsif_nearest_uploads with lsif_uploads to fetch the root and indexer values of each upload, which determine whether one upload can "shadow" another's visibility on the git commit graph. Previously we brought these fields back as full strings, even though we only need to know whether two uploads share the same values.

We now bring back an md5 hash of those two fields instead. If the root and indexer values are equal, the hashes are equal; and two different hashes are guaranteed to come from a different root or indexer. The only risk is a hash collision making two distinct uploads look identical, which happens with acceptably low likelihood.

This avoids holding many duplicate copies of the same strings in memory, which was causing a lot of the pressure.
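
The sketch below illustrates the idea in Go. The struct and function names are hypothetical and do not match the actual dbstore code, and the exact way the hash is computed in SQL is an assumption; the point is only that a fixed-size digest can stand in for the root/indexer pair when comparing uploads.

package main

import (
	"crypto/md5"
	"fmt"
)

// uploadMeta is a stand-in for the per-commit upload metadata; the real
// UploadMeta struct carries more fields.
type uploadMeta struct {
	UploadID int
	Token    [md5.Size]byte // digest of (root, indexer) instead of the raw strings
}

// tokenFor mimics the value a query could compute in Postgres, e.g.
// md5(u.root || ':' || u.indexer); the separator here is arbitrary.
func tokenFor(root, indexer string) [md5.Size]byte {
	return md5.Sum([]byte(root + ":" + indexer))
}

// sameRootAndIndexer reports whether two uploads share the same root and
// indexer. Equal digests imply equal fields with acceptable likelihood;
// unequal digests always imply a difference.
func sameRootAndIndexer(a, b uploadMeta) bool {
	return a.Token == b.Token
}

func main() {
	a := uploadMeta{UploadID: 1, Token: tokenFor("sub1/", "lsif-go")}
	b := uploadMeta{UploadID: 2, Token: tokenFor("sub1/", "lsif-go")}
	c := uploadMeta{UploadID: 3, Token: tokenFor("sub2/", "lsif-go")}

	fmt.Println(sameRootAndIndexer(a, b)) // true: same root and indexer, one can shadow the other
	fmt.Println(sameRootAndIndexer(a, c)) // false: different roots
}

Each upload then carries a fixed 16-byte value regardless of how long its root path is, and the comparison is a single array equality check.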

Results

TL;DR: This change makes the graph update procedure about 2.5x slower (roughly 3.5s → 8.9s per iteration in the benchmark below), but it allocates only about a quarter of the memory (roughly 416 MB → 108 MB per iteration).

Before:

$ go test -run='^BenchmarkCalculateVisibleUploads$' -bench=. -benchmem -v
goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore
BenchmarkCalculateVisibleUploads
BenchmarkCalculateVisibleUploads-16    	       1	3466152664 ns/op	415819920 B/op	   63756 allocs/op
PASS
ok  	github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore	3.726s

After:

$ go test -run='^BenchmarkCalculateVisibleUploads$' -bench=. -benchmem -v
goos: darwin
goarch: amd64
pkg: github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore
BenchmarkCalculateVisibleUploads
BenchmarkCalculateVisibleUploads-16    	       1	8871662674 ns/op	107866832 B/op	   63830 allocs/op
PASS
ok  	github.com/sourcegraph/sourcegraph/enterprise/internal/codeintel/stores/dbstore	9.182s

Benchmark test (run on main@HEAD)

func BenchmarkCalculateVisibleUploads(b *testing.B) {
	var (
		numCommits = 5000
		numUploads = 2000
		numRoots   = 500
	)

	var commits []string
	for i := 0; i < numCommits; i++ {
		commits = append(commits, fmt.Sprintf("%40d", i))
	}

	// Build a linear commit graph: each commit's single parent is the previous commit.
	graph := map[string][]string{}
	for i, commit := range commits {
		if i == 0 {
			graph[commit] = nil
		} else {
			graph[commit] = commits[i-1 : i]
		}
	}

	var roots []string
	for i := 0; i < numRoots; i++ {
		roots = append(roots, fmt.Sprintf("sub%d/", i))
	}

	// Attach uploads to the oldest commits in the chain, cycling through the
	// generated roots; every upload uses the same indexer.
	uploads := map[string][]UploadMeta{}
	for i := 0; i < numUploads; i++ {
		x := fmt.Sprintf("%40d", int((float64(i*numUploads) / float64(numCommits))))
		uploads[x] = append(uploads[x], UploadMeta{
			UploadID: i,
			Root:     roots[i%numRoots],
			Indexer:  "lsif-go",
		})
	}

	b.ResetTimer()
	b.ReportAllocs()

	for i := 0; i < b.N; i++ {
		_, _ = calculateVisibleUploads(graph, uploads)
	}
}
