codeintel: Optimize commit graph
Created by: efritz
This is an effort to reduce memory pressure as described in https://github.com/sourcegraph/sourcegraph/issues/16030.
There are a few changes here:
- Instead of maintaining sets of visible uploads with a structure shaped like
map[string][]UploadMeta
, we keep track of it inmap[string]map[string]UploadMeta
. This is possible as when looking in one direction down the commit graph there will only be at most one upload visible with a given indexer and root value. This greatly speeds up the cases in which a repository has many distinct roots, as we no longer have to do a linear search over each commit visible at a commit. This comes at a slight cost of memory, which we continue to optimize.
@Strum355 has suggested either making the remaining map into another map with smaller, more efficient keys (string -> int conversion), and also suggested a radix tree-like structure (or a variant such as a Patricia tree) in order to cut down on the overhead fo storing maps. These may be interesting ideas, but aren't urgently necessary due to the next change.
- Instead of pre-calculating the set of visible uploads for every commit at once, we only calculate the set of visible uploads for some class of commits. The benchmark data added in this PR comes from a customer that was experiencing scale problems. In this case, we only need to store data for 20% of the commit graph. The remaining commits can be calculated on the fly with a cheap traversal of the commit graph (more technical explanations in code comments), meaning that we only need to keep data for 20% of the graph resident in memory.
Benchmark results:
Before: 17998614217551 ns/op 21031275248 B/op 1019199 allocs/op 17999.282s (5 hours)
Progress: 48014817436 ns/op 21558759232 B/op 1849823 allocs/op 49.893s
Progress: 23638566834 ns/op 9836824888 B/op 334060 allocs/op 24.604s
After: 4960440863 ns/op 1139648984 B/op 122357 allocs/op 5.443s