codeintel: Speed up document canonicalization by 100x~300x
Created by: varungandhi-src
Implement Chris's suggestion for faster intersection. This speeds up canonicalizeDocuments quite a bit (because documentRanges is quite small, whereas canonicalIDs is roughly proportional to the number of documents), so the slowest part of canonicalization is now canonicalizeRanges. This patch is essentially the same as the one I mentioned in https://github.com/sourcegraph/sourcegraph/pull/31053#issue-1132430354
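For readers unfamiliar with the trick, here is a minimal sketch of the general idea: when intersecting a small set with a large one, iterate over the small set and probe the large one, so the cost scales with the smaller size. This is not the code from this patch; the function name intersectSmallFirst and the use of plain Go maps as sets are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectSmallFirst is a hypothetical helper (not part of this patch).
// It returns the keys present in both sets while iterating only over the
// smaller one, so the work is O(min(len(a), len(b))) map lookups.
func intersectSmallFirst(a, b map[int]struct{}) []int {
	// Make sure a is the smaller set before looping.
	if len(a) > len(b) {
		a, b = b, a
	}
	out := make([]int, 0, len(a))
	for k := range a {
		if _, ok := b[k]; ok {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// small plays the role of a per-document set, large the role of a set
	// that grows with the number of documents (illustrative data only).
	small := map[int]struct{}{2: {}, 9: {}}
	large := map[int]struct{}{1: {}, 2: {}, 3: {}, 4: {}, 5: {}}

	common := intersectSmallFirst(large, small)
	sort.Ints(common) // map iteration order is unspecified, so sort before printing
	fmt.Println(common)
}
```

The argument order does not matter in this sketch because the helper swaps its inputs as needed; the returned keys are unordered, which is why the caller sorts them before printing.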
60M patch with max ~46 dupes
- before:
canonicalizeDocuments duration=35.962733208s
- after:
canonicalizeDocuments duration=357.644125ms
other pieces:
canonicalizeReferenceResults duration=42.093958ms
canonicalizeResultSets duration=398.065625ms
canonicalizeRanges duration=1.450452084s
large patch (120M) with max ~145 dupes
- before:
canonicalizeDocuments duration=3m8.622550417s
- after:
canonicalizeDocuments duration=569.585209ms
other pieces:
canonicalizeReferenceResults duration=71.916292ms
canonicalizeResultSets duration=959.010791ms
canonicalizeRanges duration=2.775382167s
I don't fully understand why I didn't see this large speedup when I tried this earlier in https://github.com/sourcegraph/sourcegraph/pull/30978#issuecomment-1035950484. (One difference is I made the earlier measurements with a locally running Sourcegraph instance, whereas these measurements are made locally with a small helper script that just plumbs through the LSIF dump.)
Test plan
- Added some tests for the new method UnorderedKeys