codeintel: Speed up document canonicalization by 100x~300x

Administrator requested to merge vg/speedup-canonicalizeDocuments into main

Created by: varungandhi-src

Implement Chris's suggestion for faster intersection. This speeds up canonicalizeDocuments considerably, because documentRanges is quite small whereas canonicalIDs grows roughly in proportion to the number of documents. As a result, the slowest part of canonicalization is now canonicalizeRanges. This patch is essentially the same as the one I mentioned in https://github.com/sourcegraph/sourcegraph/pull/31053#issue-1132430354
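
For readers unfamiliar with the idea, here is a minimal sketch of a size-aware intersection, assuming both collections are map-backed sets. The name `intersectKeys` and the `map[int]struct{}` representation are hypothetical and are not taken from this patch; the actual change may be structured differently.

```go
package main

// intersectKeys returns the keys present in both sets, in unspecified order.
// It iterates over whichever set is smaller and probes the larger one, so the
// number of lookups is min(len(a), len(b)) rather than the size of the larger set.
// NOTE: hypothetical helper for illustration only; not the code in this patch.
func intersectKeys(a, b map[int]struct{}) []int {
	small, large := a, b
	if len(b) < len(a) {
		small, large = b, a
	}

	out := make([]int, 0, len(small))
	for k := range small {
		// Each probe into the larger map is an expected O(1) lookup.
		if _, ok := large[k]; ok {
			out = append(out, k)
		}
	}
	return out
}
```

With a small set on one side (like documentRanges here) and a large one on the other (like canonicalIDs), the loop runs only over the small side regardless of argument order.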

60M patch with max ~46 dupes
- before:
  canonicalizeDocuments         duration=35.962733208s
- after:
  canonicalizeDocuments         duration=357.644125ms
other pieces:
  canonicalizeReferenceResults  duration=42.093958ms
  canonicalizeResultSets        duration=398.065625ms
  canonicalizeRanges            duration=1.450452084s
 
large patch (120M) with max ~145 dupes
- before:
  canonicalizeDocuments         duration=3m8.622550417s
- after:
  canonicalizeDocuments         duration=569.585209ms
other pieces:
  canonicalizeReferenceResults  duration=71.916292ms
  canonicalizeResultSets        duration=959.010791ms
  canonicalizeRanges            duration=2.775382167s

I don't fully understand why I didn't see this large speedup when I tried this earlier in https://github.com/sourcegraph/sourcegraph/pull/30978#issuecomment-1035950484. (One difference is I made the earlier measurements with a locally running Sourcegraph instance, whereas these measurements are made locally with a small helper script that just plumbs through the LSIF dump.)

Test plan

  • Added some tests for the new method UnorderedKeys
