Skip to content

codeintel: improve LSIF dependents sql query CTE

Administrator requested to merge nsc/ranked_dependents_query into main

Created by: Strum355

Problem & Solution

Addresses serious SQL query degradation where joins on lsif_packages, lsif_references and lsif_uploads would cause essentially non-terminating queries. Fixed by moving the join on lsif_packages outside the CTE.

The crux of the issue appears to be a missed optimization opportunity from Postgres on deferring the JOIN on lsif_references inside the ranked_dependents CTE. In the case of there being a very large number of rows from lsif_references that match the join predicate on lsif_packages, the rank+sort take a large amount of time due to the hugely inflated number of rows, which could've been avoided as lsif_references is not used in the rank+sort.

In other terms, where ƒ(RANK_SORT(A x B) x C) ≡ ƒ(RANK_SORT(A x B x C)) and SIZE(A x B x C) >> SIZE(A x B), it makes no sense to not join on C only later

Old: image

New: image

By manually deferring the join on lsif_references to after the CTE, we're able to drastically reduce the result set needed for ranking, which also reduces the possibility of the sort spilling to disk.

Old: image

New: image

Closes #32691 (closed)

Query + Plan Comparisons & Visualization

Old plan: https://explain.dalibo.com/plan/Dlw

New plan: https://explain.dalibo.com/plan/UI9

Test plan

Q/A pipeline and queries in Datagrip https://buildkite.com/sourcegraph/sourcegraph/builds/143398

Merge request reports

Loading