codeintel: improve LSIF dependents sql query CTE
Created by: Strum355
Problem & Solution
Addresses serious SQL query degradation where joins on lsif_packages, lsif_references and lsif_uploads would cause essentially non-terminating queries. Fixed by moving the join on lsif_packages
outside the CTE.
The crux of the issue appears to be a missed optimization opportunity from Postgres on deferring the JOIN
on lsif_references
inside the ranked_dependents
CTE. In the case of there being a very large number of rows from lsif_references
that match the join predicate on lsif_packages
, the rank+sort take a large amount of time due to the hugely inflated number of rows, which could've been avoided as lsif_references
is not used in the rank+sort.
In other terms, where ƒ(RANK_SORT(A x B) x C) ≡ ƒ(RANK_SORT(A x B x C))
and SIZE(A x B x C) >> SIZE(A x B)
, it makes no sense to not join on C
only later
By manually deferring the join on lsif_references
to after the CTE, we're able to drastically reduce the result set needed for ranking, which also reduces the possibility of the sort spilling to disk.
Closes #32691 (closed)
Query + Plan Comparisons & Visualization
Old plan: https://explain.dalibo.com/plan/Dlw
New plan: https://explain.dalibo.com/plan/UI9
Test plan
Q/A pipeline and queries in Datagrip https://buildkite.com/sourcegraph/sourcegraph/builds/143398