Target the same Postgres db for LSIF data.
Created by: efritz
@slimsag, @chrismwendt, and I had a small post-mortem about the pains caused by migrating LSIF to Postgres. See this doc for some additional context. We concluded that we got a lot of pain but no real benefit from adding a second database.
Our plan going forward is to (1) destroy the lsif db for all users that ran this migration (should only be devs and sg.com), (2) create LSIF tables within the same database instance, and (3) ask customers not to go ALL in on day one so we can see the actual performance impact when adding this data to the primary db.
What impact will this have on the main database?
- LSIF is opt-in (feature flagged) and will have zero impact for customers not using LSIF
- Currently, the nearest-commit query can take ~300ms on every single hover/def/ref/exists request. Tracking in https://github.com/sourcegraph/sourcegraph/issues/5939
- Cross-repo jump-to-definition (j2d) will execute one O(1) query per moniker (usually there is only one) to fetch the package info
- Cross-repo references currently does a table scan on the `references` table. RFC reference pagination will mitigate this, but won't prevent full table scans.
- Data size estimates:
  - `commits`: stores 40 char commit IDs in a commit graph. O(number of repositories * age of the Sourcegraph instance) because each repository's most recent 5000 commits will be stored in this table, and newer commits will get added over time. The size is bounded by the sum of all commits in all repositories.
  - `lsif_data_markers`: records which commits have LSIF data associated with them. At most as big as `commits`, very likely to be a tiny subset of `commits`.
  - `packages` and `references`: the set of all packages (the names) and which repositories depend on other packages represented as a bloom filter for each repository (basically a dependency graph).
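
For reviewers less familiar with the bloom filter approach mentioned for `references`, here is a minimal Python sketch of the idea. It is not the LSIF server's implementation: the `BloomFilter` class, the filter size, the hash scheme, and the `build_reference_rows` / `candidate_repos` helpers are all hypothetical names and parameters chosen for illustration. It assumes each repository gets one filter over the names of the packages it depends on.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        # num_bits and num_hashes are illustrative values, not tuned ones.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def build_reference_rows(packages_by_repo):
    """Build one hypothetical references-style row (repo -> filter) per repo."""
    rows = {}
    for repo, packages in packages_by_repo.items():
        bf = BloomFilter()
        for package in packages:
            bf.add(package)
        rows[repo] = bf
    return rows


def candidate_repos(rows, package):
    """Return repos whose filter may include the package.

    Every row is still visited, which mirrors the table scan noted above;
    the filter only makes the per-row check cheap and the row small.
    """
    return sorted(repo for repo, bf in rows.items() if bf.might_contain(package))
```

With these illustrative parameters each row costs a fixed 128 bytes regardless of how many packages the repository depends on. A repository that really depends on a package is always returned by `candidate_repos`; a repository that does not may occasionally be returned as a false positive and would need to be ruled out by a follow-up check.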
This will also resolve https://github.com/sourcegraph/sourcegraph/issues/5933.
cc @sourcegraph/core-services - would love your thoughts on this