codeintel: Split lsif payloads into independent columns
Created by: efritz
Fixes https://github.com/sourcegraph/sourcegraph/issues/18251 and https://github.com/sourcegraph/sourcegraph/issues/18289.
This PR changes the shape of the lsif_data_documents table. Previously, we would encode the entire DocumentData struct (via gob + gzip) and insert that into a single data bytea field. Each time any data about a document is required, the entire payload must be fetched and decoded. This wastes Postgres buffer space and bandwidth, as well as memory and compute in the frontends, which have to decompress and decode the full payload.
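For reference, the legacy encoding path described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the DocumentData fields shown here and the encode/decode helper names are assumptions made for the example; only the gob + gzip combination comes from the description.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// DocumentData is a simplified, hypothetical stand-in for the real struct.
type DocumentData struct {
	Ranges       map[string]string // range ID -> hover result ID (assumed shape)
	HoverResults map[string]string // hover result ID -> hover text (assumed shape)
}

// encode gob-encodes v and gzips the result, producing one opaque payload.
func encode(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(zw).Encode(v); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed bytes into buf.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode: gunzip, then gob-decode into v (a pointer).
func decode(payload []byte, v interface{}) error {
	zr, err := gzip.NewReader(bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer zr.Close()
	return gob.NewDecoder(zr).Decode(v)
}

func main() {
	doc := DocumentData{
		Ranges:       map[string]string{"r1": "h1"},
		HoverResults: map[string]string{"h1": "func Foo()"},
	}
	payload, err := encode(doc)
	if err != nil {
		panic(err)
	}

	// Even a reader that only wants hover text must decode everything.
	var out DocumentData
	if err := decode(payload, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.HoverResults["h1"])
}
```

The point of the sketch is the last step: with a single payload there is no way to decode one field without paying for all of them.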
This PR splits the data field into ranges, hovers, monikers, packages, and diagnostics. Each reader from the table will pull back only the data they need decoded and give NULL for the other values, which skips heap fetches and decoding time for fields not read in that request.
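One way to picture the per-request column selection is a small helper that builds the SELECT list. This is a hedged sketch: the selectColumns function is hypothetical and the query text is illustrative only; the column names and the table name are the ones given in this description. The legacy data column is always included because readers still fetch it while the migration is in progress.

```go
package main

import (
	"fmt"
	"strings"
)

// payloadColumns lists the split columns named in this PR description.
var payloadColumns = []string{"ranges", "hovers", "monikers", "packages", "diagnostics"}

// selectColumns (hypothetical helper) returns a SELECT list that always
// includes the legacy data column, includes each wanted split column as-is,
// and substitutes NULL for every split column the caller did not ask for.
func selectColumns(wanted ...string) string {
	want := make(map[string]bool, len(wanted))
	for _, w := range wanted {
		want[w] = true
	}

	exprs := []string{"data"}
	for _, col := range payloadColumns {
		if want[col] {
			exprs = append(exprs, col)
		} else {
			exprs = append(exprs, "NULL AS "+col)
		}
	}
	return strings.Join(exprs, ", ")
}

func main() {
	// A hover request only needs ranges and hovers.
	fmt.Println("SELECT " + selectColumns("ranges", "hovers") + " FROM lsif_data_documents")
}
```

Keeping the result shape fixed (same columns in the same order, with NULL placeholders) means a single row scanner can serve every reader.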
An out-of-band migration will migrate legacy format rows into the new format over time. Until then, all readers will read the old data field unconditionally. This field will be NULL for newly inserted and migrated rows. If the unified data payload is set for a row, it's decoded in the legacy format.
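The read-side fallback can be sketched as below. This is an illustration under stated assumptions, not the PR's code: the row and legacyDocument types and the readHovers and mustEncode helpers are hypothetical, and gzip is omitted for brevity so that only the branching on the legacy data field is shown.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// row is a hypothetical scanned row; a nil slice models a NULL column.
type row struct {
	Data   []byte // legacy unified payload, NULL for new and migrated rows
	Hovers []byte // split column, NULL if legacy or not requested
}

// legacyDocument is a simplified stand-in for the unified payload.
type legacyDocument struct {
	Hovers map[string]string
}

// mustEncode gob-encodes v, panicking on failure (example-only helper).
func mustEncode(v interface{}) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// readHovers prefers the legacy payload when it is set, and otherwise
// decodes only the split hovers column.
func readHovers(r row) (map[string]string, error) {
	if r.Data != nil {
		// Legacy row: decode the whole document, then pick out one field.
		var doc legacyDocument
		if err := gob.NewDecoder(bytes.NewReader(r.Data)).Decode(&doc); err != nil {
			return nil, err
		}
		return doc.Hovers, nil
	}
	if r.Hovers == nil {
		// Column was not requested (or is empty) for this row.
		return nil, nil
	}
	var hovers map[string]string
	if err := gob.NewDecoder(bytes.NewReader(r.Hovers)).Decode(&hovers); err != nil {
		return nil, err
	}
	return hovers, nil
}

func main() {
	legacy := row{Data: mustEncode(legacyDocument{Hovers: map[string]string{"h1": "legacy"}})}
	split := row{Hovers: mustEncode(map[string]string{"h1": "split"})}

	for _, r := range []row{legacy, split} {
		hovers, err := readHovers(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(hovers["h1"])
	}
}
```

Once the out-of-band migration has rewritten every row, the legacy branch is never taken and could be removed along with the data column.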