codeintel: Split lsif payloads into independent columns (!19659)

Created by: efritz

Fixes https://github.com/sourcegraph/sourcegraph/issues/18251 and https://github.com/sourcegraph/sourcegraph/issues/18289.

This PR changes the shape of the lsif_data_documents table. Previously, we encoded the entire DocumentData struct (via gob + gzip) and inserted it into a single data bytea field. Whenever any data about a document was required, the entire payload had to be fetched and decoded. This wastes buffer space in Postgres and network bandwidth, as well as memory and compute in the frontends, which must decompress and decode the whole payload.
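A minimal Go sketch of the legacy layout described above. It assumes a simplified, hypothetical DocumentData whose fields mirror the new column names but use plain string maps instead of the real range, hover, moniker, and package types. The whole struct is gob-encoded through a gzip writer into one blob, so a reader that only needs hovers still decompresses and decodes everything. The helper names encodeLegacy and decodeLegacy are illustrative and not taken from the codebase.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// DocumentData is a hypothetical, simplified stand-in for the real struct;
// field names mirror the new columns, but the types are assumptions.
type DocumentData struct {
	Ranges      map[string]string
	Hovers      map[string]string
	Monikers    map[string]string
	Packages    map[string]string
	Diagnostics []string
}

// encodeLegacy gob-encodes the whole document and gzips it into one blob,
// mirroring what the single data bytea column held.
func encodeLegacy(doc DocumentData) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(gw).Encode(doc); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeLegacy must decompress and decode every field, even if the caller
// only wants one of them.
func decodeLegacy(data []byte) (DocumentData, error) {
	var doc DocumentData
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return doc, err
	}
	defer gr.Close()
	err = gob.NewDecoder(gr).Decode(&doc)
	return doc, err
}

func main() {
	doc := DocumentData{
		Hovers:      map[string]string{"h1": "func main()"},
		Diagnostics: []string{"unused variable x"},
	}
	payload, err := encodeLegacy(doc)
	if err != nil {
		panic(err)
	}
	// A hover lookup still pays for decoding every other field in the blob.
	decoded, err := decodeLegacy(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded.Hovers["h1"], len(decoded.Diagnostics))
}
```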

This PR splits the data field into separate ranges, hovers, monikers, packages, and diagnostics columns. Each reader of the table selects only the columns it needs decoded and selects NULL in place of the others, which skips heap fetches and decoding time for fields not used by that request.
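The split layout can be sketched as follows. This assumes each new column keeps a gob + gzip encoding of its own field (the PR does not state the per-column encoding) and models SQL NULL as a nil []byte. hoverOnlyQuery, DocumentRow, encodeColumn, decodeColumn, and readHovers are hypothetical names, and dump_id and path are assumed key columns. The point is that a hover request selects NULL for the other four payload columns and decodes only the hovers column.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// hoverOnlyQuery is hypothetical: the key columns dump_id and path are
// assumptions; only the five payload column names come from the PR.
const hoverOnlyQuery = `
SELECT NULL AS ranges, hovers, NULL AS monikers, NULL AS packages, NULL AS diagnostics
FROM lsif_data_documents
WHERE dump_id = $1 AND path = $2`

// DocumentRow models one row of the split table; a nil slice stands for SQL NULL.
type DocumentRow struct {
	Ranges, Hovers, Monikers, Packages, Diagnostics []byte
}

// encodeColumn gob-encodes and gzips a single field (assumed encoding).
func encodeColumn(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(gw).Encode(v); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeColumn reverses encodeColumn into out, which must be a pointer.
func decodeColumn(data []byte, out interface{}) error {
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer gr.Close()
	return gob.NewDecoder(gr).Decode(out)
}

// readHovers decodes only the hovers column; the other columns are never read.
func readHovers(row DocumentRow) (map[string]string, error) {
	hovers := map[string]string{}
	if row.Hovers == nil {
		return hovers, nil // NULL column: nothing to decode
	}
	err := decodeColumn(row.Hovers, &hovers)
	return hovers, err
}

func main() {
	hovers, err := encodeColumn(map[string]string{"h1": "func main()"})
	if err != nil {
		panic(err)
	}
	// Simulates the result of hoverOnlyQuery: every other column came back NULL.
	row := DocumentRow{Hovers: hovers}
	got, err := readHovers(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(got["h1"])
}
```

Keeping each field in its own column means the cost of a request scales with the fields it asks for rather than with the size of the whole document.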

An out-of-band migration will convert rows in the legacy format into the new format over time. Until then, all readers also read the old data field unconditionally; it is NULL for newly inserted and already-migrated rows. If the unified data payload is set for a row, it is decoded in the legacy format.
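A minimal sketch of the dual-format read path and a single migration step. It assumes gob + gzip encoding for both the legacy data column and the new per-field columns, uses a nil []byte to stand in for SQL NULL, and models only two of the five payload columns (ranges and hovers) for brevity. DocumentData, DocumentRow, readHovers, and migrateRow are hypothetical names, not the actual Sourcegraph code.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/gob"
	"fmt"
)

// DocumentData is a hypothetical, simplified decoded document (two fields only).
type DocumentData struct {
	Ranges map[string]string
	Hovers map[string]string
}

// DocumentRow models one table row; a nil slice stands for SQL NULL.
type DocumentRow struct {
	Data   []byte // legacy payload holding the whole DocumentData
	Ranges []byte // new per-field column
	Hovers []byte // new per-field column
}

// encode gob-encodes v and gzips the result (assumed encoding).
func encode(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if err := gob.NewEncoder(gw).Encode(v); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reverses encode into out, which must be a pointer.
func decode(data []byte, out interface{}) error {
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return err
	}
	defer gr.Close()
	return gob.NewDecoder(gr).Decode(out)
}

// encodeIfPresent leaves the column NULL (nil) when the field is empty.
func encodeIfPresent(m map[string]string) ([]byte, error) {
	if len(m) == 0 {
		return nil, nil
	}
	return encode(m)
}

// readHovers serves a hover request from a row in either format.
func readHovers(row DocumentRow) (map[string]string, error) {
	if row.Data != nil {
		// Legacy row: the whole payload is decoded to reach the hovers.
		var doc DocumentData
		if err := decode(row.Data, &doc); err != nil {
			return nil, err
		}
		return doc.Hovers, nil
	}
	hovers := map[string]string{}
	if row.Hovers == nil {
		return hovers, nil
	}
	err := decode(row.Hovers, &hovers)
	return hovers, err
}

// migrateRow is one hypothetical migration step: it splits the legacy payload
// into per-field columns and clears Data so that column becomes NULL.
func migrateRow(row DocumentRow) (DocumentRow, error) {
	if row.Data == nil {
		return row, nil // already in the new format
	}
	var doc DocumentData
	if err := decode(row.Data, &doc); err != nil {
		return row, err
	}
	ranges, err := encodeIfPresent(doc.Ranges)
	if err != nil {
		return row, err
	}
	hovers, err := encodeIfPresent(doc.Hovers)
	if err != nil {
		return row, err
	}
	return DocumentRow{Ranges: ranges, Hovers: hovers}, nil
}

func main() {
	// Errors are ignored here for brevity; the functions above propagate them.
	legacy, _ := encode(DocumentData{Hovers: map[string]string{"h1": "func main()"}})
	row := DocumentRow{Data: legacy}
	before, _ := readHovers(row)
	migrated, _ := migrateRow(row)
	after, _ := readHovers(migrated)
	fmt.Println(before["h1"], after["h1"], migrated.Data == nil)
}
```

Because readHovers checks the legacy data column first, it returns the same result for a row before and after migration, so readers can stay correct while the migration is still in progress.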

