codeintel: Minimize tuple modifications when updating commit graph data

Warren Gifford requested to merge ef/18285 into main

Created by: efritz

Fixes #18285 (closed).

When updating the lsif_nearest_uploads, lsif_nearest_uploads_links, and lsif_uploads_visible_from_tip tables, we do a mass delete of all data related to a repository, then re-insert the newly calculated data for that repository. We've compressed the data fairly well, but that's not enough to outpace Postgres's autovacuum daemon.

Turns out that MVCC is really not space-efficient when your workload is basically telling Postgres to drop 90% of the data you're about to write back to it a few milliseconds later.

Since the commit graph is generally stable (anything outside of the influence of an active branch does not change), we're now only going to touch the rows in the table that we absolutely need to. We still insert the entire data set into Postgres, just now into a temporary table. We then use that temporary table and the current data to insert rows for which there isn't an entry in the index, delete records for which there's no correlated row in the temp table, and update the remaining rows that have changed. This alone should account for a massive reduction in how much we write to the WAL on these fairly frequent operations.
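
To make the temp-table diff concrete, here is a minimal sketch of the idea. It is not the code in this merge request: it uses Python's built-in sqlite3 module in place of Postgres so that it runs on its own, and the table name nearest_uploads, its three columns, and the function apply_diff are all hypothetical simplifications of the real lsif_* tables.

```python
import sqlite3


def apply_diff(conn, repository_id, rows):
    """Sync one repository's rows, touching only the rows that differ.

    Assumed (hypothetical) schema:
        nearest_uploads(repository_id, commit_hash, upload_id)
    rows holds the freshly calculated (commit_hash, upload_id) pairs.
    Returns the (inserted, updated, deleted) row counts.
    """
    cur = conn.cursor()

    # The full data set is still written, but into a temporary table.
    cur.execute(
        "CREATE TEMP TABLE t_nearest_uploads ("
        "commit_hash TEXT PRIMARY KEY, upload_id INTEGER NOT NULL)"
    )
    cur.executemany("INSERT INTO t_nearest_uploads VALUES (?, ?)", rows)

    # Delete rows that have no counterpart in the temp table.
    cur.execute(
        """
        DELETE FROM nearest_uploads
        WHERE repository_id = ?
          AND NOT EXISTS (
              SELECT 1 FROM t_nearest_uploads t
              WHERE t.commit_hash = nearest_uploads.commit_hash)
        """,
        (repository_id,),
    )
    deleted = cur.rowcount

    # Update only the rows whose value actually changed.
    cur.execute(
        """
        UPDATE nearest_uploads
        SET upload_id = (
            SELECT t.upload_id FROM t_nearest_uploads t
            WHERE t.commit_hash = nearest_uploads.commit_hash)
        WHERE repository_id = ?
          AND EXISTS (
              SELECT 1 FROM t_nearest_uploads t
              WHERE t.commit_hash = nearest_uploads.commit_hash
                AND t.upload_id <> nearest_uploads.upload_id)
        """,
        (repository_id,),
    )
    updated = cur.rowcount

    # Insert rows that are new for this repository.
    cur.execute(
        """
        INSERT INTO nearest_uploads (repository_id, commit_hash, upload_id)
        SELECT ?, t.commit_hash, t.upload_id
        FROM t_nearest_uploads t
        WHERE NOT EXISTS (
            SELECT 1 FROM nearest_uploads n
            WHERE n.repository_id = ? AND n.commit_hash = t.commit_hash)
        """,
        (repository_id, repository_id),
    )
    inserted = cur.rowcount

    cur.execute("DROP TABLE t_nearest_uploads")
    conn.commit()
    return inserted, updated, deleted
```

Calling apply_diff a second time with the same data modifies no rows in the target table, which is the property we want for a mostly stable commit graph: unchanged rows are never rewritten.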
