codeintel: Fix duplicate package providers

Administrator requested to merge ef/25691 into main

Created by: efritz

TL;DR: Fixes #25691 (closed). There was a bug in the way we linked uploads that refer to each other through the relationships formed in the lsif_packages and lsif_references tables. Repositories that re-declare the same package version on different commits (which turns out to be very common in practice) had each of those uploads linked to every use of that package. Instead of one canonical upload providing a particular library, the majority of commits to the providing repository could claim to provide it. The bug has been squashed.

This PR makes a few changes:

  • Consolidates the logic of the methods UpdateNumReferences (which updated the refcount of uploads by looking at what depends on them) and UpdateDependencyNumReferences (which incremented or decremented the refcount of an upload's dependencies, depending on whether the upload was being inserted or deleted) into UpdateReferenceCounts, which handles the update for the entire set.
  • Deprecates the num_references column in favor of reference_count. This is a flood-the-world solution, but it makes a lot of sense: we can simply abandon the old column in place and recalculate the reference counts from scratch. We already have an OOB migration that we can use to backfill the values (we just switch the references within the migrator over to the new column and it's M A G I C).
  • Updates all queries over lsif_packages to consider only the canonical upload providing a package (the oldest upload with the same scheme, name, version, indexer, and repository root). This includes the reference count queries themselves. The queries should now be well documented.
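The from-scratch recalculation described in the list above can be sketched in Go as follows. This is a hypothetical in-memory model, not the SQL that backs UpdateReferenceCounts: `PackageKey`, `Upload` and `recomputeReferenceCounts` are illustrative names, and two rules are assumptions of the sketch. First, an upload's reference count is taken to be the number of distinct other uploads that reference a package it canonically provides. Second, "canonical" is approximated as the declaring upload with the lowest ID, which equals "oldest" only if IDs are assigned in upload order.

```go
package main

import (
	"fmt"
	"sort"
)

// PackageKey is a hypothetical, reduced package identity.
type PackageKey struct {
	Scheme, Name, Version string
}

// Upload is a hypothetical upload: the packages it declares
// (lsif_packages-like) and the packages it uses (lsif_references-like).
type Upload struct {
	ID         int
	Provides   []PackageKey
	References []PackageKey
}

// recomputeReferenceCounts rebuilds every upload's count from scratch,
// crediting each reference only to the canonical provider of the package.
func recomputeReferenceCounts(uploads []Upload) map[int]int {
	// Assumption: lowest upload ID stands in for "oldest upload".
	canonical := make(map[PackageKey]int)
	for _, u := range uploads {
		for _, p := range u.Provides {
			if id, ok := canonical[p]; !ok || u.ID < id {
				canonical[p] = u.ID
			}
		}
	}

	// dependents[provider] holds the distinct uploads that reference it.
	dependents := make(map[int]map[int]struct{})
	for _, u := range uploads {
		for _, ref := range u.References {
			provider, ok := canonical[ref]
			if !ok || provider == u.ID {
				continue // no known provider, or a self-reference
			}
			if dependents[provider] == nil {
				dependents[provider] = make(map[int]struct{})
			}
			dependents[provider][u.ID] = struct{}{}
		}
	}

	counts := make(map[int]int, len(uploads))
	for _, u := range uploads {
		counts[u.ID] = len(dependents[u.ID]) // zero when nothing depends on it
	}
	return counts
}

func main() {
	lib := PackageKey{"npm", "example-lib", "1.0.0"}
	uploads := []Upload{
		{ID: 1, Provides: []PackageKey{lib}},        // canonical provider
		{ID: 2, Provides: []PackageKey{lib}},        // re-declaration on a later commit
		{ID: 3, References: []PackageKey{lib}},      // consumer
		{ID: 4, References: []PackageKey{lib, lib}}, // consumer, counted once
	}
	counts := recomputeReferenceCounts(uploads)
	ids := make([]int, 0, len(counts))
	for id := range counts {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		fmt.Printf("upload %d: reference_count=%d\n", id, counts[id])
	}
}
```

Because the counts are derived entirely from the current providers and references, they can be recomputed for a whole set at once. This is why abandoning num_references and backfilling a fresh column is safe: nothing depends on the old, possibly inflated values.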
