codeintel: Find correct oldest commit when calculating commit graph

Administrator requested to merge ef/fix-oldest-commit into main

Created by: efritz

Problem:

When we update the commit graph for a repository (which code intel uses to determine which indexes are useful for a request at a given commit), we grab only the relevant parts of the commit graph that occur on or after (chronologically) the earliest commit for which we have an index.

This works only if indexes are uploaded in commit order, as they are when uploads come from CI. That assumption no longer holds once we add dependency indexing, as we'll need to be able to index a specific, possibly historical, commit. Our old (now broken) heuristic finds the commit attached to the oldest upload we know about for a repository and assumes that commit is the oldest we care about.
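
To make the failure mode concrete, here is a minimal sketch in Python (not the actual Go code in this PR). The `Upload` type, its field names, and the sample dates are all hypothetical; it only illustrates how "commit of the oldest upload" and "oldest indexed commit" diverge once a historical commit is indexed late.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Upload:
    # Hypothetical record shape, for illustration only.
    commit: str
    uploaded_at: datetime  # when the index was uploaded
    commit_date: datetime  # when the commit itself was made


def oldest_commit_by_upload_time(uploads):
    """Old heuristic: take the commit attached to the earliest upload."""
    return min(uploads, key=lambda u: u.uploaded_at).commit


def oldest_commit_by_commit_date(uploads):
    """What we actually want: the commit with the earliest commit date."""
    return min(uploads, key=lambda u: u.commit_date).commit


uploads = [
    # Two CI uploads, arriving in commit order.
    Upload("c3", datetime(2021, 1, 10), datetime(2021, 1, 9)),
    Upload("c4", datetime(2021, 1, 12), datetime(2021, 1, 11)),
    # Dependency indexing: a historical commit, indexed after the others.
    Upload("c1", datetime(2021, 1, 15), datetime(2020, 6, 1)),
]

heuristic = oldest_commit_by_upload_time(uploads)
actual = oldest_commit_by_commit_date(uploads)
print(heuristic, actual)
```

In this sample the heuristic picks `c3` while the oldest indexed commit is `c1`, so a commit graph truncated at `c3` would not contain the commit that the late upload is attached to.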

Solution:

Get rid of the proxy that assumes (a commit's relative commit time) = (an upload's relative upload time) and store the actual commit date on the upload record as early as possible.

This PR adds a field committed_at that gets populated once the upload is processed. We can't populate this field any earlier as processing is the earliest part of the data pipeline in which we are assured gitserver has been refreshed for this repository. This field is back-filled by an out-of-band migration.

When selecting the earliest date to search when updating the commit graph, we now have the precise information we need easily accessible via a quick scan over the lsif_upload records for that repository.
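
The scan described above could look roughly like the following sketch. It uses Python with an in-memory SQLite database purely for illustration; the table layout, the `commit_hash` and `repository_id` column names, and the helper `earliest_committed_at` are assumptions, not the schema or code from this PR. Only the `committed_at` field name comes from the description.

```python
import sqlite3


def earliest_committed_at(conn, repository_id):
    """Return the earliest known commit date for a repository's uploads.

    Uploads that have not been processed (or back-filled) yet have a NULL
    committed_at and are skipped. Returns None when nothing is known.
    """
    row = conn.execute(
        """
        SELECT MIN(committed_at)
        FROM lsif_uploads
        WHERE repository_id = ? AND committed_at IS NOT NULL
        """,
        (repository_id,),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified table layout.
conn.execute(
    """
    CREATE TABLE lsif_uploads (
        id INTEGER PRIMARY KEY,
        repository_id INTEGER NOT NULL,
        commit_hash TEXT NOT NULL,
        committed_at TEXT  -- ISO-8601 UTC timestamp, NULL until processed
    )
    """
)
conn.executemany(
    "INSERT INTO lsif_uploads (repository_id, commit_hash, committed_at) VALUES (?, ?, ?)",
    [
        (1, "c3", "2021-01-09T00:00:00Z"),
        (1, "c1", "2020-06-01T00:00:00Z"),  # historical commit indexed late
        (1, "c5", None),                    # uploaded but not yet processed
        (2, "d1", "2019-01-01T00:00:00Z"),  # a different repository
    ],
)

print(earliest_committed_at(conn, 1))
```

Because the timestamps are stored as ISO-8601 strings in a single time zone, `MIN` orders them chronologically. A repository with no processed uploads yields `None`, which the caller would need to handle explicitly.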