Skip to content

batches: add read only external state

Administrator requested to merge aharvey/add-read-only-external-state into main

Created by: LawnGnome

Note for Repo Management (hello!)

I've requested a review from you because this is adding a new place that mutates the repo table, specifically to flip the archived column if Batch Changes notices that a repo is archived before repo-updater. Let me know how you feel about that, please!

Actual description

This PR adds a new "read only" external state that is mostly terminal1 and used when the repository a changeset is open on is archived. No bulk operations can be performed on those changesets, and re-applying the batch spec will not result in any operations.

When reconciling a changeset, we will detect archived repos based on the response from the code host when updating or pushing changesets. (We can also detect this when creating changesets, and the underlying code host calls will return the right error here so the user gets a good error message, but we obviously then won't have a changeset to enter the read-only state!) This PR adds the plumbing for this with a new errcode.IsArchived function, but does not actually add the code host support, which are in #38332 and #38333 for GitLab and GitHub, respectively. (The Bitbuckets do not support archiving repos, and are hence out of scope for this project.)

There are a couple of interesting quirks to this implementation:

  1. We ultimately have to rely on the archived field of the repo table to determine the behaviour, since the only alternative would be to burn code host quota by pinging the code host each time we want to touch a repo. This means that we're at the mercy of how often the external service syncer updates repo metadata, particularly when unarchiving.
  2. It's possible that the reconciler will be the first part of Sourcegraph to learn that a repository has been archived. Because of the previous point, we need to update repo so that we can calculate the derived state correctly — this has been implemented, but definitely feels a little weird, since normally the only thing that will update repo is repo-updater. The other option was to force an immediate sync, but repo-updater doesn't support only syncing one repository on an external service, and the underlying operations are intentionally asynchronous. Performing a direct UPDATE felt like the lesser of the two possible evils.

Related to #26820 (closed), but the actual fix will be a subsequent PR.

Future work

  • Hook changes to the repo.archived field to more efficiently update the external state of changesets, since right now there's going to be a non-trivial delay between a repo being (un)archived and changesets being updated. The major problem here is that it crosses the FOSS/enterprise boundary the "wrong" way, which requires either a nullable callback or some sort of broader solution for notifying disparate parts of the codebase when an event occurs.

Test plan

Added unit tests for basically everything that was changed or added.

Additionally:

  • Attempted to publish changesets to archived repos, and ensured that we got a sensible error.
  • Attempted to update changesets on repos that had been archived, and ensured that the changesets entered the read only state.
  • Attempted to push a commit to repos that had been archived, and ensured that the changesets entered the read only state.
  • Synced read only changesets on repos that had been unarchived, and ensured that the changesets entered the correct, non read only state.

  1. Why isn't it actually terminal? The main reason is that repositories can be unarchived, so we need to be able to revive changesets. The way this has been implemented is by allowing syncs to proceed for read only changesets only to the point where we check if the repo is archived — if it is, then that's the end of the sync; otherwise we perform a full sync, which will reset the changeset's external state.

Merge request reports

Loading