repo-updater: periodically bump priority of uncloned repositories

Administrator requested to merge k/cloning-repo-updater into master

Created by: keegancsmith

The git fetch/clone scheduler is unaware of which repositories are cloned or uncloned. It only knows about the repositories reported by the code host syncer. A repository can be uncloned even though the scheduler is aware of it, for a few reasons:

  • The repository failed to clone.
  • The repository was removed due to disk pressure.

This commit adds a periodic sync between the list of repositories gitserver has cloned and the state of the scheduler. We walk the list of all repositories in the scheduler, and for each one that is not cloned we update its "due" date as if the repository were new (by default, 45s until we clone it).

Note: all repositories returned from the repo store are in the scheduler via the code host syncer.

Comparison to repo-updater: make syncer & scheduler aware of uncloned gitserver repositories

#11602

The effect of #11602 is to immediately enqueue all uncloned repos into the queue. It achieves this by making the code host syncer aware of cloned state. I prefer the approach in this PR since it keeps the syncer responsible only for state from a code host, rather than some internal gitserver state. Additionally, what we want is for the scheduler, which is a different component, to be aware of cloned state. This PR achieves that more directly and keeps the concerns independent.

Note: in the other PR uncloned repos are put straight onto the queue. Here we instead update the priority (due date) in the scheduler. This means there is a delay of 45s between noticing that a repo isn't cloned and cloning it. However, because this sync is independent of the code host sync, it can run much more regularly than a potentially slow code host sync. So in effect it is faster.
