cloning does not restart after gitservers are replaced
Created by: davejrt
- Sourcegraph version: 3.16
- Platform information: GKE version 1.14
A large number (140,000k) of repositories were waiting to be cloned after the gitserver stateful set had to be redeployed due to disk issues. However, neither the gitservers nor the repo-updater acted on the queue of repos after redeployment. Only manually triggering the repos one by one would start a clone. The external source was then removed again.
There was no log evidence to suggest a problem with either service. In the notifications in the UI, the only message suggested that the repos were waiting to be cloned, with nothing happening.
When the external source was removed and added back again, the repo-updater ran out of memory. Its memory limit was increased, which then appeared to shift the load to postgres. After the postgres memory limit was also increased, the sync completed and the repos from the external source were no longer in the queue.
- Expected behaviour
This is potentially an extreme example, but it is not outside the realm of possibility for larger customers who need to increase storage on their cluster. At a minimum, it should be possible to trigger a re-clone of the repos via the UI or src-cli.
- Actual behaviour
When repos need to be cloned, there is no way to manually kick off this process, and no evidence that anything is happening when a large number of repos are in the queue.