repo-updater: Updates every repository on every syncer sync
Created by: keegancsmith
A customer complained that every repository was being "git fetched" even though the UpdateSchedule had a much later interval. I believe the problem is that we enqueue the repository onto the update queue every time we run the repo metadata syncer!
The fix may be as simple as removing this line https://github.com/sourcegraph/sourcegraph/blob/b02024dc415af27008d42a9811fe119574cce2a9/cmd/repo-updater/repos/scheduler.go#L262-L263
A new repository should be enqueued by the scheduler, so we should still clone new repositories (this will need to be tested). The only other thing upsert needs to do is update the remote URL on the update queue if it has changed.
Context: Our git scheduler has two parts: the `schedule` and the `updateQueue`. The `updateQueue` is consumed and issues `git fetch`/`git clone` against gitserver. The `schedule` stores all repositories and periodically puts items into the `updateQueue` to actually be updated. The upsert function linked to above is how the scheduler/updateQueue is informed of the list of repositories.
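To make the suspected behaviour concrete, here is a minimal toy model of the two parts described above. It is only a sketch: the `scheduler` type, its fields, the tick-based clock, and the `enqueueOnSync` flag are all hypothetical and do not mirror the real code in `scheduler.go`. It also leaves out the remote URL update that upsert would still need to do.

```go
package main

import "fmt"

// scheduler is a toy stand-in for the git scheduler. All names here are
// illustrative assumptions, not the real repo-updater types.
type scheduler struct {
	schedule    map[string]int // repo name -> tick at which the next update is due
	updateQueue []string       // repos waiting for a git fetch / git clone
	interval    int            // ticks between scheduled updates
}

func newScheduler(interval int) *scheduler {
	return &scheduler{schedule: map[string]int{}, interval: interval}
}

// enqueue adds repo to the update queue unless it is already queued.
func (s *scheduler) enqueue(repo string) {
	for _, r := range s.updateQueue {
		if r == repo {
			return
		}
	}
	s.updateQueue = append(s.updateQueue, repo)
}

// upsert tells the scheduler about a repo seen by the metadata syncer.
// enqueueOnSync=true models the suspected bug (enqueue on every sync);
// false models the proposed fix (only the schedule enqueues).
func (s *scheduler) upsert(repo string, now int, enqueueOnSync bool) {
	if _, known := s.schedule[repo]; !known {
		// New repos are due immediately, so the schedule still clones them.
		s.schedule[repo] = now
	}
	if enqueueOnSync {
		s.enqueue(repo)
	}
}

// runSchedule moves every due repo onto the update queue and reschedules it.
func (s *scheduler) runSchedule(now int) {
	for repo, due := range s.schedule {
		if due <= now {
			s.enqueue(repo)
			s.schedule[repo] = now + s.interval
		}
	}
}

// drain stands in for the queue consumer and returns how many
// fetches/clones it would have issued.
func (s *scheduler) drain() int {
	n := len(s.updateQueue)
	s.updateQueue = s.updateQueue[:0]
	return n
}

// simulate runs one syncer sync per tick for a single repo and returns
// the total number of fetches issued.
func simulate(ticks, interval int, enqueueOnSync bool) int {
	s := newScheduler(interval)
	total := 0
	for now := 0; now < ticks; now++ {
		s.upsert("github.com/example/repo", now, enqueueOnSync)
		s.runSchedule(now)
		total += s.drain()
	}
	return total
}

func main() {
	fmt.Println("fetches when upsert enqueues on every sync:", simulate(10, 5, true))
	fmt.Println("fetches when only the schedule enqueues:", simulate(10, 5, false))
}
```

In this model, ten syncs with an update interval of five ticks issue ten fetches when upsert enqueues every time, but only two (the initial clone and one scheduled update) when enqueueing is left to the schedule.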
See https://github.com/sourcegraph/customer/issues/31
This may also explain the issues seen with enabling `autoGitUpdates` at https://github.com/sourcegraph/sourcegraph/issues/8458, so it is likely closely related to the graphs seen in https://github.com/sourcegraph/sourcegraph/issues/8400#issuecomment-587813653