repo-updater: Add feature-flagged `StreamingSync` method to `Syncer`

Administrator requested to merge core/experiment-stream-sync-1 into master

Created by: mrnugget

(See bottom for context on this PR)

This PR adds a StreamingSync method to the Syncer in repo-updater that inserts/updates repositories one-by-one, while they're being fetched from the code hosts.

As you can see, there's not that much code here: a helper method on the repos store called DeleteReposExcept and the rather (too?) long method called StreamingSync. I've annotated it with comments for my own understanding, but that might also give you context when reading through it.
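To make the shape of the approach concrete, here is a minimal sketch of the idea described above, not the code in this PR. It assumes a hypothetical in-memory store (`memStore`), a simplified `Repo` type, and a channel standing in for the code hosts. The `DeleteReposExcept` signature shown here is also an assumption; only its name comes from the PR.

```go
package main

import "sort"

// Repo is a simplified, hypothetical stand-in for a repository record.
type Repo struct {
	ExternalID string
	Name       string
}

// memStore is a hypothetical in-memory store keyed by external ID.
type memStore struct {
	repos  map[string]Repo
	writes int // number of upserts that actually changed the store
}

func newMemStore() *memStore {
	return &memStore{repos: map[string]Repo{}}
}

// Upsert inserts or updates a single repo and skips the write
// when the stored record is already identical.
func (s *memStore) Upsert(r Repo) {
	if old, ok := s.repos[r.ExternalID]; ok && old == r {
		return
	}
	s.repos[r.ExternalID] = r
	s.writes++
}

// DeleteReposExcept removes every repo whose external ID is not in keep
// and returns how many repos were deleted. (Assumed signature.)
func (s *memStore) DeleteReposExcept(keep map[string]bool) int {
	deleted := 0
	for id := range s.repos {
		if !keep[id] {
			delete(s.repos, id)
			deleted++
		}
	}
	return deleted
}

// Names returns the names of all stored repos in sorted order.
func (s *memStore) Names() []string {
	names := make([]string, 0, len(s.repos))
	for _, r := range s.repos {
		names = append(names, r.Name)
	}
	sort.Strings(names)
	return names
}

// streamingSync upserts each repo as soon as it arrives on sourced,
// remembers which IDs it saw, and deletes everything else once the
// channel is closed. It returns the number of distinct repos seen
// and the number of repos deleted.
func streamingSync(s *memStore, sourced <-chan Repo) (seen, deleted int) {
	keep := map[string]bool{}
	for r := range sourced {
		s.Upsert(r)
		keep[r.ExternalID] = true
	}
	return len(keep), s.DeleteReposExcept(keep)
}
```

In this sketch the delete happens once, after the stream is drained, so a repo that is still present on the code host is never removed mid-sync. Whether the real method orders its writes the same way is something to check against the diff.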

I've also changed the existing Syncer tests so that they run against the new StreamingSync method too. The main difference is that the streaming-sync tests don't check the "complete diff". I don't think that's required, but if we do want it for debugging purposes, I think that's possible as well.

See #5145 and previous PR #5366 for more information on the goals/ideas behind this.


Since I've now switched to working on the higher-priority A8N RFC20 and RFC28, I thought I could already gather feedback on this.

That said, this PR is complete in the sense that the tests pass and it works. The "only" thing missing is extensive testing and analysis:

  • Test this on k8s.sgdev.org
  • Analyze the "intermediate phases" of a sync (i.e. do we delete and insert the same repos in a single sync? If so, how many useless writes do we do?)
  • It's probably wise to port the naming conflict tests from the TestNewDiff and TestSyncSubset tests to the streaming-sync tests
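
For the second item, one possible way to measure the useless writes is to record every store write during a sync and then look for repos that were both deleted and inserted. The sketch below assumes a hypothetical `WriteOp` log; nothing like it exists in this PR, and the real store would need instrumentation to produce it.

```go
package main

// WriteOp is a hypothetical record of one store write made during a sync.
type WriteOp struct {
	Kind   string // "insert", "update", or "delete"
	RepoID string
}

// redundantWrites returns the IDs of repos that were both deleted and
// inserted within the same sync, in the order they first appear in ops.
func redundantWrites(ops []WriteOp) []string {
	inserted := map[string]bool{}
	deleted := map[string]bool{}
	for _, op := range ops {
		switch op.Kind {
		case "insert":
			inserted[op.RepoID] = true
		case "delete":
			deleted[op.RepoID] = true
		}
	}

	var out []string
	reported := map[string]bool{}
	for _, op := range ops {
		id := op.RepoID
		if inserted[id] && deleted[id] && !reported[id] {
			reported[id] = true
			out = append(out, id)
		}
	}
	return out
}
```

The length of the returned slice gives a lower bound on the wasted work: each listed repo cost at least one delete and one insert that a full diff could have avoided.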
