repo-updater: Add feature-flagged `StreamingSync` method to `Syncer`
Created by: mrnugget
(See bottom for context on this PR)
This PR adds a StreamingSync
method to the Syncer
in repo-updater
that inserts/updates repositories one-by-one, while they're being fetched from the code hosts.
As you can see, there's not that much code here: a helper method on the repos store called DeleteReposExcept
and the rather (too?) long method called StreamingSync
. I've annotated it with comments for my own understanding, but that might also give you context when reading through it.
I've also changed the existing Syncer tests to also run with this new StreamingSync
method. The main difference is that when we do a streaming sync, we don't care about the "complete diff" (I don't think that's required, but if we do want that, for debugging purposes, I think that's also possible) in the tests.
See #5145 and previous PR #5366 for more information on the goals/ideas behind this.
Since I've now switched to work on the higher-priority A8N RFC20 and RFC28, I thought that I could already gather feedback on this.
But this PR is complete, as in: tests are passing, it works, but the "only" thing missing is extensive testing/analyzing:
- Test this on k8s.sgdev.org
- Analyze the "intermediate phases" of a sync (i.e. do we delete and insert the same repos in a single sync? If so, how many useless writes do we do)
- It's probably wise to port the naming conflict tests from the
TestNewDiff
andTestSyncSubset
tests to the streaming-sync tests
Merge request reports
Activity
Created by: codecov[bot]
Codecov Report
Merging #5526 into master will increase coverage by
0.11%
. The diff coverage is74.24%
.@@ Coverage Diff @@ ## master #5526 +/- ## ========================================== + Coverage 47.03% 47.15% +0.11% ========================================== Files 741 741 Lines 45421 45617 +196 Branches 2610 2610 ========================================== + Hits 21366 21512 +146 - Misses 21989 22020 +31 - Partials 2066 2085 +19
Impacted Files Coverage Δ cmd/repo-updater/repos/observability.go 92.78% <100%> (+0.67%)
cmd/repo-updater/repos/syncer.go 74.1% <66.15%> (-5.02%)
cmd/repo-updater/repos/store.go 83.24% <77.77%> (-0.29%)
cmd/repo-updater/repos/testing.go 81.18% <83.33%> (+0.37%)
Created by: tsenart
This does mean we lose the source map, but I think that is acceptable.
In A8N, we are actually beginning to rely on the external service IDs in the sources column to be able to load all external services that own a certain repo.
EDIT: See !5611 (merged)R172
Created by: tsenart
One issue is 3. could conflict on name. I think if this case hsould be very rare and is an admin error. We can surface it as a warning and let the db keep flipflopping. WDYT?
Even just one external service (say Github A) can yield duplicate repos. Granted we currently have deduping in the source itself, but assuming we didn't, I think this would be the same issue, so perhaps worth to solve in a single place.
Created by: keegancsmith
Alternative implementation merged :) !5645 (merged)