Skip to content
Snippets Groups Projects

repo-updater: Add feature-flagged `StreamingSync` method to `Syncer`

Created by: mrnugget

(See bottom for context on this PR)

This PR adds a StreamingSync method to the Syncer in repo-updater that inserts/updates repositories one-by-one, while they're being fetched from the code hosts.

As you can see, there's not that much code here: a helper method on the repos store called DeleteReposExcept and the rather (too?) long method called StreamingSync. I've annotated it with comments for my own understanding, but that might also give you context when reading through it.

I've also changed the existing Syncer tests to also run with this new StreamingSync method. The main difference is that when we do a streaming sync, we don't care about the "complete diff" (I don't think that's required, but if we do want that, for debugging purposes, I think that's also possible) in the tests.

See #5145 and previous PR #5366 for more information on the goals/ideas behind this.


Since I've now switched to work on the higher-priority A8N RFC20 and RFC28, I thought that I could already gather feedback on this.

But this PR is complete, as in: tests are passing, it works, but the "only" thing missing is extensive testing/analyzing:

  • Test this on k8s.sgdev.org
  • Analyze the "intermediate phases" of a sync (i.e. do we delete and insert the same repos in a single sync? If so, how many useless writes do we do)
  • It's probably wise to port the naming conflict tests from the TestNewDiff and TestSyncSubset tests to the streaming-sync tests

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Created by: unknwon

    Link to #5414 seems incorrect because that's a dependency update PR...

  • Created by: mrnugget

    Link to #5414 seems incorrect because that's a dependency update PR...

    Fixed

  • Created by: codecov[bot]

    Codecov Report

    Merging #5526 into master will increase coverage by 0.11%. The diff coverage is 74.24%.

    @@            Coverage Diff             @@
    ##           master    #5526      +/-   ##
    ==========================================
    + Coverage   47.03%   47.15%   +0.11%     
    ==========================================
      Files         741      741              
      Lines       45421    45617     +196     
      Branches     2610     2610              
    ==========================================
    + Hits        21366    21512     +146     
    - Misses      21989    22020      +31     
    - Partials     2066     2085      +19
    Impacted Files Coverage Δ
    cmd/repo-updater/repos/observability.go 92.78% <100%> (+0.67%) :arrow_up:
    cmd/repo-updater/repos/syncer.go 74.1% <66.15%> (-5.02%) :arrow_down:
    cmd/repo-updater/repos/store.go 83.24% <77.77%> (-0.29%) :arrow_down:
    cmd/repo-updater/repos/testing.go 81.18% <83.33%> (+0.37%) :arrow_up:
  • Created by: tsenart

    This does mean we lose the source map, but I think that is acceptable.

    In A8N, we are actually beginning to rely on the external service IDs in the sources column to be able to load all external services that own a certain repo.

    EDIT: See !5611 (merged)R172

  • Created by: tsenart

    One issue is 3. could conflict on name. I think if this case hsould be very rare and is an admin error. We can surface it as a warning and let the db keep flipflopping. WDYT?

    Even just one external service (say Github A) can yield duplicate repos. Granted we currently have deduping in the source itself, but assuming we didn't, I think this would be the same issue, so perhaps worth to solve in a single place.

  • Created by: keegancsmith

    Alternative implementation merged :) !5645 (merged)

Please register or sign in to reply
Loading