dependencies api: Batch database updates

Warren Gifford requested to merge ef/batch-upsert-dependency-repos into main

Created by: efritz

This PR refactors some of the dependency API internals to treat the database a little more kindly 😅. Previously, a shared worker pool streamed dependencies and upserted/synced them, which meant that up to MAX_CONCURRENCY connections could be upserting into the database at once. Now, a shared worker pool streams dependencies and collects them in memory; we then synchronously upsert them in large batches and sync all new repos in a second shared worker pool.
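For readers skimming the description, the new flow can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `forEachBounded`, `collectDependencies`, `upsertInBatches`, `syncDependencies`, and `batchSize` are hypothetical names, and dependencies are modeled as plain strings. The `fetch`, `upsert`, and `syncRepo` callbacks stand in for the real streaming, store, and sync calls.

```go
package main

import "sync"

// batchSize is a hypothetical cap on rows per upsert call.
const batchSize = 3

// forEachBounded runs fn for every item, holding one slot of sema per
// running goroutine. The caller owns sema, so it may be shared.
func forEachBounded(sema chan struct{}, items []string, fn func(string)) {
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		sema <- struct{}{} // blocks while all slots are taken
		go func(item string) {
			defer wg.Done()
			defer func() { <-sema }()
			fn(item)
		}(item)
	}
	wg.Wait()
}

// collectDependencies streams dependencies concurrently and gathers them
// in memory. No database work happens in the workers.
func collectDependencies(sema chan struct{}, inputs []string, fetch func(string) []string) []string {
	var (
		mu  sync.Mutex
		all []string
	)
	forEachBounded(sema, inputs, func(in string) {
		deps := fetch(in)
		mu.Lock()
		all = append(all, deps...)
		mu.Unlock()
	})
	return all
}

// upsertInBatches calls upsert sequentially on slices of at most batchSize
// items and returns every repo that upsert reports as newly inserted.
func upsertInBatches(deps []string, upsert func(batch []string) []string) []string {
	var inserted []string
	for start := 0; start < len(deps); start += batchSize {
		end := start + batchSize
		if end > len(deps) {
			end = len(deps)
		}
		inserted = append(inserted, upsert(deps[start:end])...)
	}
	return inserted
}

// syncDependencies wires the three stages together: collect, batch upsert,
// then sync only the new repos in a second bounded pass. It returns the
// number of new repos.
func syncDependencies(
	sema chan struct{},
	inputs []string,
	fetch func(string) []string,
	upsert func(batch []string) []string,
	syncRepo func(string),
) int {
	deps := collectDependencies(sema, inputs, fetch)
	newRepos := upsertInBatches(deps, upsert)
	forEachBounded(sema, newRepos, syncRepo)
	return len(newRepos)
}
```

In this sketch only one goroutine ever calls `upsert`, so the number of concurrent upserts per call drops from the pool size to one, at the cost of holding all collected dependencies in memory until the batches are written.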

Refactor changes include:

  • Returns dependency repos in ascending order of ID rather than descending
  • Adds instrumentation to the upsert method in the store
  • Makes the worker pool semaphore shared across all requests
  • Adds a safety check to the batch inserter returningScanner (I really confused myself here before I remembered the proper semantics; it was not intuitive)
  • Parallelizes ListDependencies
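To illustrate what sharing the semaphore across all requests buys, here is a small sketch under assumed names (`sharedSema`, `handleRequest`, and the `maxConcurrency` value are hypothetical, not taken from this PR). With a per-request semaphore, N concurrent requests could run N × the limit workers; with a package-level one, the limit applies to the whole process.

```go
package main

import "sync"

// maxConcurrency is an illustrative value standing in for MAX_CONCURRENCY.
const maxConcurrency = 2

// sharedSema is created once at package level, so every request draws
// from the same pool of slots.
var sharedSema = make(chan struct{}, maxConcurrency)

// handleRequest processes the items of a single request. Each worker holds
// one slot of sharedSema for as long as it runs.
func handleRequest(items []string, work func(string)) {
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		sharedSema <- struct{}{} // waits if other requests hold all slots
		go func(item string) {
			defer wg.Done()
			defer func() { <-sharedSema }()
			work(item)
		}(item)
	}
	wg.Wait()
}
```

Calling `handleRequest` from several goroutines at once never runs more than `maxConcurrency` invocations of `work` at the same time in total, no matter how many requests are in flight.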

Test plan

Unit and integration tests: main-dry-run.
