Improve performance for batch changes with many changesets

Administrator requested to merge es/large-bc-perf into main

Created by: eseliger

I have a local batch change that contains a bit over 10,000 changesets, and I was always annoyed by how slow it is, so I finally took a look at the trace and found that we're still fetching all changesets in the connection, only to filter them by repo permissions afterwards. Along the way I also found some general DB performance issues. Commit-wise review is encouraged :) Hint: tests come at the end of the git history.

List of changes:

  • Add index on JSON "join" column: A GIN index of type jsonb_ops makes the ? operator way faster. On my local database with around 20,000 changesets this cut the duration of queries like GetChangesetsStats and GetBatchChangeDiffStat (probably also ListChangesets, CountChangesets, GetRewirerMappings and ListChangesetSyncData, since they use the same operator, but I didn't test those explicitly). Querying changesets by batch change is now a lot faster, down to around 100ms. See the index sketch after this list.
  • Just fetch required events for labels resolver: We always fetched all events when computing labels, which for larger changesets meant noticeable bandwidth usage, because the metadata can be quite big, plus a lot of parsing overhead from unmarshalling the JSON. This fixes it by only loading what's required. A sketch of the idea follows the list.
  • Use DB-based authz in changeset connection: Now that we can query repo permissions directly in the database, we no longer need to fetch all changesets first, check how many of them are accessible, and then do the page slicing manually. This also makes the code much simpler. Since we use the count method again, I needed to catch up on option parity between count and list. This change brought getting a 30-element slice of a batch change with 10k changesets through the API down from ~11s to ~160ms. See the pagination sketch after this list.
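
To illustrate the index change: a GIN index with the jsonb_ops operator class supports the ? ("key exists") operator, so Postgres can answer "which changesets belong to batch change X" from the index instead of scanning every row. The table and column names below (changesets, batch_change_ids) and the exact DDL are a sketch of the idea, not the literal migration in this MR:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/sourcegraph?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A GIN index with the jsonb_ops operator class lets Postgres serve the
	// `?` (key exists) operator from the index instead of a sequential scan.
	if _, err := db.Exec(`
		CREATE INDEX IF NOT EXISTS changesets_batch_change_ids
		ON changesets USING GIN (batch_change_ids jsonb_ops)
	`); err != nil {
		log.Fatal(err)
	}

	// Queries like GetChangesetsStats filter changesets by batch change via
	// the `?` operator on the jsonb column; with the index in place this
	// stays fast even with tens of thousands of rows.
	var count int
	batchChangeID := "123" // hypothetical batch change ID
	err = db.QueryRow(
		`SELECT COUNT(*) FROM changesets WHERE batch_change_ids ? $1`,
		batchChangeID,
	).Scan(&count)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("batch change %s has %d changesets\n", batchChangeID, count)
}
```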
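
For the labels resolver item, the gist is to ask the store only for the event kinds that matter for labels instead of every event with its full metadata. The option struct and kind names below are hypothetical stand-ins, assuming a store that can filter events by kind:

```go
package main

import (
	"context"
	"fmt"
)

// ChangesetEvent is a trimmed-down stand-in for the real event type; only the
// fields needed to compute labels are shown here.
type ChangesetEvent struct {
	Kind  string // e.g. "github:labeled", "github:unlabeled"
	Label string
}

// ListChangesetEventsOpts is a hypothetical options struct: the point of the
// change is that the resolver passes a kind filter, so the store only returns
// (and only unmarshals) label-related events instead of every event.
type ListChangesetEventsOpts struct {
	ChangesetID int64
	Kinds       []string
}

// EventStore is a hypothetical interface over the changeset events table.
type EventStore interface {
	ListChangesetEvents(ctx context.Context, opts ListChangesetEventsOpts) ([]ChangesetEvent, error)
}

// computeLabels derives the current label set from label events only.
func computeLabels(ctx context.Context, store EventStore, changesetID int64) (map[string]bool, error) {
	events, err := store.ListChangesetEvents(ctx, ListChangesetEventsOpts{
		ChangesetID: changesetID,
		// Only label-related events are fetched; all other event kinds (and
		// their potentially large metadata payloads) stay in the database.
		Kinds: []string{"github:labeled", "github:unlabeled"},
	})
	if err != nil {
		return nil, err
	}

	labels := make(map[string]bool)
	for _, e := range events {
		switch e.Kind {
		case "github:labeled":
			labels[e.Label] = true
		case "github:unlabeled":
			delete(labels, e.Label)
		}
	}
	return labels, nil
}

func main() {
	fmt.Println("see computeLabels for the sketch")
}
```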
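
And for the pagination item, a minimal sketch of what "DB-based authz" buys: the permission filter, the LIMIT/OFFSET slicing, and the count all happen in SQL, so inaccessible rows are never loaded into Go just to be dropped. The user_permitted_repos condition below is a hypothetical stand-in for whatever condition the authz layer renders for the current user:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver
)

// Hypothetical permission condition; in practice the authz layer renders this
// from the current user's repo permissions.
const authzCond = `EXISTS (
	SELECT 1 FROM user_permitted_repos upr
	WHERE upr.repo_id = changesets.repo_id
)`

// listAccessibleChangesets pages through a batch change's changesets with the
// permission filter applied in SQL rather than in application code.
func listAccessibleChangesets(db *sql.DB, batchChangeID string, limit, offset int) ([]int64, error) {
	rows, err := db.Query(`
		SELECT id
		FROM changesets
		WHERE batch_change_ids ? $1   -- served by the GIN index from above
		  AND `+authzCond+`
		ORDER BY id
		LIMIT $2 OFFSET $3`,
		batchChangeID, limit, offset,
	)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}

// countAccessibleChangesets mirrors the list query so count and list stay in
// option parity, as the description mentions.
func countAccessibleChangesets(db *sql.DB, batchChangeID string) (int, error) {
	var n int
	err := db.QueryRow(`
		SELECT COUNT(*)
		FROM changesets
		WHERE batch_change_ids ? $1
		  AND `+authzCond,
		batchChangeID,
	).Scan(&n)
	return n, err
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/sourcegraph?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ids, err := listAccessibleChangesets(db, "123", 30, 0)
	if err != nil {
		log.Fatal(err)
	}
	n, err := countAccessibleChangesets(db, "123")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("page of %d changesets, %d total accessible\n", len(ids), n)
}
```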
