
frontend: Increase batch size from 500 to 1250

Administrator requested to merge core/repo-list-batch-size into master

Created by: mrnugget

This is a follow-up to #4279 and adjusts the batch size to a value backed by benchmarks.

Local benchmark setup

  • Disabled cloning in gitserver
  • Added 4230 repos (one way to generate such a setup is sketched after this list)
  • NO_KEYCLOAK=1 ./enterprise/dev/start.sh
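The MR doesn't spell out how the 4230 repos were added; one plausible way (an assumption on my part, not necessarily what was done here) is to generate an "Other" external service configuration with synthetic repo names and paste it into the site admin UI:

#!/usr/bin/env bash
# Hypothetical helper: emit an "Other" external service config listing
# 4230 synthetic repo paths. The URL and naming scheme are made up.
jq -n '{
  url: "https://example.com",
  repos: [range(0; 4230) | "org/repo-\(.)"]
}' > external-service-config.json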

Test script

Since none of the repos is cloned, we ask for the first 20 cloned repositories: to answer that query, all 4230 repos have to be traversed (20 cloned ones are never found), which exercises exactly the batched traversal this change tunes.

$ cat get_cloned_repos.sh

#!/usr/bin/env bash

time curl 'http://localhost:3080/.api/graphql?Repositories' \
  -H "Authorization: token $SRC_TOKEN" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -s \
  --data-binary '{
  "query": "query Repositories( $first: Int $query: String $cloned: Boolean $cloneInProgress: Boolean $notCloned: Boolean $indexed: Boolean $notIndexed: Boolean ) { repositories( first: $first query: $query cloned: $cloned cloneInProgress: $cloneInProgress notCloned: $notCloned indexed: $indexed notIndexed: $notIndexed ) { nodes { id name createdAt viewerCanAdminister url mirrorInfo { cloned cloneInProgress updatedAt } } totalCount(precise: true) pageInfo { hasNextPage } } }",
  "variables": {
    "cloned": true,
    "cloneInProgress": false,
    "notCloned": false,
    "indexed": true,
    "notIndexed": true,
    "first": 20,
    "query": ""
  }
}' >/dev/null

For each batch size, I then ran the script 10 times in a loop:

for i in $(seq 1 10); do echo "---- Run ${i} ----"; ./get_cloned_repos.sh; done
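The averages below were presumably read off the `time` output by hand; here is a small sketch (assuming bash and awk, not part of the original setup) that automates the averaging:

TIMEFORMAT=%R   # make bash's `time` print only wall-clock seconds
for i in $(seq 1 10); do
  # silence the script's own output so only the outer timing reaches awk
  { time ./get_cloned_repos.sh >/dev/null 2>&1; } 2>&1
done | awk '{ sum += $1 } END { printf "avg over %d runs: %.3fs\n", NR, sum / NR }'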

Results

Batch size:  500, avg req duration after 10 requests: 1.4s
Batch size:  750, avg req duration after 10 requests: 0.984s
Batch size: 1000, avg req duration after 10 requests: 0.848s
Batch size: 1250, avg req duration after 10 requests: 0.663s
Batch size: 1500, avg req duration after 10 requests: 0.686s
Batch size: 1750, avg req duration after 10 requests: 0.702s

With batch sizes greater than 1250, diminishing returns kick in (and memory consumption increases), so I chose 1250.
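For a sense of scale, assuming one database roundtrip per batch (which is what the batch size governs): traversing 4230 repos takes ceil(4230/500) = 9 roundtrips at the old batch size, but only ceil(4230/1250) = 4 at the new one.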

Caveats

  1. Of course, with a larger repo table the performance gains might continue past 1250, because more roundtrips are saved. But since we're already down to ~600ms, which is quite tolerable, we can put off further tuning.
  2. Benchmarking this locally in a reliable way is quite hard, since the benchmark runs are quite short and, in my case, the filter operations seemingly caused macOS's opendirectoryd to go crazy in terms of CPU usage. In short: I can't guarantee that there wasn't any interference when running these tests.

Test plan: go test & manually inspecting the repositories list in the browser
