Approach 2: add repo_statistics and gitserver_repos_statistics tables

Warren Gifford requested to merge mrn/repo-stats into main

Created by: mrnugget

This is the second approach to implementing these statistics tables, after we discovered that the original approach in https://github.com/sourcegraph/sourcegraph/pull/39660 led to contention around the repo_statistics table.

90% of the code in here is the same as in #39660. What changes is that the repo_statistics table now holds multiple rows:

  • Every time a repo row (and in certain cases: a gitserver_repo row) is updated/inserted/deleted, we append (!) a row to repo_statistics with a diff of the total counts before/after the row change. Example: if a repo is deleted, we append a row with total = -1 to the repo_statistics table.
  • At query time we use SELECT SUM(total), SUM(cloned), SUM(deleted), ... to get the current total counts.
  • A worker periodically (right now: every 30 minutes) compacts the table by (1) getting the current counts, (2) updating the first row's columns to reflect the total counts, and (3) deleting all other rows.
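
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it uses an in-memory SQLite database instead of Postgres, only models the total, cloned and deleted columns named above, and the helper names (open_db, append_diff, current_counts, compact) are hypothetical. In the PR the diff rows are appended when repo rows change; here the caller appends them by hand.

```python
import sqlite3


def open_db():
    """Create an in-memory stand-in for the repo_statistics table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repo_statistics ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " total INTEGER NOT NULL DEFAULT 0,"
        " cloned INTEGER NOT NULL DEFAULT 0,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def append_diff(conn, total=0, cloned=0, deleted=0):
    """Append one diff row instead of updating a single shared counter row."""
    conn.execute(
        "INSERT INTO repo_statistics (total, cloned, deleted) VALUES (?, ?, ?)",
        (total, cloned, deleted),
    )


def current_counts(conn):
    """Sum all diff rows to get the current counts; an empty table yields zeros."""
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0), COALESCE(SUM(cloned), 0),"
        " COALESCE(SUM(deleted), 0) FROM repo_statistics"
    ).fetchone()
    return tuple(row)


def compact(conn):
    """Fold all rows into the first row, then delete the rest, in one transaction."""
    with conn:  # commits on success, rolls back on error
        first_id = conn.execute("SELECT MIN(id) FROM repo_statistics").fetchone()[0]
        if first_id is None:
            return  # nothing to compact
        total, cloned, deleted = current_counts(conn)
        conn.execute(
            "UPDATE repo_statistics SET total = ?, cloned = ?, deleted = ? WHERE id = ?",
            (total, cloned, deleted, first_id),
        )
        conn.execute("DELETE FROM repo_statistics WHERE id <> ?", (first_id,))


conn = open_db()
append_diff(conn, total=1)            # a repo was inserted
append_diff(conn, total=1, cloned=1)  # another repo was inserted and cloned
append_diff(conn, total=-1)           # a repo was deleted
before = current_counts(conn)
compact(conn)
after = current_counts(conn)
rows_left = conn.execute("SELECT COUNT(*) FROM repo_statistics").fetchone()[0]
```

Because writers only ever insert new rows, they do not queue up behind a lock on one counter row, which is the contention the first approach ran into. Compaction leaves the sums unchanged and keeps the table small, so the SUM query stays cheap.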

Demo video

https://user-images.githubusercontent.com/1185253/185577375-d6d2d7da-f6a4-4aad-b940-3927a4d7dd6b.mp4

Test plan

  • Existing and new unit tests
  • Manual testing
