
gitserver rebalancing/sharding logic should be smarter

Created by: slimsag

If you introduce or remove a gitserver replica, the consistent hash on repo name means almost all repositories will be reassigned to a different gitserver (see the example below), which has negative consequences such as:

  • Most repositories will be recloned from the code host
  • Most searches will remain fast (no re-indexing will be needed), but search results may load a bit slowly while repositories are cloning.
  • Unindexed searches (non-master branches, commit/diff search, etc.) may be slower while repositories re-clone
  • Users visiting repositories directly on Sourcegraph may be prompted to wait a few seconds while the repository reclones

Example:

You have 10,000 repositories across 3 gitserver instances:

  • gitserver-1 contains repos 0 to 3,333
  • gitserver-2 contains repos 3,333 to 6,666
  • gitserver-3 contains repos 6,666 to 10,000

If you introduce a new gitserver-4, something like the following will happen:

  • gitserver-1 now begins cloning repos previously assigned to gitserver-2
  • gitserver-2 now begins cloning repos previously assigned to gitserver-3
  • gitserver-3 now begins cloning repos previously assigned to gitserver-1
  • gitserver-4 now begins cloning 1/4th of the repositories

The load will be even in the end, with each replica holding 1/4th of the repositories, but gitservers 1, 2, and 3 had their repositories unavailable for a period of time because everything got shuffled around and they had to reclone everything. What would be better (and what indexed-search does) is to merely shift 1/4th of the load to the new 4th replica, so that the original replicas take into account the data they already have instead of (effectively) starting from scratch.
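
To make the difference concrete, here is a minimal Go sketch comparing two assignment schemes. Everything in it is an assumption for illustration: `moduloShard` is a guess at the kind of "hash mod number of replicas" scheme that produces the reshuffling described above, and `rendezvousShard` uses rendezvous (highest-random-weight) hashing as one possible way to get the "only shift 1/4th" behavior. Neither is taken from the actual gitserver or indexed-search code, and the repo names and the FNV-1a hash are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash64 hashes a string with FNV-1a. The hash function is an arbitrary choice.
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// moduloShard (hypothetical) assigns a repo to replicas[hash(repo) mod N].
// Changing N changes the result for most repos.
func moduloShard(repo string, replicas []string) string {
	return replicas[hash64(repo)%uint64(len(replicas))]
}

// rendezvousShard (hypothetical) scores every (replica, repo) pair and picks
// the replica with the highest score. A repo's scores for existing replicas do
// not change when a replica is added, so a repo only moves if the new replica
// happens to win.
func rendezvousShard(repo string, replicas []string) string {
	best, bestScore := "", uint64(0)
	for _, r := range replicas {
		if score := hash64(r + "\x00" + repo); best == "" || score > bestScore {
			best, bestScore = r, score
		}
	}
	return best
}

// countMoved reports how many repos are assigned to a different replica
// after the replica set changes from before to after.
func countMoved(shard func(string, []string) string, repos, before, after []string) int {
	moved := 0
	for _, repo := range repos {
		if shard(repo, before) != shard(repo, after) {
			moved++
		}
	}
	return moved
}

func main() {
	repos := make([]string, 10000)
	for i := range repos {
		repos[i] = fmt.Sprintf("github.com/example/repo-%d", i)
	}
	before := []string{"gitserver-1", "gitserver-2", "gitserver-3"}
	after := append(append([]string{}, before...), "gitserver-4")

	fmt.Println("modulo moved:    ", countMoved(moduloShard, repos, before, after))
	fmt.Println("rendezvous moved:", countMoved(rendezvousShard, repos, before, after))
}
```

With the rendezvous variant, every repo that moves when gitserver-4 is added moves to gitserver-4, so gitservers 1, 2, and 3 never have to reclone anything they already hold. The same reasoning applies in reverse when a replica is removed: only the repos that lived on the removed replica get reassigned.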

Additionally, if a gitserver replica goes down for an extended period of time, it becomes an outage for that entire subset of repositories, instead of the load being rebalanced across the remaining shards.

indexed-search does not have these same issues, because it shards based on the hostname. We should do the same for gitserver, but care must be taken to ensure we respect the existing sharding scheme or migrate it appropriately, so there is no service degradation for instances upgrading to this new scheme.