authz: slow down primary key grow pace of "user_pending_permissions" table (!24506) · Merge requests · Warren Gifford / sourcegraph

Warren Gifford requested to merge jc/perms-store-query-before-upsert into main Sep 01, 2021

Created by: unknwon

Context

We're using 32-bit integer (int32 in Go / int4 in Postgres) to store a user ID, and we are using roaring bitmap as the efficient in-memory representation for a collection of user IDs, which only supports uint32. Therefore it was a natural choice to use int4 as the column type for user_pending_permissions table (same as users table).

What is the problem with existing approach?

Just like we use a primary key with auto-increment property for users table, we have the same use case for user_pending_permissions. The problem comes from that Postgres still monotonically increases the primary key with failed inserts (which based on our query, it becomes an upsert).

In our case, if there are lots of users with pending permissions, we're making lots of failed inserts with every call of SetRepoPendingPermissions. The grow pace of the primary key can be simply calculated as follows:

If a repository has 10k users (that are not yet having an account in the Sourcegraph instance, but have access to on the code host), 2,147,483,647 (the largest number a signed 32-bit integer can hold) would only support 214,748 calls of SetRepoPendingPermissions. It may seem a lot at a first glance, but imagine our background syncing process picks 10 repositories every minute, that gives us 214748 / 10 / 60 / 24 ~= 15 days, not to mention the initial syncing round would have a call to SetRepoPendingPermissions for every single repository.

Why no customers have ran into this problem?

My hypothesis:

Most of users on the code host have onboarded to the Sourcegraph instance, which makes the primary key grow pace very slow, e.g. < 100 for every call of SetRepoPendingPermissions, that is about 21474800 / 10 / 60 / 24 ~= 1500 days to reach the largest number (background permissions was introduced about 18 months / 540 days ago, that leaves customers wildcard-v1 days).

How the approach in this PR helps?

Instead of naively relying on upsert for massive existing rows (that causes fail inserts), we first load existing rows and only do inserts for newly discovered rows. This means if there are 20k users on the code host, the primary key should also only grow to around 20k.

authz: slow down primary key grow pace of "user_pending_permissions" table

Context

What is the problem with existing approach?

Why no customers have ran into this problem?

How the approach in this PR helps?

Merge request reports