authz: slow down primary key grow pace of "user_pending_permissions" table
Created by: unknwon
Context
We're using 32-bit integer (int32
in Go / int4
in Postgres) to store a user ID, and we are using roaring bitmap as the efficient in-memory representation for a collection of user IDs, which only supports uint32
. Therefore it was a natural choice to use int4
as the column type for user_pending_permissions
table (same as users
table).
What is the problem with existing approach?
Just like we use a primary key with auto-increment property for users
table, we have the same use case for user_pending_permissions
. The problem comes from that Postgres still monotonically increases the primary key with failed inserts (which based on our query, it becomes an upsert).
In our case, if there are lots of users with pending permissions, we're making lots of failed inserts with every call of SetRepoPendingPermissions
. The grow pace of the primary key can be simply calculated as follows:
If a repository has 10k users (that are not yet having an account in the Sourcegraph instance, but have access to on the code host), 2,147,483,647 (the largest number a signed 32-bit integer can hold) would only support 214,748 calls of SetRepoPendingPermissions
. It may seem a lot at a first glance, but imagine our background syncing process picks 10 repositories every minute, that gives us 214748 / 10 / 60 / 24 ~= 15
days, not to mention the initial syncing round would have a call to SetRepoPendingPermissions
for every single repository.
Why no customers have ran into this problem?
My hypothesis:
Most of users on the code host have onboarded to the Sourcegraph instance, which makes the primary key grow pace very slow, e.g. < 100 for every call of SetRepoPendingPermissions
, that is about 21474800 / 10 / 60 / 24 ~= 1500
days to reach the largest number (background permissions was introduced about 18 months / 540 days ago, that leaves customers wildcard-v1 days).
How the approach in this PR helps?
Instead of naively relying on upsert for massive existing rows (that causes fail inserts), we first load existing rows and only do inserts for newly discovered rows. This means if there are 20k users on the code host, the primary key should also only grow to around 20k.