authz: use ExternalAccountSpec to identify user for pending permissions
Created by: unknwon
This PR is a schema proposal for user_pending_permissions
table, and is not meant to be merged without further completion (upon approval).
Context
(Please correct me if anything wrong)
In our current implementation, the bridge between a Sourcegraph user and a code host (e.g. GitLab) user is the user_external_accounts
table. More specifically, are these three columns: service_type
, service_id
, and account_id
.
Values of these three columns are from ExternalAccountSpec
:
https://github.com/sourcegraph/sourcegraph/blob/6b7e72a311ccd574496b0278f980879a64a48c57/internal/extsvc/types.go#L19-L26
How do we verify repository permissions of a given user?
There are two cases:
- We have a row in
user_external_accounts
table for the user, which tells us which authz provider we should use.- This row could be created in two ways:
- The user signs in via the code host authentication, which also configures a authorization provider (i.e.
"authorization": {...}
field) in the code host connection. - Last time we fetched permission for this user.
- The user signs in via the code host authentication, which also configures a authorization provider (i.e.
- This row could be created in two ways:
- We couldn't find a row associated an existing authz provider for the user, we do
(authz.Provider).FetchAccount
then(db.ExternalAccounts).AssociateUserAndSave
to save the information.
ExternalAccountSpec.ClientID
?
What is As far as I know, at the very least, from the the documentation, it plays no role in authorization (i.e. only matters in authentication process): https://github.com/sourcegraph/sourcegraph/blob/6b7e72a311ccd574496b0278f980879a64a48c57/cmd/frontend/graphqlbackend/schema.graphql#L3119-L3121
Problem
Previously, only the Sourcegraph authz provider (aka "repository explicit permissions API") uses the two pending permissions tables, and we use either username or email as the bind ID to grant user permissions upon sign up.
Ideally, we should continue using email as bind ID for authz providers with RFC 113 ("Move GitLab permissions syncing to the background in repo-updater"). Unfortunately, the most efficient GitLab API we're going to use does not return email addresses for users.
Thus, using username as bind ID become the only option with existing design. However, it has quite a few problems:
- We aren't able to support multiple authz providers for late-binding users, there are chances of username collisions. In fact, we must ensure there is absolutely no chance of collision in terms of permissions.
- It breaks all existing customers who use GitLab and GitHub authz providers for permissions if they don't enforce username matching between Sourcegraph instance and the code host (Bitbucket Server users are enforced username matching as we originally documented, but no luck for GitHub and GitLab users).
Proposal
Adding two extra columns to user_pending_permissions
table, which are service_type
and service_id
(see diff for constraints), and existing bind_id
column can be reused as account_id
for all code host authz providers.
How will the permissions syncing flow look like then?
At background syncing:
- For existing users, we use
user_external_accounts
table to find relations. - For not-yet-created users, we save equivalent information (i.e.
bind_id
,service_type
andservice_id
) touser_pending_permissions
table.
Upon user sign up:
- If a user is authenticated via SSO, great, we look for the rows with the same
bind_id
/account_id
,service_type
andservice_id
inuser_pending_permissions
table. - If a user did self-sign up,
- we first try a
(authz.Provider).FetchAccount
- then
(db.ExternalAccounts).AssociateUserAndSave
, - then look for the rows with the same
bind_id
/account_id
,service_type
andservice_id
inuser_pending_permissions
table.
- we first try a
repo_pending_permissions
table?
Why not adding columns to We use internal database ID (i.e. repo_id
) in this table, no chance of collision.