Skip to content

authz: use ExternalAccountSpec to identify user for pending permissions

Warren Gifford requested to merge jc/pending-perms-external-account-spec into master

Created by: unknwon

This PR is a schema proposal for user_pending_permissions table, and is not meant to be merged without further completion (upon approval).

Context

(Please correct me if anything wrong)

In our current implementation, the bridge between a Sourcegraph user and a code host (e.g. GitLab) user is the user_external_accounts table. More specifically, are these three columns: service_type, service_id, and account_id.

Values of these three columns are from ExternalAccountSpec: https://github.com/sourcegraph/sourcegraph/blob/6b7e72a311ccd574496b0278f980879a64a48c57/internal/extsvc/types.go#L19-L26

How do we verify repository permissions of a given user?

There are two cases:

  1. We have a row in user_external_accounts table for the user, which tells us which authz provider we should use.
    • This row could be created in two ways:
      • The user signs in via the code host authentication, which also configures a authorization provider (i.e. "authorization": {...} field) in the code host connection.
      • Last time we fetched permission for this user.
  2. We couldn't find a row associated an existing authz provider for the user, we do (authz.Provider).FetchAccount then (db.ExternalAccounts).AssociateUserAndSave to save the information.

What is ExternalAccountSpec.ClientID?

As far as I know, at the very least, from the the documentation, it plays no role in authorization (i.e. only matters in authentication process): https://github.com/sourcegraph/sourcegraph/blob/6b7e72a311ccd574496b0278f980879a64a48c57/cmd/frontend/graphqlbackend/schema.graphql#L3119-L3121

Problem

Previously, only the Sourcegraph authz provider (aka "repository explicit permissions API") uses the two pending permissions tables, and we use either username or email as the bind ID to grant user permissions upon sign up.

Ideally, we should continue using email as bind ID for authz providers with RFC 113 ("Move GitLab permissions syncing to the background in repo-updater"). Unfortunately, the most efficient GitLab API we're going to use does not return email addresses for users.

Thus, using username as bind ID become the only option with existing design. However, it has quite a few problems:

  1. We aren't able to support multiple authz providers for late-binding users, there are chances of username collisions. In fact, we must ensure there is absolutely no chance of collision in terms of permissions.
  2. It breaks all existing customers who use GitLab and GitHub authz providers for permissions if they don't enforce username matching between Sourcegraph instance and the code host (Bitbucket Server users are enforced username matching as we originally documented, but no luck for GitHub and GitLab users).

Proposal

Adding two extra columns to user_pending_permissions table, which are service_type and service_id (see diff for constraints), and existing bind_id column can be reused as account_id for all code host authz providers.

How will the permissions syncing flow look like then?

At background syncing:

  • For existing users, we use user_external_accounts table to find relations.
  • For not-yet-created users, we save equivalent information (i.e. bind_id, service_type and service_id) to user_pending_permissions table.

Upon user sign up:

  • If a user is authenticated via SSO, great, we look for the rows with the same bind_id/account_id, service_type and service_id in user_pending_permissions table.
  • If a user did self-sign up,
    1. we first try a (authz.Provider).FetchAccount
    2. then (db.ExternalAccounts).AssociateUserAndSave,
    3. then look for the rows with the same bind_id/account_id, service_type and service_id in user_pending_permissions table.

Why not adding columns to repo_pending_permissions table?

We use internal database ID (i.e. repo_id) in this table, no chance of collision.

Merge request reports

Loading