Tracking: Soft launch of indexed search over user added public code
Created by: ryanslade
Database changes
-
Create the external_service_repos table and migrate external_service sources column data to that table. Update stores to read sources data from that new table. #12789 -
Add namespace_user_id column to the external_services table and update stores to load and store data from that column. #12701 (closed)
Security
-
User owned external services are only allowed for Github.com, Gitlab.com and Bitbucket.org. This should be enforced at the API layer as well as being limited by the UI #13430 -
Should we disable the RepositoryPathPattern config for user added repos? It could potentially allow users to override the name of existing repos -
Ensure that: -
User added external services can ONLY sync public code https://github.com/sourcegraph/sourcegraph/pull/13626 -
Site admin added external services can add public and private code https://github.com/sourcegraph/sourcegraph/pull/13626
-
-
secrets: use base64 and make interface private #13850 (closed) -
Transparent encryption and decryption for all tables that contain secrets or tokens #13851 (closed) -
The sources json column returned by listReposQueryFmtstr should not include tokens https://github.com/sourcegraph/sourcegraph/issues/13614
-
-
Repository Settings > Mirroring should not display token in the UI #13852 (closed) -
Encryption bootstrap #13853 (closed) -
Eyeball validate data being encrypted in the database #13854 (closed) -
User documentation: data security policy #13855 (closed) -
Detect and delete a public repository added on visit when it becomes private #13978 (closed)
Repo syncing
-
Add last_synced_at and next_sync_at to external services table #12701 (closed) -
On cloud, only fetch external services that were created by site admins here: https://github.com/sourcegraph/sourcegraph/blob/a95ac70cc52c8208233591cfbe896706620fa88c/cmd/repo-updater/shared/main.go#L127 -
Create worker function that syncs a single external service and updates next_sync https://github.com/sourcegraph/sourcegraph/pull/13483 -
We should ignore repositoryPathPattern #13092
-
-
Use internal/workerutil to schedule concurrent syncing of external services based on next_sync time. Worker concurrency should be configurable. On Sourcegraph Cloud we need to enable the full syncer, but only for user owned external services. https://github.com/sourcegraph/sourcegraph/pull/13483 -
Add an alert if our external service job queue grows too large https://github.com/sourcegraph/sourcegraph/issues/14045 -
Completed jobs should be cleaned up from the external_service_sync_jobs table -
Our Diff function always returns true for Modified. We need to improve this so that we can back off more efficiently: https://github.com/sourcegraph/sourcegraph/blob/cc9ccc4d217bff1f2d94efab1a4bc5e625cc7d90/cmd/repo-updater/repos/types.go#L642 -
Investigate the impact of calling SyncSubset from within SyncExternalService. We currently do this via the streaming inserter: https://github.com/sourcegraph/sourcegraph/blob/main/cmd/repo-updater/repos/syncer.go#L178 -
Syncer.SyncSubset is only ever used with one repo. Simplify the code. -
The repos.Store interface is very large. Can we reduce it by using the DBStore directly in the Syncer and only exposing methods on the store interface that are required outside of the package? https://github.com/sourcegraph/sourcegraph/issues/14092 -
The streaming inserter which is called on every sync fetches all repos, but only needs external ids. We should switch to a dedicated query #13671 -
SyncExternalService now has some complicated name conflict resolution. Can we make this simpler by fetching repos from the currently syncing external service AND any possible conflicts before calculating the diff? -
SyncExternalService is provided a Store that is already in a transaction as it is passed in by the worker handler. This means that the streaming inserter also uses a transaction which probably doesn't make sense as all repos will roll back on failure. -
Write some high level documentation describing how repo updater currently works while it is fresh -
New syncer should use its own logger: https://github.com/sourcegraph/sourcegraph/issues/13718 -
Should SyncRepo include the external service as a parameter? Perhaps we should be passing in the default GitHub and GitLab repos as a param? This would ensure that there is no way for a repo to be added without a corresponding entry in external_service_repos
Fetching external services
-
Update ExternalServicesStore.List to allow filtering by user. #12704 (closed) -
Update call sites of ExternalServicesStore.List to avoid application-level aggregation. #12760 (closed) -
Update ExternalServicesStore.List to support pagination. #12822 (closed) -
Update repo-updater SyncRateLimiter to support pagination. https://github.com/sourcegraph/sourcegraph/pull/12911 https://github.com/sourcegraph/sourcegraph/pull/13002 -
Update API clients of external services endpoints to support pagination. https://github.com/sourcegraph/sourcegraph/pull/12975 -
Optimize reposourceCloneURLToRepoName #12944 (closed)
Search
-
Amend search logic to fetch a list of repos configured by a user and include them by default when no repo filter is provided. -
Ensure that repos added by users are included in the search index.
Handling deletion
-
When a user is deleted (both soft and hard), update their external services deleted_at column. #12965 -
Ensure any entry in external_service_repos is deleted in cascade -
When an external service is (soft or hard?) deleted, determine which repos are orphaned by counting the number of references in the external_service_repos table and mark them as deleted. This must be done within the same transaction as the one deleting the external service. We need to be careful that if a repo is SOFT deleted it can be added again later without violating this index: "repo_external_unique_idx" UNIQUE, btree (external_service_type, external_service_id, external_id)
#13273
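To make the orphan rule above concrete, here is a minimal in-memory sketch in Go. It assumes external_service_repos can be viewed as (external service ID, repo ID) pairs; the serviceRepo type and orphanedRepos function are hypothetical, and the real implementation would do this counting in SQL inside the deleting transaction.

```go
package main

import "fmt"

// serviceRepo is a hypothetical in-memory stand-in for one row of the
// external_service_repos table.
type serviceRepo struct {
	ExternalServiceID int64
	RepoID            int32
}

// orphanedRepos returns the IDs of repos that are referenced by the service
// being deleted and by no other external service. Those are the repos that
// would need to be marked as deleted alongside the service.
func orphanedRepos(rows []serviceRepo, deletedServiceID int64) []int32 {
	otherRefs := map[int32]int{} // references held by other services
	seen := map[int32]bool{}
	var candidates []int32 // repos referenced by the deleted service

	for _, r := range rows {
		if r.ExternalServiceID != deletedServiceID {
			otherRefs[r.RepoID]++
			continue
		}
		if !seen[r.RepoID] {
			seen[r.RepoID] = true
			candidates = append(candidates, r.RepoID)
		}
	}

	var orphans []int32
	for _, id := range candidates {
		if otherRefs[id] == 0 {
			orphans = append(orphans, id)
		}
	}
	return orphans
}

func main() {
	rows := []serviceRepo{
		{ExternalServiceID: 1, RepoID: 10},
		{ExternalServiceID: 1, RepoID: 11},
		{ExternalServiceID: 2, RepoID: 11}, // repo 11 is still referenced by service 2
	}
	fmt.Println(orphanedRepos(rows, 1)) // prints [10]
}
```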
Progressive rollout
-
Add the external_service_user_mode feature flag as part of the configuration. By default, it will be disabled, but if it contains public it will be considered enabled for public code. Anything other than public will disable the flag. Use the flag to restrict user access to the external service config page and the related GraphQL API. #13052 -
Add external_service_user_allow_list configuration option. It will contain a list of regex patterns that must be used to limit the users who will be able to create external services if the external_service_user_mode flag is enabled. https://github.com/sourcegraph/sourcegraph/pull/13099 -
Add external_service_user_allow_percentage configuration option. It will contain a value from 0 to 100. If the external_service_user_mode flag is enabled, use that value to limit the number of users that can create external services, by hashing their user id and applying a modulo 100 on it. If the result is under that value, the user has access to that feature. https://github.com/sourcegraph/sourcegraph/pull/13099 -
If a user is already allowed by external_service_user_allow_list, ignore this check https://github.com/sourcegraph/sourcegraph/pull/13099
NOTE: The above three items can all be solved by checking whether a tag exists on a user. For now, if a user has the AllowUserExternalServicePublic tag they will be allowed to add their own external services.
-
Create the SQL script that opts in a certain percentage of users. It should be based on a hash of the user id so that running it again with an increased percentage has the desired effect.
This seems to do the trick:
-- Get a repeatable sample of 10%
WITH sample as (SELECT id FROM users TABLESAMPLE BERNOULLI(10) REPEATABLE(1))
-- Update users in the sample
UPDATE users SET tags = array_append(tags, 'AllowUserExternalServicePublic')
WHERE id IN (SELECT id FROM sample)
-- Unless they are already tagged
AND NOT(tags @> ARRAY['AllowUserExternalServicePublic']);
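For comparison, the configuration-based checks described earlier in this section (mode flag, allow list, then hashed percentage) could look roughly like the Go sketch below. Everything here is an assumption for illustration: the rolloutConfig struct, the choice to match allow list patterns against the username, and the use of FNV-1a as the hash are not taken from the linked PRs.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
	"strconv"
)

// rolloutConfig mirrors the three options described above.
// The struct and field names are hypothetical.
type rolloutConfig struct {
	Mode            string   // external_service_user_mode
	AllowList       []string // external_service_user_allow_list (regex patterns)
	AllowPercentage int      // external_service_user_allow_percentage, 0 to 100
}

// bucket maps a user ID to a stable value in [0, 100). Because the value
// depends only on the ID, raising the percentage later keeps every user
// who was already allowed. FNV-1a is an arbitrary choice of hash.
func bucket(userID int32) int {
	h := fnv.New32a()
	h.Write([]byte(strconv.Itoa(int(userID))))
	return int(h.Sum32() % 100)
}

// canAddExternalServices applies the checks in order: the mode flag must be
// "public", an allow list match short-circuits the percentage check, and
// otherwise the user's bucket must fall under the configured percentage.
// Matching patterns against the username is an assumption.
func canAddExternalServices(cfg rolloutConfig, userID int32, username string) (bool, error) {
	if cfg.Mode != "public" {
		return false, nil
	}
	for _, pattern := range cfg.AllowList {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return false, err
		}
		if re.MatchString(username) {
			return true, nil
		}
	}
	return bucket(userID) < cfg.AllowPercentage, nil
}

func main() {
	cfg := rolloutConfig{Mode: "public", AllowList: []string{`^ryan`}, AllowPercentage: 0}

	ok, err := canAddExternalServices(cfg, 1, "ryanslade")
	fmt.Println(ok, err) // allow list match, so the percentage is ignored

	cfg.Mode = ""
	ok, err = canAddExternalServices(cfg, 1, "ryanslade")
	fmt.Println(ok, err) // flag disabled, nobody is allowed
}
```

The same bucketing idea is what makes a percentage rollout repeatable: a user whose bucket is below 10 is also below 50, so increasing the percentage only ever adds users.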
Metrics
-
Track number of code host connections added by users #13589 -
Track total number of repositories added by users #13589 -
Average repos per user can be derived from the above #13589 -
Track whether a user has added any external service (@ebrodymoore will help with this) -
Given the above, we can derive the visits per user with their own external services vs without
-
-
Track when feature flag is enabled (@unknwon: I think the metrics can be exposed regardless) -
Syncer metrics need to include external service id https://github.com/sourcegraph/sourcegraph/pull/13742
Abuse mitigation
-
Update repo-updater to support user-added repo listing limits per user. https://github.com/sourcegraph/sourcegraph/issues/14043 -
Update repo-updater to support user-added repo listing limits in total. https://github.com/sourcegraph/sourcegraph/issues/14043
Load testing
-
TODO
UI Changes
-
Replicate pages for “manage repositories”. #13095 -
Backend changes to allow CRUD of external services by users. #13173 -
Web app should respect feature flag in JSContext per user. #13095