Proposal PR: Remove the management console (🦄 unicorns and 🌈 rainbows here)
Created by: slimsag
The management console today is the single most significant barrier that site admins encounter when trying to configure their Sourcegraph instance, which makes it a major blocker for customer onboarding and ARR growth. This (fully functioning) PR proposes removing it, which improves the user experience, reduces technical complexity (-8k LOC), and closes 18 open issues in the process.
From admins not saving their password (#5186), to confusing browser TLS warning prompts (#6070), to it being difficult to access (#1934), to the complexity of properly poking holes in firewalls (#2478), to a general lack of maintenance and significant technical complexity for us (#5760, #5350, #4235, #4213, #2759, #2404, #2322, #2095, #2090, #1822, #1731, #1685, #1435, #1432), and even to slower CI builds (#6770): overall, the management console is a massive source of problems for our users and of complexity for us from an Engineering, Sales, and overall Business standpoint.
But before we go around suggesting we remove things willy-nilly, why did we add it in the first place and why does it make sense to remove now?
Management console: a brief history
In November of last year in PR #966 we introduced the management console. At the time, Sourcegraph configuration was provided through a JSON configuration file mounted into the container and this presented challenges for site admins going through our onboarding / trial process:
- Admins needed to not only figure out the right properties to plug in for e.g. their auth provider and code host (which can be quite complex), but also needed to do so in their editor without any autocomplete or helpful tooltips to guide them.
- After making any edits, they had to restart all containers for the changes to take effect.
All in all, this was a tedious process and made configuration very difficult. To resolve this, we made all configuration dynamic so that restarting services was no longer required. However, some configuration properties, such as authentication, could lock the site admin out of their instance if misconfigured and prevent them from fixing their configuration afterwards. To resolve this, we took inspiration from other similar tools and created the management console.
Our hope was that the management console and overall configuration UI would eventually become more intuitive, UI-based / beginner-friendly (instead of just a JSON editor), and more. In practice, this work was de-prioritized in favor of other more important work.
Additionally, the management console introduced security concerns which meant we had to lock it down heavily with a self-signed TLS cert on new instances and a strong auto-generated basic auth password in order for it to be sufficiently secure. You can read more about this in my prior proposal RFC 63 REVIEW: Authless management console access.
The management console succeeded in lowering the barrier to onboarding at the time we added it, because it meant we could make a majority of the options (site configuration) editable easily at runtime.
What has changed?
We can do better.
We understand our product and the configuration options it must provide more intimately than anyone else. When we added the management console last year, we closely mimicked best practices in similar software. In the meantime, we have made our product more resilient to configuration mistakes: nearly every configuration option has since been made dynamic (editable at runtime without requiring a restart), we have paid closer attention to configuration issues, and, most importantly, we have learned what the problems with configuring Sourcegraph are on its own merits rather than by mimicking other products.
Before starting this change, I wanted to ensure I had a full grasp of the entire situation from a technical point of view. To do this, I compiled a complete list of all existing critical configuration / management console options and determined whether or not each fit the original criteria for what belongs in the management console (things that, if misconfigured, could lead to the admin being locked out). You can find this document here.
Once this was written, two things became clear to me:
- Not a single critical configuration option can prevent the frontend from starting. This was not true when we originally added the management console, and it has important implications that I will get to later.
- Only 5 configuration options could lock an admin out of the instance:
  - The first, licenseKey, I resolved last night in #7181.
  - The remaining 4 would need to be set to ridiculously stupid values in order to lock an admin out in practice: removing all auth providers, setting session expiry to a very low value, adding custom HTML that bricks the frontend web UI, etc.
  - This is not to say these cases are impossible, but they are very unlikely in practice, and I have never once heard of an admin doing this by mistake.
Given this data, we can shift around our tradeoffs from what they are today:
- 95% of admins onboarding must go through the painful escape hatch (the management console)
- 5% of admins needing an escape hatch have a nice/friendly UI.
To something more aligned with our goals, such as:
- 0% of admins onboarding must go through the painful escape hatch.
- 5% of admins needing an escape hatch still have one, and it's MUCH friendlier than the management console in its current form.
Solution
- The management console is removed - gone.
- All configuration options previously found in the management console are automatically migrated into the site configuration.
- We no longer have both "critical" and "site" configuration: we now have just "site" configuration and one editor / schema.
- In 95% of cases, admins can make all configuration edits through the regular web UI without any trouble.
- In the 5% of cases where an admin has badly messed up and needs an escape hatch, we provide a command to edit a file in the Docker container using e.g. vi or nano. This file contains your current site configuration, and any edits made to it are synchronized with the database. The file is otherwise ephemeral (i.e. the DB remains the source of truth for configuration; the file just allows edits as an escape hatch). How this looks:
Editing your site configuration if you cannot access the web UI
If you are having trouble accessing the web UI, you can make edits to your site configuration by editing a file in the container using the following commands:
Single-container Docker instances
docker exec -it $CONTAINER nano /site-config.json
Kubernetes cluster instances
kubectl exec -it $FRONTEND_POD -- nano /site-config.json
Perform your edits, then press Ctrl+X, Y, and Enter to save the changes. They will be applied immediately, just as if made through the web UI. If you prefer vi, simply replace nano in the commands above.
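The escape hatch only promises that edits to /site-config.json are synchronized with the database, which remains the source of truth. One way such a sync could work is a checksum-polling loop, sketched here in shell. This is a hypothetical illustration, not the code in this PR: the function names (write_from_db, sync_once, push_site_config, watch_site_config), the SYNC_INTERVAL variable, and the polling approach are all assumptions, and push_site_config is a stub standing in for the real database write.

```shell
#!/bin/sh
# Hypothetical sketch of the escape-hatch sync; not the implementation in this PR.
# The database stays the source of truth; the file only relays admin edits.

SITE_CONFIG_FILE="${SITE_CONFIG_FILE:-/site-config.json}"
LAST_SUM=""
PUSH_COUNT=0

# Stub (assumption): a real version would write the file's contents to the
# database. Here it only counts and reports the push.
push_site_config() {
  PUSH_COUNT=$((PUSH_COUNT + 1))
  echo "would push $1 to the database"
}

# Copy the database's current configuration into the file and remember its
# checksum, so this write is not later mistaken for an admin edit.
write_from_db() {
  printf '%s\n' "$1" > "$SITE_CONFIG_FILE"
  LAST_SUM=$(cksum < "$SITE_CONFIG_FILE")
}

# Push the file only if its checksum differs from the last one seen.
sync_once() {
  [ -f "$SITE_CONFIG_FILE" ] || return 0
  sum=$(cksum < "$SITE_CONFIG_FILE")
  if [ "$sum" != "$LAST_SUM" ]; then
    LAST_SUM=$sum
    push_site_config "$SITE_CONFIG_FILE"
  fi
}

# Poll forever; SYNC_INTERVAL (in seconds) is an assumed knob.
watch_site_config() {
  while :; do
    sync_once
    sleep "${SYNC_INTERVAL:-1}"
  done
}
```

In this sketch the container would run watch_site_config in the background, and the web UI path would call write_from_db whenever configuration changes in the database. Seeding LAST_SUM in write_from_db keeps the file's initial contents, which were copied from the database, from being pushed back as if they were an admin edit.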
Next steps
100% of what I described above is already implemented in this PR. This could not have been written as an RFC because the solution was inherently tied to the constraints we have in our product -- which could not be uncovered without implementing this.
If this proposal is accepted, we can land this in 3.11 easily.
- Get feedback from Product to confirm they believe this is a good step forward. cc @sqs @christinaforney
- Get help from @beyang to resolve issues in the regression test suite: it relied on the management console API for configuration and must now go through the app itself instead.
- Write and update both documentation and in-product documentation/links.
- Write paired changes for deploy-sourcegraph and deploy-sourcegraph-docker to remove the management console.
Fixes
- Fixes #5186
- Fixes #6070
- Fixes #1934
- Fixes #2478
- Fixes #5760
- Fixes #5350
- Fixes #4235
- Fixes #4213
- Fixes #2759
- Fixes #2404
- Fixes #2322
- Fixes #2095
- Fixes #2090
- Fixes #1822
- Fixes #1731
- Fixes #1685
- Fixes #1435
- Fixes #1432
- Helps #6770