recent pkg/markdown changes are poor, dangerous, and have caused regressions
Created by: slimsag
TL;DR: We're using docsite/markdown everywhere since Oct. and nobody realizes this, because it being in docsite kind of implies that only docs.sourcegraph.com uses it. You wouldn't expect it to be used to render every Markdown file in Sourcegraph. This package: is not subject to the same review/testing as our main app, is a security concern, and has already caused multiple regressions in our main app.
Proposal: Revert back to a state where we are not importing docsite/markdown. We will soon not need to use this package anyway, because we intend to remove the built in docs.
Back in Oct. @sqs made a change to move cmd/frontend/internal/pkg/markdown
to pkg/markdown
. In this change, a new dependency github.com/sourcegraph/docsite/markdown
was added which is now responsible for markdown rendering. Unfortunately, I missed this change and failed to review it. The commit message:
use docsite markdown package instead of github_flavored_markdown
The github.com/sourcegraph/docsite/markdown package performs roughly the same formatting but uses a newer underlying blackmonday library. This commit removes external dependencies but otherwise is a noop. Removing the dependency on github_flavored_markdown is necessary for a future commit where we need the new version of blackmonday (and it can't coexist with an older version at the same import path).
This change worries me because:
- Our main application now imports a dependency from github.com/sourcegraph/docsite and uses it heavily.
- It is confusing that a change to docsite/markdown also applies to the entire Sourcegraph main app.
- Nobody is testing those changes against Sourcegraph (except maybe @sqs).
-
It has already caused multiple regressions:
- Broken highlighting on commit and diff search entirely
- Panics in the frontend when trying to send email notifications for code discussions
- (Yes, both of these regressions should have been caught otherwise, but that is not the point).
- I am worried about the security implications. For example, @sqs added a HTML markdown function to render JSON schema files that get loaded from disk. In the context of docs.sourcegraph.com that sounds totally fine and safe because we control the markdown -- but when we are rendering user-provided markdown on sourcegraph.com that starts to sound a bit more scary. It's probably not a problem today, but what about when someone modifies this package and doesn't realize their change could regress our security?
- I do not feel like some choices made in the docsite/markdown package are the best for Sourcegraph the main app (such as using bfchroma for syntax highlighting code blocks, which will produce inconsistent results with our main syntax highlighter).