worries about diff highlighting

Created by: slimsag

I have a number of concerns with the new diff highlighting implementation:

This file having no test coverage is concerning. We should be adding test coverage not regressing on it, especially with changes as complex as this.
preview_file_diff_highlighter.go should not exist. The fact that we needed to introduce a new file for this logic inside the GraphQL layer is a strong indication that this code carries a lot of complexity which would be better situated in the internal/highlight package.
preview_file_diff_highlighter.go is entirely untested.
In preview_file_diff_highlighter.go, we call back into the resolver itself to implement the resolver itself. This is incredibly confusing and is going to be brittle in the future: https://github.com/sourcegraph/sourcegraph/pull/10437/files#diff-363d9f0b68e9ec7dd8cd75a3fd5a22c9R31
Binary-file checks like this, this, and this appear to be sprinkled throughout the code. This smells to me of defensive coding, especially because highlight.Code already handles binary files appropriately. These checks should not be needed anywhere I believe, because if the file is binary highlight.Code is going to give an error which should be handled without a prior "is binary" check.
highlight.ParseLinesFromHighlight is exposing concerns that package consumers should not be concerned about. This introduces the idea that one should parse the highlighted HTML outside of the highlight package and manipulate it, which opens a whole can of worms for potential regressions in the future. I can test that internal/highlight does not regress when we update syntect, for example, but I cannot easily test all downstream consumers of it. This is risky AND makes the graphql resolvers much more lengthy which is bad. Instead, there should be something like highlight.Diff as I mentioned in an earlier comment on the PR.

This is a great change and feature to have at the end of the day, but I am worried this change was merged prematurely and will cause us complexity / pain in the future. Can someone commit to addressing these concerns (no need to revert)?

My largest worry here is that we're reducing separation of concerns in the backend code and have further made the graphqlbackend package an even bigger codebase handling even more things, which is a huge problem for us. @mrnugget what are your thoughts on this (since you reviewed/approved)?