Support syntax highlighting on languages with conflicting file extensions
Created by: slimsag
Many languages share a file extension. For example, C, C++, ObjC, ObjC++, Cuda, D, and more all use .h. See here
Today, our syntax highlighting chooses the language based purely on file extension (see here). When there are conflicts, it merely picks the first match alphabetically, which is C (see here).
To fix this, we will need to do the following:
- Come up with a system that allows users to configure what specific files in specific repos should be highlighted as. For example, in user settings allow people to define repo and file matchers with glob syntax:
"highlighting": [
{"match": "github.com/my/repo@main/src/cuda/*.h", "language": "Cuda"},
{"match": "*org/repo-objc*src/**/*.h", "language": "Objective-C"}
]
- Change the syntect_server API to accept a language name to highlight the code as. When present, this will override the filepath (which is used to look up the syntax by file extension), and syntect_server should use find_syntax_by_name to locate the appropriate syntax.
- Update gosyntect to use the new API: https://sourcegraph.com/github.com/sourcegraph/gosyntect/-/blob/gosyntect.go#L22-26
- Use the new gosyntect API in sourcegraph/sourcegraph: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/frontend/internal/highlight/highlight.go#L174
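As a rough illustration of the first step, here is a minimal Go sketch of how the frontend might resolve a language override before calling gosyntect. Everything in it is an assumption rather than existing code: the `Override` type, `globToRegexp`, and `languageFor` are hypothetical names, and the glob semantics are a guess (every `*` matches any run of characters, including `/`, which is what the second example pattern above seems to require). An empty result means "no override", in which case the caller would keep today's extension-based behavior.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Override is a hypothetical in-memory form of one "highlighting" settings entry.
type Override struct {
	Match    string // glob matched against "repo@rev/path"
	Language string // language name to pass on to the highlighter
}

// globToRegexp converts a glob into an anchored regular expression.
// Assumed semantics: each '*' matches any run of characters, including '/';
// every other character matches itself literally.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range glob {
		if r == '*' {
			b.WriteString(".*")
		} else {
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// languageFor returns the language of the first override whose glob matches
// "repo@rev/path", or "" when no override applies.
func languageFor(overrides []Override, repo, rev, path string) (string, error) {
	target := repo + "@" + rev + "/" + path
	for _, o := range overrides {
		re, err := globToRegexp(o.Match)
		if err != nil {
			return "", err
		}
		if re.MatchString(target) {
			return o.Language, nil
		}
	}
	return "", nil
}

func main() {
	overrides := []Override{
		{Match: "github.com/my/repo@main/src/cuda/*.h", Language: "Cuda"},
		{Match: "*org/repo-objc*src/**/*.h", Language: "Objective-C"},
	}
	lang, err := languageFor(overrides, "github.com/my/repo", "main", "src/cuda/kernel.h")
	if err != nil {
		panic(err)
	}
	fmt.Println(lang)
}
```

First match wins in this sketch, so more specific patterns would need to be listed before broader ones; whether that ordering rule is what we want is an open design question.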
This would fix the issue for viewing files directly, but not for some search result types, which use this hacky "language map". To fix that, we would need to remove that map entirely and instead pass the markdown code block token directly to syntect_server, which would look up the syntax using find_syntax_by_token.
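For context, syntect documents find_syntax_by_token as searching first by extension and then by case-insensitive name. The Go sketch below only illustrates that lookup order; it is not the syntect implementation. The `Syntax` type and `findSyntaxByToken` function are hypothetical stand-ins, and comparing extensions exactly is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Syntax is a hypothetical, simplified stand-in for a syntax definition.
type Syntax struct {
	Name       string
	Extensions []string
}

// findSyntaxByToken illustrates the lookup order: try the token as a file
// extension first, then fall back to a case-insensitive match on the name.
func findSyntaxByToken(syntaxes []Syntax, token string) (Syntax, bool) {
	for _, s := range syntaxes {
		for _, ext := range s.Extensions {
			if ext == token {
				return s, true
			}
		}
	}
	for _, s := range syntaxes {
		if strings.EqualFold(s.Name, token) {
			return s, true
		}
	}
	return Syntax{}, false
}

func main() {
	syntaxes := []Syntax{
		{Name: "Go", Extensions: []string{"go"}},
		{Name: "Objective-C", Extensions: []string{"m"}},
	}
	// Tokens as they might appear after the opening fence of a markdown code block.
	for _, token := range []string{"go", "objective-c", "unknown"} {
		s, ok := findSyntaxByToken(syntaxes, token)
		fmt.Println(token, ok, s.Name)
	}
}
```

With a lookup like this on the server side, the frontend would not need its own token-to-language map.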
Beware however, there are in-flight PRs which would conflict with this work: