Support syntax highlighting on languages with conflicting file extensions

Created by: slimsag

For example, languages such as C, C++, ObjC, ObjC++, Cuda, D, and more all use .h as a file extension. See here

Today, our syntax highlighting chooses the language based purely on file extension. See here. When several languages share an extension, it merely picks the first one it finds alphabetically, which for .h is C. See here
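To make the current behavior concrete, here is a minimal Go sketch of "pick the alphabetically first language for an extension". The table `extensionLanguages` and the function `languageByExtension` are hypothetical names for illustration only; the real lookup lives in syntect_server and is not reproduced here.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// extensionLanguages is a hypothetical, abbreviated table mapping a file
// extension to every language that claims it.
var extensionLanguages = map[string][]string{
	"h": {"Objective-C", "D", "Cuda", "C++", "C"},
}

// languageByExtension mimics the behavior described above: it looks only at
// the extension and, on a conflict, takes the alphabetically first language.
func languageByExtension(filePath string) (string, bool) {
	ext := strings.TrimPrefix(path.Ext(filePath), ".")
	candidates := extensionLanguages[ext]
	if len(candidates) == 0 {
		return "", false
	}
	// Sort a copy so the shared table is left untouched.
	sorted := append([]string(nil), candidates...)
	sort.Strings(sorted)
	return sorted[0], true
}

func main() {
	lang, _ := languageByExtension("src/cuda/kernel.h")
	fmt.Println(lang) // prints "C", even though the file lives under src/cuda
}
```

A Cuda header is therefore highlighted as C, because nothing about the repo or path beyond the extension is considered.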

To fix this, we will need to do the following:

  1. Come up with a system that allows users to configure what specific files in specific repos should be highlighted as. For example, in user settings allow people to define repo and file matchers with glob syntax:
"highlighting": [
    {"match": "github.com/my/repo@main/src/cuda/*.h", "language": "Cuda"},
    {"match": "*org/repo-objc*src/**/*.h", "language": "Objective-C"}
]
  2. Change the syntect_server API to accept a language name to highlight the code as. When present, the language name overrides the filepath (which is used to look up the syntax by file extension), and syntect_server should use find_syntax_by_name to locate the appropriate syntax.
  3. Update gosyntect to use the new API: https://sourcegraph.com/github.com/sourcegraph/gosyntect/-/blob/gosyntect.go#L22-26
  4. Use the new gosyntect API in sourcegraph/sourcegraph: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/frontend/internal/highlight/highlight.go#L174
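The steps above could fit together roughly as in the following Go sketch. Everything here is an assumption for illustration: the types `highlightRule` and `highlightRequest`, the functions `globToRegexp` and `languageOverride`, and the JSON field names are hypothetical and are not the real gosyntect or syntect_server API. The glob dialect is also assumed: every `*` (and so `**`) matches any characters including `/`, which is what the second example matcher seems to need.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// highlightRule mirrors one entry of the proposed "highlighting" setting.
type highlightRule struct {
	Match    string `json:"match"`
	Language string `json:"language"`
}

// highlightRequest is a hypothetical request body. Only the idea of an
// optional language that overrides the filepath comes from the proposal.
type highlightRequest struct {
	Filepath string `json:"filepath"`
	Language string `json:"language,omitempty"`
	Code     string `json:"code"`
}

// globToRegexp treats every "*" as "any characters, including '/'".
// This is an assumed, simplified glob dialect, not a full implementation.
func globToRegexp(glob string) (*regexp.Regexp, error) {
	quoted := regexp.QuoteMeta(glob) // escapes "*" as `\*` and "." as `\.`
	return regexp.Compile("^" + strings.ReplaceAll(quoted, `\*`, ".*") + "$")
}

// languageOverride returns the language of the first rule whose matcher
// matches target ("repo@revision/path"), or "" when no rule applies.
func languageOverride(rules []highlightRule, target string) string {
	for _, rule := range rules {
		re, err := globToRegexp(rule.Match)
		if err != nil {
			continue // skip matchers that cannot be compiled
		}
		if re.MatchString(target) {
			return rule.Language
		}
	}
	return ""
}

func main() {
	rules := []highlightRule{
		{Match: "github.com/my/repo@main/src/cuda/*.h", Language: "Cuda"},
		{Match: "*org/repo-objc*src/**/*.h", Language: "Objective-C"},
	}
	req := highlightRequest{
		Filepath: "src/cuda/kernel.h",
		Language: languageOverride(rules, "github.com/my/repo@main/src/cuda/kernel.h"),
		Code:     "__global__ void k();",
	}
	body, _ := json.Marshal(req)
	fmt.Println(string(body))
}
```

When no rule matches, `Language` stays empty and is omitted from the JSON, so the server would fall back to the existing extension-based lookup.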

This would fix the issue for viewing files directly, but not for some search result types, which use a hacky "language map". To fix that, we would need to remove that map entirely and instead pass the markdown code block token directly to syntect_server and look up the syntax using find_syntax_by_token.
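As a rough illustration of the token-based lookup, the Go sketch below assumes that find_syntax_by_token tries the token as a file extension first and then as a case-insensitive syntax name. The `syntaxes` table, `findSyntaxByToken`, and `fenceToken` are hypothetical stand-ins written for this example; they are not syntect's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// syntax is a hypothetical, reduced view of a syntax definition.
type syntax struct {
	Name       string
	Extensions []string
}

// syntaxes is a hypothetical, abbreviated syntax set.
var syntaxes = []syntax{
	{Name: "C", Extensions: []string{"c", "h"}},
	{Name: "Go", Extensions: []string{"go"}},
	{Name: "Objective-C", Extensions: []string{"m", "h"}},
}

// findSyntaxByToken approximates the assumed lookup order:
// match the token against extensions first, then against names
// without regard to case.
func findSyntaxByToken(token string) (syntax, bool) {
	for _, s := range syntaxes {
		for _, ext := range s.Extensions {
			if ext == token {
				return s, true
			}
		}
	}
	for _, s := range syntaxes {
		if strings.EqualFold(s.Name, token) {
			return s, true
		}
	}
	return syntax{}, false
}

// fenceToken extracts the first word after the backticks of a markdown
// code fence opener, which is the token a search result would pass along.
func fenceToken(line string) string {
	fields := strings.Fields(strings.TrimLeft(strings.TrimSpace(line), "`"))
	if len(fields) == 0 {
		return ""
	}
	return fields[0]
}

func main() {
	opener := strings.Repeat("`", 3) + "objective-c"
	if s, ok := findSyntaxByToken(fenceToken(opener)); ok {
		fmt.Println(s.Name)
	}
}
```

With this shape, the frontend would not need its own language map: it forwards whatever token the code block carries and lets the server resolve it.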

Beware, however, that there are in-flight PRs that would conflict with this work: