Search results UI: Fix search result match range highlighting after a Unicode character (!35965)

Administrator requested to merge tl/fix-unicode-match-highlight into main May 24, 2022

Created by: tbliu98

Change highlightNode.ts to highlight ranges based on code point index instead of UTF-16 code unit index.

In highlightNode.ts, string indexing is used to highlight match ranges in search results. However, the range indexes coming from the search backend are counted in runes (Unicode code points), whereas JavaScript string indexing is based on UTF-16 code units, not code points. JavaScript strings are UTF-16 encoded, so each Unicode code point is stored as either one or two 16-bit code units. This was causing highlight ranges to be offset by one when the range came after a code point encoded as two code units (a surrogate pair), such as certain emojis (see linked issue for example). Unpacking a string into an array of code points before slicing the range to be highlighted fixes this issue.
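The mismatch can be reproduced in isolation. The sketch below is not the code from highlightNode.ts; `splitByCodePoints` is a hypothetical helper written only to illustrate the approach, and the sample line and match range are made up for the example. It assumes the backend reports a start offset and a length, both counted in code points.

```typescript
// Hypothetical helper for illustration only; the real highlightNode.ts may differ.
// `start` and `length` are assumed to be counted in Unicode code points.
function splitByCodePoints(text: string, start: number, length: number): [string, string, string] {
    // Array.from iterates a string by code point, so a surrogate pair
    // (one emoji) becomes a single array element.
    const codePoints = Array.from(text)
    return [
        codePoints.slice(0, start).join(''),
        codePoints.slice(start, start + length).join(''),
        codePoints.slice(start + length).join(''),
    ]
}

// Sample data (an assumption): the emoji occupies two UTF-16 code units
// but counts as one code point.
const line = '😀 emoji'

// Slicing by UTF-16 code units with a code point range lands one unit early.
const byCodeUnits = line.slice(2, 7) // ' emoj'

// Slicing the code point array selects the intended text.
const byCodePoints = splitByCodePoints(line, 2, 5) // ['😀 ', 'emoji', '']

console.log(JSON.stringify(byCodeUnits), JSON.stringify(byCodePoints))
```

Converting the whole string to an array costs time proportional to the line length, which is usually acceptable for single lines of search results.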

Test plan

Added a unit test for highlighting a range containing a Unicode character, plus tests for highlighting ranges before and after a Unicode character.

Manually tested using this search query: repo:^github\.com/sourcegraph/sourcegraph$ file:ghe-2.14.11/pull-request-discussion/vanilla/code-view.html emoji and verified that all occurrences of emoji are correctly highlighted.

