
Search results UI: Fix search result match range highlighting after a Unicode character

Administrator requested to merge tl/fix-unicode-match-highlight into main

Created by: tbliu98

Fixes #25088 (closed)

Change highlightNode.ts to highlight ranges based on code point index instead of UTF-16 code unit index.

In highlightNode.ts, string indexing is used to highlight match ranges in search results. However, the range offsets returned by the search backend are counted in runes (Unicode code points), whereas JavaScript string indexing counts UTF-16 code units. UTF-16 encodes each code point as either one or two 16-bit code units; code points outside the Basic Multilingual Plane, such as many emojis, take two (a surrogate pair). As a result, a highlight range was shifted by one position for every such code point that preceded it (see the linked issue for an example). Unpacking the string into an array of code points before slicing out the range to highlight fixes this issue.
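To illustrate the mismatch, here is a minimal sketch, not the actual highlightNode.ts code. The helper `sliceByCodePoint` is hypothetical, and it assumes `start` and `length` are measured in code points, as the backend's rune offsets are. `Array.from` on a string iterates by code point, which is one way of "unpacking a string into an array of code points" (the spread syntax `[...text]` behaves the same).

```typescript
// Hypothetical helper, for illustration only: slice a string using
// code point (rune) offsets instead of UTF-16 code unit offsets.
function sliceByCodePoint(text: string, start: number, length: number): string {
    // Array.from iterates by code point, so a surrogate pair
    // (e.g. an emoji outside the BMP) becomes a single element.
    const codePoints = Array.from(text)
    return codePoints.slice(start, start + length).join('')
}

const sampleLine = '😀 emoji'
// '😀' (U+1F600) is one code point but two UTF-16 code units, so the
// rune offset of "emoji" is 2 while its code unit offset is 3.
console.log(sampleLine.length) // 8 code units
console.log(Array.from(sampleLine).length) // 7 code points
console.log(sampleLine.slice(2, 2 + 5)) // " emoj" (off by one)
console.log(sliceByCodePoint(sampleLine, 2, 5)) // "emoji"
```

Slicing the code point array keeps the indexes in the same unit the backend uses, so no per-character offset correction is needed.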

Test plan

Added a unit test for highlighting a range that contains a Unicode character, as well as tests for highlighting ranges before and after a Unicode character.
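The three cases in the test plan can be sketched against a simplified, string-based stand-in for the highlighter. This is an assumption for illustration: `highlightByCodePoint` and `MatchRange` are hypothetical names, and the sketch wraps the match in brackets rather than creating DOM nodes as highlightNode.ts does.

```typescript
// Hypothetical range shape: offsets measured in code points (runes).
interface MatchRange {
    start: number
    length: number
}

// Hypothetical stand-in for highlightNode: marks the matched code points
// with brackets instead of wrapping them in a highlight element.
function highlightByCodePoint(text: string, range: MatchRange): string {
    const codePoints = Array.from(text)
    const end = range.start + range.length
    const before = codePoints.slice(0, range.start).join('')
    const match = codePoints.slice(range.start, end).join('')
    const after = codePoints.slice(end).join('')
    return `${before}[${match}]${after}`
}

const testText = 'ab😀cd'
// Range before the emoji
console.log(highlightByCodePoint(testText, { start: 0, length: 2 })) // "[ab]😀cd"
// Range containing the emoji
console.log(highlightByCodePoint(testText, { start: 1, length: 3 })) // "a[b😀c]d"
// Range after the emoji
console.log(highlightByCodePoint(testText, { start: 3, length: 2 })) // "ab😀[cd]"
```

The "after" case is the one that regressed before the fix: with code unit slicing, offsets 3 to 5 would have selected "😀c"'s low surrogate and "c" instead of "cd".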

Manually tested using this search query: repo:^github\.com/sourcegraph/sourcegraph$ file:ghe-2.14.11/pull-request-discussion/vanilla/code-view.html emoji, and verified that all occurrences of "emoji" are correctly highlighted.

