Search results UI: Fix search result match range highlighting after a Unicode character
Created by: tbliu98
Fixes #25088 (closed)
Change highlightNode.ts to highlight ranges based on code point index instead of UTF-16 code unit index.
In highlightNode.ts, string indexing is used to highlight match ranges in search results. However, the range indexes coming from the search backend are counted in runes (Unicode code points), whereas JavaScript string indexing is based on UTF-16 code units.
JavaScript strings are UTF-16 encoded, so each Unicode code point takes either one or two 16-bit code units. This caused highlight ranges to be offset by one when the range came after a code point encoded as two code units (a surrogate pair), such as certain emoji (see the linked issue for an example). Unpacking the string into an array of code points before slicing the range to be highlighted fixes this issue.
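The mismatch can be reproduced with a minimal sketch. This is not the actual highlightNode.ts code: `sliceByCodePoint` is a hypothetical helper, and the sample line is made up for illustration. It assumes the backend reports the match as a start offset and length counted in code points.

```typescript
// Hypothetical helper for illustration only; not taken from highlightNode.ts.
// Returns `length` code points of `text`, starting at code point index `start`.
function sliceByCodePoint(text: string, start: number, length: number): string {
    // Array.from iterates a string by Unicode code point, so a surrogate
    // pair such as "😀" becomes a single array element.
    const codePoints = Array.from(text)
    return codePoints.slice(start, start + length).join('')
}

// Made-up sample line: one emoji (two UTF-16 code units), a space, then the match.
const line = '😀 emoji'

// Assumed backend-style range: "emoji" starts at code point index 2, length 5.
const start = 2
const length = 5

// Indexing by UTF-16 code unit lands one position too early.
const byCodeUnit = line.slice(start, start + length) // " emoj"

// Indexing by code point selects the intended text.
const byCodePoint = sliceByCodePoint(line, start, length) // "emoji"

console.log(JSON.stringify(byCodeUnit), JSON.stringify(byCodePoint))
```

Ranges that end before the emoji are unaffected by the difference, which is why the bug only appeared for matches after such a character.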
Test plan
Added a unit test for highlighting a range containing a Unicode character, as well as tests for highlighting ranges before and after a Unicode character.
Manually tested using this search query: repo:^github\.com/sourcegraph/sourcegraph$ file:ghe-2.14.11/pull-request-discussion/vanilla/code-view.html emoji and verified that all occurrences of emoji are correctly highlighted.