searcher: combine line matches for the same line (!23918)

Warren Gifford requested to merge backend-dry-run/cc/remove-duplicate-previews into main Aug 13, 2021

Created by: camdencheek

Previously, searcher created a new LineMatch for every matched fragment. For long lines with many matched fragments, this meant the content of that line was duplicated many times, both in memory allocations and in the serialized output. This was causing significant memory issues in searcher now that we no longer enforce strict limits on the number of matches per file (when streaming is enabled). This change combines all fragments on the same line into a single LineMatch.
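The grouping idea can be sketched roughly as follows. This is a minimal illustration, not searcher's actual code: the `LineMatch` and `Fragment` types, their field names, and the `combineLineMatches` helper are hypothetical simplifications, and fragments are assumed to arrive sorted by line number.

```go
package main

import "fmt"

// LineMatch is a hypothetical, simplified match type: one preview of the
// line plus every matched fragment on that line.
type LineMatch struct {
	Preview          string   // full content of the matched line
	LineNumber       int      // zero-based line number
	OffsetAndLengths [][2]int // [offset, length] for each fragment on the line
}

// Fragment is a single match within a line (hypothetical type).
type Fragment struct {
	LineNumber int
	Offset     int
	Length     int
}

// combineLineMatches groups fragments by line so each line's content is
// stored once, no matter how many fragments it contains. It assumes the
// fragments are sorted by line number, as a line-by-line scan yields them.
func combineLineMatches(lines []string, frags []Fragment) []LineMatch {
	var out []LineMatch
	for _, f := range frags {
		last := len(out) - 1
		if last >= 0 && out[last].LineNumber == f.LineNumber {
			// Same line as the previous fragment: record only the range.
			out[last].OffsetAndLengths = append(out[last].OffsetAndLengths, [2]int{f.Offset, f.Length})
			continue
		}
		// First fragment on this line: store the line content once.
		out = append(out, LineMatch{
			Preview:          lines[f.LineNumber],
			LineNumber:       f.LineNumber,
			OffsetAndLengths: [][2]int{{f.Offset, f.Length}},
		})
	}
	return out
}

func main() {
	lines := []string{"to be or not to be", "nothing here"}
	// Two matches of "to" on line 0, at offsets 0 and 13.
	frags := []Fragment{{0, 0, 2}, {0, 13, 2}}
	ms := combineLineMatches(lines, frags)
	fmt.Println(len(ms), ms[0].OffsetAndLengths) // 1 [[0 2] [13 2]]
}
```

With one match per fragment, the line "to be or not to be" would appear twice in the output; after grouping it appears once, with both ranges attached.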

At its most regressive, a regex query for "." returns one fragment for every character in every file. Previously, for every fragment, we allocated a string containing the line's contents and serialized that content. This means that for the query "." and a line of length n, we were copying and serializing n^2 bytes, compared with n bytes after this change.

I have a feeling this will magically fix many of the OOMs I've been debugging, since the query I was using to reproduce the OOM was just the string literal "to", which matches many times on long lines.

