Search backend: increase repo page size from 500 to 4096

Warren Gifford requested to merge cc/larger-repo-page-size into main

Created by: camdencheek

When doing a search that has any repo: filters, we resolve repos a page at a time before searching that page. We do this to 1) limit the up-front cost of resolving all repos before starting the search, and 2) limit the memory overhead of holding all searchable repos in memory.

However, at the scale of something like sourcegraph.com, paging through repos 500 at a time is very slow, even for searches that otherwise execute quickly (such as indexed search). We pay the overhead of resolving a page of repos 5,000,000 / 500 = 10,000 times, and that overhead is paid serially. For example, even a simple search like type:repo count:all takes a very long time on sourcegraph.com.

This bumps the page size up by 8x, which I expect to increase the speed of non-global repo paging by approximately 8x.
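
As a rough check of the arithmetic, here is a minimal sketch that counts the number of serial page resolutions at each page size. The 5,000,000 repo total is the figure used above; `pageCount` is a hypothetical helper written for this illustration, not a function from the codebase.

```go
package main

import "fmt"

// pageCount returns how many pages are needed to cover total repos
// at the given page size (ceiling division).
// Hypothetical helper for illustration only.
func pageCount(total, pageSize int) int {
	return (total + pageSize - 1) / pageSize
}

func main() {
	const totalRepos = 5_000_000 // figure used in the description above

	oldPages := pageCount(totalRepos, 500)
	newPages := pageCount(totalRepos, 4096)

	fmt.Println("pages at 500: ", oldPages) // 10000
	fmt.Println("pages at 4096:", newPages) // 1221
	fmt.Printf("reduction: %.2fx\n", float64(oldPages)/float64(newPages))
}
```

This only counts round trips. It assumes the per-page overhead stays roughly constant as the page grows, which is what the ~8x estimate relies on.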

This will hopefully fix https://github.com/sourcegraph/sourcegraph/issues/39392

Risks:

  • Larger page sizes increase latency for expensive repo resolve steps like repo:contains.commit.after().
  • Larger page sizes increase memory pressure on the frontend instance. In practice, I expect this to increase the memory cost of a page from ~31KB to ~256KB. I think that will be okay.
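
The memory estimate can be sanity-checked with a small sketch. The 64 bytes per repo below is an assumption inferred from the ~256KB figure (256KB / 4096), not a measured value, and `pageBytes` is a hypothetical helper.

```go
package main

import "fmt"

// bytesPerRepo is an ASSUMPTION inferred from the estimate above
// (~256KB / 4096 repos). It is not a measured value.
const bytesPerRepo = 64

// pageBytes estimates the memory held by one page of resolved repos.
// Hypothetical helper for illustration only.
func pageBytes(pageSize int) int {
	return pageSize * bytesPerRepo
}

func main() {
	for _, size := range []int{500, 4096} {
		b := pageBytes(size)
		fmt.Printf("page size %d: %d bytes (%.2f KiB)\n", size, b, float64(b)/1024)
	}
}
```

Under that assumption the per-page cost scales linearly with page size, so any further increase in page size raises the per-page memory by the same factor.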

If this doesn't help, we can try increasing it further, but I'm hesitant to go too high given our regular memory woes with frontend. Another thing we could try would be to resolve the next page concurrently with executing the search. This should be a fairly easy change, and it would let the two steps partially overlap instead of running strictly serially.
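
The concurrent-resolution idea could look something like the sketch below. This is not the actual search backend code: `page`, `resolvePages`, and `searchAll` are hypothetical names, and page resolution is simulated by generating integer IDs. A background goroutine resolves page N+1 while the caller is still searching page N, and the unbuffered channel keeps the resolver from running more than one page ahead.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one resolved page of repos.
type page []int

// resolvePages resolves pages in a background goroutine and hands them to
// the consumer over an unbuffered channel. While the consumer is busy with
// one page, the goroutine resolves the next and then blocks on the send,
// so at most two pages are held in memory at once.
func resolvePages(total, pageSize int) <-chan page {
	out := make(chan page)
	go func() {
		defer close(out)
		for start := 0; start < total; start += pageSize {
			end := start + pageSize
			if end > total {
				end = total
			}
			// Simulated resolution: a real resolver would query the database.
			p := make(page, 0, end-start)
			for id := start; id < end; id++ {
				p = append(p, id)
			}
			out <- p
		}
	}()
	return out
}

// searchAll consumes pages as they arrive. The "search" here only counts
// repos; it stands in for executing the query against each page.
func searchAll(pages <-chan page) int {
	searched := 0
	for p := range pages {
		searched += len(p)
	}
	return searched
}

func main() {
	fmt.Println(searchAll(resolvePages(10_000, 4096))) // 10000
}
```

A real version would also need error propagation and context cancellation so the resolver goroutine does not leak when a search stops early (for example, once a result limit is hit).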

Test plan

I can't test this well locally because I can't realistically clone millions of repos. However, I think this is a low-risk change.