
frontend: directly stream archives from gitserver for raw endpoint

Administrator requested to merge core/stream-archives into master

Created by: keegancsmith

I added some instrumentation to the raw endpoint on Sourcegraph.com to see how it is used. Over the last few days, effectively only the archive endpoint was used. Additionally, only requests for the root path appear to be made.

I then mined some recent logs to see how often the same repository is fetched. It turns out the long tail of repositories is only fetched once, and some repositories were fetched up to 4 times.

Taken together, this evidence tells me that storing the archive on disk in the frontend is not worth the cost. We can instead stream the archive directly from gitserver.

Notes:

  • Removes the main use of the frontend volume (I still need to confirm whether it is the only user). If we can remove the volume, deployment becomes simpler; the volume has been a source of issues at multiple customers.
  • Faster time to first byte, since there is no intermediate cache.
  • Faster zip transfer, since we now compress them.
  • We no longer set the "Content-Length" response header. I confirmed this is fine.
  • The rest of the raw endpoint still caches archives, but those code paths are rarely used, so I will update them at a later stage.
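Dropping "Content-Length" is workable for HTTP/1.1 clients because, assuming the frontend is served by Go's `net/http`, a handler that writes a body of unknown length (larger than the server's internal buffer) gets chunked transfer encoding instead. A small check of that behavior follows; `streamingHandler` and `fetchLength` are hypothetical names used only for illustration, and the handler simply writes filler bytes to mimic a streamed archive.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// streamingHandler writes a body larger than net/http's internal buffer
// without setting Content-Length, mimicking a streamed archive (HYPOTHETICAL).
func streamingHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/x-tar")
	chunk := strings.Repeat("x", 4096)
	for i := 0; i < 4; i++ {
		if _, err := io.WriteString(w, chunk); err != nil {
			return // client went away
		}
	}
}

// fetchLength makes a real HTTP/1.1 request against a test server and reports
// how the client sees the body length.
func fetchLength() (contentLength int64, transferEncoding []string, n int64, err error) {
	srv := httptest.NewServer(http.HandlerFunc(streamingHandler))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return 0, nil, 0, err
	}
	defer resp.Body.Close()

	n, err = io.Copy(io.Discard, resp.Body)
	return resp.ContentLength, resp.TransferEncoding, n, err
}

func main() {
	cl, te, n, err := fetchLength()
	if err != nil {
		panic(err)
	}
	// ContentLength of -1 means "unknown"; the body is still read in full.
	fmt.Println(cl, te, n)
}
```

For very small bodies that fit in the buffer before the handler returns, `net/http` computes and sends a length itself, so the missing header only changes behavior for real archive-sized responses.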

Fixes https://github.com/sourcegraph/sourcegraph/issues/9372
