frontend: directly stream archives from gitserver for raw endpoint
Created by: keegancsmith
I added some instrumentation to the raw endpoint on Sourcegraph.com to see how it is used. Over the last few days effectively only the archive endpoint is used. Additionally only requests for the root path seem to be used.
I then mined some recent logs to see how often the same repository is fetched. It turns out the long tail of repositories is only fetched once, and some repos were fetched up to 4 times.
All this evidence together tells me storing the archive on disk in the frontend is not worth the cost. We can just directly stream the archive from gitserver.
Notes:
- Removes the main use of the frontend volume (need to confirm if it's the only user). If we can remove the volume, it simplifies deployment. (The volume has been a source of issues at multiple customers.)
- Faster time to first byte since no intermediate cache.
- Faster zip transfer since we no longer compress them.
- We no longer set "Content-Length" response header. Confirmed this is fine.
- Rest of raw endpoint still caches archives. But these code paths are rarely used, so will update at a later stage.
Fixes https://github.com/sourcegraph/sourcegraph/issues/9372
Created by: codecov[bot]
Codecov Report
Merging #9410 into master will decrease coverage by <.01%. The diff coverage is 0%.

    @@            Coverage Diff             @@
    ##           master    #9410      +/-   ##
    ==========================================
    - Coverage   41.49%   41.49%   -0.01%
    ==========================================
      Files        1330     1330
      Lines       72508    72510       +2
      Branches     6582     6582
    ==========================================
      Hits        30085    30085
    - Misses      39628    39630       +2
      Partials     2795     2795

Impacted Files: cmd/frontend/internal/app/ui/raw.go, Coverage Δ: 0% <0%> (ø)