Unify git archive (pt 2) (!32101) · Merge requests · Administrator / sourcegraph

Administrator requested to merge mlw-sub-repo-filtering-git-archive into main Mar 03, 2022

Created by: mollylogue

EDIT: This PR has been re-worked to just unify all git archive calls through the same function, git.ArchiveReader. If a sub-repo permissions checker is passed in and the repo has sub-repo perms enabled, an error will be returned. A follow-up PR will implement sub-repo filtering for this endpoint rather than returning an error

Previous behavior: return an error if git.ArchiveReader is called for a repo with sub-repo permissions
New behavior: if sub-repo permissions are enabled and git.ArchiveReader is called, only include the files in the archive that the user has access to.
Note: the places where git.ArchiveReader is called with a nil sub-repo permissions checker is just preserving the legacy behavior (these places didn't have sub-repo perms checking). If we want to add the checker in any of those locations I suggest we do it in a follow-up to this.
Questions: Edit: ✅ means question was answered -- Is it useful to still check if the repo has sub-repo permissions enabled before doing the git ls-files to avoid that extra check? We aren't worrying about that elsewhere (i.e. we only check if the sub repo permissions are enabled globally and not for the specific repo before trying to filter commits for a repo). ✅ -- Do we already have tooling around our zip archives for testing, etc? I just didn't want to write a bunch of new code to check the contents of the zip files in the unit tests if this already exists somewhere in the codebase. ✅ -- It was mentioned in the issue that the git ls-files command could happen on the gitserver side to limit the number of calls to gitserver. My concern with doing it this way is that we currently don't do any sub-repo permissions filtering in gitserver itself, leaving that to the git package which wraps the calls to gitserver. I don't know that we should diverge from that pattern just for this. ✅ -- What is the best way to test this at scale? Do we have an example repo with enough files that we could try this filtering on to see what performance is like? Could we do that in a production-like environment like dogfood? ✅

Test plan

Robust unit tests around different scenarios
Manually set up sub-repo permissions for a test repo and request zip archive, then validate the archive only contains files the user has access to.
Potentially writing an integration test for the above scenario?

Unify git archive (pt 2)

Test plan

Merge request reports