incremental-indexing: fetch git diffs directly, instead of fetching all of the commit objects
Created by: ggilmore
Zoekt's incremental-indexing implementation works by only indexing the files that have changed since the most recently indexed commit.
It currently does this by (pseudocode):
# copy all of the commit objects for both commits from gitserver, and store them on the local Zoekt instance
git fetch $OLD_COMMIT $NEW_COMMIT
# analyze the diff output locally to determine which files have changed
git diff $OLD_COMMIT $NEW_COMMIT
There is a big opportunity to save time in this process by eliminating the git fetch step. Since gitserver already stores all of the necessary commit information, it seems redundant to copy all of the commits over the network just to perform a local analysis on the Zoekt instance.
If gitserver were capable of directly providing git diff output via an API call, Zoekt could use that output to reconstruct the changed files. Since the git diff output is a (much smaller) subset of all the commit information necessary to construct it, transmitting only the diff could yield large time savings.
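To make the idea concrete, here is a rough sketch of how the consumer side might turn diff output into a list of indexing actions. It assumes gitserver would return text in the format of `git diff --name-status $OLD_COMMIT $NEW_COMMIT` (one tab-separated record per changed path); no such endpoint is confirmed here. The function name `plan_from_name_status` and the "index"/"delete" action names are made up for illustration and are not part of Zoekt. The sketch only decides which paths to touch. Getting the new file contents would need either the full diff or a separate fetch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zoekt): reads `git diff --name-status`
# output on stdin and prints one action per line, either
# "index <path>" or "delete <path>".
plan_from_name_status() {
    tab=$(printf '\t')
    # Each record is "<status>\t<path>" or, for renames and copies,
    # "<status><score>\t<old path>\t<new path>".
    while IFS="$tab" read -r status path new_path; do
        case "$status" in
            D)  printf 'delete %s\n' "$path" ;;
            R*) printf 'delete %s\n' "$path"      # rename: drop the old path
                printf 'index %s\n' "$new_path" ;;
            C*) printf 'index %s\n' "$new_path" ;; # copy: source is unchanged
            ?*) printf 'index %s\n' "$path" ;;     # A, M, T, ...
        esac
    done
}

# Sample input standing in for the response of the assumed gitserver API.
sample=$(printf 'M\tcmd/main.go\nD\told/util.go\nR100\tdocs/a.md\tdocs/b.md\n')
plan=$(printf '%s\n' "$sample" | plan_from_name_status)
printf '%s\n' "$plan"
```

One caveat: without `-z`, git quotes paths that contain unusual characters, so a real implementation would more likely consume the NUL-delimited form.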