
incremental-indexing: fetch git diffs directly instead of fetching all of the commit objects

Created by: ggilmore

Zoekt's incremental-indexing implementation works by only indexing the files that have changed since the most recently indexed commit.

It currently does this by (pseudocode):

# copy all of the commit objects for both commits from gitserver and store them on the local Zoekt instance
git fetch $OLD_COMMIT $NEW_COMMIT

# analyze the diff output locally to determine which files have changed
git diff $OLD_COMMIT $NEW_COMMIT

There is a big opportunity to save time in this process by eliminating the git fetch step. Since gitserver already stores all of the necessary commit information, it seems redundant to copy all of the commit objects over the network just to run the diff locally on the Zoekt instance.

If gitserver were capable of providing git diff output directly via an API call, Zoekt could use that output to determine which files changed without fetching anything. Since the git diff output is a (much smaller) subset of all the commit information necessary to construct it, transmitting only the diff could lead to huge time savings.
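
As a rough sketch of how Zoekt might consume such output: assuming gitserver returned the equivalent of `git diff --name-status $OLD_COMMIT $NEW_COMMIT` (tab-separated status and paths), the changed files could be classified as below. The function name `classify_changes` and the `index`/`delete` output labels are hypothetical and only for illustration; no such gitserver API exists yet.

```shell
# Hypothetical helper: reads `git diff --name-status` output on stdin and
# prints one action per line: "index <path>" or "delete <path>".
classify_changes() {
  tab="$(printf '\t')"
  while IFS="$tab" read -r status path newpath; do
    case "$status" in
      D)  printf 'delete %s\n' "$path" ;;            # file removed
      R*) printf 'delete %s\n' "$path"               # rename: drop the old path,
          printf 'index %s\n' "$newpath" ;;          # index the new one
      C*) printf 'index %s\n' "$newpath" ;;          # copy: only the new path changes
      *)  printf 'index %s\n' "$path" ;;             # added, modified, etc.
    esac
  done
}

# Example with canned input; in practice the input would be the diff
# output returned by gitserver (an assumption, see above).
printf 'M\ta.go\nD\tb.go\nR100\tc.go\td.go\n' | classify_changes
```

This sketch ignores paths that git quotes because of special characters; a real implementation would likely want the NUL-delimited `-z` form instead.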