repoupdater/github: Fetch GitHub repositories in batches
Created by: mrnugget
This PR adds a new method to the GitHub client, GetRepositoriesByNameWithOwnerFromAPI, which allows the caller to fetch repositories in batches using the GraphQL API. It should fix https://github.com/sourcegraph/sourcegraph/issues/3907 but...
IMPORTANT: GitHub's GraphQL API only offers batch fetching of repositories by GraphQL node ID. But we don't have the node IDs at hand when doing our first sync; we only have a list of "owner/repository-name" strings. That means we can't use the existing getRepositoryByNodeIDFromAPI.
Instead I decided to build GraphQL queries on the fly, using aliases to fetch multiple repositories in the same query. Here's an example query:
fragment RepositoryFields on Repository {
  id
  nameWithOwner
  description
  url
  isPrivate
  isFork
  isArchived
}

{
  repo_sourcegraph_repository_1: repository(owner: "sourcegraph", name: "repository-1") {
    ... on Repository {
      ...RepositoryFields
    }
  }
  repo_sourcegraph_repository_2: repository(owner: "sourcegraph", name: "repository-2") {
    ... on Repository {
      ...RepositoryFields
    }
  }
  repo_sourcegraph_repository_3: repository(owner: "sourcegraph", name: "repository-3") {
    ... on Repository {
      ...RepositoryFields
    }
  }
}
Of course I tried to find out how many repositories I could query in one request like this, but neither the GitHub API documentation nor the GraphQL spec was any help here (pointers appreciated!). So I did some experimentation and, as it turns out, the API stops responding with results when more than 37 repositories are specified in one query.
Since 37 is a weird number I rounded it down to 30 and used that as the batch size.
But still: since I'm a GraphQL newbie I feel like I'm relying on undefined behavior and I'm not sure whether the GitHub API will keep it up.
Please see this PR as an idea/proposal and cause for discussion: I would love to know what you think about using the API like this and what I'm possibly missing.
Test plan: go test and manual testing with an external service configuration that has 100 repos in