Store Git commits in Postgres
Created by: tsenart
Context
Many features across Sourcegraph's infrastructure components currently make RPC calls to gitserver to fetch the Git information they need (e.g. commits).
Keeping this information behind an RPC layer that runs git commands on the fly is elegant, but it has a few problems:
- Lots and lots of N+1 queries.
- Slower than having it indexed and queried in Postgres.
- Harder to work with for engineers.
To solve this, we want to experiment with storing commit information for all gitserver repos in Postgres with an appropriate schema. This information would be directly readable, but not writable, by all other services at Sourcegraph and, later on, potentially by customers directly.
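To make the N+1 problem concrete, here is a minimal Go sketch that counts round trips for the two access patterns. Everything in it is hypothetical and for illustration only: `Commit`, `commitStore`, `Get`, and `GetMany` are not existing Sourcegraph or gitserver APIs, the in-memory map stands in for both backends, and the `Commit` fields are not a schema proposal.

```go
package main

import "fmt"

// Commit is a hypothetical, minimal commit record. The real schema is
// still an open question in this RFC.
type Commit struct {
	Repo    string
	SHA     string
	Message string
}

// commitStore is a hypothetical in-memory stand-in for a commit backend.
// It counts round trips so the two access patterns can be compared.
type commitStore struct {
	commits    map[string]Commit // keyed by repo + "@" + SHA
	roundTrips int
}

// newStore returns a store seeded with three example commits.
func newStore() *commitStore {
	s := &commitStore{commits: map[string]Commit{}}
	for _, c := range []Commit{
		{Repo: "example/repo", SHA: "a1", Message: "first"},
		{Repo: "example/repo", SHA: "b2", Message: "second"},
		{Repo: "example/repo", SHA: "c3", Message: "third"},
	} {
		s.commits[c.Repo+"@"+c.SHA] = c
	}
	return s
}

// Get models the current pattern: one RPC round trip per commit.
func (s *commitStore) Get(repo, sha string) (Commit, bool) {
	s.roundTrips++
	c, ok := s.commits[repo+"@"+sha]
	return c, ok
}

// GetMany models a single batched lookup, which is the kind of access
// an indexed Postgres table would allow (assumption: one query that
// filters on a list of SHAs).
func (s *commitStore) GetMany(repo string, shas []string) []Commit {
	s.roundTrips++
	out := make([]Commit, 0, len(shas))
	for _, sha := range shas {
		if c, ok := s.commits[repo+"@"+sha]; ok {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	shas := []string{"a1", "b2", "c3"}

	// N+1 style: one call per commit.
	perCommit := newStore()
	for _, sha := range shas {
		perCommit.Get("example/repo", sha)
	}

	// Batched style: one call for all commits.
	batched := newStore()
	commits := batched.GetMany("example/repo", shas)

	fmt.Printf("per-commit round trips: %d\n", perCommit.roundTrips)
	fmt.Printf("batched round trips: %d (%d commits)\n", batched.roundTrips, len(commits))
}
```

The per-commit loop costs one round trip per SHA, while the batched lookup costs one regardless of how many SHAs are requested. That difference is what the first bullet above refers to.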
Open questions
- What's the right initial schema? (see discussion below) Do we need table partitioning? (I think so)
- What are the resource requirements for this in Sourcegraph Cloud? Memory and disk.
- If we move forward with this, how do we smoothly roll out this change to customers?
- What is the exact list of use cases that should be refactored to use this capability?
- How do we quickly validate the scalability of Postgres for this use case?