LSIF: Cross-repository database to target Postgres
Created by: efritz
The current LSIF backend uses SQLite as its data store for everything: each repo/commit pair has its own SQLite database (written once by one process and read many times by many processes), as well as a cross-repository database (written by the worker and read many times by many processes).
We will soon need many writers (both the API and the worker) to access the cross-repository database concurrently, which will lead to (1) contention on a disk-based database, and (2) a higher chance of corruption (journaling modes and exclusive locks still do not solve this problem). This work enables https://github.com/sourcegraph/sourcegraph/pull/5691 to be implemented without the chance of data corruption due to multiple writers.
To get ahead of this, this PR moves the data that can be written concurrently by multiple processes out of SQLite and into Postgres. In dev, this database can be shared with the same Postgres instance used by the frontend/repo-updater/management-console. In production, it should likely have its own instance. Tagging distribution (@slimsag) and core services (@tsenart) members who commented on this distribution issue in RFC 25.
Created by: efritz
Regarding Postgres: the idea is to use the same Postgres instance, but a different database, correct? If so, do we have enough documentation in place so that users know to configure two different PGDATASOURCE strings?

Well, that's what I'd like @sourcegraph/distribution to weigh in on. Conversation in the RFCs made me believe that it would be better/safer to run a second instance of Postgres, but if another database within the same container works, then that's possible as well.

I'm also looking for advice on all the places where `PG_*` variables are set in different contexts so that I can take care of everything in one effort.

Created by: slimsag
Architecture-wise:

- A separate DB in the same Postgres instance obviously sounds most ideal to me, because that means admins don't need to e.g. provision another RDS database. But it's not clear to me how much load this will place on Postgres, so it is hard to know whether a separate instance is actually warranted. I would suggest rolling this out on a large instance (k8s.sgdev.org) with the separate-database approach and measuring the change. We could, for example, take a day to measure how much the Postgres disk grows in size and maybe extract some generic info on average query time -- which I imagine would probably be enough to gain some confidence (looks like this could maybe be used: https://www.citusdata.com/blog/2019/02/08/the-most-useful-postgres-extension-pg-stat-statements/).
- If you do decide that a separate Postgres instance is needed, we would still use just one in the server case, I think (no point in running two in the same container).
- You can just specify the `PG_*` vars on your lsif-server container. This is:
  - https://github.com/sourcegraph/deploy-sourcegraph/blob/master/base/lsif-server/lsif-server.Deployment.yaml
  - https://github.com/sourcegraph/deploy-sourcegraph-docker/blob/master/deploy-lsif-server.sh
- https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/server/shared/shared.go#L149-150 (here you just need to override `PGDATABASE` using your own new env var named e.g. `PGDATABASE_LSIF`, which users can configure -- the others have preconfigured values)
- In any case, you should make use of the same tool we use for migrations and create a similar strategy for applying migrations to this DB: see https://github.com/golang-migrate/migrate and https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/migrations/README.md
- Please please please just make sure the above is documented well :) Every hour you spend making the above seamless, documenting the expected increase in resource usage on Postgres, etc. saves our admin users several hours -- it's time well spent!
Created by: efritz
Updated with migrations and docs. I'm looking to see how I can auto-create the lsif database in the base postgres image.
https://github.com/sourcegraph/deploy-sourcegraph/pull/386 https://github.com/sourcegraph/deploy-sourcegraph-docker/pull/44
Created by: efritz
When is concurrent write access needed in both the server and worker, and why don't journaling modes and exclusive locks solve this problem?
Because we need to update commit data on demand, the API also needs write access to the xrepo db. (Alternatively, we could drop that request and queue the update, which is bad UX, or we could queue a job and block until it's complete, which would cause long delays if there are any LSIF uploads to process.)
SQLite requires exclusive write access (an exclusive write file lock) on the database for any write. For one, this could be slow with multiple API servers and/or workers trying to update things. But more importantly, SQLite is basically a wrapper around filesystem calls: its locks are implemented in terms of POSIX file locking. This seems very brittle when the drive is networked and/or shared. I've seen some cautionary tales about data loss when SQLite was written to concurrently over the network, and losing this database would nerf all of the LSIF data that had already been uploaded/processed.
That's not a failure mode I want to bring to ops, and that's not a customer issue I'd want to bring to any of us (sorry all your code intel is gone, rerun things in CI).
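To illustrate the write-contention point: SQLite serializes writers with advisory file locks, which are exactly the mechanism that is unreliable over networked filesystems. A toy sketch of the exclusive-lock behavior (using `flock` for simplicity; SQLite actually uses fcntl-style byte-range locks, and the file name here is made up):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// lockExclusive opens path and tries to take a non-blocking exclusive
// advisory lock on it, roughly like a SQLite writer. The caller holds
// the lock for as long as the returned file stays open.
func lockExclusive(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}

func main() {
	// First "writer" acquires the exclusive lock.
	w1, err := lockExclusive("/tmp/xrepo.db.lock")
	if err != nil {
		fmt.Println("writer 1 failed:", err)
		return
	}
	defer w1.Close()
	fmt.Println("writer 1: exclusive lock held")

	// A second concurrent "writer" is refused until writer 1 releases.
	if _, err := lockExclusive("/tmp/xrepo.db.lock"); err != nil {
		fmt.Println("writer 2: must wait:", err)
	}
}
```

A row-level-locking server like Postgres sidesteps this whole class of problems, since the database, not the shared filesystem, arbitrates concurrent writers.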
Created by: slimsag
Yeah, as a frontend-based migration that lsif-server waits on, we handle all of the following cases nicely -- they are all automatic and handled effectively just like any other DB migration we've had in the past:

- A new cluster deployment running `sourcegraph/postgres-11.1`
- An existing cluster deployment running `sourcegraph/postgres-11.1` and upgrading to this new Sourcegraph version
- A new cluster deployment with an external Postgres DB v9.6+
- An existing cluster deployment using an external Postgres DB v9.6+ upgrading
- A new `sourcegraph/server` deployment with the built-in Postgres
- An existing `sourcegraph/server` deployment with the built-in Postgres upgrading to this new Sourcegraph version
- A new `sourcegraph/server` deployment with external Postgres v9.6+
- An existing `sourcegraph/server` deployment with external Postgres v9.6+ upgrading
- Dev environments