postgres-alpine: reindex all databases on startup
Created by: ggilmore
Fixes https://github.com/sourcegraph/sourcegraph/issues/23310
(albeit not ideally - see caveats)
script outline
- check for a marker file in
PGDATADIR
(3.31-reindexing.completed
) - its presence indicates that re-indexing process has already been performed. (the following steps assume that the marker file is missing) - prepare the postgres environment variables (in the same way as
initdb.sh
does) - start up the postgres server with the following options
- the
listen_address
is set to empty ('') - this prevents all external TCP connections to the DB, and allows only local connections via a unix socket -
-P
prevents postgres from using system catalog indexes for lookups (since these will need to be re-created for the same reasons that we are re-indexing the other databases). This follows the advice provided by https://www.postgresql.org/docs/12/sql-reindex.html
- the
- re-index the system catalogs (
--system
), then the rest of the databases (--all
) - write the marker file to signal that the process has completed
- shut down postgres
The liveness script has been altered to check for the presence of the marker file. If it's absent, that means that the re-indexing process is still underway - so it'll return a 0
exit code so that k8s doesn't kill the pod.
(The readiness script doesn't need to be altered, it still works in the same way that it always did since postgres doesn't listen to external TCP connections during the re-indexing process)
still to be done:
- need to verify that all permutations of specifying postgres credentials work with this script (
PGPASSWORD
, etc.). I suspect this will work since it uses the same machinery thatinitdb.sh
uses
caveats
Reindexing the entire database can take a long time if you have a lot of data. k8s.sdev.org's codeintel-db instance takes 1-2 hours to re-index. I've been experimenting with using the CONCURRENTLY
option that's described on https://www.postgresql.org/docs/12/app-reindexdb.html. If we can pull it off, it's possible that we can do this without incurring downtime. I haven't been able to figure out a clean way around the caveats described in the above document for that option though (a uniqueness violation means that the process will exit early, and we'll have to manually clean that up). I put this PR up in the meantime for discussion, even though I know it's not ideal.