large migrations fail due to liveness checks
Created by: davejrt
Currently the liveness probe for sourcegraph-frontend is set to 300s and checks an HTTP endpoint.
Any migration that exceeds that timeout causes the frontend pod to be restarted, leaving the database marked as dirty and blocking the frontend from starting again.
The current liveness and readiness probes use the same /healthz endpoint, which is not available until the migrations have completed and the frontend starts listening.
We should create a separate endpoint for the liveness probe that responds while migrations are taking place, so pods are not restarted mid-migration.
- Create new endpoint for liveness checks during migrations
- Update deploy-sourcegraph with the new liveness check