large migrations fail due to liveness checks

Created by: davejrt

Currently the liveness probe for sourcegraph-frontend is set to 300s and checks an HTTP endpoint.

Any migration that exceeds that timeout forces a frontend pod to be restarted, leaving the database marked as dirty and blocking the frontend from starting again.

The current liveness and readiness probes use the same /healthz endpoint, which is not available until the migrations have completed and the frontend starts listening.

We should create a separate endpoint for the liveness probe, one that responds while migrations are taking place, so that pods are not restarted during migrations.

  • Create a new endpoint for liveness checks during migrations
  • Update deploy-sourcegraph to use the new liveness check