Redis AOF database corruption

Created by: sqs

We received reports that the Redis server in the single-node Docker image (version 2.12.3) sometimes needs manual recovery after a crash or power cycle. The problem manifests as Sourcegraph not coming back online after the crash. The admin SSH'd in and saw that the Redis server in the sourcegraph/server container was reporting errors and required manual recovery steps. The manual recovery steps worked, but they raise two questions:

  • Why were they necessary? Is it possible to run Redis with stricter journaling or fsyncing so that it can't get into this state?
  • Is it possible to run the recovery steps automatically?
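
To make the second question concrete, here is a minimal sketch of what an automated recovery step might look like. It assumes the manual steps were the usual `redis-check-aof --fix` repair of the append-only file, which this report does not confirm. The function names and the AOF path used below are hypothetical, not part of the sourcegraph/server image.

```python
import shutil
import subprocess
from pathlib import Path


def build_fix_command(aof_path):
    """Return the argv for repairing an AOF file with redis-check-aof."""
    return ["redis-check-aof", "--fix", str(aof_path)]


def repair_aof(aof_path, run=subprocess.run):
    """Back up the AOF file, then attempt a repair. Returns True on success.

    `run` is injectable so the logic can be exercised without Redis installed.
    """
    aof_path = Path(aof_path)
    backup = aof_path.with_name(aof_path.name + ".bak")
    # Keep the original, since a fix may truncate the file and drop writes.
    shutil.copy2(aof_path, backup)
    # redis-check-aof --fix asks for confirmation on stdin, so answer "y".
    result = run(build_fix_command(aof_path), input="y\n", text=True)
    return result.returncode == 0
```

A startup script could call `repair_aof("/path/to/appendonly.aof")` (path is a placeholder) only after Redis has refused to load the file, so that a healthy AOF is never rewritten. On the first question, Redis does offer an `appendfsync always` setting that fsyncs on every write at a throughput cost; whether it would have prevented this particular state is not established here.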