Redis AOF database corruption
Created by: sqs
We received reports that the Redis server in the single-node Docker image (version 2.12.3) sometimes requires manual recovery after a crash or power cycle. The problem shows up as Sourcegraph not coming back online after the crash. The admin SSH'd in and saw that the Redis server inside the sourcegraph/server container was reporting errors and needed manual recovery steps. The manual recovery steps worked, but:
- Why were they necessary? Is it possible to run Redis with stricter journaling or fsyncing so that it can't get into this state?
- Is it possible to run the recovery steps automatically?
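Here is a minimal sketch of what automatic recovery could look like. It assumes the manual steps amounted to running `redis-check-aof --fix` on the append-only file, which the report does not actually say. The helper name `repair_aof_if_present` and any AOF path passed to it are hypothetical. `redis-check-aof --fix` asks for confirmation before truncating the file, so the sketch answers `y` on stdin. Something like this would run before `redis-server` starts, for example from the container's startup script.

```python
import subprocess
from pathlib import Path


def repair_aof_if_present(aof_path: str, check_cmd: str = "redis-check-aof") -> bool:
    """Hypothetical helper: run `<check_cmd> --fix <aof_path>` non-interactively.

    Returns False if the AOF file does not exist (nothing to repair) or if the
    check tool exits with a non-zero status; returns True otherwise.
    """
    if not Path(aof_path).exists():
        # No AOF yet (e.g. first boot): skip the check entirely.
        return False
    # Assumption: the tool prompts "Continue? [y/N]" before truncating,
    # so we feed "y" on stdin to accept the repair automatically.
    result = subprocess.run(
        [check_cmd, "--fix", aof_path],
        input="y\n",
        text=True,
        capture_output=True,
    )
    return result.returncode == 0
```

One trade-off to note is that the repair works by truncating the file at the first invalid command, so any writes after that point are discarded. Redis's `aof-load-truncated yes` setting already lets the server start when only the tail of the AOF is truncated. A wrapper like this would mainly matter for corruption that setting does not cover.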