migration: Improve runner and store

Warren Gifford requested to merge ef/improve-runner into main

Created by: efritz

Improvements to the migration runner and store. Validation and migration run attempts now produce much better error messages, and some common failure scenarios that shouldn't affect a running instance are now ignored (for example, a failed migration to a future version should not crash the actively deployed instance).

Reviewers: Please read the new migration runner's Run and Validate methods in their entirety.

The core of Run's behavior is to:

  1. Acquire an exclusive advisory lock for the schema
  2. Set the dirty flag to true and bump the version number
  3. Run the migration
  4. If successful, set the dirty flag to false
  5. Release the advisory lock

We also have the following behaviors, which make the migration experience nicer:

  • Run no longer fails immediately if the database is dirty. Based on the flow above, if the database is dirty but the lock is held, a migrator is currently running. We shouldn't immediately return an error telling users to contact support in this case. Instead, Run now waits for the currently executing migration to finish first.
  • Similarly, Validate no longer complains when the database is dirty but the version is ahead of the one we expect. This indicates that a migration to a newer version was not fully successful, which should not take down the running instance as well (what a bad upgrade experience that would be).
  • Run is now protective in the case where we're not simply migrating up all the way. This can happen if a site admin needs to apply a specific number of migrations or needs to downgrade to a previous version. We don't want to simply wait our turn in this situation, so we fail fast with a message about concurrent migrations when we detect this case.
  • Validate now polls the database version for as long as it believes a migration is running. This also prevents spurious out-of-date and/or dirty database errors from new containers that try to start while the migrator is still active.
