Make `dbtest.NewDB()` fast for real
Created by: camdencheek
Alright. I've made a couple of attempts in the past to improve the performance and semantics of our dbtest
package.
Those PRs improved the speed of our database tests by migrating the database only once per package, doing incremental migrations, and running tests in parallel.
However, there are still some gotchas with the current package:
- The first call to
dbtest.NewDB()
in each package is slow because we run migrations once per package - Test databases aren't tracked anywhere, so they don't get cleaned up
- The migrated template database isn't reused between test packages
- We don't have any good way to save a set of clean databases for use with transactions that leave the database clean
- Editing migrations doesn't cause test databases to be re-migrated, which is annoying when iterating while writing a migration
I think this PR is nearing endgame for the dbtest
package. Here's an outline of what this PR does:
- It creates a new database
dbtest_pool
with three tables:template_dbs
,migrated_dbs
, andschema_version
-
schema_version
only contains a version of the schema for thedbtest_pool
database. If that version is out of date, we blow away the database and recreate it with the new schema. This way we don't have to worry about migrations for this small utility database. -
template_dbs
contains a list of databases that have had migrations run on them and can be used as a template for the test databases. Tests should never run directly against the template dbs. The template dbs are identified by a hash of the migrations used to create them. If a migration is added or changed, a new template db will be created. -
migrated_dbs
contains a list of test databases that have been created from one of the template dbs. When a database is used by a test, it is marked asclaimed
. If, when the test is done, the database has been modified, it will be deleted. - This creates the function
NewFastDB()
, which is equivalent toNewDB()
except that it can more optimally reuse the migrated template DB between test processes. -
Probably the most exciting part: A new function
NewFastTx()
is created which returns a transaction to a freshly migrated database. On test completion, the transaction is rolled back, leaving a still-fresh database that can be re-used. A returned transaction is always the only transaction pointing to a database, so there is never any risk of a deadlock between test transactions. - Once per startup, we clean up any databases for migrations that haven't been used in more than a day. This is somewhat aggressive since the only cost is re-migrating the database once.
This is only hooked up when the USE_FAST_DBTEST
environment variable is set. Before cutting this over, I'd like to test this across a couple of migrations locally. In a few days, I'll rename dbtest.NewFastDB()
to dbtest.NewDB()
since it should be a drop-in replacement. Additionally, many uses of dbtest.NewDB()
can be replaced with the much faster dbtest.NewFastTx()
, which will be some followup work.
Okay, so the numbers. On my machine:
-
dbtest.NewDB()
currently takes ~1s to to run the first time in each package, and ~120ms to run after that. The ~120ms is pretty unchanged, but the that extra second on the front mostly goes away withdbtest.NewFastDB()
since we can reuse the template db between packages. -
dbtest.NewFastTx()
effectively takes 20ms, which is a big improvement, especially since we don't have to pay as much IO cost in recreating the database for every call.
I've segregated this into separate files so it's really easy to delete if we decide this approach won't work for us. It's certainly possible we decide the performance gains aren't worth the added complexity. However, I think it's worth integrating it a bit further and seeing what the performance improvements actually are before deciding the complexity is too much.
There is a bit of duplicated code right now, but I was struggling to deduplicate and maintain clarity, so I didn't for now.