batches: consider promoting the batch change ↔ changeset relationship into a join table
Created by: LawnGnome
Right now, to look up the changesets in a batch change, you have to run a query along these lines:
SELECT
*
FROM
changesets
WHERE
batch_change_ids ? '6'
This looks up the batch change ID in the jsonb
batch_change_ids
field, which is a JSON object with the IDs as keys.
We have a GIN index (from the people who brought you PIN numbers and ATM machines) on this field, but in practice, it doesn't seem to kick in a lot of the time. For example, here's an EXPLAIN ANALYZE
of the above query from my dev server:
+-------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|-------------------------------------------------------------------------------------------------------|
| Seq Scan on changesets (cost=0.00..9.45 rows=1 width=1206) (actual time=0.012..0.018 rows=1 loops=1) |
| Filter: (batch_change_ids ? '6'::text) |
| Rows Removed by Filter: 35 |
| Planning Time: 0.065 ms |
| Execution Time: 0.059 ms |
+-------------------------------------------------------------------------------------------------------+
It's possible (actually, probable) that the index would kick in if I had more changesets and the estimated cost of the sequential scan was higher, but I'd probably be happier if we turned this into a more traditional many-to-many join table which could then use the more traditional index scanning methods PostgreSQL has to offer. This would also mean that we wouldn't have to marshal the int64
IDs into and out of strings when interacting with the database.