codeintel: Optimize out-of-band migration queries
Created by: efritz
This replaces the old selection query with one that runs efficiently with additional indexes. This takes the time to select a batch of definition records to migrate from 15 seconds to a few millisecond (on a clone of the production code intelligence database).
The indexes take about 45 minutes to create concurrently. I've added an alert in https://github.com/sourcegraph/sourcegraph/pull/19694 to invalid indexes in our database layer to detect when they fail to create asynchronously.
New selection query:
On a clone of the production cloudintel-db (select dump_id, scheme, identifier, length(data) from lsif_data_references t1 where dump_id = (select sv.dump_Id from lsif_data_references_schema_versions sv where sv.min_schema_version<=1 and sv.max_schema_version>=1 and exists (select 1 from lsif_data_references t2 where t2.dump_id=sv.dump_id and t2.schema_version=1) order by dump_id limit 1) and schema_version=1 limit 1000 for update skip locked
):
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=8.69..1105.94 rows=1000 width=59) (actual time=12.460..14.949 rows=1000 loops=1)
InitPlan 1 (returns $1)
-> Limit (cost=0.86..8.12 rows=1 width=4) (actual time=9.884..9.887 rows=1 loops=1)
-> Nested Loop Semi Join (cost=0.86..30395.75 rows=4187 width=4) (actual time=9.883..9.885 rows=1 loops=1)
-> Index Scan using lsif_data_references_schema_versions_pkey on lsif_data_references_schema_versions sv (cost=0.29..22678.00 rows=12512 width=4) (actual time=0.126..3.328 rows=741 loops=1)
Filter: ((min_schema_version <= 1) AND (max_schema_version >= 1))
Rows Removed by Filter: 11476
-> Index Only Scan using lsif_data_references_dump_id_schema_version on lsif_data_references t2 (cost=0.57..818.20 rows=28995 width=4) (actual time=0.009..0.009 rows=0 loops=741)
Index Cond: ((dump_id = sv.dump_id) AND (schema_version = 1))
Heap Fetches: 4001
-> LockRows (cost=0.57..31815.38 rows=28995 width=59) (actual time=12.459..14.818 rows=1000 loops=1)
-> Index Scan using lsif_data_references_dump_id_schema_version on lsif_data_references t1 (cost=0.57..31525.43 rows=28995 width=59) (actual time=10.741..12.711 rows=2000 loops=1)
Index Cond: ((dump_id = $1) AND (schema_version = 1))
Planning Time: 0.304 ms
Execution Time: 15.128 ms
(15 rows)