Skip to content

codeintel: Optimize out-of-band migration queries

Administrator requested to merge ef/codeintel-oob-migrations-speedup into main

Created by: efritz

This replaces the old selection query with one that runs efficiently with additional indexes. This takes the time to select a batch of definition records to migrate from 15 seconds to a few millisecond (on a clone of the production code intelligence database).

The indexes take about 45 minutes to create concurrently. I've added an alert in https://github.com/sourcegraph/sourcegraph/pull/19694 to invalid indexes in our database layer to detect when they fail to create asynchronously.

New selection query:

On a clone of the production cloudintel-db (select dump_id, scheme, identifier, length(data) from lsif_data_references t1 where dump_id = (select sv.dump_Id from lsif_data_references_schema_versions sv where sv.min_schema_version<=1 and sv.max_schema_version>=1 and exists (select 1 from lsif_data_references t2 where t2.dump_id=sv.dump_id and t2.schema_version=1) order by dump_id limit 1) and schema_version=1 limit 1000 for update skip locked):

                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.69..1105.94 rows=1000 width=59) (actual time=12.460..14.949 rows=1000 loops=1)
   InitPlan 1 (returns $1)
     ->  Limit  (cost=0.86..8.12 rows=1 width=4) (actual time=9.884..9.887 rows=1 loops=1)
           ->  Nested Loop Semi Join  (cost=0.86..30395.75 rows=4187 width=4) (actual time=9.883..9.885 rows=1 loops=1)
                 ->  Index Scan using lsif_data_references_schema_versions_pkey on lsif_data_references_schema_versions sv  (cost=0.29..22678.00 rows=12512 width=4) (actual time=0.126..3.328 rows=741 loops=1)
                       Filter: ((min_schema_version <= 1) AND (max_schema_version >= 1))
                       Rows Removed by Filter: 11476
                 ->  Index Only Scan using lsif_data_references_dump_id_schema_version on lsif_data_references t2  (cost=0.57..818.20 rows=28995 width=4) (actual time=0.009..0.009 rows=0 loops=741)
                       Index Cond: ((dump_id = sv.dump_id) AND (schema_version = 1))
                       Heap Fetches: 4001
   ->  LockRows  (cost=0.57..31815.38 rows=28995 width=59) (actual time=12.459..14.818 rows=1000 loops=1)
         ->  Index Scan using lsif_data_references_dump_id_schema_version on lsif_data_references t1  (cost=0.57..31525.43 rows=28995 width=59) (actual time=10.741..12.711 rows=2000 loops=1)
               Index Cond: ((dump_id = $1) AND (schema_version = 1))
 Planning Time: 0.304 ms
 Execution Time: 15.128 ms
(15 rows)

Merge request reports

Loading