Skip to content

workerutil: limit potential candidate records to prevent large queue tables scanning entire index

Administrator requested to merge workerutil/limit_candidate_selection into main

Created by: coury-clark

Closes https://github.com/sourcegraph/sourcegraph/issues/37351

The PR that introduced a window function to select candidate rows from the workerutil unintentionally introduced huge latency for large queue tables. This alleviates this problem by limited the potential candidate rows to 1 result, which will direct the query planner to use indices of finished_at and process_after.

Test plan

Query plan before:

Nested Loop  (cost=1499282.63..1499284.86 rows=1 width=237) (actual time=17871.921..17871.928 rows=1 loops=1)
  CTE candidate
    ->  Limit  (cost=1499279.23..1499279.24 rows=1 width=54) (actual time=17871.817..17871.821 rows=1 loops=1)
          ->  LockRows  (cost=1499279.23..1519930.13 rows=1652072 width=54) (actual time=17300.694..17300.697 rows=1 loops=1)
                ->  Sort  (cost=1499279.23..1503409.41 rows=1652072 width=54) (actual time=17299.347..17299.352 rows=2 loops=1)
"                      Sort Key: pc.""order"""
                      Sort Method: external merge  Disk: 188472kB
                      ->  Hash Join  (cost=669485.32..1491018.87 rows=1652072 width=54) (actual time=2908.184..16063.694 rows=2787834 loops=1)
                            Hash Cond: (pc.candidate_id = insights_query_runner_jobs_1.id)
                            ->  Subquery Scan on pc  (cost=0.43..752747.15 rows=2742028 width=48) (actual time=0.491..10488.722 rows=2787834 loops=1)
                                  ->  WindowAgg  (cost=0.43..725326.87 rows=2742028 width=16) (actual time=0.484..9886.540 rows=2787834 loops=1)
                                        ->  Index Scan using insights_query_runner_jobs_processable_priority_id on insights_query_runner_jobs insights_query_runner_jobs_2  (cost=0.43..677341.38 rows=2742028 width=8) (actual time=0.470..8732.401 rows=2787834 loops=1)
                                              Filter: ((state = 'queued'::text) AND ((process_after IS NULL) OR (process_after <= now())))
                            ->  Hash  (cost=621820.54..621820.54 rows=2742028 width=10) (actual time=2906.052..2906.052 rows=2787834 loops=1)
                                  Buckets: 262144  Batches: 16  Memory Usage: 9194kB
                                  ->  Seq Scan on insights_query_runner_jobs insights_query_runner_jobs_1  (cost=0.00..621820.54 rows=2742028 width=10) (actual time=154.419..2178.175 rows=2787834 loops=1)
"                                        Filter: (state = ANY ('{queued,errored}'::text[]))"
                                        Rows Removed by Filter: 1768049
  CTE updated_record
    ->  Update on insights_query_runner_jobs insights_query_runner_jobs_3  (cost=0.58..2.81 rows=1 width=451) (actual time=0.138..0.138 rows=0 loops=1)
          ->  Nested Loop  (cost=0.58..2.81 rows=1 width=451) (actual time=0.015..0.015 rows=1 loops=1)
                ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=1)
                      Group Key: candidate_1.id
                      ->  CTE Scan on candidate candidate_1  (cost=0.00..0.02 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
                ->  Index Scan using insights_query_runner_jobs_pkey on insights_query_runner_jobs insights_query_runner_jobs_3  (cost=0.56..2.78 rows=1 width=279) (actual time=0.004..0.004 rows=1 loops=1)
                      Index Cond: (id = candidate_1.id)
  ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=17871.868..17871.868 rows=1 loops=1)
        Group Key: candidate.id
        ->  CTE Scan on candidate  (cost=0.00..0.02 rows=1 width=4) (actual time=17871.861..17871.862 rows=1 loops=1)
  ->  Index Scan using insights_query_runner_jobs_pkey on insights_query_runner_jobs  (cost=0.56..2.78 rows=1 width=241) (actual time=0.033..0.033 rows=1 loops=1)
        Index Cond: (id = candidate.id)
Planning Time: 0.828 ms
JIT:
  Functions: 45
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 12.302 ms, Inlining 16.430 ms, Optimization 331.562 ms, Emission 222.727 ms, Total 583.020 ms"
Execution Time: 17920.371 ms

Query plan after:

Nested Loop  (cost=6.90..9.12 rows=1 width=237) (actual time=0.654..0.658 rows=1 loops=1)
  CTE candidate
    ->  Limit  (cost=3.49..3.51 rows=1 width=54) (actual time=0.641..0.643 rows=1 loops=1)
          ->  LockRows  (cost=3.49..3.51 rows=1 width=54) (actual time=0.640..0.642 rows=1 loops=1)
                ->  Sort  (cost=3.49..3.50 rows=1 width=54) (actual time=0.606..0.608 rows=1 loops=1)
"                      Sort Key: pc.""order"""
                      Sort Method: quicksort  Memory: 25kB
                      ->  Nested Loop  (cost=0.99..3.48 rows=1 width=54) (actual time=0.587..0.589 rows=1 loops=1)
                            ->  Subquery Scan on pc  (cost=0.43..0.70 rows=1 width=48) (actual time=0.559..0.560 rows=1 loops=1)
                                  ->  Limit  (cost=0.43..0.69 rows=1 width=16) (actual time=0.554..0.555 rows=1 loops=1)
                                        ->  WindowAgg  (cost=0.43..726137.05 rows=2765176 width=16) (actual time=0.553..0.554 rows=1 loops=1)
                                              ->  Index Scan using insights_query_runner_jobs_processable_priority_id on insights_query_runner_jobs insights_query_runner_jobs_2  (cost=0.43..677746.47 rows=2765176 width=8) (actual time=0.540..0.542 rows=2 loops=1)
                                                    Filter: ((state = 'queued'::text) AND ((process_after IS NULL) OR (process_after <= now())))
                                                    Rows Removed by Filter: 1
                            ->  Index Scan using insights_query_runner_jobs_pkey on insights_query_runner_jobs insights_query_runner_jobs_1  (cost=0.56..2.78 rows=1 width=10) (actual time=0.026..0.026 rows=1 loops=1)
                                  Index Cond: (id = pc.candidate_id)
"                                  Filter: (state = ANY ('{queued,errored}'::text[]))"
  CTE updated_record
    ->  Update on insights_query_runner_jobs insights_query_runner_jobs_3  (cost=0.58..2.81 rows=1 width=451) (actual time=0.242..0.242 rows=0 loops=1)
          ->  Nested Loop  (cost=0.58..2.81 rows=1 width=451) (actual time=0.012..0.013 rows=1 loops=1)
                ->  HashAggregate  (cost=0.02..0.03 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=1)
                      Group Key: candidate_1.id
                      ->  CTE Scan on candidate candidate_1  (cost=0.00..0.02 rows=1 width=32) (actual time=0.003..0.003 rows=1 loops=1)
                ->  Index Scan using insights_query_runner_jobs_pkey on insights_query_runner_jobs insights_query_runner_jobs_3  (cost=0.56..2.78 rows=1 width=279) (actual time=0.004..0.004 rows=1 loops=1)
                      Index Cond: (id = candidate_1.id)
  ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4) (actual time=0.647..0.647 rows=1 loops=1)
        Group Key: candidate.id
        ->  CTE Scan on candidate  (cost=0.00..0.02 rows=1 width=4) (actual time=0.642..0.643 rows=1 loops=1)
  ->  Index Scan using insights_query_runner_jobs_pkey on insights_query_runner_jobs  (cost=0.56..2.78 rows=1 width=241) (actual time=0.006..0.006 rows=1 loops=1)
        Index Cond: (id = candidate.id)
Planning Time: 0.665 ms
Execution Time: 1.201 ms

Merge request reports

Loading