perf: Reduce DB pressure of status messages queries
Created by: eseliger
Big thanks to @efritz for pairing on yet another query. This commit changes the repo list query to use exists queries to prevent costly joins.
Two queries were improved by this:
See if there is at least one external service accessible by the user that connects to a repo that hasn't been cloned yet
Before
User scope
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.29..722.66 rows=1 width=52) (actual time=1459.928..1459.931 rows=0 loops=1)
-> Nested Loop Left Join (cost=1.29..181785.15 rows=252 width=52) (actual time=1459.926..1459.929 rows=0 loops=1)
Filter: ((gr.clone_status = 'not_cloned'::text) OR (gr.clone_status IS NULL))
Rows Removed by Filter: 12
-> Nested Loop (cost=0.86..181469.89 rows=622 width=52) (actual time=93.074..1459.848 rows=12 loops=1)
-> Index Only Scan using external_service_repos_repo_id_external_service_id_unique on external_service_repos esr (cost=0.43..178529.34 rows=1119 width=4) (actual time=93.050..1459.687 rows=12 loops=1)
Filter: (external_service_id = ANY ('{102,320,3110}'::integer[]))
Rows Removed by Filter: 4016717
Heap Fetches: 38871
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..2.63 rows=1 width=52) (actual time=0.010..0.010 rows=1 loops=12)
Index Cond: (id = esr.repo_id)
Filter: (blocked IS NULL)
-> Index Scan using gitserver_repos_pkey on gitserver_repos gr (cost=0.43..0.49 rows=1 width=12) (actual time=0.005..0.005 rows=1 loops=12)
Index Cond: (repo_id = repo.id)
Planning Time: 1.872 ms
Execution Time: 1459.986 ms
(16 rows)
Site admin scope
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.29..3.49 rows=1 width=52) (actual time=364.026..364.027 rows=1 loops=1)
-> Nested Loop (cost=1.29..1958355.78 rows=893055 width=52) (actual time=364.025..364.026 rows=1 loops=1)
-> Merge Left Join (cost=0.86..1209485.85 rows=1552759 width=52) (actual time=0.090..335.688 rows=12231 loops=1)
Merge Cond: (repo.id = gr.repo_id)
Filter: ((gr.clone_status = 'not_cloned'::text) OR (gr.clone_status IS NULL))
Rows Removed by Filter: 113203
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..865011.43 rows=3830506 width=52) (actual time=0.012..158.213 rows=125434 loops=1)
Filter: (blocked IS NULL)
Rows Removed by Filter: 6
-> Index Scan using gitserver_repos_pkey on gitserver_repos gr (cost=0.43..270607.53 rows=6872888 width=12) (actual time=0.003..120.511 rows=126140 loops=1)
-> Index Only Scan using external_service_repos_repo_id_external_service_id_unique on external_service_repos esr (cost=0.43..0.47 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=12231)
Index Cond: (repo_id = repo.id)
Filter: (external_service_id = ANY ('{...}'::integer[]))
Rows Removed by Filter: 1
Heap Fetches: 1292
Planning Time: 0.575 ms
Execution Time: 364.068 ms
(17 rows)
After
User scope
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3306.24..3306.24 rows=1 width=52) (actual time=0.184..0.185 rows=0 loops=1)
-> Sort (cost=3306.24..3306.87 rows=252 width=52) (actual time=0.183..0.184 rows=0 loops=1)
Sort Key: repo.id
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=38.84..3304.98 rows=252 width=52) (actual time=0.178..0.179 rows=0 loops=1)
Filter: ((gr.clone_status = 'not_cloned'::text) OR (gr.clone_status IS NULL))
Rows Removed by Filter: 12
-> Nested Loop (cost=38.41..2989.72 rows=622 width=52) (actual time=0.053..0.117 rows=12 loops=1)
-> HashAggregate (cost=37.98..49.17 rows=1119 width=4) (actual time=0.038..0.046 rows=12 loops=1)
Group Key: esr.repo_id
-> Index Only Scan using external_service_repos_idx on external_service_repos esr (cost=0.43..35.18 rows=1119 width=4) (actual time=0.014..0.032 rows=12 loops=1)
Index Cond: (external_service_id = ANY ('{102,320,3110}'::integer[]))
Heap Fetches: 3
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..2.63 rows=1 width=52) (actual time=0.005..0.005 rows=1 loops=12)
Index Cond: (id = esr.repo_id)
Filter: (blocked IS NULL)
-> Index Scan using gitserver_repos_pkey on gitserver_repos gr (cost=0.43..0.49 rows=1 width=12) (actual time=0.005..0.005 rows=1 loops=12)
Index Cond: (repo_id = repo.id)
Planning Time: 0.637 ms
Execution Time: 0.286 ms
(20 rows)
Site admin scope
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1.29..3.48 rows=1 width=52) (actual time=438.690..438.693 rows=1 loops=1)
-> Nested Loop Semi Join (cost=1.29..1951759.04 rows=893055 width=52) (actual time=438.689..438.691 rows=1 loops=1)
-> Merge Left Join (cost=0.86..1209486.15 rows=1552759 width=52) (actual time=0.095..409.349 rows=12231 loops=1)
Merge Cond: (repo.id = gr.repo_id)
Filter: ((gr.clone_status = 'not_cloned'::text) OR (gr.clone_status IS NULL))
Rows Removed by Filter: 113203
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..865011.43 rows=3830506 width=52) (actual time=0.014..204.816 rows=125434 loops=1)
Filter: (blocked IS NULL)
Rows Removed by Filter: 6
-> Index Scan using gitserver_repos_pkey on gitserver_repos gr (cost=0.43..270607.53 rows=6872888 width=12) (actual time=0.005..148.048 rows=126140 loops=1)
-> Index Only Scan using external_service_repos_repo_id_external_service_id_unique on external_service_repos esr (cost=0.43..0.47 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=12231)
Index Cond: (repo_id = repo.id)
Filter: (external_service_id = ANY ('{...}'::integer[]))
Rows Removed by Filter: 1
Heap Fetches: 1292
Planning Time: 0.552 ms
Execution Time: 438.735 ms
(17 rows)
See if there is at least one external service accessible by the user that failed syncing
Before
User scope
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=2982.37..2982.38 rows=1 width=8) (actual time=0.113..0.114 rows=1 loops=1)
-> Nested Loop (cost=1.29..2982.37 rows=1 width=0) (actual time=0.111..0.112 rows=0 loops=1)
Join Filter: (esr.repo_id = repo.id)
-> Nested Loop (cost=0.86..2977.43 rows=2 width=8) (actual time=0.111..0.112 rows=0 loops=1)
-> Index Only Scan using external_service_repos_idx on external_service_repos esr (cost=0.43..35.18 rows=1119 width=4) (actual time=0.049..0.062 rows=12 loops=1)
Index Cond: (external_service_id = ANY ('{102,320,3110}'::integer[]))
Heap Fetches: 3
-> Index Scan using gitserver_repos_pkey on gitserver_repos gr (cost=0.43..2.63 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=12)
Index Cond: (repo_id = esr.repo_id)
Filter: (last_error IS NOT NULL)
Rows Removed by Filter: 1
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..2.46 rows=1 width=4) (never executed)
Index Cond: (id = gr.repo_id)
Filter: (blocked IS NULL)
Planning Time: 0.565 ms
Execution Time: 0.148 ms
(16 rows)
Site admin scope
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=156484.35..156484.36 rows=1 width=8) (actual time=868.930..869.117 rows=1 loops=1)
-> Gather (cost=156484.14..156484.35 rows=2 width=8) (actual time=868.923..869.111 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 0
-> Partial Aggregate (cost=155484.14..155484.15 rows=1 width=8) (actual time=868.374..868.376 rows=1 loops=1)
-> Nested Loop (cost=0.86..155480.25 rows=1556 width=0) (actual time=10.543..868.167 rows=696 loops=1)
-> Nested Loop (cost=0.43..153650.70 rows=2808 width=8) (actual time=10.528..864.220 rows=720 loops=1)
-> Parallel Seq Scan on gitserver_repos gr (cost=0.00..146565.03 rows=4868 width=4) (actual time=0.068..826.550 rows=11958 loops=1)
Filter: (last_error IS NOT NULL)
Rows Removed by Filter: 6861027
-> Index Only Scan using external_service_repos_repo_id_external_service_id_unique on external_service_repos esr (cost=0.43..1.45 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=11958)
Index Cond: (repo_id = gr.repo_id)
Filter: (external_service_id = ANY ('{...}'::integer[]))
Rows Removed by Filter: 0
Heap Fetches: 160
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..0.65 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=720)
Index Cond: (id = esr.repo_id)
Filter: (blocked IS NULL)
Rows Removed by Filter: 0
Planning Time: 0.668 ms
Execution Time: 869.254 ms
(21 rows)
After
User scope
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1066.99..1067.00 rows=1 width=8) (actual time=0.061..0.062 rows=1 loops=1)
-> Nested Loop (cost=38.69..1066.99 rows=1 width=0) (actual time=0.059..0.059 rows=0 loops=1)
-> Nested Loop (cost=38.26..1062.07 rows=2 width=8) (actual time=0.058..0.059 rows=0 loops=1)
-> HashAggregate (cost=37.98..49.17 rows=1119 width=4) (actual time=0.030..0.036 rows=12 loops=1)
Group Key: esr.repo_id
-> Index Only Scan using external_service_repos_idx on external_service_repos esr (cost=0.43..35.18 rows=1119 width=4) (actual time=0.010..0.024 rows=12 loops=1)
Index Cond: (external_service_id = ANY ('{102,320,3110}'::integer[]))
Heap Fetches: 3
-> Index Only Scan using erik_test_remove on gitserver_repos gr (cost=0.29..0.91 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=12)
Index Cond: (repo_id = esr.repo_id)
Heap Fetches: 0
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..2.46 rows=1 width=4) (never executed)
Index Cond: (id = gr.repo_id)
Filter: (blocked IS NULL)
Planning Time: 0.678 ms
Execution Time: 0.122 ms
(16 rows)
Site admin scope
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=26328.95..26328.96 rows=1 width=8) (actual time=33.906..35.591 rows=1 loops=1)
-> Gather (cost=26328.74..26328.95 rows=2 width=8) (actual time=33.697..35.583 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=25328.74..25328.75 rows=1 width=8) (actual time=29.385..29.387 rows=1 loops=3)
-> Nested Loop Semi Join (cost=100.99..25324.84 rows=1560 width=0) (actual time=1.924..29.346 rows=232 loops=3)
Join Filter: (gr.repo_id = esr.repo_id)
-> Nested Loop (cost=100.56..24013.01 rows=2706 width=8) (actual time=1.683..22.162 rows=2094 loops=3)
-> Parallel Bitmap Heap Scan on gitserver_repos gr (cost=100.13..12053.04 rows=4868 width=4) (actual time=1.635..7.841 rows=3986 loops=3)
Recheck Cond: (last_error IS NOT NULL)
Heap Blocks: exact=3137
-> Bitmap Index Scan on erik_test_remove (cost=0.00..97.20 rows=11684 width=0) (actual time=2.850..2.850 rows=11959 loops=1)
-> Index Scan using repo_non_deleted_id_name_idx on repo (cost=0.43..2.46 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=11959)
Index Cond: (id = gr.repo_id)
Filter: (blocked IS NULL)
Rows Removed by Filter: 0
-> Index Only Scan using external_service_repos_repo_id_external_service_id_unique on external_service_repos esr (cost=0.43..0.47 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=6281)
Index Cond: (repo_id = repo.id)
Filter: (external_service_id = ANY ('{...}'::integer[]))
Rows Removed by Filter: 1
Heap Fetches: 160
Planning Time: 0.675 ms
Execution Time: 35.649 ms
(23 rows)