repo-updater: Faster counting of user added repos
Created by: tsenart
This PR speeds up the query we use to count user-added repos. I noticed it while looking through Cloud SQL logs, where it was flagged as a slow query. In the plans below, execution time drops from ~2.87s to ~0.42s.
Plan before:

```
Finalize Aggregate  (cost=443694.49..443694.50 rows=1 width=8) (actual time=2534.882..2534.882 rows=1 loops=1)
  ->  Gather  (cost=443694.27..443694.48 rows=2 width=8) (actual time=2534.405..2872.419 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=442694.27..442694.28 rows=1 width=8) (actual time=2529.423..2529.423 rows=1 loops=3)
              ->  Parallel Hash Semi Join  (cost=86672.06..442105.34 rows=235572 width=0) (actual time=2350.572..2529.378 rows=56 loops=3)
                    Hash Cond: (r.id = sr.repo_id)
                    ->  Parallel Seq Scan on repo r  (cost=0.00..335387.16 rows=1581100 width=4) (actual time=0.025..1597.235 rows=1265293 loops=3)
                          Filter: (deleted_at IS NULL)
                          Rows Removed by Filter: 303
                    ->  Parallel Hash  (cost=82805.82..82805.82 rows=235619 width=4) (actual time=354.386..354.386 rows=56 loops=3)
                          Buckets: 131072  Batches: 8  Memory Usage: 1120kB
                          ->  Hash Join  (cost=2.32..82805.82 rows=235619 width=4) (actual time=303.913..352.519 rows=56 loops=3)
                                Hash Cond: (sr.external_service_id = s.id)
                                ->  Parallel Seq Scan on external_service_repos sr  (cost=0.00..77973.30 rows=1590430 width=12) (actual time=0.020..197.140 rows=1279571 loops=3)
                                ->  Hash  (cost=2.27..2.27 rows=4 width=8) (actual time=0.057..0.057 rows=4 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                      ->  Seq Scan on external_services s  (cost=0.00..2.27 rows=4 width=8) (actual time=0.047..0.049 rows=4 loops=3)
                                            Filter: ((namespace_user_id IS NOT NULL) AND (deleted_at IS NULL))
                                            Rows Removed by Filter: 23
Planning Time: 0.516 ms
Execution Time: 2872.519 ms
```
Plan after:

```
Finalize Aggregate  (cost=212334.94..212334.95 rows=1 width=8) (actual time=366.147..366.148 rows=1 loops=1)
  ->  Gather  (cost=212334.72..212334.93 rows=2 width=8) (actual time=366.011..421.147 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=211334.72..211334.73 rows=1 width=8) (actual time=359.469..359.469 rows=1 loops=3)
              ->  Nested Loop  (cost=2.75..210745.79 rows=235572 width=0) (actual time=322.372..359.453 rows=56 loops=3)
                    ->  Hash Join  (cost=2.32..82805.82 rows=235619 width=4) (actual time=322.301..358.692 rows=56 loops=3)
                          Hash Cond: (esr.external_service_id = es.id)
                          ->  Parallel Seq Scan on external_service_repos esr  (cost=0.00..77973.30 rows=1590430 width=12) (actual time=0.025..200.583 rows=1279573 loops=3)
                          ->  Hash  (cost=2.27..2.27 rows=4 width=8) (actual time=0.060..0.060 rows=4 loops=3)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                ->  Seq Scan on external_services es  (cost=0.00..2.27 rows=4 width=8) (actual time=0.048..0.050 rows=4 loops=3)
                                      Filter: ((namespace_user_id IS NOT NULL) AND (deleted_at IS NULL))
                                      Rows Removed by Filter: 23
                    ->  Index Scan using repo_pkey on repo  (cost=0.43..0.54 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=168)
                          Index Cond: (id = esr.repo_id)
                          Filter: (deleted_at IS NULL)
Planning Time: 0.431 ms
Execution Time: 421.226 ms
```
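The query change itself isn't shown in this description. Judging from the plan shapes (a semi join that seq-scans all of `repo` before, versus a join driven from `external_service_repos` with `repo_pkey` index lookups after), the rewrite likely has the following general shape. This is only a sketch: the table and column names are taken from the plans above, but the exact predicates and query text are assumptions.

```sql
-- Before (sketch): a semi join -- the planner seq-scans every row of repo
-- and probes a hash of the user-owned external service repos.
SELECT COUNT(*)
FROM repo r
WHERE r.deleted_at IS NULL
  AND EXISTS (
    SELECT 1
    FROM external_service_repos sr
    JOIN external_services s ON s.id = sr.external_service_id
    WHERE sr.repo_id = r.id
      AND s.namespace_user_id IS NOT NULL
      AND s.deleted_at IS NULL
  );

-- After (sketch): drive the join from external_service_repos instead, so
-- the few matching repo rows are fetched via cheap repo_pkey index scans.
SELECT COUNT(*)
FROM external_service_repos esr
JOIN external_services es ON es.id = esr.external_service_id
JOIN repo ON repo.id = esr.repo_id
WHERE es.namespace_user_id IS NOT NULL
  AND es.deleted_at IS NULL
  AND repo.deleted_at IS NULL;
```

The win comes from join order: only 4 external services survive the `namespace_user_id IS NOT NULL` filter, so starting from that side leaves ~56 repo lookups per worker instead of a full scan of ~1.3M `repo` rows.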