repo-updater: Faster counting of user added repos

Warren Gifford requested to merge ts/improve-count-query into main

Created by: tsenart

This PR speeds up the query we use to count user-added repos. I spotted it while looking through the Cloud SQL logs, where it was flagged as a slow query.

Plan before:

```
Finalize Aggregate  (cost=443694.49..443694.50 rows=1 width=8) (actual time=2534.882..2534.882 rows=1 loops=1)
  ->  Gather  (cost=443694.27..443694.48 rows=2 width=8) (actual time=2534.405..2872.419 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=442694.27..442694.28 rows=1 width=8) (actual time=2529.423..2529.423 rows=1 loops=3)
              ->  Parallel Hash Semi Join  (cost=86672.06..442105.34 rows=235572 width=0) (actual time=2350.572..2529.378 rows=56 loops=3)
                    Hash Cond: (r.id = sr.repo_id)
                    ->  Parallel Seq Scan on repo r  (cost=0.00..335387.16 rows=1581100 width=4) (actual time=0.025..1597.235 rows=1265293 loops=3)
                          Filter: (deleted_at IS NULL)
                          Rows Removed by Filter: 303
                    ->  Parallel Hash  (cost=82805.82..82805.82 rows=235619 width=4) (actual time=354.386..354.386 rows=56 loops=3)
                          Buckets: 131072  Batches: 8  Memory Usage: 1120kB
                          ->  Hash Join  (cost=2.32..82805.82 rows=235619 width=4) (actual time=303.913..352.519 rows=56 loops=3)
                                Hash Cond: (sr.external_service_id = s.id)
                                ->  Parallel Seq Scan on external_service_repos sr  (cost=0.00..77973.30 rows=1590430 width=12) (actual time=0.020..197.140 rows=1279571 loops=3)
                                ->  Hash  (cost=2.27..2.27 rows=4 width=8) (actual time=0.057..0.057 rows=4 loops=3)
                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                      ->  Seq Scan on external_services s  (cost=0.00..2.27 rows=4 width=8) (actual time=0.047..0.049 rows=4 loops=3)
                                            Filter: ((namespace_user_id IS NOT NULL) AND (deleted_at IS NULL))
                                            Rows Removed by Filter: 23
Planning Time: 0.516 ms
Execution Time: 2872.519 ms
```
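
The MR doesn't quote the query text itself, but its shape can be reconstructed from the plan above: a semi-join that counts live repos whose id appears among the repos synced by a user-owned external service. A sketch of that shape (table and column names taken from the plan; the exact predicates in the real query may differ):

```sql
-- Reconstructed from the "before" plan; not the literal query from this MR.
-- The semi-join (IN / EXISTS) makes Postgres seq-scan all of repo
-- (~3.8M rows across the 3 workers) and hash-join it against the
-- subquery result, even though only ~170 rows match in the end.
SELECT COUNT(*)
FROM repo r
WHERE r.deleted_at IS NULL
  AND r.id IN (
    SELECT sr.repo_id
    FROM external_service_repos sr
    JOIN external_services s ON s.id = sr.external_service_id
    WHERE s.namespace_user_id IS NOT NULL
      AND s.deleted_at IS NULL
  );
```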

Plan after:

```
Finalize Aggregate  (cost=212334.94..212334.95 rows=1 width=8) (actual time=366.147..366.148 rows=1 loops=1)
  ->  Gather  (cost=212334.72..212334.93 rows=2 width=8) (actual time=366.011..421.147 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=211334.72..211334.73 rows=1 width=8) (actual time=359.469..359.469 rows=1 loops=3)
              ->  Nested Loop  (cost=2.75..210745.79 rows=235572 width=0) (actual time=322.372..359.453 rows=56 loops=3)
                    ->  Hash Join  (cost=2.32..82805.82 rows=235619 width=4) (actual time=322.301..358.692 rows=56 loops=3)
                          Hash Cond: (esr.external_service_id = es.id)
                          ->  Parallel Seq Scan on external_service_repos esr  (cost=0.00..77973.30 rows=1590430 width=12) (actual time=0.025..200.583 rows=1279573 loops=3)
                          ->  Hash  (cost=2.27..2.27 rows=4 width=8) (actual time=0.060..0.060 rows=4 loops=3)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                ->  Seq Scan on external_services es  (cost=0.00..2.27 rows=4 width=8) (actual time=0.048..0.050 rows=4 loops=3)
                                      Filter: ((namespace_user_id IS NOT NULL) AND (deleted_at IS NULL))
                                      Rows Removed by Filter: 23
                    ->  Index Scan using repo_pkey on repo  (cost=0.43..0.54 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=168)
                          Index Cond: (id = esr.repo_id)
                          Filter: (deleted_at IS NULL)
Planning Time: 0.431 ms
Execution Time: 421.226 ms
```
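
The rewrite (again a sketch reconstructed from the plan, so the actual SQL in this MR may differ) drives the query from external_service_repos instead, so Postgres first filters down to the handful of user-owned services and then probes repo through its primary-key index rather than seq-scanning it:

```sql
-- Reconstructed from the "after" plan; not the literal query from this MR.
-- Starting from external_service_repos lets the planner pick a nested loop
-- with an index scan on repo_pkey: ~168 index probes in total instead of
-- a full scan over ~3.8M repo rows, cutting execution from ~2.9s to ~0.4s.
SELECT COUNT(*)
FROM external_service_repos esr
JOIN external_services es ON es.id = esr.external_service_id
JOIN repo ON repo.id = esr.repo_id
WHERE es.namespace_user_id IS NOT NULL
  AND es.deleted_at IS NULL
  AND repo.deleted_at IS NULL;
```

One caveat with this shape: a plain join counts a repo once per matching service, while the original semi-join deduplicated. If a repo can be attached to more than one user-owned external service, `COUNT(DISTINCT esr.repo_id)` would preserve the old semantics; in the plans above both variants return the same row counts.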
