Skip to content

db: Improve defaultRepos.List query

Warren Gifford requested to merge ts/improved-query into main

Created by: tsenart

This commit improves the defaultRepos.List query to use a better plan — it results in rougly 480ms runtime in production currently, vs the previous 1.4s. It also fixes a bug with the old query (we weren't filtering out deleted repos from the default_repos table).

Thanks @felixge for the SQL hacking session :)

Before

Unique  (cost=1325604.75..1351922.72 rows=3509063 width=36) (actual time=1135.181..1410.477 rows=206416 loops=1)
  ->  Sort  (cost=1325604.75..1334377.41 rows=3509063 width=36) (actual time=1135.179..1157.578 rows=206416 loops=1)
        Sort Key: r.id, r.name
        Sort Method: external merge  Disk: 9984kB
        ->  Append  (cost=168704.21..831721.53 rows=3509063 width=36) (actual time=760.768..1046.177 rows=206416 loops=1)
              ->  Hash Semi Join  (cost=168704.21..639660.16 rows=3302671 width=39) (actual time=760.395..760.395 rows=0 loops=1)
                    Hash Cond: (r.id = sr.repo_id)
                    ->  Seq Scan on repo r  (cost=0.00..347065.21 rows=3302671 width=39) (actual time=0.006..0.006 rows=1 loops=1)
                          Filter: (deleted_at IS NULL)
                    ->  Hash  (cost=107855.17..107855.17 rows=3708883 width=4) (actual time=760.255..760.255 rows=0 loops=1)
                          Buckets: 131072  Batches: 64  Memory Usage: 1024kB
                          ->  Hash Join  (cost=1.38..107855.17 rows=3708883 width=4) (actual time=760.254..760.254 rows=0 loops=1)
                                Hash Cond: (sr.external_service_id = s.id)
                                ->  Seq Scan on external_service_repos sr  (cost=0.00..95690.83 rows=3708883 width=12) (actual time=0.008..443.226 rows=3643696 loops=1)
                                ->  Hash  (cost=1.17..1.17 rows=17 width=8) (actual time=0.013..0.013 rows=3 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                      ->  Seq Scan on external_services s  (cost=0.00..1.17 rows=17 width=8) (actual time=0.008..0.010 rows=3 loops=1)
                                            Filter: (namespace_user_id IS NOT NULL)
                                            Rows Removed by Filter: 16
              ->  Gather  (cost=1000.85..139425.42 rows=206392 width=39) (actual time=0.371..269.532 rows=206416 loops=1)
                    Workers Planned: 2
                    Workers Launched: 2
                    ->  Nested Loop  (cost=0.85..117786.22 rows=85997 width=39) (actual time=0.096..279.144 rows=68805 loops=3)
                          ->  Parallel Index Only Scan using default_repos_pkey on default_repos  (cost=0.42..3522.35 rows=85997 width=4) (actual time=0.047..37.204 rows=68805 loops=3)
                                Heap Fetches: 206416
                          ->  Index Scan using repo_pkey on repo  (cost=0.43..1.33 rows=1 width=39) (actual time=0.003..0.003 rows=1 loops=206416)
                                Index Cond: (id = default_repos.repo_id)
Planning Time: 0.547 ms
Execution Time: 1425.366 ms

After (attempt 1)

Unique  (cost=493745.14..495867.59 rows=424491 width=39) (actual time=421.170..476.224 rows=206400 loops=1)
  ->  Sort  (cost=493745.14..494806.37 rows=424491 width=39) (actual time=421.169..443.556 rows=206400 loops=1)
        Sort Key: repo.id
        Sort Method: external merge  Disk: 9976kB
        ->  Gather  (cost=1000.43..447265.20 rows=424491 width=39) (actual time=0.369..321.361 rows=206400 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Nested Loop  (cost=0.43..403816.10 rows=176871 width=39) (actual time=0.079..322.571 rows=68800 loops=3)
                    ->  Parallel Append  (cost=0.00..60190.47 rows=304167 width=4) (actual time=0.033..13.560 rows=68805 loops=3)
                          ->  Nested Loop  (cost=0.43..54996.19 rows=218170 width=4) (actual time=0.040..0.040 rows=0 loops=1)
                                ->  Seq Scan on external_services es  (cost=0.00..1.17 rows=1 width=8) (actual time=0.040..0.040 rows=0 loops=1)
                                      Filter: ((namespace_user_id IS NOT NULL) AND (deleted_at IS NULL))
                                      Rows Removed by Filter: 19
                                ->  Index Scan using external_service_repos_external_service_id on external_service_repos esr  (cost=0.43..45722.81 rows=927221 width=12) (never executed)
                                      Index Cond: (external_service_id = es.id)
                          ->  Parallel Seq Scan on default_repos  (cost=0.00..2128.07 rows=121407 width=4) (actual time=0.018..7.518 rows=68805 loops=3)
                    ->  Index Scan using repo_pkey on repo  (cost=0.43..1.13 rows=1 width=39) (actual time=0.004..0.004 rows=1 loops=206416)
                          Index Cond: (id = esr.repo_id)
                          Filter: (deleted_at IS NULL)
                          Rows Removed by Filter: 0
Planning Time: 0.333 ms
Execution Time: 487.791 ms

After (attempt 2)

Gather  (cost=59179.51..199339.22 rows=206358 width=39) (actual time=1.589..437.872 rows=206400 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Nested Loop  (cost=58179.51..177703.42 rows=85982 width=39) (actual time=1.025..370.151 rows=68800 loops=3)
        ->  Hash Left Join  (cost=58179.08..63431.76 rows=85997 width=8) (actual time=0.968..67.309 rows=68805 loops=3)
              Hash Cond: (dr.repo_id = esr.repo_id)
              ->  Parallel Index Only Scan using default_repos_pkey on default_repos dr  (cost=0.42..3522.35 rows=85997 width=4) (actual time=0.040..37.065 rows=68805 loops=3)
                    Heap Fetches: 206416
              ->  Hash  (cost=55265.88..55265.88 rows=177503 width=4) (actual time=0.058..0.058 rows=0 loops=3)
                    Buckets: 131072  Batches: 4  Memory Usage: 1024kB
                    ->  Nested Loop  (cost=0.43..55265.88 rows=177503 width=4) (actual time=0.057..0.057 rows=0 loops=3)
                          ->  Seq Scan on external_services es  (cost=0.00..2.21 rows=1 width=8) (actual time=0.057..0.057 rows=0 loops=3)
                                Filter: ((namespace_user_id IS NOT NULL) AND (deleted_at IS NULL))
                                Rows Removed by Filter: 21
                          ->  Index Scan using external_service_repos_external_service_id on external_service_repos esr  (cost=0.43..45944.79 rows=931888 width=12) (never executed)
                                Index Cond: (external_service_id = es.id)
        ->  Index Scan using repo_pkey on repo  (cost=0.43..1.33 rows=1 width=39) (actual time=0.004..0.004 rows=1 loops=206416)
              Index Cond: (id = COALESCE(esr.repo_id, dr.repo_id))
              Filter: (deleted_at IS NULL)
              Rows Removed by Filter: 0
Planning Time: 2.565 ms
Execution Time: 447.685 ms

Merge request reports

Loading