Speed up ListIndexableRepos query for uncloned
Created by: eseliger
Since we introduced an invariant that gitserver_repos MUST have a record for each repo, we no longer need to do a left join here, and hence can optimize. Still not fast, but faster.
Before:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=1385996.09..1428720.77 rows=366186 width=45) (actual time=7499.200..7589.828 rows=4914 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=1384996.07..1385453.80 rows=183093 width=45) (actual time=7484.861..7485.005 rows=1638 loops=3)
Sort Key: repo.stars DESC NULLS LAST
Sort Method: quicksort Memory: 248kB
Worker 0: Sort Method: quicksort Memory: 257kB
Worker 1: Sort Method: quicksort Memory: 256kB
-> Parallel Hash Left Join (cost=480914.62..1368991.71 rows=183093 width=45) (actual time=2002.405..7483.617 rows=1638 loops=3)
Hash Cond: (repo.id = gr.repo_id)
Filter: ((gr.clone_status IS NULL) OR (gr.clone_status = 'not_cloned'::text))
Rows Removed by Filter: 994128
-> Parallel Seq Scan on repo (cost=104508.90..988268.68 rows=1644687 width=45) (actual time=312.475..4995.084 rows=995766 loops=3)
Filter: ((deleted_at IS NULL) AND (blocked IS NULL) AND (((stars >= 5) AND (NOT COALESCE(fork, false)) AND (NOT archived)) OR (lower((name)::text) ~ '^(src\.fedoraproject\.org|maven|npm|jdk)'::text) OR (hashed SubPlan 1)))
Rows Removed by Filter: 1258974
SubPlan 1
-> Append (cost=6113.13..103243.67 rows=506091 width=4) (actual time=45.900..249.734 rows=476519 loops=3)
-> Bitmap Heap Scan on external_service_repos (cost=6113.13..95632.31 rows=504001 width=4) (actual time=45.899..190.160 rows=474429 loops=3)
Recheck Cond: ((user_id IS NOT NULL) OR (org_id IS NOT NULL))
Heap Blocks: exact=39490
-> BitmapOr (cost=6113.13..6113.13 rows=504118 width=0) (actual time=38.602..38.603 rows=0 loops=3)
-> Bitmap Index Scan on external_service_user_repos_idx (cost=0.00..5836.75 rows=502372 width=0) (actual time=38.509..38.509 rows=472829 loops=3)
Index Cond: (user_id IS NOT NULL)
-> Bitmap Index Scan on external_service_repos_org_id_idx (cost=0.00..24.38 rows=1746 width=0) (actual time=0.092..0.092 rows=1600 loops=3)
Index Cond: (org_id IS NOT NULL)
-> Seq Scan on user_public_repos (cost=0.00..40.90 rows=2090 width=4) (actual time=0.041..0.412 rows=2090 loops=3)
-> Parallel Hash (cost=341028.10..341028.10 rows=2830210 width=11) (actual time=1477.973..1477.974 rows=2254740 loops=3)
Buckets: 8388608 Batches: 1 Memory Usage: 382976kB
-> Parallel Seq Scan on gitserver_repos gr (cost=0.00..341028.10 rows=2830210 width=11) (actual time=0.014..661.638 rows=2254740 loops=3)
Planning Time: 1.929 ms
Execution Time: 7592.740 ms
(31 rows)
After
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=1164998.24..1207722.92 rows=366186 width=45) (actual time=5885.152..5900.653 rows=4913 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=1163998.21..1164455.94 rows=183093 width=45) (actual time=5879.589..5879.741 rows=1638 loops=3)
Sort Key: repo.stars DESC NULLS LAST
Sort Method: quicksort Memory: 260kB
Worker 0: Sort Method: quicksort Memory: 257kB
Worker 1: Sort Method: quicksort Memory: 244kB
-> Parallel Hash Join (cost=259916.77..1147993.85 rows=183093 width=45) (actual time=1061.089..5878.503 rows=1638 loops=3)
Hash Cond: (repo.id = gr.repo_id)
-> Parallel Seq Scan on repo (cost=104508.90..988268.68 rows=1644687 width=45) (actual time=286.767..4787.528 rows=995766 loops=3)
Filter: ((deleted_at IS NULL) AND (blocked IS NULL) AND (((stars >= 5) AND (NOT COALESCE(fork, false)) AND (NOT archived)) OR (lower((name)::text) ~ '^(src\.fedoraproject\.org|maven|npm|jdk)'::text) OR (hashed SubPlan 1)))
Rows Removed by Filter: 1258975
SubPlan 1
-> Append (cost=6113.13..103243.67 rows=506091 width=4) (actual time=35.820..253.044 rows=476519 loops=3)
-> Bitmap Heap Scan on external_service_repos (cost=6113.13..95632.31 rows=504001 width=4) (actual time=35.818..193.572 rows=474429 loops=3)
Recheck Cond: ((user_id IS NOT NULL) OR (org_id IS NOT NULL))
Heap Blocks: exact=39490
-> BitmapOr (cost=6113.13..6113.13 rows=504118 width=0) (actual time=28.704..28.705 rows=0 loops=3)
-> Bitmap Index Scan on external_service_user_repos_idx (cost=0.00..5836.75 rows=502372 width=0) (actual time=28.623..28.624 rows=472829 loops=3)
Index Cond: (user_id IS NOT NULL)
-> Bitmap Index Scan on external_service_repos_org_id_idx (cost=0.00..24.38 rows=1746 width=0) (actual time=0.078..0.079 rows=1600 loops=3)
Index Cond: (org_id IS NOT NULL)
-> Seq Scan on user_public_repos (cost=0.00..40.90 rows=2090 width=4) (actual time=0.032..0.428 rows=2090 loops=3)
-> Parallel Hash (cost=151488.03..151488.03 rows=313587 width=4) (actual time=601.020..601.025 rows=247423 loops=3)
Buckets: 1048576 Batches: 1 Memory Usage: 37248kB
-> Parallel Index Only Scan using gitserver_repos_not_cloned_status_idx on gitserver_repos gr (cost=0.42..151488.03 rows=313587 width=4) (actual time=0.163..507.651 rows=247423 loops=3)
Heap Fetches: 1081295
Planning Time: 1.356 ms
Execution Time: 5902.782 ms
(30 rows)
Test plan
Validated with query plans this change is good.