codeintel: remove redundant double `repo` table join
Created by: Strum355
Low-hanging-fruit PR that stops using the Postgres views `lsif_uploads_with_repository_name`
and `lsif_indexes_with_repository_name` in cases where we were also joining onto `repo`
for authz conds: the views already perform a join onto `repo` themselves, so those queries
were joining the table twice. Switching to the base table (with an explicit
`repo.deleted_at IS NULL` filter, which the view previously supplied) shaves between 1 and 3
seconds off the queries; for the example below, execution time drops from ~4986 ms to ~3674 ms
and shared buffer hits from ~10.3M to ~5.6M.
The biggest offenders are yet to be addressed (huge heap fetches from the index-only scan on `repo`).
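For context, the view presumably looks roughly like the following. This is a hedged sketch, not the actual migration (the real definition lives in the schema); it illustrates why an additional `JOIN repo` in the calling query is redundant:

```sql
-- Hypothetical sketch of the view definition. The view itself already joins
-- onto repo (and filters deleted repositories), so any caller that also does
-- "JOIN repo ON repo.id = u.repository_id" for authz conditions ends up
-- joining repo a second time.
CREATE VIEW lsif_indexes_with_repository_name AS
SELECT u.*, r.name AS repository_name
FROM lsif_indexes u
JOIN repo r ON r.id = u.repository_id
WHERE r.deleted_at IS NULL;
```

With the base table, the caller does the single `repo` join itself and must carry the `deleted_at IS NULL` filter explicitly, which is what the new query below does.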
Old Query + Plan
SELECT
u.id,
u.commit,
u.queued_at,
u.state,
u.failure_message,
u.started_at,
u.finished_at,
u.process_after,
u.num_resets,
u.num_failures,
u.repository_id,
u.repository_name,
u.docker_steps,
u.root,
u.indexer,
u.indexer_args,
u.outfile,
u.execution_logs,
s.rank,
u.local_steps,
(
SELECT MAX(id) FROM lsif_uploads WHERE associated_index_id = u.id
) AS associated_upload_id
FROM lsif_indexes_with_repository_name u
LEFT JOIN (
SELECT
r.id,
ROW_NUMBER() OVER (
ORDER BY COALESCE(r.process_after, r.queued_at), r.id
) as rank
FROM lsif_indexes_with_repository_name r
WHERE r.state = 'queued'
) s
ON u.id = s.id
JOIN repo ON repo.id = u.repository_id
ORDER BY queued_at DESC, u.id LIMIT 50;
Limit (cost=500273.03..500406.95 rows=50 width=577) (actual time=4877.716..4986.262 rows=50 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, r.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps, ((SubPlan 1))
Buffers: shared hit=10342226
-> Result (cost=500273.03..2317785.05 rows=678591 width=577) (actual time=4877.714..4986.252 rows=50 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, r.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps, (SubPlan 1)
Buffers: shared hit=10342226
-> Sort (cost=500273.03..501969.51 rows=678591 width=573) (actual time=4877.648..4986.059 rows=50 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, r.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps
Sort Key: u.queued_at DESC, u.id
Sort Method: top-N heapsort Memory: 78kB
Buffers: shared hit=10342076
-> Hash Left Join (cost=162778.90..477730.73 rows=678591 width=573) (actual time=2300.504..4449.476 rows=1007576 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, r.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps
Hash Cond: (u.id = s.id)
Buffers: shared hit=10342076
-> Gather (cost=162611.64..475018.46 rows=678591 width=565) (actual time=2300.077..4013.694 rows=1007576 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps, r.name
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=10341757
-> Nested Loop (cost=161611.64..406159.36 rows=282746 width=565) (actual time=2289.993..4171.644 rows=335859 loops=3)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps, r.name
Inner Unique: true
Buffers: shared hit=10341757
Worker 0: actual time=2285.207..4470.810 rows=394736 loops=1
Buffers: shared hit=3719316
Worker 1: actual time=2285.218..4461.963 rows=397046 loops=1
Buffers: shared hit=3728602
-> Parallel Hash Join (cost=161611.21..275512.27 rows=282746 width=569) (actual time=2289.971..2877.886 rows=335859 loops=3)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps, r.name, r.id
Inner Unique: true
Hash Cond: (u.repository_id = r.id)
Buffers: shared hit=5608147
Worker 0: actual time=2285.187..2968.299 rows=394736 loops=1
Buffers: shared hit=1865475
Worker 1: actual time=2285.198..2960.412 rows=397046 loops=1
Buffers: shared hit=1863186
-> Parallel Seq Scan on public.lsif_indexes u (cost=0.00..112800.73 rows=419173 width=521) (actual time=0.008..296.627 rows=335859 loops=3)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps
Buffers: shared hit=108609
Worker 0: actual time=0.008..345.063 rows=394736 loops=1
Buffers: shared hit=42332
Worker 1: actual time=0.009..345.255 rows=397046 loops=1
Buffers: shared hit=42561
-> Parallel Hash (cost=130930.73..130930.73 rows=2454438 width=48) (actual time=2271.239..2271.240 rows=1971750 loops=3)
Output: r.name, r.id
Buckets: 8388608 Batches: 1 Memory Usage: 505760kB
Buffers: shared hit=5499478
Worker 0: actual time=2285.025..2285.026 rows=1963275 loops=1
Buffers: shared hit=1823113
Worker 1: actual time=2285.041..2285.041 rows=1957885 loops=1
Buffers: shared hit=1820595
-> Parallel Index Only Scan using repo_non_deleted_id_name_idx on public.repo r (cost=0.43..130930.73 rows=2454438 width=48) (actual time=0.031..1356.792 rows=1971750 loops=3)
Output: r.name, r.id
Heap Fetches: 2199897
Buffers: shared hit=5499478
Worker 0: actual time=0.030..1352.703 rows=1963275 loops=1
Buffers: shared hit=1823113
Worker 1: actual time=0.039..1352.041 rows=1957885 loops=1
Buffers: shared hit=1820595
-> Index Only Scan using repo_pkey on public.repo (cost=0.43..0.46 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=1007576)
Output: repo.id
Index Cond: (repo.id = r.id)
Heap Fetches: 910427
Buffers: shared hit=4733610
Worker 0: actual time=0.003..0.003 rows=1 loops=394736
Buffers: shared hit=1853841
Worker 1: actual time=0.003..0.003 rows=1 loops=397046
Buffers: shared hit=1865416
-> Hash (cost=166.69..166.69 rows=45 width=16) (actual time=0.409..0.414 rows=69 loops=1)
Output: s.rank, s.id
Buckets: 1024 Batches: 1 Memory Usage: 12kB
Buffers: shared hit=319
-> Subquery Scan on s (cost=165.34..166.69 rows=45 width=16) (actual time=0.339..0.392 rows=69 loops=1)
Output: s.rank, s.id
Buffers: shared hit=319
-> WindowAgg (cost=165.34..166.24 rows=45 width=24) (actual time=0.338..0.376 rows=69 loops=1)
Output: u_1.id, row_number() OVER (?), (COALESCE(u_1.process_after, u_1.queued_at))
Buffers: shared hit=319
-> Sort (cost=165.34..165.45 rows=45 width=16) (actual time=0.328..0.337 rows=69 loops=1)
Output: u_1.id, (COALESCE(u_1.process_after, u_1.queued_at))
Sort Key: (COALESCE(u_1.process_after, u_1.queued_at)), u_1.id
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=319
-> Nested Loop (cost=0.86..164.10 rows=45 width=16) (actual time=0.068..0.293 rows=69 loops=1)
Output: u_1.id, COALESCE(u_1.process_after, u_1.queued_at)
Inner Unique: true
Buffers: shared hit=319
-> Index Scan using lsif_indexes_state on public.lsif_indexes u_1 (cost=0.42..59.15 rows=67 width=28) (actual time=0.053..0.102 rows=69 loops=1)
Output: u_1.id, u_1.process_after, u_1.queued_at, u_1.repository_id
Index Cond: (u_1.state = 'queued'::text)
Buffers: shared hit=33
-> Index Only Scan using repo_non_deleted_id_name_idx on public.repo r_1 (cost=0.43..1.57 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=69)
Output: r_1.id, r_1.name
Index Cond: (r_1.id = u_1.repository_id)
Heap Fetches: 67
Buffers: shared hit=286
SubPlan 1
-> Aggregate (cost=2.66..2.67 rows=1 width=4) (actual time=0.002..0.003 rows=1 loops=50)
Output: max(lsif_uploads.id)
Buffers: shared hit=150
-> Index Scan using lsif_uploads_associated_index_id on public.lsif_uploads (cost=0.42..2.65 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=50)
Output: lsif_uploads.id
Index Cond: (lsif_uploads.associated_index_id = u.id)
Buffers: shared hit=150
Planning Time: 1.442 ms
Execution Time: 4986.415 ms
New Query + Plan
SELECT
u.id,
u.commit,
u.queued_at,
u.state,
u.failure_message,
u.started_at,
u.finished_at,
u.process_after,
u.num_resets,
u.num_failures,
u.repository_id,
repo.name,
u.docker_steps,
u.root,
u.indexer,
u.indexer_args,
u.outfile,
u.execution_logs,
s.rank,
u.local_steps,
(
SELECT MAX(id) FROM lsif_uploads WHERE associated_index_id = u.id
) AS associated_upload_id
FROM lsif_indexes u
LEFT JOIN (
SELECT
r.id,
ROW_NUMBER() OVER (
ORDER BY COALESCE(r.process_after, r.queued_at), r.id
) as rank
FROM lsif_indexes_with_repository_name r
WHERE r.state = 'queued'
) s
ON u.id = s.id
JOIN repo ON repo.id = u.repository_id
WHERE repo.deleted_at IS NULL
ORDER BY queued_at DESC, u.id LIMIT 50;
Limit (cost=369625.94..369814.16 rows=50 width=577) (actual time=3589.991..3673.780 rows=50 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, repo.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps, ((SubPlan 1))
Buffers: shared hit=5608955
-> Result (cost=369625.94..2924013.65 rows=678591 width=577) (actual time=3589.990..3673.770 rows=50 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, repo.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps, (SubPlan 1)
Buffers: shared hit=5608955
-> Sort (cost=369625.94..371322.42 rows=678591 width=573) (actual time=3589.948..3673.475 rows=50 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, repo.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps
Sort Key: u.queued_at DESC, u.id
Sort Method: top-N heapsort Memory: 81kB
Buffers: shared hit=5608756
-> Hash Left Join (cost=162778.46..347083.64 rows=678591 width=573) (actual time=2349.443..3185.644 rows=1007727 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, repo.name, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, s.rank, u.local_steps
Hash Cond: (u.id = s.id)
Buffers: shared hit=5608756
-> Gather (cost=162611.21..344371.37 rows=678591 width=565) (actual time=2349.118..2750.451 rows=1007727 loops=1)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps, repo.name
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=5608630
-> Parallel Hash Join (cost=161611.21..275512.27 rows=282746 width=565) (actual time=2338.288..2866.322 rows=335909 loops=3)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps, repo.name
Inner Unique: true
Hash Cond: (u.repository_id = repo.id)
Buffers: shared hit=5608630
Worker 0: actual time=2333.120..3113.884 rows=488314 loops=1
Buffers: shared hit=1852067
Worker 1: actual time=2333.128..3103.832 rows=506976 loops=1
Buffers: shared hit=1911134
-> Parallel Seq Scan on public.lsif_indexes u (cost=0.00..112800.73 rows=419173 width=521) (actual time=0.013..275.659 rows=335909 loops=3)
Output: u.id, u.commit, u.queued_at, u.state, u.failure_message, u.started_at, u.finished_at, u.process_after, u.num_resets, u.num_failures, u.repository_id, u.docker_steps, u.root, u.indexer, u.indexer_args, u.outfile, u.execution_logs, u.local_steps
Buffers: shared hit=108609
Worker 0: actual time=0.015..408.241 rows=488314 loops=1
Buffers: shared hit=51959
Worker 1: actual time=0.015..406.191 rows=506976 loops=1
Buffers: shared hit=53738
-> Parallel Hash (cost=130930.73..130930.73 rows=2454438 width=48) (actual time=2318.496..2318.497 rows=1971791 loops=3)
Output: repo.name, repo.id
Buckets: 8388608 Batches: 1 Memory Usage: 505792kB
Buffers: shared hit=5499915
Worker 0: actual time=2332.972..2332.973 rows=1935991 loops=1
Buffers: shared hit=1800055
Worker 1: actual time=2332.939..2332.940 rows=1997412 loops=1
Buffers: shared hit=1857343
-> Parallel Index Only Scan using repo_non_deleted_id_name_idx on public.repo (cost=0.43..130930.73 rows=2454438 width=48) (actual time=0.033..1381.237 rows=1971791 loops=3)
Output: repo.name, repo.id
Heap Fetches: 2200449
Buffers: shared hit=5499915
Worker 0: actual time=0.047..1377.731 rows=1935991 loops=1
Buffers: shared hit=1800055
Worker 1: actual time=0.029..1378.511 rows=1997412 loops=1
Buffers: shared hit=1857343
-> Hash (cost=166.69..166.69 rows=45 width=16) (actual time=0.301..0.305 rows=15 loops=1)
Output: s.rank, s.id
Buckets: 1024 Batches: 1 Memory Usage: 9kB
Buffers: shared hit=126
-> Subquery Scan on s (cost=165.34..166.69 rows=45 width=16) (actual time=0.279..0.294 rows=15 loops=1)
Output: s.rank, s.id
Buffers: shared hit=126
-> WindowAgg (cost=165.34..166.24 rows=45 width=24) (actual time=0.277..0.287 rows=15 loops=1)
Output: u_1.id, row_number() OVER (?), (COALESCE(u_1.process_after, u_1.queued_at))
Buffers: shared hit=126
-> Sort (cost=165.34..165.45 rows=45 width=16) (actual time=0.266..0.269 rows=15 loops=1)
Output: u_1.id, (COALESCE(u_1.process_after, u_1.queued_at))
Sort Key: (COALESCE(u_1.process_after, u_1.queued_at)), u_1.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=126
-> Nested Loop (cost=0.86..164.10 rows=45 width=16) (actual time=0.090..0.255 rows=15 loops=1)
Output: u_1.id, COALESCE(u_1.process_after, u_1.queued_at)
Inner Unique: true
Buffers: shared hit=126
-> Index Scan using lsif_indexes_state on public.lsif_indexes u_1 (cost=0.42..59.15 rows=67 width=28) (actual time=0.072..0.181 rows=15 loops=1)
Output: u_1.id, u_1.process_after, u_1.queued_at, u_1.repository_id
Index Cond: (u_1.state = 'queued'::text)
Buffers: shared hit=59
-> Index Only Scan using repo_non_deleted_id_name_idx on public.repo r (cost=0.43..1.57 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=15)
Output: r.id, r.name
Index Cond: (r.id = u_1.repository_id)
Heap Fetches: 13
Buffers: shared hit=67
SubPlan 1
-> Aggregate (cost=3.74..3.75 rows=1 width=4) (actual time=0.004..0.004 rows=1 loops=50)
Output: max(lsif_uploads.id)
Buffers: shared hit=199
-> Index Scan using lsif_uploads_associated_index_id on public.lsif_uploads (cost=0.42..3.74 rows=2 width=4) (actual time=0.003..0.004 rows=0 loops=50)
Output: lsif_uploads.id
Index Cond: (lsif_uploads.associated_index_id = u.id)
Buffers: shared hit=199
Planning Time: 1.430 ms
Execution Time: 3673.916 ms
Test plan
Covered by existing tests. The queries were also run against a production database to validate at scale.
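For anyone reproducing the measurements, the plans above appear to have been captured with `EXPLAIN (ANALYZE, VERBOSE, BUFFERS)`. The exact flag set is an inference from the `Output:` and `Buffers:` lines in the pasted plans, and the query shown here is a trimmed illustration rather than the full query from this PR:

```sql
-- Presumed invocation (an assumption, inferred from the plan output above):
-- ANALYZE actually executes the query to get real timings, VERBOSE prints the
-- "Output:" lines, and BUFFERS prints the shared-hit counters.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT u.id, u.queued_at, repo.name
FROM lsif_indexes u
JOIN repo ON repo.id = u.repository_id
WHERE repo.deleted_at IS NULL
ORDER BY u.queued_at DESC, u.id
LIMIT 50;
```

Note that `ANALYZE` runs the statement for real, so read-only queries are safe to profile this way against production, but anything with side effects should be wrapped in a transaction that is rolled back.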