codeintel: Add RepoUsageStatistics to db
Created by: efritz
Part of RFC 153.
Add a method to the codeintel db layer to determine the search-based and LSIF-based code intel event counts over the past week for each unique repo known to the instance.
This will be used by the auto indexer to determine which repos should be eligible for automatic LSIF indexing. The basic idea is that repos with a code intel event count over a threshold and a high enough search/(precise+search) ratio would benefit greatly from precise indexes.
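A minimal sketch of the shape this could take, assuming hypothetical field names (`SearchCount`, `PreciseCount`) and placeholder thresholds; only the `RepoUsageStatistics` name comes from this change, the rest is illustrative:

```go
package autoindex

// RepoUsageStatistics pairs a repository with its code intel event counts
// from the past week (field names are assumptions, not the actual schema).
type RepoUsageStatistics struct {
	RepositoryID int
	SearchCount  int // search-based code intel events
	PreciseCount int // precise (LSIF-based) code intel events
}

// shouldIndex captures the heuristic described above: enough total code intel
// activity, with a high enough share of it coming from search-based code intel.
// Both thresholds are placeholders to be tuned.
func shouldIndex(s RepoUsageStatistics, minEvents int, minSearchRatio float64) bool {
	total := s.SearchCount + s.PreciseCount
	if total < minEvents {
		return false
	}
	return float64(s.SearchCount)/float64(total) >= minSearchRatio
}
```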
Query plan for the new query:
```
                                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=428850.23..639314.06 rows=92757 width=20) (actual time=10122.697..11508.229 rows=4614 loops=1)
   ->  GroupAggregate  (cost=428849.67..443148.53 rows=92757 width=48) (actual time=10121.537..10874.261 rows=12988 loops=1)
         Group Key: ("substring"(event_logs.url, '//[^/]+/(.+)/-/'::text))
         ->  Sort  (cost=428849.67..431039.57 rows=875960 width=52) (actual time=10121.520..10670.309 rows=1014416 loops=1)
               Sort Key: ("substring"(event_logs.url, '//[^/]+/(.+)/-/'::text))
               Sort Method: external merge  Disk: 63240kB
               ->  Index Scan using event_logs_timestamp on event_logs  (cost=0.44..307314.71 rows=875960 width=52) (actual time=0.075..7587.788 rows=1014416 loops=1)
                     Index Cond: ("timestamp" >= (now() - '7 days'::interval))
   ->  Index Scan using repo_uri_idx on repo r  (cost=0.56..2.09 rows=1 width=39) (actual time=0.048..0.048 rows=0 loops=12988)
         Index Cond: (uri = (("substring"(event_logs.url, '//[^/]+/(.+)/-/'::text)))::citext)
 Planning Time: 0.447 ms
 Execution Time: 11536.736 ms
```
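For reference, a rough reconstruction of what the underlying query might look like, inferred from the plan above: the repo-name extraction and the join on repo.uri come straight from the plan, while the event-name patterns in the FILTER clauses are assumptions.

```go
package db

// repoUsageStatisticsQuery is a sketch reconstructed from the plan above, not
// a copy of the actual query. The event name patterns are assumptions.
const repoUsageStatisticsQuery = `
SELECT
	r.id,
	counts.search_count,
	counts.precise_count
FROM (
	SELECT
		substring(url, '//[^/]+/(.+)/-/') AS repo_name,
		COUNT(*) FILTER (WHERE name LIKE 'codeintel.search%') AS search_count,
		COUNT(*) FILTER (WHERE name LIKE 'codeintel.lsif%') AS precise_count
	FROM event_logs
	WHERE timestamp >= NOW() - INTERVAL '7 days'
	GROUP BY repo_name
) counts
JOIN repo r ON r.uri = counts.repo_name::citext
`
```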
This query will only be called periodically (probably on the order of a handful of times an hour) from the auto indexer service (coming soon), so it doesn't need to be terribly efficient; it just shouldn't waste resources needed by other processes.
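Building on the types in the sketch above, the periodic invocation from the auto indexer could look roughly like this (the interval, thresholds, and enqueue step are all placeholders):

```go
package autoindex

import (
	"context"
	"time"
)

// store abstracts the single db method this loop needs (hypothetical interface).
type store interface {
	RepoUsageStatistics(ctx context.Context) ([]RepoUsageStatistics, error)
}

// run refreshes usage statistics a few times an hour and queues eligible
// repos for automatic indexing.
func run(ctx context.Context, s store, interval time.Duration, enqueue func(repoID int)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		if stats, err := s.RepoUsageStatistics(ctx); err == nil {
			for _, stat := range stats {
				if shouldIndex(stat, 50, 0.5) { // placeholder thresholds
					enqueue(stat.RepositoryID)
				}
			}
		}

		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```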
Activity
Created by: efritz
Do you think it'd be worth adding a NULLable repo_id column to event_logs? Wondering how much faster the query would be if you grouped by that id and avoided all the string transformations.
I'm only doing it this way currently because this is the only place with this data, and I'd be against modifying the event_logs table to support this use case.
I'd rather speed this up by creating a second, more specialized event counts table that is written to conditionally when an event is recorded.
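If the specialized table route were taken, it might look roughly like the following sketch; the table name, columns, and upsert are hypothetical and only illustrate the suggestion:

```go
package db

// createCodeIntelEventCountsTable sketches a dedicated per-repo counter table
// (hypothetical name and schema) that would be maintained at event-write time
// instead of re-aggregating event_logs on every read.
const createCodeIntelEventCountsTable = `
CREATE TABLE codeintel_event_counts (
	repo_id       integer NOT NULL REFERENCES repo(id),
	week          date    NOT NULL,
	search_count  integer NOT NULL DEFAULT 0,
	precise_count integer NOT NULL DEFAULT 0,
	PRIMARY KEY (repo_id, week)
)
`

// upsertCodeIntelEventCount would run conditionally from the event insertion
// path whenever the event being written is a code intel event.
const upsertCodeIntelEventCount = `
INSERT INTO codeintel_event_counts (repo_id, week, search_count, precise_count)
VALUES ($1, date_trunc('week', NOW())::date, $2, $3)
ON CONFLICT (repo_id, week) DO UPDATE SET
	search_count  = codeintel_event_counts.search_count  + excluded.search_count,
	precise_count = codeintel_event_counts.precise_count + excluded.precise_count
`
```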