Skip to content
Snippets Groups Projects

codeintel: Add RepoUsageStatistics to db

Merged Administrator requested to merge pci-repo-usage-stats into master

Created by: efritz

Part of RFC 153.

Add a method to the codeintel db layer to determine the search and lsif-based code intel event counts in the past week for each unique repo known to the instance.

This will be used by the auto indexer to determine which repos should be eligible for automatic LSIF indexing. The basic idea is that things with a code intel event count over a threshold and a high enough search/(precise+search) ratio would benefit greatly from precise indexes.

Query plan of new query:

                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=428850.23..639314.06 rows=92757 width=20) (actual time=10122.697..11508.229 rows=4614 loops=1)
   ->  GroupAggregate  (cost=428849.67..443148.53 rows=92757 width=48) (actual time=10121.537..10874.261 rows=12988 loops=1)
         Group Key: ("substring"(event_logs.url, '//[^/]+/(.+)/-/'::text))
         ->  Sort  (cost=428849.67..431039.57 rows=875960 width=52) (actual time=10121.520..10670.309 rows=1014416 loops=1)
               Sort Key: ("substring"(event_logs.url, '//[^/]+/(.+)/-/'::text))
               Sort Method: external merge  Disk: 63240kB
               ->  Index Scan using event_logs_timestamp on event_logs  (cost=0.44..307314.71 rows=875960 width=52) (actual time=0.075..7587.788 rows=1014416 loops=1)
                     Index Cond: ("timestamp" >= (now() - '7 days'::interval))
   ->  Index Scan using repo_uri_idx on repo r  (cost=0.56..2.09 rows=1 width=39) (actual time=0.048..0.048 rows=0 loops=12988)
         Index Cond: (uri = (("substring"(event_logs.url, '//[^/]+/(.+)/-/'::text)))::citext)
 Planning Time: 0.447 ms
 Execution Time: 11536.736 ms

This query will only be called periodically (probably on the order of a handful of times an hour) from the auto indexer service (coming soon) so doesn't need to be terribly efficient (just not waste resources of other processes).

Merge request reports

Approval is optional
Ready to merge by members who can write to the target branch.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Created by: tsenart

    Do you think it'd be worth adding a NULLable repo_id column to event_logs? Wondering how much faster the query if you'd group by that id and would avoid all the string transformations.

  • Created by: efritz

    Do you think it'd be worth adding a NULLable repo_id column to event_logs? Wondering how much faster the query if you'd group by that id and would avoid all the string transformations.

    I'm only doing it this way currently as this is the only place with this data and I'd be against modifying the event_logs table to support this use case.

    I'd rather speed this up by creating a second more specialized event counts table that is written conditionally when an event is written.

  • Created by: efritz

    Also currently this is a sloppy heuristic that we're using for a v0, so speed (nor stability long-term) is paramount.

  • Created by: tsenart

    Also currently this is a sloppy heuristic that we're using for a v0, so speed (nor stability long-term) is paramount.

    This all sounds good to me then.

Please register or sign in to reply
Loading