insights: historical backfiller unnecessarily repeats many queries
Created by: coury-clark
The historical backfiller for code insights analyzes data frames to determine which frames need to be sampled.
Once the analyzer finds frames that need to be queried, it queues a task for each one. These tasks can take significantly longer to run than the frame analysis itself, so the analyzer runs again before they finish and many of them end up duplicated in the queue. Currently on dogfood-k8s there are approximately 112 million queued jobs for only 3 defined insights.
The analyzer does check whether a point already exists in the time frame, but this check is ineffective in two cases:
- The frame could be queued but not yet executed
- The frame could have been skipped as part of an optimization pass. In that case the value at that frame should be the most recent observation (which may exist), but the frame itself has no point, so the check does not see it as covered
This is a problem because any newly defined insight must wait for all of those queued records to clear before it starts filling, and because the duplicate queries are unnecessary work for the cluster.
Ideally we should only need to perform one search per frame per insight series.