insights: modify historical dataframes to record at the start of each interval instead of in the middle
Created by: coury-clark
One more observation:
The intervals are not consistent as they approach the `created_at` time of a data series, which is also the time of the first indexed recording. For an underlying value that grows rapidly, a longer final interval can produce a visibly larger jump than the ones between older points. Here is a comparison of an "all repository insight" versus `sourcegraph/sourcegraph` for the same query over roughly the same time interval.
If we look at the raw datapoints inside the "all repos" version, scoped specifically to `sourcegraph/sourcegraph`, we can see the last historical recording was 2021-06-18.
time | value |
---|---|
2021-08-02 | 2524 |
2021-08-01 | 2524 |
2021-07-30 | 2524 |
2021-06-18 | 1568 |
2021-05-19 | 1143 |
2021-04-19 | 987 |
2021-03-20 | 865 |
2021-02-15 | 602 |
The logic behind skipping the last interval was to accommodate a potentially stale commit index, on the assumption that the indexed recording would occur close enough to the end of the last interval to keep the series consistent all the way through.
There are a few reasons this behaves strangely.
- The time intervals that would be built for a backfill starting on 2021-07-30 would have been queried close to the midpoint of each interval. #23515 (closed) presents a small bug in this behavior.
- Choosing the midpoint may already be an incorrect choice for a recording time. The intervals are constructed such that the starting time is the time at which we want to record, and each interval is the frame for which that value remains valid. This means we should also likely query for the most recent commit at or before a time, rather than the nearest commit.
I suspect we should do the following:
- Fix #23515 (closed)
- Modify the behavior of both the compressor and the backfiller to use the starting point of each frame as the recording time, and look for the most recent commit <= that time.
Originally posted by @coury-clark in https://github.com/sourcegraph/sourcegraph/issues/22401#issuecomment-892151451