insights: query data aggregated over 12h

Administrator requested to merge sg/insights-aggregation into main

Created by: slimsag

Today, we query every data point that TimescaleDB has for a series. We record data points at a rate of 1 per 12h. However, duplicate data points can still appear at a faster rate than that:

  1. Each time Sourcegraph is started (repo-updater) we enqueue data points, so the interval between data points is not strictly deterministic.
  2. More importantly, if we record 1 data point per repository (# of search results per repository), we have many data points recorded at roughly the same time, and we need to aggregate them if we intend to display an insight for "total results across all repositories". This will be the case once we start backfilling data and once I implement recording of data points per repository (very soon).

Thus, this change makes us aggregate so we only ever get back 1 data point every 12h.
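
As a rough illustration of the idea (not the query this change actually uses), the sketch below floors each timestamp to the start of its 12h bucket and combines the values that land in the same bucket. The names `bucket_start` and `aggregate` are hypothetical, and summing the values is an assumption chosen to match the "total results across all repositories" case; in SQL, the same grouping would presumably be expressed with something like TimescaleDB's `time_bucket`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: buckets are 12h wide and aligned to the Unix epoch (UTC).
BUCKET = timedelta(hours=12)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, width=BUCKET):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    return ts - (ts - EPOCH) % width


def aggregate(points, width=BUCKET):
    """Collapse (timestamp, value) pairs into one summed point per bucket.

    Summing is an assumption for illustration; a different aggregate
    (e.g. an average) may be more appropriate for duplicate points.
    """
    totals = defaultdict(float)
    for ts, value in points:
        totals[bucket_start(ts, width)] += value
    # Return buckets in chronological order.
    return sorted(totals.items())


if __name__ == "__main__":
    utc = timezone.utc
    points = [
        (datetime(2021, 3, 1, 0, 5, tzinfo=utc), 10),   # repository A
        (datetime(2021, 3, 1, 0, 7, tzinfo=utc), 5),    # repository B
        (datetime(2021, 3, 1, 13, 0, tzinfo=utc), 7),   # next 12h window
    ]
    for start, total in aggregate(points):
        print(start.isoformat(), total)
```

Here the two points recorded a couple of minutes apart fall into the same 12h bucket and come back as a single point, while the third point lands in the following bucket.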

In the future, it may make sense for the web UI to ask for different granularity values - e.g. give me 1 point for every month in the last year. We can support this efficiently using pre-defined TimescaleDB continuous aggregates, but for now 12h is the minimum and default aggregation interval, which should be OK without a special index.

Signed-off-by: Stephen Gutekanst [email protected]
