internal/repos: Extract seconds instead of epoch in backoffquery
Created by: indradhanush
Description
Here's the same query with the existing and the proposed change on dotcom:
Current query
dev-readonly@localhost:sg> -- source: internal/repos/metrics.go:src_repoupdater_errored_sync_jobs_total
SELECT extract(epoch from max(now() - last_sync_at))
FROM external_services AS es
WHERE deleted_at IS NULL
AND NOT cloud_default
AND last_sync_at IS NOT NULL
-- Exclude any external services that are currently syncing since it's possible they may sync for more
-- than our max backoff time.
AND NOT EXISTS(SELECT FROM external_service_sync_jobs WHERE external_service_id = es.id AND finished_at IS NULL);
+--------------+
| date_part |
|--------------|
| 50830.671107 |
+--------------+
SELECT 1
Time: 0.296s
Proposed change
dev-readonly@localhost:sg> -- source: internal/repos/metrics.go:src_repoupdater_errored_sync_jobs_total
SELECT extract(seconds from max(now() - last_sync_at))
FROM external_services AS es
WHERE deleted_at IS NULL
AND NOT cloud_default
AND last_sync_at IS NOT NULL
-- Exclude any external services that are currently syncing since it's possible they may sync for more
-- than our max backoff time.
AND NOT EXISTS(SELECT FROM external_service_sync_jobs WHERE external_service_id = es.id AND finished_at IS NULL);
+-------------+
| date_part |
|-------------|
| 19.014354 |
+-------------+
SELECT 1
Time: 0.288s
Why
The alert tied to this metric is firing here: https://opsg.in/a/i/sourcegraph/51c80ccd-78a9-4ea0-892d-8f8797627dad-1629947213048
A look at the graph for this metric reveals that the value is monotonically increasing over the last ~12 hours, even though looking at the repo-updater queue on dotcom, it is evident that sync jobs are running.
This change should fix the metric.
Our current approach used to work so far, but I am not sure what might have caused this to stop working. The proposed approach to show the difference in seconds
also feels more precise instead of extracting epoch
- which shows seconds since epoch.
Converting the value in the above example - 50830
to timestamp using https://www.epochconverter.com/ shows a timestamp very close to epoch itself.
COREAPP-237 #done