Skip to content

internal/repos: Extract seconds instead of epoch in backoffquery

Warren Gifford requested to merge ig/COREAPP-237 into main

Created by: indradhanush

Description

Here's the same query with the existing and the proposed change on dotcom:

Current query

dev-readonly@localhost:sg> -- source: internal/repos/metrics.go:src_repoupdater_errored_sync_jobs_total
 SELECT extract(epoch from max(now() - last_sync_at))
 FROM external_services AS es
 WHERE deleted_at IS NULL
 AND NOT cloud_default
 AND last_sync_at IS NOT NULL
 -- Exclude any external services that are currently syncing since it's possible they may sync for more
 -- than our max backoff time.
 AND NOT EXISTS(SELECT FROM external_service_sync_jobs WHERE external_service_id = es.id AND finished_at IS NULL);
+--------------+
| date_part    |
|--------------|
| 50830.671107 |
+--------------+
SELECT 1
Time: 0.296s

Proposed change

dev-readonly@localhost:sg> -- source: internal/repos/metrics.go:src_repoupdater_errored_sync_jobs_total
 SELECT extract(seconds from max(now() - last_sync_at))
 FROM external_services AS es
 WHERE deleted_at IS NULL
 AND NOT cloud_default
 AND last_sync_at IS NOT NULL
 -- Exclude any external services that are currently syncing since it's possible they may sync for more
 -- than our max backoff time.
 AND NOT EXISTS(SELECT FROM external_service_sync_jobs WHERE external_service_id = es.id AND finished_at IS NULL);
+-------------+
| date_part   |
|-------------|
| 19.014354   |
+-------------+
SELECT 1
Time: 0.288s

Why

The alert tied to this metric is firing here: https://opsg.in/a/i/sourcegraph/51c80ccd-78a9-4ea0-892d-8f8797627dad-1629947213048

image

A look at the graph for this metric reveals that the value is monotonically increasing over the last ~12 hours, even though looking at the repo-updater queue on dotcom, it is evident that sync jobs are running.

This change should fix the metric.

Our current approach used to work so far, but I am not sure what might have caused this to stop working. The proposed approach to show the difference in seconds also feels more precise instead of extracting epoch - which shows seconds since epoch.

Converting the value in the above example - 50830 to timestamp using https://www.epochconverter.com/ shows a timestamp very close to epoch itself.

COREAPP-237 #done

Merge request reports

Loading