
workerutil: Remove long-running transactions

Administrator requested to merge ef/heartbeat-worker into main

Created by: efritz

Overview

This PR changes the workerutil and dbworker to use a heartbeat update of a job record instead of a long-held transaction to signal an active worker. Fixes https://github.com/sourcegraph/sourcegraph/issues/14920.

Technical overview

  1. A nullable timestamp with time zone column, last_updated_at, was added to every table used as a worker job record. The associated Postgres views are not updated, as these columns do not need to be exposed outside of the dequeue code.
  2. The workerutil Store interface changed to replace its transactional store-like API with a simple cancel function returned alongside a dequeued record. This allows a backing store to be implemented by a transaction (as previously) or by some periodic process that ensures the record is still being serviced (as now). It also lets us simplify the Dequeue function and remove the Transact and Done methods.
  3. The dbworker Store implementation was updated. Instead of wrapping a record in a transaction via a two-stage optimistic locking mechanism, we use the non-transactional store to lock a record and periodically update a timestamp to signal that the record is still being processed. The returned cancel function stops this heartbeat goroutine once the record should be left alone (either because it has moved into a terminal state or because it can be re-processed).

Review by commit

The first commit contains all the shared changes and doesn't need to be reviewed by everyone, as long as you can confirm that your use of the worker interfaces still behaves as expected. Each team should check their own use to make sure I didn't accidentally break a guarantee. I've split changes specific to a team's worker setup into separate commits below.
