ssbc: Fix duplicate changeset specs
Created by: eseliger
Okay, this has been quite a journey to discover. It happened occasionally on k8s when updates were rolling out. Here's what happens:
- Executor
A
has jobZ
- Frontend goes down, executor
A
cannot send heartbeats anymore - Worker resets job to queued, since it is considered “lost”, executor
A
doesn't know about this yet, as it hasn't had a successful heartbeat again so far - Executor
A
continues to work as frontend is down - Frontend comes back up slowly
- Executor
B
gets assigned jobZ
, overwriting theworker_hostname
column thus now owning the job - Executor
A
is done, and has not yet learned that it doesn’t own the record anymore (no successful heartbeat yet) - Executor
A
callsMarkComplete({workerHostname: A})
- In the backend, we first do our side-effects and then call
dbworkerstore.MarkComplete
. This returnsfalse, nil
(not updated, but no DB err) - We don't look at the
ok
value, commit the transaction sinceerr
isnil
, the changeset spec is committed to the DB - Executor
A
gives up - Executor
B
completes work, callsMarkComplete({workerHostname: B})
, this time returningtrue, nil
, since the hostname matches - The
workspace.changeset_specs
field is overwritten and includes the new ID, but the old stray changeset spec remains - We get the conflicting changeset specs error in the UI, but only 1 result is displayed on the workspace, just as observed
We now properly revert the transactions, test this in code and also do one query less, which was not needed.
Closes https://github.com/sourcegraph/sourcegraph/issues/36918
Test plan
Extended test suite.