executor: Abandon orphaned jobs

Warren Gifford requested to merge ef/executor-unknown-jobs into main

Created by: efritz

This PR modifies the heartbeat interface between executors and the executor-queue. Previously, each executor sent its list of active job ids and the executor-queue simply logged and acknowledged it. Now, the executor-queue responds with the subset of those ids it does not recognize: jobs the executor claims to be processing but should not be.

This can occur if the executor-queue restarts while there are active jobs (the jobs are lost to the executor-queue, but the executor never cancelled its attempt).

This PR modifies the executor-queue to respond with the set of unknown ids on each heartbeat request, and modifies the executor to receive the unknown ids on a heartbeat and cancel the context associated with unknown jobs.

Fixes #21635 (closed). Review by commit.