Add `BatchSpecWorkspaces` and `BatchSpecResolutionJobs`
Created by: mrnugget
What?
This is part of #24421 (closed) and adds three new database tables to our batch changes system:
- `batch_spec_resolution_jobs`: these are worker jobs that will be created through the GraphQL API together with a `batch_spec` when a user wants to kick off a server-side execution. (The GraphQL part is not done yet.) Once a `batch_spec_resolution_job` is created, a worker picks it up, loads the corresponding `batch_spec` and resolves its `on` part into `RepoWorkspaces`: a combination of repository, commit, path, steps, branch, etc. For each `RepoWorkspace` it creates a `batch_spec_workspace` in the database.
- `batch_spec_workspace`: each `batch_spec_workspace` represents a unit of work for a `src batch exec` invocation inside the executor. Once `src batch exec` has executed successfully, these `batch_spec_workspaces` will contain references to `changeset_specs`, and those in turn will be updated to point to the `batch_spec` that kicked all of this off.
- `batch_spec_workspace_execution_jobs`: these are the worker jobs that get picked up by the executor and lead to `src batch exec` being called. Each `batch_spec_workspace_execution_job` points to one `batch_spec_workspace`. This extra table lets us separate the workspace data from the execution of `src batch exec`. We split these into two tables because we ran into tricky concurrency problems where workers were modifying table rows that the GraphQL layer was reading (or even modifying).
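A minimal Go sketch of how the three tables relate and how one resolution job fans out into workspaces. Every type, field, and function here is a hypothetical simplification, not the real schema or worker code: the repository search is stubbed as the `resolveOn` parameter, and creating execution jobs in a separate `enqueueExecution` step (rather than inside the resolution worker) is an assumption.

```go
package main

import "fmt"

// RepoWorkspace is a hypothetical, simplified version of what resolving a
// batch spec's `on` part yields: one unit of work per repository/path.
type RepoWorkspace struct {
	Repo   string
	Commit string
	Path   string
	Steps  []string
}

// Hypothetical row types for the three new tables; field names are
// illustrative assumptions, not the real schema.
type BatchSpecResolutionJob struct {
	ID          int64
	BatchSpecID int64
	State       string
}

type BatchSpecWorkspace struct {
	ID          int64
	BatchSpecID int64
	RepoWorkspace
	// Filled in once `src batch exec` has succeeded for this workspace.
	ChangesetSpecIDs []int64
}

type BatchSpecWorkspaceExecutionJob struct {
	ID                   int64
	BatchSpecWorkspaceID int64 // each job points to exactly one workspace
	State                string
}

// handleResolutionJob sketches the resolution worker: resolve the batch
// spec into RepoWorkspaces and store one batch_spec_workspace per result.
// resolveOn is a hypothetical stand-in for the repository search.
func handleResolutionJob(job *BatchSpecResolutionJob, resolveOn func(batchSpecID int64) []RepoWorkspace) []BatchSpecWorkspace {
	var workspaces []BatchSpecWorkspace
	for i, rw := range resolveOn(job.BatchSpecID) {
		workspaces = append(workspaces, BatchSpecWorkspace{
			ID:            int64(i + 1),
			BatchSpecID:   job.BatchSpecID,
			RepoWorkspace: rw,
		})
	}
	job.State = "completed"
	return workspaces
}

// enqueueExecution creates one execution job per workspace. Job state
// (queued, processing, ...) lives in its own table, apart from the
// workspace data.
func enqueueExecution(workspaces []BatchSpecWorkspace) []BatchSpecWorkspaceExecutionJob {
	jobs := make([]BatchSpecWorkspaceExecutionJob, 0, len(workspaces))
	for i, ws := range workspaces {
		jobs = append(jobs, BatchSpecWorkspaceExecutionJob{
			ID:                   int64(i + 1),
			BatchSpecWorkspaceID: ws.ID,
			State:                "queued",
		})
	}
	return jobs
}

func main() {
	resolveOn := func(batchSpecID int64) []RepoWorkspace {
		return []RepoWorkspace{
			{Repo: "github.com/example/a", Commit: "abc123", Path: "", Steps: []string{"echo hi"}},
			{Repo: "github.com/example/b", Commit: "def456", Path: "sub", Steps: []string{"echo hi"}},
		}
	}
	job := &BatchSpecResolutionJob{ID: 1, BatchSpecID: 42, State: "queued"}
	workspaces := handleResolutionJob(job, resolveOn)
	jobs := enqueueExecution(workspaces)
	fmt.Println(job.State, len(workspaces), len(jobs)) // prints "completed 2 2"
}
```

Keeping the execution job as a thin row that only points at its workspace means job bookkeeping can change without rewriting the workspace data that other layers read.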
Here's a rough drawing showing the relationships:
Motivation
Why change the data model? Why not keep `batch_spec_executions`?
When planning for #24421 (closed) we realised that we should tackle concurrent execution of workspaces sooner rather than later: it's one of the big "wow" SSBC features, and we knew that to get concurrent execution working we'd need a separate database entry for each workspace so that different executors could pick them up. That would in turn require us to change the API/UI, because instead of a single `batch_spec_execution` with log entries we'd likely have sub-nodes under it, each with its own log entries, etc.
Since we'd have to change it anyway, we decided to get it right from the start and, by doing so, also "solve" the idea of persisted searches: the user sees a preview of the workspaces in the UI, then clicks "run", and the search that yielded those workspaces doesn't have to be executed again. This gives the user a WYSIWYG experience of SSBC.
In other words: we knew that a single table entry for `batch_spec_executions` wouldn't cut it at some point, so we implemented persistent repo resolution and concurrent workspace execution in the same step.
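Why one row per workspace enables concurrency can be sketched in Go: several executors drain a shared queue of execution jobs, and each job is handed to exactly one of them. The mutex-guarded in-memory `jobQueue` and `runExecutors` are hypothetical stand-ins for a database-backed queue of `batch_spec_workspace_execution_jobs`; this does not show how the real workers dequeue jobs.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// jobQueue is a hypothetical in-memory stand-in for a table of queued
// batch_spec_workspace_execution_jobs.
type jobQueue struct {
	mu     sync.Mutex
	queued []int64 // IDs of queued execution jobs
}

// dequeue hands each queued job to exactly one caller.
func (q *jobQueue) dequeue() (int64, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.queued) == 0 {
		return 0, false
	}
	id := q.queued[0]
	q.queued = q.queued[1:]
	return id, true
}

// runExecutors starts n executors that drain the queue concurrently and
// returns the sorted IDs of all jobs that were executed.
func runExecutors(q *jobQueue, n int) []int64 {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done []int64
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				id, ok := q.dequeue()
				if !ok {
					return
				}
				// A real executor would run `src batch exec` for the
				// job's workspace here.
				mu.Lock()
				done = append(done, id)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	sort.Slice(done, func(i, j int) bool { return done[i] < done[j] })
	return done
}

func main() {
	q := &jobQueue{queued: []int64{1, 2, 3, 4, 5}}
	fmt.Println(runExecutors(q, 3)) // prints [1 2 3 4 5]
}
```

With a single execution row per batch spec there would be nothing to split across executors; one row per workspace is what lets the work be divided.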
Notes for reviewers
Note that this PR only contains the database/worker layers and is the first of multiple PRs. See #24421 (closed) for what will still be built (e.g. cancellation). I wanted to keep these PRs reviewable.
Yes, there is a lot of duplication between `batch_spec_workspace_execution_jobs` and `batch_spec_executions` (see `transform.go`). That's on purpose! The goal is to remove `batch_spec_executions` completely once we've switched over to this model. Until then, it's easier to build the new model alongside the existing one than to try to merge them and keep both old and new working.