Server-side execution of src action exec

Created by: eseliger

Related work on src-cli: (that code is even worse than in here, I started learning Go on the src-cli repo )

https://github.com/sourcegraph/src-cli/pull/139

Created by: eseliger

Follow-up to your review @mrnugget:

In this branch I implemented a prototype of running actions on multiple machines.

Actions, as we know them from src-cli already, do have the problem that they, in their current implementation, don't scale too well for very large campaigns. Therefore, I went ahead writing a prototype that adds Sourcegraph into the process of generating those patches, while maintaing the previous means of creating campaigns, because it's not always beneficial to run them via Sourcegraph (debugging an action locally, running a small campaign on a local machine).

For that I introduced these entities into the system:

Action, which represents one definition of an action to run, it holds the JSON defining the steps, and the associations to saved searches and campaigns.
ActionExecution, which is one instance of a src actions exec. It belongs to an action and inherits it's current steps definition. For an action, multiple executions can be triggered (a saved search triggers, a user manually wants to run the action again to update his campaign, ...). The parent action itself can be updated to fix issues, so across executions the definition may change, which is why it's saved "redundantly".
ActionJob, which is a single run of a steps definition over a single repository. What is known of src-cli as running the steps over many repositories, an ActionJob represents a single run over a repo, so all actionJobs of an execution represent what was previously one src actions exec command.

They relate to each other roughly like this:

Action 1--N ActionExecution 1--N ActionJob

A user creates an action (currently via the UI, a src actions exec -remote flag is to be added)
In the background, the action is created and therefrom, one execution is triggered.
Upon execution creation, the scope query from the saved config is evaluated and one job for each repo is created and attached to the execution.
Whenever a src actions runner is free to accept a job, it calls pullActionJob on the API, which, if any pending jobs exist, marks the job as running and returns all information needed for executing the defined steps.
Every 5 seconds, the logs buffer from the src actions runner is streamed to the API using appendLog, saving it to the actionJob. Those logs also appear in the UI for further inspection.
When a runner finished a job successfully, it calls updateActionJob, sets the status to completed and delivers any patches, should there have been some created.
The updateActionJob handler checks if the action job was the last to finish. If so, the execution itself has finished and is ready for further processing.
The createCampaignPlanFromPatches is invoked with the set of all patches for all jobs that were part of this execution.
Optional: If the parent action has a campaign associated, it is automatically updated with the newly generated plan.

Other optionals:

If an action_job failed, it is possible to try it by resetting it to the pending state. A runner will pick it up and, since the execution completed again, a new campaign plan will be created with the updated patches, should it have passed this time.

Alternative entrypoints:

The query-runner runs over the saved searches. If one triggers, the frontend is called at createActionExecutionsForSavedSearch.
The frontend checks if any Action is associated to that saved search, if so, a new action execution gets created with the invokation_reason saved search, in the UI that will appear as "This execution ran because saved search changed it's results".
as above

(Not yet implemented):

A periodic tasks checks the actions.schedule column in the DB if any scheduled executions should be triggered.
An execution is created for those actions.
as above

Job timeouts: We currently store runner_seen_at to use that as a kind of heartbeat. A periodic check should go by and mark those that were a long time without a heartbeat from a runner as canceled.
Job cancellation: Cancellation is supported in the frontend, but the src runner has to check that it's still a running task upon receiving the result of appendLog (which is also our heartbeat)

Created by: mrnugget

To reduce scope: would it be a lot of work to remove the parts that are related to scheduled actions and saved searches?

Created by: eseliger

No, scheduled actions are actually just having a db field to store it rn but no other code around it and saved searches have 1 api endpoint and 1 change to the query-runner, both can be pulled out easily

Created by: slimsag

Please avoid using the term 'remote code execution' as it's often associated with security vulnerabilities and e.g. I gasped when I got the email notification for this :)

Was there an RFC for this that I missed or is this the RFC itself so-to-speak? Where is the best place to leave feedback/questions/concerns?

Created by: eseliger

hey there this was a weekend project just to try out how much effort it would be to implement such a feature, so we didn't yet come up with an RFC unfortunately. I just keep this branch up to date from time to time when we do larger code changes to upstream, but we aim to get a gql schema in for 3.15 and the actual implementation for it for 3.16-ish (depends on workload coming up this iteration). Any feedback is super welcome and I think the best place for it would be on this PR directly at this point, until we decide the moment has come to complete this and I can sketch out the idea in an RFC. :)

Created by: slimsag

side note: query-runner is actually something I would like to remove entirely and put into the frontend as a singleton service running inside of there instead. Saved searches (which is what query-runner provides) is actually 100% managed by the frontend so query-runner is basically a dumb while loop with a sleep statement that performs all actions via the frontend's internal HTTP API: getting the defined saved searches, running them, sending email or slack notifications, everything is done through the frontend. So it's kinda crazy to have it in a separate query-runner service.

Given that, you should consider from first-principles if it makes sense to have campaign running logic scheduled in a separate service as if query-runner did not exist or if it should also be done as a singleton service or via DB-based scheduling within the frontend monolith. I suspect probably the latter makes more sense given this information.

Created by: mrnugget

I suspect a lot of things will change once we go from this PR to RFC to actual implementation, so I'll defer on getting into details, but I wanted to point out some high-level things that might make it easier to place this PR and understand what it is and isn't:

"Remote code execution": ignore the name, think of it as "server-side execution of src action exec", which is what this is about.
I wouldn't even say it's "arbitrary container execution" since the containers/commands that are executed are provided by the customers themselves. It's "arbitrary" from our perspective, but not from theirs.
The main idea is: move src action exec from the local machine to a beefier machine and control it from the Sourcegraph UI.

Think: buildkite-agent but for src-action-exec.

Right now src action exec simply shells out to either execute a docker run or to run a shell command (provided by the user!). The idea is to keep the same execution model, but move it to a different machine that connects to Sourcegraph and receives "jobs" from it.

The big question of what docker is being called is orthogonal to the changes in this PR, I think.

Our idea was that customers can just use any machine they want and launch src action-agent (or something like that). If they do that in a DIND environment or a separate machine doesn't matter, as long as src action-agent can (1) receive jobs from Sourcegraph (2) shell out and call docker.

Created by: slimsag

"Remote code execution": ignore the name, think of it as "server-side execution of src action exec", which is what this is about.

I understand I myself can ignore the name and that this is just about server-side execution of campaigns. My point here is that we should avoid putting 'remote code execution' into anything (pull requests, issues, docs) unless it's a security vulnerability. As an example, I do not want someone searching for "Sourcegraph remote code" to turn up results for "Spike on remote code execution" in Google because that looks scary. It's not a huge deal, but good to avoid in general is my point. Any objection to renaming the PR "server-side execution of src action exec" ?

Created by: eseliger

absolutely not!

Created by: codecov[bot]

Codecov Report

Merging #8574 into master will decrease coverage by 1.36%. The diff coverage is 20.44%.

@@            Coverage Diff             @@
##           master    #8574      +/-   ##
==========================================
- Coverage   44.26%   42.90%   -1.37%     
==========================================
  Files        1411     1399      -12     
  Lines       77383    77679     +296     
  Branches     6959     6794     -165     
==========================================
- Hits        34257    33331     -926     
- Misses      39907    41320    +1413     
+ Partials     3219     3028     -191

Flag	Coverage Δ
#go	`?`
#typescript	`?`
#unit	`42.90% <20.44%> (-1.37%)`

Impacted Files	Coverage Δ
cmd/frontend/graphqlbackend/campaigns.go	`0.00% <0.00%> (ø)`
cmd/query-runner/graphql.go	`0.00% <0.00%> (ø)`
cmd/query-runner/main.go	`0.00% <0.00%> (ø)`
...internal/campaigns/resolvers/action_definitions.go	`0.00% <0.00%> (ø)`
.../internal/campaigns/resolvers/action_executions.go	`0.00% <0.00%> (ø)`
...rprise/internal/campaigns/resolvers/action_jobs.go	`0.00% <0.00%> (ø)`
enterprise/internal/campaigns/resolvers/actions.go	`0.00% <0.00%> (ø)`
enterprise/internal/campaigns/resolvers/agents.go	`0.00% <0.00%> (ø)`
enterprise/internal/campaigns/workers.go	`49.14% <ø> (-0.28%)`
internal/campaigns/types.go	`9.80% <0.00%> (-1.97%)`
... and 418 more

Created by: chrispine

Closing because this is out of date at this point. Once we look at server-side execution again, we can use this as a reference.

Server-side execution of src action exec

Closed by (Jul 24, 2025 11:48am UTC)

Activity

In this branch I implemented a prototype of running actions on multiple machines.

Codecov Report

Server-side execution of src action exec

Merge request reports

Closed by (Jul 24, 2025 11:48am UTC)

Activity

In this branch I implemented a prototype of running actions on multiple machines.

Codecov Report