job-runner MVP (heavily WIP still)

Administrator requested to merge sg/job-runner into main

Created by: slimsag

The goal here is to implement an MVP of the job runner proposed in RFC 152, one that is agnostic/independent of campaigns while serving campaigns as its first use case. The actual scope in practice is still being defined, but I view it as "the supporting infrastructure for running campaign jobs", while campaigns itself would provide the actual web UI for viewing things, the API and UI for creating campaigns, etc. It is not intended to conflict with https://github.com/sourcegraph/sourcegraph/pull/8574 but rather to be the "backend" that lets that change run in customer environments and on sourcegraph.com.

Eventually, I see this type of approach and https://github.com/sourcegraph/sourcegraph/pull/8574 converging into a single thing, where campaigns uses this approach behind the scenes and exposes logs, all interaction with it, etc. through its own interface. There would be no change in the way users interact with campaigns: this would be an implementation detail of how campaigns runs server-side, and other parts of Sourcegraph could later reuse the same infrastructure/logic.

Notable mentions:

  • The Docker image is automatically built and run as part of dev/start.sh
  • Changes to cmd/job-runner/*.go automatically rebuild the Docker image and re-run it, so you get live code reloading just like our other services.
  • Prometheus scrapes it for metrics in dev.
  • Health check handles rogue jobs
  • It exposes Prometheus metrics we can monitor (what would these be? at least CPU/memory/IO I guess) and a Grafana dashboard exists for it
  • The basic idea of the protocol (cmd/job-runner/protocol) is defined
  • Protocol is implemented
  • A format is defined for writing the commands we run (jsonc? yaml? bash script? a combination?), and it fits in well with campaign actions (since campaigns would need to use it)
  • There is a good way to send build context to the job runner, i.e. repository contents
  • There is a src command which provides a nice local-dev experience (100% local execution using the same job-runner service / docker image)
  • You can actually send it a job and get the results back, e.g. via curl or a src command.
  • Campaigns makes use of it.
  • Docker Compose out-of-the-box deployment:
    • Prometheus scraping
    • As isolated a network as possible, while still allowing outbound downloads
  • Kubernetes out-of-the-box deployment:
    • Prometheus scraping
    • As isolated a network as possible, while still allowing outbound downloads
  • Optional secure deployment:
    • With out-of-the-box deployment on sourcegraph.com, only we (site admins) can run jobs, and only ones we trust / have vetted.
    • With out-of-the-box deployment on Docker Compose / Kubernetes, only site admins can run jobs by default, but an option allows all users to submit jobs (they are trusted not to be malicious).
    • With optional secure deployment, we would have proper sandboxing (effectively the same codepath, just more infrastructure wrapping it), which allows arbitrary users to run arbitrary jobs. This is meant both for sourcegraph.com and for customers who want better security guarantees and are willing to pay the cost of setup.
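
To make the protocol items above more concrete, here is a rough sketch of what a job request and result could look like in Go. This is not what cmd/job-runner/protocol contains: JobRequest, JobResult, DecodeJobRequest and every field name are hypothetical, and JSON is assumed only for illustration since the command format is still an open question.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// JobRequest is a hypothetical request shape. The real protocol in
// cmd/job-runner/protocol may look entirely different.
type JobRequest struct {
	ID       string   `json:"id"`       // caller-chosen job identifier (assumed)
	Image    string   `json:"image"`    // Docker image to run the commands in (assumed)
	Commands []string `json:"commands"` // commands to run, in order (assumed)
}

// JobResult is a hypothetical response shape for a finished job.
type JobResult struct {
	ID       string `json:"id"`
	ExitCode int    `json:"exitCode"`
	Logs     string `json:"logs"`
}

// Validate rejects requests that the runner could not act on.
func (r JobRequest) Validate() error {
	if r.ID == "" {
		return errors.New("job id is required")
	}
	if len(r.Commands) == 0 {
		return errors.New("at least one command is required")
	}
	return nil
}

// DecodeJobRequest parses a JSON payload and validates it before the
// runner would accept the job.
func DecodeJobRequest(data []byte) (JobRequest, error) {
	var req JobRequest
	if err := json.Unmarshal(data, &req); err != nil {
		return JobRequest{}, err
	}
	if err := req.Validate(); err != nil {
		return JobRequest{}, err
	}
	return req, nil
}

func main() {
	payload := []byte(`{"id":"job-1","image":"alpine:3","commands":["echo hello"]}`)

	req, err := DecodeJobRequest(payload)
	if err != nil {
		fmt.Println("invalid job:", err)
		return
	}

	// A real runner would execute req.Commands here. This sketch only
	// builds a placeholder result to show the shape of the round trip.
	res := JobResult{ID: req.ID, ExitCode: 0, Logs: ""}
	out, err := json.Marshal(res)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

Validating at decode time means a malformed job is rejected before anything is scheduled, which would matter most for the deployments where non-admin users can submit jobs.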