
Automatic workspace discovery

Warren Gifford requested to merge mrn/workspace-discovery into main

Created by: mrnugget

What?

This adds automatic workspace discovery to src-cli. It allows users to

  1. define where projects in large repositories live and
  2. run the campaign spec's steps in those project folders, turning them into workspaces.

How?

Users define workspaces like so:

workspaces:
  - rootAtLocationOf: go.mod
    in: github.com/sourcegraph/sourc*

That means: in every repository whose name starts with github.com/sourcegraph/sourc, each folder that contains a go.mod file is treated as the root of a project, and those folders are used as workspaces for the execution of the campaign spec's steps.

src-cli uses Sourcegraph search under the hood to search for the locations of the rootAtLocationOf file, which means it doesn't need to download the repository first and search the file system.
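
As a rough illustration of the step that follows the search, the sketch below turns a list of matched file paths into workspace directories. It does not call the Sourcegraph API; the function name `workspaceDirs` and the sample paths are hypothetical, and the root folder is represented as an empty path to match how `steps.path` behaves there.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// workspaceDirs turns the paths of matched files (relative to the repository
// root, as a search might return them) into a sorted, de-duplicated list of
// workspace directories. The repository root is represented as "".
// Hypothetical helper, not the src-cli implementation.
func workspaceDirs(filePaths []string) []string {
	seen := make(map[string]bool)
	var dirs []string
	for _, p := range filePaths {
		dir := path.Dir(p)
		if dir == "." {
			dir = "" // the file sits in the repository root
		}
		if !seen[dir] {
			seen[dir] = true
			dirs = append(dirs, dir)
		}
	}
	sort.Strings(dirs)
	return dirs
}

func main() {
	// Made-up search results for `rootAtLocationOf: go.mod`.
	matches := []string{"go.mod", "lib/go.mod", "internal/tools/go.mod"}
	fmt.Printf("%q\n", workspaceDirs(matches))
}
```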

workspaces can also contain multiple definitions, matching different repositories (but a repository cannot be matched by multiple definitions):

on:
  - repositoriesMatchingQuery: repo:automation-testing$
  - repository: github.com/sourcegraph/sourcegraph

workspaces:
  - rootAtLocationOf: go.mod
    in: github.com/sourcegraph/sourcegraph*
  - rootAtLocationOf: package.json
    in: "*automation-testing"

Since multiple workspaces per repository means that multiple changesets will be produced in a single repository, the changesetTemplate.branch needs to use templating to avoid name clashes.

For that, users can access the template variable steps.path and use helper functions to generate a unique branch name per changeset. Example:

changesetTemplate:
  # [...]
  branch: ${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
  published: false
  commit:
    message: Automatic workspace discovery

(The join_if and replace helpers are new. join_if joins the given list of strings, ignoring blank strings. replace is an alias for strings.ReplaceAll.)
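
To make the behaviour of the two helpers concrete, here is a minimal sketch that renders the branch template above with Go's text/template. It makes several assumptions: "blank" is taken to mean the empty string, `steps` is exposed as a template function returning a map so that `steps.path` resolves, and the names `joinIf` and `renderBranch` are hypothetical rather than the names used in src-cli.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// joinIf joins elems with sep, skipping empty elements.
// Assumption: "blank" means the empty string.
func joinIf(sep string, elems ...string) string {
	var kept []string
	for _, e := range elems {
		if e != "" {
			kept = append(kept, e)
		}
	}
	return strings.Join(kept, sep)
}

// renderBranch renders a branch template for the workspace at workspacePath.
func renderBranch(tmpl, workspacePath string) (string, error) {
	funcs := template.FuncMap{
		"join_if": joinIf,
		"replace": strings.ReplaceAll,
		// Assumption: `steps` is a function returning a map, which lets
		// the template write `steps.path` without a leading dot.
		"steps": func() map[string]string {
			return map[string]string{"path": workspacePath}
		},
	}
	t, err := template.New("branch").Delims("${{", "}}").Funcs(funcs).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	tmpl := `${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}`
	for _, p := range []string{"", "lib/batches"} {
		branch, err := renderBranch(tmpl, p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("path %q -> branch %q\n", p, branch)
	}
}
```

In the root folder the path is empty, so join_if drops it and no trailing "-" ends up in the branch name.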

Users can, of course, also use other ways to generate a unique branch name per changeset. With outputs, for example:

steps:
  # [...]
  - run:  if [[ -f "package.json" ]]; then cat package.json | jq -j .name; fi
    container: jiapantw/jq-alpine:latest
    outputs:
      projectName:
        value: ${{ step.stdout }}

changesetTemplate:
  # [...]
  branch: thorsten/workspace-discovery-${{ outputs.projectName }}

Or, in combination:

steps:
  # [...]
  - run:  if [[ -f "package.json" ]]; then cat package.json | jq -j .name; fi
    container: jiapantw/jq-alpine:latest
    outputs:
      projectName:
        value: ${{ step.stdout }}

changesetTemplate:
  # If we have an `outputs.projectName` we use it, otherwise we append the path
  # of the workspace. If the path is empty (as is the case in the root folder),
  # we ignore it.
  branch: |
    ${{ if eq outputs.projectName "" }}
    ${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
    ${{ else }}
    thorsten/workspace-discovery-${{ outputs.projectName }}
    ${{ end }}
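
The combined template spans several lines, so its raw rendering contains newlines around the branch name. The sketch below renders it with Go's text/template and trims the surrounding whitespace. The trimming, the `steps` and `outputs` template functions, and the name `renderBranch` are all assumptions made for illustration, not a statement of how src-cli does it.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// branchTemplate is the multi-line branch template from the spec above.
const branchTemplate = `${{ if eq outputs.projectName "" }}
${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
${{ else }}
thorsten/workspace-discovery-${{ outputs.projectName }}
${{ end }}
`

// renderBranch renders branchTemplate for one workspace.
// Hypothetical helper; the final TrimSpace is an assumption.
func renderBranch(projectName, workspacePath string) (string, error) {
	funcs := template.FuncMap{
		// Joins the non-empty elements with sep.
		"join_if": func(sep string, elems ...string) string {
			var kept []string
			for _, e := range elems {
				if e != "" {
					kept = append(kept, e)
				}
			}
			return strings.Join(kept, sep)
		},
		"replace": strings.ReplaceAll,
		"steps": func() map[string]string {
			return map[string]string{"path": workspacePath}
		},
		"outputs": func() map[string]string {
			return map[string]string{"projectName": projectName}
		},
	}
	t, err := template.New("branch").Delims("${{", "}}").Funcs(funcs).Parse(branchTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := t.Execute(&out, nil); err != nil {
		return "", err
	}
	// Drop the newlines that surround the rendered branch name.
	return strings.TrimSpace(out.String()), nil
}

func main() {
	cases := [][2]string{{"my-app", "web"}, {"", "lib/batches"}, {"", ""}}
	for _, c := range cases {
		branch, err := renderBranch(c[0], c[1])
		if err != nil {
			panic(err)
		}
		fmt.Printf("projectName=%q path=%q -> %q\n", c[0], c[1], branch)
	}
}
```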

Details & Edge Cases

  • Repositories matching multiple workspace definitions produce an error.
  • If a repository is yielded by on and not matched by any workspaces.in glob, the steps will be executed in its root folder.
  • If a repository is yielded by on and matches a workspaces.in glob, but no folder in it contains the file named in rootAtLocationOf, then the steps won't be executed in that repository.
  • If a repository contains multiple workspaces where one is a subdirectory of another, then the steps will be executed in both, and the parent workspace also includes the subdirectory. I think this is the most intuitive choice.
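
The first three rules can be sketched as a small resolution function. This is a simplified model under stated assumptions: the glob matcher only understands `*` (matching any run of characters, including `/`), the search is replaced by a caller-supplied `findDirs` function, and the names `WorkspaceDef`, `globMatch` and `resolveWorkspaces` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// WorkspaceDef mirrors one entry of `workspaces` in the campaign spec.
type WorkspaceDef struct {
	RootAtLocationOf string
	In               string
}

// globMatch reports whether name matches pattern, where '*' matches any run
// of characters (including '/'). Simplified stand-in for a real glob library.
func globMatch(pattern, name string) bool {
	parts := strings.Split(pattern, "*")
	for i, p := range parts {
		parts[i] = regexp.QuoteMeta(p)
	}
	re := regexp.MustCompile("^" + strings.Join(parts, ".*") + "$")
	return re.MatchString(name)
}

// resolveWorkspaces returns the directories in which the steps would run for
// repo. findDirs stands in for the search that locates the directories
// containing a given file; "" denotes the repository root.
func resolveWorkspaces(repo string, defs []WorkspaceDef, findDirs func(repo, file string) []string) ([]string, error) {
	var matched []WorkspaceDef
	for _, d := range defs {
		if globMatch(d.In, repo) {
			matched = append(matched, d)
		}
	}
	switch len(matched) {
	case 0:
		// Not matched by any definition: run in the root folder.
		return []string{""}, nil
	case 1:
		// May be empty, in which case the steps are not executed at all.
		return findDirs(repo, matched[0].RootAtLocationOf), nil
	default:
		return nil, fmt.Errorf("repository %s matches %d workspace definitions", repo, len(matched))
	}
}

func main() {
	defs := []WorkspaceDef{
		{RootAtLocationOf: "go.mod", In: "github.com/sourcegraph/sourcegraph*"},
		{RootAtLocationOf: "package.json", In: "*automation-testing"},
	}
	// Made-up search results: only one repository contains go.mod files.
	find := func(repo, file string) []string {
		if repo == "github.com/sourcegraph/sourcegraph" && file == "go.mod" {
			return []string{"", "lib"}
		}
		return nil
	}
	for _, repo := range []string{
		"github.com/sourcegraph/sourcegraph",
		"github.com/sourcegraph/automation-testing",
		"github.com/example/unmatched",
	} {
		dirs, err := resolveWorkspaces(repo, defs, find)
		fmt.Printf("%s: dirs=%q err=%v\n", repo, dirs, err)
	}
}
```

Nested workspaces need no special handling in this model: a parent and its subdirectory simply both appear in the returned list.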

Dependency

This requires the addition of workspaces to the campaign spec schema, which means it requires changes to the Sourcegraph server.

The PR is here: https://github.com/sourcegraph/sourcegraph/pull/17757

What's not included

src-cli still downloads a complete archive of every matched repository, even if the steps should only be executed in subdirectories. Downloading only the archives of the workspace directories is something we should implement in the near future to improve support for large monorepos.

Full campaign spec to try this at home

There you go:

name: workspace-discovery
description: Automatic workspace discovery

on:
  - repositoriesMatchingQuery: repo:automation-testing$
  - repository: github.com/sourcegraph/sourcegraph

workspaces:
  - rootAtLocationOf: go.mod
    in: github.com/sourcegraph/sourcegraph*
  - rootAtLocationOf: package.json
    in: "*automation-testing"

steps:
  - run: "echo \"pwd: $(pwd)\nfiles: $(ls)\" >> message.txt"
    container: alpine:3
  - run:  if [[ -f "package.json" ]]; then cat package.json | jq -j .name; fi
    container: jiapantw/jq-alpine:latest
    outputs:
      projectName:
        value: ${{ step.stdout }}

changesetTemplate:
  title: Automatic workspace discovery
  body:  Automatic workspace discovery

  # If we have an `outputs.projectName` we use it, otherwise we append the path
  # of the workspace. If the path is empty (as is the case in the root folder),
  # we ignore it.
  branch: |
    ${{ if eq outputs.projectName "" }}
    ${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
    ${{ else }}
    thorsten/workspace-discovery-${{ outputs.projectName }}
    ${{ end }}
  published: false
  commit:
    message: Automatic workspace discovery
