Automatic workspace discovery
Created by: mrnugget
What?
This adds automatic workspace discovery to src-cli. It allows users to
- define where projects in large repositories live and
- run the campaign spec's steps in those project folders, turning them into workspaces.
How?
Users define workspaces like so:
workspaces:
  - rootAtLocationOf: go.mod
    in: github.com/sourcegraph/sourc*
That means: in every repository whose name starts with github.com/sourcegraph/sourc, projects have a go.mod at their root, and those folders should be used as workspaces for the execution of the campaign spec steps.
src-cli uses Sourcegraph search under the hood to find the locations of the rootAtLocationOf file, which means it doesn't need to download the repository first and search the file system.
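How search results turn into workspaces can be sketched in Go roughly like this. It is a minimal illustration, not src-cli's actual code: the name workspaceDirs and its input (a plain list of file paths that search already returned for one repository) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// workspaceDirs is a hypothetical helper: given the paths of all files named
// by rootAtLocationOf that search found in one repository, it returns the
// directories containing them. Each directory is one workspace.
func workspaceDirs(filePaths []string) []string {
	seen := make(map[string]bool)
	dirs := []string{}
	for _, p := range filePaths {
		dir := path.Dir(p)
		// path.Dir returns "." for a file in the repository root. The sketch
		// uses the empty string instead, so the root workspace has an empty path.
		if dir == "." {
			dir = ""
		}
		if !seen[dir] {
			seen[dir] = true
			dirs = append(dirs, dir)
		}
	}
	sort.Strings(dirs)
	return dirs
}

func main() {
	found := []string{"go.mod", "lib/go.mod", "cmd/tool/go.mod"}
	fmt.Printf("%q\n", workspaceDirs(found))
}
```

Working only on paths mirrors the point above: nothing has to be downloaded to decide where the workspaces are.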
workspaces can also contain multiple definitions that match different repositories (but a repository cannot be matched by more than one definition):
on:
  - repositoriesMatchingQuery: repo:automation-testing$
  - repository: github.com/sourcegraph/sourcegraph
workspaces:
  - rootAtLocationOf: go.mod
    in: github.com/sourcegraph/sourcegraph*
  - rootAtLocationOf: package.json
    in: "*automation-testing"
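To illustrate the "one definition per repository" rule, here is a small Go sketch. The type definition, the functions globMatches and findDefinition, and the simplified glob semantics ("*" matches any run of characters, including "/") are all assumptions for the example; the glob implementation src-cli really uses may behave differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// definition mirrors one entry of the workspaces list in the spec.
type definition struct {
	RootAtLocationOf string
	In               string
}

// globMatches reports whether name matches pattern, where "*" matches any
// sequence of characters (including "/"). This simplified glob is an
// assumption of the sketch.
func globMatches(pattern, name string) bool {
	quoted := regexp.QuoteMeta(pattern)
	expr := "^" + strings.ReplaceAll(quoted, `\*`, ".*") + "$"
	return regexp.MustCompile(expr).MatchString(name)
}

// findDefinition returns the single definition whose "in" glob matches repo,
// nil if none matches, and an error if more than one matches.
func findDefinition(defs []definition, repo string) (*definition, error) {
	var match *definition
	for i := range defs {
		if !globMatches(defs[i].In, repo) {
			continue
		}
		if match != nil {
			return nil, fmt.Errorf("repository %s matches multiple workspaces.in globs", repo)
		}
		match = &defs[i]
	}
	return match, nil
}

func main() {
	defs := []definition{
		{RootAtLocationOf: "go.mod", In: "github.com/sourcegraph/sourcegraph*"},
		{RootAtLocationOf: "package.json", In: "*automation-testing"},
	}
	for _, repo := range []string{
		"github.com/sourcegraph/sourcegraph",
		"github.com/sourcegraph/automation-testing",
	} {
		def, err := findDefinition(defs, repo)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", repo, def.RootAtLocationOf)
	}
}
```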
Since multiple workspaces per repository mean that multiple changesets will be produced in a single repository, the changesetTemplate.branch needs to use templating to avoid name clashes.
For that, users can access the template variable steps.path and use helper functions to generate a unique branch name per changeset. Example:
changesetTemplate:
  # [...]
  branch: ${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
  published: false
  commit:
    message: Automatic workspace discovery
(The join_if and replace helpers are new. join_if joins the given list of strings, ignoring blank strings. replace is an alias for strings.ReplaceAll.)
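For readers who want to see how such helpers behave, here is a rough Go sketch using text/template. It is not the code from this PR: joinIf and renderBranch are illustrative names, "blank" is assumed to mean the empty string, and steps is modelled as a template function returning a map so that steps.path resolves.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// joinIf joins the non-empty elements of elems with sep.
// Assumption: "blank" means the empty string.
func joinIf(sep string, elems ...string) string {
	kept := make([]string, 0, len(elems))
	for _, e := range elems {
		if e != "" {
			kept = append(kept, e)
		}
	}
	return strings.Join(kept, sep)
}

// renderBranch is a hypothetical helper that renders a branch template for
// the workspace at workspacePath ("" for the repository root).
func renderBranch(tmplText, workspacePath string) (string, error) {
	funcs := template.FuncMap{
		"join_if": joinIf,
		"replace": strings.ReplaceAll,
		// Exposes steps.path inside the template.
		"steps": func() map[string]string {
			return map[string]string{"path": workspacePath}
		},
	}
	tmpl, err := template.New("branch").Delims("${{", "}}").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, nil); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	const branch = `${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}`
	for _, p := range []string{"", "cmd/tool"} {
		name, err := renderBranch(branch, p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %s\n", p, name)
	}
}
```

With an empty path the trailing separator disappears, which is why join_if is used here instead of plain string concatenation.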
Users can, of course, also use other ways to generate a unique branch name per changeset. With outputs, for example:
steps:
  # [...]
  - run: if [[ -f "package.json" ]]; then cat package.json | jq -j .name; fi
    container: jiapantw/jq-alpine:latest
    outputs:
      projectName:
        value: ${{ step.stdout }}
changesetTemplate:
  # [...]
  branch: thorsten/workspace-discovery-${{ outputs.projectName }}
Or, in combination:
steps:
  # [...]
  - run: if [[ -f "package.json" ]]; then cat package.json | jq -j .name; fi
    container: jiapantw/jq-alpine:latest
    outputs:
      projectName:
        value: ${{ step.stdout }}
changesetTemplate:
  # If we have an `outputs.projectName` we use it, otherwise we append the path
  # of the workspace. If the path is empty (as is the case in the root folder),
  # we ignore it.
  branch: |
    ${{ if eq outputs.projectName "" }}
    ${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
    ${{ else }}
    thorsten/workspace-discovery-${{ outputs.projectName }}
    ${{ end }}
Details & Edge Cases
- Repositories matching multiple workspace definitions produce an error.
- If a repository is yielded by on and not matched by a workspaces.in: glob, the steps will be executed in its root folder.
- If a repository is yielded by on and matches a workspaces.in: glob, but there are no workspaces in it that contain the file in rootAtLocationOf, then the steps won't be executed in the repository.
- If a repository contains multiple workspaces where one is a subdirectory of another, then the steps will be executed in both, and the parent directory also includes the subdirectory. I think this is the most intuitive choice.
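The rules above, apart from the multiple-match error, can be condensed into a few lines of Go. This is only a sketch under assumptions: executionDirs is a made-up name, and it expects glob matching and file discovery to have happened already.

```go
package main

import "fmt"

// executionDirs is a hypothetical summary of the rules above. matched reports
// whether a workspaces.in glob matched the repository; found lists the
// directories in it that contain the rootAtLocationOf file ("" is the root).
func executionDirs(matched bool, found []string) []string {
	if !matched {
		// Yielded by "on" but not covered by any workspace definition:
		// run the steps once, in the repository root.
		return []string{""}
	}
	// Covered by a definition: run in every discovered workspace. The list
	// may be empty (the steps are skipped), and nested workspaces are all kept.
	return found
}

func main() {
	fmt.Printf("%q\n", executionDirs(false, nil))
	fmt.Printf("%q\n", executionDirs(true, nil))
	fmt.Printf("%q\n", executionDirs(true, []string{"a", "a/b"}))
}
```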
Dependency
This requires the addition of workspaces to the campaign spec schema, which means it requires changes to the Sourcegraph server.
The PR is here: https://github.com/sourcegraph/sourcegraph/pull/17757
What's not included
src-cli still downloads a complete archive of every matched repository, even if the steps should only be executed in subdirectories. Downloading only the archives of the workspace directories is something we should implement in the near future to improve support for large monorepos.
Full campaign spec to try this at home
There you go:
name: workspace-discovery
description: Automatic workspace discovery
on:
  - repositoriesMatchingQuery: repo:automation-testing$
  - repository: github.com/sourcegraph/sourcegraph
workspaces:
  - rootAtLocationOf: go.mod
    in: github.com/sourcegraph/sourcegraph*
  - rootAtLocationOf: package.json
    in: "*automation-testing"
steps:
  - run: "echo \"pwd: $(pwd)\nfiles: $(ls)\" >> message.txt"
    container: alpine:3
  - run: if [[ -f "package.json" ]]; then cat package.json | jq -j .name; fi
    container: jiapantw/jq-alpine:latest
    outputs:
      projectName:
        value: ${{ step.stdout }}
changesetTemplate:
  title: Automatic workspace discovery
  body: Automatic workspace discovery
  # If we have an `outputs.projectName` we use it, otherwise we append the path
  # of the workspace. If the path is empty (as is the case in the root folder),
  # we ignore it.
  branch: |
    ${{ if eq outputs.projectName "" }}
    ${{ join_if "-" "thorsten/workspace-discovery" (replace steps.path "/" "-") }}
    ${{ else }}
    thorsten/workspace-discovery-${{ outputs.projectName }}
    ${{ end }}
  published: false
  commit:
    message: Automatic workspace discovery