
DevX: Q2B2 Improve actionability of CI

Created by: burmudar

Problem

Devs often don’t know what the pipeline is doing or why their pipeline has failed.

We (DevX) are often asked to help diagnose failures and figure out what the actual problem is, because we have in-depth knowledge of the pipeline and how it works.

The pipeline needs to be less of a black box and should surface useful information that lets developers investigate further on their own.

Boundaries

Scope

Our primary concern is the pipeline and its communication channels. As it stands, the pipeline has 3 communication channels:

  1. Buildkite UI
  2. Slack notifications
  3. sg ci status

Each channel can be enhanced to provide more contextual information and easy on-ramps to other tools that help diagnose failures.

Improving pipeline stability is also part of improving pipeline communication: a false positive or a bug communicates the wrong thing and confuses the end user. Therefore, fixing bugs falls within the scope of this bet.

Out of scope

  • Nice-to-haves (eye candy)
  • Speed improvements
  • Trying out Buildkite tracing / OpenTelemetry
  • Better test output, even though it could help make a broken pipeline more actionable

Definition of Done

  • Document our various annotations and what each link on an annotation does.
  • Make everyone aware of the new on-ramps to tools that help them diagnose pipeline failures.
  • All pipeline failures generate a unified failure annotation.

Payout:

  • Annotations provide more contextual help
  • All job failures have relevant annotations.
    • Some integration / client tests don’t produce annotations, so we have to add them.
  • Details of build failures are clearly communicated and next steps to investigate are provided.
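As a rough illustration of what a unified failure annotation with next steps could look like, here is a minimal sketch. It assumes failures are collected into a hypothetical JobFailure record and rendered into one Markdown body by a hypothetical render_failure_annotation helper. The only real interface involved is the buildkite-agent annotate command with its --style and --context flags; the "unified-failure" context name and the annotate_command helper are illustrative assumptions, not existing pipeline code.

```python
from dataclasses import dataclass


@dataclass
class JobFailure:
    """Hypothetical summary of one failed Buildkite job (assumption, not existing code)."""
    label: str      # step label as shown in the Buildkite UI
    exit_code: int  # exit status of the failed command
    log_url: str    # link to the job's log output
    hint: str = ""  # optional next step for the developer


def render_failure_annotation(failures: list[JobFailure]) -> str:
    """Build one Markdown body that covers every failed job in the build."""
    lines = [f"### {len(failures)} job(s) failed", ""]
    for f in failures:
        lines.append(f"- **{f.label}** exited with code {f.exit_code} ([log]({f.log_url}))")
        if f.hint:
            # Only emit a next-step line when we actually have advice to give.
            lines.append(f"  - Next step: {f.hint}")
    return "\n".join(lines) + "\n"


def annotate_command(body: str, context: str = "unified-failure") -> list[str]:
    """Return the buildkite-agent invocation that would publish the annotation.

    Reusing one --context (hypothetical name) means later calls replace the same
    annotation box instead of adding a separate one per job.
    """
    return ["buildkite-agent", "annotate", body, "--style", "error", "--context", context]


if __name__ == "__main__":
    body = render_failure_annotation([
        JobFailure("Go tests", 1, "https://example.com/builds/1#job", "re-run the failing package locally"),
    ])
    print(body)
    print(annotate_command(body)[:2])
```

In a real CI step the returned list could be passed to subprocess.run on an agent where buildkite-agent is installed; keeping rendering separate from publishing makes the annotation text easy to test without Buildkite.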

Tracked issues

@unassigned

Completed

@burmudar: 3.00d

Completed: 3.00d

Legend

  • 👩 Customer issue
  • 🐛 Bug
  • 🧶 Technical debt
  • 🎩 Quality of life
  • 🛠️ Roadmap
  • 🕵️ Spike
  • 🔒 Security issue
  • 🙆 Stretch goal
Brainstorm doc