monitoring: update vision, revamp pillars
Created by: bobheadxi
Given the recent (and very welcome!) drive to improve how monitoring works for engineers at Sourcegraph, I thought I might put up a PR to start shaping those discussions into formal guidelines that we can reference when driving changes to the monitoring generator!
Changes
The overall goal of this PR is to relax our guidelines so that we can change the tooling more flexibly without introducing "incompatibilities" with the pillars, and so that everyone has more freedom to make alternative decisions based on their best judgement.
- Add a new long-term vision item: serving Sourcegraph engineers!
- "Five pillars" to just "pillars": we should be able to add or remove guidelines as Sourcegraph changes.
- In general, build our pillars on more relaxed, positive messaging ("Should...") instead of negative messaging ("...is forbidden"). This hopefully makes the reasoning behind tooling restrictions more palatable and adds more flexibility.
- Each pillar can now include an "Exceptions" section.
- Some pillars have been changed or removed.
I would appreciate it if each team that currently has alerts configured could take a look!
Follow-up implementation tasks
From here, we can create issues to implement improvements to the generator to align it with these new pillars. This work will be tracked in the monitoring redux project, which the Distribution team is hoping to prioritize soon.
- Improve generator documentation => https://github.com/sourcegraph/sourcegraph/issues/15787
- https://github.com/sourcegraph/about/pull/2000#discussion_r523886716 Implement better enforcement of documentation for graphs without alerts => https://github.com/sourcegraph/sourcegraph/issues/15872
  - We need a new generated page to document graph descriptions (since graphs without alerts won't show up in the alerts solutions documentation)
  - We should include this information (or at least link to it) in the Grafana panel description
- https://github.com/sourcegraph/about/pull/2000#discussion_r524803355 Improved workflow for iterating on dashboards => https://github.com/sourcegraph/sourcegraph/issues/15874, https://github.com/sourcegraph/sourcegraph/issues/15873, https://github.com/sourcegraph/sourcegraph/issues/15875