The report-a-bug page should include a list of alerts that fired in the last 7 days, queried from existing Prometheus data
Created by: slimsag
https://k8s.sgdev.org/site-admin/report-bug
This page should include a JSON entry listing which alerts fired in Grafana over the last 7 days and when, so we can easily see which alerts are firing intermittently or constantly on a customer instance:
"alerts": [
{"timestamp": "...", "service_name": "gitserver", "name": "low_disk_space", "value": "0"},
{"timestamp": "...", "service_name": "gitserver", "name": "low_disk_space", "value": "1"}
]
I believe this data could be acquired easily from here, simply by using Go to hit the Prometheus admin API.
I would want to know:
- That their instance has the alerts defined (i.e. when the value is zero, don't exclude it)
- When the alert count changed, if at all. Do not include repeated information (i.e. I want to be able to read the JSON in an editor and make sense of it)
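
The two requirements above could be met with a small post-processing step. This is only a sketch: the `alertEntry` type and `collapseUnchanged` function are hypothetical names, and it assumes timestamps are RFC 3339 strings in UTC so that they sort correctly as plain strings. It keeps the first entry for every alert, even when its value is "0", and after that only the entries whose value differs from the previous one for the same alert.

```go
package main

import "sort"

// alertEntry mirrors one element of the proposed "alerts" JSON array.
type alertEntry struct {
	Timestamp   string `json:"timestamp"` // assumed RFC 3339, UTC
	ServiceName string `json:"service_name"`
	Name        string `json:"name"`
	Value       string `json:"value"`
}

// collapseUnchanged drops entries that repeat the previous value of the
// same (service_name, name) alert. The first entry of each alert is always
// kept, so alerts that never fired still appear once with their zero value.
func collapseUnchanged(entries []alertEntry) []alertEntry {
	// Work on a copy so the caller's slice is not reordered.
	sorted := append([]alertEntry(nil), entries...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Timestamp < sorted[j].Timestamp
	})

	last := map[string]string{} // alert key -> last emitted value
	var out []alertEntry
	for _, e := range sorted {
		key := e.ServiceName + "/" + e.Name
		if prev, seen := last[key]; seen && prev == e.Value {
			continue // value unchanged, skip the repeat
		}
		last[key] = e.Value
		out = append(out, e)
	}
	return out
}
```

With hourly samples over 7 days, an alert that never fires would shrink from 168 entries to one, which keeps the JSON readable in an editor.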
This is important because it gives us a way to get this information from customers without going through a "screenshot the home dashboard and then if any alerts fired I'll ask you again to screenshot another page to tell me what alert that actually was" -- and because these alerts are going to be more and more important going forward.
This is easier to add than packaging a full metrics dump, and easier than broadcasting this information up to sourcegraph.com 24/7. We would also need it anyway for more privacy-conscious customers who disable pings and would refuse to send us a full metrics dump.