metrics: fix broken error rate when count unset
Created by: Strum355
We use a query along the lines of

sum by (op) (increase(src_<root>_errors_total[5m])) / (sum by (op) (increase(src_<root>_total[5m])) + sum by (op) (increase(src_<root>_errors_total[5m]))) * 100

to determine the error rate. This works because on success we increment a "success" counter (src_<root>_total), and on error we increment an "error" counter (src_<root>_errors_total).
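A minimal sketch of what the query computes for a single op, assuming the increase() of each counter over the window is already known. The function name errorRate and its inputs are hypothetical stand-ins for the two increase() terms. The result is the share of errors among all observed operations.

```go
package main

import "fmt"

// errorRate mirrors the PromQL expression for one op:
// errors / (successes + errors) * 100.
// successes stands in for increase(src_<root>_total[5m]) and errors for
// increase(src_<root>_errors_total[5m]) (hypothetical inputs).
func errorRate(successes, errors float64) float64 {
	// With no operations at all this is 0/0, which yields NaN,
	// just as it does in PromQL.
	return errors / (successes + errors) * 100
}

func main() {
	fmt.Println(errorRate(3, 1)) // 25
	fmt.Println(errorRate(0, 4)) // 100: a success series at 0 still gives a correct rate
}
```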
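This falls apart when the "success" counter has never been incremented for a given op: its series is never created, so it is "unset" (see the gaps in the "success" visualization on the right below). A binary operator between two instant vectors in PromQL only produces results for label sets present on both sides. That op is therefore dropped from the sum in the denominator and from the whole expression, which yields an apparent error rate of 0 even though the error count is non-zero.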
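The dropping behaviour can be modelled in a few lines. This is a simplified sketch of PromQL one-to-one vector matching, not Prometheus code: each `sum by (op)` result is assumed to be a map from op to value, and the helper names and the ops "search" and "index" are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// vector models the result of `sum by (op) (...)`: one sample per op label.
type vector map[string]float64

// binop applies f to samples whose op label appears in both vectors and
// drops every op that has no match, like one-to-one vector matching.
func binop(a, b vector, f func(x, y float64) float64) vector {
	out := vector{}
	for op, x := range a {
		if y, ok := b[op]; ok {
			out[op] = f(x, y)
		}
	}
	return out
}

func add(x, y float64) float64 { return x + y }
func div(x, y float64) float64 { return x / y }

// errorRates evaluates errors / (successes + errors) * 100 per op.
func errorRates(successes, errors vector) vector {
	rates := binop(errors, binop(successes, errors, add), div)
	for op := range rates {
		rates[op] *= 100 // scalar multiplication keeps every remaining op
	}
	return rates
}

func main() {
	errors := vector{"search": 1, "index": 5}
	// "index" has only ever failed, so its success series was never created.
	successes := vector{"search": 3}

	rates := errorRates(successes, errors)
	ops := make([]string, 0, len(rates))
	for op := range rates {
		ops = append(ops, op)
	}
	sort.Strings(ops)
	for _, op := range ops {
		fmt.Printf("%s: %.0f%%\n", op, rates[op])
	}
	// Prints only "search: 25%"; "index" is silently dropped.
}
```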
This PR fixes the issue by making sure the "success" counter series always exists: in the error case, we also increment the "success" counter by 0, which seeds it with a value of 0 without counting a success.
Test plan
Tested locally