monitoring: graphql dashboard
Created by: uwedeportivo
Defines a dashboard with three panels for the GraphQL metric src_graphql_field_seconds.
I had to remove the labels from src_graphql_field_seconds because it became too expensive and actually caused Grafana server errors when displaying the graphs. I first tried reducing the cost by aggregating with sum (which I can do for the counter of the histogram). That worked for the request rate and the error rate, but durations cannot be aggregated this way. I searched the code to see whether we use this metric with these labels for anything else. Seems like we're OK.
src_graphql_field_seconds is a histogram (which is already expensive). The two removed labels were "field" and "type", each with far more than a couple of possible values. By the product rule, this metric could have hundreds of time series (field x type x bucket).
Removing the labels helps with Prometheus performance and memory too.
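To make the product rule concrete, here is a rough sketch of the arithmetic. The label cardinalities and bucket count are made-up placeholders, not numbers measured from the real metric, and series_count is a hypothetical helper written only for this illustration.

```python
from math import prod


def series_count(label_cardinalities, buckets):
    """Upper bound on the number of bucket series for one histogram.

    Each distinct combination of label values gets its own set of
    buckets, so the total is the product of all label cardinalities
    times the number of buckets.
    """
    return prod(label_cardinalities.values()) * buckets


# Hypothetical values: 10 distinct "field" values, 5 distinct "type"
# values, and 12 histogram buckets.
with_labels = series_count({"field": 10, "type": 5}, buckets=12)

# With both labels removed, only the buckets remain.
without_labels = series_count({}, buckets=12)

print(with_labels, without_labels)
```

This counts bucket series only; a histogram also exposes a sum and a count series per label combination, so the real total would be somewhat higher.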