
Monitoring for CPU and memory usage in Kubernetes

Created by: slimsag

Today, we have container CPU and memory monitoring on each service dashboard (e.g., at the bottom of https://k8s.sgdev.org/-/debug/grafana/d/frontend/frontend?orgId=1).


As the dashboard notes, however, these panels do not work in Kubernetes deployments yet. In docker-compose deployments, we run cadvisor to export these metrics for Prometheus to scrape:

https://github.com/sourcegraph/deploy-sourcegraph-docker/blob/master/docker-compose/docker-compose.yaml#L485-L504

Historically, I believe we ran node_exporter, which exposed some CPU and memory metrics but far fewer container-level metrics than cadvisor itself. (As I understand it, the kubelet embeds cadvisor internally, but the embedded version may lag behind upstream releases.)

At first glance, I think we should add cadvisor to https://github.com/sourcegraph/deploy-sourcegraph, following the approach of the official cadvisor Kubernetes deployment: https://github.com/google/cadvisor/tree/master/deploy/kubernetes

This would give us Prometheus metrics that closely match those in our docker-compose deployments, so the existing Grafana dashboard queries should mostly work as-is (and where they don't, it should be easy to adjust them so they work in both environments).
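For illustration, a cadvisor DaemonSet in base/ might look roughly like the sketch below. It is modeled loosely on the upstream cadvisor Kubernetes example and is not a tested manifest: the file name, the `app: cadvisor` labels, the image tag, and the `privileged: true` setting are all assumptions. The DaemonSet runs one cadvisor pod per node and mounts host paths read-only so cadvisor can read container stats. It then serves Prometheus metrics on port 8080 at `/metrics`.

```yaml
# Hypothetical base/cadvisor.DaemonSet.yaml; a sketch, not a tested manifest.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      app: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
    spec:
      # cadvisor does not need to talk to the Kubernetes API.
      automountServiceAccountToken: false
      containers:
        - name: cadvisor
          # Example tag only; pin whichever release has been validated.
          image: gcr.io/cadvisor/cadvisor:v0.45.0
          ports:
            - name: http
              containerPort: 8080 # cadvisor's default port; metrics at /metrics
          securityContext:
            privileged: true # assumption: needed to read host cgroups and devices
          # Read-only views of the host filesystem that cadvisor inspects.
          volumeMounts:
            - { name: rootfs, mountPath: /rootfs, readOnly: true }
            - { name: var-run, mountPath: /var/run, readOnly: true }
            - { name: sys, mountPath: /sys, readOnly: true }
            - { name: docker, mountPath: /var/lib/docker, readOnly: true }
            - { name: disk, mountPath: /dev/disk, readOnly: true }
      volumes:
        - { name: rootfs, hostPath: { path: / } }
        - { name: var-run, hostPath: { path: /var/run } }
        - { name: sys, hostPath: { path: /sys } }
        - { name: docker, hostPath: { path: /var/lib/docker } }
        - { name: disk, hostPath: { path: /dev/disk } }
```

The sketch does not show how Prometheus discovers these pods. That depends on the existing Prometheus scrape configuration in deploy-sourcegraph.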

One challenge, however, is that we want to make sure:

  1. It is included in the base deployment, so all stock, out-of-the-box Kubernetes deployments get it: https://github.com/sourcegraph/deploy-sourcegraph/tree/master/base
  2. We edit the non-privileged overlay to remove the cadvisor deployment we add in base/, since cadvisor requires privileged access to the Kubernetes cluster.
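Assuming the non-privileged overlay is a Kustomize overlay, the second point could be handled with a strategic-merge patch that uses `$patch: delete`. This patch drops the DaemonSet from the rendered output instead of modifying it. In the sketch below, the overlay path, the `delete-cadvisor.yaml` file name, and the DaemonSet name `cadvisor` are hypothetical. The two YAML documents would live in separate files.

```yaml
# Hypothetical overlay kustomization.yaml (relative path is illustrative).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patchesStrategicMerge:
  - delete-cadvisor.yaml
---
# delete-cadvisor.yaml: matches the DaemonSet from base/ by kind and name,
# and `$patch: delete` removes it from the overlay's output entirely.
$patch: delete
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
```

This approach keeps base/ as the single source of truth. The privileged workload is subtracted only in the overlay that cannot run it.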