LSIF: Replicate worker within container
Created by: efritz
From a conversation with @creachadair and @slimsag, we decided to scale lsif-workers on two fronts:
Syntect server does something similar to the second approach (ENV WORKERS=4) in the Dockerfile to ensure that one slow process does not block the remaining resources of the container. This could also be beneficial on bursty workloads with small repositories, as the containers may otherwise be idle or under-provisioned and a handful of processes per container may make use of the excess headroom.
Unfortunately, running multiple (unaltered) workers immediately runs into a port clash. Each worker tries to serve its own metrics on port 3187, and only one process can bind to a given port. Giving the workers unique, sequentially increasing ports (3187, 3188, 3189, etc.) solves the problem, but our current Prometheus configuration would then scrape only the first of n workers' metrics. We could scrape all ports, but then the worker count is not a dynamic property of the deployment (and every port must be exposed via compose or
A chat with @uwedeportivo revealed that we could get seamless scaling using Prometheus federation. Federation lets one Prometheus instance pre-aggregate metrics, which a higher-level Prometheus instance can then scrape.
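To make the federation idea concrete, here is a minimal sketch of the scrape job a higher-level Prometheus instance might use to pull pre-aggregated metrics from the in-container one. The `/federate` path, `honor_labels`, and the `match[]` parameter are standard Prometheus federation settings. The job name, the `lsif-.*` selector, and the `lsif-server:9090` target are assumptions made for illustration and are not taken from this PR.

```python
import json
from typing import Dict, Iterable


def federation_job(inner_targets: Iterable[str], job_name: str = "lsif-federate") -> Dict:
    """Build a scrape job that federates metrics from in-container Prometheus instances.

    job_name and the match[] selector are hypothetical placeholders.
    """
    return {
        "job_name": job_name,
        # Keep the labels assigned by the inner Prometheus instead of overwriting them.
        "honor_labels": True,
        # Federation endpoint served by every Prometheus instance.
        "metrics_path": "/federate",
        # Only pull series whose job label matches the (assumed) lsif prefix.
        "params": {"match[]": ['{job=~"lsif-.*"}']},
        "static_configs": [{"targets": list(inner_targets)}],
    }


if __name__ == "__main__":
    # Hypothetical address of the Prometheus instance inside the lsif-server container.
    job = federation_job(["lsif-server:9090"])
    print(json.dumps({"scrape_configs": [job]}, indent=2))
```

The output is printed as JSON only for inspection. The same keys would go under `scrape_configs` in the outer Prometheus configuration.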
This PR changes the lsif-server image to accept the number of workers as an environment variable, and will start up 0-1 servers, 0-n workers, and a Prometheus instance that scrapes the (dynamic number of) running processes. This Prometheus instance is exposed so that it can in turn be scraped by our "main" Prometheus instance within a compose or k8s cluster.
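As a rough illustration of how a dynamic worker count could translate into scrape targets, the sketch below assigns each worker a sequential metrics port starting at 3187 and renders a matching configuration for the in-container Prometheus. It is not the script used in this PR. The `WORKERS` variable name (borrowed from the syntect example), the job name, and the `localhost` targets are assumptions.

```python
import os
from typing import List

BASE_WORKER_PORT = 3187  # metrics port mentioned in the description above


def worker_ports(num_workers: int, base_port: int = BASE_WORKER_PORT) -> List[int]:
    """Return one unique, sequentially increasing metrics port per worker."""
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return [base_port + i for i in range(num_workers)]


def scrape_config(num_workers: int, job_name: str = "lsif-worker") -> str:
    """Render a minimal Prometheus config that scrapes every local worker.

    job_name is a hypothetical placeholder.
    """
    ports = worker_ports(num_workers)
    if not ports:
        # No workers requested: emit an empty job list instead of an empty target list.
        return "scrape_configs: []\n"
    lines = [
        "scrape_configs:",
        f"  - job_name: '{job_name}'",
        "    static_configs:",
        "      - targets:",
    ]
    lines += [f"          - 'localhost:{port}'" for port in ports]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # WORKERS is an assumed environment variable name; default to a single worker.
    print(scrape_config(int(os.environ.get("WORKERS", "1"))), end="")
```

Because the targets are derived from the worker count at startup, only the single Prometheus port has to be exposed outside the container, regardless of how many workers run inside it.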