LSIF: Replicate worker within container (!8951)

Administrator requested to merge lsif-multi-worker-container into master Mar 11, 2020

Created by: efritz

From a conversation with @creachadair and @slimsag, we decided to scale lsif-workers on two fronts:

horizontally via replica counts in k8s deployments (easy and the correct way)
horizontally via a combination of multiple compose services and intra-container parallelization in docker/docker-compose deployments (harder for us, but easier for companies using that environment, where manually scaling to double-digit worker counts is a big pain point)

Syntect server does something similar to the second approach (ENV WORKERS=4 in its Dockerfile) to ensure that one slow process does not block the remaining resources of the container. This could also help on bursty workloads with small repositories: the containers may otherwise sit idle or underutilized, and a handful of processes per container can make use of the excess headroom.

Unfortunately, running multiple (unaltered) workers immediately runs into port clashes. Each worker tries to serve its own metrics on port 3187, and more than one worker cannot bind to the same port. Giving the workers unique, sequentially increasing ports (3187, 3188, 3189, etc.) solves the problem, but means our current Prometheus configuration can only scrape the metrics of the first of n workers. We could scrape all ports, but then the worker count is not a dynamic property of the deployment (and every port must be exposed via compose or
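To make the port problem concrete, here is a minimal Python sketch (not code from this PR). `worker_metrics_ports` and `second_bind_fails` are hypothetical helper names; only the base port 3187 comes from the description above. The first helper hands out one sequential port per worker, and the second shows that a second socket cannot bind a port that is already in use.

```python
import errno
import socket

BASE_METRICS_PORT = 3187  # metrics port named in the description above


def worker_metrics_ports(num_workers, base_port=BASE_METRICS_PORT):
    """Return one unique, sequentially increasing metrics port per worker."""
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return [base_port + i for i in range(num_workers)]


def second_bind_fails(host="127.0.0.1"):
    """Try to bind two sockets to the same port; report whether the second is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0 asks the OS for any free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same port as the first socket
        except OSError as err:
            return err.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(worker_metrics_ports(3))
    print("second bind refused:", second_bind_fails())
```

With three workers the helper yields ports 3187 through 3189, which is exactly the set a static scrape configuration would have to enumerate by hand.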

A chat with @uwedeportivo revealed that we could get seamless scaling using Prometheus federation. This lets one Prometheus instance pre-aggregate the metrics, which a higher-level Prometheus instance can then scrape.
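As a rough illustration of federation, the higher-level Prometheus scrapes the lower-level instance's `/federate` endpoint and selects series with one or more `match[]` parameters. The Python sketch below only builds such a URL. The host name `lsif-server` and the job label `lsif-worker` are assumptions made for the example, and 9090 is Prometheus's default listen port. In practice the same thing is usually expressed in the higher-level instance's scrape configuration rather than as a literal URL.

```python
from urllib.parse import urlencode


def federate_url(host, matchers, port=9090):
    """Build the /federate URL that a higher-level Prometheus would scrape.

    Each matcher is an instant vector selector, e.g. '{job="lsif-worker"}'.
    """
    # Repeating the match[] parameter selects the union of all matchers.
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"http://{host}:{port}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical host and job names, used for illustration only.
    print(federate_url("lsif-server", ['{job="lsif-worker"}']))
```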

This PR changes the lsif-server image to accept the number of workers as an environment variable. The image then starts 0-1 servers, 0-n workers, and a Prometheus instance that scrapes the (dynamic number of) running processes. That Prometheus instance is exposed so that it can in turn be scraped by our "main" Prometheus instance within a compose or k8s cluster.
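One way the in-container Prometheus configuration could follow the worker count is to generate it at startup. The Python sketch below is an assumption about the shape of such a step, not the script this PR ships. The environment variable name `LSIF_NUM_WORKERS`, the job name `lsif-worker`, and the 5s scrape interval are all hypothetical; only the base port 3187 comes from the description.

```python
import json
import os

BASE_METRICS_PORT = 3187  # first worker's metrics port, per the description


def scrape_targets(num_workers, base_port=BASE_METRICS_PORT):
    """List the localhost metrics endpoints, one per worker process."""
    return [f"localhost:{base_port + i}" for i in range(num_workers)]


def render_scrape_config(num_workers, job_name="lsif-worker", scrape_interval="5s"):
    """Render a minimal Prometheus config that scrapes every worker in the container."""
    # A JSON array is also a valid YAML flow sequence, so json.dumps is enough here.
    targets = json.dumps(scrape_targets(num_workers))
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "scrape_configs:",
        f"  - job_name: {job_name}",
        "    static_configs:",
        f"      - targets: {targets}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical variable name; the PR text does not say what it is called.
    num_workers = int(os.environ.get("LSIF_NUM_WORKERS", "1"))
    print(render_scrape_config(num_workers), end="")
```

Because the target list is derived from the worker count, only the single Prometheus port needs to be exposed from the container, regardless of how many workers run inside it.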
