LSIF: Replicate worker within container

Merged LSIF: Replicate worker within container

Administrator requested to merge lsif-multi-worker-container into master Mar 11, 2020

Created by: efritz

From a conversation with @creachadair and @slimsag, we decided to scale lsif-workers on two fronts:

- horizontally via replica counts in k8s deployments (easy, and the correct way)
- horizontally via a combination of multiple compose services and intra-container parallelization in docker/docker-compose deployments (harder for us, but easier for companies using that environment, where it is a big pain point to have to scale to double digits manually)

Syntect server does something similar to the second approach (ENV WORKERS=4 in its Dockerfile) to ensure that one slow process does not block the remaining resources of the container. This could also help with bursty workloads of small repositories: the containers may otherwise be idle or under-provisioned, and a handful of processes per container can make use of the excess headroom.

Unfortunately, running multiple (unaltered) workers immediately runs into port clashes. Each worker tries to serve its own metrics on port 3187, and only one process can bind to a given port. Giving the workers unique, sequentially increasing ports (3187, 3188, 3189, etc.) solves the problem, but our current Prometheus configuration would then only scrape the first of n workers' metrics. We could scrape all ports, but then the worker count is not a dynamic property of the deployment (and every port must be exposed via compose or

A chat with @uwedeportivo revealed that we could get seamless scaling using Prometheus federation: one Prometheus instance pre-aggregates the metrics, which can then be scraped by a higher-level Prometheus instance.
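For illustration, here is a minimal sketch of what the federation scrape job on the higher-level instance could look like, built as a Python dict and printed as JSON for readability. `honor_labels`, the `/federate` metrics path, and the `match[]` parameter are standard Prometheus federation settings. The job name, the target address `lsif-server:9090`, and the selector are assumptions made for this example and are not taken from this PR.

```python
import json


def federation_job(target, selectors, job_name="lsif-federate"):
    """Build a scrape job that pulls series from another Prometheus
    instance through its /federate endpoint."""
    return {
        "job_name": job_name,               # hypothetical name
        "honor_labels": True,               # keep labels set by the inner instance
        "metrics_path": "/federate",        # standard federation endpoint
        "params": {"match[]": list(selectors)},
        "static_configs": [{"targets": [target]}],
    }


# Hypothetical address of the in-container Prometheus and a hypothetical selector.
job = federation_job("lsif-server:9090", ['{job=~"lsif-.*"}'])
print(json.dumps({"scrape_configs": [job]}, indent=2))
```

With this shape, the higher-level instance only needs to know one target per container, regardless of how many workers run inside it.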

This PR changes the lsif-server image to accept the number of workers as an environment variable. The image will start up 0-1 servers, 0-n workers, and a Prometheus instance that scrapes the (dynamic number of) running processes. That Prometheus instance is itself exposed so that it can in turn be scraped by our "main" Prometheus instance within a compose or k8s cluster.
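The startup behaviour described above might be sketched roughly as follows. This is not the code in this PR: the environment variable names `LSIF_NUM_SERVERS` and `LSIF_NUM_WORKERS`, the server port 3186, and the process names are hypothetical. Only the worker base port 3187 and the sequential port scheme come from the description.

```python
WORKER_BASE_PORT = 3187  # first worker metrics port, per the description
SERVER_PORT = 3186       # assumption: not stated in the description


def plan_processes(env):
    """Return (name, port) pairs for the processes to start in one container."""
    num_servers = int(env.get("LSIF_NUM_SERVERS", "1"))  # hypothetical variable
    num_workers = int(env.get("LSIF_NUM_WORKERS", "1"))  # hypothetical variable
    if num_servers not in (0, 1):
        raise ValueError("expected 0 or 1 servers")
    if num_workers < 0:
        raise ValueError("worker count must not be negative")
    procs = [("lsif-server", SERVER_PORT)] * num_servers
    # Each worker gets its own sequential port so that none of them clash.
    procs += [("lsif-worker", WORKER_BASE_PORT + i) for i in range(num_workers)]
    return procs


def scrape_targets(procs):
    """Group local scrape targets by process name for the in-container Prometheus."""
    targets = {}
    for name, port in procs:
        targets.setdefault(name, []).append(f"localhost:{port}")
    return targets


procs = plan_processes({"LSIF_NUM_WORKERS": "3"})
print(scrape_targets(procs))
```

Because the targets are derived from the same variable that decides how many workers start, the in-container scrape list always matches the running processes.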
