
executors: Get rid of ignite runlock

Warren Gifford requested to merge es/no-runlock into main

Created by: eseliger

The original reason for the run lock no longer holds: we don't pull images at VM start anymore. There was another potential race condition in ignite that could have led to device busy errors, so ignite had an internal run lock as well. I was able to resolve that here: https://github.com/sourcegraph/ignite/pull/1 and it works reliably across large workloads on k8s.

Why is this exciting:

This finally makes large instances usable for executors! The more VMs a system can handle, the more ignite VM startups overlap and queue behind the lock. As a result, big instances like AWS's metal ones, which we usually run with 36x concurrency, didn't really have any more throughput than a 4x instance, because 90% of the time was spent in that concurrency gate. Removing the lock unleashes the full power of these machines. On a cluster of 8 executor replicas with 4x parallelism each, running against k8s, this reduced the run time of our example spec against 1000 workspaces from 20 minutes to 10:30 minutes (about 1.9x faster). Bigger machines will benefit even more. In addition, the lock meant that larger workspaces (big repos) would starve other repos for even longer. Now the size of one repo no longer directly affects the runtime of the others.

Test plan

Tested thoroughly on k8s.
