Use multi-stage Dockerfiles
Created by: ggilmore
Implements RFC 37: Use multi-stage Docker builds for Sourcegraph images
Overview
Before
- Generate assets in `buildkite-agent`'s external environment and place them in some `OUTPUT` folder
- Run `docker build ... $OUTPUT`, which passes the folder containing the assets as the Docker build context
- The Dockerfile downloads any extra dependencies and copies the assets into the final image
After
- Run `docker build ... $(pwd)`, which passes a plain checkout of https://github.com/sourcegraph/sourcegraph as the build context
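Roughly, the two invocations look like this (a sketch with illustrative script names and paths, not the exact CI steps):

```sh
# Before: assets are built in the buildkite-agent's environment and the
# output folder becomes the Docker build context.
export OUTPUT=$(mktemp -d)
./cmd/server/pre-build.sh && ./cmd/server/build.sh   # writes assets/binaries into $OUTPUT
docker build -f cmd/server/Dockerfile -t sourcegraph/server "$OUTPUT"

# After: the plain repository checkout is the build context; compilation
# happens inside the image's builder stage.
docker build -f cmd/server/Dockerfile -t sourcegraph/server "$(pwd)"
```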
`build.sh` scripts
All of the `cmd/**/build.sh` scripts have been refactored to only do the raw steps of building the binary. The actual `docker build` call has been moved into `cmd/**/docker.sh` scripts. This provides a cleaner separation of concerns and makes it easier to re-use the `build.sh` scripts in different contexts.
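For illustration, a minimal sketch of how that split might look for a hypothetical `cmd/foo` service (file contents are assumptions, not the real scripts):

```sh
### cmd/foo/build.sh (illustrative): only the raw steps to build the binary
set -euo pipefail
go build -o "$OUTPUT/foo" ./cmd/foo

### cmd/foo/docker.sh (illustrative): owns the `docker build` call
set -euo pipefail
IMAGE="${IMAGE:-sourcegraph/foo}"
docker build -f cmd/foo/Dockerfile -t "$IMAGE" "$(pwd)"
```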
Activity
Created by: ggilmore
@keegancsmith Note that this is still in the draft phase, but:
- https://github.com/sourcegraph/sourcegraph/blob/multi-stage-build/docker-images/builder/Dockerfile contains the definition of the `sourcegraph/builder` Docker image. A lot of this is taken from https://github.com/sourcegraph/infrastructure/blob/master/docker-images/buildkite-agent/Dockerfile, but it mostly contains the basic packages that we need to build most of our Docker images (Go, Node, etc.)
- https://github.com/sourcegraph/sourcegraph/tree/multi-stage-build/enterprise/cmd/server and https://github.com/sourcegraph/sourcegraph/tree/multi-stage-build/cmd/server have the most complex stuff going on, but in general:
  - The actual call to `docker build` has been moved into scripts named `docker.sh`, and `build.sh`/`pre-build.sh` only contain the actual steps to build the binaries
  - The flow in most Dockerfiles is: 1) in a stage called `builder`, use `sourcegraph/builder`, copy over the `sourcegraph/sourcegraph` checkout, and run the binaries' build scripts; 2) in a separate stage, copy the relevant binaries created in the `builder` stage to `/usr/local/bin` (see the sketch after this list)
  - The enterprise Docker images are tricky because they override a lot of the behavior in the basic `cmd/` Docker images. I used environment variables / Docker build arguments as a mechanism to propagate the customizations that those enterprise builds need to make
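A minimal sketch of that two-stage flow for a hypothetical `cmd/foo` service (binary paths and the build-arg name are illustrative; the real Dockerfiles do more):

```dockerfile
# Stage 1: build inside sourcegraph/builder, with the repo checkout as context
FROM sourcegraph/builder AS builder
WORKDIR /sourcegraph
COPY . .
# Stand-in for the env vars / build args the enterprise images use to
# customize the shared build scripts (name is hypothetical)
ARG BUILD_VARIANT=oss
RUN BUILD_VARIANT=$BUILD_VARIANT ./cmd/foo/build.sh

# Stage 2: copy only the built binaries into the runtime image
FROM sourcegraph/alpine
COPY --from=builder /sourcegraph/.bin/foo /usr/local/bin/foo
ENTRYPOINT ["/usr/local/bin/foo"]
```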
Created by: ggilmore
@keegancsmith @tsenart If y'all have any bandwidth/more familiarity with Google Container Builder, I'm running into a strange build error with it that I'm not seeing when I run the local `docker build` instructions: https://buildkite.com/sourcegraph/sourcegraph/builds/44112#4bd8b34c-56a5-4392-a525-b5c0a3aebbdc
Created by: tsenart
@ggilmore: From the logs alone, it says `cross-env` isn't available. It seems strange for that not to happen when running the build locally.
I'd try to debug this with the local builder: https://cloud.google.com/cloud-build/docs/build-debug-locally
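For reference, running a build through the local builder looks roughly like this (a sketch, assuming a `cloudbuild.yaml` at the repo root):

```sh
# Install the local builder and replay the Cloud Build config against the
# same source, so the failure can be reproduced outside of CI.
gcloud components install cloud-build-local
cloud-build-local --config=cloudbuild.yaml --dryrun=false .
```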
Created by: tsenart
@ggilmore: Just a possible lead. Is the CI env var set in Google Cloud Build? If not, this line would be affected by it: https://github.com/sourcegraph/sourcegraph/blob/master/cmd/frontend/pre-build.sh#L7
Created by: ggilmore
@tsenart

> @ggilmore: Just a possible lead. Is the CI env var set in Google Cloud Build? If not, this line would be affected by it: /cmd/frontend/pre-build.sh@master#L7

Two things:
1. The CI env var on the host machine shouldn't affect the Docker build process, since there is no "CI" Docker build argument and `ENV CI=...` isn't set in any scripts or Dockerfiles.
2. The enterprise server build refers to the enterprise version of the frontend pre-build script, which doesn't refer to a "CI" env var: https://github.com/sourcegraph/sourcegraph/blob/master/enterprise/cmd/server/pre-build.sh
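For context, a host-side env var only reaches a Docker build if it is both passed and declared explicitly, which is why the host's CI setting is irrelevant here; a minimal sketch (the script path is just the one discussed above):

```dockerfile
# A host-side `CI=true` never reaches the build unless it is passed explicitly:
#   docker build --build-arg CI="$CI" ...
# and declared in the Dockerfile:
ARG CI
ENV CI=${CI}
# only now would a script such as pre-build.sh see the CI variable
RUN ./cmd/frontend/pre-build.sh
```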
Created by: ggilmore
@tsenart
I think I figured out why it's broken, at least: https://buildkite.com/sourcegraph/sourcegraph/builds/44214#677c3ecd-5510-4db3-ba57-3c13927109af
[2019-10-03T22:02:12Z] INFO: Ignoring [shared/node_modules/.bin] which is a symlink to non-existent path
`shared/node_modules/.bin` links to `(root)/node_modules/.bin`, which doesn't exist in a fresh checkout. I'm not sure how to solve this yet.
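A quick way to see the dangling link in a fresh clone (illustrative commands and output):

```sh
# In a fresh checkout the symlink target does not exist yet, because yarn
# has not been run at the repository root (paths as in the comment above).
ls -l shared/node_modules/.bin
# shared/node_modules/.bin -> ../../node_modules/.bin   (target missing)
test -e shared/node_modules/.bin || echo "dangling symlink"
```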
Created by: ggilmore
I've worked around it by adding a prior build step that `curl`s in a tar.gz of the commit to the workspace before running `docker build` (!5745 (closed)). I think it's pretty ugly, but it works for now.
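The workaround is roughly shaped like this (a sketch; `COMMIT_SHA` and `WORKSPACE` are placeholders, not the real CI variable names):

```sh
# Fetch a pristine archive of the current commit and unpack it into the
# workspace that docker build will use as its context.
curl -fsSL "https://github.com/sourcegraph/sourcegraph/archive/${COMMIT_SHA}.tar.gz" \
  | tar -xzf - --strip-components=1 -C "$WORKSPACE"
docker build -t sourcegraph/server "$WORKSPACE"
```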
The builds still seem to be pretty slow on an 8-core machine: https://buildkite.com/sourcegraph/sourcegraph/builds/44218#1ee8a5c4-b3c0-42a6-b7de-3a0ea99a4f0b
I tried using kaniko to see if it could speed things up, but it seems like it made things worse: https://buildkite.com/sourcegraph/sourcegraph/builds/44232
Created by: ggilmore
@tsenart The overall build step failed, but for now I'm only paying attention to whether or not the `docker build` job actually succeeded and how long it took.
At the moment, we're using very little of the cache since we're copying in the contents of sourcegraph/sourcegraph on every build. Since the file contents are different for each commit, the Docker cache gets invalidated and the build has to do everything from scratch.
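For comparison, the standard way to keep more of the cache warm (not what this MR does, just the general technique) is to copy the dependency manifests before the full checkout so the install layers survive across commits; a sketch for a hypothetical `cmd/foo` image:

```dockerfile
FROM sourcegraph/builder AS builder
WORKDIR /sourcegraph
# These layers are only invalidated when the dependency manifests change,
# so the yarn install stays cached across most commits.
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile
# The full checkout changes on every commit; everything below rebuilds.
COPY . .
RUN ./cmd/foo/build.sh
```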
Created by: ggilmore
Performance notes (ignore the build status - the docker build succeeded in both builds):
- Our internal DIND builder (buildkite build step)
  - takes 10-13 minutes from a stale Docker cache
  - I enabled BuildKit so that we could build different stages in parallel.
  - there is a lot of noise since all Docker operations in our CI pipeline are still using this single node
- Google Cloud Build (buildkite build step)
  - an 8-core machine builds sourcegraph/server in ~16.5 minutes? I tried enabling BuildKit to speed things up, but I think the version of Docker those machines are using is old: BuildKit errors when it tries to pull docker.io images that it doesn't have in its cache (e.g. sourcegraph/builder or sourcegraph/alpine).
  - Even though we aren't using BuildKit, there isn't a good reason why Google Cloud Build should be ~3-4 minutes slower than DIND. Maybe they're using an older CPU series?
- Running https://github.com/sourcegraph/sourcegraph/blob/multi-stage-build/enterprise/cmd/frontend/pre-build.sh seems to take ~9 minutes using multi-stage builds versus ~5 minutes if we build everything outside the image (the status quo). Running webpack, in particular, seems to be about 1 minute faster without multi-stage builds.
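For reference, enabling BuildKit is just an environment toggle on the build invocation (Dockerfile path shown for illustration):

```sh
# BuildKit builds independent stages in parallel and only runs the stages a
# target actually needs; it requires a reasonably recent Docker daemon.
DOCKER_BUILDKIT=1 docker build -f cmd/server/Dockerfile -t sourcegraph/server .
```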
Created by: beyang
Have we investigated using knative (k8s functions-as-a-service) for builds at all? https://starkandwayne.com/blog/build-docker-images-inside-kubernetes-with-knative-build/
Created by: codecov[bot]
Codecov Report
Merging #5745 into master will decrease coverage by 0.18%. The diff coverage is 7.69%.

Coverage Diff:

| | master | #5745 | +/- |
| --- | --- | --- | --- |
| Coverage | 39.53% | 39.34% | -0.19% |
| Files | 1192 | 1192 | |
| Lines | 61568 | 61337 | -231 |
| Branches | 5850 | 5850 | |
| Hits | 24341 | 24136 | -205 |
| Misses | 35021 | 34996 | -25 |
| Partials | 2206 | 2205 | -1 |

| Impacted Files | Coverage Δ |
| --- | --- |
| cmd/symbols/internal/symbols/search.go | 68.78% <7.69%> (-0.68%) |
| cmd/searcher/search/search.go | 65.58% <0%> (-8.84%) |
| cmd/searcher/search/search_regex.go | 89.68% <0%> (-3.61%) |
| internal/search/backend/horizontal.go | 88.7% <0%> (-1.62%) |
| cmd/searcher/search/search_structural.go | 0% <0%> (ø) |
Created by: ggilmore
Yes, although our new dind structure might make this viable again...
On Sat, Apr 4, 2020 at 10:42 PM Keegan Carruthers-Smith <notifications@github.com> wrote:

> @ggilmore I assume this work is now stale/can be closed out?