In making a build system for your software, you codified the dependencies between its parts. But did you account for implicit software dependencies, like system libraries and compiler toolchains?
Implicit dependencies give rise to the biggest and most common problem with software builds: the lack of hermeticity. Without hermetic builds, reproducibility and cacheability are lost.
This post motivates the desire for reproducibility and cacheability, and explains how we achieve hermetic, reproducible, highly cacheable builds by taking control of implicit dependencies.
Consider a developer newly approaching a code repository. After cloning the repo, the developer must install a long list of “build requirements” and plod through multiple steps of “setup”, only to find that, yes indeed, the build fails. Yet it worked just fine for their colleague! The developer, typically not an expert in build tooling, must debug a mysterious failure that is not of their making. This is bad for morale and for productivity.
This happens because the build is not reproducible.
One very common cause of such failures is that the compiler toolchain on the developer’s system differs from the colleague’s. This happens even with sophisticated build systems like Bazel: by default, Bazel uses whatever system libraries and compilers are installed in the developer’s environment.
A common workaround is to provide developers with a Docker image equipped with a certain compiler toolchain and system libraries, and then to mandate that the Bazel build occurs in that context.
That solution has a number of drawbacks. First, if the developer is using macOS, the virtualized build context runs substantially slower. Second, the Bazel build cache, developer secrets, and the source code remain outside of the image, and this adds complexity to the Docker invocation. Third, the Docker image must be rebuilt and redistributed as dependencies change, and that’s extra maintenance. Fourth, and this is the biggest issue, Docker image builds are themselves not reproducible: they nearly always rely on some external state that does not remain constant across build invocations, which means the build can fail for reasons unrelated to the developer’s code.
A better solution is to use Nix to supply the compiler toolchain and system library dependencies. Nix is a software package management system somewhat like Debian’s APT or macOS’s Homebrew, but it goes much further in helping developers control their environments. It is unsurpassed when it comes to reproducible builds of software packages.
Nix facilitates use of the Nixpkgs package set, one of the largest and most up-to-date collections of software packages available. It provides build instructions that work on both Linux and macOS, and developers can easily pin any software package at an exact version.
Learn more about using Nix with Bazel, here.
Builds should be fast as well as reproducible. Fast builds are achieved by caching intermediate build results. Each cache entry is keyed on the precise dependencies and the build instructions that produce it, so a build benefits from a shared, distributed cache only when its dependencies match those of the build that populated the cache. When dependencies differ, the cache keys differ too, and lookups miss. The developer then has to rebuild those targets locally, and these unnecessary rebuilds slow development.
The solution is to make the implicit dependencies into explicit ones, again using Nix, making sure to configure and use a shared Nix cache.
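To make the effect of cache keying concrete, here is a minimal Python sketch. It is not Bazel's actual key computation, which is considerably more involved; the function names, the toolchain strings, and the compiler versions are all hypothetical and chosen only for illustration. It shows that two developers building the same source with different host compilers get different keys, while pinning one toolchain for everyone makes the keys match.

```python
import hashlib
import json


def digest(content):
    """Content hash of a single input file (hypothetical helper)."""
    return hashlib.sha256(content).hexdigest()


def cache_key(command, input_digests, toolchain):
    """Derive a key from everything that can affect the build output.

    Illustrative only: real build systems hash a richer action description.
    """
    payload = json.dumps(
        {"command": command, "inputs": input_digests, "toolchain": toolchain},
        sort_keys=True,  # stable ordering so equal inputs give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


command = ["cc", "-c", "main.c", "-o", "main.o"]
inputs = {"main.c": digest(b"int main(void) { return 0; }\n")}

# Implicit toolchain: each developer uses whatever the host provides.
# The version strings below are made up for the example.
alice_key = cache_key(command, inputs, "gcc 12.3.0 (host)")
bob_key = cache_key(command, inputs, "gcc 13.2.0 (host)")

# Explicit toolchain: everyone uses the same pinned compiler.
pinned_toolchain = "gcc 13.2.0 (pinned)"
alice_pinned = cache_key(command, inputs, pinned_toolchain)
bob_pinned = cache_key(command, inputs, pinned_toolchain)

print(alice_key == bob_key)        # False: shared cache entries are never reused
print(alice_pinned == bob_pinned)  # True: one developer's result serves the other
```

The sketch treats the toolchain as just another hashed input. That is the essence of making an implicit dependency explicit: once the build system knows about it, matching toolchains produce matching keys and a shared cache can do its job.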
Learn more about configuring a shared Bazel cache, here.
It is important to eliminate implicit dependencies in your build system in order to retain build reproducibility and cacheability. Identify Nix packages that can replace the implicit dependencies of your Bazel build and use rules_nixpkgs to declare them as explicit dependencies. That will yield a fast, correct, hermetic build.