No new product created in Haskell ever starts from scratch. Hackage hosts millions of lines of third-party code, neatly and independently redistributable as Cabal packages. As of the 0.10 release of rules_haskell, Bazel has native support for building Cabal packages.
Cabal packages themselves seldom start from scratch. That’s why packages typically have dozens of dependencies. Resolving version bounds declared for all dependencies in the package metadata to a set of concrete versions, and then downloading these dependencies, is a painstaking task if done manually. Bazel can now use Stack to do this all automatically—the user only needs to provide the name of a Stackage snapshot and the names of the packages they want to reuse for their project.
Users frequently ask which build tool to use for their next project. It turns out that “all of them at once” is a compelling answer (including Nix, though we covered that previously and won’t be rehearsing that in this post).
A Bazel primer
Bazel is a build tool originally created by Google. The key attributes of Bazel are:
- Bazel is a polyglot build tool, supporting many different programming languages. This enables Bazel to be fast, because it can have a global and fine-grained view of the dependency graph, even across language boundaries.
- Bazel tries hard to guarantee build correctness. This means that after making a few localized changes to your source code, you don’t need to start your build from scratch to be confident that others get the same result. Incremental builds are guaranteed to yield the same result as full builds (under mild conditions we won’t discuss here). This also means that it’s safe to distribute builds to a large cluster of remote machines to make it finish fast. You still get the same result.
- Bazel is extensible. You can teach Bazel to build code in new programming languages that it didn’t know about out-of-the-box. Doing so requires getting familiar with a simple Python-like language called Starlark. Unlike Make or Shake rules, mechanisms and conventions exist to easily reuse Bazel rules across projects, leading to the emergence of an entire ecosystem of rules that build on each other.
Internally, Google uses a variant of Bazel to build most of their billions of lines of source code, thousands of times a day. If your project has lots of components in a variety of different languages and you don’t want the hassle of lots of build systems too, or if you simply want your builds to remain fast no matter how big your project grows, you should probably be using Bazel (or Buck, Facebook’s equivalent).
The tool expects two types of files in your project:
- One or more BUILD files. Each BUILD file declares a set of targets. Each target is an instance of a build rule, like haskell_library for any reusable component in your project, or haskell_binary for the executables, or miscellaneous other build rules (like API documentation). See the tutorial for a longer introduction.
- A WORKSPACE file that allows you to invoke macros that perform some autodiscovery and automatically generate BUILD files from, say, third-party package metadata.
Here’s how our solution to build third-party code works:
- We define two new build rules: haskell_cabal_library and haskell_cabal_binary. These are like haskell_library and haskell_binary, respectively, except that Cabal is used to build the targets, rather than calling GHC (the Glasgow Haskell Compiler) directly.
- A macro called stack_snapshot generates a BUILD file that declares a target for each Cabal package in the given snapshot that we’ll be using in our project, directly or indirectly.
Building a Cabal package
Let’s say you have an existing Cabal library in your project.
Perhaps you would like it to be a Cabal library so that you
can publish it on Hackage. To expose it to downstream Haskell code
that uses Bazel as the build tool, you can
write the following rule in a BUILD file:

    haskell_cabal_library(
        name = "mylib",
        version = "0.1",
        srcs = ["mylib.cabal", "Lib.hs"],
    )

A binary could now depend on this Cabal library as well as on base
(which ships with the GHC toolchain):

    haskell_toolchain_library(name = "base")

    haskell_binary(
        name = "myexe",
        srcs = ["Main.hs"],
        deps = [":base", ":mylib"],
    )

In the above, we have three targets, each designated with a “label”:
:mylib, :base and :myexe. The label is derived from the name
attribute that is mandatory for each
target. rules_haskell is a set of build rules for
Bazel. The build rules tell Bazel that it needs to call Cabal to build
any haskell_cabal_library target. Performing
this action produces several outputs, including on most platforms
a static library and a shared library (called libHSmylib-0.1.a and
libHSmylib-0.1.so, respectively). You don’t need to remember the
names of any of the outputs, since you can simply pass a target as
a dependency to another, using the target’s label. The build rules
tell Bazel which outputs from each one of a target’s dependencies it needs to build
the target. In this case, we are building
a binary with
:mylib statically linked (the default), so the
libHSmylib-0.1.a output from that target is needed to build the
binary.
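For reference, the two snippets above can be combined into a single BUILD file once the rules are loaded. The following is a minimal sketch, assuming the rules_haskell repository is registered in the WORKSPACE under the name rules_haskell and exposes the rules from haskell/defs.bzl and haskell/cabal.bzl, as recent releases do. The load paths may differ in the release you use.

```starlark
# BUILD
# Assumption: the repository name and .bzl paths below match your
# rules_haskell release; adjust them if your WORKSPACE differs.
load(
    "@rules_haskell//haskell:defs.bzl",
    "haskell_binary",
    "haskell_toolchain_library",
)
load("@rules_haskell//haskell:cabal.bzl", "haskell_cabal_library")

# Built by calling Cabal on mylib.cabal.
haskell_cabal_library(
    name = "mylib",
    version = "0.1",
    srcs = ["mylib.cabal", "Lib.hs"],
)

# Exposes the base library that ships with the GHC toolchain.
haskell_toolchain_library(name = "base")

# Built by calling GHC directly; depends on both libraries by label.
haskell_binary(
    name = "myexe",
    srcs = ["Main.hs"],
    deps = [":base", ":mylib"],
)
```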
Building a Stackage snapshot
The ability to build libraries with or without Cabal given a short target definition is great. But in practice, even these short target definitions get tiring to write, for two reasons:
- We don’t want to have to write out the version numbers of each Cabal package explicitly. The great thing about Stackage snapshots is that a single snapshot name determines the version number for all packages. If only we could tell Bazel which snapshot we want to use, explicit version numbers for each package would no longer be necessary.
- Cabal libraries on Hackage typically have many dependencies, which in turn have dependencies of their own. The full dependency graph can get very large, in the order of hundreds of nodes. Writing it out in full in the form of target definitions like above would be tiresome indeed.
Stack already knows how to resolve a snapshot name to
a specific set of package versions. Stack also already knows where to
find these packages, on Hackage or any of its mirrors. Finally, Stack
already knows what the dependency graph looks like. So the
solution is to just call Stack. We added a workspace macro called
stack_snapshot. An example:

    stack_snapshot(
        name = "stackage",
        packages = ["conduit", "lens", "zlib-0.6.2"],
        snapshot = "lts-14.7",
    )

The above generates a
BUILD file behind the scenes with one
haskell_cabal_library per package listed in the packages attribute,
and any transitive dependencies thereof. The result is essentially the
output of stack dot, which outputs a dependency graph, munged into a form
cromulent for Bazel. This means that Bazel sees the same dependency
graph as Stack does, and can therefore parallelize the build on
multiple cores in exactly the same way Stack does. But because this is
Bazel, we can even distribute the build on multiple machines at once.
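To use packages from the snapshot, a target names them in its deps through the external repository that stack_snapshot creates. Here is a minimal sketch, assuming the snapshot is named stackage as in the example above and that each package is exposed as a target named after the package, which is the convention described in the rules_haskell documentation. Main.hs is a hypothetical source file.

```starlark
# BUILD
# The stack_snapshot call shown earlier belongs in the WORKSPACE file;
# this file only consumes the targets it generates.
# Assumption: load path as in recent rules_haskell releases.
load(
    "@rules_haskell//haskell:defs.bzl",
    "haskell_binary",
    "haskell_toolchain_library",
)

haskell_toolchain_library(name = "base")

haskell_binary(
    name = "myexe",
    srcs = ["Main.hs"],  # hypothetical source file
    deps = [
        ":base",
        # Assumption: one target per package in the generated repository.
        "@stackage//:conduit",
        "@stackage//:lens",
    ],
)
```

No version numbers appear in this file: they are all determined by the snapshot named in the WORKSPACE.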
Shared cache for Cabal libraries
The upshot is that you can now write polyglot projects that include Haskell code and hundreds of third-party dependencies without sweating. By building it with Bazel, you get the benefit of correct caching to accelerate all your build jobs. The Bazel cache can be remote and shared among all of your continuous integration worker machines and even shared with all of your developers. Bazel’s correctness guarantees make this safe to do. If a branch was published and the build succeeded, then any developer that checks out the branch now benefits from fast builds.
It’s an interesting story that to achieve correct, reproducible, and cacheable builds, we gainfully combined Haskell’s three main build technologies:
- Bazel to run build actions in parallel or distributed on many nodes in a build cluster,
- Cabal to interpret the metadata of existing third-party code and correctly construct shared and static libraries, and
- Stack to inform Bazel about where to find the source code for the third-party dependencies, what versions to use, and tell it what the dependency graph looks like.
Another interesting observation is that emulating Cabal is hard. We previously collaborated with Formation on Hazel, an effort to reimplement Cabal as a Bazel ruleset. It turned out that getting the Cabal semantics exactly right for all packages on all platforms (especially Windows) was exceedingly difficult. With the current approach, we lose a few build parallelization opportunities, but wrapping Cabal instead of reimplementing it leads to a much simpler solution overall.
Have a look at Digital Asset’s DAML repository.
DAML is an example of a large Haskell project powered by Bazel
and rules_haskell. It’s an open source smart contract language for building
distributed applications. You can build the project using this new Stack and
Cabal support on Linux and macOS. Windows support is in progress. The
repository has around 150 direct Hackage dependencies and makes use
of advanced features such as a custom
stack snapshot, custom package
flags, C library dependencies, and injecting vendored packages into
Stack’s dependency graph.