Services
BiotechFintechAutonomous Vehicles
Open sourceContactCareersBlog
1 July 2021 — by Torsten Schmits
Building OCaml Projects with Bazel
bazelocaml

This post describes how to use OBazl to build OCaml projects with Bazel. Compared to the first OCaml ruleset published in 2016, which contains only a few basic rules, OBazl is under active development and aims to provide highly granular, low-level control over the build, and eventually automatically generate build files. It is therefore a practical choice to enjoy Bazel’s incremental, parallel, cacheable and remote building capacities. However, it is in its early stages, requiring manual work for some tasks that will be automated in the future.

To exemplify its usage, I will show how to convert the build of the uri package to Bazel. Since OCaml and Bazel is not a frequent combination, I will give an introduction to Bazel, as well as some aspects of compiling OCaml modules. The final product of this endeavour can be examined on GitHub.

Bazel Basics

The uri project is suitable for this introduction because it is small, but incorporates a number of interesting build requirements. It contains three libraries (uri, uri-re and uri-sexp in the subdirectories lib, lib_re and lib_sexp), a test suite in the subdirectory lib_test and two code generation helpers for the tests.

The goal is to be able to run these command lines to build and test all libraries contained in the project:

bazel build //lib:lib-uri
bazel test //lib_test:test

These commands specify targets, which use this syntax:

@reponame//sub/dir:targetname

// denotes the project root directory, called workspace. It must contain the file WORKSPACE.bazel, in which dependencies and global aspects of the project are configured.

The part before : is the path of the subdirectory that contains the target, called a package, and can be omitted to reference the root directory. Each package must contain the file BUILD.bazel. The part after : is the name of the target, which is assigned in a BUILD.bazel. Multiple targets may be declared in a single package.

The whole specification can be prefixed with @some-name to reference an external repository configured in WORKSPACE.bazel. Using the .bazel extension for these files is optional.

Additionally, custom rule definitions and helper functions may be placed in arbitrary files with the .bzl extension. In this project, the subdirectory bzl hosts all the .bzl files.

OBazl Configuration

Let’s start configuring our build for OBazl by writing WORKSPACE.bazel. First, we initialize the workspace with a project name.

workspace(name = "uri")

Next, we import the http_archive function. Importing functions is done by the load function, which takes one argument denoting a file, with the target syntax from above, and a list of function names to be imported into the current scope. @bazel_tools is a native workspace distributed with Bazel.

That is, the following line imports the function http_archive from the file http.bzl in the tools/build_defs_repo package of the @bazel_tools workspace.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

Using http_archive, we fetch the OBazl tarball. This gives us a new workspace, @obazl_rules_ocaml, that we can reference.

http_archive(
    name = "obazl_rules_ocaml",
    strip_prefix = "rules_ocaml-9a700d54e86ffce2da0e2bd425aebe2c79f5167c",
    urls = [
        "https://github.com/obazl/rules_ocaml/archive/9a700d54e86ffce2da0e2bd425aebe2c79f5167c.tar.gz"
    ],
    sha256 = "3ef69cca2c7829b3bc839c31eef8c02f498f8319ca33b4a27c740f8cbff24f51"
)

The next snippet imports OBazl’s main function, ocaml_configure, as well as the opam configuration dictionary that we will define in the file bzl/opam.bzl below.

load("@obazl_rules_ocaml//ocaml:bootstrap.bzl", "ocaml_configure")
load("//:bzl/opam.bzl", "opam")

Finally, we initialize the OBazl configuration:

ocaml_configure(opam = opam, build = "4.12")

Here, build = "4.12" specifies the compiler version for the project, which will be configured in the next step.

Opam Configuration

Opam uses build environment specifications called switches which provide a compiler and package dependency sets. We configure it in bzl/opam.bzl:

load("@obazl_rules_ocaml//ocaml:providers.bzl", "OpamConfig", "BuildConfig")

opam_pkgs = {
  "stringext": ["1.4.0"],
  . . .
}

opam = OpamConfig(
    version = "2.0",
    builds = {
        "4.12": BuildConfig(
            default = True,
            switch = "4.12",
            compiler = "4.12",
            packages = opam_pkgs,
            install = True,
        ),
    }
)

This defines a single compiler configuration named 4.12. The packages attribute offers a glimpse into the future of OBazl – the feature that would automatically fetch Opam packages isn’t fully implemented yet. Consequently, dependencies have to be installed manually, assuming opam is present in the system:

$ opam install ocamlfind stringext angstrom re sexplib ppx_sexp_conv ppx_deriving ounit

Building a Library

The uri project contains three libraries, each composed of one or more OCaml modules. To build those with Bazel, we place a file named BUILD.bazel in each library’s subdirectory.

In that file, we first import the needed OBazl functions:

load(
    "@obazl_rules_ocaml//ocaml:rules.bzl",
    "ocaml_signature",
    "ocaml_module",
    "ocaml_library",
)

The lib_re package contains two modules. For each module, we have to create a target for the signature and the implementation:

ocaml_signature(
    name = "uri_re_sig",
    src = "uri_re.mli",
    deps_opam = ["stringext", "re"],
)
ocaml_module(
    name = "uri_re",
    struct = "uri_re.ml",
    sig = "uri_re_sig",
    deps_opam = ["stringext", "re"],
)

The name attribute is the identifier of the target. In this case, uri_re_sig is assigned to the interface file uri_re.mli and referenced by the sig attribute of the ocaml_module rule. Since the signature is in the same package as the module, we can simply refer to the signature target by its name, but it would also be possible to use the extended target syntax: //lib_re:uri_re_sig. The struct argument denotes the file containing the implementation of the module.

The signature target compiles to a .cmi, while the module target compiles to a .cmx and combines the implementation and signature into a .o file. Building the module target with bazel build //lib_re:uri_re (referencing subdirectory:targetname) produces the .o at bazel-bin/lib_re/__obazl/Uri_re.o.

Applying the same rules to the other module, uri_legacy.ml, allows us to create a library target from those two modules:

ocaml_signature(
    name = "uri_legacy_sig",
    src = "uri_legacy.mli",
    deps_opam = ["stringext", "re"],
)
ocaml_module(
    name = "uri_legacy",
    struct = "uri_legacy.ml",
    sig = "uri_legacy_sig",
    deps_opam = ["stringext", "re"],
)

ocaml_library(
    name = "lib-uri-re",
    modules = ["uri_re", "uri_legacy"],
)

This library target does not compile anything, its purpose is merely to group the set of modules for reference in other targets (to build a cmxa archive, OBazl has an ocaml_archive rule).

Testing

The subdirectory lib_test contains two test suites for the core library uri.

Aside from the library, it also has a dependency on test fixtures, located in the etc subdirectory. Building those is a little more involved since they are generated from data files.

The pipeline for this task starts with the subdirectory config, which builds an executable that performs the code generation. This program converts the file etc/services.short to OCaml, which is combined with some extra code in etc/uri_services_raw.ml into a module named uri_services.ml.

Building the executable in config/BUILD.bazel is straightforward:

load("@obazl_rules_ocaml//ocaml:rules.bzl", "ocaml_module", "ocaml_executable")
ocaml_module(name = "gen_services", struct = "gen_services.ml", deps_opam = ["stringext"])
ocaml_executable(name = "exe", main = "gen_services")

The executable, //config:exe, can then be used as a dependency in the etc package to generate uri_services.ml:

genrule(
    name = "gen_uri_services",
    srcs = ["services.short", "uri_services_raw.ml"],
    outs = ["uri_services.ml"],
    cmd = "$(execpath //config:exe) $(execpath services.short) > [email protected] && cat
    $(execpath uri_services_raw.ml) >> [email protected]",
    tools = ["//config:exe"],
)

The function genrule allows us to execute an arbitrary shell command using other build products of our project. The rule depends on the executable, which can be used in the cmd attribute by calling the auxiliary function execpath which is provided by Bazel and produces the path to the referenced target. By the same mechanism, we get the paths to our data files. [email protected] is expanded to the paths in the outs attribute, containing the generated module’s file name, which will be written to bazel-bin/etc/uri_services.ml.

So far, so good – but now the module has to be built. This comes with a significant caveat: The module has an associated interface file, and it is statically located at etc/uri_services.mli. The naive approach would be a build definition referencing both signature and implementation by their target names:

ocaml_signature(name = "uri_services_sig", src = "uri_services.mli")
ocaml_module(name = "uri_services", struct = "uri_services.ml", sig = "uri_services_sig")

This compiles just fine, but when we use this module as a dependency of the test executable, the linker tells us:

Error: Files bazel-bin/lib_test/__obazl/Test_runner.cmx
       and bazel-bin/etc/__obazl/Uri_services.cmx
       make inconsistent assumptions over interface Uri_services

What is happening is that the OCaml compiler expects the .mli file to be in the same directory as the .ml file; but, since we generated the .mli file, it is located in the bazel-out directory, while the interface is in the source tree. Therefore, OCaml defaults to compiling Uri_services as if there were no Uri_services.mli, and exposes all of its internals.

The solution to this is to copy the interface file to the build artifact directory using the copy_file function from @bazel_skylib, Bazel’s standard library. However, there is yet another hurdle: in Bazel, the name of an output file cannot be the name of an input file; but at the same time, the .mli file and the .ml file must have matching names. To address this, we copy uri_services.mli to the generated/ subdirectory of the build artifact directory. The generated/ directory doesn’t exist in the sources, which guarantees that there is no name clash between inputs and outputs. Correspondingly uri_services.ml is generated in the generated/ directory.

load("@bazel_skylib//rules:copy_file.bzl", "copy_file")

copy_file(
    name = "uri_services_mli",
    src = "uri_services.mli",
    out = "generated/uri_services.mli",
)

genrule(
    name = "gen_uri_services",
    srcs = ["services.short", "uri_services_raw.ml"],
    outs = ["generated/uri_services.ml"],
    cmd = "$(execpath //config:exe) $(execpath services.short) > [email protected] && cat
    $(execpath uri_services_raw.ml) >> [email protected]",
    tools = ["//config:exe"],
)

We then create a library named //etc:services_short using the usual rules and are equipped to write the definition of the test suite in lib_test/BUILD.bazel:

load(
    "@obazl_rules_ocaml//ocaml:rules.bzl",
    "ocaml_module",
    "ocaml_test",
)

ocaml_module(
    name = "test_runner",
    struct = "test_runner.ml",
    deps_opam = ["oUnit"],
    deps = ["//lib:lib-uri", "//etc:services_short"],
)

ocaml_test(name = "test", main = "test_runner")

The test_runner depends on the test framework oUnit, the main library using its target label //lib:lib-uri, and the generated services. The rule ocaml_test is similar to ocaml_executable, but it defines a test target that allows us to run the test command:

bazel test //lib_test:test
INFO: Found 1 test target...
Target //lib_test:test up-to-date:
  bazel-bin/lib_test/test
INFO: Build completed successfully, 6 total actions
//lib_test:test                                                   PASSED in 0.3s

Executed 1 out of 1 test: 1 test passes.

Preprocessors

The lib_sexp directory, containing the uri-sexp library, has an additional requirement: a preprocessor from the opam package ppx_sexp_conv needs to be executed on the module before building it.

The module uri_sexp.ml, contains the preprocessor annotation @@deriving sexp:

type component = [
  | `Scheme
  . . .
] [@@deriving sexp]

OBazl provides additional tools for this purpose. The OCaml preprocessor core library ppxlib provides an entry point for standalone executables that perform the rewrite routines of any linked preprocessor library. By linking ppx_sexp_conv, we can create a tool that can be used by OBazl:

write_file(
    name = "driver_ml",
    out = "driver.ml",
    content = ["let () = Ppxlib.Driver.standalone ()"],
)

ppx_module(
    name = "driver",
    struct = ":driver.ml",
    deps_opam = ["ppxlib"],
)

ppx_executable(
    name = "sexp_preprocessor",
    main = ":driver",
    opts = ["-linkall"],
    deps_opam = ["ppx_sexp_conv"],
)

In order to apply the preprocessor to a module, sexp_preprocessor has to be passed as the ppx attribute of the ppx_module rule, along with the output encoding in ppx_print:

ppx_module(
    name = "uri_sexp",
    struct = "uri_sexp.ml",
    sig = ":uri_sexp_sig",
    deps_opam = ["sexplib", "ppx_sexp_conv"],
    deps = ["//lib:lib-uri"],
    ppx = ":sexp_preprocessor",
    ppx_print = "@ppx//print:text",
)

The intermediate file at bazel-bin/lib_sexp/__obazl/Uri_sexp.ml shows that preprocessing was successful:

type component = [
  | `Scheme
  . . .
] [@@deriving sexp]
include
  struct
    let rec __component_of_sexp__ =
      (let _tp_loc = "lib_sexp/uri_sexp.ml.Derived.component" in
      . . .

Now the generated file has to be compiled. Since the module file is now located in the build directory while the interface is in the source directory, the same treatment as for the uri_services module is necessary: we have to copy the interface file to the build directory. In this case, the generator rule is not directly under our control, so we’ll have to adapt to the ppx rules’ output path, which is __obazl/Uri_sexp.ml:

load("@obazl_rules_ocaml//ocaml:rules.bzl", "ppx_library")

copy_file(
    name = "uri_sexp_mli",
    src = "uri_sexp.mli",
    out = "__obazl/uri_sexp.mli",
)

ocaml_signature(
    name = "uri_sexp_sig",
    src = "uri_sexp_mli",
    deps_opam = ["sexplib"],
    deps = ["//lib:lib-uri"],
)

Using ppx_library instead of ocaml_library isn’t strictly necessary here; it is a convention to clearly mark the involvement of a preprocessor through Bazel providers.

Conclusion

The current state of OBazl is quite functional, albeit in early stages. You will find that you sometimes need rather substantial efforts to compensate for missing features. For instance, when I started working with OBazl, it didn’t work on NixOS (we like NixOS here at Tweag). So I had to implement NixOS support myself.

However, development is in full swing, with promising improvements on the horizon, including automatic build file generation, dependency handling, and even support for Coq builds.

This article is licensed under a Creative Commons Attribution 4.0 International license.
Interested in working at Tweag?Join us
See our work
  • Biotech
  • Fintech
  • Autonomous vehicles
  • Open source
Tweag
Tweag HQ → 207 Rue de Bercy — 75012 Paris — France
[email protected]
© 2020 Tweag. All rights reserved
Privacy Policy