Tweag - Engineering blog

The minimal megaparsec tutorial

Thu, 24 Apr 2025 00:00:00 GMT

In the functional programming course I teach to master's students at Telecom Nancy, I like to use parsing as an example of monadic programming, relying on the megaparsec library. My only concern with megaparsec is that its official tutorial is long: at the time of writing, it's 15000 words long.

Unlike the official megaparsec tutorial, this blog post is intended to be short, and is aimed at an audience with only a basic understanding of Haskell and monadic programming.

All the Haskell material from this blogpost is available on our GitHub: https://github.com/tweag/minimal-megaparsec-tutorial. You can fork this repository to get a full-fledged setup (including CI and Haskell Language Server support) for experimenting with a megaparsec parser 🚀

Running example

My running example is a parser for a domain-specific language that I designed for the class. This language uses primitive drawing commands to represent ASCII art roguelike maps. It looks like this:

HLine 0 0 8; HLine 0 4 8; VLine 0 0 5; VLine 7 0 2; VLine 7 3 2
HLine 8 1 2; HLine 8 3 2
HLine 10 0 6; HLine 10 4 6; VLine 10 0 2; VLine 10 3 2; VLine 15 0 5
Start 2 1
Cell 13 3 ~; Cell 14 3 ~; Cell 14 2 ~

Here, HLine x y len and VLine x y len draw horizontal and vertical walls respectively. The Start x y command marks the player’s starting point and Cell x y ~ places special terrain.

Roguelike maps typically consist of rectangular rooms and connecting corridors, where walls are shown as #, water as ~, and walkable spaces as dots (.). For example, the snippet above draws a map with two connected rooms. The room on the left contains the player’s start location (>), while some water appears in the lower right corner of the room on the right:

########  ######
#.>....####....#
#.............~#
#......####..~~#
########  ######

Walkable floor cells are omitted from the domain-specific language, as they can be inferred by computing the set of cells reachable from the starting point. In implementations of roguelikes, maps like this one are translated into an array of arrays of symbols, with some symbols being walkable (e.g. dot cells and water cells) and some symbols being blockers (walls). The top-level array is then used to compute possible moves and collisions.

The `Parsec` monad

To use megaparsec, we define our main monad type using the Parsec e s a type. It has three arguments:

The type of errors returned by the parser,
the type of stream accepted as input by the parser, and
the type of data returned upon successful parsing of an input stream.

For a simple parser, we define:

The error type to be Text, for simplicity. In a production parser, you would use a structured error type that distinguishes the different error cases, so that you can handle them differently.
The input stream to be Text, because this is the most idiomatic choice in the Haskell ecosystem:

import Data.Text (Text)
import Text.Megaparsec

type Error = Text
type Input = Text

-- | @Parser a@ is a parser that accepts @Text@ as input and returns an @a@ upon
-- successful parsing.
type Parser a = Parsec Error Input a

Our first parser

Parsers are built from primitive combinators (e.g. lookAhead, notFollowedBy, end of file eof) and combinators derived from them (e.g. oneOf, anySingle, satisfy). These combinators are designed to consume a few symbols, not complex structures (more on this later).

Combinators return parsers in any MonadParsec monad, which means that they have a signature where the head is MonadParsec e s m => ... and the return type is of the form m a ¹. In our context, it suffices to know that m a is instantiated to Parser a, so we can use these combinators for our parsers.

Let’s parse the different kinds of symbols we usually find in ASCII art roguelike maps, using the anySingle function, which parses a single token. In our case, since the input type is Text, the type of tokens is Char (see the ShareInput case of Stream’s documentation, as well as the instances of Stream):

-- | A symbol in the map of an ASCII roguelike
data Symbol
  = -- | A wall, depicted by a # character
    Wall
  | -- | A water cell, depicted by a ~ character
    Water
  deriving (Eq, Show)

-- | A parser for the symbol of a single cell. Used in 'parseElement' below.
parseSymbol :: Parser Symbol
parseSymbol = do
  c <- anySingle
  case c of
    '#' -> return Wall
    '~' -> return Water
    _   -> fail $ "Unknown symbol: " <> [c] -- See below for how to avoid this case altogether (in parseLineElement)

Parser combinators

By virtue of MonadParsecs being monads, parsers can be built using functions that are common in monadic Haskell code (including functions from Functor, Applicative, etc.). Let’s demonstrate this to build a parser for more advanced roguelike map constructs:

data Element
  = -- | Horizontal wall, starting at @(x,y)@ with @length@ cells (ending at @(x+length-1,y)@)
    HorizontalLine Int Int Int
  | -- | Vertical wall, starting at @(x,y)@ with @length@ cells (ending at @(x,y+length-1)@)
    VerticalLine Int Int Int
  | -- | A cell at @(x,y)@ with a symbol
    Cell Int Int Symbol
  | -- | The starting point of the player
    Start Int Int
  deriving (Eq, Show)

The parser for the HorizontalLine and VerticalLine cases can be written as follows:

import Control.Monad (void, when)
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

parseLineElement :: Parser Element
parseLineElement = do
  constructor <- choice [string "HLine" >> return HorizontalLine, string "VLine" >> return VerticalLine]
  space1 -- One or more whitespace characters
  x <- decimal
  space1
  y <- decimal
  space1
  len <- decimal
  when (len < 1) $ fail $ "Length must be greater than 0, but got " <> show len
  return $ constructor x y len

The first step of the do block parses either the string HLine or the string VLine, using the choice function to encode the two possibilities. Because each line in a do block encodes a step in the computation, writing monadic parsers is natural: each line consumes some of the input, until enough is consumed to return the desired value. The when function is another example of regular monadic code: here it makes parsing fail when an invalid length is consumed.

Running parsers

Since our parser takes Text as input, it can be tested in a pure context. Megaparsec provides the runParser function for this. To be able to print the errors of our parser, our error type must be an instance of ShowErrorComponent. We can then define a convenient runMyParser function that returns either an error or the parsed value:

import Data.Text (pack, unpack)

-- | Instance required for 'runMyParser'
instance ShowErrorComponent Error where
  showErrorComponent = unpack

-- | A variant of megaparsec's 'runParser', instantiated to our context.
-- Successfully parses an @a@ or returns an error message.
runMyParser :: Parser a -> Input -> Either Text a
runMyParser parser input =
  case runParser parser "" input of
    Left err -> Left $ pack $ errorBundlePretty err
    Right x  -> Right x

Parsing expressions, lists, etc.

Megaparsec not only provides building blocks for parsing tokens and combining parsers. It also provides parsers for common constructs found in programming languages and domain-specific languages, such as expressions and lists. Megaparsec does this by relying on the parser-combinators package.

I don’t want to go into the details of parsing expressions here (e.g. parsing 1 + 2 - 3…), but let me emphasize that it is a bad idea to write your own expression parser. Instead, think about what kind of operators you need and encode them, using the Operator type.

List parsing, on the other hand, is done with various sep… functions. In our case of roguelike maps, we allow different elements to be separated by a semicolon, or by one or more newlines. This is encoded as follows:

parseElements :: Parser [Element]
parseElements = parseElement `sepBy1` separator
  where
    separator = do
      hspace -- Optional horizontal (non-newline) space
      choice [void $ char ';', void $ some eol] -- Either a single ';' or many newlines
      hspace
    parseElement :: Parser Element
    parseElement = choice [parseLineElement, parseStart, parseCell]
      where
        parseStart = do
          void $ string "Start"
          space1
          (x, y) <- parseCoord
          return $ Start x y
        parseCell = do
          void $ string "Cell"
          space1
          (x, y) <- parseCoord
          space1
          symbol <- parseSymbol
          return $ Cell x y symbol
        parseCoord = do
          x <- decimal
          space1
          y <- decimal
          return (x, y)

Conclusion

We’ve presented how to parse simple constructs using megaparsec and how to run our parsers. This blog post is less than 1500 words long: mission accomplished presenting megaparsec in a shorter way than the official tutorial 🥳

If you want to use the code from this blog post as a starting point, feel free to clone https://github.com/tweag/minimal-megaparsec-tutorial. And once your project is moving away from a minimal viable product, head over to megaparsec’s official tutorial to learn about more advanced ways to use megaparsec!

¹ This is an instance of the monad transformer pattern.

Frontend live-coding via ghci

Thu, 17 Apr 2025 00:00:00 GMT

A few months ago, I announced that the GHC wasm backend added support for Template Haskell and ghci. Initially, the ghci feature only supported running code in nodejs and accessing the nodejs context, and I’ve been asked a few times when ghci was going to work in browsers in order to allow live-coding the frontend. Sure, why not? I promised it in the last blog post’s wishlist. After all, GHCJS used to support GHCJSi for browsers almost 10 years ago!

I was confident this could be done with moderate effort. Almost all the pieces are already in place: the external interpreter logic in GHC is there, and the wasm dynamic linker already works in nodejs. So just make it runnable in browsers as well, add a bit of logic for communicating with GHC and we’re done, right? Well, it still took a few months for me to land it… but finally here it is!

To keep this post within a reasonable length, I will only introduce the user-facing aspects of the wasm ghci browser mode and won’t cover the underlying implementation. The rest of the post is an example ghci session followed by a series of bite-sized subsections, each covering one important tip about using this feature.

How to use it

The ghc-wasm-meta repo provides user-facing installation methods for the GHC wasm backend. Here we’ll go with the simplest nix-based approach:

$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive -fghci-browser
GHCi, version 9.12.2.20250327: https://www.haskell.org/ghc/  :? for help
Open http://127.0.0.1:38827/main.html or import http://127.0.0.1:38827/main.js to boot ghci

The -fghci-browser flag enables the browser mode. There are a couple of other related flags which you can read about in the user manual, but for now, let’s open that page to proceed. You’ll see a blank page, but you can press F12 to open the devtools panel and check the network monitor tab to see that it’s sending a lot of requests and downloading a bunch of wasm modules. Within a few seconds, the initial loading process should be complete, and the ghci prompt should appear in the terminal and accept user commands.

Let’s start with the simplest:

ghci> putStrLn "hello firefox"
ghci>

The message is printed in the browser’s devtools console. That’s not impressive, so let’s try something that only works in a browser:

ghci> import GHC.Wasm.Prim
ghci> newtype JSButton = JSButton JSVal
ghci> foreign import javascript unsafe "document.createElement('button')" js_button_create :: IO JSButton
ghci> foreign import javascript unsafe "document.body.appendChild($1)" js_button_setup :: JSButton -> IO ()
ghci> btn <- js_button_create
ghci> js_button_setup btn

A fresh button just appeared on the page! It wouldn’t be useful if clicking it does nothing, so:

ghci> newtype Callback t = Callback JSVal
ghci> foreign import javascript "wrapper sync" syncCallback :: IO () -> IO (Callback (IO ()))
ghci> foreign import javascript unsafe "$1.addEventListener('click', $2)" js_button_on_click :: JSButton -> Callback (IO ()) -> IO ()

The above code implements logic to export a Haskell IO () function to a JavaScript synchronous callback that can be attached as a button’s click event listener. Synchronous callbacks always attempt to run Haskell computations to completion, which works fine as long as the exported Haskell function’s main thread does not block indefinitely, for example by waiting for an async JSFFI import to resolve or be rejected. You can read more about JSFFI in the user manual, but let’s carry on with this example:

ghci> import Data.IORef
ghci> ref <- newIORef 0
ghci> :{
ghci| cb <- syncCallback $ do
ghci|   print =<< readIORef ref
ghci|   modifyIORef' ref succ
ghci| :}
ghci> js_button_on_click btn cb

Now, the button is attached to a simple counter in Haskell that prints an incrementing integer to the console each time the button is clicked. And that should be sufficient for a minimal demo! Now, there are still a couple of important tips to be mentioned before we wrap up this post:

Hot reloading

Just like native ghci, you can perform hot reloading:

ghci> :r
Ok, no modules to be reloaded.
ghci> btn
:15:1: error: [GHC-88464]
    Variable not in scope: btn

Reloading nukes all bindings in the current scope. But it doesn’t magically undo all the side effects we’ve performed so far: if you click on the button now, you’ll notice the counter is still working and the exported Haskell function is still retained by the JavaScript side! And this behavior is also consistent with native ghci: hot-reloading does not actually wipe the Haskell heap, and there exist tricks like foreign-store to persist values across ghci reloads.

For the wasm ghci, things like foreign-store should work. Alternatively, you can allocate a stable pointer and print it, then reconstruct the stable pointer and dereference it after a future reload. Since wasm ghci runs in a JavaScript runtime after all, you can also cook up your own global variable by assigning to globalThis. Or you can locate the element and fetch its event handler: it should be the same Haskell callback exported earlier, which can be freed by freeJSVal.

So, when you do live-coding that involves some non-trivial back-and-forth calling between JavaScript and Haskell, don’t forget that hot reloads don’t kill old code, and that you need to implement your own logic to disable earlier callbacks to prevent inconsistent behavior.

Loading object code

The wasm ghci supports loading GHC bytecode and object code. All the code you type into the interactive session is compiled to bytecode. The code that you put in a .hs source file and load via command line or :l commands can be compiled as object code if you pass -fobject-code to ghci.

I fixed the ghci debugger for all 32-bit cross targets since the last blog post. Just like native ghci, debugger features like breakpoints now work for bytecode. If you don’t use the ghci debugger, it’s recommended that you use -fobject-code to load Haskell modules, since object code is faster and more robust at run-time.

Interrupting via ^C

My GHC patch that landed the ghci browser mode also fixed a previous bug in wasm ghci: ^C was not handled at all and would kill the ghci session. Now, the behavior should be consistent with native ghci. With or without -fghci-browser, if you’re running a long computation and you press ^C, an async exception should interrupt the computation and unblock the ghci prompt.

Read the `:doc`, Luke

Among the many changes I landed in GHC since the last blog post, one of them is adding proper haddock documentation to all user-facing things exported by GHC.Wasm.Prim. Apart from the GHC user manual, the haddock documentation is also worth reading for users. I haven’t set up a static site to serve the haddock pages yet, but they are already accessible in ghci via the :doc command. Just try import GHC.Wasm.Prim and check :doc JSVal or :doc freeJSVal, then you can read them in plain text.

As the Haskell wasm user community grows, so will the frustration with lack of proper documentation. I’m slowly improving that. What you see in :doc will continue to be polished, same for the user manual.

Importing an `npm` library in ghci

You can use JavaScript’s dynamic import() function as an async JSFFI import. If you want to import an npm library in a ghci session, the simplest approach is using a service like esm.run which serves pre-bundled npm libraries as ES modules over a CDN.

If you have a local npm project and want to use the code there, you need to do your own bundling and start your own development server that serves a page to make that code somehow accessible (e.g. via globalThis bindings). But how does that interact with the wasm ghci? Read on.

Using ghci to debug other websites

The browser mode works by starting a local HTTP server that serves some requests to be made from the browser side. For convenience, that HTTP server accepts CORS requests from any origin, which means it’s possible to inject the main.js startup script into browser tabs of other websites and use the wasm ghci session to debug those websites! Once you fire up a ghci session, just open the devtools console of another website and drop in an import("http://127.0.0.1:38827/main.js") call. If that website doesn’t actively block third-party scripts, you can have more fun than running it in the default blank page.

All JavaScript code for the GHC wasm backend consists of proper ES modules that don’t pollute the globalThis namespace. This principle has been enforced since day one, which allows multiple Haskell wasm modules or even wasm ghci sessions to co-exist in the same page! It works fine as long as you respect their boundaries and don’t attempt to do things like freeing a JSVal allocated elsewhere, but even if you only have one wasm module or ghci session, the “no global variable” principle should also minimize the interference with the original page.

In my opinion, being able to interact with other websites is the most exciting aspect of the browser mode. Sure, for Haskell developers that want to experiment with frontend development, using ghci should already be much easier than setting up a playground project and manually handling linker flags, wrapper scripts, etc. But there’s even greater potential: who said the website itself needs to be developed in Haskell? Haskell can be used to test websites written in foreign tech stacks, and testing backed by an advanced type system is undoubtedly one of our core strengths! You can use libraries like quickcheck-state-machine or quickcheck-dynamic to perform state machine property testing interactively, which has much greater potential of finding bugs than just a few hard coded interactions in JavaScript.

No host file system in wasm

The default nodejs mode of wasm ghci has full access to the host file system, so you can use Haskell APIs like readFile to operate on any host file path. This is no longer the case for browser mode: the only handles available are stdout/stderr, which output to the devtools console in a line-buffered manner, and there’s no file to read/write in wasm otherwise. The same restriction also applies to Template Haskell splices evaluated in a browser mode ghci session, so splices like $(embedFile ...) will fail.

This is a deliberate design choice. The dev environment backed by ghci browser mode should be as close as possible to the production environment used by statically linked wasm modules, and the production environment won’t have access to the host file system either. It would be possible to add extra plumbing to expose the host file system to ghci browser mode, but that is quite a bit of extra work and also makes the dev environment less realistic, so I’d like to keep the current design for a while.

If you need to read a local asset, you can serve the asset via another local HTTP server and fetch it in ghci. If you have modules that use splices like embedFile, those modules should be pre-compiled to object code and loaded later in ghci.

Don’t press F5

It’s very important that the browser page is never refreshed. The lifetime of the browser tab is supposed to be tied to the ghci session. Just exit ghci and close the tab when you’re done, but refreshing the page would completely break ghci! A lot of shared state between the browser side and host side is required to make it work, and refreshing would break the browser side of the state.

Likewise, currently the browser mode can’t recover from network glitches. It shouldn’t be a concern when you run GHC and the browser on the same machine, but in case you use SSH port forwarding or tailscale to establish the GHC/browser connection over an unstable network, once the WebSocket is broken then the game is over.

This is not ideal for sure, but supporting auto-recovery upon network issues or even page reloads is incredibly challenging, so let’s live with what is supported for now.

Doesn’t work on Safari yet

Currently the browser mode works fine for Firefox/Chrome, including desktop/mobile versions and all the forks with different logos and names. Sadly, Safari users are quite likely to see spurious crashes with a call_indirect to a null table entry error in the console. Rest assured, normal statically-linked Haskell wasm modules still work fine in Safari.

This is not my fault, but WebKit’s! I’ve filed a WebKit bug and if we’re lucky, this may be looked into on their side and get fixed eventually. If not, or if many people complain loudly, I can implement a workaround that seems to mitigate the WebKit bug to make the browser mode work in Safari too. That’ll be extra maintenance burden, so for now, if you’re on macOS, your best bet is installing Firefox/Chrome and using that for ghci.

Huge libraries don’t work yet

How large is “huge”? Well, you can check the source code of V8, SpiderMonkey and JavaScriptCore. In brief: there are limits agreed upon among major browser engines that restrict a wasm module’s import/export numbers, etc, and we do run into those limits occasionally when the Haskell library is huge. For instance, the monolithic ghc library exceeds the limit, and so does the profiling way of ghc-internal. So cost-center profiling doesn’t work for the ghci browser mode yet, though it does work for statically linked wasm modules and ghci nodejs mode.

Unfortunately, this issue is definitely not low-hanging fruit, even for me. I maintain a nodejs fork that patches the V8 limits so that the Template Haskell runner should still work for huge libraries, but I can’t do the same for browsers. A fundamental fix to sidestep the browser limits would be a huge amount of work. So I’ll be prioritizing other work first. If you need to load a huge library in the browser, you may need to split it into cabal sublibraries.

Wishlist, as usual

My past blog posts usually end with a “what comes next” section. This one is no exception. The browser mode is in its early days, so it’s natural to find bugs and other rough edges, and there will be continuous improvement in the coming months. Another thing worth looking into is profiling: modern browsers have powerful profilers, and it would be nice to integrate our own profiling and event log mechanism with browser devtools to improve the developer experience.

The next big thing I’ll be working on is threaded RTS support. Currently all Haskell wasm modules are single-threaded and run in the browser main thread, but there may exist workloads that can benefit from multiple CPU cores. Once this is delivered, Haskell will also become the first functional language with multi-core support in wasm!

You’re welcome to join the Haskell wasm Matrix room to chat about the GHC wasm backend and get my quick updates on this project.

Practical recursion schemes in Rust: traversing and extending trees

Thu, 10 Apr 2025 00:00:00 GMT

Rust has always felt like a strange beast, culturally speaking. The community is made up of a mix of people with very different perspectives, ranging from hardcore low-level kernel hackers to category theorists and functional programming gurus. This is also what makes this community so fertile: whether you’re coming from C, Haskell or TypeScript, you’re likely to learn a lot from other perspectives.

I’d like to add my modest contribution by introducing a pattern coming from the functional programming world, recursion schemes¹. Recursion schemes are a design pattern for representing and traversing recursive data structures (typically trees) which help factor the common part of recursive traversals, making transformations nicer to write, to read and to compose.

Even in the functional programming world, recursion schemes are not so well-known. Like monads, they are usually presented in Haskell with frightening words like zygohistomorphic prepromorphisms. It’s a pity because recursion schemes can be simple, useful and practical. I’d even argue that in Rust, the most interesting part is perhaps the representation technique, more than the traversal, despite the latter being the original and the usual motivation for using recursion schemes.

In this post, we’ll work through a concrete example to introduce recursion schemes and what they can do. We’ll point to a more real life example of how we use them in the implementation of the Nickel configuration language, and we’ll discuss the pros and cons of using recursion schemes in the particular context of Rust.

(In)flexible representations

Let’s say you’re writing a JSON parser library. You’ll need to expose a type representing JSON values. For the sake of argument, let’s assume that you support an extension of the JSON language with pairs, so you can write {"foo": ("hello","world")}. Here’s a natural representation:

use std::collections::HashMap;

pub enum JsonValue {
  String(String),
  Number(f64),
  Pair(Box<JsonValue>, Box<JsonValue>),
  Array(Vec<JsonValue>),
  Object(HashMap<String, JsonValue>),
}

This data structure is recursive: JSON values can contain other JSON values. We thus have to use Box (or any other indirection) around recursive occurrences of JsonValue. Otherwise, this enum would have an infinite size (Array and Object are exceptions, since Vec and HashMap add their own indirection, but that is somewhat of a lucky accident).
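To make the size argument concrete, here is a minimal sketch on a cut-down enum. The names Tree and tree_size are hypothetical and only serve the illustration. It shows that a boxed child is stored as a single pointer, which is why the enum as a whole has a finite size known at compile time.

```rust
use std::mem::size_of;

/// Hypothetical miniature of the JSON enum, reduced to two variants.
#[allow(dead_code)]
pub enum Tree {
    Leaf(f64),
    // Writing `Pair(Tree, Tree)` instead is rejected by the compiler with
    // error E0072 ("recursive type has infinite size").
    Pair(Box<Tree>, Box<Tree>),
}

/// Size in bytes of one `Tree` value, not counting the boxed children,
/// which live in separate heap allocations.
pub fn tree_size() -> usize {
    size_of::<Tree>()
}
```

The exact value returned by tree_size depends on the target platform and on layout optimizations, so the sketch deliberately doesn't state one.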

Now, a user, requestor, asks that your parser add location information to the output, because they validate some user-provided configuration and would like to point to specific items on error. This is a reasonable request which is sadly very hard to satisfy in the serde ecosystem. Anyway, our parser isn’t interfacing with serde, so we can add span information:

pub type Span = std::ops::Range<usize>;

pub struct Spanned<T> {
  pos: Span,
  data: T,
}

pub type SpannedValue = Spanned<JsonValue>;

pub enum JsonValue {
  String(String),
  Number(f64),
  Pair(Box<SpannedValue>, Box<SpannedValue>),
  Array(Vec<SpannedValue>),
  Object(HashMap<String, SpannedValue>),
}

There are different ways to go about this. We could have added a second argument to each constructor of the enum, such as in String(String, Span), to avoid the additional Spanned layer, but that would be a lot of repetition. We could also have moved the Box into Spanned, as in data: Box<T>. Still, the general idea is that we now have two layers:

a struct layer gathering the JSON data and the span together;
the original enum layer, the core of JSON, which is almost unchanged.

So far, so good. But another user, conservator, is now complaining that you’ve spoiled their performance. They’re using JSON as a machine exchange format and don’t care about position information. Could you restore the old representation and a way to produce it, ignoring spans?

Unfortunately, we had to change JsonValue. Copy-pasting the original JsonValue enum under a different name is possible, but it’s unsatisfying, as we now have multiple copies to maintain. It also doesn’t scale. Besides adding position information, you might want to have a value representation that uses Rc instead of Box, because you’re going to need to keep references to arbitrary nodes during some complex transformation.

The functorial representation

The recursion schemes pattern has two components: a representation technique and a transformation technique. I believe the representation part is particularly interesting for Rust, so let’s start with that.

We’ll try to make our JSON representation more generic to accommodate the different variations that we mentioned in the previous section. The fundamental idea is to replace the recursive occurrences of JsonValue within itself, Box<JsonValue> (or plain JsonValue for Array and Object), by a generic parameter T. Doing so, we’re defining just one layer of a JSON tree where recursive children can be anything, not necessarily JSON values (we use the F suffix for that generic version because it’s technically a functor, but that doesn’t really matter).

pub enum JsonValueF<T> {
  String(String),
  Number(f64),
  Pair(T, T),
  Array(Vec<T>),
  Object(HashMap<String, T>),
}

Let’s play with a few examples to get familiar with this representation.

If we set T = (), we get a type that is isomorphic (modulo some ()) to:
```
JsonValueF<()> ~ enum {
  String(String),
  Number(f64),
  Pair,
  Array,
  Object,
}
```
This is precisely a single node of a JSON tree, that is either a leaf or a marker of a node with children but without actually including them.
If we set T = Box<JsonValueF<T>>, we get back the original JsonValue. But wait, you can’t define the generic parameter T to be something which depends on T itself! In fact we can, but we need to introduce an extra indirection:
```
pub struct JsonValue {data: JsonValueF<Box<JsonValue>>}
```
The price to pay is an additional struct layer, so you need to match on value.data, and wrap new values as JsonValue { data: JsonValueF::Number(0.0) }. Note that this layer doesn’t have any cost at run-time.

Another difference is that we now box the values in Array and Object, which isn’t needed. For now I’ll just ignore that, but you could take a second generic parameter U to represent the occurrences of T that don’t need an indirection if this really matters to you.
If we extend our intermediate layer a bit, we can get SpannedValue!
```
pub struct SpannedJsonValue {
  data: JsonValueF<Box<SpannedJsonValue>>,
  span: Span,
}
```
You can create any extension of JsonValue with additional metadata lying at each node of the tree, which is pretty neat.
We are also able to change the ownership model of JSON values. It’s simple to write a reference-counted variant:
```
pub struct SharedJsonValue {data: JsonValueF<Rc<SharedJsonValue>>}
```
Or a borrowed version, that you could allocate in an arena:
```
pub struct ArenaJsonValue<'a> {data: JsonValueF<&'a ArenaJsonValue<'a>>}
```

This idea of putting a self-referential type within JsonValueF is referred to as tying the knot. The power of this approach is that you can keep the core JsonValueF type unchanged. This applies to any tree-like recursive structure.
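As a rough sketch of what tying the knot looks like in practice, the block below instantiates the same JsonValueF layer with two different pointer types and builds a small value with each. The helper functions boxed_pair and shared_pair are hypothetical names introduced for this illustration, and the data fields are made public only so they can be inspected from outside.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[allow(dead_code)]
pub enum JsonValueF<T> {
    String(String),
    Number(f64),
    Pair(T, T),
    Array(Vec<T>),
    Object(HashMap<String, T>),
}

/// Owned tree: each child sits in its own `Box`.
pub struct JsonValue {
    pub data: JsonValueF<Box<JsonValue>>,
}

/// Reference-counted tree: children can be shared between parents.
pub struct SharedJsonValue {
    pub data: JsonValueF<Rc<SharedJsonValue>>,
}

/// Hypothetical helper: builds the pair ("hello", "world") as an owned tree.
pub fn boxed_pair() -> JsonValue {
    let leaf = |s: &str| {
        Box::new(JsonValue {
            data: JsonValueF::String(s.to_owned()),
        })
    };
    JsonValue {
        data: JsonValueF::Pair(leaf("hello"), leaf("world")),
    }
}

/// Hypothetical helper: builds a pair whose two children are the same node.
pub fn shared_pair() -> SharedJsonValue {
    let leaf = Rc::new(SharedJsonValue {
        data: JsonValueF::Number(1.0),
    });
    SharedJsonValue {
        data: JsonValueF::Pair(Rc::clone(&leaf), leaf),
    }
}
```

Both wrappers reuse the enum as-is. Only the pointer type changes, and with it the ownership model: the shared variant can hold the same node twice without copying it.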

Some methods can be implemented only once on JsonValueF for any T, say is_string or is_number. With additional trait constraints on T, we can write more involved functions, still operating on the generic functor representation.
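Here is one possible shape for such methods, as a sketch rather than a reference implementation. is_string and is_number are the methods named above. child_count and is_symmetric_pair are hypothetical additions, the latter showing a method that needs a trait bound on T.

```rust
use std::collections::HashMap;

#[allow(dead_code)]
pub enum JsonValueF<T> {
    String(String),
    Number(f64),
    Pair(T, T),
    Array(Vec<T>),
    Object(HashMap<String, T>),
}

// Available for every choice of `T`: these only look at the current layer.
impl<T> JsonValueF<T> {
    pub fn is_string(&self) -> bool {
        matches!(self, JsonValueF::String(_))
    }

    pub fn is_number(&self) -> bool {
        matches!(self, JsonValueF::Number(_))
    }

    /// Hypothetical helper: number of direct children of this layer.
    pub fn child_count(&self) -> usize {
        match self {
            JsonValueF::String(_) | JsonValueF::Number(_) => 0,
            JsonValueF::Pair(_, _) => 2,
            JsonValueF::Array(array) => array.len(),
            JsonValueF::Object(object) => object.len(),
        }
    }
}

// Only available when the children can be compared for equality.
impl<T: PartialEq> JsonValueF<T> {
    /// Hypothetical helper: true for a `Pair` whose two children are equal.
    pub fn is_symmetric_pair(&self) -> bool {
        matches!(self, JsonValueF::Pair(fst, snd) if fst == snd)
    }
}
```

Every tied-knot wrapper (boxed, spanned, reference-counted) gets these methods through its data field, without any per-wrapper code.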

Let’s now see how to traverse our JSON values.

Traversals

The strong point of recursion schemes is to provide an interface for traversing recursive structures that let you focus on what the function actually does, which is otherwise mixed with how the recursion is done. The idea is to use generic combinators which factor out the plumbing of recursive traversals.

Let’s count the number of String nodes in a JSON value, the naive way.

fn count_strings(value: &JsonValue) -> u32 {
    match &value.data {
        JsonValueF::String(_) => 1,
        JsonValueF::Number(_) => 0,
        JsonValueF::Pair(fst, snd) => count_strings(fst) + count_strings(snd),
        JsonValueF::Array(array) => array.iter().map(|elt| count_strings(elt)).sum(),
        JsonValueF::Object(object) => object.values().map(|elt| count_strings(elt)).sum(),
    }
}

We’ll see how to write this function in the style of recursion schemes. First, we need to define one core combinator: map.

map takes a JsonValueF<T> and a function f from T to U, and returns a JsonValueF<U>. That is, map takes a JSON layer where all the direct children (the recursive occurrences in our full type) are of some type T and applies f to transform them to something of type U. This is the secret sauce for defining traversals.

impl<T> JsonValueF<T> {
    fn map<U>(self, mut f: impl FnMut(T) -> U) -> JsonValueF<U> {
        match self {
            JsonValueF::String(s) => JsonValueF::String(s),
            JsonValueF::Number(n) => JsonValueF::Number(n),
            JsonValueF::Pair(fst, snd) => JsonValueF::Pair(f(fst), f(snd)),
            JsonValueF::Array(array) => {
                JsonValueF::Array(array.into_iter().map(|elt| f(elt)).collect())
            }
            JsonValueF::Object(object) => {
                JsonValueF::Object(object.into_iter().map(|(k, v)| (k, f(v))).collect())
            }
        }
    }
}

map isn’t specific to JsonValueF. It can be defined mechanically for any functor representation (e.g. through a macro) of a data structure.

Note that there’s no recursion in sight: there can’t be, because T and U are entirely generic and could very well be (), but we saw that JsonValueF<()> is a single node. map only operates at the current layer.

The trick is that f can use map itself. Let’s see how to use it for count_strings:

fn count_strings(value: JsonValue) -> u32 {
    match value.data.map(|child| count_strings(*child)) {
        JsonValueF::String(_) => 1,
        JsonValueF::Number(_) => 0,
        JsonValueF::Pair(fst, snd) => fst + snd,
        JsonValueF::Array(array) => array.iter().sum(),
        JsonValueF::Object(object) => object.values().sum(),
    }
}

If you look closely, there’s no more recursion in the body of the pattern matching. It’s factored out in the map call. Let’s break down this example:

map, given a function from T to U, promises you that it can transform the direct children of type T in JsonValueF<T> to U, providing a JsonValueF<U>. We use it immediately with a recursive call to count_strings, which can indeed transform the direct children from a Box<JsonValue> to a u32. If the children have children themselves, count_strings will do that recursively as its first action, down to the leaves.

Once we’ve reduced potential children of deeper layers to u32s, we get a JsonValueF<u32>. We sum its content at the current layer.

There is a catch though: our count_strings function takes an owned argument, which consumes the original JSON value. I’ll come back to that later.

While I find the second version of count_strings a little cleaner, the difference between the two isn’t really astonishing.

As a more compelling example, let’s define a generic bottom-up traversal function on JsonValue. This traversal is able to map — that is to rewrite — nodes (more exactly entire subtrees). map_bottom_up takes a generic transformation f and applies this function to every subtree starting from the leaves. You could use such a function to apply program transformations or optimizations on an abstract syntax tree.

impl JsonValue {
    pub fn map_bottom_up(self: JsonValue, f: &mut impl FnMut(JsonValue) -> JsonValue) -> JsonValue {
        let data = self.data.map(|v| Box::new(v.map_bottom_up(f)));
        f(JsonValue { data })
    }
}

This example is quite remarkable: it’s almost a one-liner and there is no pattern matching at all! Once again, the structural recursion is entirely factored out in the map function. We implemented map_bottom_up on JsonValue directly, but with some trait constraints on T, we can write a more generic version on JsonValueF<T> that works on both the Boxed and Rced versions (the arena one is more tricky as it requires an explicit allocator). This example is only scratching the surface.

Mapping is just one example: other common traversals are folds (known as catamorphisms in the recursion schemes jargon), which generalize the well-known Iterator::fold from sequences to trees. In fact, count_strings would make more sense as a fold, but we’ll leave that for another time.
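For the curious, here is a minimal sketch of what such a fold could look like. The type definitions are restated in an assumed form so that the block compiles on its own, and fold_tree is a hypothetical name (chosen to avoid any confusion with Iterator::fold). This is one possible formulation, not a definitive one.

```rust
use std::collections::HashMap;

// Assumed definitions, restated so that the sketch is self-contained.
pub enum JsonValueF<T> {
    String(String),
    Number(i64),
    Pair(T, T),
    Array(Vec<T>),
    Object(HashMap<String, T>),
}

pub struct JsonValue {
    pub data: JsonValueF<Box<JsonValue>>,
}

impl<T> JsonValueF<T> {
    // The one-layer `map` combinator.
    pub fn map<U>(self, mut f: impl FnMut(T) -> U) -> JsonValueF<U> {
        match self {
            JsonValueF::String(s) => JsonValueF::String(s),
            JsonValueF::Number(n) => JsonValueF::Number(n),
            JsonValueF::Pair(fst, snd) => JsonValueF::Pair(f(fst), f(snd)),
            JsonValueF::Array(array) => {
                JsonValueF::Array(array.into_iter().map(|elt| f(elt)).collect())
            }
            JsonValueF::Object(object) => {
                JsonValueF::Object(object.into_iter().map(|(k, v)| (k, f(v))).collect())
            }
        }
    }
}

impl JsonValue {
    /// Fold (catamorphism): reduce every child to an `A` first, then let
    /// `alg` combine the resulting layer of `A`s into a single `A`.
    pub fn fold_tree<A>(self, alg: &mut impl FnMut(JsonValueF<A>) -> A) -> A {
        let layer = self.data.map(|child| child.fold_tree(&mut *alg));
        alg(layer)
    }
}

/// `count_strings` as a fold: the closure only describes what happens at one
/// layer, and contains no recursive call.
pub fn count_strings(value: JsonValue) -> u32 {
    value.fold_tree(&mut |layer: JsonValueF<u32>| -> u32 {
        match layer {
            JsonValueF::String(_) => 1,
            JsonValueF::Number(_) => 0,
            JsonValueF::Pair(fst, snd) => fst + snd,
            JsonValueF::Array(array) => array.iter().sum(),
            JsonValueF::Object(object) => object.values().sum(),
        }
    })
}
```

The closure passed to fold_tree is often called an algebra: it says how to collapse a single layer whose children have already been reduced, and the recursion lives entirely in fold_tree.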

Are recursion schemes useful in Rust?

Haskell has a number of features that make recursion schemes particularly nice to use and to compose, not the least of which is garbage collection. You don’t have to think about ownership; it’s references all the way down. Recursive data structures are easy to express.

On the other side, there is Rust, which culturally doesn’t like recursive functions that much, for good and bad reasons². Still, recursion is sometimes hard to avoid, especially on tree-like data structures.

An important issue is that our count_strings consumes its argument, which is unacceptable in practice. It is possible to write a version of map that takes a value by reference, and thus similarly for count_strings, but it’s not entirely straightforward nor free. You can find a by-reference version and more explanations in our associated repository. At any rate, you can always write specific traversals manually without resorting to the recursion schemes way if needed. It’s not an all or nothing approach.
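To give an idea of the trade-off, here is a minimal sketch of a by-reference map. It is one possible shape, not necessarily the one used in the associated repository. The type definitions are restated in an assumed form, and map_ref is a hypothetical name.

```rust
use std::collections::HashMap;

// Assumed definitions, restated so that the sketch is self-contained.
pub enum JsonValueF<T> {
    String(String),
    Number(i64),
    Pair(T, T),
    Array(Vec<T>),
    Object(HashMap<String, T>),
}

pub struct JsonValue {
    pub data: JsonValueF<Box<JsonValue>>,
}

impl<T> JsonValueF<T> {
    /// Like `map`, but borrows the layer instead of consuming it.
    /// The price: string payloads and object keys are cloned in order to
    /// build the new layer, so this version is not free.
    pub fn map_ref<U>(&self, mut f: impl FnMut(&T) -> U) -> JsonValueF<U> {
        match self {
            JsonValueF::String(s) => JsonValueF::String(s.clone()),
            JsonValueF::Number(n) => JsonValueF::Number(*n),
            JsonValueF::Pair(fst, snd) => JsonValueF::Pair(f(fst), f(snd)),
            JsonValueF::Array(array) => {
                JsonValueF::Array(array.iter().map(|elt| f(elt)).collect())
            }
            JsonValueF::Object(object) => {
                JsonValueF::Object(object.iter().map(|(k, v)| (k.clone(), f(v))).collect())
            }
        }
    }
}

/// Same traversal as before, but the original value is left untouched.
pub fn count_strings(value: &JsonValue) -> u32 {
    match value.data.map_ref(|child| count_strings(child)) {
        JsonValueF::String(_) => 1,
        JsonValueF::Number(_) => 0,
        JsonValueF::Pair(fst, snd) => fst + snd,
        JsonValueF::Array(array) => array.iter().sum(),
        JsonValueF::Object(object) => object.values().sum(),
    }
}
```

The clones could be avoided with a dedicated borrowed functor type whose leaves hold references, at the cost of a second type definition. That is part of why the by-reference story is less straightforward than the owned one.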

In fact, even if you don’t use map at all, the functor representation alone is quite useful.

How we use recursion schemes in Nickel

In the implementation of the Nickel configuration language, we use the functor representation for the abstract syntax tree of a static type. Here are the stages we went through:

In the parser and most of the Nickel pipeline, we used to have a simple Box-based, owned representation, akin to JsonValue.
However, during type inference, the Nickel typechecker needs to handle new type constructions, in particular unification variables. Those are as-of-yet unknown types, similar to unknowns in an algebraic equation. Extending the base representation is readily done as for SpannedJsonValue:
```
pub enum UnifType {
  Concrete(Box<TypeF<UnifType>>),
  /// A unification variable.
  UnifVar(VarId),
  //.. rigid type variables, etc.
}
```
More recently, we’ve split the historical, all-powerful unique representation of expressions (including Nickel types) into two intermediate ones. The new initial representation is arena-allocated, which makes it natural to use bare references as the recursive indirection instead of allocating in the heap through e.g. Box. This is easy with recursion schemes: that is precisely the ArenaJsonValue example. For a smooth transition, we need to temporarily keep the old Box-ed Type representation in parts of the codebase, but having different representations co-exist is a basic feature of recursion schemes.

We use map-based traversals typically to substitute type variables (that is, a Nickel generic type, as our T in Rust) for a concrete type and similar rewriting operations. We have variants of the core map function that can also thread mutable state, raise errors, or both. Traversals by reference are implemented manually, with a plain recursive function.
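To make the idea of an error-raising variant concrete, here is a minimal sketch on the JSON example. This is not Nickel’s actual API: try_map is a hypothetical name, and the enum is restated in an assumed form so that the block compiles on its own.

```rust
use std::collections::HashMap;

// Assumed shape of the functor, restated so the sketch is self-contained.
pub enum JsonValueF<T> {
    String(String),
    Number(i64),
    Pair(T, T),
    Array(Vec<T>),
    Object(HashMap<String, T>),
}

impl<T> JsonValueF<T> {
    /// Fallible one-layer map: stops at the first child for which `f` fails
    /// and returns that error.
    pub fn try_map<U, E>(
        self,
        mut f: impl FnMut(T) -> Result<U, E>,
    ) -> Result<JsonValueF<U>, E> {
        Ok(match self {
            JsonValueF::String(s) => JsonValueF::String(s),
            JsonValueF::Number(n) => JsonValueF::Number(n),
            JsonValueF::Pair(fst, snd) => JsonValueF::Pair(f(fst)?, f(snd)?),
            JsonValueF::Array(array) => JsonValueF::Array(
                array
                    .into_iter()
                    .map(|elt| f(elt))
                    .collect::<Result<Vec<U>, E>>()?,
            ),
            JsonValueF::Object(object) => JsonValueF::Object(
                object
                    .into_iter()
                    .map(|(k, v)| f(v).map(|u| (k, u)))
                    .collect::<Result<HashMap<String, U>, E>>()?,
            ),
        })
    }
}
```

Since f is an FnMut, it can already capture and update mutable state, so this single variant covers both the state-threading and the error-raising use cases in this sketch.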

On the downside, type and core function definitions can be a bit verbose and tricky to get right. For example, Nickel’s TypeF has sub-components that themselves contain types, leading to 4 generic parameters. There are multiple possibilities for Box placement in particular; only some of them are correct, and they are subtly different. Though once you’ve defined a new variant, this complexity is mostly hidden from the consumers of your API. It can still manifest as terrible Rust type errors sometimes if, God forbid, you’ve put a Box at the wrong place.

Conclusion

We’ve introduced recursion schemes, a design pattern for representing and traversing recursive data structures. While the traversal part isn’t as good a fit as in purer functional languages like Haskell, it can still be useful in Rust. The representation part is particularly relevant, making it easy to define variations on a recursive data structure with different ownership models or metadata. We’ve shown how we use recursion schemes in Nickel, and while there are performance and complexity trade-offs to consider, they can bring value for moderately complex tree types that need to be extended and transformed in various ways.

¹ The classical paper on this subject is Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire.
² Rust allocates on the stack by default, which makes it easier to overflow (though the stack can be configured to be larger at compile time). However, I have the impression that there’s a misleading idea that recursive functions perform poorly. For tree transformations at least, the iterative version is usually harder to write and can require explicitly representing the context on the heap through an auxiliary data structure such as a zipper, which is likely to perform worse. The stack can overflow, and (recursive) function calls aren’t entirely free either, but in terms of allocation, deallocation and locality, the stack is also hard to beat!

A hundred pull requests for Liquid Haskell

Thu, 20 Mar 2025 00:00:00 GMT

A new release of Liquid Haskell is out after quite an active period of development with 99 pull requests in the liquidhaskell repository, and 29 pull requests in the liquid-fixpoint repository from about ten contributors. This post is to provide an overview of the changes that made it into the latest release.

There were contributions to the reflection and proof mechanisms; we got contributions to the integration with GHC; the support of cvc5 was improved when dealing with sets, bags, and maps; and there was a rather large overhaul of the name resolution mechanism.

Reflection improvements

Liquid Haskell is a tool to verify Haskell programs. We can write formal specifications inside special Haskell comments {-@ ... @-}, and the tool will check whether the program behaves as specified. For instance, the following specification of the filter function says that we expect all of the elements in the result to satisfy the given predicate.

{-@ filter :: p:(a -> Bool) -> xs:[a] -> {v:[a] | all p v } @-}

Liquid Haskell would then analyze the implementation of filter to verify that it does indeed yield elements that satisfy the predicate.

To verify such a specification, Liquid Haskell needs to attach a meaning to the names in the predicate all p v. It readily learns that p is a parameter of filter, and that v is the result. all, however, isn’t bound by the specification’s parameters, so it refers to whatever is in scope, which is the Haskell function from the Prelude.

all :: (a -> Bool) -> [a] -> Bool

And Liquid Haskell has a mechanism to provide logical meaning to the implementation of a function like all, known as reflection. While it has always been convenient to reflect functions in modules analyzed by Liquid Haskell, it was not so easy when there was a mix of local and imported definitions from dependencies that are not analyzed with Liquid Haskell. Last year, there was an internship at Tweag to address exactly this friction, which resulted in contributions to the latest release.

Reasoning and reflection of lambdas

The reflection mechanism also has other specific limitations at the moment. For instance, it doesn’t allow reflecting recursive functions defined in let or where bindings. And until recently, it didn’t allow reflecting functions that contained anonymous functions. For example,

takePositives = filter (\x -> x > 0)

In the latest release, we have several contributions that introduce support for reflecting lambdas and improve the story for reasoning with them. This feature is considered experimental at the moment, since we still have usability and performance concerns that deserve further contributions, but one can already explore the experience that we could expect in the long run.

Integration with GHC

In 2020 Liquid Haskell became a compiler plugin for GHC. It was hooked into the end of the type checking phase firstly to ensure it only runs on well-typed programs, and secondly, to ensure the plugin runs when GHC is only asked to typecheck the module but not to generate code, which was helpful to IDEs.

For a few technical reasons, the plugin was re-parsing and re-typechecking the module instead of using the abstract syntax tree (AST) that GHC handed to it as the result of type checking. That is no longer the case in the latest release, where the AST after type checking is now used for all purposes. In addition, there were several improvements to how the ghc library is used.

cvc5 support

Liquid Haskell offloads part of its reasoning to a family of automated theorem provers known as SMT solvers. For most developments, Liquid Haskell has been used with the Z3 SMT solver, and this is what has been used most of the time in continuous integration pipelines.

In theory, any SMT solver can be used with Liquid Haskell, if it provides a standard interface known as SMT-LIB. In practice, however, experiments are done with theories that are not part of the standard. For instance, the reasoning capabilities for bags, sets, and maps used to require Z3. But now the latest release implements support for cvc5 as well.

Name resolution overhaul

Name resolution determines, for each name in a program, what is the definition that it refers to. Liquid Haskell, in particular, is responsible for resolving names that appear in specifications. This task was problematic when the programs it was asked to verify spanned many modules.

There were multiple kinds of names, each with their own name resolution rules, and names were resolved in different environments when verifying a module and when importing it elsewhere, not always yielding the same results, which often produced confusing errors.

Name resolution, however, was done all over the code base, and any attempt to rationalize it would require a few months of effort. I started such an epic last September, and managed to conclude it in February. These changes made it into the latest release together with an awful lot of side quests to simplify the existing code.

The road ahead

There is no coordinated roadmap for Liquid Haskell. Many of the contributions that it receives depend on the opportunities enabled by academic research or the needs of particular use cases.

On my side, I’m trying to improve the adoption of Liquid Haskell. Much of the challenge is reducing the number of common workarounds that the proficient Liquid Haskeller needs to employ today. For instance, supporting reflection of functions in local bindings would save the user the trouble of rewriting her programs to put the recursive functions in the top level. Repairing the support for type classes would allow functions to be verified even if they use type classes, which is a large subset of Haskell today. And without having defined a scope with precision yet, Liquid Haskell still needs to improve its user documentation, its error messages, and its tracing and logging.

The project is chugging along, though. It is making significant leaps in usability. The upgrade costs have been quantified for a few GHC releases, and no longer look like an unbounded risk. The number of external contributions increased last year, although we still have to see if it is a trend. And there is no shortage of interest from academia and industrial interns.

Thanks to the many contributors for their work and their help during code reviews. I look forward to learning what makes it into the coming Liquid Haskell releases!

Bazel and Testwell CTC++, revisited

Thu, 06 Mar 2025 00:00:00 GMT

A while ago, we wrote a post on how we helped a client initially integrate the Testwell CTC++ code coverage tool from Verifysoft into their Bazel build.

Since then, some circumstances have changed, and we were recently challenged to see if we could improve the CTC++/Bazel integration to the point where CTC++ coverage builds could enjoy the same benefits of Bazel caching and incremental rebuilds as regular (non-coverage) builds. Our objective was to make it feasible for developers to do coverage builds with CTC++ locally, rather than them using different coverage tools or delaying coverage testing altogether. Thus we could enable the client to focus their efforts on improving overall test coverage with CTC++ as their only coverage tool.

In this sequel to the initial integration, we, as a team, have come up with a more involved scheme for making CTC++ meet Bazel’s expectations of hermetic and reproducible build actions. There is considerable extra complexity needed to make this work, but the result is a typical speedup of 5-10 times on most coverage builds. This is the kind of speedup that not only makes your CI faster, but also allows developers to work in a different and more efficient way altogether.

More generally, we hope this blog post can serve as a good example (or maybe a cautionary tale 😉) of how to take a tool that does not play well with Bazel’s idea of a well-behaved build step, and force it into a shape where we can still leverage Bazel’s strengths.

The status quo

You can read our previous blog post for more details, but here we’ll quickly summarize the relevant bits of the situation after our initial integration of CTC++ coverage builds with Bazel:

CTC++ works by wrapping the compiler invocation with its ctc tool, and adding coverage instrumentation between the preprocessing and compiling steps.
In addition to instrumenting the source code itself, ctc also writes instrumentation data in a custom text format (aka. symbol data) to a separate output file, typically called MON.sym (aka. the symbol file).
At runtime the instrumented unit tests will collect coverage statistics and write these (in binary form) to another separate output file: MON.dat.
As far as Bazel is concerned, both the MON.sym and MON.dat files are untracked side-effects of the respective compilation and testing steps. As such we had to poke a hole in the Bazel sandbox and arrange for these files to be written to a persistent location without otherwise being tracked or managed by Bazel.
More importantly, these side-effects mean that we have to disable all caching and re-run the entire build and all tests from scratch every single time. Otherwise, we would end up with incomplete MON.sym and MON.dat files.

Another consideration - not emphasized in our previous post since we had to disable caching of intermediate outputs in any case - is that the outputs from ctc are not hermetic and reproducible. Both the instrumentation that is added to the source code and the symbol file that is written separately by ctc contain the following information, which is collected at compile time:

Absolute paths to source code files: Even though Bazel passes relative paths on the command-line, ctc will still resolve these into absolute paths and record these paths into its outputs. Since all these build steps run inside the Bazel sandbox, the recorded paths vary arbitrarily from build to build. Even worse: the paths are made invalid as soon as the sandbox is removed, when the compilation step is done.
Timestamps: ctc will also record timestamps into the instrumented source code and the symbol file. As far as we know, these might have been part of some internal consistency check in previous versions of CTC++, but currently they are simply copied into the final report, and displayed as a property of the associated symbol data on which the HTML report is based. Since our coverage reports are already tied to known Git commits in the code base, these timestamps have no additional value for us.
Fingerprints: ctc calculates a 32-bit fingerprint based on the symbol data, and records this fingerprint into both the instrumented source and the symbol file. Since the symbol data already contains absolute path names as detailed above, the resulting fingerprint will also vary accordingly, and thus not be reproducible from one build to the next, even when all other inputs remain unchanged.

Outlining the problems to be solved

If we are to make CTC++ coverage builds quicker by leveraging the Bazel cache, we must answer these two questions:

Can we make ctc’s outputs reproducible? Without this, re-enabling the Bazel cache for these builds is a non-starter, as each re-evaluation of an intermediate build step will have never-before-seen action inputs, and none of the cached outputs from previous builds will ever get reused.
Can we somehow capture the extra MON.sym output written by ctc at build time, and appropriately include it into Bazel’s build graph?¹ We need for Bazel to cache and reuse the symbol data associated with a compilation unit in exactly the same way that it would cache and reuse the object file associated with the same compilation unit.

Solving both of these would allow us to achieve a correct coverage report assembled from cached object files and symbol data from previously-built and unchanged source code, together with newly-built object files and symbol data from recently-changed source code (in addition to the coverage statistics collected from re-running all tests).

Achieving reproducibility

Let’s tackle the problem of making ctc’s outputs reproducible first. We start by observing that ctc allows us to configure hook scripts that will be invoked at various points while ctc is running. We are specifically interested in:

RUN_AFTER_CPP, which allows access to the preprocessed source before the instrumentation step, and
RUN_AFTER_INSTR, which allows access to the instrumented source before it’s passed on to the underlying compiler.

From our existing work, we of course also have our own wrapper script around ctc, which allows us to access the outputs of each ctc invocation before they are handed back to Bazel. We also know, from our previous work, that we can instruct ctc to write a separate symbol file per compilation unit, rather than have all compilation units append to the same MON.sym file.

Together, this allows us to rewrite the outputs from ctc in such a way as to make them reproducible. What we want to rewrite has already been outlined above:

Absolute paths into the sandbox: We could rewrite these into corresponding absolute paths to the original source tree instead, but we can just as well take it one step further and simply strip the sandbox root directory prefix from all absolute paths. This turns them into relative paths that happen to resolve correctly, whether they’re taken relative to the sandbox directory at compile time, or relative to the root of the source tree afterwards.
Timestamps: This one is relatively easy, we just need to decide on a static timestamp that does not change across builds. For some reason the CTC++ report tooling did not like us passing the ultimate default timestamp, aka. the Unix Epoch, so we instead settled for midnight on January 1 2024.²
Fingerprints: Here we need to calculate a 32-bit value that will reflect the complete source code in this compilation unit (but importantly with transient sandbox paths excluded). We don’t have direct access to the in-progress symbol data that ctc uses to calculate its own fingerprint, so instead we settle on calculating a CRC32 checksum across the entire preprocessed source code (before ctc adds its own instrumentation).³

Once we’ve figured out what to rewrite, we can move on to the how:

1. Using the RUN_AFTER_CPP option to ctc, we can pass in a small script that calculates our new fingerprint by running the preprocessed source code through CRC32.
2. Using the RUN_AFTER_INSTR option to ctc, we can pass in a script that processes the instrumented source, line by line:
   - rewriting any absolute paths that point into the Bazel sandbox,
   - rewriting the timestamp recorded by ctc into our static timestamp, and
   - rewriting the fingerprint to the one calculated in step 1.
3. In our script that wraps the ctc invocation, we can insert the above two options on the ctc command line. We can also instruct ctc to write a separate .sym file for this compilation unit inside the sandbox.
4. In the same wrapper script, after ctc is done producing the object file and symbol file for a compilation unit, we can now rewrite the symbol file that ctc produced. The rewrites are essentially the same as performed in step 2, although the syntax of the symbol file is different from that of the instrumented source.
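The two rewrites that do not depend on ctc’s file formats can be sketched in a few lines. The following is only an illustration under stated assumptions: the post does not say which CRC32 variant or which scripting language the hook scripts use, so this shows the common CRC-32 (IEEE) checksum, and crc32, strip_sandbox_prefix and the sandbox path in the usage note are hypothetical.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
/// Assumption: the hook script uses this common variant.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // `mask` is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Turn absolute paths into the sandbox into relative paths by removing the
/// sandbox root prefix (hypothetical helper; works on one line of text).
pub fn strip_sandbox_prefix(line: &str, sandbox_root: &str) -> String {
    let prefix = format!("{}/", sandbox_root.trim_end_matches('/'));
    line.replace(&prefix, "")
}
```

For example, with a made-up sandbox root /tmp/sandbox/42, a line mentioning /tmp/sandbox/42/src/a.cc would be rewritten to mention src/a.cc, and the fingerprint would be crc32 of the preprocessed source bytes. The timestamp rewrite is omitted because it depends on how ctc formats timestamps, which isn’t shown here.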

At this point, we have managed to make ctc’s outputs reproducible, and we can proceed to looking at the second problem from above: properly capturing and maintaining the symbol data generated by ctc. However, we have changed the nature of the symbol data somewhat: Instead of having multiple compilation units write to the same MON.sym file outside of the sandbox, we now have one .sym file per compilation unit written inside the sandbox. These files are not yet known to Bazel, and would be removed together with the rest of the sandbox as soon as the compilation step is finished.

Enabling correct cache/reuse of symbol data

What we want to achieve here is for the symbol data associated with a compilation unit to closely accompany the corresponding object file from the same compilation unit: If the object file is cached and later reused by Bazel, we want the symbol file to be treated the same. And when the object file is linked into an executable or a shared library, we want the symbol file to automatically become part of any coverage report that is later created based on running code from that executable or library.

I suspect there are other ways we could handle this, for example using Bazel aspects, or similar, but since we’re already knee-deep in compiler wrappers and rewriting outputs…

In for a penny, in for a pound…

Given that we want the symbol file to be as closely associated with the object file as possible, let’s take that to the ultimate conclusion and make it a stowaway inside the object file. After all, the object file is “just” an ELF file, and it does not take too much squinting to regard the ELF format as a generic container of sections, where a section really can be any piece of data you like.

The objcopy tool, part of the GNU binutils tool suite, also comes to our aid with options like --add-section and --dump-section to help us embed and extract such sections from any ELF file.

With this in hand, we can design the following scheme:

In our wrapper script, after ctc has generated an object file with an accompanying symbol file, we run objcopy --add-section ctc_sym=$SYMBOL_FILE $OBJECT_FILE to embed the symbol file as a new ctc_sym section inside the object file.
We make no changes to our Bazel build, otherwise. We merely expect Bazel to collect, cache, and reuse the object files as it would do with any intermediate build output. The symbol data is just along for the ride.
In the linking phase (which is already intercepted by ctc and our wrapper script) we can forward the symbol data from the linker inputs (ELF object files) into the linker output (a shared library or executable, also in the ELF format), like this: Extract the ctc_sym from each object file passed as input (objcopy --dump-section ctc_sym=$SYMBOL_FILE $OBJECT_FILE /dev/null), then concatenate these symbol files together, and finally embed that into the ELF output file from the linker.⁴
At test run time, in addition to running the tests (which together produce MON.dat as a side effect), we can iterate over the test executables and their shared library dependencies, and extract any ctc_sym sections that we come across. These are then split into separate symbol files and placed next to MON.dat.
Finally, we can pass MON.dat and all the .sym files on to the ctcreport report generator to generate the final HTML report.⁵
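The embed and extract steps of this scheme boil down to two objcopy invocations. Here is a minimal sketch of how a wrapper could assemble them. The objcopy options and the ctc_sym section name are the ones given above, while embed_command and extract_command are hypothetical helper names, and the actual wrapper script may well be written in a different language.

```rust
use std::path::Path;
use std::process::Command;

/// Name of the ELF section that carries the symbol data.
const SECTION: &str = "ctc_sym";

/// Build `objcopy --add-section ctc_sym=<symbol_file> <object_file>`,
/// which embeds the symbol file into the object file.
pub fn embed_command(symbol_file: &Path, object_file: &Path) -> Command {
    let mut cmd = Command::new("objcopy");
    cmd.arg("--add-section")
        .arg(format!("{}={}", SECTION, symbol_file.display()))
        .arg(object_file);
    cmd
}

/// Build `objcopy --dump-section ctc_sym=<symbol_file> <object_file> /dev/null`,
/// which extracts the embedded symbol data and discards the copied ELF file.
pub fn extract_command(object_file: &Path, symbol_file: &Path) -> Command {
    let mut cmd = Command::new("objcopy");
    cmd.arg("--dump-section")
        .arg(format!("{}={}", SECTION, symbol_file.display()))
        .arg(object_file)
        .arg("/dev/null");
    cmd
}
```

A wrapper would then call .status() on the returned command and check the exit code. In the link step, it would run the extract command once per input object file and concatenate the resulting symbol files before embedding them into the linker output.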

Results

With all of the above in place, we can run coverage builds with and without our changes, while testing various build scenarios, to see what we have achieved.

Let’s look at some sample build times for generating CTC++ coverage reports. All times below are taken from the best of three runs, all on the same machine.

Status quo

Starting with the situation as of our previous blog post:

| Scope of coverage build + tests | `bazel` build/test | `ctcreport` | Total |
| --- | --- | --- | --- |
| Entire source tree | 38m46s | 2m06s | 44m26s |
| One large application | 13m59s | 43s | 15m30s |
| One small application | 21s | 1s | 35s |

Since caching is intentionally disabled and there is no reuse between these coverage builds, these are the kinds of numbers you will get, no matter the size of your changes since the last coverage build.

Let’s look at the situation after we made the changes outlined above.

Worst case after our changes: No cache to be reused

First, for a new coverage build from scratch (i.e. a situation in which there is nothing that can be reused from the cache):

| Scope of coverage build + tests | `bazel` build/test | `ctcreport` | Total | Speedup |
| --- | --- | --- | --- | --- |
| Entire source tree | 38m48s | 1m59s | 43m03s | 1.0x |
| One large application | 13m04s | 43s | 14m26s | 1.1x |
| One small application | 19s | 1s | 22s | 1.6x |

As expected, these numbers are very similar to the status quo. After all, we are doing the same amount of work, and this is not the scenario we sought to improve in any case.

There is maybe a marginal improvement in the overhead (i.e. the time spent between/around bazel and ctcreport), but it’s pretty much lost in the noise, and certainly nothing worth writing a blog post about.

Best case after our changes: Rebuild with no changes

This is the situation where we are now able to reuse already-instrumented intermediate build outputs. In fact, in this case there are no changes whatsoever, and Bazel can reuse the test executables from the previous build directly, no (re-)building necessary. However, as discussed above, we do need to re-run all tests and then re-generate the coverage report:

| Scope of coverage build + tests | `bazel` build/test | `ctcreport` | Total | Speedup |
| --- | --- | --- | --- | --- |
| Entire source tree | 3m24s | 1m58s | 6m55s | 6.4x |
| One large application | 1m31s | 42s | 2m49s | 5.5x |
| One small application | 1s | 1s | 4s | 8.8x |

Common case after our changes: Rebuild with limited change set

This last table is in many ways the most interesting (but least accurate), as it tries to reflect the common case that most developers are interested in:

“I’ve made a few changes to the source code, how long will I have to wait to see the updated coverage numbers?”

Of course, as with a regular build, it depends on the size of your changes, and the extent to which they cause misses in Bazel’s build cache. Here, I’ve made a small source code change that causes rebuilds in a handful of compilation units:

| Scope of coverage build + tests | `bazel` build/test | `ctcreport` | Total | Speedup |
| --- | --- | --- | --- | --- |
| Entire source tree | 3m23s | 1m57s | 6m54s | 6.4x |
| One large application | 1m34s | 42s | 2m52s | 5.4x |
| One small application | 4s | 1s | 6s | 5.8x |

The expectation here would be that the total time needed is the sum of how long it takes to do a regular build of your changes, plus the numbers from the no-op case above. And this seems to largely hold true, especially for the single-application case, where we expect your changes to affect the application’s unit tests, and therefore the build phase must strictly precede the test runs.

In the full source tree scenario, it seems that Bazel can start running other (unrelated) tests concurrently with building your changes, and as long as your changes, and the tests on which they depend, are not among the slowest tests to run, then those other, slower tests will “hide” the marginal build time cost imposed by your changes.

Conclusion

We have achieved what we set out to do: to leverage the Bazel cache to avoid unnecessary re-building of coverage-instrumented source code. It involves a fair amount of added complexity in the build process, in order to make CTC++’s outputs reproducible, and thus reusable by Bazel, but the end result, in the common case - a developer making a small source code change relative to a previous coverage build - is a 5-10x speedup of the total time needed to build and test with coverage instrumentation, including the generation of the final coverage report.

Future work

A natural extension of the above scheme is to apply a similar treatment to the generation of the coverage statistics at test runtime: Bazel allows for test runs to be cached, so that later build/test runs can reuse the results and logs from earlier test runs, rather than having to re-run tests that haven’t changed.

However, in much the same way as for symbol data at build time, we would need to make sure that coverage statistics (.dat files) were saved and reused along with the corresponding test run results/logs.

One could imagine each test creating a separate .dat file when run, and then have Bazel cache this together with the test logs. The report generation phase would then need to collect the .dat files from both the reused/cached and the new/uncached test runs, and pass them all to the ctcreport tool. Failure to do so correctly would cause coverage statistics to be lost, and the resulting coverage report would be misleading.
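The collection step could be guarded by a check along these lines. This is only a sketch: the one-`.dat`-file-per-test layout, the directory structure, and the function name `collect_dat_files` are all hypothetical, and the actual `ctcreport` invocation is deliberately left out.

```python
from pathlib import Path
import tempfile


def collect_dat_files(test_names, dat_dir):
    """Return one .dat path per test, or fail loudly if any is missing.

    Assumes a hypothetical layout where each test run (cached or fresh)
    leaves its statistics in <dat_dir>/<test_name>.dat.
    """
    paths = [Path(dat_dir) / f"{name}.dat" for name in test_names]
    missing = [p.name for p in paths if not p.is_file()]
    if missing:
        # A silently dropped file would under-report coverage.
        raise FileNotFoundError(f"missing coverage data: {missing}")
    return paths


# Tiny demonstration: two tests are expected, only one produced data.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "foo_test.dat").write_bytes(b"")
    found = [p.name for p in collect_dat_files(["foo_test"], tmp)]
    try:
        collect_dat_files(["foo_test", "bar_test"], tmp)
        incomplete_detected = False
    except FileNotFoundError:
        incomplete_detected = True
```

Failing the report step outright when a file is missing is a design choice: a misleading report is arguably worse than no report.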

With all this in place we could then enable caching of test results (in practice, removing the --nocache_test_results flag that we currently pass), and enjoy yet another speedup courtesy of Bazel’s cache.

That said, we are entering the realm of diminishing returns: Unit tests - once they are built - typically run quickly, and there is certainly less time to be saved here than what is saved by reusing cached build results. Looking at the above numbers: even if we were able to fully eliminate time used by bazel test, we would still only achieve another 2x speedup, theoretically.
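The 2x figure can be sanity-checked against the last table. The sketch below pretends that the whole `bazel` build/test column disappears. Since that column also contains build time, the result is an upper bound on what test caching alone could deliver.

```python
def to_seconds(minutes, seconds):
    return 60 * minutes + seconds


# (total, bazel build/test) per scope, copied from the table above.
measurements = {
    "Entire source tree": (to_seconds(6, 54), to_seconds(3, 23)),
    "One large application": (to_seconds(2, 52), to_seconds(1, 34)),
    "One small application": (to_seconds(0, 6), to_seconds(0, 4)),
}

# Upper bound on the extra speedup if the build/test column cost nothing.
bounds = {
    scope: total / (total - build_test)
    for scope, (total, build_test) in measurements.items()
}

for scope, bound in bounds.items():
    print(f"{scope}: at most {bound:.1f}x")
```

For the two larger scopes the bound lands close to 2x. The small application reaches 3x, but there the absolute saving is only a few seconds.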

For now, we can live with re-running all tests from scratch in order to create a complete MON.dat file, every time.

And that is where I believe it stops: extending this even further to incrementally generate the coverage report itself, in effect to re-generate parts of the report based on a few changed inputs, is - as far as I can see - not possible with the existing tools.

Finally, I want to commend Verifysoft for their understanding and cooperation. I can only imagine that for someone not used to working with Bazel, our initial questions must have seemed very eccentric. They were, however, eager to understand our situation and find a way to make CTC++ work for us. They have even hinted at including a feature in a future version of CTC++ to allow shortening/mapping paths at instrumentation time. Using such a feature to remove the sandbox paths would also have the nice side effect of making CTC++’s own fingerprint logic reproducible, as far as we can see. Together, this would enable us to stop rewriting paths and fingerprints on our own.

Thanks to Mark Karpov for being my main co-conspirator in coming up with this scheme, and helping to work out all the side quests and kinks along the way.

Also thanks to Christopher Harrison, Joseph Neeman, and Malte Poll for their reviews of this article.

For now, we ignore the non-hermetic writing of MON.dat files. See the section on future work for how tackling this properly is in many ways similar (and similarly complex) to what we’re doing for the CTC++ symbol data in the rest of this article.↩
On reconsideration, we should probably have used the somewhat standardized $SOURCE_DATE_EPOCH environment variable here rather than coming up with our own static date. In practice, it should not matter.↩
In later talks with Verifysoft, we have been given the OK that this fingerprint scheme should be sufficient for our purpose, at least until a new version of CTC++ that allows for more reproducible fingerprints is available.↩
It seems that - by default - the linker is doing almost exactly what we want: The ctc_sym sections from the linker inputs are indeed automatically concatenated into the linker output. However, the linker appears to discard sections from inputs that are completely optimized away at link time. But we do in fact want these symbol data sections to be retained, otherwise the final coverage report would omit the corresponding source files rather than showing them as lacking test coverage. Hence we resort to maintaining the ctc_sym section ourselves at link time.↩
As an extra sanity check, ctcreport will verify that the fingerprints from inside the given .sym files match the corresponding fingerprints recorded alongside the coverage statistics in the MON.dat file. Thus we can discover if we’ve messed up somewhere along the way.↩

Evaluating the evaluators: know your RAG metrics

Thu, 27 Feb 2025 00:00:00 GMT

Retrieval-augmented generation (RAG) is about providing large language models with extra context to help them produce more informative responses. Like any machine learning application, your RAG app needs to be monitored and evaluated to ensure that it continues to respond accurately to user queries. Fortunately, the RAG ecosystem has developed to the point where you can evaluate your system in just a handful of lines of code. The outputs of these evaluations are easily interpretable: numbers between 0 and 1, where higher numbers are better. Just copy our sample code below, paste it into your continuous monitoring system, and you’ll be looking at nice dashboards in no time. So that’s it, right?

Well, not quite. There are several common pitfalls in RAG evaluation. Drawing on our experience in the field, this blog post explains what the metrics mean and how to check that they’re working correctly on your data. As they say, “forewarned is forearmed”!

Background

If you’re new to RAG evaluation, our previous posts about it give an introduction to evaluation and discuss benchmark suites. For now, you just need to know that a benchmark suite consists of a collection of questions or prompts, and for each question establishes:

a “ground truth” context, consisting of documents from our database that are relevant for answering the question; and
a “ground truth” answer to the question.

For example

Query	Ground truth context	Ground truth answer
What is the capital of France?	Paris, the capital of France, is known for its delicious croissants.	Paris
Where are the best croissants?	Lune Croissanterie, in Melbourne, Australia, has been touted as ‘the best croissant in the world.’	Melbourne

Then the RAG system provides (for each question):

a “retrieved” context — the documents that our RAG system thought were relevant — and
a generated answer.

The inputs to a RAG evaluator
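One way to picture these inputs is as two small records, one per benchmark entry and one per RAG output. The class and field names below are illustrative assumptions, not the schema of any particular evaluation library.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    """What the benchmark suite establishes for one question."""
    query: str
    ground_truth_context: list[str]
    ground_truth_answer: str


@dataclass
class RagOutput:
    """What the RAG system produced for the same question."""
    retrieved_context: list[str]
    generated_answer: str


item = BenchmarkItem(
    query="What is the capital of France?",
    ground_truth_context=[
        "Paris, the capital of France, is known for its delicious croissants."
    ],
    ground_truth_answer="Paris",
)
output = RagOutput(
    retrieved_context=["Berlin is the capital of Germany."],
    generated_answer="I don't know.",
)
```

Different metrics consume different subsets of these five fields, which is why keeping them separate is convenient.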

Example

Here’s an example that uses the Ragas library to evaluate the “faithfulness” (how well the response was supported by the context) of a single RAG output, using an LLM from AWS Bedrock:

from langchain_aws import ChatBedrockConverse
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import Faithfulness

# In real life, this probably gets loaded from an internal file (and hopefully
# has more than one element!)
eval_dataset = EvaluationDataset.from_list([{
    "user_input": "What is the capital of France?",
    "retrieved_contexts": ["Berlin is the capital of Germany."],
    "response": "I don't know.",
}])

# The LLM to use for computing metrics (more on this below).
model = "anthropic.claude-3-haiku-20240307-v1:0"
evaluator = LangchainLLMWrapper(ChatBedrockConverse(model=model))
print(evaluate(dataset=eval_dataset, metrics=[Faithfulness(llm=evaluator)]))

If you paid close attention in the previous section, you’ll have noticed that our evaluation dataset doesn’t include all of the components we talked about. That’s because the “faithfulness” metric only requires the retrieved context and the generated answer.

RAG evaluation metrics

There are a variety of RAG evaluation metrics available; to keep them straight, we like to use the RAG Triad, a helpful system of categorizing some RAG metrics. A RAG system has one input (the query) and two outputs (the context and the response), and the RAG Triad lets us visualize the three interactions that need to be evaluated.

The RAG triad

Evaluating retrieval

Feeding an LLM with accurate and relevant context can help it respond well; that’s the whole idea of RAG. Your system needs to find that relevant context, and your evaluation system needs to figure out how well the retrieval is working. This is the top-right side of the RAG Triad: evaluating the relationship between the query and the retrieved context. The two main retrieval metrics are precision and recall; each one has a classical definition, plus an “LLM-enhanced” definition for RAG. Roughly, “good precision” means that we don’t return irrelevant information, while “good recall” means that we don’t miss any relevant information. Let’s say that each of our benchmark queries is labelled with a ground truth set of relevant documents, so that we can check how many of the retrieved documents are relevant.

Then the classical precision and recall are

\text{precision} = \frac{\text{\# relevant retrieved docs}}{\text{\# retrieved docs}} \qquad \text{recall} = \frac{\text{\# relevant retrieved docs}}{\text{\# relevant docs in the database}}

These metrics are well-established, useful, and easy to compute. But in a RAG system, the database might be large, uncurated, and contain redundant documents. For example, suppose you have ten related documents, each containing an answer to the query. If your retrieval system returns just one of them then it will have done its job adequately, but it will only receive a 10% recall score. With a large database, it’s also possible that there’s a document with the necessary context that wasn’t tagged as relevant by the benchmark builder. If the retrieval system finds that document, it will be penalized in the precision score even though the document is relevant.
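The ten-document scenario is easy to reproduce. The following sketch computes classical precision and recall over document IDs. The IDs are made up, and returning 0.0 for an empty set is just a convention chosen here.

```python
def precision_recall(retrieved, relevant):
    """Classical set-based precision and recall over document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Ten redundant documents all answer the query; retrieval returns one of them.
relevant_docs = [f"doc{i}" for i in range(10)]
precision, recall = precision_recall(["doc3"], relevant_docs)
print(precision, recall)
```

Precision is perfect while recall is only one in ten, even though the single retrieved document was enough to answer the question.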

Because of these issues with classical precision and recall, RAG evaluations often adapt them to work on statements instead of documents. We list the statements in the ground-truth context and in the retrieved context; we call a retrieved statement “relevant” if it was present in the ground-truth context.

\text{precision} = \frac{\text{\# relevant retrieved statements}}{\text{\# retrieved statements}} \qquad \text{recall} = \frac{\text{\# relevant retrieved statements}}{\text{\# ground truth statements}}

This definition of precision and recall is better tailored to RAG than the classical one, but it comes with a big disadvantage: you need to decide what a “statement” is, and whether two statements are “the same.” Usually you’ll want to automate this decision with an LLM, but that raises its own issues with cost and reliability. We’ll say more about that later.
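To make the dependence on the “same statement” decision explicit, here is a minimal sketch in which the matcher is a parameter. The `naive_same` function is a deliberately crude stand-in (case-insensitive exact match) for what would normally be an LLM judgment, and the statements are invented.

```python
def statement_precision_recall(retrieved, ground_truth, same):
    """Statement-level precision and recall with a pluggable matcher.

    `same(a, b)` decides whether two statements say the same thing; in a
    real pipeline that judgment would typically come from an LLM.
    """
    relevant = [s for s in retrieved if any(same(s, g) for g in ground_truth)]
    covered = [g for g in ground_truth if any(same(s, g) for s in retrieved)]
    precision = len(relevant) / len(retrieved) if retrieved else 0.0
    recall = len(covered) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def naive_same(a, b):
    # Stand-in matcher: case-insensitive exact match.
    return a.strip().lower() == b.strip().lower()


ground_truth = ["Paris is the capital of France.", "Paris is known for croissants."]
retrieved = ["paris is the capital of france.", "Berlin is the capital of Germany."]
precision, recall = statement_precision_recall(retrieved, ground_truth, naive_same)
```

Recall here counts ground-truth statements that were covered rather than retrieved statements that matched. The two coincide unless the retrieved context repeats a statement, in which case this variant keeps recall at or below 1.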

Evaluating generation

Once your retrieval is working well — with continuous monitoring and evaluation, of course — you’ll need to evaluate your generation step. The most commonly used metric here is faithfulness¹, which measures whether a generated answer is factually supported by the retrieved context; this is the bottom side of the RAG Triad. To calculate faithfulness, we count the number of factual claims in the generated answer, and then decide which of them is supported by the context. Then we define

\text{faithfulness} = \frac{\text{\# context-supported statements}}{\text{\# statements}}

Like the RAG-adapted versions of context precision and recall, this is a statement-based metric. To automate it, we’d need an LLM to count the factual claims and decide which of them is context-supported.
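As a rough sketch of the ratio, assume the claims have already been extracted and that some judge can say whether each one is supported. Here the judge is a hand-written lookup table rather than an LLM call, and the claims are invented.

```python
def faithfulness(claims, is_supported):
    """Fraction of claims that the context supports.

    Returns None when the answer contains no factual claims, since the
    ratio is undefined in that case.
    """
    if not claims:
        return None
    return sum(1 for c in claims if is_supported(c)) / len(claims)


# Context: "Paris, the capital of France, is known for its delicious croissants."
claims = ["Paris is the capital of France", "Paris has 2 million residents"]

# Stand-in judge: only the first claim is backed by the context.
verdicts = {claims[0]: True, claims[1]: False}
score = faithfulness(claims, lambda c: verdicts[c])
```

Both the claim extraction and the support judgment are where an LLM would come in, so each is a place where evaluators can disagree.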

You can evaluate faithfulness without having retrieval working yet, as long as you have a benchmark with ground truth contexts. But if you do that, there’s one crucial point to keep in mind: you also need to test generation when retrieval is bad, like when it contains distracting irrelevant documents or just doesn’t have anything useful at all. Bad retrieval will definitely happen in the wild, and so you need to ensure that your generation (and your generation evaluation) will degrade gracefully. More on that below.

Evaluating the answer

Finally, there is a family of commonly-used generation metrics that evaluate the quality of the answer by comparing it to the prompt and the ground truth:

answer semantic similarity measures the semantic similarity between the generated answer and the ground truth;
answer correctness also compares the generated answer and the ground truth, but is based on counting factual claims instead of semantic similarity; and
answer relevance measures how well the generated answer corresponds to the question that the prompt asked. This is the top-left side of the RAG Triad.
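Semantic similarity is commonly computed as the cosine similarity between embedding vectors of the two texts. The sketch below assumes that approach and uses toy three-dimensional vectors in place of real embeddings, which would come from an embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings"; the values are made up for illustration.
generated = [0.9, 0.1, 0.0]
ground_truth = [1.0, 0.0, 0.0]
unrelated = [0.0, 0.0, 1.0]

close = cosine_similarity(generated, ground_truth)
far = cosine_similarity(generated, unrelated)
```

Unlike the claim-counting metrics, this needs an embedding model but no LLM judge. It is therefore cheaper, but it cannot tell a fluent wrong answer from a right one on the same topic.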

These metrics directly get to the key outcome of your RAG system: are the generated responses good? They come with the usual pluses and minuses of end-to-end metrics. On the one hand, they measure exactly what you care about; on the other hand, when they fail you don’t know which component is to blame.

As you’ve seen above, many of the metrics used for evaluating RAG rely on LLMs to extract and evaluate factual claims. That means that some of the same challenges you’ll face while building your RAG system also apply to its evaluation:

You’ll need to decide which model (or models) to use for evaluation, taking into account cost, accuracy, and reliability.
You’ll need to sanity-check the evaluator’s responses, preferably with continuous monitoring and occasional manual checks.
Because the field is moving so quickly, you’ll need to evaluate the options yourself — any benchmarks you read online have a good chance of being obsolete by the time you read them.

When the judges don’t agree

In order to better understand these issues, we ran a few experiments on a basic RAG system — without query re-writing, context re-ranking or other tools to improve retrieval — using the Neural Bridge benchmark dataset as our test set. We first ran these experiments in early 2024; when we re-visited them in December 2024 we found that newer base LLMs had improved results somewhat but not dramatically.

The Neural Bridge dataset contains 12,000 questions; each one comes with a context and an answer. We selected 200 of these questions at random and ran them through a basic RAG system using Chroma DB as the vector store and either Llama 2 or Claude Haiku 3 as the LLM for early 2024 and December 2024 runs, respectively. The RAG system was not highly tuned — for example, its retrieval step was just a vector similarity search — and so it gave a mix of good answers, bad answers, and answers saying essentially “I don’t know: the context doesn’t say.” Finally, we used Ragas to evaluate various metrics on the generated responses, while varying the LLMs used to power the metrics.

Experimental results

Our goal in these experiments was to determine:

whether the LLM evaluators were correct, and
whether they were consistent with one another.

We found that different LLMs are often not in agreement. In particular, they can’t all be correct.

Here are the evaluation scores of five different models on four different metrics, averaged across our benchmark dataset. You’ll notice a fair amount of spread in the scores for faithfulness and context precision.

Average metrics scores across models

But the scores above are just averages across the dataset — they don’t tell us how well the LLMs agreed on individual ratings. For that, we checked the correlation between model scores and again found some discrepancies between models. Here are the results for answer relevancy scores: the correlations show that even though the different models gave very similar average scores, they aren’t in full agreement.

Correlation of answer relevancy scores across models. A score of one means that the models agree completely, while a score of zero means that they agree or disagree essentially at random.

It might not be too surprising that models from the same family (GPT 3.5 and 4, and Sonnet 3 and 3.5) had larger overlaps than models from different families. If your budget allows it, choosing multiple uncorrelated models and evaluating with all of them might make your evaluation more robust.
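Identical averages can hide complete disagreement, as the small sketch below illustrates. It uses the Pearson coefficient, which is an assumption on our part about the kind of correlation, and the per-question scores from the two hypothetical judges are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-question scores from two hypothetical judges.
judge_a = [1.0, 0.0, 1.0, 0.0]
judge_b = [0.0, 1.0, 1.0, 0.0]

same_average = sum(judge_a) / 4 == sum(judge_b) / 4
agreement = pearson(judge_a, judge_b)
```

Both judges average 0.5, yet their correlation is zero: they agree on two questions and disagree on the other two.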

When faithfulness gets difficult

We dug a little more into the specific reasons for LLM disagreement, and found something interesting about the faithfulness score: we restricted to the subset of questions for which retrieval was particularly bad, having no overlap with the ground truth data. Even the definition of faithfulness is tricky when the context is bad. Let’s say the LLM decides that the context doesn’t have relevant information and so responds “I don’t know” or “The context doesn’t say.” Are those factual statements? If so, are they supported by the context? If not, then according to the definition, the faithfulness is zero divided by zero. Alternatively, you could try to detect responses like this and treat them as a sort of meta-response that doesn’t go through the normal metrics pipeline. We’re not sure how best to handle this corner case, but we do know that you need to do it explicitly and consistently. You also need to be prepared to handle null values and empty responses from your metrics pipeline, because this situation often induces them.
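One way to make the handling explicit is sketched below: refusals are detected up front and recorded as undefined, and the aggregate reports how many scores were skipped instead of silently averaging them in. The phrase list, the helper names, and the scores are all hypothetical.

```python
REFUSALS = ("i don't know", "the context doesn't say")  # hypothetical phrase list


def is_refusal(answer):
    """Crude check for a meta-response that makes no factual claim."""
    normalized = answer.strip().lower().replace("’", "'")
    return any(normalized.startswith(p) for p in REFUSALS)


def summarize(scores):
    """Average the defined scores and report how many were undefined."""
    defined = [s for s in scores if s is not None]
    mean = sum(defined) / len(defined) if defined else None
    return {"mean": mean, "undefined": len(scores) - len(defined)}


answers = ["Paris.", "I don't know.", "The context doesn’t say."]
raw_scores = [1.0, 0.0, 0.5]  # made-up per-answer faithfulness scores

# Route refusals around the metric instead of scoring them.
scores = [None if is_refusal(a) else s for a, s in zip(answers, raw_scores)]
summary = summarize(scores)
```

Whether refusals should count as undefined, as zero, or as something else is a policy decision. The point of the sketch is only that the decision is made in one visible place.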

Experimental results

On the subset of questions with poor retrieval our Ragas-computed faithfulness scores ranged from 0%, as judged by Llama 3, to more than 80%, as judged by Claude 3 Sonnet. We emphasize that these were faithfulness scores evaluated by different LLMs judging the same retrievals, responses, and generated answers. Even if you exclude Llama 3 as an outlier, there is a lot of variation.

Faithfulness scores across models, when the context is bad

This variation in scores doesn’t seem to be an intentional choice (to the extent that LLMs can have “intent”) by the evaluator LLMs, but rather a situation of corner cases compounding one another. We noticed that this confusing situation made some models — Llama 3 most often, but also other models — fail to respond in the JSON format expected by the Ragas library. Depending on how you treat these failures, this can result in missing metrics or strange scores. You can sidestep these issues somewhat if you have thorough evaluation across the entire RAG pipeline: if other metrics are flagging poor retrieval, it matters less that your generation metrics are behaving strangely on poorly-retrieved examples.
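A defensive parser makes such failures visible instead of letting them turn into strange scores. The sketch below is an assumption-laden illustration: the `{"supported": bool}` schema is hypothetical and is not the format that Ragas expects.

```python
import json


def parse_verdict(raw):
    """Parse a judge's reply, returning None instead of raising on bad output.

    The {"supported": bool} schema is hypothetical.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not isinstance(data.get("supported"), bool):
        return None
    return data["supported"]


replies = [
    '{"supported": true}',
    "Sure! Here is my verdict: yes",   # not JSON at all
    '{"supported": "maybe"}',          # JSON, but the wrong type
]
verdicts = [parse_verdict(r) for r in replies]
failure_rate = verdicts.count(None) / len(verdicts)
```

Tracking the failure rate per evaluator model is a cheap signal that a judge is struggling with a class of inputs.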

In general, there’s no good substitute for careful human evaluation. The LLM judges don’t agree, so which one agrees best with ground truth human evaluations (and is the agreement good enough for your application)? That will depend on your documents, your typical questions, and on future releases of improved models.

Conclusion

Oh, were you hoping we’d tell you which LLM you should use? No such luck: our advice would be out of date by the time you read this, and if your data doesn’t closely resemble our benchmark data, then our results might not apply anyway.

In summary, it’s easy to compute metrics for your RAG application, but don’t just do it blindly. You’ll want to test different LLMs for driving the metrics, and you’ll need to evaluate their outputs. Your metrics should cover all the sides of the RAG triad, and you should know what they mean (and be aware of their corner cases) so that you can interpret the results. We hope that helps, and happy measuring!

The terminology is not quite settled: what Ragas calls “faithfulness,” TruLens calls “groundedness.” Since the RAG Triad was introduced by TruLens, you’ll usually see it used in conjunction with their terminology. We’ll use the Ragas terminology in this post, since that’s what we used for our experiments.↩

From minimal skeletons to comprehensive transactions with cooked-validators

Thu, 20 Feb 2025 00:00:00 GMT

Cooked Validators is a Haskell library designed to simplify the complex process of crafting and testing transactions on the Cardano blockchain. Writing proper transactions in Cardano can be challenging due to its UTXO-based model, which requires precise definitions and careful structuring of inputs, outputs, and complementary components. cooked-validators tackles these challenges by offering a powerful framework for defining transactions in a minimal and declarative manner while incorporating a significant degree of automation.

One of the library’s core strengths lies in its ability to help developers transform simple transaction templates, referred to as “skeletons”, or TxSkel, into fully-formed transactions that satisfy the technical requirements of Cardano’s validation process. This automation not only minimizes boilerplate code but also reduces the room for errors, thus streamlining the creation and testing of transactions. In particular, we’ve used cooked-validators extensively to rigorously audit smart contracts for many clients and well-known products now live on Cardano.

Although cooked-validators has been a reliable tool for years, no blog post has yet explored how it automates key aspects of transaction creation, simplifying complex processes into manageable workflows. This post aims to fill that gap by showcasing how the library helps developers build Cardano transactions with ease and efficiency, allowing them to focus on high-level design and intent rather than getting bogged down by low-level technical details.

Validating transactions in `cooked-validators`

cooked-validators provides a convenient way to interact with the blockchain through a type class abstraction, MonadBlockChain. Among the primitives provided by this type class, the most fundamental is validateTxSkel which:

takes a transaction skeleton as input,
expands the skeleton’s content based on missing parts and skeleton options,
generates a transaction,
submits this transaction for validation, and
returns the validated transaction, or throws an error if it is invalid.

Thus, the function has the following type signature:

validateTxSkel :: (MonadBlockChain m) => TxSkel -> m CardanoTx

In the remainder of this post, we will explore the fields of the transaction skeleton (TxSkel) and how validateTxSkel behaves when automatically expanding this skeleton.

Transaction skeletons

Cardano transactions are usually represented by large Haskell records containing a predefined set of fields that evolve alongside the Cardano protocol. The traditional approach to building transactions involves directly creating instances of these records and submitting them for validation.

In cooked-validators, however, transactions are further abstracted through a custom record called TxSkel, which has its own set of fields, some of which map directly to corresponding fields in a Cardano transaction, while others guide the translation process. The primary motivation behind using this abstraction is to highlight the most relevant information for common use cases while hiding less critical details that can be inferred automatically based on the provided data¹.

There are several additional reasons for the use of TxSkel:

Our transaction skeletons embed as much type information as possible for scripts and UTXOs, thus increasing type-safety.
Each transaction skeleton includes its own set of options to guide transaction generation, with sensible default values.
Our transaction skeletons have default values for all fields, allowing users to provide minimal information relevant to their use-case.
Our skeleton elements use meaningful, yet simple types. By defaulting to the current Cardano era, they avoid the complex overlays and type annotations commonly found in the Cardano or Ledger APIs.

While TxSkel is designed to be lighter and more user-friendly than Cardano transactions, it does not compromise user flexibility. Since TxSkel ultimately generates Cardano transactions, users are provided with the option to manually tweak the generated transaction if desired. This ensures that users retain full control and can build their Cardano transactions in any way they prefer.

To build a transaction skeleton, users simply override the fields they need to set from the default skeleton, txSkelTemplate.

txSkelTemplate
  { txSkelIns = ...,
    txSkelMints = ...,
    ...
  }

From manual ADA payments to automated transaction balancing

The first feature one might expect from a transaction is to pay assets to a given peer. Surprisingly, this can be quite complex due to the underlying extended UTXO model on which Cardano is based. Without diving too deeply into the details, it’s important to understand that exchanging assets in Cardano is done through “pouches” of various sizes, called UTXOs. If Alice wants to send 12 ADA (Cardano’s currency) to Bob, and she possesses one UTXO with 4 ADA and another with 10 ADA, she will have to provide both UTXOs, create a new UTXO with 12 ADA for Bob, and return a UTXO with 2 ADA for herself. Moreover, she will also need to account for transaction fees, meaning the returning UTXO will actually contain something like 1.998222 ADA (1,998,222 lovelace).
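Alice’s arithmetic can be replayed in a few lines. The sketch below is written in Python purely for illustration and has nothing to do with how cooked-validators is implemented. The in-order input selection and the fixed fee of 1,778 lovelace (the amount implied by the figures above) are simplifying assumptions, since a real fee depends on the transaction itself.

```python
LOVELACE_PER_ADA = 1_000_000


def balance(utxos, payment, fee):
    """Pick inputs in order until they cover payment + fee.

    Returns the selected inputs and the change to send back.
    """
    selected, total = [], 0
    for value in utxos:
        if total >= payment + fee:
            break
        selected.append(value)
        total += value
    if total < payment + fee:
        raise ValueError("insufficient funds")
    return selected, total - payment - fee


alice_utxos = [4 * LOVELACE_PER_ADA, 10 * LOVELACE_PER_ADA]
inputs, change = balance(alice_utxos, payment=12 * LOVELACE_PER_ADA, fee=1_778)
```

Both of Alice’s UTXOs end up as inputs, and the change output carries 1,998,222 lovelace.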

In summary, this seemingly simple payment of 12 ADA will result in a transaction with 2 inputs and 2 outputs, along with an additional “phantom” payment corresponding to the transaction fees. However, from the user’s perspective, the key point is that Alice needs to pay 12 ADA to Bob. cooked-validators allows users to focus on these high-level intentions, as demonstrated by the following skeleton:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ ada 12],
    txSkelSigners = [alice]
  }

In this skeleton, we specify that the transaction pays 12 ADA to Bob and that Alice is a signer of the transaction. And that’s it.

Internally, cooked-validators processes this skeleton through a balancing phase. In this context, “balancing” is a multifaceted term. It not only refers to ensuring that the inputs and outputs of the transaction contain the same amount of ADA (and other assets)², but also to calculating fees, accounting for them in the transaction, and handling associated collaterals when necessary (funds that are made available within the transaction in case a script failure occurs during validation). This automated process is a part of the added value provided by cooked-validators.

Computing fees, collaterals, and balancing transactions is notoriously difficult in Cardano due to circular dependencies (higher fees imply more collaterals, which increase transaction size, which in turn leads to higher fees…) and the unpredictable resource consumption of scripts in terms of memory space and computation cycles. See cooked-validators’s documentation for the details of what balancing involves, how cooked-validators performs it, and the options available to control balancing. Notably, cooked-validators is non-invasive, meaning that the automation can be disabled if needed. For instance, users can manually set fees and collaterals and even balance their transactions themselves.
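The circular dependency can be pictured as a fixed-point computation. The toy model below is in Python and is not the algorithm used by cooked-validators. Every constant and the linear fee formula are made up, and script resource usage is ignored entirely.

```python
def settle_fee(base_size=300, bytes_per_collateral=40, collateral_utxo=5_000,
               fee_constant=1_000, fee_per_byte=10, collateral_percent=150):
    """Iterate fee -> collateral -> size -> fee until the fee stops changing."""
    fee = 0
    while True:
        collateral = -(-fee * collateral_percent // 100)   # ceiling division
        n_inputs = -(-collateral // collateral_utxo)       # UTXOs needed to cover it
        size = base_size + bytes_per_collateral * n_inputs
        new_fee = fee_constant + fee_per_byte * size
        if new_fee == fee:
            return fee, n_inputs
        fee = new_fee


fee, collateral_inputs = settle_fee()
```

With these toy numbers the loop settles after three rounds. In a realistic setting each round is far more expensive, and convergence is not something to take for granted.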

After balancing, the skeleton will look like this:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ ada 12, paysPk alice $ lovelace 1_998_222],
    txSkelIns = Map.fromList [(aliceUtxo1, emptyTxSkelRedeemer), (aliceUtxo2, emptyTxSkelRedeemer)],
    txSkelSigners = [alice]
  }

In most cases, this skeleton will remain hidden from the user, though it can be retrieved and used if necessary by manually invoking the balancing function or checking the logs.

From manual payments to automated minimal amount of ADA

While Alice is using Cardano, she might come across non-ADA tokens with custom names⁴, such as mySmartContractToken. These tokens are provided by smart contracts and dedicated to specific purposes such as NFTs to represent ownership of a certain resource. Alice might also want to send such a token to Bob:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ mySmartContractToken 1],
    txSkelSigners = [alice]
  }

As shown above, cooked-validators will attempt to balance this skeleton by retrieving an instance of mySmartContractToken from Alice’s UTXOs, along with the necessary ADA to cover the transaction fee. However, validating the resulting balanced skeleton will fail because Cardano requires every UTXO to include a minimum amount of lovelace to cover its storage cost. This minimum amount, derived from the protocol parameters, also acts as a safeguard against potential security risks that could arise if UTXOs were allowed to exist without any ADA. Thankfully, cooked-validators can automatically calculate this required amount when the appropriate transaction option is enabled. The updated skeleton then becomes:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ mySmartContractToken 1],
    txSkelSigners = [alice],
    txSkelOpts = def { txOptEnsureMinAda = True }
  }

Enabling this option triggers an initial transformation pass, before balancing, which calculates the required amount of ADA to sustain the output and adds this amount to the transaction skeleton. After both passes, the skeleton will look something like the following, where remainingValue is the original value in Alice’s UTXO minus the fees and the payment to Bob:

txSkelTemplate
  { txSkelOuts = [paysPk bob $ mySmartContractToken 1 <> lovelace 546_000, paysPk alice remainingValue],
    txSkelIns = Map.singleton aliceUtxo emptyTxSkelRedeemer,
    txSkelSigners = [alice],
    txSkelOpts = def { txOptEnsureMinAda = True }
  }

By default, txOptEnsureMinAda is set to False, which may seem counterintuitive. However, this prevents unexpected adjustments to ADA amounts that may have been carefully computed. If a transaction output is meant to contain a specific ADA amount based on a precise calculation, but the protocol requires a higher minimum, enabling this option would silently modify the value. This could obscure computation errors, allowing transactions to validate without the user realizing the discrepancy. To stay true to cooked-validators’s philosophy of minimal intervention, the option remains off by default, ensuring that any necessary adjustments are made explicitly.
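The effect of the option can be summarized with a small model, written in Python for illustration only. The function name is hypothetical, and the minimum is passed in as a plain number, whereas in reality it is derived from the protocol parameters and the contents of the output.

```python
def ensure_min_ada(output_lovelace, required_min, enabled):
    """Return the lovelace to put in an output, plus whether it was adjusted."""
    if enabled and output_lovelace < required_min:
        return required_min, True
    return output_lovelace, False


# 546_000 is the figure from the skeleton above, used here as a fixed input.
token_only = ensure_min_ada(0, required_min=546_000, enabled=True)
left_alone = ensure_min_ada(0, required_min=546_000, enabled=False)
already_enough = ensure_min_ada(2_000_000, required_min=546_000, enabled=True)
```

Returning an explicit “adjusted” flag reflects the concern discussed above: a silent bump is exactly what could hide a miscalculated amount.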

From spending scripts to automated script witness binding

In the previous examples, we saw how cooked-validators can handle the addition of inputs in a transaction skeleton. However, there are cases where one might want to manually specify the inputs. This is typically necessary when a transaction needs to consume UTXOs belonging to scripts, in which case a redeemer must be provided, as it cannot be inferred automatically. A redeemer is a piece of information (which may be empty) required whenever a script from a smart contract is invoked. This redeemer usually informs the script as to why it has been called, and can also pass dynamic values as inputs to the script. In the examples above, the added inputs were UTXOs from peers, so emptyTxSkelRedeemer was automatically provided.

When consuming scripts, collaterals must be included in case the validation process fails after the script execution. These collaterals cover the computation resources used during validation, which cannot be covered by fees, as fees are only paid if the transaction is successfully validated. The inclusion (or omission) of collaterals, depending on whether the transaction involves scripts, is handled during balancing. Collaterals can only be provided as UTXOs from peers, so a signer is also required in such cases, even if no peer UTXO is consumed. A transaction skeleton that consumes a script can thus be written as:

txSkelTemplate
  { txSkelIns = Map.singleton scriptUtxo $ someTxSkelRedeemer scriptRedeemer,
    txSkelSigners = [alice]
  }

From this skeleton, cooked-validators offers two types of automation. The first is the balancing mechanism, which has already been discussed. Beyond computing fees and collaterals, the balancing process also creates an output at the first signer’s address to return any excess value from inputs and consumes a UTXO from the user to cover the transaction fees.

The second automation concerns the addition of script witnesses. On-chain, scripts are represented by their hash, which serves different purposes depending on the script’s type³—address for spending scripts, policy ID for minting scripts, or staking ID for staking scripts. However, during validation, scripts must be executed, and their hash alone is insufficient. Instead, users must supply the full scripts as witnesses, ensuring their hash matches the expected on-chain hash.

When a UTXO is created at a spending script’s address, cooked-validators retains the script, allowing it to automatically attach the required witness for these inputs in future transactions. However, for minting or staking scripts, the tool lacks knowledge of the necessary witnesses, so they must be specified manually.

Since September 2022, Cardano supports reference scripts, which are complete scripts stored on-chain in UTXOs. These reference scripts can be used as witnesses in place of the full script, reducing transaction size and fees. cooked-validators also automates the inclusion of such reference scripts. In practice, when a script witness is required, the following process unfolds:

if a witness is manually provided by the user, it is used as is.
if no such witness exists, cooked-validators attempts to find a reference witness among known UTXOs.
if no such witness could be found, and the script is used for spending a UTXO, cooked-validators attempts to find a direct witness among its known scripts.
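
The lookup order above can be sketched as a chain of fallbacks. This is only an illustration under stated assumptions, not cooked-validators' actual code: the types `Witness`, `Purpose`, `KnownState` and the function `chooseWitness` are hypothetical names invented for this example.

```haskell
import Control.Applicative ((<|>))

-- All names below are hypothetical; they only model the three-step lookup.
type ScriptHash = String
type UtxoRef = String

data Witness
  = Manual String        -- supplied by the user, used as is
  | Reference UtxoRef    -- reference script found in a known UTXO
  | Direct ScriptHash    -- full script the tool already knows
  deriving (Eq, Show)

data Purpose = Spending | Minting | Staking
  deriving (Eq, Show)

data KnownState = KnownState
  { referenceUtxos :: [(ScriptHash, UtxoRef)]  -- known UTXOs carrying reference scripts
  , knownScripts   :: [ScriptHash]             -- scripts retained from earlier outputs
  }

-- Try each source in order; the first one that yields a witness wins.
chooseWitness :: KnownState -> Purpose -> ScriptHash -> Maybe Witness -> Maybe Witness
chooseWitness st purpose hash manual =
  manual
    <|> (Reference <$> lookup hash (referenceUtxos st))
    <|> direct
  where
    -- Direct witnesses are only attempted for spending scripts.
    direct
      | purpose == Spending && hash `elem` knownScripts st = Just (Direct hash)
      | otherwise = Nothing
```

In this model, a minting script with no manual witness and no reference UTXO yields `Nothing`, which corresponds to the case where the witness has to be specified by hand.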

In the previous example, assuming a reference witness was present on some UTXO, the skeleton will look like this:

txSkelTemplate
  { txSkelIns = Map.fromList
      [ (scriptUtxo, TxSkelRedeemer scriptRedeemer (Just referenceInputWithScript)),
        (aliceUtxoForFees, emptyTxSkelRedeemer)
      ],
    txSkelOuts = [paysPK alice $ valueInScriptUtxo <> valueInAliceUtxo <> negate fee],
    txSkelSigners = [alice],
  }

From issuing proposals to automated deposit payment

The final type of automation we will discuss in this post involves proposals issued by users, a feature introduced in the Conway era. These proposals can vary, but the most common are parameter changes, where users propose new values for parameters that control on-chain behaviors. These proposals must obey a set of constitutional rules, which is checked by a constitution script. For example, here is a skeleton where Alice proposes to set the fee charged per byte of transaction size to 100 lovelace, witnessed by a given constitution script:

txSkelTemplate
  { txSkelProposals = [simpleTxSkelProposal alice (TxGovActionParameterChange [FeePerByte 100])
                        `withWitness` (constitutionScript, emptyTxSkelRedeemer)],
    txSkelSigners = [alice],
  }

Each proposal requires a deposit of a certain amount of lovelace, as specified by the protocol parameters. cooked-validators takes such deposits into account during the balancing phase. It looks up the current required deposit amount and retrieves this amount from the available UTXOs from the balancing wallet to include in the transaction. After balancing, the skeleton will look like this:

txSkelTemplate
  { txSkelProposals = [simpleTxSkelProposal alice (TxGovActionParameterChange [FeePerByte 100])
                        `withWitness` (constitutionScript, emptyTxSkelRedeemer)],
    txSkelIns = Map.singleton aliceUtxo emptyTxSkelRedeemer,
    txSkelOuts = [paysPK alice (valueInAliceUtxo <> negate fee <> negate depositValueFromParams)],
    txSkelSigners = [alice],
  }
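
The arithmetic behind the balanced output can be sketched as follows. This is a deliberately simplified model, not the tool's balancing code: values are plain lovelace amounts (real Cardano values are multi-asset), mints and burns are left out, and the names `Balance` and `changeOutput` as well as the amounts used to try it out are made up for illustration.

```haskell
-- Simplified model: every value is a lovelace amount (an assumption).
type Lovelace = Integer

data Balance = Balance
  { inputs      :: Lovelace  -- total value of consumed UTXOs
  , withdrawals :: Lovelace
  , fee         :: Lovelace
  , deposit     :: Lovelace  -- e.g. the proposal deposit read from the parameters
  }

-- From: withdrawals + inputs = outputs + deposits + fees
-- (mints and burns omitted), the value returned to the signer is
-- whatever remains once the deposit and the fee are paid.
changeOutput :: Balance -> Maybe Lovelace
changeOutput b
  | change >= 0 = Just change
  | otherwise   = Nothing  -- the chosen inputs do not cover deposit and fee
  where
    change = inputs b + withdrawals b - deposit b - fee b
```

Returning `Nothing` stands for the situation where more inputs would have to be selected before the transaction can balance.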

Currently, cooked-validators allows users to provide any constitution script to validate whether the proposal adheres to constitutional rules. In practice, the ledger rejects any such script that does not correspond to the current official Cardano constitution. Thus, in the future, cooked-validators might automatically fetch this script and attach it to proposals.

Conclusion

One of cooked-validators’ main strengths is its ability to allow users to express their high-level transaction requirements conveniently and efficiently, without having to deal with the intricate technical details of the resulting transaction. This is achieved through TxSkels, which are transaction abstractions that can be partially filled by users. cooked-validators performs several passes on these partial skeletons, such as filling in missing minimal ADA, balancing the transaction, and automatically adding witnesses, to translate these minimal skeletons into transactions that can be submitted for validation. This blog post has summarized these key automation steps. Stay tuned for more posts about cooked-validators.

it is always possible to override those fields in the generated transaction though, as cooked-validators never forces users to build their transactions one way or another.↩
the actual balancing equation is more complicated: withdrawals + inputs + mints = burn + outputs + deposits + fees↩
this name here stands for the combination of a token name and a policy ID.↩
all scripts are defined in a same way following the Conway era, what we call script types are only abstractions to reference the way they are used.↩

Bashfulness

Thu, 13 Feb 2025 00:00:00 GMT

When I first joined the Topiary Team, I floated the idea of trying to format Bash with Topiary. While this did nothing to appease my unenviable epithet of “the Bash guy,” it was our first foray into expanding Topiary’s support beyond OCaml and simple syntaxes like JSON.

Alas, at the time, the Tree-sitter Bash grammar was not without its problems. I got quite a long way, despite this, but there were too many things that didn’t work properly for us to graduate Bash to a supported language.

Fast-forward two years and both Topiary and the Tree-sitter Bash grammar have moved on. As the incumbent Bash grammar was beginning to cause downstream problems from bit rot — frustratingly breaking the builds of both Topiary and Nickel — my fellow Topiarist, Nicolas Bacquey, migrated Topiary to the latest version of the Bash grammar and updated our Bash formatting queries to match.

With surprisingly little effort, Nicolas was able to resolve all those outstanding problems. So with that, Bash was elevated to the lofty heights of “supported language” and — with the changes I’ve made from researching this blog post — Bash formatting is now in pretty good shape in Topiary v0.6.

So much so, in fact, let me put my money where my mouth is! Let’s see how Topiary fares against a rival formatter. I’ll do this, first, by taking you down some of the darker alleys of Bash parsing, just to show you what we’re up against.

Hello darkness, my old friend

There is a fifth dimension beyond that which is known to man. It is a dimension as vast as space and as timeless as infinity. It is the middle ground between light and shadow, between science and superstition; it lies between the pit of man’s fears and the summit of his knowledge. This is the dimension of imagination. It is an area we call: the Bash grammar.

In our relentless hubris, man has built a rocket that — rather than exploding on contact with reality — dynamically twists and turns to meet reality’s expectations. Is that a binary? Execute it! Is that a built-in? Execute it! Is that three raccoons in a trench coat, masquerading as a function? Execute it! And so, with each token parsed, we are Bourne Again and stray ever further from god.

Bear witness to but a few eldritch horrors:¹

Trailing comments must be preceded by whitespace or a semicolon. However, if either of those are escaped, they are interpreted as literals and this changes the tokenisation semantics:
```
echo \ # Ceci n'est pas
 | une pipe'
```
Here, perhaps the writer intended to add a comment against the first line. But, what looks like a comment isn’t a comment at all; it becomes an argument to echo, along with everything that follows. That includes the apostrophe in “n’est”, which is interpreted as an opening quote — a raw string — which is closed at the end of the next line.
Case statements idiomatically delimit each branch condition with a closing parenthesis. In a subshell, for example, this leads to unbalanced brackets:
```
( case $x in foo )   # Wat?...
echo bar;; esac )    # 🤯
```
This subshell outputs bar when the variable $x is equal to foo. Whereas, on a more casual reading, this formulation might just look like a confusing syntax error.

Speaking of case statements, did you know that ;& and ;;& are also valid branch terminators? Without checking the manual — if you can find the single paragraph where it’s mentioned — can you tell me how they differ?
Bash will try to compute an array index if it looks like an arithmetic expression:
```
# Output the (foo - bar)th element of array
echo "${array[foo-bar]}"
```
However, if array in this example is an associative array (i.e., a hash map/dictionary), then foo-bar could be a valid key. In which case, it’s not evaluated and used verbatim.
Without backtracking, it’s not possible to distinguish between an arithmetic expansion and a command substitution containing a subshell at its beginning or end:
```
echo $((foo + bar))
echo $((foo); (bar))
```
Here, the first statement will output the value of the addition of those two variables; the second will execute foo then bar, each in a subshell, echoing their output. In the subshell case, the POSIX standards even recommend that you add spaces — e.g., $( (foo) ) — to remove this ambiguity.
Heredocs effectively switch the parser into a different state, where everything is interpreted literally except when it isn’t. This alone is tricky, but Bash introduces some variant forms that allow additional indentation (with hard tabs), switching off all string interpolation, or both.
```
# Indented, with interpolation
cat <<-HEREDOC
	I am a heredoc. Hear me roar.
	HEREDOC
```

Suffice to say, any formatter has their work cut out.

Battle of the Bash formatters

The de facto formatter for Bash is shfmt. It’s written in Go, by Daniel Martí, actively maintained and has been around for the best part of a decade.

Let’s compare Topiary’s Bash formatting with shfmt in a contest worthy of a Netflix special. I’ll look specifically at each tool’s parsing and formatting capabilities as well as their performance characteristics. I won’t, however, compare their subjective formatting styles, as this is largely a matter of taste.

What Topiary can’t do that `shfmt` can²

When it comes to formatting Bash in a way that is commonly attested in the wild, there are three things that Topiary cannot currently do. Unfortunately, these are either from the absence of a feature in Topiary, or a lack of fidelity in the Tree-sitter grammar; no amount of hacking on queries will fix them.

The worst offender is probably the inability to distinguish line continuations from other token boundaries. These are used in Bash scripts all the time to break up long commands into more digestible code. In the following example, the call to topiary was spread over multiple lines, with line continuations. Topiary slurps everything onto a single line, whereas shfmt preserves the original line continuations in the input:

# Topiary
topiary format --language bash --query bash.scm <"${script}"

# shfmt
topiary format \
    --language bash \
    --query bash.scm \
    <"${script}"

One saving grace is that Topiary’s Bash parser understands a trailing |, in a pipeline, to accept a line break. As such — while it isn’t my personal favourite style³ — Topiary does support multi-line pipelines. Arguably, they even look a little nicer in Topiary than in shfmt, which only preserves where the line breaks occurred in the input:

# Topiary
foo |
  bar |
  baz |
  quux

# shfmt
foo | bar |
    baz | quux

Otherwise, in Topiary, every command is a one-liner…whether you like it or not!

Next on the “nice to have” list is the long-standing (and controversial) feature request of “alignment blocks”; specifically for comments. That is, presumably related comments appearing on a series of lines should be aligned to the same column:

# Topiary
here # comment
is # comment
a # comment
sequence # comment
of # comment
commands # comment

# shfmt
here     # comment
is       # comment
a        # comment
sequence # comment
of       # comment
commands # comment

The tl;dr of the controversy is that, despite being a popular request — and we all know where popularity gets us, these days — it’s a slap in the face to one of Topiary’s core design principles: minimising diffs. Because we live in a universe where elastic tabstops never really took off, a small change to the above example — say, adding an option to one of the commands — would produce the following noisy diff:

-here     # comment
-is       # comment
-a        # comment
-sequence # comment
-of       # comment
-commands # comment
+here                      # comment
+is                        # comment
+a                         # comment
+sequence                  # comment
+of                        # comment
+commands --with-an-option # comment

For the time being, Topiary won’t be making alignment great again.
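
To make the diff argument concrete, here is a small sketch (in Haskell, and unrelated to how either formatter is implemented) that renders trailing comments both ways and counts how many lines change when one command grows. The helper names `aligned`, `unaligned` and `changedLines` are invented for the example.

```haskell
-- Each row is a command paired with its trailing comment.
type Row = (String, String)

-- Topiary-like rendering: a single space before the comment.
unaligned :: [Row] -> [String]
unaligned rows = [cmd ++ " # " ++ c | (cmd, c) <- rows]

-- Alignment-block rendering: pad every command to the widest one.
aligned :: [Row] -> [String]
aligned rows = [pad cmd ++ " # " ++ c | (cmd, c) <- rows]
  where
    width = maximum (0 : map (length . fst) rows)
    pad s = s ++ replicate (width - length s) ' '

-- Number of line positions whose content differs between two renderings.
changedLines :: [String] -> [String] -> Int
changedLines old new = length (filter id (zipWith (/=) old new))

before, after :: [Row]
before = zip ["here", "is", "a", "sequence", "of", "commands"] (repeat "comment")
after  = zip ["here", "is", "a", "sequence", "of", "commands --with-an-option"] (repeat "comment")
```

With these definitions, `changedLines (unaligned before) (unaligned after)` is 1, while `changedLines (aligned before) (aligned after)` is 6: the one-line edit touches every line of the aligned block.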

Finally, string interpolations — with command substitution and arithmetic expansions — cannot be formatted without potentially breaking the string itself. This is particularly true of heredocs; the full subtleties of which escape the Tree-sitter Bash grammar and so are easily corruptible with naive formatting changes. As such, Topiary has to treat these as immutable leaves and leave them untouched:

# Topiary
echo "2 + 2 = $((  2+  2 ))"

cat <<EOF
Today is $(   date )
EOF

# shfmt
echo "2 + 2 = $((2 + 2))"

cat <<EOF
Today is $(date)
EOF

So far, I have only found three constructions that are syntactically correct but that the Tree-sitter Bash grammar cannot parse (whereas shfmt can):

A herestring that follows a file redirection (issue #282):
```
rev > output <<< hello
```
A workaround, for now, is to switch the order; so the herestring comes first.

A heredoc that uses an empty marker (issue #283):

cat <<''
Only a monster would do this, anyway!

Similar to line continuations, the Tree-sitter Bash grammar seems to swallow escaped spaces at the beginning of tokens, interpreting them as tokenisation whitespace rather than literals (issue #284):
```
# This should output:
# <a>
# <b>
# < >
# <c>
printf "<%s>\n" a b \  c
```

For what it’s worth, shfmt also supports POSIX shell and mksh (a KornShell implementation). As of writing, there are no Tree-sitter grammars for these shells. However, their syntax doesn’t diverge too far from Bash, so it’s likely that Topiary’s Bash support will be sufficient for large swathes of such scripts. Moreover, the halcyon years of the 1990s are a long way behind us, so maybe this doesn’t matter.

What shfmt can’t do that Topiary can²

shfmt is part of a wider project that includes a Bash parser for the Go ecosystem. A purpose-built parser, particularly for Bash, should perform better than the generalised promise of Tree-sitter and, indeed, that’s what we see. However, there are a few minor constructions that shfmt doesn’t like, but the Tree-sitter Bash grammar accepts:

An array index assignment which uses the addition augmented assignment operator:

my_array=( foo [0]+=bar )

To be fair to shfmt, while this is valid Bash, not even the venerable ShellCheck can parse this!

Topiary leaves array indices unformatted, despite them allowing arithmetic expressions. shfmt, however, will add whitespace to any index that looks like an arithmetic expression (e.g., [foo-bar] will become [ foo - bar ]); even if the original, unspaced version could be a valid associative array key.

(Neither Topiary nor shfmt can handle indices containing spaces. However, the standard Bash workaround™ is to quote these: ${array["foo bar"]}.)

Brace expansions can appear — perhaps surprisingly — almost anywhere. Particularly surprising to shfmt is when they appear in variable declarations, which it cannot parse:

declare {a,b,c}=123      # a=123 b=123 c=123
declare foo{1..10}=bar   # foo1=bar foo2=bar ... foo10=bar

While it’s a bit of a hack,⁴ we also implement something akin to “rewrite rules” in our Topiary Bash formatting queries, which shfmt (mostly) doesn’t do. This is to enforce a canonical style over certain constructions. Namely:

All $... variables are rewritten in their unambiguous form of ${...}, excluding special variables such as $1 and $@. (Note that this doesn’t affect $'...' ANSI C strings, despite their superficial similarity.)

All function signatures are rewritten to the name() { ... } form, rather than function name { ... } or function name() { ... }.

All POSIX-style [ ... ] test clauses are rewritten to the Bash [[ ... ]] form.

All legacy $[ ... ] arithmetic expansions are rewritten to their $(( ... )) form.

All `...` command substitutions are rewritten to their $( ... ) form.

(This is one that shfmt does do.)

Technically, it is also possible to write rules that put quotes around unquoted command arguments, ignoring things like -o/--options. While this is good practice, we do not enforce this style as it changes the code’s semantics and there may be legitimate reasons to leave arguments unquoted.

Throughput

Let’s be honest: If you have so much Bash to format that throughput becomes meaningful, then formatting is probably the least of your worries. That being said, it is the one metric that we can actually quantify.

Our first problem is that we need a large corpus of normal scripts. By “normal,” I mean things that you’d see in the wild and could conceivably understand if you squint hard enough. This rules out the Bash test suite, for example, which — while quite large — is a grimoire of weird edge cases that neither Topiary nor shfmt handle well. Quite frankly, if you’re writing Bash that looks like this, then you don’t deserve formatting:

: $(case a in a) : ;#esac ;; esac)

Digging around on r/bash, I came across this repository of scripts. They’re all fairly short, but they’re quite sane. This will do.

We need to slam large amounts of Bash into the immovable objects that are our formatters; a “Bash test dummy,”⁵ if you will. It would be ideal if we could stream Bash into our formatters — so we could orchestrate sampling at regular time intervals — however, neither Topiary nor shfmt support streaming formatting. This stands to reason as there are cases where formatting will depend on some future context, so the whole input will need to be read upfront. As such, we need to invert our approach to collecting metrics and sample over input size instead.

The general method is:

Locate the scripts in the repository that are Bash, by looking at their shebang.

Filter this list to those which Topiary can handle without tripping over itself because of some obscure parsing failure. (We assume shfmt doesn’t require such a concession.)

Perform $N$ trials, in which:

The whitelist of scripts is randomised, to remove any potential confounding from caching.

The top $M$ scripts are concatenated to obtain a single trial input.⁶ This is to increase the input size to the formatters in each trial, which is presumed to be the independent variable, but may be subject to confounding effects when the input is small.

The trial input is read to /dev/null a handful of times to warm up the filesystem cache.

The trial input is fed into the following, with benchmarks — trial input size (bytes) and runtime (nanoseconds) — recorded for each:

cat, which acts as a control;

Topiary (v0.5.1; release build, with the query changes described in this blog post);

Topiary, with its idempotence checking disabled;

shfmt (v3.10.0).

This identified 156 Bash scripts within the test repository; of which, 154 of them could be handled by Topiary.⁷ On an 11th generation Intel Core i7, at normal stepping, with $N=50$ and $M=25$, on a Tuesday afternoon, I obtained the following results:

cat, which does nothing, is unsurprisingly way out in front; by two orders of magnitude. This is not interesting, but establishes that input can be read faster than it can be formatted. That is, our little experiment is not accidentally I/O bound.

What is interesting is that Topiary is about 3× faster than shfmt. We also see that the penalty imposed by idempotency checking — which formats twice, to check the output reaches a fixed point — is quite negligible. This indicates that most of the work Topiary is doing is in its startup overhead, which involves loading the grammar and parsing the formatting query file.
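
The idempotency check described here amounts to formatting the output a second time and comparing. A minimal sketch of that idea, in Haskell and with a toy formatter standing in for the real one (`formatWithCheck`, `FormatError` and `squeeze` are names made up for this example, not Topiary's API):

```haskell
-- Reported when a second formatting pass changes the result of the first.
data FormatError = NotIdempotent String String
  deriving (Eq, Show)

-- Format once, then format the result again; accept the output only if
-- the second pass leaves it unchanged (i.e. it is a fixed point).
formatWithCheck :: (String -> String) -> String -> Either FormatError String
formatWithCheck fmt input
  | once == twice = Right once
  | otherwise     = Left (NotIdempotent once twice)
  where
    once  = fmt input
    twice = fmt once

-- Toy formatter: collapse runs of whitespace into single spaces.
squeeze :: String -> String
squeeze = unwords . words
```

The check costs one extra formatting pass but no extra startup work, which is consistent with the small penalty observed above when startup dominates.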

Since Topiary only has to do this once per trial, it’s a little unfair to set $M=25$; that is, an artificially enlarged input that is syntactically valid but semantically meaningless. However, if we set $M=1$ (i.e., individual scripts), then we see a similar comparison:

For small inputs, the idempotency check penalty is barely perceptible. Otherwise, the startup overhead dominates for both formatters — hence the much lower throughput values — but, still, Topiary comfortably outperforms shfmt by a similar factor.

And the winner is…

In an attempt to regain some professional integrity, I’ll fess up to the fact that Topiary has a bit of a home advantage and maybe — just maybe — I’m ever so slightly biased. That is, as we are in the (dubious) position of building a plane while attempting to fly it, I was able to tweak and fix a few of our formatting rules to improve Topiary’s Bash support during the writing of this blog post:

I added formatting rules for arrays (and associative arrays) and their elements.

I corrected the formatting of trailing comments that appear at the end of a script.

I corrected the function signature rewriting rule.

I corrected the formatting of a string of commands that are interposed by Bash’s & asynchronous operator.

I fixed the formatting of test commands and added a rewrite rule for POSIX-style [ ... ] tests.

I implemented multi-line support for pipelines.⁸

I updated the $... variable rewrite rule to avoid targeting special forms like $0, $? and $@, etc.

I implemented a rewrite rule that converts legacy $[ ... ] arithmetic expansions into their $(( ... )) form.

I implemented a rewrite rule that converts `...` command substitutions into their $(...) form.

I fixed the spacing within variable declarations, to accommodate arguments and expansions.

I forced additional spacing in command substitutions containing subshells, to remove any ambiguity with arithmetic expansions.

The point I’m making here is that these adjustments were very easy to conjure up; just a few minutes of thought for each, across our Tree-sitter queries, was required.

So who’s the winner?

Well, would it be terribly anticlimactic of me, after all that, not to call it? shfmt is certainly more resilient to Bash-weirdness and, of the “big three” I discussed, its line continuation handling is a must have. However, Topiary does pretty well, regardless: It’s much faster, for what that’s worth, and — more to the point — far easier to tweak and hack on.

Indeed, when the Topiary team first embarked upon this path, we weren’t even sure whether it would be possible to format Bash. Now that the Tree-sitter Bash grammar has matured, Topiary — perhaps with future fixes to address some of its shortcomings, uncovered by this blog post — is a contender in the Bash ecosystem.

Thanks to Nicolas Bacquey, Yann Hamdaoui, Tor Hovland, Torsten Schmits and Arnaud Spiwack for their reviews and input on this post, and to Florent Chevrou for his assistance with the side-by-side code styling.

It’s very likely that the syntax highlighting for the more exotic Bash snippets in this blog post will be completely broken.↩

…Yet.↩

My preferred multi-line pipeline style is to have a line continuation and then the | character on the next line, indented:

foo \
  | bar \
  | baz \
  | quux

I personally find this much clearer, but Topiary cannot currently handle those pesky line continuations. For shame!↩

Topiary’s formatting rules include node deletion and delimiter insertion. However, delimiters can be any string, so we can coopt this functionality to create basic rewrite rules.↩

I’m also the “terrible pun guy.”↩

This exposed an unexpected bug, whereby Topiary’s formatting model breaks down when some complexity (or, by proxy, size) limit is reached. This behaviour had not been previously observed and further investigation is required.↩

The two failures were due to the aforementioned herestring and complexity⁶ problems.↩

It may also be possible to implement multi-line && and || lists in a similar way. However, the Tree-sitter grammar parses these into a left-associative nested (list) structure, which is tricky to query.↩

The refactoring of a Haskell codebase

Thu, 06 Feb 2025 00:00:00 GMT

Common engineering scenario: There is a large legacy codebase out there which is known to have a few pervasive problems that everyone wants to get rid of. But nobody understands all the details of the codebase, and few are willing to risk breaking the artifact in a long and costly surgery. This post is an experience report on one such refactoring of Liquid Haskell (LH), a tool to verify Haskell programs.

LH has grown mostly from academic contributions that demonstrate the feasibility of some proof technique or another. Since the focus of a demonstration is not always placed on generality, a new user can find unresolved problems, sometimes blockers that make adoption difficult. Let us look at one such example.

The problem: Name resolution

LH requires the user to write specifications for the various parts of the program she wants to verify. Suppose we have a module with a type to describe the verbosity of a program.

module Verbosity where

data Verbosity = Quiet | Verbose

And suppose that we also have a module where we declare the configuration of the program.

module Config where

import Verbosity

data Config = Config Verbosity

For the sake of brevity, this program can only configure the verbosity. Now let us add some more definitions in the Config module to construct a configuration and to give it a specification.

{-@ measure isVerboseConfig @-}
isVerboseConfig :: Config -> Bool
isVerboseConfig (Config Verbose) = True
isVerboseConfig _ = False

{-@ verboseConfig :: {v:Config | isVerboseConfig v } @-}
verboseConfig :: Config
verboseConfig = Config Verbose

The annotation {-@ measure isVerboseConfig @-} indicates to LH that we want to use isVerboseConfig in specifications, as we do in the specification of verboseConfig:

{-@ verboseConfig :: {v:Config | isVerboseConfig v } @-}

This specification says that we expect verboseConfig to have verbosity Verbose, and LH will verify so, but first it has to find out what names like Verbosity and Verbose refer to. To that end, LH inspects the imports of the module to learn that these names come from the Verbosity module.

Now, when we import the module elsewhere

module Main where

import Config
...

the specification of verboseConfig should be available to verify the new module Main. This time we would hope we wouldn’t need to resolve the names of the imported specs again. Alas, when changing modules LH discards the name resolution of imported modules and needs to resolve names a second time. But module Main is missing the import of module Verbosity, which provides the names that LH needs to resolve.

Easily enough, we can import Verbosity in module Main and declare the problem solved. Unfortunately, this solution means that in large programs we need to import explicitly the transitive dependencies of the modules we want to verify, which is too much to ask of our kind users.

We must consider, then, why LH is discarding the name resolution of imported modules. It turns out that it is a pretty structural reason: all names in Liquid Haskell are represented as strings. While at places there is an effort to make the strings unambiguous, the representation makes resolved and unresolved names hard to distinguish without doing some parsing, and there are just too many opportunities to mistake one for the other. Matters are worsened by the fact that LH does not keep the keys (GHC Names) that allow us to retrieve type information used for verification, and finding these keys from the environments of different modules is not trivial either.

The refactoring process

LH is a tool of 28000 lines of code; changing its representation of names is not an easy refactoring, especially when much of the knowledge on the implementation details still needs to be acquired, as it was in my case. An alternative would be to rewrite LH from scratch, this time getting it right. But to accomplish this I would also need to have complete awareness of all the quirks in the old implementation. We couldn’t argue either that we have a method or a technology that promises a better outcome, so refactoring it had to be.

Ideally, we would replace all strings with a more structured type to represent resolved names, which should weed out the accidental mistakes and omissions. The change was so massive though, that making a single contribution with the whole change was impractical.

It would have been difficult for anyone to review such a large contribution.

If tests in the testsuite didn’t pass, it would be difficult to identify which part of the changes affected the test outcome.

If tests uncovered issues with the design, we would have invested much effort into an implementation built on the wrong assumptions.

It would have been difficult to estimate the overall effort.

For these reasons, it is essential that whatever plan is chosen, the refactoring is carried out in sufficiently small and incremental steps. Fortunately, name resolution admits breaking down the task, as the many language constructs used in specifications can be resolved separately. I started by resolving names of Haskell types used in specifications, and then I could resolve names in measure annotations, and later names in assumptions, and later names of data constructors in specifications for algebraic data types, and a long etcetera.

There were also choices to make about introducing a new representation. For instance, I knew from the start that I wanted to use GHC Names for all names pointing to Haskell entities (type constructors, data constructors, functions). This makes the name representation as precise as it can ever be. But should I arrange data structures to be parametric on the name representation?

data Spec name = ...

The parser then would produce specifications that use strings when the names are unresolved, and later on convert them to a type of specifications that have resolved names.

parse :: String -> Either [Error] (Spec String)
resolveNames :: Spec String -> Spec GHC.Name

This is close to how the GHC compiler manages different representations of names in the various phases of the compilation pipeline.

Making the abstract syntax tree (AST) parametric, and then implementing the traversals and updating function type signatures was going to be some work, and it didn’t look like the parametricity would help me catch a lot of mistakes. The alternative that I adopted was to replace strings with a sum type called LHName that could hold either resolved or unresolved names.

data LHName = LHNResolved ... | LHNUnresolved String

The parser produces LHNUnresolved values, and a generic syb-style traversal takes care of changing those to LHNResolved during name resolution. In intent, all names are resolved after name resolution, though this knowledge is not explicit in the types of the AST. This would be a problem if some odd function after name resolution expected unresolved names, or if name resolution accidentally left names unresolved. But I don’t regard the runtime errors arising in those cases as likely to escape testing. After I modified the AST one string occurrence at a time, the type checker dutifully flagged every use of string names that needed updating.
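
A syb-style pass of this kind can be sketched with `Data.Data` from base alone. The following is a toy reconstruction, not LH's code: `ResolvedName` is a hypothetical stand-in for the elided payload of `LHNResolved`, the `Spec` type and the environment are invented, and `everywhere` and `mkT` are re-implemented here following the definitions in the syb library so that the example has no external dependency.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE RankNTypes #-}

import Data.Data (Data, Typeable, cast, gmapT)

-- Hypothetical stand-in for whatever a resolved name carries.
newtype ResolvedName = ResolvedName String
  deriving (Data, Eq, Show)

data LHName
  = LHNResolved ResolvedName
  | LHNUnresolved String
  deriving (Data, Eq, Show)

-- Toy AST; the real specifications are much larger.
data Spec = Spec { specName :: LHName, specUses :: [LHName] }
  deriving (Data, Eq, Show)

-- Apply a transformation at every node, bottom-up (as in syb).
everywhere :: (forall a. Data a => a -> a) -> (forall a. Data a => a -> a)
everywhere f = go
  where
    go :: forall a. Data a => a -> a
    go = f . gmapT go

-- Lift a transformation on one type to all types; identity elsewhere.
mkT :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT f = case cast f of
  Just g  -> g
  Nothing -> id

-- Resolve a single name if the environment knows it.
resolveWith :: [(String, ResolvedName)] -> LHName -> LHName
resolveWith env (LHNUnresolved s)
  | Just r <- lookup s env = LHNResolved r
resolveWith _ n = n

-- The type of the AST is unchanged, so nothing in the types says
-- whether any LHNUnresolved value is left behind.
resolveNames :: Data a => [(String, ResolvedName)] -> a -> a
resolveNames env = everywhere (mkT (resolveWith env))
```

Note how a name missing from the environment simply stays `LHNUnresolved`, which is exactly the kind of omission that only shows up at runtime.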

Working in a strongly typed language, it feels sinful to defer to runtime the checks that the type system could perform, for instance by parameterizing the AST. The implementation of the GHC compiler is a notable example where the common ASTs are parametric on the variable representation. Parameterizing the specification may still be considered in LH, though it wasn’t an absolute prerequisite to start.

One advantage of LHNames with respect to strings is that LHNames can hold GHC Names. Another advantage is that passing an unresolved LHName where a resolved name is expected produces a runtime error, whereas with strings we got undefined behavior. This is most helpful when serializing specifications to import them later. If any unresolved name is found at that time, an error is produced. Moreover, propagating the switch to LHName through the code helped find the places that were mistakenly producing unresolved names.
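
As a rough illustration of that failure mode, the sketch below contrasts a partial accessor that fails loudly with a total check that could run before serialization. It is an assumption-laden toy: the `String` payload of `LHNResolved` and the functions `resolvedName` and `checkResolved` are invented here and are not LH's definitions.

```haskell
data LHName
  = LHNResolved String     -- hypothetical payload standing in for a GHC Name
  | LHNUnresolved String
  deriving (Eq, Show)

-- Partial accessor: an unresolved name is a bug, so fail loudly rather
-- than silently treating the string as if it were resolved.
resolvedName :: LHName -> String
resolvedName (LHNResolved n)   = n
resolvedName (LHNUnresolved s) = error ("unresolved name: " ++ s)

-- Total variant: report every unresolved name, or return the resolved
-- ones when there are none left.
checkResolved :: [LHName] -> Either [String] [String]
checkResolved names =
  case [s | LHNUnresolved s <- names] of
    []         -> Right [n | LHNResolved n <- names]
    unresolved -> Left unresolved
```

With plain strings neither function could be written, since a resolved and an unresolved name would have the same representation.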

Another interesting choice came when deciding the representation for logic names. These are names that refer to entities in the logic, usually unknown to the Haskell compiler. Logic names can refer to functions or type aliases that can be used in specifications. An example of a type alias is the one for non-negative integers.

{-@ type Nat = {v:Int | v >= 0} @-}

For the refactoring, the major difference between Haskell and logic names was that logic names need to be fed to liquid-fixpoint, the theorem prover that LH uses to discharge proof obligations, and liquid-fixpoint does expect strings as a representation for names. Because liquid-fixpoint is used as a library, data structures in LH and liquid-fixpoint share the name representation, which made using a sum type like LHName harder in this case.

One option would have been to generalize liquid-fixpoint to deal with other representations for names. But this was going to be another project on its own. It seemed more practical to keep the interface to liquid-fixpoint unaffected, so the plan was to parse names as strings, resolve them to LHName, serialize the specs, and then convert the LHName back to strings before interacting with liquid-fixpoint. In this way, I could still reuse the output of name resolution when importing specs.

If I didn’t want to have two versions of the AST with strings and with LHNames, then oh surprise, I had to parameterize specifications with logic names. Ignoring environments and other details I ended up with a schematic interface like

data Spec logicName = ...

parse :: String -> Either [Error] (Spec String)
resolveNames :: Spec String -> Spec LHName
serializeSpec :: Spec LHName -> ByteString
convertToLiquidFixpoint :: Spec LHName -> Spec String

Now the syb-traversals were no longer suitable for implementing name resolution, as the transformation changes the type of the AST, so I had to implement it with a mix of stock Traversable instances and manual traversals. I find it a bit amusing that I chose a parametric representation for the sake of reusing data structures, and that I’m still not using it to get more precise types.

The current state of the refactoring

At the time of writing, the resolution of all Haskell names in LH annotations is persisted and reused when importing specifications. Some of the logic names are handled in the same fashion, but a few cases still need to be handled to complete the refactoring. The state of the refactoring and all of the related contributions can be checked in the corresponding GitHub issue.

There were quite a few side quests derived from the name resolution refactoring. I found it challenging to stay focused on name resolution and not try to fix all the things I discovered broken along the way. These were details like type parameters that could be removed since they were always instantiated to the same type, or fields in record data types that were never read, or functions that were almost dead-code if I could remove just that one use site that should be doing something different. I ended up fixing a bunch of secondary problems when they were easy enough to resolve. But I had to give up more than once on a few issues that turned out to be deeper than anticipated; a humbling exercise, if you will, where I had to admit my goals of the day to be too ambitious for the sake of progressing on the main refactoring.

I’m excited at the prospect of leaving behind the kind of user-facing errors that the old implementation induced. Much of the success rests on having found a way to perform the task incrementally, always keeping the test suite passing. The disarray of name resolution was identified as problematic by both contributors and users, and for much of the specification language it is already a thing of the past.

Writing a formatter has never been so easy: a Topiary tutorial

Thu, 30 Jan 2025 00:00:00 GMT

A bit more than one year ago, Tweag announced our open-source, universal formatting engine Topiary, based on the tree-sitter ecosystem. Since then, Topiary has been serving as the official formatter (under the hood) for the Nickel configuration language. Topiary also supports a bunch of other languages (CSS, TOML, OCaml, Bash) and we are seeing people trying it out to support even more languages such as Catala, Nushell, Nix, and more. While I’ve kind of been part of the project from a distance, I’m first and foremost a happy user of Topiary, which I genuinely find really cool both conceptually and practically. While the technical documentation provides an extensive description of Topiary’s capabilities, it doesn’t include (as of now) a complete step-by-step guide on how to write a new formatter for your own language starting from zero. In this post, I’ll show you precisely how to do that.

Why you should use Topiary

Let’s say that you’ve authored a great payroll management application and created a new niche programming language named Yolo to describe tax logic for different countries (tax calculation is anything but a trivial subject!). Developers these days aren’t satisfied with an obscure command-line interpreter anymore. They expect beautiful colors, they expect auto-completion, they expect automatic and uniform formatting, they expect package management and a package registry to distribute their code!

While some of those features are just too much work for a niche language, formatting does sound like a basic commodity that you could provide. Alas, this is only true on the surface. At a high level, a formatter performs the following steps:

Parse the input to a structured representation

Pretty-print the result while respecting parts of the original layout (comments, some line breaks, etc.)

Sometimes you can reuse the parser and the representation of your language implementation, but it’s not guaranteed, as parsing for formatting, interpretation or for compilation have different requirements. If you’ve ever written a serious pretty-printer, with indentation, single-line versus multi-line layout, line-wrapping and all, you’ll know that it’s also not as simple as it looks. For a serious formatter, you’ll need to search for a variety of patterns and treat them in a specific way.

The worst part about all of this is that many of these tasks are generic (not language specific) and laborious, but we still need to reimplement them for every formatter under the sun. It’s frustrating!

This is where Topiary comes in. Topiary is a generic formatter that leverages tree-sitter, an incremental parsing framework. Chances are your language already has a tree-sitter grammar, or it probably should, if you want basic editor support such as syntax highlighting. Given a tree-sitter grammar definition for a language, Topiary will handle parsing and pretty-printing automatically for you. What’s left to do is to use Topiary’s declarative language to write formatting rules. You can focus on the actual logic of the formatter and delegate the boring stuff to Topiary.

As a teaser, beyond the initial setup, you’ll only need to write rules that look like this somewhere in a file:

; Add indentation to the condition of pattern guards in a match branch
(match_branch
  (pattern_guard
    "if" @append_indent_start
    (term) @append_indent_end
  )
)

And you’ll get a formatter! Neat, isn’t it?

There is one caveat: Topiary doesn’t plan to officially support formatting whitespace-sensitive languages, such as Python or Haskell. Depending on the language, it might or might not be doable, but it is likely to be troublesome.

Writing a formatter for Yolo

A Yolo file defines inputs and outputs for a tax calculation using the eponymous keywords:

input income, status
output net_income, income_tax

The rest of the file defines the output as functions of the inputs and other outputs. They can be either simple arithmetic formulas, or they can be defined by case analysis with basic support for boolean conditions:

income_tax := case {
  status = "exempted" | income < 10000 => 0,
  _ => income * 0.2
}
net_income := income - income_tax

Step 1: the tree-sitter grammar

This tutorial isn’t about writing a tree-sitter grammar, but since it’s a requirement for Topiary and I want this post to be exhaustive, I can’t just leave this part out. I’ll quickly cover how to spin up a tree-sitter grammar for a language and how to understand tree-sitter output.

Setup

You’ll need to install the tree-sitter CLI with a recent version (tested with 0.24). I’ll use Nix to install it, but other installation methods are documented in the tree-sitter documentation.

$ nix profile install nixpkgs#tree-sitter
$ mkdir tree-sitter-yolo
$ cd tree-sitter-yolo
$ tree-sitter init
[.. prompts from tree-sitter to init your repo ..]

tree-sitter init generates a bunch of files, but the one we care about is grammar.js. This is a grammar definition of your language in JavaScript. I won’t go into the details of tree-sitter grammar development but instead just provide a simple definition for our toy language Yolo.

Here is a simple tree-sitter grammar for Yolo. Even if you don’t know JavaScript nor tree-sitter very well, it should be reasonably readable.

Then, we need to ask tree-sitter to generate the parser source files for Yolo and build it:

tree-sitter generate
tree-sitter build

If everything went well, you should have a file yolo.so at the root of your grammar directory.

The grammar

The grammar defines the shape of the tree that tree-sitter will produce and that your formatter will manipulate. You might need to refine the grammar later to support finer formatting rules.

What’s important to understand is how a parse tree is represented. Let’s take the original Yolo example in full and put it in a test.yolo file:

input income, status
output net_income, income_tax
income_tax := case {
  status = "exempted" | income < 10000 => 0,
  _ => income * 0.2
}
net_income := income - income_tax

tree-sitter will parse it to a tree that looks like this¹ (some subtrees have been collapsed for brevity):

Images aren’t really suitable for interaction and automation, though. Fortunately, tree-sitter uses a syntax called S-expressions to represent and manipulate such trees as text. You can ask tree-sitter to print the text representation:

tree-sitter parse test.yolo --no-ranges

The full output is a bit verbose, but very instructive. Let’s take a quick look at it. I’ve added the corresponding source next to each node as a ;-delimited comment for clarity. The nesting structure is given by the parentheses, which introduce a new node starting with a name and followed by the node’s children.

(tax_rule
  (statement
    (input_statement  ; input income, status
      (identifier)  ; income
      (identifier)))  ; status
  (statement
    (output_statement  ; output net_income, income_tax
      (identifier)  ; net_income
      (identifier)))  ; income_tax
  (statement
    (definition_statement  ; income_tax := case { ... }
      (identifier)  ; income_tax
      (expression
        (case  ; case { ... }
          (case_branch  ; status = "exempted" | income < 10000 => 0
            condition: (condition  ; status = "exempted" | income < 10000
              (condition  ; status = "exempted"
                (identifier)  ; status
                (expression
                  (string)))  ; "exempted"
              [..])  ; | income < 10000
            body: (expression
              (number)))  ; 0
          (case_branch  ; _ => income * 0.2
            [..])
  [..]

You can take another look at the image above and try to match each node with a line in the S-expression (beware that I didn’t collapse exactly the same parts in the S-expression and in the image). We can see labels such as condition: and body: which we have introduced in the grammar using the tree-sitter field() helper, to make things easier to read and to use.

Some nodes seem to be missing from the S-expression: where are the operators or keywords such as |, :=, or case? Those are unnamed nodes in the tree-sitter jargon, which are hidden by default in the S-expression representation — but they are there in the tree nonetheless.

Step 2: the Topiary setup

Let’s now install Topiary and extend it with our grammar. Since Topiary 0.5, we don’t need to mess with the source code nor rebuild it anymore to add a custom language. Instead we can configure it.

First, install Topiary version 0.5.1 or higher. I will once again use Nix magic², but the Topiary repository comes with pre-built binaries and other installation methods.

nix profile install github:tweag/topiary

Then, write the following Nickel configuration file in your grammar repository:

# topiary-yolo.ncl
{
  languages = {
    yolo = {
      extensions = ["yolo"],
      grammar.source.path = "/path/to/tree-sitter-yolo/yolo.so",
    }
  }
}

This defines the file extensions for yolo and the path to the compiled grammar³. If one day the grammar is published to a git repository, you can specify a git repository and a revision instead. See Topiary’s documentation for more information.

The last ingredient is the query file, which contains the formatting rules. We’ll start with an empty one:

mkdir -p ~/.config/topiary/queries
touch ~/.config/topiary/queries/yolo.scm

Using TOPIARY_LANGUAGE_DIR to point Topiary to our extra query directory, we can now try to format our program. Topiary formats in-place by default, but for now we use shell redirections to avoid mutating the original file:

$ export TOPIARY_LANGUAGE_DIR=~/.config/topiary/queries
$ topiary format --configuration topiary-yolo.ncl --skip-idempotence --language yolo < test.yolo
inputincome,statusoutputnet_income,income_taxincome_tax:=case{status="exempted"|income<10000=>0,_=>income*0.2}net_income:=income-income_tax

Well, that’s not exactly what we expected, but something happened! Because our formatter is essentially empty, and Topiary considers languages to be whitespace-insensitive by default, all spaces have simply been eaten up (--skip-idempotence disables a sanity check that would have rejected the output).

We can finally start to write the meat of our Yolo formatter to fix this!

Step 3: the queries

Queries are patterns that match subtrees of the input. A query is decorated with captures, which are attributes that are attached to matched nodes (prefixed with the @ sign). When a query matches, the tree is decorated with the corresponding captures. For tree-sitter, captures are generic extra annotations, but Topiary interprets them to format the output as desired.

I encourage you to read the reference documentation on tree-sitter queries at some point. Topiary’s README lists all captures that you can use with Topiary. Comments are introduced with a leading ; in the query file.

In the following, the code snippets are to be appended to the query file ~/.config/topiary/queries/yolo.scm. First, we’ll tell Topiary to ensure some spacing around operators:

; Do not mess with spaces within strings
(string) @leaf

; Do not remove empty lines between statements, for readability and space
(statement) @allow_blank_line_before

; Always surround operators with spaces
[
  "="
  ">"
  "<"
  "&"
  "|"
  "_"
  "=>"
  "+"
  "-"
  "*"
  ":="
] @prepend_space @append_space

Those queries will match the corresponding nodes wherever they appear in the tree. Now, let’s stipulate that each statement must be separated by at least a new line:

; Add a newline between two consecutive statements
(
  (statement) @append_hardline
  .
  (statement)
)

We’ve used a tree-sitter anchor ., which ensures that this pattern matches two consecutive statements with nothing in between (except maybe unnamed nodes), so that we don’t add a new line before the first one or after the last one, but only between each consecutive pair. Topiary won’t add a second new line if the source already has one: existing spacing is mostly forgotten (except when using @allow_blank_line_before or @append/prepend_input_softline) while query-introduced spacing is accumulated and flattened (this includes whitespace and line breaks). For example, if two different queries append a space after a node, the final result will still be that only one space is appended.

If you look back at the output of tree-sitter parse in step 1, the statement nodes have more content (a single child and many grandchildren) than the query suggests. Indeed, tree-sitter queries let you omit irrelevant siblings and children by default.

Let’s format the case branches now. We want to put the initial case { on the same line, then each branch indented and on their own line, and finally the closing } alone on its line.

; Lay out the case skeleton
(case
  "{" @append_hardline @append_indent_start
  "}" @prepend_indent_end
)

; Put case branches on their own lines
(case
  (case_branch) @append_hardline
)

Again, because extra children and siblings can appear in the matched subtree by default, the second query will match each branch of each case expression once, and not only a case expression with a single branch.

It looks like we could merge those two queries since they both control how the case is formatted. However, getting the combined query right is in fact much harder than just concatenating both, if it is possible at all. In general, it’s both simpler and better to split your queries into small and topically coherent atoms, even if they apply to the same top-level node.

Let’s try to format a mangled version of our original Yolo file:

input income, status
output net_income, income_tax
income_tax := case { status="exempted" | income<10000 => 0, _ => income*0.2}
net_income := income - income_tax

$ topiary format --configuration topiary-yolo.ncl --skip-idempotence --language yolo < mangled.yolo
inputincome,status
outputnet_income,income_tax
income_tax := case{
  status = "exempted" | income < 10000 => 0
  , _ => income * 0.2
}
net_income := income - income_tax

Better, but we have some troubleshooting to do.

First, spaces are missing between input or output and the list of identifiers.

Second, we’d like to add a space after the comma and make sure there’s no space before the comma: input income, status. We also want a space between case and the following {.

Finally, the comma following a case branch is wrongly laid out on the next line. We are impacted by the way we wrote our grammar here: the comma is actually grouped with the next branch in the grammar as repeat(seq(",", $.case_branch)). We could either change the grammar or adapt the query. We choose the latter for simplicity.

Here’s the diff of the fix:

--- a/yolo.scm
+++ b/yolo.scm
@@ -19,6 +19,21 @@
   ":="
 ] @prepend_space @append_space
 
+; Add space after `input` and `output` decl
+[
+  "input"
+  "output"
+] @append_space
+
+; Add a space after and remove space before the comma in an identifier list
+(
+  (identifier)
+  .
+  "," @prepend_antispace @append_space
+  .
+  (identifier)
+)
+
 ; Add a newline between two consecutive statements
 (
   (statement) @append_hardline
@@ -28,11 +43,17 @@
 
 ; Lay out the case skeleton
 (case
-  "{" @append_hardline @append_indent_start
+  "{" @prepend_space @append_hardline
+  "}" @prepend_hardline
+)
+
+; Indent the content of case
+(case
+  "{" @append_indent_start
   "}" @prepend_indent_end
 )
 
 ; Put case branches on their own lines
 (case
-  (case_branch) @append_hardline
+  "," @append_hardline
 )

Now, we can try to format the mangled Yolo file again. We finally get rid of --skip-idempotence as we now output valid Yolo, and can format in-place.

$ topiary format --configuration topiary-yolo.ncl mangled.yolo
$ cat mangled.yolo
input income, status
output net_income, income_tax
income_tax := case {
  status = "exempted" | income < 10000 => 0,
  _ => income * 0.2
}
net_income := income - income_tax

And voilà!

Conclusion

In this post, we’ve seen how to set up a formatter for a new language using Topiary from scratch, creating a tree-sitter grammar, configuring Topiary, and writing our formatting rules. I hope that it’s a convincing demonstration that writing a code formatter has never been easier than today thanks to Topiary. Our formatter is simple but honest. In a follow-up post, I’ll cover more advanced features, such as multi-line versus single-line formatting, measuring scopes, comments, and more. Stay tuned!

¹ You can refer to Topiary’s documentation to learn how to generate those graphs.

² Although the Nix way is the easiest, the installation can take some time. Don’t panic if Nix doesn’t show any output for a while. Also note that we don’t install from nixpkgs but directly from the GitHub repository: nixpkgs doesn’t have the latest Topiary version yet.

³ At the time of writing, using grammar.source.path unfortunately doesn’t work on Windows. You can still use the git revision style to point to your local tree-sitter-yolo repo, see Topiary documentation.

Contract Testing: Shifting Left with Confidence for Enhanced Integration

Thu, 23 Jan 2025 00:00:00 GMT

In software development, especially with microservices, ensuring seamless integration between components is crucial for delivering high-quality applications. One approach I really like, to tame this complexity, is contract testing.

Contract testing is a powerful technique that focuses on verifying interactions between software components early and in a controlled environment. In this post, I want to show why I think contract testing can often reduce the amount of integration testing.

Contract Testing

A contract, in this context, is a scenario that describes an interaction between two components. A very simple contract could describe a call to a REST API and its response, but contracts can also describe more complex scenarios.

Contract testing consists in testing both components against the contract and specifications. Crucially, there are two different tests: not only is Service-1 tested against the contract, but Service-2 is tested against the specifications. Contract testing doesn’t involve both components at the same time.

Contrary to unit testing with a mock, contract tests are bi-directional, verifying both requirements and implementation during build time. Also contrary to integration testing, we don’t need to tackle the preparation of dependencies in an integration testing environment. Contract testing can be run in an isolated way whenever there is an update, even locally. For these reasons, I consider contract testing as both easy and useful.

Challenges in Integration Testing

As the number of services grows, so does the number of interactions between them, which integration tests are traditionally responsible for covering, and the challenges of integration testing become more apparent:

Integration tests are late

Integration tests require a lot of context

Integration tests are expensive

Let me elaborate.

Integration tests are late

Integration testing is conducted after several stages in the pipeline, including static checks, code building, unit tests, reviews, and deployment to the test environment. Providing feedback on integration issues after all these steps requires repeating the entire process multiple times. Consequently, running integration tests can result in delayed feedback. While this approach may seem reasonable due to the structured process, it can become a significant bottleneck in the pipeline for large projects.

Integration tests require a lot of context

Integration testing environments encompass the integration of all necessary cloud resources, effectively creating production-like settings. This approach is valuable as it provides comprehensive feedback on the overall system’s performance. However, evaluating the impact of a single commit within such an environment is often slow and inefficient. Maintaining these environments is challenging, and any issues or failures in dependent services can lead to false-positive results.

Another significant challenge is flakiness, which frequently arises from improperly managed test data that might be used by one of the dependent services. Managing this data is complex due to the numerous dependencies involved in creating and manipulating it.

Integration tests are expensive

The challenges of integration testing make it expensive to maintain and run tests. Building a complex production-like testing environment requires all dependent services and databases to be updated and functioning. Running a single check requires going all the way down to the network. Imagine how hard, slow and expensive it is to run integration tests given the following network traffic:

The microservice architectural pattern has led to a notable increase of such complex networks. For this reason, the shift-right strategy of doing integration tests late in the pipeline became less relevant, and microservice pioneers like Netflix or Amazon have been advocating for a shift-left strategy such as contract testing to test their massive networks.

The first thing you observe from the image is that we have a lot of integrations between the microservices. With hundreds of microservices, the number of integrations between them becomes unmanageable. Two services have at most one integration. With three services, there can be up to three, then six, ten, fifteen, twenty-one, and so on. This increases drastically as the number reaches hundreds. For example, 100 services have 4950 potential integrations, and 500 services have 124750. This is based on the $n$-choose-$k$ formula where $n$ is the number of microservices, and $k=2$ (as we’re counting pairs of services that can interact bidirectionally):
$\binom{n}{2} = \frac{n!}{(n-2)!\,2!} = \frac{n(n-1)}{2}$
This calculates the maximum number of integrations with $n$ services. Asymptotically, the number of interactions grows as the square of the number of services. It is not realistic to expect that many integrations in practice, but it gives an idea of how fast the number of interactions grows. On the other hand, one interaction can involve many API calls, each requiring tests.
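As a quick check of these figures, the following Python sketch evaluates the formula for a few values of $n$. The helper name `potential_integrations` is purely illustrative.

```python
from math import comb


def potential_integrations(n: int) -> int:
    """Maximum number of pairwise integrations between n services (n choose 2)."""
    return comb(n, 2)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 6, 7, 100, 500):
        # comb(n, 2) is the same as the closed form n * (n - 1) / 2
        assert potential_integrations(n) == n * (n - 1) // 2
        print(f"{n:>3} services -> {potential_integrations(n):>6} potential integrations")
```

The loop covers the small cases listed above as well as the 100 and 500 service examples.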

Let’s give a real-world example. Say we have 10 microservices and the integrations between the microservices are shown in the table:

Microservice | Integrates With
User Service | Authentication Service, Profile Service, Notification Service
Authentication Service | User Service, Authorization Service, API Gateway
Profile Service | User Service, Notification Service, Database Service
Notification Service | User Service, Profile Service, External Email Service
Authorization Service | User Service, Resource Service, API Gateway
Resource Service | Authorization Service, Logging Service, Payment Service
Billing Service | User Service, Payment Service, Notification Service
Payment Service | Billing Service, User Service, Notification Service
Logging Service | Resource Service, Monitoring Service, Notification Service
Monitoring Service | Logging Service, Notification Service, Dashboard Service

Total Integration Points = $\sum$ (Number of integrations for each microservice)
Total Integration Points = 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 + 3 = 30
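The same count can be reproduced mechanically. The sketch below encodes the table as a Python dictionary (the variable and function names are mine, not part of any tool) and sums the number of integrations listed for each microservice. Because an integration listed on both sides, such as User Service and Authentication Service, is counted twice in that sum, the sketch also collects the distinct pairs.

```python
# The integration table above, encoded as: service -> services it integrates with.
INTEGRATIONS = {
    "User Service": ["Authentication Service", "Profile Service", "Notification Service"],
    "Authentication Service": ["User Service", "Authorization Service", "API Gateway"],
    "Profile Service": ["User Service", "Notification Service", "Database Service"],
    "Notification Service": ["User Service", "Profile Service", "External Email Service"],
    "Authorization Service": ["User Service", "Resource Service", "API Gateway"],
    "Resource Service": ["Authorization Service", "Logging Service", "Payment Service"],
    "Billing Service": ["User Service", "Payment Service", "Notification Service"],
    "Payment Service": ["Billing Service", "User Service", "Notification Service"],
    "Logging Service": ["Resource Service", "Monitoring Service", "Notification Service"],
    "Monitoring Service": ["Logging Service", "Notification Service", "Dashboard Service"],
}


def total_integration_points(table):
    """Sum of the number of integrations listed for each microservice."""
    return sum(len(targets) for targets in table.values())


def distinct_pairs(table):
    """Unordered service pairs, so A -> B and B -> A count once."""
    return {frozenset((source, target)) for source, targets in table.items() for target in targets}


if __name__ == "__main__":
    print("Total integration points:", total_integration_points(INTEGRATIONS))
    print("Distinct service pairs:", len(distinct_pairs(INTEGRATIONS)))
```

The first figure matches the total of 30 computed above, while the second gives a lower bound on the number of service-to-service links that need at least one test.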

Again, each interaction can require many tests.

Netflix, Google, and Amazon are pioneers in microservices testing. Netflix publicly shared their experience, showing how their testing evolved. Netflix and Spotify have also changed the traditional test pyramid, turning it into a test diamond/honeycomb. While unit tests are still important, the focus has shifted to writing more integration tests rather than extensive unit tests. To learn more about Spotify’s test pyramid transformation for microservice testing, read this post.

Consumer-Driven Contract Testing

In the terminology of contract testing, a consumer is a client of the API under test while a provider is a service that exposes the API. The most common architecture for contract-testing is consumer-driven contract testing, where the contract is defined in the consumer component, and shared with the provider. The converse, provider-driven contract testing is mostly useful when you have a public API and want to share contracts with unknown consumers. I’ll be focusing on consumer-driven contract testing.

Imagine the following scenario:

Order Service (Consumer): Responsible for managing orders and inventory.

Inventory Service (Provider): Maintains the inventory levels of products.

The Order Service needs to check the availability of products in the Inventory Service before processing an order. As the consumer, the Order Service defines this expectation in a specification, which produces a contract document. The contract is then stored by a special service, called the broker, which makes the contract available to the provider for its own tests.

We’ll use Pact to ensure that the Inventory Service meets the expectations of the Order Service. Pact is the most popular contract-testing framework and supports many languages. Another popular contract-testing framework is Spring Cloud Contract, which supports JVM-based applications.

Consumer Implementation in Python (Order Service):

# test_order_service.py
import unittest

from pact import Consumer, Provider
import requests


class OrderServicePactTest(unittest.TestCase):
    def setUp(self):
        # create a `pact` object by defining the consumer and the provider
        self.pact = Consumer('OrderService').has_pact_with(Provider('InventoryService'), pact_specification_version="3.0.0")
        self.pact.start_service()
        self.addCleanup(self.pact.stop_service)
        self.base_url = 'http://localhost:1234'

    def test_order_service(self):
        # simple order object that should be the response
        expected = {
            'product_id': '123',
            'available': True
        }

        # setting the specification
        (self.pact
         .given('Product 123 exists')  # set a precondition
         .upon_receiving('a request to check product availability')  # this is the name of the interaction aka test case, `description` of the interaction in the contract
         .with_request('get', '/inventory/123')  # request detail
         .will_respond_with(200, body=expected))  # response detail

        # running the specification
        with self.pact:
            result = requests.get(f'{self.base_url}/inventory/123')

        self.assertEqual(result.json(), expected)


if __name__ == '__main__':
    unittest.main()

Let’s run the test on the consumer side (Order Service):

python -m unittest test_order_service.py

Upon running the command above, the specification: “a request to check product availability”, in the Pact consumer test, is turned into an interaction in a document. This document is the contract between OrderService and InventoryService which is generated by Pact in a JSON file (e.g. orderservice-inventoryservice.json):

{
  "provider": {
    "name": "InventoryService"
  },
  "consumer": {
    "name": "OrderService"
  },
  "interactions": [
    {
      "description": "a request to check product availability",
      "request": {
        "method": "GET",
        "path": "/inventory/123",
        "headers": {}
      },
      "response": {
        "status": 200,
        "headers": {},
        "body": {
          "available": true
        }
      }
    }
  ],
  "metadata": {
    "pactSpecification": {
      "version": "3.0.0"
    }
  }
}

Notice how this test can easily be run locally on your machine. It can just as easily run in CI, even if the inventory service is in another repository, since no actual inventory service is required to run the test. It doesn’t require a complex setup or configuration, and runs on the local network loop, which is very fast.
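To see why no running Inventory Service is needed, here is a deliberately simplified sketch of the idea behind a mock provider: the consumer code talks to a stub that answers straight from the declared interaction and keeps track of which declared requests were actually sent. This is not how Pact is implemented, and `StubProvider` and `check_availability` are hypothetical names used only for illustration.

```python
# The interaction declared by the consumer, mirroring the contract shown above.
INTERACTION = {
    "description": "a request to check product availability",
    "request": {"method": "GET", "path": "/inventory/123"},
    "response": {"status": 200, "body": {"available": True}},
}


class StubProvider:
    """In-memory stand-in for the provider that replies from declared interactions."""

    def __init__(self, interactions):
        self._interactions = interactions
        self.received = []  # (method, path) of every request seen so far

    def request(self, method, path):
        self.received.append((method, path))
        for interaction in self._interactions:
            expected = interaction["request"]
            if (expected["method"], expected["path"]) == (method, path):
                response = interaction["response"]
                return response["status"], response["body"]
        # Anything the consumer did not declare is treated as an error
        return 500, {"error": f"unexpected request: {method} {path}"}

    def missing_requests(self):
        """Declared requests that the consumer never sent."""
        return [
            interaction["request"]
            for interaction in self._interactions
            if (interaction["request"]["method"], interaction["request"]["path"])
            not in self.received
        ]


def check_availability(provider, product_id):
    """Hypothetical consumer code: asks the provider whether a product is available."""
    status, body = provider.request("GET", f"/inventory/{product_id}")
    return status == 200 and body.get("available", False)


if __name__ == "__main__":
    stub = StubProvider([INTERACTION])
    assert check_availability(stub, "123")
    assert stub.missing_requests() == []
    print("consumer expectations exercised against the stub")
```

Everything happens in-process here, which is the property that makes the consumer-side test cheap to run locally and in CI.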

When the contract is ready, we can publish it to the Pact broker which is a service for holding all the contracts.

export PACT_BROKER_BASE_URL=<pact-broker-url>
export PACT_BROKER_USERNAME=<username>
export PACT_BROKER_PASSWORD=<password>

pact-broker publish ./pacts/orderservice-inventoryservice.json \
  --consumer-app-version consumer-version \
  --broker-base-url $PACT_BROKER_BASE_URL \
  --broker-username $PACT_BROKER_USERNAME \
  --broker-password $PACT_BROKER_PASSWORD \
  --tag dev

You can either run the Pact broker locally or use the cloud services provided by Pact. Either way, you’ll have to set up a few environment variables for the Pact CLI to connect to the service. To run the Pact broker locally, you should select a database adapter such as sqlite or postgres and then run the Docker command.

docker run -d --name pact-broker -p 9292:9292 \
  -e PACT_BROKER_DATABASE_ADAPTER=sqlite \
  -e PACT_BROKER_DATABASE_NAME=/var/pact_broker/db.sqlite3 \
  -v $(pwd)/pact_broker:/var/pact_broker \
  pactfoundation/pact-broker

Provider Implementation in Python (Inventory Service):

# test_inventory_service.py
import unittest

from pact import Verifier


class InventoryServicePactTest(unittest.TestCase):
    def test_inventory_service(self):
        # The `provider` name selects which contracts to verify: every contract
        # whose provider is set to `InventoryService` is pulled and checked.
        verifier = Verifier(provider='InventoryService', provider_base_url='http://localhost:8000')

        pact_broker_url = ''
        broker_username = ''
        broker_password = ''

        # `verify_with_broker` connects to the Pact broker, pulls all the
        # related contracts, and runs the verification.
        verifier.verify_with_broker(
            broker_url=pact_broker_url,
            broker_username=broker_username,
            broker_password=broker_password,
            publish_version='1.0.0',
            provider_tags=['master']
        )


if __name__ == '__main__':
    unittest.main()

First, we should run the provider service (inventory service):

python inventory_service.py

Then, let’s run the provider verification test for the inventory service:

python -m unittest test_inventory_service.py

Here again, the test was easy to run, and doesn’t require any knowledge of an actual implementation of the consumer.

What happens if there’s a mistake in the provider’s implementation, and it doesn’t actually satisfy the contract? In this case, Pact would respond with an error looking something like this.

>       assert resp.status_code == 200, resp.text
E       AssertionError: Actual interactions do not match expected interactions for mock MockService.
E
E       Missing requests:
E         GET /inventory/123
E
E       See pact-mock-service.log for details.

venv/lib/python3.10/site-packages/pact/pact.py:209: AssertionError

After the error is fixed, you can check the status, for instance, in Pact’s UI (pactflow):

Conclusion

Contract testing addresses the challenges inherent in integration testing. By shifting integration testing to an earlier stage in the development process, it eliminates the need to maintain complex integration testing environments, along with the data preparation and deployment they involve. Additionally, contract tests can be executed in isolation whenever a service changes, so there is no need to integrate all services into a single running environment.

Contract testing doesn’t eliminate the need for integration tests altogether: end-to-end scenarios, for example, still have to be covered by system tests. But many integration tests, such as those covering interactions between microservices, can be replaced by contract tests. As a consequence, we can have far fewer tests that depend on a complex, slow, unreliable network environment, which makes the whole process much faster.

The Developer Experience Upgrade: From Create React App to Vite

Thu, 19 Dec 2024 00:00:00 GMT

We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity. Enter Vite: a modern build tool that’s not just an alternative to CRA, but a glimpse into the future of web development tooling. I’ll introduce both CRA and Vite, then share how switching to Vite transformed our development workflow, with concrete numbers and benchmarks that demonstrate the improvements in build times, startup speed, and overall developer experience.

Create React App: A Historical Context

Create React App played a very important role in making React what it is today. By introducing a single, clear, and recommended approach for creating React projects, it enabled developers to focus on building applications without worrying about the complexity of the underlying build tools.

However, like many mature and widely established tools, CRA has become stagnant over time by not keeping up with features provided by modern (meta-)frameworks like server-side rendering, routing, and data fetching. It also hasn’t taken advantage of web APIs to deliver fast applications by default.

Let’s dive into some of the most noticeable limitations.

Performance Issues

CRA’s performance issues stem from one major architectural factor: its reliance on Webpack as its bundler. Webpack, while powerful and flexible, has inherent performance limitations. Webpack processes everything through JavaScript, which is single-threaded by nature and slower at CPU-intensive tasks compared to lower-level languages like Go or Rust.

Here’s a simplified version of what happens every time you make a code change:

CRA (using Webpack) needs to scan your entire project to understand how all your files are connected to build a dependency graph

It then needs to transform all your modern JavaScript, TypeScript, or JSX code into a version that browsers can understand

Finally, it bundles everything together into a single package that can be served to your browser

Rebuilding the app becomes increasingly time-consuming as the project grows. During development, Webpack’s incremental builds help mitigate performance challenges by only reprocessing modules that have changed, leveraging the dependency graph to minimize unnecessary work. However, the bundling step still needs to consider all files, both cached and reprocessed, to generate a complete bundle that can be served to the browser, which means Webpack must account for the entire codebase’s structure with each build.
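The three steps and the incremental rebuild described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Webpack's algorithm: the project in `IMPORTS`, the "transformation" (a string stamp), and the functions `build_graph` and `rebuild` are all hypothetical. It only illustrates why the transform step can be skipped for unchanged modules while the bundling step still touches every module.

```python
# Hypothetical project: each module maps to the modules it imports.
IMPORTS = {
    "index.js": ["App.js", "theme.js"],
    "App.js": ["Header.js", "api.js"],
    "Header.js": ["theme.js"],
    "api.js": [],
    "theme.js": [],
}


def build_graph(entry, imports):
    """Walk the imports from the entry point and collect every reachable module."""
    seen, stack = [], [entry]
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.append(module)
            stack.extend(imports[module])
    return seen


def rebuild(changed, graph, cache):
    """Re-transform only changed (or never seen) modules, then bundle everything."""
    transformed = []
    for module in graph:
        if module in changed or module not in cache:
            cache[module] = f"/* transformed {module} */"
            transformed.append(module)
    # The bundle is assembled from all modules, cached or not.
    bundle = "\n".join(cache[module] for module in graph)
    return transformed, bundle


graph = build_graph("index.js", IMPORTS)
cache = {}
first, _ = rebuild(set(), graph, cache)             # cold build: every module is transformed
second, bundle = rebuild({"api.js"}, graph, cache)  # incremental build after editing api.js
print(len(first), second, len(bundle.splitlines()))
```

In this model the second build transforms a single module, yet the bundle it emits still has one entry per module in the graph.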

Security Issues

When running npx create-react-app, after waiting for a while, a good number of deprecation warnings (23 packages as of this writing) will be shown. At the end of the installation process, a message indicating 8 vulnerabilities (2 moderate, 6 high) will appear. This means that create-react-app relies on packages that have known security vulnerabilities.

Support Issues

The React team no longer recommends CRA for new projects, and they have stopped providing support for it. The last published version on npm was 3 years ago.

Instead, React’s official documentation now includes Vite in its recommendations for both starting new projects and adding React to existing projects.

While CRA served its purpose well in the past, its aging architecture, security vulnerabilities, and lack of modern features make it increasingly difficult to justify for new projects.

Introducing Vite

Vite is a build tool that is designed to be simpler, faster and more efficient for building modern web applications. It’s opinionated and comes with sensible defaults out of the box.

Vite was created by Evan You, author of Vue, in 2020 to solve the complexity, slowness and the heaviness of the JavaScript module bundling toolchain. Since then, Vite has become one of the most popular build tools for web development, with over 15 million downloads per week and a community that has rated it as the Most Loved Library Overall, No.1 Most Adopted (+30%) and No.2 Highest Retention (98%) in the State of JS 2024 Developer Survey.

In addition to streamlining the development of single-page applications, Vite can also power meta frameworks and has support for server-side rendering (SSR). Although its scope is broader than what CRA was meant for, it does a fantastic job replacing CRA.

Why Vite is Faster

Vite applies several modern web technologies to improve the development experience:

1. Native ES Modules (ESM)

During development, Vite serves source code over native ES modules, letting the browser handle module loading directly and skipping the bundling step. With this approach, Vite only processes and sends code as it is imported by the browser, and conditionally imported modules are processed only if they’re actually needed on the current page. This means the dev server can start much faster, even in large projects.
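As a rough illustration of this on-demand behaviour, the following Python sketch models a dev server that transforms a module only when the browser asks for it. It is a toy model, not Vite's code: the module table, the `OnDemandServer` class, and `load_page` are hypothetical, and "transforming" is reduced to recording the module name.

```python
# Hypothetical source tree: module -> (static imports, dynamic imports).
MODULES = {
    "main.js": (["App.js"], []),
    "App.js": (["Home.js"], ["Settings.js"]),  # Settings.js is only imported lazily
    "Home.js": ([], []),
    "Settings.js": (["Chart.js"], []),
    "Chart.js": ([], []),
}


class OnDemandServer:
    """Toy dev server: transforms a module only when the browser requests it."""

    def __init__(self, modules):
        self.modules = modules
        self.transformed = {}

    def request(self, name):
        if name not in self.transformed:
            self.transformed[name] = f"/* transformed {name} */"
        # Return the static imports, which the browser fetches next.
        return self.modules[name][0]


def load_page(server, entry):
    """Simulate the browser following static imports from an entry module."""
    loaded, pending = set(), [entry]
    while pending:
        name = pending.pop()
        if name not in loaded:
            loaded.add(name)
            pending.extend(server.request(name))


server = OnDemandServer(MODULES)
load_page(server, "main.js")
print(sorted(server.transformed))  # Settings.js and Chart.js have not been touched

load_page(server, "Settings.js")   # the user opens the settings page
print(sorted(server.transformed))  # now every module has been transformed
```

The server does no work at startup in this model. The cost of each module is paid the first time a page actually needs it.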

2. Efficient Hot Module Replacement (HMR)

By serving source code as native ESM to the browser, thus skipping the bundling step, Vite’s HMR process can provide near-instant updates while preserving the application state. When code changes, Vite updates only the modified module and its direct dependencies, ensuring fast updates regardless of project size. Additionally, Vite leverages HTTP headers and caching to minimize server requests, speeding up page reloads when necessary. More information about what HMR is and how it works in Vite can be found in this exhaustive blog post.

3. Optimized Build Tooling

Even though ES modules are now widely supported, dependencies can still be shipped as CommonJS or UMD. To leverage the benefits of ESM during development, Vite uses esbuild to pre-bundle dependencies when starting the dev server. This step involves transforming CommonJS/UMD to ES modules and converting dependencies with many internal modules into a single module, thus improving performance and reducing browser requests.

When it comes to production, Vite switches to Rollup to bundle the application. Bundling is still preferred over ESM when shipping to production, as it allows for more optimizations like tree-shaking, lazy-loading and chunk splitting.

While this dual-bundler approach leverages the strengths of each bundler, it’s important to note that it’s a trade-off that can potentially introduce subtle inconsistencies between development and production environments and adds to Vite’s complexity.

By leveraging modern web technologies like ESM and efficient build tools like esbuild and Rollup, Vite represents a significant leap forward in development tooling, offering speed and simplicity that CRA simply cannot match with the way it’s currently architected.

Practical Results

The Migration Process

The codebase we migrated from CRA to Vite had around 250 files and 30k lines of code. Built as a Single Page Application using React 18, it uses Zustand and React Context for state management, with Tailwind CSS and shadcn/ui and some Bootstrap legacy components.

Here is a high-level summary of the migration process as it applied to our project, which took roughly a day to complete. The main steps included:

Removing CRA-related dependencies

Installing Vite and its React plugin

Moving index.html to the root directory

Creating a Vite configuration file

Adding a type declaration file

Updating the npm scripts in package.json

Adjusting tsconfig.json to align with Vite’s requirements

All steps are well documented in the Vite documentation and in several step-by-step guides available on the web.

Most challenges we encountered were related to environment variables and path aliases, and were easily resolved using Vite’s documentation. Vite’s vibrant community has also produced extensive resources, guides, and solutions for even the most specialized setups.

Build Time

The build time for the project using Create React App (CRA) was 1 minute and 34 seconds. After migrating to Vite, the build time was reduced to 29.2 seconds, making it 3.2 times faster.

This reduction in build time speeds up CI/CD cycles, enabling more frequent testing and deployment. This is crucial for our development workflow, where faster builds mean quicker turnaround times and fewer delays for other team members. It can also reduce the cost of running the build process.

Dev Server Startup Time

The speed at which the development server starts can greatly impact the development workflow, especially in large projects.

The development server startup times saw a remarkable improvement after migrating from Create React App (CRA) to Vite. With CRA, a cold start took 15.469 seconds, and a non-cold start was 6.241 seconds. Vite dramatically reduced these times, with a cold start at just 1.202 seconds—12.9 times faster—and a non-cold start at 598 milliseconds, 10.4 times faster. The graph below highlights these impressive gains.
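The speedup factors quoted for the build and for the dev server startup are simply the ratio of the time before to the time after. The small Python check below recomputes them from the measurements given in this post; the helper name `speedup` is ours.

```python
def speedup(before_s, after_s):
    """How many times faster the new measurement is, rounded to one decimal."""
    return round(before_s / after_s, 1)


# Measurements quoted in this post, in seconds.
build = speedup(1 * 60 + 34, 29.2)   # production build
cold_start = speedup(15.469, 1.202)  # dev server, cold start
warm_start = speedup(6.241, 0.598)   # dev server, non-cold start

print(build, cold_start, warm_start)  # 3.2 12.9 10.4
```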

This dramatic reduction in startup time is particularly valuable when working with multiple branches or when frequent server restarts are needed during development.

HMR Update Time

While both CRA and Vite perform well with Hot Module Replacement at our current project scale, there are notable differences in the developer experience. CRA’s Webpack-based HMR typically takes around 1 second to update—which might sound fast, but the difference becomes apparent when compared to Vite’s near-instantaneous updates.

This distinction becomes more pronounced as projects grow in size and complexity. More importantly, the immediate feedback from Vite’s HMR creates a noticeably smoother development experience, especially when designing features that require frequent code changes and UI testing cycles. The absence of even a small delay helps maintain a more fluid and enjoyable workflow.

Bundle Size

Another essential factor is the size of the final bundled application, which affects load times and overall performance.

This represents a 27.5% reduction in raw bundle size and a 9.3% reduction in gzipped size. For end users, this means faster page loads, less data usage, and better performance, especially on mobile devices.

The data clearly illustrates that Vite’s improvements in build times, startup speed, and bundle size provide a significant and measurable upgrade to our development workflow.

The Hidden Advantage: Reduced Context Switching

One of the less obvious but valuable benefits of migrating to a faster environment like Vite is the reduction in context switching. In environments with slower build and start-up times, developers are more likely to engage in other tasks during these “idle” moments. Research on task interruptions shows that even brief context switches can introduce cognitive “reorientation” costs, increasing stress and reducing efficiency.

By reducing build and start-up times, Vite allows our team to maintain focus on their primary tasks. Developers are less likely to switch tasks and better able to stay within the “flow” of development, ultimately leading to a smoother, more focused workflow and, over time, less cognitive strain.

Beyond the measurable metrics, the real victory lies in how Vite’s speed helps developers maintain their focus and flow, leading to a more enjoyable and happy experience overall.

The Future of Vite is Bright

Vite is aiming to be a unified toolchain for the JavaScript ecosystem, and it is already showing great progress by introducing new tools like Rolldown and OXC.

Rolldown, Vite’s new bundler written in Rust, promises to be even faster than esbuild while maintaining full compatibility with the JavaScript ecosystem. It also unifies Vite’s bundling approach across development and production environments, solving the previously mentioned trade-off. Meanwhile, OXC provides a suite of high-performance tools including the fastest JavaScript parser, resolver, and TypeScript transformer available.

These innovations are part of Vite’s broader vision to create a more unified, efficient, and performant development experience that eliminates the traditional fragmentation in JavaScript tooling.

Early benchmarks show impressive performance improvements:

OXC Parser is 3x faster than SWC

OXC Resolver is 28x faster than enhanced-resolve

OXC TypeScript transformer is 4x faster than SWC

OXLint is 50-100x faster than ESLint

With innovations like Rolldown and OXC on the horizon, Vite is not just solving today’s development challenges but is actively shaping the future of web development tooling.

Conclusion

Migrating from Create React App to Vite proved to be a straightforward process that delivered substantial benefits across multiple dimensions. The quantifiable improvements in terms of build time, bundle size and development server startup time were impressive and by themselves justify the migration effort.

However, the true value extends beyond these measurable metrics. The near-instant Hot Module Replacement, reduced context switching, and overall smoother development workflow have significantly enhanced our team’s development experience. Developers spend less time waiting and more time in their creative flow, leading to better focus and increased productivity.

The migration also positions our project for the future, as Vite continues to evolve with promising innovations like Rolldown and OXC. Given the impressive results and the relatively straightforward migration process, the switch from CRA to Vite stands as a clear win for both our development team and our application’s performance.

GHC's wasm backend now supports Template Haskell and ghci

Thu, 21 Nov 2024 00:00:00 GMT

Two years ago I wrote a blog post to announce that the GHC wasm backend had been merged upstream. I’ve been too lazy to write another blog post about the project since then, but rest assured, the project hasn’t stagnated. A lot of improvements have happened after the initial merge, including but not limited to:

Many, many bugfixes in the code generator and runtime, witnessed by the full GHC testsuite for the wasm backend in upstream GHC CI pipelines. The GHC wasm backend is much more robust these days compared to the GHC-9.6 era.

The GHC wasm backend can be built and tested on macOS and aarch64-linux hosts as well.

Earlier this year, I landed the JSFFI feature for wasm. This lets you call JavaScript from Haskell and vice versa, with seamless integration of JavaScript async computation and Haskell’s green threading concurrency model. This allows us to support Haskell frontend frameworks like reflex & miso, and we have an example repo to demonstrate that.

And…the GHC wasm backend finally supports Template Haskell and ghci!

Show me the code!

$ nix shell 'gitlab:haskell-wasm/ghc-wasm-meta?host=gitlab.haskell.org'
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Or if you prefer the non-Nix workflow:

$ curl https://gitlab.haskell.org/haskell-wasm/ghc-wasm-meta/-/raw/master/bootstrap.sh | sh
...
Everything set up in /home/terrorjack/.ghc-wasm.
Run 'source /home/terrorjack/.ghc-wasm/env' to add tools to your PATH.
$ . ~/.ghc-wasm/env
$ wasm32-wasi-ghc --interactive
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
ghci>

Both the Nix and non-Nix installation methods default to GHC HEAD, for which binary artifacts for Linux and macOS hosts, for both x86_64 and aarch64, are provided. The Linux binaries are statically linked so they should work across a wide range of Linux distros.

If you take a look at htop, you’ll notice wasm32-wasi-ghc spawns a node child process. That’s the “external interpreter” process that runs our Template Haskell (TH) splice code as well as ghci bytecode. We’ll get to what this “external interpreter” is about later, just keep in mind that whatever code is typed into this ghci session is executed on the wasm side, not on the native side.

Now let’s run some code. It’s been six years since I published the first blog post when I joined Tweag and worked on a prototype compiler codenamed “Asterius”; the first Haskell program I managed to compile to wasm was fib, time to do that again:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib 0 = 0
ghci| fib 1 = 1
ghci| fib n = fib (n - 2) + fib (n - 1)
ghci| :}
ghci> fib 10
55

It works, though with $O(2^n)$ time complexity. It’s easy to do an $O(n)$ version, using the canonical Haskell fib implementation based on a lazy infinite list:

ghci> :{
ghci| fib :: Int -> Int
ghci| fib = (fibs !!)
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> fib 32
2178309

That’s still boring isn’t it? Now buckle up, we’re gonna do an $O(1)$ implementation… using Template Haskell!

ghci> import Language.Haskell.TH
ghci> :{
ghci| genFib :: Int -> Q Exp
ghci| genFib n =
ghci|   pure $
ghci|     LamCaseE
ghci|       [ Match (LitP $ IntegerL $ fromIntegral i) (NormalB $ LitE $ IntegerL r) []
ghci|       | (i, r) <- zip [0 .. n] fibs
ghci|       ]
ghci|   where
ghci|     fibs = 0 : 1 : zipWith (+) fibs (drop 1 fibs)
ghci| :}
ghci> :set -XTemplateHaskell
ghci> :{
ghci| fib :: Int -> Int
ghci| fib = $(genFib 32)
ghci| :}
ghci> fib 32
2178309

Joking aside, the real point is not about how to implement fib, but rather to demonstrate that the GHC wasm backend indeed supports Template Haskell and ghci now.
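For readers who don’t read Template Haskell fluently, the trick in the last example is that the splice computes the first 33 Fibonacci numbers ahead of time and generates one literal case per input. A rough Python analogy, not a translation, is to precompute a lookup table once and answer every later call from it. The names `fibs` and `gen_fib` mirror the Haskell ones, but nothing here runs at compile time.

```python
from itertools import islice


def fibs():
    """Infinite Fibonacci stream, the generator analogue of the lazy list `fibs`."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def gen_fib(n):
    """Precompute fib(0..n) once, loosely mirroring what the splice generates."""
    table = dict(enumerate(islice(fibs(), n + 1)))
    return table.__getitem__  # each call is now a single table lookup


fib = gen_fib(32)
print(fib(10), fib(32))  # 55 2178309
```

Like the generated case expression, the table only covers inputs from 0 to 32; anything else raises an error (`KeyError` here).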

Here’s a quick summary of wasm’s TH/ghci support status:

The patch has landed in the GHC master branch and will be present in upstream release branches starting from ghc-9.12. I also maintain non-official backport branches in my fork, and wasm TH/ghci has been backported to 9.10 as well. The GHC release branch bindists packaged by ghc-wasm-meta are built from my branches.

TH splices that involve only pure computation (e.g. generating class instances) work. Simple file I/O also works, so file-embed works. Side effects are limited to those supported by WASI, so packages like gitrev won’t work because you can’t spawn subprocesses in WASI. The same restrictions apply to ghci.

Our wasm dynamic linker can load bytecode and compiled code, but the only form of compiled code it can load are wasm shared libraries. If you’re using wasm32-wasi-ghc directly to compile code that involves TH, make sure to pass -dynamic-too to ensure the dynamic flavour of object code is also generated. If you’re using wasm32-wasi-cabal, make sure shared: True is present in the global config file ~/.ghc-wasm/.cabal/config.

The wasm TH/ghci feature requires at least cabal-3.14 to work (the wasm32-wasi-cabal shipped in ghc-wasm-meta is based on the correct version).

Our novel JSFFI feature also works in ghci! You can type foreign import javascript declarations directly into a ghci session, use that to import sync/async JavaScript functions, and even export Haskell functions as JavaScript ones.

If you have c-sources/cxx-sources in a cabal package, those can be linked and run in TH/ghci out of the box. However, more complex forms of C/C++ foreign library dependencies like pkgconfig-depends, extra-libraries, etc. will require special care to build both static and dynamic flavours of those libraries.

For ghci, hot reloading and basic REPL functionality works, but the ghci debugger doesn’t work yet.

What happens under the hood?

For the curious mind, -opti-v can be passed to wasm32-wasi-ghc. This tells GHC to pass -v to the external interpreter, so the external interpreter will print all messages passed between it and the host GHC process:

$ wasm32-wasi-ghc --interactive -opti-v
GHCi, version 9.13.20241102: https://www.haskell.org/ghc/  :? for help
GHC iserv starting (in: {handle: }; out: {handle: })
[ dyld.so] reading pipe...
[ dyld.so] discardCtrlC
...
[ dyld.so] msg: AddLibrarySearchPath ...
...
[ dyld.so] msg: LoadDLL ...
...
[ dyld.so] msg: LookupSymbol "ghczminternal_GHCziInternalziBase_thenIO_closure"
[ dyld.so] writing pipe: Just (RemotePtr 2950784)
...
[ dyld.so] msg: CreateBCOs ...
[ dyld.so] writing pipe: [RemoteRef (RemotePtr 33)]
...
[ dyld.so] msg: EvalStmt (EvalOpts {useSandboxThread = True, singleStep = False, breakOnException = False, breakOnError = False}) (EvalApp (EvalThis (RemoteRef (RemotePtr 34))) (EvalThis (RemoteRef (RemotePtr 33)))) 4
[ dyld.so] writing pipe: EvalComplete 15248 (EvalSuccess [RemoteRef (RemotePtr 36)])
...

Why is any message passing involved in the first place? There’s a past blog post which contains an overview of cross compilation issues in Template Haskell. Most of its points still hold today, and apply to both TH and ghci. To summarise:

When GHC cross compiles and evaluates a TH splice, it has to load and run code that’s compiled for the target platform. Compiling both host and target code, and running host code for TH, has never been officially supported by GHC/Cabal.

The “external interpreter” runs on the target platform and handles target code. Messages are passed between the host GHC and the external interpreter, so GHC can tell the external interpreter to load stuff, and the external interpreter can send queries back to GHC when running TH splices.

In the case of wasm, the core challenge is dynamic linking: to be able to interleave code loading and execution at run-time, all while sharing the same program state. Back when I worked on Asterius, it could only link a self-contained wasm module that wasn’t able to share any code/data with other Asterius-linked wasm modules at run-time.

So I went with a hack: when compiling each single TH splice, just link a temporary wasm module and run it, get the serialized result and throw it away! That completely bypasses the need to make a wasm dynamic linker. Needless to say, it’s horribly slow and doesn’t support cross-splice state or ghci. Though it is indeed sufficient to support compiling many packages that use TH.

Now it’s 2024, time to do it the right way: implement our own wasm dynamic linker! Some other toolchains like emscripten also support dynamic linking of wasm, but there’s really no code to borrow here: each wasm dynamic linker is tailored to that toolchain’s specific needs, and we have JSFFI-related custom sections in our wasm code that can’t be handled by other linkers anyway.

Our wasm dynamic linker supports loading exactly one kind of wasm module: wasm shared libraries. This is something that you get by compiling C with wasm32-wasi-clang -shared, which enables generation of position-independent code. Such machine code can be placed anywhere in the address space, making it suitable for run-time code loading. A wasm shared library is yet another wasm module; it imports the linear memory and function table, and you can specify any base address for memory data and functions.
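To illustrate what “position-independent” buys us, here is a deliberately simplified Python model of a loader that picks a memory base and a table base for each library and resolves symbols relative to them. It is a sketch under stated assumptions, not the actual linker: `SharedLibrary`, `ToyLinker`, the sizes, the starting addresses, and the symbol names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SharedLibrary:
    """Hypothetical, heavily simplified view of a wasm shared library."""
    name: str
    memory_size: int    # bytes of linear memory its data needs
    table_size: int     # function table slots it needs
    data_symbols: dict  # symbol -> offset relative to the memory base


class ToyLinker:
    def __init__(self):
        self.memory_top = 1024  # arbitrary start; lower addresses are treated as reserved
        self.table_top = 1
        self.symbols = {}

    def load(self, lib):
        """Pick base addresses for the library and resolve its data symbols."""
        memory_base, table_base = self.memory_top, self.table_top
        self.memory_top += lib.memory_size
        self.table_top += lib.table_size
        for symbol, offset in lib.data_symbols.items():
            # Position-independent: absolute address = chosen base + offset.
            self.symbols[symbol] = memory_base + offset
        return memory_base, table_base


linker = ToyLinker()
libc = SharedLibrary("libc.so", memory_size=4096, table_size=100, data_symbols={"errno": 16})
libfoo = SharedLibrary("libfoo.so", memory_size=256, table_size=4, data_symbols={"counter": 0})

print(linker.load(libc))    # (1024, 1)
print(linker.load(libfoo))  # (5120, 101)
print(linker.symbols)       # {'errno': 1040, 'counter': 5120}
```

Because every address inside a library is expressed relative to a base chosen at load time, the same library can be placed wherever free space is available, and libraries loaded later never need to know the layout in advance.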

So I rolled up my sleeves and got to work. Below is a summary of the journey I took towards full TH & ghci support in the GHC wasm backend:

Step one was to have a minimum NodeJS script to load libc.so: it is the bottom of all shared library dependencies, the first and most important one to be loaded. It took me many cans of energy drink to debug mysterious memory corruptions! But finally I could invoke any libc function and do malloc/free, etc. from the NodeJS REPL, with the wasm instance state properly persisted.

Next came loading multiple shared libraries up to libc++.so and running simple C++ snippets compiled to .so. Dependency management logic for shared libraries was added at this step: the dynamic linker traverses the dependency tree of a .so, spawns async WebAssembly.compile tasks, then sequentially loads the dynamic libraries based on their topological order.

The next step was figuring out a way to emit wasm position-independent code from the native code generator of GHC’s wasm backend. The GHC native code generator emits a .s assembly file for the target platform, and while the assembly format for x86_64, aarch64, etc. is widely taught, there’s really no tutorial or blog post to teach me about assembly syntax for wasm! Luckily, learning from Godbolt output examples was easy enough and I quickly figured out how the position-independent entities are represented in the assembly syntax.

The dynamic linker can now load the Haskell ghci shared library! It contains the default implementation of the external interpreter; it almost worked out of the box, though the linker needed some special logic to handle the piping logic across wasm/JS and the host GHC process.

In ghci, the logic to load libraries, look up symbols, etc. calls into the RTS linker on other platforms. Since for wasm all of that logic lives on the JS side instead of in C, these calls are patched to call back into the linker using JSFFI imports.

The GHC build system and driver needed quite a few adjustments, to ensure that shared libraries are generated for the wasm target when TH/ghci is involved. Thanks to Matthew Pickering for his patient and constructive review of my patch, I was able to replace many hacks in the GHC driver with more principled approaches.

The GHC driver also needs to learn to handle the wasm flavour of the external interpreter. Thanks to the prior work of the JS backend team here, my life is a lot easier when adding wasm external interpreter logic.

The GHC testsuite also needed quite a bit of work. In the end, over 1000 additional test cases passed once I flipped on TH/ghci support for the wasm target.
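The dependency management mentioned in the second step above (traverse the dependency tree of a .so, then load libraries in topological order) can be sketched with Python’s standard `graphlib` module. This is only an illustration of the ordering idea: the real linker is written in JavaScript and also compiles modules asynchronously, which is left out here. The dependency table `NEEDED`, the library names other than libc.so and libc++.so, and the function `load_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency tree: each library maps to the libraries it needs.
NEEDED = {
    "libfoo.so": ["libc++.so"],
    "libc++.so": ["libc++abi.so", "libc.so"],
    "libc++abi.so": ["libc.so"],
    "libc.so": [],
}


def load_order(root, needed):
    """Return the libraries reachable from `root`, dependencies first."""
    reachable, stack = {}, [root]
    while stack:
        lib = stack.pop()
        if lib not in reachable:
            reachable[lib] = needed[lib]
            stack.extend(needed[lib])
    # TopologicalSorter expects node -> predecessors, which is exactly
    # "library -> libraries that must be loaded before it".
    return list(TopologicalSorter(reachable).static_order())


order = load_order("libfoo.so", NEEDED)
print(order)  # libc.so comes first, libfoo.so last
```

As in the post, libc.so sits at the bottom of the dependency tree and is therefore the first library to be loaded.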

What comes next?

The GHC wasm backend TH/ghci feature is way faster and more robust than what I hacked in Asterius back then. One nice example I’d like to show off here is pandoc-wasm: it’s finally possible to compile our beloved pandoc tool to wasm again, for the first time since Asterius was deprecated.

The new pandoc-wasm is more performant not only at run-time, but also at compile-time. On a GitHub-hosted runner with just 4 CPU cores and 16 GB of memory, it takes around 16min to compile pandoc from scratch, and the time consumption can even be halved on my own laptop with peak memory usage at around 10.8GB. I wouldn’t doubt that time/memory usage can triple or more with legacy GHC-based compilers like Asterius or GHCJS to compile the same codebase!

The work on wasm TH/ghci is not fully finished yet. I do have some things in mind to work on next:

Support running the wasm external interpreter in the browser via puppeteer. So your ghci session can connect to the browser, all your Haskell code runs in the browser main thread, and all JSFFI logic in your code can access the browser’s window context. This would allow you to do Haskell frontend livecoding using ghci.

Support running an interactive ghci session within the browser, which means a truly client-side Haskell playground. It’ll only support in-memory bytecode, since it can’t invoke compiler processes to do any heavy lifting, but it’s still good for teaching purposes.

Maybe make it even faster? Performance isn’t my concern right now, though I haven’t done any serious profiling and optimization in the wasm dynamic linker either, so we’ll see.

Fix ghci debugger support.

You’re welcome to join the Haskell wasm Matrix room to chat about the GHC wasm backend. Do get in touch if you feel it is useful to your project!

Exploring Effect in TypeScript: Simplifying Async and Error Handling

Thu, 07 Nov 2024 00:00:00 GMT

Effect is a powerful library for TypeScript developers that brings functional programming techniques into managing effects and errors. It aims to be a comprehensive utility library for TypeScript, offering a range of tools that could potentially replace specialized libraries like Lodash, Zod, Immer, or RxJS.

In this blog post, we will introduce you to Effect by creating a simple weather widget app. This app will allow users to search for weather information by city name, making it a good example as it involves API data fetching, user input handling, and error management. We will implement this project in both vanilla TypeScript and using Effect to demonstrate the advantages Effect brings in terms of code readability and maintainability.

What is Effect?

Effect promises to improve TypeScript code by providing a set of modules and functions that are composable with maximum type-safety. The term “effect” refers to an effect system, which provides a declarative approach to handling side effects. Side effects are operations that have observable consequences in the real world, like logging, network requests, database operations, etc. The library revolves around the Effect type, which represents an immutable value that lazily describes a workflow or job. Effects are not functions by themselves; they are descriptions of what should be done. They can be composed with other effects, and they are interpreted by the Effect runtime system. Before we dive into the project we will build, let’s look at some basic concepts of Effect.
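The idea of an effect as a lazy description, rather than something that runs immediately, is independent of TypeScript. The toy Python model below is not the Effect library and has none of its typed errors, requirements, or runtime. The class and its method names are a hypothetical miniature. It only shows that building and composing a description performs no side effect until the description is explicitly run.

```python
class Effect:
    """Toy model of a lazy effect: a description that does nothing until run."""

    def __init__(self, thunk):
        self._thunk = thunk  # zero-argument function, called only by run()

    @staticmethod
    def succeed(value):
        return Effect(lambda: value)

    @staticmethod
    def fail(error):
        def thunk():
            raise error
        return Effect(thunk)

    @staticmethod
    def sync(function):
        return Effect(function)

    def map(self, function):
        # Composition builds a bigger description; nothing runs here.
        return Effect(lambda: function(self._thunk()))

    def run(self):
        return self._thunk()


log = []
# `list.append` returns None, so `... or 21` makes the thunk evaluate to 21.
program = Effect.sync(lambda: log.append("side effect") or 21).map(lambda n: n * 2)

print(log)            # [] because building the description ran nothing
print(program.run())  # 42
print(log)            # ['side effect']
```

Separating the description from its execution is what lets a runtime decide how and when to run it, which the real library builds on.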

Creating effects

We can create an effect based on a value using the Effect.succeed and Effect.fail functions:

import { Effect } from "effect"

const success: Effect.Effect<number, never, never> = Effect.succeed(42)

const fail: Effect.Effect<never, Error, never> = Effect.fail(
  new Error("Something went wrong")
)

An effect with never as the Error means it never fails

An effect with never as the Success means it never produces a successful value.

An effect with never as the Requirements means it doesn’t require any context to run.

With the functions above, we can create effects like this:

const divide = (a: number, b: number): Effect.Effect<number, Error, never> => b === 0 ? Effect.fail(new Error("Cannot divide by zero")) : Effect.succeed(a / b)

To create an effect based on a function, we can use the Effect.sync and Effect.promise for synchronous and asynchronous functions that can’t fail, respectively, and Effect.try and Effect.tryPromise for synchronous and asynchronous functions that can fail.

// Synchronous function that can't fail
const log = (message: string): Effect.Effect<void, never, never> =>
  Effect.sync(() => console.log(message))
// Asynchronous function that can't fail
const delay = (message: string): Effect.Effect<string, never, never> =>
  Effect.promise<string>(
    () =>
      new Promise(resolve => {
        setTimeout(() => {
          resolve(message)
        }, 2000)
      })
  )
// Synchronous function that can fail
const parse = (input: string): Effect.Effect<any, Error, never> =>
  Effect.try({
    // JSON.parse may throw for bad input
    try: () => JSON.parse(input),
    // remap the error
    catch: _unknown => new Error(`something went wrong while parsing the JSON`),
  })
// Asynchronous function that can fail
const getTodo = (id: number): Effect.Effect<Response, Error, never> =>
  Effect.tryPromise({
    // fetch can throw for network errors
    try: () => fetch(`https://jsonplaceholder.typicode.com/todos/${id}`),
    // remap the error
    catch: unknown => new Error(`something went wrong ${unknown}`),
  })

For more details about creating effects you can check the Effect documentation.

Running effects

In order to run an effect, we need to use the appropriate function depending on the effect type. In our application we'll use the Effect.runPromise function, which runs an effect and returns a Promise of its result. It is a good fit for asynchronous effects that can't fail, since the returned Promise rejects if the effect fails:

Effect.runPromise(delay("Hello, World!")).then(console.log) // -> Hello, World! (after 2 seconds)

You can read about other ways to run effects, and what happens when you don’t use the correct function, in the “Running Effects” page of the Effect documentation.

Pipe

When writing a program using Effect, we usually need to run a sequence of operations, and we can use the pipe function to compose them:

import { Effect, pipe } from "effect"
const double = (n: number) => n * 2
const divide =
  (b: number) =>
  (a: number): Effect.Effect<number, Error> =>
    b === 0
      ? Effect.fail(new Error("Cannot divide by zero"))
      : Effect.succeed(a / b)
const increment = (n: number) => Effect.succeed(n + 1)
const result = pipe(
  42,
  // Here we have an Effect.Effect with the value 21
  divide(2),
  // To run a function over the value changing the effect's value, we use Effect.map
  Effect.map(double),
  // To run a function over the value without changing the effect's value, we use Effect.tap
  Effect.tap(n => console.log(`The double is ${n}`)),
  // To run a function that returns a new effect, we use Effect.andThen
  Effect.andThen(increment),
  Effect.tap(n => console.log(`The incremented value is ${n}`))
)
Effect.runSync(result)
// -> The double is 42
// -> The incremented value is 43

If you want to know more about the pipe function, you can check this page on the Effect documentation.
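As a rough mental model, pipe threads a value through a list of functions from left to right. The sketch below is a simplified stand-in written for this post: `simplePipe` is a hypothetical name, and unlike Effect's pipe it gives up on precise typing between the steps.

```typescript
// Simplified model of `pipe`: feed the result of each function to the next.
// Effect's real `pipe` is fully typed; this sketch falls back to `unknown`.
const simplePipe = (
  initial: unknown,
  ...fns: Array<(x: any) => unknown>
): unknown => fns.reduce((acc, fn) => fn(acc), initial)

const halve = (n: number) => n / 2
const twice = (n: number) => n * 2
const addOne = (n: number) => n + 1

// Reads top to bottom: 42 -> 21 -> 42 -> 43
const piped = simplePipe(42, halve, twice, addOne)

// The equivalent nested calls have to be read inside-out.
const nested = addOne(twice(halve(42)))
```

Both expressions compute the same value; the piped form just keeps the steps in the order they happen.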

The project

Now that we have a basic understanding of Effect, we can start the project! We will build a simple weather app in which the user types the name of a city, selects the desired one from a list of suggestions, and then the app shows the current weather in that city.
The project will have three main components: the input field, the list of suggestions, and the weather information.

We will use the Open-Meteo API to get the weather information as it doesn’t require an API key.

Setup

We begin by creating a new TypeScript project:

mkdir weather-app
cd weather-app
npm init -y

Next, we install the dependencies. We will use Parcel to bundle the project as it works without any configuration:

npm install --save-dev parcel

Now we create the project structure:

mkdir src
touch src/index.html
touch src/styles.scss
touch src/index.ts

The index.html file contains a main element with sections: one with a text input for city input and another for displaying weather information.

You can check the HTML and SCSS code in the GitHub repository.

In order to run the project, we need to add the following keys to the package.json file:

{ "source": "./src/index.html", "scripts": { "dev": "parcel", "build": "parcel build" } }

Now we can run the project:

npm run dev
Server running at http://localhost:1234
✨ Built in 8ms

By accessing the URL, you should see the application, but it won’t work yet.

Figure 1. Application's initial state

Let’s write the TypeScript code!

Without Effect

All the following code examples should be placed in the src/index.ts file.

First, we query the elements from the DOM:

// The field input
const cityElement = document.querySelector<HTMLInputElement>("#city")
// The list of suggestions
const citiesElement = document.querySelector<HTMLUListElement>("#cities")
// The weather information
const weatherElement = document.querySelector<HTMLDivElement>("#weather")

Next, we’ll define the types for the data we’ll fetch from the API.
To validate the data, we’ll use a library called Zod. Zod is a TypeScript-first schema declaration and validation library.

npm install zod

First, we define the schema by using z.object and, for each property, we use z.string, z.number and other functions to define its type:

import { z } from "zod"
// ...
const CityResponse = z.object({
  name: z.string(),
  country_code: z.string().length(2),
  latitude: z.number(),
  longitude: z.number(),
})
const GeocodingResponse = z.object({
  results: z.array(CityResponse),
})

With the schema defined, we can use the z.infer utility type to infer the type of the data based on the schema:

type CityResponse = z.infer<typeof CityResponse>
type GeocodingResponse = z.infer<typeof GeocodingResponse>

Now, we create the function to fetch the cities from the Open-Meteo API. It fetches the cities that match the given name and returns a list of suggestions. In order to validate the API response, we use the safeParse method that our GeocodingResponse Zod schema provides. This method returns an object with two key properties:

success: A boolean indicating if the parsing succeeded.

data: The parsed data if successful, matching our defined schema.

const getCity = async (city: string): Promise<CityResponse[]> => {
  try {
    const response = await fetch(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    )
    // Convert the response to JSON
    const geocoding = await response.json()
    // Parse the response using the GeocodingResponse schema
    const parsedGeocoding = GeocodingResponse.safeParse(geocoding)
    if (!parsedGeocoding.success) {
      return []
    }
    return parsedGeocoding.data.results
  } catch (error) {
    console.error("Error:", error)
    return []
  }
}

To make the input field work, we need to attach an event listener to it to call the getCity function:

const getCities = async function (input: HTMLInputElement) {
  const { value } = input
  // Check if the HTML element exists
  if (citiesElement) {
    // Clear the list of suggestions
    citiesElement.innerHTML = ""
  }
  // Check if the input is empty
  if (!value) {
    return
  }
  // Fetch the cities
  const results = await getCity(value)
  renderCitySuggestions(results)
}
cityElement?.addEventListener("input", function (_event) {
  getCities(this)
})

Next, we create the renderCitySuggestions function to render the list of suggestions or display an error message if there are no suggestions:

const renderCitySuggestions = (cities: CityResponse[]) => {
  // If there are cities, populate the suggestions
  if (cities.length > 0) {
    populateSuggestions(cities)
    return
  }
  // Otherwise, show a message that the city was not found
  if (weatherElement) {
    const search = cityElement?.value || "searched"
    weatherElement.innerHTML = `City ${search} not found`
  }
}

The populateSuggestions function is very simple - it creates a list item for each city:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    const li = document.createElement("li")
    li.innerText = `${city.name} - ${city.country_code}`
    citiesElement?.appendChild(li)
  })

Now if we type a city name in the input field, we should see the list of suggestions:

Figure 2. City suggestions

Great!

The next step is to implement the selectCity function that fetches the weather information of a city and displays it:

const selectCity = async (result: CityResponse) => {
  // If the HTML element doesn't exist, return
  if (!weatherElement) {
    return
  }
  try {
    const data = await getWeather(result)
    if (data.tag === "error") {
      throw data.value
    }
    const {
      temperature_2m,
      apparent_temperature,
      relative_humidity_2m,
      precipitation,
    } = data.value.current
    weatherElement.innerHTML = `
      ${result.name}
      Temperature: ${temperature_2m}°C
      Feels like: ${apparent_temperature}°C
      Humidity: ${relative_humidity_2m}%
      Precipitation: ${precipitation}mm
    `
  } catch (error) {
    weatherElement.innerHTML = `An error occurred while fetching the weather: ${error}`
  }
}

Then we call it in the populateSuggestions function:

const populateSuggestions = (results: CityResponse[]) =>
  results.forEach(city => {
    // ...
    li.addEventListener("click", () => selectCity(city))
    citiesElement?.appendChild(li)
  })

The last piece of the puzzle is the getWeather function. Once again, we’ll use Zod to create the schema and the type for the weather information.

type WeatherResult =
  | { tag: "ok"; value: WeatherResponse }
  | { tag: "error"; value: unknown }
const WeatherResponse = z.object({
  current_units: z.object({
    temperature_2m: z.string(),
    relative_humidity_2m: z.string(),
    apparent_temperature: z.string(),
    precipitation: z.string(),
  }),
  current: z.object({
    temperature_2m: z.number(),
    relative_humidity_2m: z.number(),
    apparent_temperature: z.number(),
    precipitation: z.number(),
  }),
})
type WeatherResponse = z.infer<typeof WeatherResponse>
const getWeather = async (result: CityResponse): Promise<WeatherResult> => {
  try {
    const response = await fetch(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    )
    // Convert the response to JSON
    const weather = await response.json()
    // Parse the response using the WeatherResponse schema
    const parsedWeather = WeatherResponse.safeParse(weather)
    if (!parsedWeather.success) {
      return { tag: "error", value: parsedWeather.error }
    }
    return { tag: "ok", value: parsedWeather.data }
  } catch (error) {
    return { tag: "error", value: error }
  }
}

We have a type WeatherResult for error handling; it can be ok or error. The getWeather function fetches the weather information based on the latitude and longitude of a city and returns the result. We are passing some parameters to the API to get the current temperature, humidity, apparent temperature, and precipitation. If you want to know more about these parameters, you can check the API documentation.
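The tagged-union pattern behind WeatherResult can be shown in isolation. The following is a minimal sketch with hypothetical names (`Result`, `okResult`, `errorResult`, `describeResult`) that are not part of the app's code; it only illustrates how switching on the tag narrows the type of value.

```typescript
// Generic version of the "ok or error" tagged union used by WeatherResult.
type Result<T> =
  | { tag: "ok"; value: T }
  | { tag: "error"; value: unknown }

const okResult = <T>(value: T): Result<T> => ({ tag: "ok", value })
const errorResult = <T = never>(value: unknown): Result<T> => ({
  tag: "error",
  value,
})

// Switching on `tag` narrows `value`: a number in the "ok" branch,
// `unknown` in the "error" branch.
const describeResult = (r: Result<number>): string => {
  switch (r.tag) {
    case "ok":
      return `Temperature: ${r.value.toFixed(1)}°C`
    case "error":
      return `An error occurred: ${String(r.value)}`
  }
}

const okMessage = describeResult(okResult(21))
const errorMessage = describeResult(errorResult("network down"))
```

The compiler rejects `r.value.toFixed(1)` in the error branch, which is the safety net the `data.tag === "error"` check in selectCity relies on.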

One last thing we need to do is to use a debounce function to avoid making too many requests to the API while the user is typing. To do that, we’ll install Lodash which provides many useful functions for everyday programming.

npm install lodash
npm install --save-dev @types/lodash

We’ll wrap the getCities function with the debounce function:

import { debounce } from "lodash"
// ...
const getCities = debounce(async function (input: HTMLInputElement) {
  // The same code as before
}, 500)

This way, the getCities function will be called only after the user stops typing for 500 milliseconds.
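To see what debouncing does under the hood, here is a minimal trailing-edge sketch. It is not Lodash's implementation (which also offers leading-edge calls, maxWait, cancel and flush); `simpleDebounce`, `Scheduler` and the manual scheduler are hypothetical names, and the scheduler is injectable only so that the behaviour can be observed without waiting for real timers.

```typescript
type Cancel = () => void
type Scheduler = (callback: () => void, delayMs: number) => Cancel

// Default scheduler backed by real timers.
const timerScheduler: Scheduler = (callback, delayMs) => {
  const id = setTimeout(callback, delayMs)
  return () => clearTimeout(id)
}

const simpleDebounce = <Args extends unknown[]>(
  fn: (...args: Args) => void,
  waitMs: number,
  schedule: Scheduler = timerScheduler
): ((...args: Args) => void) => {
  let cancelPending: Cancel | undefined
  return (...args: Args) => {
    // A new call cancels the previously scheduled one...
    cancelPending?.()
    // ...and schedules a fresh one, so only the last call of a burst runs.
    cancelPending = schedule(() => fn(...args), waitMs)
  }
}

// A manual scheduler: callbacks run only when we flush them.
let pendingCallbacks: Array<() => void> = []
const manualScheduler: Scheduler = callback => {
  pendingCallbacks.push(callback)
  return () => {
    pendingCallbacks = pendingCallbacks.filter(cb => cb !== callback)
  }
}
const flushPending = (): void => {
  const toRun = pendingCallbacks
  pendingCallbacks = []
  toRun.forEach(cb => cb())
}

const searches: string[] = []
const debouncedSearch = simpleDebounce(
  (query: string) => {
    searches.push(query)
  },
  500,
  manualScheduler
)

// Three quick keystrokes, then the "timer" fires once.
debouncedSearch("P")
debouncedSearch("Pa")
debouncedSearch("Paris")
flushPending()
```

After the flush, only the last query of the burst has been recorded, which is why the API receives one request per pause in typing rather than one per keystroke.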

Our small weather app is now complete: when we type a city name in the input field, a list of suggestions is displayed, and when we click on one of them, we can see the weather information for that city.

Figure 3. Weather information

While our current code works and handles errors well, let’s explore how using Effect can potentially improve its robustness and simplicity.

With Effect

To get started with Effect, we need to install it:

npm install effect

We will start by refactoring the functions in the order we implemented them in the previous section.

First, we refactor the querySelector calls. We'll use the Option type from Effect: it represents a value that may or may not exist. If the value exists, it's a Some; if it doesn't, it's a None.

import { Option } from "effect"
// The field input
const cityElement = Option.fromNullable(
  document.querySelector<HTMLInputElement>("#city")
)
// The list of suggestions
const citiesElement = Option.fromNullable(
  document.querySelector<HTMLUListElement>("#cities")
)
// The weather information
const weatherElement = Option.fromNullable(
  document.querySelector<HTMLDivElement>("#weather")
)

Using the Option type, we can chain operations without worrying about null or undefined values. This approach simplifies our code by eliminating the need for explicit null checks. We can use functions like Option.map and Option.andThen to handle the transformations and checks in a more elegant way. To know more about the Option type, take a look at the page about it in the documentation.
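For readers new to the idea, the sketch below models an Option in plain TypeScript. It is a simplified illustration rather than Effect's actual implementation, and the names `SimpleOption`, `mapOption` and `getOrElse` are hypothetical stand-ins for what the real Option module provides with a richer API.

```typescript
// A value that is either absent (None) or present (Some).
type SimpleOption<A> =
  | { readonly _tag: "None" }
  | { readonly _tag: "Some"; readonly value: A }

const none: SimpleOption<never> = { _tag: "None" }
const some = <A>(value: A): SimpleOption<A> => ({ _tag: "Some", value })

// Turn a nullable value into an option.
const fromNullable = <A>(value: A | null | undefined): SimpleOption<A> =>
  value === null || value === undefined ? none : some(value)

// Apply `f` only when a value is present; otherwise stay None.
const mapOption = <A, B>(
  option: SimpleOption<A>,
  f: (a: A) => B
): SimpleOption<B> => (option._tag === "Some" ? some(f(option.value)) : none)

// Leave the option world by supplying a fallback for the None case.
const getOrElse = <A>(option: SimpleOption<A>, fallback: () => A): A =>
  option._tag === "Some" ? option.value : fallback()

const fields: Record<string, string | undefined> = { city: "Paris" }

const presentField = getOrElse(
  mapOption(fromNullable(fields["city"]), s => s.toUpperCase()),
  () => "searched"
)
const missingField = getOrElse(
  mapOption(fromNullable(fields["country"]), s => s.toUpperCase()),
  () => "searched"
)
```

The transformation is written once and is simply skipped for the missing field, so no explicit null check appears at the call site.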

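Now, let’s move to the getCity function. We’ll use Schema.Struct to define the types of the CityResponse and GeocodingResponse objects. Those schemas will be used to validate the response from the API. This is the same thing we did before with Zod, but this time we don’t need a third-party validation library. Instead, we can use the Schema module that Effect provides. Note that the snippet below imports from the @effect/schema and @effect/platform packages, which are published separately from the core effect package and need to be installed alongside it.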

import { /* ... */, Effect, Scope, pipe } from "effect";
import { Schema } from "@effect/schema"
import { FetchHttpClient, HttpClient, HttpClientResponse, HttpClientError } from "@effect/platform";
// ...
const CityResponse = Schema.Struct({
  name: Schema.String,
  country_code: pipe(Schema.String, Schema.length(2)),
  latitude: Schema.Number,
  longitude: Schema.Number,
})
type CityResponse = Schema.Schema.Type<typeof CityResponse>
const GeocodingResponse = Schema.Struct({
  results: Schema.Array(CityResponse),
})
type GeocodingResponse = Schema.Schema.Type<typeof GeocodingResponse>
const getRequest = (
  url: string
): Effect.Effect<
  HttpClientResponse.HttpClientResponse,
  HttpClientError.HttpClientError,
  Scope.Scope
> =>
  pipe(
    HttpClient.HttpClient,
    // Using `Effect.andThen` to get the client from the `HttpClient.HttpClient` tag and then make the request
    Effect.andThen(client => client.get(url)),
    // We don't need to send the tracing headers to the API to avoid CORS errors
    HttpClient.withTracerPropagation(false),
    // Providing the HTTP client to the effect
    Effect.provide(FetchHttpClient.layer)
  )
const getCity = (
  city: string
): Effect.Effect<readonly CityResponse[], never, never> =>
  pipe(
    getRequest(
      `https://geocoding-api.open-meteo.com/v1/search?name=${city}&count=10&language=en&format=json`
    ),
    // Validating the response using the `GeocodingResponse` schema
    Effect.andThen(HttpClientResponse.schemaBodyJson(GeocodingResponse)),
    // Providing a default value in case of failure
    Effect.orElseSucceed<GeocodingResponse>(() => ({ results: [] })),
    // Extracting the `results` array from the `GeocodingResponse` object
    Effect.map(geocoding => geocoding.results),
    // Providing a scope to the effect
    Effect.scoped
  )

Here we already have some interesting things happening!

The getRequest function sets up the HTTP client. While we could use the built-in fetch API as our HTTP client, Effect provides a solution called HttpClient in the @effect/platform package. It’s important to note that this package is currently in beta, as mentioned in the official documentation. Despite its beta status, we’ll be using it to explore more of Effect’s capabilities and showcase how it integrates with the broader Effect ecosystem. This choice allows us to demonstrate Effect’s approach to HTTP requests and error handling in a more idiomatic way. HttpClient.HttpClient is something called a “tag” that we can use to get the HTTP client from the context. To do that, we use the Effect.andThen function.
After that, we’re setting withTracerPropagation to false to avoid sending the tracing headers to the API and getting a CORS error.

Since we’re using the HttpClient service, it’s a requirement of our effect (remember the Effect type?) and we need to provide this requirement in order to run the effect.
With the Effect.provide function we can add a layer to the effect that provides the HttpClient service. For more information about the Effect.provide function and how it works, take a look at the runtime page on the Effect documentation.

In the getCity function, we call the getRequest function to get the response from the API. Then we validate the response using the HttpClientResponse.schemaBodyJson function, which validates the response body using the GeocodingResponse schema.
In the last line of the function, we use the Effect.scoped function to provide a scope to the effect. This is a requirement of the HttpClient service that we’re using in the getRequest function. The scope ensures that if the program is interrupted, any request will be aborted, preventing memory leaks. getCity returns an Effect.Effect<readonly CityResponse[], never, never>: the two never types mean that it never fails (we’re providing a default value in case of failure) and that it doesn’t require any context to run.

Next, we refactor the getCities function:

import { /* ... */, Effect, Option, pipe } from "effect";
// ...
const getCities = (
  search: string
): Effect.Effect<Option.Option<void>, never, never> => {
  Option.map(citiesElement, citiesEl => (citiesEl.innerHTML = ""))
  return pipe(
    getCity(search),
    Effect.map(renderCitySuggestions),
    // Check if the input is empty
    Effect.when(() => Boolean(search))
  )
}

We’re using the Option.map function to access the actual citiesElement and clear the list of suggestions. After that, it’s pretty straightforward: we call the getCity function with the search term, then we map the renderCitySuggestions function over the successful value, and finally, we apply a condition that makes the effect run only if the search term is not empty.

Here is how we add the event listener to the input field:

import { /* ... */, Effect, Option, pipe, Stream, Chunk, StreamEmit } from "effect";
// ...
Option.map(cityElement, cityEl => {
  const stream = Stream.async(
    (emit: StreamEmit.Emit<never, never, string, void>) =>
      cityEl.addEventListener("input", function (_event) {
        emit(Effect.succeed(Chunk.of(this.value)))
      })
  )
  pipe(
    stream,
    Stream.debounce(500),
    Stream.runForEach(getCities),
    Effect.runPromise
  )
})

Actually, we’re doing more than just adding an event listener. The debounce function that we had to import from Lodash before is now part of Effect as the Stream.debounce function. In order to use this function, we need to create a Stream.
A Stream has the type Stream<A, E, R>: it’s a program description that, when executed, can emit zero or more values of type A, handle errors of type E, and operate within a context of type R. There are a couple of ways to create a Stream, which are detailed in the page about streams in the documentation. In this case, we’re using the Stream.async function as it receives a callback that emits values to the stream.

After creating the Stream and assigning it to the stream variable, we use a pipe to build a pipeline where we debounce the stream by 500 milliseconds, run the getCities function whenever the stream gets a value (that is, when we emit a value), and finally run the effect with Effect.runPromise.

Let’s move on to the renderCitySuggestions function:

import { /* ... */, Array, Option, pipe } from "effect";
// ...
const renderCitySuggestions = (
  cities: readonly CityResponse[]
): void | Option.Option<void> =>
  // If there is at least one city, populate the suggestions
  // Otherwise, show a message that the city was not found
  pipe(
    cities,
    Array.match({
      onNonEmpty: populateSuggestions,
      onEmpty: () => {
        const search = Option.match(cityElement, {
          onSome: (cityEl) => cityEl.value,
          onNone: () => "searched",
        });
        Option.map(
          weatherElement,
          (weatherEl) => (weatherEl.innerHTML = `City ${search} not found`),
        );
      },
    }),
  );

Instead of manually checking the length of the cities array, we’re using the Array.match function to handle that. If the array is empty, it calls the callback defined in the onEmpty property, and if the array is not empty, it calls the callback defined in the onNonEmpty property.
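The idea behind matching on emptiness can be sketched in plain TypeScript. This is an illustrative stand-in, not Effect's implementation; `matchArray`, `isNonEmpty` and `summarize` are hypothetical names.

```typescript
// A readonly array that is statically known to have a first element.
type NonEmptyReadonlyArray<A> = readonly [A, ...A[]]

const isNonEmpty = <A>(
  items: readonly A[]
): items is NonEmptyReadonlyArray<A> => items.length > 0

// Dispatch to one of two callbacks depending on whether the array is empty.
const matchArray = <A, B>(
  items: readonly A[],
  cases: {
    onEmpty: () => B
    onNonEmpty: (items: NonEmptyReadonlyArray<A>) => B
  }
): B => (isNonEmpty(items) ? cases.onNonEmpty(items) : cases.onEmpty())

const summarize = (cities: readonly string[]): string =>
  matchArray(cities, {
    onEmpty: () => "City not found",
    // In this branch the first element is known to exist.
    onNonEmpty: found => `${found.length} suggestion(s), first: ${found[0]}`,
  })

const emptySummary = summarize([])
const filledSummary = summarize(["Paris", "Parma"])
```

Compared with an `if (cities.length > 0)` check, both cases must be handled explicitly, and the non-empty callback receives a type that records the guarantee.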

The populateSuggestions function remains almost the same. The only change is that we now wrap the forEach operation in an Option.map to safely handle the optional cities element. This ensures we only attempt to populate suggestions when the element exists.

The selectCity function is simpler now:

import { /* ... */, Option, pipe } from "effect";
// ...
const selectCity = (result: CityResponse): Option.Option<Promise<string>> =>
  Option.map(weatherElement, weatherEl =>
    pipe(
      result,
      getWeather,
      Effect.match({
        onFailure: error =>
          (weatherEl.innerHTML = `An error occurred while fetching the weather: ${error}`),
        onSuccess: (weatherData: WeatherResponse) =>
          (weatherEl.innerHTML = `
            ${result.name}
            Temperature: ${weatherData.current.temperature_2m}°C
            Feels like: ${weatherData.current.apparent_temperature}°C
            Humidity: ${weatherData.current.relative_humidity_2m}%
            Precipitation: ${weatherData.current.precipitation}mm
          `),
      }),
      Effect.runPromise
    )
  )

There is no checking of data.tag anymore: we’re using the Effect.match function to handle both cases, success and failure, and we no longer throw anything.

Finally, the getWeather function:

import { /* ... */, Effect, pipe } from "effect";
import { Schema, ParseResult } from "@effect/schema";
import { /* ... */, HttpClientResponse, HttpClientError } from "@effect/platform";
// ...
const WeatherResponse = Schema.Struct({
  current_units: Schema.Struct({
    temperature_2m: Schema.String,
    relative_humidity_2m: Schema.String,
    apparent_temperature: Schema.String,
    precipitation: Schema.String,
  }),
  current: Schema.Struct({
    temperature_2m: Schema.Number,
    relative_humidity_2m: Schema.Number,
    apparent_temperature: Schema.Number,
    precipitation: Schema.Number,
  }),
})
type WeatherResponse = Schema.Schema.Type<typeof WeatherResponse>
const getWeather = (
  result: CityResponse,
): Effect.Effect<
  WeatherResponse,
  HttpClientError.HttpClientError | ParseResult.ParseError,
  never
> =>
  pipe(
    getRequest(
      `https://api.open-meteo.com/v1/forecast?latitude=${result.latitude}&longitude=${result.longitude}&current=temperature_2m,relative_humidity_2m,apparent_temperature,precipitation&timezone=auto&forecast_days=1`
    ),
    Effect.andThen(HttpClientResponse.schemaBodyJson(WeatherResponse)),
    Effect.scoped
  )

We’re again using the Schema.Struct to define the WeatherResponse type. However, we don’t need to have a WeatherResult anymore as the Effect type already handles the success and failure cases.

After this refactoring, the app works the same way it did before, but now we have the confidence that our code is more robust and type-safe. Let’s see the benefits of Effect compared to the code without it.

Conclusion

Now that we have the two versions of the application, we can analyze them and highlight the pros and cons of using Effect:

Pros

Type-safety: Effect provides a way to handle errors and requirements in a type-safe way and using it increases the overall type safety of our app.

Error handling: The Effect type has built-in error handling, making the code more robust.

Validation: We don’t need to use a library like Zod to validate the response - we can use the Schema module to validate the response.

Utility functions: We don’t need to use a library like Lodash to use utility functions. Instead, we can use the Array, Option, Stream, and other modules.

Declarative style: Writing code with Effect means we’re using a more declarative approach: we’re describing “what” we want our program to do, rather than “how” we want it to do it.

Cons

Complexity: The code is more complex than the one without Effect; it may be hard to understand for people who are not familiar with the library.

Learning curve: You need to learn how to use the library - it’s not as simple as writing plain TypeScript code.

Documentation: The documentation is good, but could be better. Some parts are not clear.

While the code written with Effect may initially appear more complex to those unfamiliar with the library, its benefits far outweigh the initial learning curve. Effect offers powerful tools for maximum type-safety, error handling, asynchronous operations, streams and more, all within a single library that is incrementally adoptable. In our project, we used two separate libraries (Zod and Lodash) to achieve what Effect accomplishes on its own.

While plain TypeScript may be adequate for small projects, we believe Effect can truly shine in larger, more complex applications. Its robust handling of side-effects and comprehensive error management have the potential to make it a game changer for taming complexity and maintaining code quality at scale.

Introducing rules_gcs

Thu, 17 Oct 2024 00:00:00 GMT

At Tweag, we are constantly striving to improve the developer experience by contributing tools and utilities that streamline workflows. We recently completed a project with IMAX, where we learned that they had developed a way to simplify and optimize the process of integrating Google Cloud Storage (GCS) with Bazel. Seeing value in this tool for the broader community, we decided to publish it together under an open source license. In this blog post, we’ll dive into the features, installation, and usage of rules_gcs, and how it provides you with access to private resources.

What is rules_gcs?

rules_gcs is a Bazel ruleset that facilitates the downloading of files from Google Cloud Storage. It is designed to be a drop-in replacement for Bazel’s http_file and http_archive rules, with features that make it particularly suited for GCS. With rules_gcs, you can efficiently fetch large amounts of data, leverage Bazel’s repository cache, and handle private GCS buckets with ease.

Key Features

Drop-in Replacement: rules_gcs provides gcs_file and gcs_archive rules that can directly replace http_file and http_archive. They take a gs://bucket_name/object_name URL and internally translate this to an HTTPS URL. This makes it easy to transition to GCS-specific rules without major changes to your existing Bazel setup.

Lazy Fetching with gcs_bucket: For projects that require downloading multiple objects from a GCS bucket, rules_gcs includes a gcs_bucket module extension. This feature allows for lazy fetching, meaning objects are only downloaded as needed, which can save time and bandwidth, especially in large-scale projects.

Private Bucket Support: Accessing private GCS buckets is seamlessly handled by rules_gcs. The ruleset supports credential management through a credential helper, ensuring secure access without the need to hardcode credentials or use gsutil for downloading.

Bazel’s Downloader Integration: rules_gcs uses Bazel’s built-in downloader and repository cache, optimizing the download process and ensuring that files are cached efficiently across builds, even across multiple Bazel workspaces on your local machine.

Small footprint: Apart from the gcloud CLI tool (for obtaining authentication tokens), rules_gcs requires no additional dependencies or Bazel modules. This minimalistic approach reduces setup complexity and potential conflicts with other tools.
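The gs:// to HTTPS translation mentioned in the first feature can be sketched as follows, using TypeScript purely for illustration. This assumes objects are addressed as https://storage.googleapis.com/<bucket>/<object>; the exact URL that rules_gcs builds internally may differ, and `gsToHttps` is a hypothetical helper, not part of the ruleset.

```typescript
// Translate gs://bucket_name/object_name into an HTTPS URL.
// Assumption: path-style addressing on storage.googleapis.com.
const gsToHttps = (gsUrl: string): string => {
  const parts = /^gs:\/\/([^/]+)\/(.+)$/.exec(gsUrl)
  if (parts === null) {
    throw new Error(`Not a gs://bucket_name/object_name URL: ${gsUrl}`)
  }
  const [, bucket, objectName] = parts
  // Percent-encode each path segment but keep the separating slashes.
  const encodedObject = objectName
    .split("/")
    .map(segment => encodeURIComponent(segment))
    .join("/")
  return `https://storage.googleapis.com/${bucket}/${encodedObject}`
}

const translated = gsToHttps("gs://my-bucket/data/file 1.txt")
```

Because the result is an ordinary HTTPS URL on the storage.googleapis.com domain, it can go through the same download and caching machinery as any other HTTP dependency.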

Understanding Bazel Repositories and Efficient Object Fetching with rules_gcs

Before we dive into the specifics of rules_gcs, it’s important to understand some key concepts about Bazel repositories and repository rules, as well as the challenges of efficiently managing large collections of objects from a Google Cloud Storage (GCS) bucket.

Bazel Repositories and Repository Rules

In Bazel, external dependencies are managed using repositories, which are declared in your WORKSPACE or MODULE.bazel file. Each repository corresponds to a package of code, binaries, or other resources that Bazel fetches and makes available for your build. Repository rules, such as http_archive or git_repository, and module extensions define how Bazel should download and prepare these external dependencies.

However, when dealing with a large number of objects, such as files stored in a GCS bucket, using a single repository to download all objects can be highly inefficient. This is because Bazel’s repository rules typically operate in an “eager” manner—they fetch all the specified files as soon as any target of the repository is needed. For large buckets, this means downloading potentially gigabytes of data even if only a few files are actually needed for the build. This eager fetching can lead to unnecessary network usage, increased build times, and larger disk footprints.

The rules_gcs Approach: Lazy Fetching with a Hub Repository

rules_gcs addresses this inefficiency by introducing a more granular approach to downloading objects from GCS. Instead of downloading all objects at once into a single repository, rules_gcs uses a module extension that creates a “hub” repository, which then manages individual sub-repositories for each GCS object.

How It Works

Hub Repository: The hub repository acts as a central point of reference, containing metadata about the individual GCS objects. This follows the “hub-and-spoke” paradigm with a central repository (the bucket) containing references to a large number of small repositories for each object. This architecture is commonly used by Bazel module extensions to manage dependencies for different language ecosystems (including Python and Rust).

Individual Repositories per GCS Object: For each GCS object specified in the lockfile, rules_gcs creates a separate repository using the gcs_file rule. This allows Bazel to fetch each object lazily—downloading only the files that are actually needed for the current build.

Methods of Fetching: Users can choose between different methods in the gcs_bucket module extension. The default method of creating symlinks is efficient while preserving the file structure set in the lockfile. If you need to access objects as regular files, choose one of the other methods.

Symlink: Creates a symlink from the hub repo pointing to a file in its object repo, ensuring the object repo and symlink pointing to it are created only when the file is accessed.

Alias: Similar to symlink, but uses Bazel’s aliasing mechanism to reference the file. No files are created in the hub repo.

Copy: Creates a copy of a file in the hub repo when accessed.

Eager: Downloads all specified objects upfront into a single repository.

This modular approach is particularly beneficial for large-scale projects where only a subset of the data is needed for most builds. By fetching objects lazily, rules_gcs minimizes unnecessary data transfer and reduces build times.
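The difference between eager and lazy fetching can be illustrated with a toy model, written in TypeScript for illustration only. It does not reflect how Bazel or rules_gcs are implemented; `makeLazyHub` and `ObjectFetcher` are hypothetical names, and the "download" is simulated.

```typescript
type ObjectFetcher = (objectName: string) => string

// The hub knows every object name up front (the role of the lockfile),
// but fetches an object only the first time it is requested.
const makeLazyHub = (
  objectNames: readonly string[],
  fetchObject: ObjectFetcher
) => {
  const cache = new Map<string, string>()
  return {
    names: objectNames,
    get: (objectName: string): string => {
      if (objectNames.indexOf(objectName) === -1) {
        throw new Error(`Unknown object: ${objectName}`)
      }
      const cached = cache.get(objectName)
      if (cached !== undefined) {
        return cached
      }
      const content = fetchObject(objectName)
      cache.set(objectName, content)
      return content
    },
  }
}

// Simulated downloads, recorded so we can see what was actually fetched.
const downloads: string[] = []
const hub = makeLazyHub(["a.bin", "b.bin", "c.bin"], objectName => {
  downloads.push(objectName)
  return `contents of ${objectName}`
})

// A "build" that needs only one of the three objects, twice.
hub.get("b.bin")
hub.get("b.bin")
```

Only `b.bin` is fetched, and only once; an eager strategy would have fetched all three objects before the first access.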

Integrating with Bazel’s Credential Helper Protocol

Another critical aspect of rules_gcs is its seamless integration with Bazel’s credential management system. Accessing private GCS buckets securely requires proper authentication, and Bazel uses a credential helper protocol to handle this.

How Bazel’s Credential Helper Protocol Works

Bazel’s credential helper protocol is a mechanism that allows Bazel to fetch authentication credentials dynamically when accessing private resources, such as a GCS bucket. The protocol is designed to be simple and secure, ensuring that credentials are only used when necessary and are never hardcoded into build files.

When Bazel’s downloader prepares a request and a credential helper is configured, it invokes the credential helper with the command get. Additionally, the request URI is passed to the helper’s standard input, encoded as JSON. The helper is expected to return a JSON object containing HTTP headers, including the necessary Authorization token, which Bazel will then include in its requests.

Here’s a breakdown of how the credential_helper script used in rules_gcs works:

Authentication Token Retrieval: The script uses the gcloud CLI tool to obtain an access token via gcloud auth application-default print-access-token. This token is tied to the user’s current authentication context and can be used to fetch any objects the user is allowed to access.

Output Format: The script outputs the token in a JSON format that Bazel can directly use:

{ "headers": { "Authorization": ["Bearer ${TOKEN}"] } }

This JSON object includes the Authorization header, which Bazel uses to authenticate its requests to the GCS bucket.

Integration with Bazel: To use this credential helper, you need to configure Bazel by specifying the helper in the .bazelrc file:

common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper

This line tells Bazel to use the specified credential_helper script whenever it needs to access resources from storage.googleapis.com. If a request returns an error code or unexpected content, credentials are invalidated and the helper is invoked again.
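To make the protocol concrete, here is a minimal Python sketch of a credential helper. It is not the script shipped with rules_gcs: the function names (gcloud_token, build_response, handle_get, main) are made up for illustration, and the sketch assumes that the request arrives as a JSON object with a uri field and that gcloud is available on the PATH.

```python
import json
import subprocess


def gcloud_token():
    """Ask the gcloud CLI for an application-default access token."""
    result = subprocess.run(
        ["gcloud", "auth", "application-default", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_response(token):
    """Build the response object: header names mapped to lists of values."""
    return {"headers": {"Authorization": [f"Bearer {token}"]}}


def handle_get(request_text, fetch_token=gcloud_token):
    """Answer one `get` request; `request_text` is the JSON read from stdin."""
    request = json.loads(request_text)
    uri = request.get("uri", "")
    # Defensive check (an assumption of this sketch): only hand out a token
    # for the GCS host, and answer with no extra headers otherwise.
    if not uri.startswith("https://storage.googleapis.com/"):
        return json.dumps({"headers": {}})
    return json.dumps(build_response(fetch_token()))


def main(argv, stdin, stdout, fetch_token=gcloud_token):
    """Entry point for `credential-helper get`; returns a process exit code."""
    if argv[1:] != ["get"]:
        return 1
    stdout.write(handle_get(stdin.read(), fetch_token) + "\n")
    return 0
```

A real script would end with sys.exit(main(sys.argv, sys.stdin, sys.stdout)). Passing the token fetcher as a parameter keeps the JSON handling testable without a gcloud installation.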

How rules_gcs Hooks Into the Credential Helper Protocol

rules_gcs leverages this credential helper protocol to manage access to private GCS buckets securely and efficiently. By providing a pre-configured credential helper script, rules_gcs ensures that users can easily set up secure access without needing to manage tokens or authentication details manually.

Moreover, by limiting the scope of the credential helper to the GCS domain (storage.googleapis.com), rules_gcs reduces the risk of credentials being misused or accidentally exposed. The helper script is designed to be lightweight, relying on existing gcloud credentials, and integrates seamlessly into the Bazel build process.

Installing rules_gcs

Adding rules_gcs to your Bazel project is straightforward. The latest version is available on the Bazel Central Registry. To install, simply add the following to your MODULE.bazel file:

bazel_dep(name = "rules_gcs", version = "1.0.0")

You will also need to include the credential helper script in your repository:

mkdir -p tools
wget -O tools/credential-helper https://raw.githubusercontent.com/tweag/rules_gcs/main/tools/credential-helper
chmod +x tools/credential-helper

Next, configure Bazel to use the credential helper by adding the following lines to your .bazelrc:

common --credential_helper=storage.googleapis.com=%workspace%/tools/credential-helper
# optional setting to make rules_gcs more efficient
common --experimental_repository_cache_hardlinks

These settings ensure that Bazel uses the credential helper specifically for GCS requests. Additionally, the setting --experimental_repository_cache_hardlinks allows Bazel to hardlink files from the repository cache instead of copying them into a repository. This saves time and storage space, but requires the repository cache to be located on the same filesystem as the output base.
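Since hardlinks cannot cross filesystem boundaries, it can be worth checking that the two directories really share a filesystem before enabling the flag. The following Python sketch compares device IDs; same_filesystem is a hypothetical helper, and the two paths would be the ones printed by bazel info repository_cache and bazel info output_base.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True when both paths live on the same device (equal st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

If the function returns False for your repository cache and output base, Bazel has to fall back to copying files, and the optional setting brings no benefit.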

Using rules_gcs in Your Project

rules_gcs provides three primary rules: gcs_bucket, gcs_file, and gcs_archive. Here’s a quick overview of how to use each:

gcs_bucket: When dealing with multiple files from a GCS bucket, the gcs_bucket module extension offers a powerful and efficient way to manage these dependencies. You define the objects in a JSON lockfile, and gcs_bucket handles the rest.

gcs_bucket = use_extension("@rules_gcs//gcs:extensions.bzl", "gcs_bucket")
gcs_bucket.from_file(
    name = "trainingdata",
    bucket = "my_org_assets",
    lockfile = "@//:gcs_lock.json",
)

gcs_file: Use this rule to download a single file from GCS. It’s particularly useful for pulling in assets or binaries needed during your build or test processes. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

gcs_file = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")
gcs_file(
    name = "my_testdata",
    url = "gs://my_org_assets/testdata.bin",
    sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
)

gcs_archive: This rule downloads and extracts an archive from GCS, making it ideal for pulling in entire repositories or libraries that your project depends on. Since it is a repository rule, you have to invoke it with use_repo_rule in a MODULE.bazel file (or wrap it in a module extension).

gcs_archive = use_repo_rule("@rules_gcs//gcs:repo_rules.bzl", "gcs_archive")
gcs_archive(
    name = "magic",
    url = "gs://my_org_code/libmagic.tar.gz",
    sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    build_file = "@//:magic.BUILD",
)
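The sha256 value in the gcs_file and gcs_archive snippets is the SHA-256 digest of empty input, so it reads as a placeholder to be replaced with the digest of your own object. One way to compute that digest for a local copy of the file is sketched below in Python; sha256_of_file is a hypothetical helper, not part of rules_gcs.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned hex string can be pasted into the sha256 attribute as-is.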

Try it Out

rules_gcs is a versatile and simple solution for integrating Google Cloud Storage with Bazel. We invite you to try out rules_gcs in your projects and contribute to its development. As always, we welcome feedback and look forward to seeing how this tool enhances your workflows. Check out the full example to get started!

Thanks to IMAX for sharing their initial implementation of rules_gcs and allowing us to publish the code under an open source license.

Python Packaging in the Real World: Biomedical projects vs. PyPI

Tue, 24 Sep 2024 00:00:00 GMT

The Python programming language, and its huge ecosystem (there are more than 500,000 projects hosted on the main Python repository, PyPI), is used both for software engineering and scientific research. Both have similar requirements for reproducibility. But, as we will see, the practices are quite different.

In fact, the Python ecosystem and community are notorious for the countless ways they use to declare dependencies. As we were developing FawltyDeps¹, a tool to ensure that declared dependencies match the actual imports in the code, we had to accommodate many of these ways. This got us thinking: Could FawltyDeps be used to gain insights into how packaging is done across Python ecosystems?

In this blog post, we look at project structures and dependency declarations across Python projects, both from biomedical scientific papers (as an example of scientific usage of Python) as well as from more general and widely used Python packages. We’ll try to answer the following questions:

What practices does the community actually follow? And how do they differ between software engineering and scientific research?

Could such differences be related to why it’s often hard to reproduce results from scientific notebooks published in the data science community?

Experiment setup

In the following, we discuss the experimental setup — how we decided which data to use, where to get this data from, and what tools we use to analyze it, before we discuss our results in depth.

Data

First, we need to collect the names and source code locations of projects that we want to include in the analysis. Now, where did we find these projects? We selected projects for analysis based on two key areas: impactful real-world applications and broad community adoption.

Biomedical data analysis repositories: biomedical data plays a vital role in healthcare and research. To capture its significance, we focused on packages directly linked to biomedical data, sourced from repositories supported or referenced by scientific biomedical articles. This criterion anchored our experiment in real-world scientific applications.

To analyze software engineering practices, we’ve chosen to use the most popular PyPI packages: acknowledging the importance of widely adopted packages, we included a scan of the most downloaded and frequently used PyPI packages.

Biomedical data

We leverage a recent study by Samuel, S., & Mietchen, D. (2024): Computational reproducibility of Jupyter notebooks from biomedical publications. This study analyzed 2,177 GitHub repositories associated with publications indexed in PubMed Central to assess computational reproducibility. Specifically, we reused the dataset they generated (found here) for our own analyses.

PyPI data

In order to start analyzing actual projects published to PyPI, we still needed to access some basic metadata about these projects: the project’s name, source URL, and any extra metadata which could be useful for further analysis such as project tags.

While this information is available via the PyPI REST API, this API is subject to rate limiting and is not really designed for bulk analyses such as ours. Conveniently, Google maintains a public BigQuery dataset of PyPI download statistics and project metadata which we leveraged instead. As a starting point for our analysis, we produced a CSV with relevant metadata for top packages downloaded in 2023 using a simple SQL query. Since the above-mentioned biomedical database contains 2,177 projects, we conducted a scan of the first 2,000 PyPI packages to create a dataset of comparable size.

Using FawltyDeps to analyze the source code data

With the source URLs of our projects of interest in hand, we downloaded all sources and ran an analysis script that wraps around FawltyDeps on the packages. For safety, all of this happened in a virtual machine.

Post-processing and filtering of FawltyDeps analysis results

While the data we collected from PyPI was quite clean (modulo broken or inaccessible project URLs), the biomedical dataset contained some projects written in R and some projects written in Python 2.X, which are outside of our scope. To further filter for relevant projects that are written in Python 3.X, we applied the following rules:

there should be .py or .ipynb files in the source code directory of the data. If there are only .ipynb files and no imports, then it is most likely an R project and not taken into account.

we are also only interested in Python projects that have 3rd-party imports, as these are the projects we would expect to declare their dependencies.

After these filtering steps, we have 1,260 biomedical projects and 1,118 PyPI packages to be analyzed.
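The two filtering rules can be sketched in Python as follows. ProjectScan is a hypothetical record invented for this illustration (it is not a FawltyDeps type), and the sketch does not try to tell Python 2.X from Python 3.X.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectScan:
    """Hypothetical summary of one scanned project."""
    py_files: int
    ipynb_files: int
    imports: set = field(default_factory=set)              # every imported module
    third_party_imports: set = field(default_factory=set)  # imports that are 3rd-party


def is_relevant(scan):
    """Apply the two filtering rules described above."""
    if scan.py_files == 0 and scan.ipynb_files == 0:
        return False  # neither Python files nor notebooks
    if scan.py_files == 0 and not scan.imports:
        return False  # notebooks only and no imports: most likely an R project
    return bool(scan.third_party_imports)  # must import something 3rd-party
```

For example, a repository holding only notebooks without a single import is dropped, while a repository with one script importing numpy is kept.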

Results

Now that we had crunched thousands of Python packages, we were curious to see what secrets the data produced by FawltyDeps would reveal!

Dependency declaration patterns

First, we investigated which dependency declaration file choices were made in both samples. The following pie charts show the proportion of projects with and without dependency declaration files, and whether these files actually contain dependency declarations.

Figure 1. Percent of projects with dependency declaration files and actual dependency(ies) declared.

We find that about 60% of biomedical projects have dependency declaration files, while for PyPI packages, that number is almost 100%. That is expected, as the top PyPI projects are written to be reproducible: they are downloaded by a large group of people and if they are not working due to lack of dependency declarations, it would be noticed immediately by the users.

Interestingly, we found that some biomedical projects (6.8%) and PyPI packages (16.0%) have dependency declaration files with no dependencies listed inside them. This might be because they genuinely have no third-party dependencies, but more commonly it is a symptom of either:

setup.py files with complex dependency calculations: although FawltyDeps supports parsing simple setup.py files with a single setup() call and no computation involved for setting the install_requires and extras_require arguments, it is currently not able to analyze more complex scenarios.

pyproject.toml might be used to configure tools with sections like [tool.black] or [tool.isort], and declaring dependencies (and other project metadata) in the same file is not strictly required.

For the remainder of the analysis, we do not take these cases into account.
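To illustrate why computed dependencies are hard to extract statically, here is a minimal Python sketch that reads install_requires from a setup.py without executing it. This is not how FawltyDeps is implemented; simple_install_requires is a hypothetical helper that only understands a literal list passed directly to setup().

```python
import ast


def simple_install_requires(setup_py_source):
    """Return install_requires when it is a literal list in a setup() call.

    Returns None when no literal value is found, for instance when the
    list is computed or read from another file.
    """
    tree = ast.parse(setup_py_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Accept both `setup(...)` and `setuptools.setup(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg == "install_requires":
                try:
                    return ast.literal_eval(keyword.value)
                except (ValueError, TypeError):
                    return None  # not a literal: the dependencies are computed
    return None
```

As soon as the argument is a variable or a function call, a static reader like this one has nothing to evaluate, and the file looks as if it declared no dependencies.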

We then examined how different package types utilize various dependency declaration methods. The following chart shows the distribution of requirements.txt, pyproject.toml, and setup files across biomedical projects and PyPI packages (note that these three categories are not exclusive):

Figure 2. Percent of projects with dependencies declared in `requirements.txt`, `pyproject.toml` and setup files.

For biomedical projects, requirements.txt and setup.py/setup.cfg files make up the majority of declaration files. In contrast, PyPI projects show a higher occurrence of pyproject.toml compared to biomedical projects. pyproject.toml is a suggested modern way of declaring dependencies. This result should not come as a surprise: top PyPI projects are actively maintained and are more likely to follow best practices. A requirements.txt file, on the other hand, is easier to add, and it is the simpler option if you do not need to package your project.

Now let’s have a more detailed view in which categories are exclusive:

Figure 3. Distribution of mutually exclusive dependency file choices.

For biomedical data there are a lot of projects that have either requirements.txt or setup.py/setup.cfg files (or a combination of both) present. The traditional method of using setup files utilizing setuptools to create Python packages has been around for a while and is still heavily relied upon in the scientific community.
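The difference between the non-exclusive view (Figure 2) and the mutually exclusive view (Figure 3) comes down to what is counted. The Python sketch below shows both counts on a made-up mapping from project name to the set of declaration file kinds found; the function names and the data layout are assumptions of this illustration.

```python
from collections import Counter

KINDS = ("requirements.txt", "pyproject.toml", "setup.py/setup.cfg")


def non_exclusive_counts(projects):
    """Count each file kind on its own; a project may be counted several times."""
    return Counter(kind for kinds in projects.values() for kind in kinds)


def exclusive_counts(projects):
    """Count each exact combination of file kinds; every project counts once."""
    return Counter(
        " + ".join(kind for kind in KINDS if kind in kinds) or "none"
        for kinds in projects.values()
    )
```

In the exclusive view the counts add up to the number of projects, which is why it suits a chart of mutually exclusive categories.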

On the PyPI side, no single method for declaring dependencies stood out, as different approaches were used with similar frequency across all projects. However, when it comes to using pyproject.toml, PyPI packages were about five times more likely to adopt this method compared to biomedical projects, suggesting that PyPI package authors tend to favor pyproject.toml significantly more often for dependency management.

Also, almost no top biomedical projects (only 2 out of 1,260) and very few PyPI packages (only 25 out of 1,118) used pyproject.toml and setup files together: it seems that projects don’t often mix the older method - setup files - with the more modern one - pyproject.toml - at the same time.

Another way to visualize the subset of results pertaining to requirements.txt, pyproject.toml and setup.py/setup.cfg files is with Venn diagrams:

Figure 4. Venn diagram of projects with dependencies declared with categories including combination of dependency files.

While these diagrams don’t contain new insights, they show clearly how much more common pyproject.toml usage is for PyPI packages.

Source code directories

We next examined where projects store their source code, which we refer to as the “source code directory”. In the following analysis, we defined this directory as the directory that contains the highest number of Python code files and does not have names like “test”, “example”, “sample”, “doc”, or “tutorial”.
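This definition can be sketched in a few lines of Python. The sketch makes two assumptions that the definition leaves open: a file counts only toward its direct parent directory, and a directory is excluded when any component of its path contains one of the listed words. source_code_directory is a hypothetical name.

```python
from collections import Counter
from pathlib import PurePosixPath

EXCLUDED = ("test", "example", "sample", "doc", "tutorial")


def source_code_directory(py_paths):
    """Pick the directory holding the most Python files, skipping excluded names.

    `py_paths` are file paths relative to the project root; "." stands for
    the root itself. Returns None when every directory is excluded.
    """
    counts = Counter()
    for path in py_paths:
        parent = PurePosixPath(path).parent
        if any(word in part.lower() for part in parent.parts for word in EXCLUDED):
            continue
        counts[str(parent)] += 1
    return counts.most_common(1)[0][0] if counts else None
```

The result can then be compared with the project name to decide whether it falls in the "matches the project name", "src", "root directory" or "other" category.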

Figure 5. Source code directories choices.

We can make some interesting observations: Over half (53%) of biomedical projects store their main source code in a directory with a name different than the project itself, and source code is not commonly stored in directories named src or src-python (7%). For PyPI projects, 37% store their main code in a directory that matches the project name. Naming the source code directory differently from the package name is still fairly common for PyPI projects, though, appearing in 36% of cases. A somewhat surprising finding: the src layout, recommended by the Python Packaging User Guide, appears in only 14% of cases.

Another noteworthy observation is that 23% of biomedical projects store all their source code in the root directory of the project. In contrast, only 12% of PyPI projects follow this pattern. This difference makes sense, as scientists working on biomedical projects might be less concerned about maintaining a strict code structure compared to developers on PyPI. Additionally, a lot of biomedical projects might be a loose collection of notebooks/scripts not intended to be packaged/importable, and thus will typically not need to add any subdirectories at all. On the other hand, everything from the PyPI data set is an importable package. Even in the “flat” layout (according to discussion), related modules are collected in a subdirectory named after the package.

The top PyPI projects that keep their code in the root directory are often small Python modules or plugins, like “python-json-patch”, “appdirs”, and “python-json-pointer”. These projects usually have all their source code in a single file, so storing it in the root directory makes sense.

Key results

Many people have preconceptions about how a Python project should look, but the reality can be quite different. Our analysis reveals distinct differences between top PyPI projects and biomedical projects:

PyPI projects tend to use modern tools like pyproject.toml more frequently, reflecting better overall project structure and dependency management practices.

In contrast, biomedical projects display a wide variety of practices; some store code in the root directory and fail to declare dependencies altogether.

This discrepancy is partially explained by the selection criteria: popular PyPI packages, by necessity, must be usable and thus correctly declare their dependencies, while biomedical projects accompanying scientific papers do not face such stringent requirements.

Conclusion

We found that biomedical projects are written with less attention to coding best practices, which compromises their reproducibility. There are many projects without dependencies declared. The use of pyproject.toml, which is the current state-of-the-art way to declare dependencies, is less frequent in biomedical packages. In our opinion, though, it’s essential for any package to adhere to the same high standards of reproducibility as top PyPI packages. This includes implementing robust dependency management practices and embracing modern packaging standards. Enhancing these practices will not only improve reproducibility but also foster greater trust and adoption within the scientific community.

While our initial analysis revealed some interesting insights, we feel that there might be some more interesting treasures to be found within this dataset - you can check yourself in our FawltyDeps-analysis repository! We invite you to join the discussion on FawltyDeps and reproducibility in package management on our Discord channel.

Finally, this experiment also served as a real-world stress test for FawltyDeps itself and identified several edge cases we had not yet accounted for, suggesting avenues of further development for FawltyDeps: One of the main challenges was to parse unconventional require and extra-require sections in setup.py files. This issue has been addressed by the FawltyDeps project, specifically through the improvements made in FawltyDeps PR #440. Furthermore, it was also not trivial to handle projects with multiple packages declared in one. Addressing these issues will be a focus as we continue to refine and improve FawltyDeps.

Stay tuned as we will drill deeper into the data we’ve collected. So far, we’ve reused part of FawltyDeps’ code for our analysis, but the next step will be to run the full FawltyDeps tool on a large number of packages. Join us as we examine how FawltyDeps performs under rigorous testing and what improvements can be made to enhance its capabilities!

¹ For more insights, refer to our previous talk at PyData Global: Finding undeclared and unused dependencies in your notebooks and projects.

Reflecting away from definitions in Liquid Haskell

Thu, 12 Sep 2024 00:00:00 GMT

We’ve all been there: wasting a couple of days on a silly bug. Good news for you: formal methods have never been easier to leverage.

In this post, I will discuss the contributions I made during my internship to Liquid Haskell (LH), a tool that makes proving that your Haskell code is correct a piece of cake.

LH lets you write contracts for your functions inside your Haskell code. In other words, you write pre-conditions (what must be true when you call it) and post-conditions (what must always be true when you leave the function). These are then fed into an SMT solver that proves your code satisfies them! You may have to write a few lemmas to guide LH, but it makes verification easier than proving them completely in a proof assistant.

My contributions enhance the reflection mechanism, which allows LH to unfold function definitions in logic formulas when verifying a program. I have explored three approaches that are described in what follows.

The problem

Imagine that, in the course of your work, you wanted to define a function that inserts into an association list.

{-@ smartInsert :: k:String -> v:Int -> l:[(String, Int)] -> {res : [(String, Int)] | lookup k l = Just v || head res = (k , v) } @-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l

LH runs as a compiler plugin. While the bulk of the compiler ignores the special comments {-@ ... @-}, LH processes the annotations therein.

The annotation that you see in the first snippet is the specification of smartInsert, with the post-condition establishing that the result of the function must have the pair (k, v) at the front, or the pair must be already present in the original list.

Let us say that you also want to use that smartInsert function later in the logic or proofs, so you want to reflect it to the logic. For that, you will introduce another annotation:

{-@ reflect smartInsert @-}

This annotation is telling LH that the equations of the Haskell definition of smartInsert can be used to unfold calls to smartInsert in logic formulas.

As a human, you may agree that the specification is valid for this implementation, but you get this error from the machine:

error: Illegal type specification for `Test.smartInsert`
[...]
Unbound symbol GHC.Internal.List.lookup --- perhaps you meant: GHC.Internal.Base.. ?

Do not despair! This tells you that lookup is not defined in the logic. Despite lookup being a respectable function in Haskell, defined in GHC.List, LH knows nothing about it. Not all functions in Haskell can simply be used in the logic, at least not without reflecting them first. Far from being discouraged, you decide to reflect it like the others, but you realize that lookup wasn’t defined in your own module, it comes from the Prelude! This makes reflection impossible, as LH points out:

error: Cannot lift Haskell function `lookup` to logic
"lookup" is not in scope

If you consider for a moment, LH needs the definition of the function in order to reflect it. So it can only complain when it is asked to reflect a function whose definition is not available because it was defined in some library dependency.

This is a recurring problem, especially when working with dependencies, and this is exactly what I have been working on during this internship at Tweag, in three different ways, as described below.

Idea #1: Define our own reflection of the function

Your first thought might be: “if I cannot reflect lookup because it comes from a foreign library, I will just define my own version of it myself”. Even better would be if you could still link your custom definition of lookup to the original symbol. Creating this link was my first contribution.

Step one is to define the pretend function. For this to work out correctly in the end, its definition must be equivalent to the original definition of the imported function.

The definition of the pretend function might look like this:

myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y):xys)
  | key == x = Just y
  | otherwise = myLookup key xys

So far, so good. Of course, we give it a different name from the actual function, as they refer to different definitions, and we want to be able to refer to both so that we can link them together later.

Now, we reflect this myLookup function, which LH has no problem doing, since this reflect command is located in the same module as its definition.

{-@ reflect myLookup @-}

Then, the magic happens with this annotation that links the two lookups together:

{-@ assume reflect lookup as myLookup @-}

Read it as “reflect lookup, assuming that its definition is the same as myLookup”. This is enough to get the smartInsert function verified. Just for the record, here is the working snippet:

{-@ reflect myLookup @-}
myLookup :: Eq a => a -> [(a, b)] -> Maybe b
myLookup _ [] = Nothing
myLookup key ((x, y):xys)
  | key == x = Just y
  | otherwise = myLookup key xys
{-@ assume reflect lookup as myLookup @-}
{-@ reflect smartInsert
smartInsert :: k:String -> v:Int -> l:[(String, Int)] -> {res : [(String, Int)] | lookup k l = Just v || head res = (k , v) } @-}
smartInsert :: String -> Int -> [(String, Int)] -> [(String, Int)]
smartInsert k v l
  | lookup k l == Just v = l
  | otherwise = (k, v) : l

The question you may be asking at this point is: why does it work?

In order to verify the code, LH has to prove side-conditions (called subtyping relations) between the actual output and the post-condition to be verified. For the first equation of smartInsert, it needs to be proved that

lookup k l = Just v && res = l => lookup k l = Just v || head res = (k , v)

For the second equation, it needs to be proved that

res = (k, v) : l => lookup k l = Just v || head res = (k , v)

Because we started with such a simple example, the reflection of lookup is actually unused here (even though LH conservatively insists on it). But that’s just a coincidence; in fact, we can use a more direct post-condition that does actually use the reflection:

{-@ smartInsert :: k:String -> v:Int -> l:[(String, Int)] -> {res : [(String, Int)] | lookup k res = Just v} @-}

This time, the subtyping constraints require proving:

-- constraint for the first equation
lookup k l = Just v && res = l => lookup k res = Just v
-- constraint for the second equation
res = (k, v) : l => lookup k res = Just v

The first constraint can still be solved without going into the definition of lookup. But the second constraint isn’t something that we can prove for any definition of lookup. Thanks to reflection, we have the following unfoldings at our disposal:

lookup key l = myLookup key l
myLookup key l =
  if isEmpty l then Nothing
  else if key = fst (head l) then Just (snd (head l))
  else myLookup key (tail l)

The first equality is from assume-reflection. It links the pretend and actual functions. The second one is the reflection of myLookup.

With that in mind, let’s move on to prove the second constraint. We reduce the left-hand side to the right-hand side.

lookup k res
  = lookup k ((k, v):l)       (hypothesis)
  = myLookup k ((k, v) : l)   (lookup unfolding)
  = Just v                    (myLookup unfolding)

Q.E.D. Furthermore, you notice that the equation connecting lookup and myLookup was crucial. That is the gist of what we added to LH to make the proof work.
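For readers who want to play with the property outside of LH, here is a small Python transliteration of myLookup and smartInsert, together with a brute-force check of the post-condition on tiny inputs. This is testing, not proof: it only samples a handful of lists, whereas LH establishes the property for all of them. In the sketch, None stands for Nothing and a plain value v stands for Just v.

```python
from itertools import product


def my_lookup(key, pairs):
    """Transliteration of myLookup: first value bound to key, else None."""
    if not pairs:
        return None
    (x, y), rest = pairs[0], pairs[1:]
    return y if key == x else my_lookup(key, rest)


def smart_insert(k, v, pairs):
    """Transliteration of smartInsert."""
    return pairs if my_lookup(k, pairs) == v else [(k, v)] + pairs


def small_lists(keys, values, max_len):
    """Enumerate every association list up to max_len entries."""
    entries = list(product(keys, values))
    for length in range(max_len + 1):
        for combo in product(entries, repeat=length):
            yield list(combo)


def postcondition_holds(keys=("a", "b"), values=(0, 1), max_len=2):
    """Check `lookup k res = Just v` for every small input."""
    return all(
        my_lookup(k, smart_insert(k, v, pairs)) == v
        for pairs in small_lists(keys, values, max_len)
        for k in keys
        for v in values
    )
```

The second branch of smart_insert mirrors the chain of equalities above: looking up k in a list that starts with (k, v) stops at the first entry.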

In addition to the implementation, I contributed a specification of assume-reflection that spells out the validation of the new annotation and the resolution rules when the same function is assume-reflected at different locations. It is worth noting that if there exist two assume-reflections in your imports that contradict each other, then one of them must be false, so your axiom environment will not be sound.

Idea #2: opaque reflection

We noted already that, to prove the first, simpler specification, we didn’t truly need to know what lookup does. That specification was:

{-@ smartInsert :: k:String -> v:Int -> l:[(String, Int)] -> {res : [(String, Int)] | lookup k l = Just v || head res = (k, v) } @-}

The only issue we had was that lookup was not defined in the logic. Similarly, it is possible that our own functions to be reflected use imported, unreflected functions whose content is irrelevant. We want to reflect the expressions of our functions, but do not care about the expression of some of the functions that appear inside them. Here, we want to reflect smartInsert, which contains lookup, but we don’t need to know exactly what lookup does in order to prove our lemmas. This situation arises when lookup comes from a dependency, has a non-trivial implementation, or uses primitives not implemented in Haskell.

We allowed this through what we call opaque reflection. Opaque reflection introduces a symbol, without any equation, for all the symbols in your reflections that aren’t defined yet in the logic.

For instance, when reflecting the definition of smartInsert,

smartInsert k v l | lookup k l == Just v = l | otherwise = (k, v) : l

LH looks for any free symbols in there that are not present in the logic. Here, it will see that lookup is something new to the logic, and it will introduce an uninterpreted function for it. Uninterpreted functions are symbols used by the SMT solver, for which it only knows it satisfies function congruence, i.e. that if two values are equal v = w, then when the function is applied to them, the result is still the same f v = f w.

As it turns out, we could also do that manually using the measure annotation. These annotations let you introduce an uninterpreted function in the logic yourself, and specify the refinement type of it.

For instance, we could define a measure like this:

{-@ measure GHC.Internal.List.lookup :: k:a -> xs:[(a, b)] -> Maybe b
    GHC.Internal.List.lookup :: k:a -> xs:[(a, b)] -> {VV : Maybe b | VV == GHC.Internal.List.lookup k xs} @-}

The measure annotation creates an uninterpreted function with the same name as the function in the Haskell code. The second line links both the uninterpreted and Haskell functions by strengthening the post-condition of the Haskell function with the uninterpreted function from the logic.

The new opaque reflection does all that for you automatically! It’s even more powerful when you think about imports. If two modules are opaque-reflecting the same function from some common import, the uninterpreted symbols are considered the same because they refer to the same thing.

In contrast, if you were to use measure annotations in both imports for the same external function (say, lookup), and then import those in another module, LH would complain about it. Indeed, there cannot be two measures with identical names in scope. Since LH doesn’t know what you’re using those measures for, or whether they actually stand for the same uninterpreted function, it cannot resolve the ambiguity. The full specification is here.

Idea #3: Using the unfoldings

At this point, someone might object that Haskell can inline even imported functions when optimizing the code, so it must have access to the original definitions. As such, there is no need for assume-reflection or opaque-reflection, if we could just reflect the function definition wherever the optimizer finds it.

It is indeed the case for some functions, and under some circumstances (note the precautions I’m taking here), that some information about the implementation of functions is passed in interface files.

What are interface files? These are the files that contain the information that the other modules need to know. Part of this information is the unfoldings of the exported functions, in a syntax that is slightly different from GHC’s CoreExprs, but can easily be converted to it.

After some experimentation, I observed that the unfoldings of many functions are available in interface files, unless prevented by the -fignore-interface-pragmas or -fomit-interface-pragmas flags (note that -O0 implies those flags, but -O1 does not). Since most packages are compiled with at least -O1, the unfoldings of many functions are available without any further tuning. In particular, those functions that are small enough to be included in the interface files are available.

Once implemented, it suffices to use the same reflect annotation as before, but this time even for imported functions!

{-@ reflect flip @-}

LH will automatically detect if this function is defined in the current module or in the dependencies, and in the latter case it will look for possible unfoldings.

Unfortunately, these unfoldings turned out to have some drawbacks.

The presence of these unfoldings depends on some GHC flags, and heuristics from GHC. As such, it’s possible for a new version of a library to suddenly exclude an unfolding without the library author realizing it. This predicament is akin to that of the HERMIT tool, and it is difficult to solve without rebuilding the dependencies with custom configuration.

The unfoldings are based on the optimized version of the functions, which is sometimes harder to reason about. Also, it is subject to change if the GHC optimizations change, which means that any proof based on these unfoldings could be broken by a change to those optimizations.

Many functions cannot be reflected as they are. If they use local recursive definitions, or lambda abstractions, LH cannot reflect them at the moment.

If the unfolding of a function depends on non-exported definitions, LH does not offer a mechanism to request these definitions to be reflected. Even if it did, this breaks encapsulation to some point, and makes our code dependent on internal implementation details of imported code, to the point where even a dot release could break the verification.

Reflections are still limited in their capabilities. At the time of writing, reflected functions cannot contain lambda abstractions or local recursive bindings. Recursive bindings are allowed, but local ones are not, since LH has no sense of locality (yet). Because unfoldings tend to have a lot of these, we cannot reflect them (yet).

For these reasons, further work and experimentation will be needed to make this approach truly useful. Nevertheless, we have included the implementation in a PR in the hope that it may be helpful in some cases, and that improving the capabilities of reflections in general will make it more and more valuable.

Conclusion

Liquid Haskell’s reflection is handy and powerful, but until now, if your function used dependencies that were not yet reflected, you were stuck. We presented three ways to proceed: assert an equivalence between the imported function and a definition in the current module (ideally copy-pasted from the original source file), introduce some uninterpreted function in the logic for dependencies, or try to find the unfoldings of those dependencies in interface files.

All of these features have been implemented and pulled into Liquid Haskell. The implementation fits well into LH’s machinery, reusing the existing pipeline for uninterpreted symbols and reflections. We also added tests, especially for module imports, and checked the implementation against the numerous regression tests already in place. An enticing next step would be to improve the capabilities of reflection, which would also allow diving deeper into the reflection of unfoldings in interface files.

I hope this will improve the ease of proof-writing in LH, and that reading this post will encourage you to write more specifications and proofs about your code, seeing how much of a breeze it can be!

I would like to thank Tweag for this wonderful opportunity to work on Liquid Haskell; it has been an enriching internship that has allowed me to grow in Haskell experience and in contributing to large codebases. In particular, I’d like to express my heartfelt thanks to my supervisor, Facundo Domínguez, for his constant support, guidance, and invaluable assistance.

Adding algebraic data types to Nickel

Thu, 05 Sep 2024 00:00:00 GMT

Our Nickel language is a configuration language. It’s also a functional programming language. Functional programming isn’t a well-defined term: it can encompass anything from being vaguely able to pass functions as arguments and to call them (in that respect, C and JavaScript are functional) to being a statically typed, pure and immutable language based on the lambda-calculus, like Haskell.

However, if you ask a random developer, I can guarantee that one aspect will be mentioned every time: algebraic data types (ADTs) and pattern matching. They are the bread and butter of typed functional languages. ADTs are relatively easy to implement (for language maintainers) and easy to use. They’re part of the 20% of the complexity that makes for 80% of the joy of functional programming.

But Nickel didn’t have ADTs until recently. In this post, I’ll tell the story of Nickel and ADTs, starting from why they were initially lacking, the exploration of different possible solutions and the final design leading to the eventual retro-fitting of proper ADTs in Nickel. This post is intended for Nickel users, for people interested in configuration management, but also for anyone interested in programming language design and functional programming. It doesn’t require prior Nickel knowledge.

A quick primer on Nickel

Nickel is a gradually typed, functional, configuration language. From this point, we’ll talk about Nickel before the introduction of ADTs in the 1.5 release, unless stated otherwise. The core language features:

let-bindings: let extension = ".ncl" in "file.%{extension}"

first-class functions: let add = fun x y => x + y in add 1 2

records (JSON objects): {name = "Alice", age = 42}

static typing: let mult : Number -> Number -> Number = fun x y => x * y. By default, expressions are dynamically typed. A static type annotation makes a definition or an inline expression typechecked statically.

contracts look and act almost like types but are evaluated at runtime: { port | Port = 80 }. They are used to validate configurations against potentially complex schemas.

The lifecycle of a Nickel configuration is to be 1) written, 2) evaluated and 3) serialized, typically to JSON, YAML or TOML. An important guideline that we set first was that every native data structure (record, array, enum, etc.) should be trivially and straightforwardly serializable to JSON. In consequence, Nickel started with the JSON data model: records (objects), arrays, booleans, numbers and strings.

There’s one last primitive value: enums. As in C or in JavaScript, an enum in Nickel is just a tag. An enum value is an identifier with a leading ', such as in {protocol = 'http, server = "tweag.io"}. An enum is serialized as a string: the previous expression is exported to JSON as {"protocol": "http", "server": "tweag.io"}.

So why not just use strings? Because enums can better represent a finite set of alternatives. For example, the enum type [| 'http, 'ftp, 'sftp |] is the type of values that are either 'http, 'ftp or 'sftp. Writing protocol : [| 'http, 'ftp, 'sftp |] will statically (at typechecking time) ensure that protocol doesn’t take forbidden values such as 'https. Even without static typing, using an enum conveys to the reader that a field isn’t a free-form string.

Nickel has a match which corresponds to C or JavaScript’s switch:

is_http : [| 'http, 'ftp, 'sftp |] -> Bool = match { 'http => true, _ => false, }
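For readers who think in TypeScript, the enum type and the match above can be approximated as follows. This is only a sketch by analogy: Protocol, isHttp, endpoint and exportedEndpoint are names made up for the illustration, and a string literal union is a stand-in for a Nickel enum, not a description of how Nickel implements it.

```typescript
// Hypothetical TypeScript analogue of the Nickel enum type [| 'http, 'ftp, 'sftp |].
type Protocol = "http" | "ftp" | "sftp";

// Mirrors the Nickel is_http above: one explicit case plus a catch-all.
function isHttp(protocol: Protocol): boolean {
  switch (protocol) {
    case "http":
      return true;
    default:
      return false;
  }
}

// A record whose protocol field is restricted to the three tags.
const endpoint: { protocol: Protocol; server: string } = {
  protocol: "http",
  server: "tweag.io",
};

// Like a Nickel enum, the tag ends up as a plain string in the exported JSON.
const exportedEndpoint: string = JSON.stringify(endpoint);

// const bad: Protocol = "https"; // rejected by the typechecker
```

As in Nickel, the forbidden value is caught before anything is evaluated, while the exported data stays ordinary JSON.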

As you might notice, there are no ADTs in sight yet.

ADTs in a configuration language

While Nickel is a functional language, it’s first and foremost a configuration language, which comes with specific design constraints.

Because we’re telling the story of ADTs before they landed in Nickel, we can’t really use a proper Nickel syntax yet to provide examples. In what follows, we’ll use a Rust-like syntax to illustrate the examples: enum Foo { Bar(i32), Baz(bool, T) } is an ADT parametrized by a generic type T with two constructors Bar and Baz, where the first one takes an integer as an argument and the other takes a pair of a boolean and a T. Concrete values are written as Bar(42) or Baz(true, "hello").

An unexpected obstacle: serialization

As said earlier, we want values to be straightforwardly serializable to the JSON data model.

Now, take a simple ADT such as enum Foo { SomePair(T,U), Nothing }. You can find reasonable serializations for SomePair(1,2), such as {"tag": "SomePair", "a": 1, "b": 2}. But why not {"flag": "SomePair", "0": 1, "1": 2} or {"mark": "SomePair", "data": [1, 2]}? While those representations are isomorphic, it’s hard to know the right choice for the right use-case beforehand, as it depends on the consumer of the resulting JSON. We really don’t want to make an arbitrary choice on behalf of the user.
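To make the ambiguity concrete, here is a small TypeScript sketch that serializes the same value in the three ways listed above. The SomePair type and the three encoder names are hypothetical and exist only for this illustration.

```typescript
// A hypothetical in-memory form of the ADT value SomePair(1, 2).
type SomePair = { kind: "SomePair"; first: number; second: number };

const pair: SomePair = { kind: "SomePair", first: 1, second: 2 };

// Encoding 1: a "tag" field and named payload fields.
const namedFields = (p: SomePair): string =>
  JSON.stringify({ tag: p.kind, a: p.first, b: p.second });

// Encoding 2: a "flag" field and positional payload fields.
const positionalFields = (p: SomePair): string =>
  JSON.stringify({ flag: p.kind, "0": p.first, "1": p.second });

// Encoding 3: a "mark" field and the payload packed in an array.
const arrayPayload = (p: SomePair): string =>
  JSON.stringify({ mark: p.kind, data: [p.first, p.second] });
```

All three carry the same information, yet a consumer written against one of them cannot read the other two, which is why picking one on behalf of the user is uncomfortable.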

Additionally, while ADTs are natural for a classical typed functional language, they might not entirely fit the configuration space. A datatype like enum Literal { String(String), Number(Number) } that can store either a string or a number is usually represented directly as an untagged union in a configuration, that is {"literal": 5} or {"literal": "hello"}, instead of the less natural tagged union (another name for ADTs) {"literal": {"tag": "Number", "value": 5}}.

This led us to look at (untagged) union types instead. Untagged unions have the advantage of not making any choice about the serialization: they aren’t a new data structure, as are ADTs, but rather new types (and contracts) to classify values that are already representable.

The road of union types

A union type is a type that accepts different alternatives. We’ll use the fictitious \/ type combinator to write a union in Nickel (| is commonly used elsewhere but it’s already taken in Nickel). Our previous example of a literal that can be either a string or a number would be {literal: Number \/ String}. Those types are broadly useful independently of ADTs. For example, JSON Schema features unions through the core combinator anyOf.
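TypeScript already has untagged unions, so the fictitious {literal: Number \/ String} can be sketched there. The names LiteralConfig, describeLiteral, numericConfig and textConfig are invented for this example.

```typescript
// Hypothetical TypeScript rendering of the fictitious {literal: Number \/ String}.
type LiteralConfig = { literal: number | string };

function describeLiteral(config: LiteralConfig): string {
  const value = config.literal;
  // typeof inspects the value itself: no tag is stored in the data.
  if (typeof value === "number") {
    return "number " + value.toFixed(1);
  }
  return "string of length " + value.length;
}

const numericConfig: LiteralConfig = { literal: 5 };
const textConfig: LiteralConfig = { literal: "hello" };
```

Both configurations serialize exactly as written, {"literal": 5} and {"literal": "hello"}, which is the property that made unions attractive for a configuration language.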

Our hope was to kill two birds with one stone by adding unions both as a way to better represent existing configuration schemas, but also as a way to emulate ADTs. Using unions lets users represent ADTs directly as plain records using their preferred serialization scheme. Together with flow-sensitive typing, we can get as expressive as ADTs while letting the user decide on the encoding. Here is an example in a hypothetical Nickel enhanced with unions and flow-sensitive typing:

let sum : {tag = 'SomePair, a : Number, b : Number} \/ {tag = 'Nothing} -> Number = match { {tag = 'SomePair, a, b} => a + b, {tag = 'Nothing} => 0, }

Using unions and flow-sensitive typing as ADTs is the approach taken by TypeScript, where the previous example would be:

type Foo = { tag: "SomePair"; a: number; b: number } | { tag: "Nothing" }

function sum(value: Foo): number {
  switch (value.tag) {
    case "SomePair":
      return value.a + value.b
    case "Nothing":
      return 0
  }
}

In Nickel, any type must have a contract counterpart. Alas, union and intersection contracts are hard (in fact, union types alone are also not a trivial feat to implement!). In the linked blog post, we hint at possible pragmatic solutions for union contracts that we finally got to implement for Nickel 1.8. While sufficient for practical union contracts, this is far from the general union types that could subsume ADTs. This puts a serious stop to the idea of using union types to represent ADTs.

What are ADTs really good for?

As we have been writing more and more Nickel, we realized that we have been missing ADTs a lot for library functions - typically the types enum Option { Some(T), None } and enum Result { Ok(T), Error(E) } - where we don’t care about serialization. Those ADTs are “internal” markers that wouldn’t leak out to the final exported configuration.

Here are a few motivating use-cases.

std.string.find

std.string.find is a function that searches for a substring in a string. Its current type is:

String -> String -> { matched : String, index : Number, groups : Array String }

If the substring isn’t found, {matched = "", index = -1, groups = []} is returned, which is error-prone if the consumer doesn’t defend against such values. We would like to return a proper ADT instead, such as Found {matched : String, index : Number, groups : Array String} or NotFound, which would make for a better and a safer interface¹.
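The difference between the sentinel record and a Found/NotFound result can be sketched in TypeScript. This is an illustration under simplifying assumptions: findSubstring is a hypothetical helper that does a plain substring search with an argument order chosen for the example, and it always leaves groups empty.

```typescript
// The ADT-style result: either a match with its data, or an explicit NotFound.
type FindResult =
  | { tag: "Found"; matched: string; index: number; groups: string[] }
  | { tag: "NotFound" };

// Hypothetical helper: a plain substring search, with no capture groups.
function findSubstring(needle: string, haystack: string): FindResult {
  const index = haystack.indexOf(needle);
  if (index < 0) {
    return { tag: "NotFound" };
  }
  return { tag: "Found", matched: needle, index: index, groups: [] };
}

// The consumer has to look at the tag before it can touch index or matched,
// so a "-1" can never be used as a position by accident.
function describeFind(result: FindResult): string {
  switch (result.tag) {
    case "Found":
      return 'found "' + result.matched + '" at ' + result.index;
    case "NotFound":
      return "not found";
  }
}
```

With the sentinel record, forgetting the index == -1 check is silent; with the tagged result, skipping the check does not typecheck.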

Contract definition

Contracts are a powerful validation system in Nickel. The ability to plug in your own custom contracts is crucial.

However, the general interface to define custom contracts can seem bizarre. Custom contracts need to set error reporting data on a special label value and use the exception-throwing-like std.contract.blame function. Here is a simplified definition of std.number.Nat which checks that a value is a natural number:

fun label value =>
  if std.typeof value == 'Number then
    if value % 1 == 0 && value >= 0 then
      value
    else
      let label = std.contract.label.with_message "not a natural" in
      std.contract.blame label
  else
    let label = std.contract.label.with_message "not a number" in
    std.contract.blame label

There are good (and bad) reasons for this situation, but if we had ADTs, we could cover most cases with an alternative interface where custom contracts return a Result, which is simpler and more natural:

fun value =>
  if std.typeof value == 'Number then
    if value % 1 == 0 && value >= 0 then
      Ok
    else
      Error("not a natural")
  else
    Error("not a number")

Of course, we could just encode this using a record, but it’s just not as nice.
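The Result-returning style can be mimicked in TypeScript to see why it reads more naturally. In this sketch, ContractResult and checkNat are hypothetical names, and the Ok case carries no payload, as in the Rust-like pseudocode above.

```typescript
// A Result-like type specialised to contract checks (hypothetical).
type ContractResult = { tag: "Ok" } | { tag: "Error"; message: string };

// Same checks as the simplified Nat contract: a number, integral, non-negative.
function checkNat(value: unknown): ContractResult {
  if (typeof value !== "number") {
    return { tag: "Error", message: "not a number" };
  }
  if (value % 1 === 0 && value >= 0) {
    return { tag: "Ok" };
  }
  return { tag: "Error", message: "not a natural" };
}
```

The checker is an ordinary function returning a value: there is no label to thread through and no exception-like call, and the caller decides how to report the error.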

Let it go, let it go!

The list of other examples of using ADTs to make libraries nicer is endless.

Thus, for the first time, we decided to introduce a native data structure that isn’t serializable.

Note that this doesn’t break any existing code and is forward-compatible with making ADTs serializable in the future, should we change our mind and settle on one particular encoding. Besides, another feature is being explored independently to make serialization more customizable through metadata, which would let users plug in custom (de)serializers for ADTs easily.

Ok, let’s add the good old-fashioned ADTs to Nickel!

The design

Structural vs nominal

In fact, we won’t exactly add the old-fashioned version. ADTs are traditionally implemented in their nominal form.

A nominal type system (such as that of C, Rust, Haskell, Java, etc.) decides if two types are equal based on their name and definition. For example, values of enum Alias1 { Value(String) } and enum Alias2 { Value(String) } are entirely interchangeable in practice, but Rust still doesn’t accept Alias1::Value(s) where an Alias2 is expected, because those types have distinct definitions. Similarly, you can’t swap a class for another in Java just because they have exactly the same fields and methods.

A structural type system, on the other hand, only cares about the shape of data. TypeScript has a structural type system. For example, the types interface Ball { diameter: number; } and interface Sphere { diameter: number; } are entirely interchangeable, and {diameter: 42} is both a Ball and a Sphere. Some languages, like OCaml² or Go³, mix both.
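The Ball and Sphere example can be checked directly in TypeScript. The volume function and the ball and ballVolume values are additions made for this sketch.

```typescript
interface Ball {
  diameter: number;
}

interface Sphere {
  diameter: number;
}

// Declared against Sphere only.
function volume(sphere: Sphere): number {
  const radius = sphere.diameter / 2;
  return (4 / 3) * Math.PI * radius * radius * radius;
}

const ball: Ball = { diameter: 2 };

// Accepted: Ball and Sphere have the same shape, so their names are irrelevant.
const ballVolume: number = volume(ball);
```

In a nominal system the last line would be rejected, because Ball and Sphere are distinct declarations even though their fields coincide.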

Nickel’s current type system is structural because it’s better equipped to handle arbitrary JSON-like data. Because ADTs aren’t serializable, this consideration doesn’t weigh as much for our motivating use-cases, meaning ADTs could still be either nominal or structural.

However, nominal types aren’t really usable without some way of exporting and importing type definitions, which Nickel currently lacks. Structural ADTs look like the better choice for Nickel. We can build, typecheck, and match on ADTs locally without having to know or to care about any type declaration. Structural ADTs are a natural extension of Nickel (structural) enums, syntactically, semantically, and on the type level, as we will see.

While less common, structural ADTs do exist in the wild and they are pretty cool. OCaml has both nominal ADTs and structural ADTs, the latter being known as polymorphic variants. They are an especially powerful way to represent a non-trivial hierarchy of overlapping data types, such as abstract syntax trees or sets of error values.

Syntax

C-style enums are just a special case of ADTs, namely ADTs where constructors don’t have any argument. The dual conclusion is that ADTs are enums with arguments. We thus write the ADT Some("hello") as an enum with an argument in Nickel: 'Some "hello".

We apply the same treatment to types. [| 'Some, 'None |] was a valid enum type, and now [| 'Some String, 'None |] is also a valid type (which would correspond to Rust’s Option).

There is a subtlety here: what should be the type inferred for 'Some now? In a structural type system, 'Some is just a free-standing symbol. The typechecker can’t know if it’s a constant that will stay as it is - and thus has the type [| 'Some |] - or a constructor that will be eventually applied, of type a -> [| 'Some a |]. This difficulty just doesn’t exist in a nominal type system like Rust: there, Option::Some refers to a unique, fixed ADT constructor that is known to require precisely one argument.

To make it work, 'Ok 42 isn’t actually a normal function application in Nickel: it’s an ADT constructor application, and it’s parsed differently. We just repurpose the function application syntax⁴ in this special case. 'Ok isn’t a function, and let x = 'Ok in x 42 is an error (applying something that isn’t a function).

You can still recover Rust-style constructors that can be applied by defining a function (eta-expanding, in the functional jargon): let ok = fun x => 'Ok x.

We restrict ADTs to a single argument. You can use a record to emulate multiple arguments: 'Point {x = 1, y = 2}.
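The two conventions above, a constructor wrapped in a function and a record standing in for several arguments, can be sketched in TypeScript. The Tagged type and the ok, point and wrapped names are hypothetical, and the tag/arg record is just one possible model of a structural variant with a single argument.

```typescript
// A structural variant with exactly one argument (hypothetical model).
type Tagged<Tag extends string, Arg> = { tag: Tag; arg: Arg };

// Counterpart of `let ok = fun x => 'Ok x`: a real function that can be passed around.
function ok<T>(x: T): Tagged<"Ok", T> {
  return { tag: "Ok", arg: x };
}

// Counterpart of 'Point {x = 1, y = 2}: one argument, which happens to be a record.
const point: Tagged<"Point", { x: number; y: number }> = {
  tag: "Point",
  arg: { x: 1, y: 2 },
};

// Because ok is an ordinary function, it can be mapped over an array.
const wrapped = [1, 2, 3].map(ok);
```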

ADTs also come with pattern matching. The basic switch that was match is now a powerful pattern matching construct, with support for ADTs but also arrays, records, constants, wildcards, or-patterns and guards (if side-conditions).

Typechecking

Typechecking structural ADTs is a bit different from nominal ADTs. Take the simple example (the enclosing : _ annotation is required to make the example statically typed in Nickel)

( let data = 'Ok 42 in let process = match { 'Ok x => x + 1, 'Error => 0, } in process data ) : _

process is inferred to have type [| 'Ok Number, 'Error |] -> Number. What type should we infer for data = 'Ok 42? The most obvious one is [| 'Ok Number |]. But then [| 'Ok Number |] and [| 'Ok Number, 'Error |] don’t match and process data doesn’t typecheck! This is silly, because this example should be perfectly valid.

One possible solution is to introduce subtyping, which is able to express this kind of inclusion relation: here that [| 'Ok Number |] is included in [| 'Ok Number, 'Error |]. However, subtyping has some defects and is a whole can of worms when mixed with polymorphism (which Nickel has).

Nickel rather relies on another approach called row polymorphism, which is the ability to abstract over not just a type, as in classical polymorphism, but a whole piece of an enum type. Row polymorphism is well studied in the literature, and is for example implemented in PureScript. Nickel already features row polymorphism for basic enum types and for record types.

Here is how it works:

let process : forall a. [| 'Ok Number, 'Error; a |] -> Number = match { 'Ok x => x + 1, 'Error => 0, _ => -1, } in process 'Other

Because there’s a catch-all case _ => -1, the type of process is polymorphic, expressing that it can handle any other variant besides 'Ok Number and 'Error (this isn’t entirely true: 'Ok String is forbidden for example, because it can’t be distinguished from 'Ok Number). Here, a can be substituted for a subsequence of an enum type, such as 'Foo Bool, 'Bar {x : Number}.

Equipped with row polymorphism, we can infer the type forall a. [| 'Ok Number; a |]⁵ for 'Ok 42. When typechecking process data in the original example, a will be instantiated to the single row 'Error and the example typechecks. You can learn more about structural ADTs and row polymorphism in the corresponding section of the Nickel user manual.

Conclusion

While ADTs are part of the basic package of functional languages, Nickel didn’t have them until relatively recently because of peculiarities of the design of a configuration language. After exploring the route of union types, which came to a dead-end, we settled on a structural version of ADTs that turns out to be a natural extension of the language and didn’t require too much new syntax or concepts.

ADTs already prove useful to write cleaner and more concise code, and to improve the interface of libraries, even in a gradually typed configuration language. Some concrete usages can be found in try_fold_left and validators already.

¹ Unfortunately, we can’t change the type of std.string.find without breaking existing programs (at least not until a Nickel 2.0), but this use-case still applies to external libraries or future stdlib functions.

² In OCaml, objects, polymorphic variants and modules are structural while records and ADTs are nominal.

³ In Go, interfaces are structural while structs are nominal.

⁴ Repurposing application is theoretically backward incompatible because 'Ok 42 was already valid Nickel syntax before 1.5, but it was meaningless (an enum applied to a constant) and would always error out at runtime, so it’s ok.

⁵ In practice, we infer a simpler type [| 'Ok Number; ?a |] where ?a is a unification variable which can still have limitations. Interestingly, we decided early on to not perform automatic generalization, as opposed to the ML tradition, for reasons similar to the ones exposed here. Doing so, we get (predicative) higher-rank polymorphism almost for free, while it’s otherwise quite tricky to combine with automatic generalization. It turned out to pay off in the case of structural ADTs, because it makes it possible to side-step those usual enum types inclusion issues (widening) by having the user add more polymorphic annotations. Or we could even actually infer the polymorphic type forall a. [| 'Ok Number; a |] for literals.

Deploying Buildbarn on Kubernetes with mTLS on the side

Thu, 29 Aug 2024 00:00:00 GMT

We have shown the benefits of using a shared build cache as well as using remote build execution (RBE) to offload builds to a remote build farm. Our customers are interested in leveraging RBE to improve developer experience and reduce continuous integration (CI) run times, giving us an opportunity to learn all aspects of deploying different RBE solutions. I would like to share how one can deploy one of them, Buildbarn, and secure all communications in it.

What is it and why do we care?

We want developers to be productive. Being productive requires spending as little time as possible waiting for build/test feedback, not having to switch to a different task while the build is running.

Remote caching

One part of achieving this is to never build the same thing twice. Tools like Bazel support caching the result of every action, that is, of every tool execution. While many tools support storing results in a local directory, Bazel tracks the actions and their inputs with high granularity, resulting in more frequent “cache hits”. This is already a good gain for a single developer working on one machine. However, Bazel also supports conducting builds in a controlled environment with identical tooling and using a remote cache that can be shared between team members and CI, taking things a significant step further. You won’t have to rebuild anything that has been built by your colleagues or by CI, which means starting up on a new machine, onboarding a new team member or reproducing issues becomes faster.
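The idea of caching per action can be illustrated with a toy cache in TypeScript. This is a deliberately simplified sketch and not how Bazel or Buildbarn are implemented: Action, actionKey and ActionCache are hypothetical names, and the key is the serialized action itself, where a real system would use a digest of it.

```typescript
// An action: a command line plus the contents of its input files (hypothetical shape).
type Action = { command: string[]; inputs: { [path: string]: string } };

// Deterministic key: the command, then the inputs sorted by path.
function actionKey(action: Action): string {
  const sortedInputs = Object.keys(action.inputs)
    .sort()
    .map((path) => [path, action.inputs[path]]);
  return JSON.stringify([action.command, sortedInputs]);
}

class ActionCache {
  private readonly results: { [key: string]: string } = {};
  executions = 0;

  // Returns the cached output when the same action was seen before,
  // and only calls `execute` on a cache miss.
  run(action: Action, execute: (action: Action) => string): string {
    const key = actionKey(action);
    const cached = this.results[key];
    if (cached !== undefined) {
      return cached;
    }
    this.executions += 1;
    const output = execute(action);
    this.results[key] = output;
    return output;
  }
}
```

If the ActionCache instance lives on a server shared by developers and CI instead of in one process, a result produced by one of them is a cache hit for all the others, which is the step from local to remote caching described above.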

Remote build execution

The second part of keeping developers productive is allowing them to use the right tools for the job. They still often need to build new things, and their local machine may not be the fastest, may not have enough charge, or may have the wrong architecture or OS. Remote build execution extends remote caching by executing actions on shared builders when their results are not cached already. This allows setting up a shared pool of necessary hardware or virtual compute for both developers and CI. In Bazel this was implemented using the RBE API.

RBE implementations

Since the last post, RBE for Google Cloud Platform (GCP) has disappeared, and several new self-service and commercial services have been created. The RBE API has also gained popularity with different build systems, including Bazel (where it started), Buck2, and BuildStream. It is also used in projects that cannot change their build systems easily, but can use reclient to wrap all build actions and forward them to an RBE service. Examples of such setup include Android, Fuchsia and Chromium.

We’ll focus on one of the open-source RBE API servers, Buildbarn.

Securing remote cache and builds

Any shared infrastructure implies some security risks. When sending code to be built remotely we expose it on the network, where it can be intercepted or altered. When reading from the cache, we trust it to contain valid, unaltered results. When setting up a pool of compute resources, we expect them to be used only for building our code, and not for enriching third parties. All these expectations mean that we require all communications with remote infrastructure and within it to be encrypted and authenticated. The industry standard for achieving this is mTLS: Transport Layer Security (TLS) protocol with mutual authentication. It uses public key infrastructure (PKI) to allow both clients and servers to verify each other’s identities before sending any data, and makes sure that the data sent on one side matches the data received on the other side.

Overview

In this extended blog post we’ll start by showing how to deploy Buildbarn on a Kubernetes cluster running in a local VM and configure a simple Bazel example to use it. Then we’ll turn on mTLS with the help of cert-manager for all Buildbarn pieces communicating with one another, and, finally, configure Bazel on a developer or CI machine to authenticate over the RBE API with a certificate and verify the one presented by the build server.

This blog post contains a lot of code snippets that let you follow the installation process step by step. If you copy each command into your terminal in order, you should see the same results as described. If you prefer to jump to the final result and look at the complete picture, you can check out our fork of the upstream buildbarn/bb-deployments repository and follow the instructions there.

Deploying Buildbarn

In this section we’ll create a local Buildbarn deployment on a Kubernetes cluster running in a VM. We’ll create a local VM with Kubernetes using an example config provided by lima. Then we’ll configure persistent volumes for Buildbarn storage inside that VM. After that we’ll use the Kubernetes example from a repository provided by Buildbarn to deploy Buildbarn itself.

Setting up a Kubernetes instance

If you already have access to a Kubernetes cluster that you can use, you can skip this section. Here we’ll deploy a local VM with Kubernetes running in it. In subsequent steps below it’s assumed that you’re using a local VM, so you’ll have to adjust some parameters accordingly if you use different means.

I’ve found that the easiest and most portable way to get a Kubernetes cluster running locally is using the lima (Linux Machines) project. You can follow the official docs to install it. I prefer using Nix and direnv, so I’ve created a .envrc file with one line, use nix, and a shell.nix with the following contents:

{ nixpkgs ? builtins.getFlake "nixpkgs" , system ? builtins.currentSystem , pkgs ? nixpkgs.legacyPackages.${system} }: pkgs.mkShell { packages = with pkgs; [ kubectl lima-bin jq ]; }

Then you just need to run direnv allow and it will fetch the necessary packages and make them available in your shell.

Now we can create a Lima VM from the k8s template. We remove mounts from the template to specify our own later. We also need to add some special options for running on macOS:

limactl create template://k8s --name k8s --tty=false \
  --set '.provision |= . + {"mode":"system","script":"#!/bin/bash
for d in /mnt/fast-disks/vol{0,1,2,3}; do sudo mkdir -p $d; sudo mount --bind $d $d; done"}' \
  $([ "$(uname -s)" = "Darwin" ] && { echo "--vm-type vz"; [ "$(uname -m)" = "arm64" ] && echo "--rosetta"; })

The arguments are:

--name k8s sets a name for the new VM; it defaults to the template name, but let’s keep it explicit

--set '.provision ...' uses a jq expression to add an additional provision step to the resulting YAML file creating necessary mountpoints for persistent volumes

--tty=false disables console prompts and confirmations

for macOS we also add --vm-type vz to use the native macOS Virtualization framework instead of QEMU for a faster VM

for Apple Silicon we also add --rosetta to enable the translation layer, allowing us to run x86_64 containers in the VM with little overhead

You can start the final VM and check if it is ready with:

limactl start k8s
export KUBECONFIG=~/.lima/k8s/copied-from-guest/kubeconfig.yaml
kubectl get node

It will take some time to bootstrap Kubernetes, after which it should show you one node called lima-k8s with Ready status:

NAME       STATUS   ROLES           AGE     VERSION
lima-k8s   Ready    control-plane   4m54s   v1.29.2

Buildbarn will need some PersistentVolumes to store data. Let’s teach it to use the mounts that we created earlier for that. First, configure a storage class:

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-disks
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF

It should respond with storageclass.storage.k8s.io/fast-disks created.

Then start a local volume provisioner from sig-storage-local-static-provisioner:

curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-storage-local-static-provisioner/master/deployment/kubernetes/example/default_example_provisioner_generated.yaml | kubectl apply -f -

Run kubectl get pv to see that it created four volumes. They may take several seconds to appear. You can check the provisioner’s logs for any errors with kubectl logs daemonset/local-volume-provisioner.

Deploying Buildbarn

bb-deployments provides a Kustomize template to deploy Buildbarn. Let’s clone it, patch one service so that we can run it locally, and deploy:

git clone https://github.com/buildbarn/bb-deployments.git
pushd bb-deployments/kubernetes
cat >> kustomization.yaml <<EOF
# patch frontend service to not require external load balancers
patches:
  - target:
      kind: Service
      name: frontend
    patch: |
      - op: replace
        path: /spec/type
        value: NodePort
      - op: add
        path: /spec/ports/0/nodePort
        value: 30080
EOF
kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"

The last command will wait for everything to start. We’ve filtered out all messages about resources that it doesn’t know how to wait for.

To check that the Buildbarn frontend is accessible, we can use grpc-client-cli. Add it to the list in shell.nix, save it and run:

grpc-client-cli -a 127.0.0.1:30080 health

It should report that it is SERVING:

{ "status": "SERVING" }

We can exit the bb-deployments directory now:

popd

In this section we’ve deployed Buildbarn and verified that its API is accessible. Now we’ll move on to setting up a small Bazel project to use it. Then we’ll configure mTLS on Buildbarn, and finally configure Bazel to work with mTLS.

Using Buildbarn

Let’s set up a small Bazel project to use our Buildbarn instance. In this section we’ll use the Bazel examples repo and show how to build it using Bazel locally and with RBE. We’ll also see how remote caching speeds up builds by caching intermediate results.

We will be using Bazelisk to fetch and run the upstream distribution of Bazel. First we’ll need to install Bazelisk by adding bazelisk to shell.nix. If you are running NixOS, you will have to create an FHS environment to run Bazel. If you are running macOS and don’t have the Xcode command line tools installed, you also need to provide the necessary libraries to the bazel invocation. Add this to your shell.nix:

pkgs.mkShell {
  packages = with pkgs; [ ... bazelisk ];
  env = pkgs.lib.optionalAttrs pkgs.stdenv.isDarwin {
    BAZEL_LINKOPTS = with pkgs.darwin.apple_sdk;
      "-F${frameworks.Foundation}/Library/Frameworks:-L${objc4}/lib";
    BAZEL_CXXOPTS = "-I${pkgs.libcxx.dev}/include/c++/v1";
  };
  # fhs is only used on NixOS
  passthru.fhs = (pkgs.buildFHSUserEnv {
    name = "bazel-userenv";
    runScript = "zsh"; # replace with your shell of choice
    targetPkgs = pkgs: with pkgs; [
      libz # required for bazelisk to unpack Bazel itself
    ];
  }).env;
}

Then on NixOS you can run nix-shell -A fhs to enter an environment where directories like /bin, /usr and /lib are set up as tools made for other Linux distributions expect.

Now we can clone the Bazel examples repo and enter the simple C++ example in it:

git clone --depth 1 https://github.com/bazelbuild/examples
pushd examples/cpp-tutorial/stage1

On macOS we’ll need to configure compiler and linker flags to look for libraries in the Nix store:

echo "build:macos --action_env=BAZEL_CXXOPTS=${BAZEL_CXXOPTS}" >> .bazelrc
echo "build:macos --action_env=BAZEL_LINKOPTS=${BAZEL_LINKOPTS}" >> .bazelrc

We will be building remotely for the Linux platform later, so we should specify a concrete platform and toolchain to use for Linux:

echo "build:linux --platforms=@aspect_gcc_toolchain//platforms:x86_64_linux" >> .bazelrc
echo "build:linux --extra_execution_platforms=@aspect_gcc_toolchain//platforms:x86_64_linux" >> .bazelrc

And then build and run the example locally:

bazelisk run //main:hello-world

You should see output like:

Starting local Bazel server and connecting to it...
INFO: Analyzed target //main:hello-world (38 packages loaded, 165 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
  bazel-bin/main/hello-world
INFO: Elapsed time: 7.545s, Critical Path: 0.94s
INFO: 8 processes: 6 internal, 2 processwrapper-sandbox.
INFO: Build completed successfully, 8 total actions
INFO: Running command line: bazel-bin/main/hello-world
Hello world

Note that if we run bazelisk run //main:hello-world again, it’ll be much faster, because Bazel only spends a fraction of a second on computing the action graph and making sure that nothing needs to be rebuilt:

...
INFO: Elapsed time: 0.113s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
...

We can also run bazelisk clean to remove previous output and re-run it to make sure we can rebuild from scratch.

Now let’s try building it using Buildbarn. First we need to configure execution properties to match the ones set up in Buildbarn’s worker config:

echo "build:remote --remote_default_exec_properties OSFamily=linux" >> .bazelrc
echo "build:remote --remote_default_exec_properties container-image=docker://ghcr.io/catthehacker/ubuntu:act-22.04@sha256:5f9c35c25db1d51a8ddaae5c0ba8d3c163c5e9a4a6cc97acd409ac7eae239448" >> .bazelrc

Then we should tell Bazel to use Buildbarn as a remote executor:

echo "build:remote --remote_executor grpc://127.0.0.1:30080" >> .bazelrc
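Each echo ... >> .bazelrc appends a new line every time it runs, so repeating a step from this post clutters .bazelrc with duplicate entries. As a minimal sketch, the append can be made idempotent. Note that append_once is a hypothetical helper of ours, not part of Bazel or Bazelisk, and the example writes to a temporary file standing in for .bazelrc:

```shell
# Hypothetical helper: append a line to a file only if that exact line is not there yet.
append_once() {
  # $1 = file, $2 = line to add
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

rc=$(mktemp)  # stand-in for .bazelrc
append_once "$rc" 'build:remote --remote_executor grpc://127.0.0.1:30080'
append_once "$rc" 'build:remote --remote_executor grpc://127.0.0.1:30080'
cat "$rc"     # the line is present only once
```

The -x and -F flags make grep compare whole lines literally, so option strings containing dots or slashes are not interpreted as patterns.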

Now we can build it with bazelisk build --config=linux --config=remote //main:hello-world. Note that it will take some time to extract the Linux compiler and supplemental files first:

INFO: Invocation ID: d70b9d30-1865-4d1f-8d52-77c6fc5ec607
INFO: Build options --extra_execution_platforms, --incompatible_enable_cc_toolchain_resolution, and --platforms have changed, discarding analysis cache.
INFO: Analyzed target //main:hello-world (3 packages loaded, 6315 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
  bazel-bin/main/hello-world
INFO: Elapsed time: 96.249s, Critical Path: 52.72s
INFO: 5 processes: 3 internal, 2 remote.
INFO: Build completed successfully, 5 total actions

As you can see, two actions were executed remotely: compilation and linking. But we can find the result locally in bazel-bin/main/hello-world (and run it if we’re on an appropriate platform):

% file bazel-bin/main/hello-world
bazel-bin/main/hello-world: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 4.9.0, not stripped

Now if we clean local caches and rebuild, we can see that it reuses results already stored in Buildbarn (remote cache hits):

% bazelisk clean
INFO: Invocation ID: d655d3f2-071d-48ff-b3e9-e0b1c61ae5fb
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
% bazelisk build --config=linux --config=remote //main:hello-world
INFO: Invocation ID: d38526d8-0242-4b91-92da-20ddd110d3ae
INFO: Analyzed target //main:hello-world (41 packages loaded, 6315 targets configured).
INFO: Found 1 target...
Target //main:hello-world up-to-date:
  bazel-bin/main/hello-world
INFO: Elapsed time: 0.663s, Critical Path: 0.07s
INFO: 5 processes: 2 remote cache hit, 3 internal.
INFO: Build completed successfully, 5 total actions
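The process summary line is also easy to check from a script, for example to confirm that a clean rebuild really was served from the remote cache. Here is a small sketch that uses the summary line from the output above as sample input (the variable names are ours):

```shell
# Sample summary line copied from the build output above.
summary='INFO: 5 processes: 2 remote cache hit, 3 internal.'

# Extract the number that precedes "remote cache hit"; empty if there were none.
hits=$(printf '%s\n' "$summary" | sed -n 's/.* \([0-9][0-9]*\) remote cache hit.*/\1/p')

printf 'remote cache hits: %s\n' "${hits:-0}"
```

In a real check the sample string would be replaced by the matching line from Bazel's output.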

We can exit the examples directory now:

popd

In this section we’ve configured a Bazel project to be built using our Buildbarn instance. Now we’ll configure mTLS on Buildbarn and then finally reconfigure this Bazel project to access Buildbarn using mTLS.

Configuring TLS in Buildbarn

We want each component of Buildbarn to have its own automatically generated certificate and use it to connect to other components. On the other side, each component that accepts connections should verify that the incoming connection is accompanied by a valid certificate as well. In this section we’ll use cert-manager to generate certificates and a more secure CSI driver to request certificates and propagate them to Buildbarn components. Then we’ll configure Buildbarn components to verify both sides of each connection. Here’s what this process should look like for the frontend and storage containers, for example:

          Node 1             │          Kubernetes API           │            Node 2
                             │                                   │
┌─────────────────────────┐  │                                   │  ┌─────────────────────────┐
│      Frontend pod       │  │               mTLS                │  │       Storage pod       │
│   bb-storage process    │<───────────────────────────────────────>│   bb-storage process    │
├─────────────────────────┤  │         ┌──────────────┐          │  ├─────────────────────────┤
│ CSI volume       ca.crt │  │         │ cert-manager │          │  │ ca.crt       CSI volume │
│       tls.key   tls.crt │  │         └─────┬────────┘          │  │ tls.crt   tls.key       │
└──────────^─────────^────┘  │               │ fills out         │  └───^─────────^───────────┘
           │         │       │               V                   │      │         │
       generates  stores     │   apiVersion: cert-manager.io/v1  │   stores   generates
           │         │           kind: CertificateRequest               │         │
          ┌┴─────────┴─┐ creates spec:                                 ┌┴─────────┴─┐
          │ CSI driver │────────>  request: LS0tLS...                  │ CSI driver │
          └────────────┘         status:                               └────────────┘
                      ^ retrieves  certificate: ...
                      └─────────── ca: ...

The CSI driver sees the CSI volume and generates a private key in tls.key inside it.

The CSI driver uses the key from tls.key to generate a Certificate Signing Request (CSR) and creates a CertificateRequest resource containing it in the Kubernetes API.

cert-manager signs the CertificateRequest with the CA certificate and puts both the resulting certificate and the CA certificate in the CertificateRequest’s status.

The CSI driver stores them in tls.crt and ca.crt respectively in the CSI volume.

The bb-storage process in the frontend pod uses the certificate and key from tls.crt and tls.key to establish a TLS connection to the storage pod, verifying that the latter presents a valid certificate signed by the CA certificate from ca.crt.

On the storage side, tls.key, tls.crt and ca.crt are filled out in a similar manner.

The bb-storage process in the storage pod verifies the incoming certificate with the CA certificate from ca.crt and presents the certificate from tls.crt to the frontend.

Notice how with this approach secret keys never leave the node where they are generated and used, and the connection between frontend and storage pods is authenticated on both ends.
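To get a feel for what the CSI driver and cert-manager do for each pod, the same key, CSR, signing and verification steps can be reproduced by hand with openssl. This is only a local illustration under our own assumptions (a throwaway CA, file names mirroring the CSI volume, one-day validity, a default openssl configuration). It is not how cert-manager is implemented:

```shell
workdir=$(mktemp -d)

# Throwaway self-signed CA, standing in for the cert-manager issuer.
openssl ecparam -name prime256v1 -genkey -noout -out "$workdir/ca.key"
openssl req -x509 -new -key "$workdir/ca.key" -subj "/CN=ca" -days 1 -out "$workdir/ca.crt"

# "Node" side: generate a private key locally and derive a CSR from it.
openssl ecparam -name prime256v1 -genkey -noout -out "$workdir/tls.key"
openssl req -new -key "$workdir/tls.key" -subj "/CN=frontend" -out "$workdir/tls.csr"

# "Issuer" side: sign the CSR. Only the CSR was handed over, never tls.key.
openssl x509 -req -in "$workdir/tls.csr" -CA "$workdir/ca.crt" -CAkey "$workdir/ca.key" \
  -CAcreateserial -days 1 -out "$workdir/tls.crt"

# The check each peer performs: was this certificate signed by our CA?
openssl verify -CAfile "$workdir/ca.crt" "$workdir/tls.crt"
```

The point to take away is the same as in the diagram: the signer only ever sees the CSR, while the private key stays where it was generated.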

Installing cert-manager

To generate certificates for our Buildbarn we need to install and configure cert-manager itself and its CSI driver. cert-manager is responsible for generating and updating certificates requested via Kubernetes API objects. The CSI driver lets users create special volumes in pods where private keys are generated locally and certificates are requested from cert-manager and provided to the pod.

First, let’s fetch all necessary manifests and add them to our deployment. The cert-manager project publishes a ready-to-use Kubernetes manifest, so we can manually fetch it:

pushd bb-deployments/kubernetes
curl -LO https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.yaml

And then add it to the resources section of our kustomization.yaml:

resources:
  - ...
  - cert-manager.yaml

Unfortunately, the cert-manager CSI driver doesn’t directly provide a k8s manifest, but rather a Helm chart. Add kubernetes-helm to your shell.nix and then run:

helm template -n cert-manager -a storage.k8s.io/v1/CSIDriver https://charts.jetstack.io/charts/cert-manager-csi-driver-v0.7.1.tgz > cert-manager-csi-driver.yaml

-a storage.k8s.io/v1/CSIDriver makes sure that the chart uses the latest version of the Kubernetes API to register itself.

Then we can add it to the resources section of our kustomization.yaml:

resources:
  - ...
  - cert-manager.yaml
  - cert-manager-csi-driver.yaml

Let’s deploy and wait for everything to start. We will use cmctl to check that cert-manager is working correctly, so you’ll need to add it to shell.nix.

kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"
cmctl check api --wait 10m
kubectl get csinode -o yaml

cmctl should report The cert-manager API is ready, and the last command should output your only node with one driver called csi.cert-manager.io installed:

namespace/buildbarn unchanged
namespace/cert-manager created
...
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
...
The cert-manager API is ready
apiVersion: v1
items:
- apiVersion: storage.k8s.io/v1
  kind: CSINode
  metadata:
    ...
    name: lima-k8s
    ...
  spec:
    drivers:
    - name: csi.cert-manager.io
      nodeID: lima-k8s
      topologyKeys: null
kind: List
metadata:
  resourceVersion: ""

If it says drivers: null, re-run kubectl get csinode -o yaml a bit later to allow more time for driver deployment and startup.

Creating CA certificate

First we need to create a CA certificate and an Issuer that cert-manager will use to generate certificates for our needs. Note that to generate a self-signed certificate we’ll also need to create another issuer. Put this in ca.yaml:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned
  namespace: buildbarn
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: ca
  namespace: buildbarn
spec:
  isCA: true
  commonName: ca
  secretName: ca
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned
    kind: Issuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca
  namespace: buildbarn
spec:
  ca:
    secretName: ca

Then add it to the resources section of our kustomization.yaml:

resources:
  - ...
  - ca.yaml

Then apply it and check the status of the issuers:

kubectl apply -k .
kubectl -n buildbarn get issuers -o wide

Both issuers should be there, and the ca issuer should have the Signing CA verified status:

NAME         READY   STATUS                AGE
ca           True    Signing CA verified   14s
selfsigned   True                          14s

If it says something like secrets "ca" not found, it means it needs some time to generate the certificate. Re-run kubectl -n buildbarn get issuers -o wide.
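Instead of re-running the command by hand, the wait can be scripted. The helper below is a generic sketch: retry is our own hypothetical function, not a kubectl feature, and it simply re-runs a command until it succeeds or the attempts run out. The demo uses a stand-in command so that it runs without a cluster:

```shell
# Hypothetical helper: retry <max-attempts> <delay-seconds> <command...>
retry() {
  retry_max=$1; retry_delay=$2; shift 2
  retry_n=1
  while ! "$@"; do
    [ "$retry_n" -ge "$retry_max" ] && return 1
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Demo with a stand-in command that only succeeds on its third call.
calls=$(mktemp)
flaky() { echo x >> "$calls"; [ $(wc -l < "$calls") -ge 3 ]; }
retry 5 0 flaky && echo "ready"

# Against the cluster, the check above could be wrapped like this:
#   retry 10 5 sh -c 'kubectl -n buildbarn get issuers -o wide | grep -q "Signing CA verified"'
```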

Generating certificates for Buildbarn components

As mentioned before, we will be generating certificates for each component using cert-manager’s CSI driver. To do this, we need to add a volume to each pod and mount it into the main container so that the service can read it. We also need to pass the CA certificate into all these containers to verify the other side of each connection. Unfortunately, Buildbarn doesn’t support reading it from a file, so we’ll have to pass it statically via config. Let’s prepare this config file using a command that reads the CA certificate via the Kubernetes API and formats it into a JSON string using jq:

kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' | base64 -d | jq --raw-input --slurp . > config/ca-cert.jsonnet
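The base64 -d step is there because Kubernetes serializes binary fields such as .status.ca as base64 text. This is also why the request field in the diagram above starts with LS0tLS: that is what the dashes opening a PEM block look like once encoded. A quick way to convince ourselves, without touching the cluster:

```shell
# "LS0tLS0=" is the base64 encoding of five dashes, the start of a PEM header line.
decoded=$(printf '%s' 'LS0tLS0=' | base64 -d)
printf '%s\n' "$decoded"
```

After decoding, jq --raw-input --slurp . reads the whole PEM text as a single string and prints it as a JSON string literal, which is the form the Jsonnet configs import later.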

Now we can configure all pods by adding the following patches in kustomization.yaml:

patches:
  - ...
  - target:
      kind: Deployment
      namespace: buildbarn
    patch: |
      - op: add
        path: /spec/template/spec/volumes/-
        value:
          name: tls-cert
          csi:
            driver: csi.cert-manager.io
            readOnly: true
            volumeAttributes:
              csi.cert-manager.io/issuer-name: ca
      - op: add
        path: /spec/template/spec/containers/0/volumeMounts/-
        value:
          mountPath: /cert
          name: tls-cert
          readOnly: true
  - target:
      kind: Deployment
      namespace: buildbarn
      name: frontend
    patch: |
      - op: add
        path: /spec/template/spec/volumes/0/configMap/items/-
        value:
          key: ca-cert.jsonnet
          path: ca-cert.jsonnet
      - op: add
        path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1dns-names
        value: frontend,frontend.${POD_NAMESPACE},frontend.${POD_NAMESPACE}.svc.cluster.local
      - op: add
        path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1ip-sans
        value: 127.0.0.1
  - target:
      kind: Deployment
      namespace: buildbarn
      name: browser
    patch: |
      - op: add
        path: /spec/template/spec/volumes/0/configMap/items/-
        value:
          key: ca-cert.jsonnet
          path: ca-cert.jsonnet
      - op: add
        path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1dns-names
        value: browser,browser.${POD_NAMESPACE},browser.${POD_NAMESPACE}.svc.cluster.local
  - target:
      kind: Deployment
      namespace: buildbarn
      name: scheduler-ubuntu22-04
    patch: |
      - op: add
        path: /spec/template/spec/volumes/0/configMap/items/-
        value:
          key: ca-cert.jsonnet
          path: ca-cert.jsonnet
      - op: add
        path: /spec/template/spec/volumes/1/csi/volumeAttributes/csi.cert-manager.io~1dns-names
        value: scheduler,scheduler.${POD_NAMESPACE}
  - target:
      kind: Deployment
      namespace: buildbarn
      name: worker-ubuntu22-04
    patch: |
      - op: add
        path: /spec/template/spec/volumes/1/configMap/items/-
        value:
          key: ca-cert.jsonnet
          path: ca-cert.jsonnet
      - op: add
        path: /spec/template/spec/volumes/3/csi/volumeAttributes/csi.cert-manager.io~1dns-names
        value: worker,worker.${POD_NAMESPACE}
  - target:
      kind: StatefulSet
      namespace: buildbarn
      name: storage
    patch: |
      - op: add
        path: /spec/template/spec/volumes/0/configMap/items/-
        value:
          key: ca-cert.jsonnet
          path: ca-cert.jsonnet
      - op: add
        path: /spec/template/spec/volumes/-
        value:
          name: tls-cert
          csi:
            driver: csi.cert-manager.io
            readOnly: true
            volumeAttributes:
              csi.cert-manager.io/issuer-name: ca
              csi.cert-manager.io/dns-names: ${POD_NAME}.storage,${POD_NAME}.storage.${POD_NAMESPACE}
      - op: add
        path: /spec/template/spec/containers/0/volumeMounts/-
        value:
          mountPath: /cert
          name: tls-cert
          readOnly: true

To avoid repetition, the first patch is applied to all Deployment objects, and the subsequent patches only add the proper list of DNS names for each certificate. Note that many of those DNS names will not be used, as only some of these services actually accept connections. For the frontend Deployment we also add the 127.0.0.1 IP so that it can be accessed via a port forwarded to localhost, which is how we currently use it on the host machine. For the storage StatefulSet we configure a unique DNS name for each Pod because they are contacted directly and not through a common service. For each of these we also add ca-cert.jsonnet to the list of files used from the configuration ConfigMap. We also need to add it to the ConfigMap itself by adding it to the list in config/kustomization.yaml:

configMapGenerator:
  - name: buildbarn-config
    namespace: buildbarn
    files:
      - ...
      - ca-cert.jsonnet
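One detail in the patches is easy to miss: the volume attribute is called csi.cert-manager.io/dns-names, but in a JSON Patch path the / character separates path segments. Inside a segment it therefore has to be written as ~1 (and a literal ~ as ~0), following the JSON Pointer escaping rules. A small sketch of that escaping, with variable names of our own choosing:

```shell
key='csi.cert-manager.io/dns-names'

# Escape "~" first, then "/", so the "~" introduced by "~1" is not escaped again.
escaped=$(printf '%s\n' "$key" | sed -e 's/~/~0/g' -e 's#/#~1#g')

printf '%s\n' "/spec/template/spec/volumes/1/csi/volumeAttributes/$escaped"
```

The printed path is the one used for the frontend Deployment in the patches.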

We can apply all these changes with:

kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"

Now you can fetch the list of CertificateRequest objects to see their statuses:

kubectl -n buildbarn get certificaterequest

It will output one request for the ca certificate named ca-1 and a bunch of requests generated for each pod:

NAME                                   APPROVED   DENIED   READY   ISSUER       REQUESTOR                                                    AGE
14468f64-909f-43d1-b67d-07b0844c0683   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m
1d9e41a6-e58f-4c13-b9e6-0b1ba1d5a4f6   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
2c2f1177-81fc-45e5-8487-9b66bc0d6f73   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
31fdb0ef-0c0b-4a06-94af-fb17875ee05d   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
376d0933-c0e9-4d39-b5c6-b76071c65966   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m58s
3967cdd6-7d48-4814-8cec-542041182dd0   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
464a1f35-f0ba-4236-aeec-294f880d9675   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m57s
5181e602-276e-413e-8888-76c4bd1ede21   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m57s
6f02092d-b8a3-4eb7-8ff2-5e4a433d59bb   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
710a458e-6ba0-4a44-87ab-5115b5a2c213   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m58s
753c4653-71ae-447e-bbe5-022ce35cee9d   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
8bcbb5a0-4575-40ad-b842-9c86bde8fdb8   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m56s
8df59bf5-ed23-47af-bfcc-3cf8a9053b9b   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m1s
b47fff23-40b4-43ed-8e34-35d988eb434d   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m56s
be72bdc6-c61d-4f1b-928e-f743df0f6188   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   4m57s
c14a52d5-dc20-4626-afe6-975442103d8b   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m
ca-1                                   True                True    selfsigned   system:serviceaccount:cert-manager:cert-manager              3d22h
ceabf1ab-06a7-47c0-855a-2009bbbd2418   True                True    ca           system:serviceaccount:cert-manager:cert-manager-csi-driver   5m

Using certificates

Now that we’ve generated all necessary certificates and made them available to all pods, we can configure all components to use them. We’ll use similar stanzas for each service, so let’s first add some helper functions to the top of config/common.libsonnet:

local localKeyPair = {
  files: {
    certificate_path: '/cert/tls.crt',
    private_key_path: '/cert/tls.key',
    refresh_interval: '3600s',
  },
};
local grpcClientWithTLS = function(address) {
  address: address,
  tls: {
    server_certificate_authorities: import 'ca-cert.jsonnet',
    client_key_pair: localKeyPair,
  },
};
local oneListenAddressWithTLS = function(address) [{
  listenAddresses: [address],
  authenticationPolicy: {
    tls_client_certificate: {
      client_certificate_authorities: import 'ca-cert.jsonnet',
      validation_jmespath_expression: '`true`',
      metadata_extraction_jmespath_expression: '`{}`',
    },
  },
  tls: {
    server_key_pair: localKeyPair,
  },
}];

And then expose these functions to use in other configs at the end of the file:

  ...
  grpcClientWithTLS: grpcClientWithTLS,
  oneListenAddressWithTLS: oneListenAddressWithTLS,
}

Note that local certificate and key files will be reloaded every hour per the refresh_interval setting, but the CA certificate will need to be reconfigured manually every time it refreshes.

Also note that we accept all valid certificates by setting validation_jmespath_expression to `true`. This expression can be configured later for each service if needed.

Now we’re ready to configure the Buildbarn services.

Storage

Let’s start with storage. The client side configuration is the same for all services that connect to it and is stored in config/common.libsonnet. Replace lines like this one:

backend: { grpc: { address: 'storage-0.storage.buildbarn:8981' } },

with usage of our new function:

backend: { grpc: grpcClientWithTLS('storage-0.storage.buildbarn:8981') },

Keep the address the same (storage-0 and storage-1 should remain in place).

Now in config/storage.jsonnet replace these gRPC server configuration lines:

grpcServers: [{
  listenAddresses: [':8981'],
  authenticationPolicy: { allow: {} },
}],

with a call to another function:

grpcServers: common.oneListenAddressWithTLS(':8981'),

Make sure that the address itself is the same again.

Now let’s apply it and wait for all pods to restart:

kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"

Let’s check that the storage service is still accessible via the frontend service by rebuilding our example project:

pushd ../../examples/cpp-tutorial/stage1
bazelisk clean
bazelisk build --config=linux --config=remote //main:hello-world
popd

It should show that it fetched output from the remote cache:

... INFO: 5 processes: 2 remote cache hit, 3 internal. ...

Scheduler

The scheduler exposes at least four gRPC endpoints, but we’ll cover only the client (frontend) and worker sides as we don’t use the other endpoints yet. Just like with storage, you should replace the clientGrpcServers and workerGrpcServers settings in config/scheduler.jsonnet with calls to oneListenAddressWithTLS, passing each address as the argument:

...
clientGrpcServers: common.oneListenAddressWithTLS(':8982'),
workerGrpcServers: common.oneListenAddressWithTLS(':8983'),
...

The scheduler itself only connects to storage, and that part has already been configured in config/common.libsonnet.

Workers

Workers only connect to the scheduler and storage. With the latter already configured, we only need to change the scheduler setting in config/worker-ubuntu22-04.jsonnet:

...
scheduler: common.grpcClientWithTLS('scheduler:8983'),
...

Frontend

The frontend listens for incoming connections from clients and fans them out, either to storage or to the scheduler. Storage access has already been covered, so we only need to replace grpcServers and schedulers settings in config/frontend.jsonnet:

grpcServers: common.oneListenAddressWithTLS(':8980'),
schedulers: {
  '': {
    endpoint: common.grpcClientWithTLS('scheduler:8982') {
      addMetadataJmespathExpression: |||
        {
          "build.bazel.remote.execution.v2.requestmetadata-bin": incomingGRPCMetadata."build.bazel.remote.execution.v2.requestmetadata-bin"
        }
      |||,
    },
  },
},

Note that we preserve all addresses and keep the additional addMetadataJmespathExpression field that augments requests to the scheduler.

Applying it all

Now we can apply all these settings with:

kubectl apply -k .
kubectl rollout status -k . 2>&1 | grep -Ev "no status|unable to decode"

All deployments should eventually roll out and work. This means that all internal communications between Buildbarn components are encrypted and authenticated.

In this section we’ve achieved our goal of securing the Buildbarn deployment using mTLS. Now all that’s left is to reconfigure Bazel to use and verify certificates while accessing Buildbarn’s RBE API endpoint.

Configuring certificates on client

So far we’ve configured Buildbarn to always use TLS encrypted connections. It means that our current client setup for using it will not work because it doesn’t expect TLS. In this section we’ll generate a client certificate for it using the cmctl tool, configure Bazel to both validate the server certificate and use this new client certificate when communicating with Buildbarn, and show the final complete example.

As mentioned, running Bazel with the current client configuration fails, because it opens an unencrypted connection to an encrypted endpoint:

pushd ../../examples/cpp-tutorial/stage1
bazelisk clean
bazelisk build --config=linux --config=remote //main:hello-world

The error will look like this:

INFO: Invocation ID: dc8188ca-e77f-4884-a596-612779c6ae33
ERROR: Failed to query remote execution capabilities: UNAVAILABLE: Network closed for unknown reason

To configure the client to use an encrypted connection, we need to replace the grpc protocol with grpcs in .bazelrc and try again:

sed -i s/grpc/grpcs/ .bazelrc
bazelisk build --config=linux --config=remote //main:hello-world
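Note that s/grpc/grpcs/ rewrites the first grpc on every line of .bazelrc, so running it a second time would turn grpcs into grpcss. A slightly more defensive variant matches the whole URL scheme and is therefore safe to repeat. It is sketched here on a temporary file standing in for .bazelrc, and it avoids sed -i, whose syntax differs between GNU and BSD sed:

```shell
rc=$(mktemp)  # stand-in for .bazelrc
printf '%s\n' 'build:remote --remote_executor grpc://127.0.0.1:30080' > "$rc"

# Hypothetical helper: switch the scheme only where it is still "grpc://".
use_tls_scheme() {
  sed 's#grpc://#grpcs://#' "$1" > "$1.new" && mv "$1.new" "$1"
}

use_tls_scheme "$rc"
use_tls_scheme "$rc"  # second run is a no-op: "grpcs://" does not contain "grpc://"
cat "$rc"
```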

Now the error indicates that something else is missing, in this case a client certificate:

INFO: Invocation ID: 7dcb900f-17eb-4dbb-ab9c-df9c70bc2c92
ERROR: Failed to query remote execution capabilities: UNAVAILABLE: io exception
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]

To address that, we need to generate client certificates and configure Bazel to use them.

Generating the client certificate

We will use cert-manager and its CLI client cmctl to generate a certificate for our client. First, we need to create a Certificate object template in cert-template.yaml:

cat > cert-template.yaml <<EOF
apiVersion: cert-manager.io/v1
kind: Certificate
spec:
  commonName: client
  usages:
    - client auth
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: ca
    kind: Issuer
    group: cert-manager.io
EOF

Then we can use it to create the actual certificate:

cmctl create certificaterequest -n buildbarn client --from-certificate-file cert-template.yaml --fetch-certificate

It will use this certificate template as if it was created in Kubernetes: it will generate a key in client.key, create a Certificate Signing Request (CSR) from it, embed that in a cert-manager CertificateRequest and send it, wait for the server to sign it, and finally retrieve the resulting certificate to client.crt.

We also need a CA certificate to verify server certificates. We can use the same command we used for Buildbarn configuration here:

kubectl -n buildbarn get certificaterequests ca-1 -o jsonpath='{.status.ca}' | base64 -d > ca.crt

You can make sure that the client certificate is signed with this CA certificate by adding openssl to shell.nix and running:

openssl verify -CAfile ca.crt client.crt

It will output client.crt: OK if everything is correct.

Building with certificates

All that’s left is to tell Bazel to use these certificates to connect to Buildbarn. We’ll need to convert the private key to PKCS#8 format for it and add these settings to .bazelrc:

openssl pkcs8 -topk8 -nocrypt -in client.key -out client.pem
echo "build:remote --tls_certificate=ca.crt" >> .bazelrc
echo "build:remote --tls_client_certificate=client.crt" >> .bazelrc
echo "build:remote --tls_client_key=client.pem" >> .bazelrc
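If Bazel later complains about the key, a quick sanity check is the PEM header: an unencrypted PKCS#8 key starts with -----BEGIN PRIVATE KEY-----. The sketch below shows the conversion on a throwaway key generated with openssl. We have not checked which format cmctl writes to client.key, so the generated key is only a stand-in for it:

```shell
tmp=$(mktemp -d)

# Throwaway EC key in openssl's traditional format, standing in for client.key.
openssl ecparam -name prime256v1 -genkey -noout -out "$tmp/client.key"

# Same conversion as above: unencrypted PKCS#8.
openssl pkcs8 -topk8 -nocrypt -in "$tmp/client.key" -out "$tmp/client.pem"

# Compare the first line of each file.
head -n 1 "$tmp/client.key"
head -n 1 "$tmp/client.pem"
```

Running head -n 1 client.pem in the examples directory performs the same check on the real key.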

Now let’s clean the Bazel cache and run the build:

bazelisk clean
bazelisk build --config=linux --config=remote //main:hello-world

You will see that the remote cache is in use, which means that TLS has been configured successfully:

...
INFO: Elapsed time: 0.601s, Critical Path: 0.10s
INFO: 5 processes: 2 remote cache hit, 3 internal.
...

To make sure that the actual build also works, we can change the source file a bit and re-run the build:

echo >> main/hello-world.cc
bazelisk build --config=linux --config=remote //main:hello-world

It will now take some time and actually show that it has built one action remotely:

...
INFO: Elapsed time: 15.866s, Critical Path: 15.69s
INFO: 2 processes: 1 internal, 1 remote.
...

Conclusion

We’ve shown how to deploy Buildbarn on Kubernetes, how to configure mTLS between all its components, and how to use TLS authentication with RBE API clients using Bazel as an example. This is a starting configuration that can be improved in several aspects not covered here:

The Buildbarn browser and the scheduler web UIs are neither exposed nor encrypted;

cert-manager is not configured to limit access to certificate generation, meaning that anyone with access to the Kubernetes API has access to all its capabilities;

no limits are imposed on client certificates: they only need to be valid;

there is no automation for client certificate renewal;

and only certificates are used for authentication, which is secure but can be enhanced or replaced with OAuth, which is more flexible and provides better control.

All these are interesting topics that would each deserve their own blog post.

Programming Languages & Compilers Activity Report - Q2 2024

Thu, 22 Aug 2024 00:00:00 GMT

One core value of Tweag is its dedication to the open-source community. Although our interests and expertise have become significantly broader over the years, our love for immutable, composable and typed architecture has made functional programming and programming languages in general an important part of our DNA. This long-standing activity was formalized last year as the Programming Languages & Compilers Group. The PL&C group has been busy in the second quarter of 2024, and this post is a summary of what we’ve been doing.

Our involvement varies depending on the availability of each team member and client engagements; if some projects seem idle, this is usually just temporary. All projects appearing below are actively developed.

Rust

We have a bunch of Rust engineers at Tweag, and some of them have recently started to contribute to the Rust package manager cargo, which is a key part of the ecosystem. This is a choice motivated by our interest and expertise in build systems, our love for cargo and the need for contributions there.

Currently, in a cargo workspace spanning multiple crates, we can only publish one crate at a time, in dependency order. It’s been a long-standing issue as well as a focus area to be able to publish all the crates at once. To get there:

Joe Neeman implemented local registry overlays, which make it possible to package a crate even if its dependencies aren’t published yet. (#13926)

Joe Neeman and Tor Hovland added support to package all the crates in the workspace in a single command. Crates must still be published one at a time. (#13947, #14074)

Typically, cargo update is used to update dependencies to the latest versions that satisfy the version requirements defined in Cargo.toml. If you wanted to update the version requirements themselves to the latest available versions, you might use the 3rd party command cargo upgrade from cargo-edit. Another focus area for cargo is to bring this capability into cargo update.

Tor Hovland implemented cargo update --breaking, which will upgrade the version requirements in Cargo.toml if there are breaking changes. (#13979, #14049, #14259)

Tor Hovland implemented support for making breaking upgrades when doing a specific version update with cargo update --precise. At the time of writing, this isn’t merged yet. (#14140)

Furthermore, we have contributed various smaller fixes. (#13874, #13886, #13960)

Haskell

GHC

GHC is the de facto standard Haskell compiler. Several of our GHC engineers are currently working on making it easier to seamlessly interface GHC with external build tools, such as Buck2. Most of our work on GHC is on behalf of Mercury.

Torsten Schmits and Cheng Shao have been working on supporting bytecode linking for Template Haskell dependencies in single-file compilation mode, which allows external build systems to take advantage of the performance improvements for Template Haskell that Cabal builds have been enjoying for some time now. (GHC MR 13042)

Torsten Schmits and Cheng Shao have implemented a way to print dependency metadata for a set of Haskell modules as JSON, for which Buck2 had to parse GHC’s generated Makefiles before. (GHC MR 11994)

Sjoerd Visscher added single-file processing support to Haddock, allowing external build tools to incrementally (re-)build documentation for individual modules without compilation. (GHC MR 12707)

Torsten Schmits has been working on performance improvements for dependency analysis. He wrote a patch (for -Wmissing-home-modules) that replaced a quadratic algorithm and reduced the startup time in a project with 10,000 modules by over a minute. He also wrote a WIP patch that introduces parallelism into the first phase of dependency graph computation, promising a reduction of the duration of this phase by a factor of 4 in some of our projects.

Cheng Shao looked into GHC’s ARM64 Windows support (GHC issue) and made an MVP that can cross-compile simple Haskell programs to ARM64 Windows executables from Linux.

Cheng Shao performed GHC housecleaning and removed legacy code paths related to 32-bit Darwin/Windows (GHC announcement).

Cheng Shao has been working on Template Haskell support in GHC’s WASM backend (GHC issue). A WASM dynamic code loader based on LLVM’s WASM shared library ABI is being prototyped at the moment; once it’s finished, remaining Template Haskell support should be straightforward.

Joseph Fourment joined us for an internship, during which he researched and implemented the initial steps towards more general and flexible let-bound types, which he wrote about on this blog. This effort introduces the capability for GHC to reuse in-memory data structures for type subexpressions that are shared by multiple larger types, promising substantial performance improvements when compiling programs with complex type-level computation.

Liquid Haskell

The Liquid Haskell contributions from Tweag are spearheaded by Facundo Domínguez. Liquid Haskell is a verification tool that allows you to write additional lightweight formal specifications for your Haskell programs. These specifications are then checked by the tool, which discharges the proofs to SMT solvers so that you don’t have to do it yourself.

Facundo Domínguez released a new version of smtlib-backends with updates to documentation. smtlib-backends is a library to interface with SMT solvers via SMTLIB.

We welcomed Jonathan Arnoult, a new intern who will work on reflecting functions away from their definitions.

Nickel

Nickel is a configuration programming language developed by Tweag aimed at infrastructure-as-code, build systems, Nix, or any complex system that needs to be configured and where YAML, JSON, TOML and the like aren’t sufficient.

The Nickel team released versions 1.6 and 1.7.

Yann Hamdaoui revived the previously stale nickel-kubernetes repository. In combination with updates to json-schema-to-nickel, we are able to auto-generate Nickel contracts (think “schemas”) for all Kubernetes resources at any given version.

Yann Hamdaoui implemented pattern matching extensions (constants, wildcards, guards, arrays and or-patterns) following the addition of structural ADTs aka enum variants (#1897, #1904, #1910, #1912, #1916).

Joe Neeman experimented with package management for Nickel and wrote an RFC to discuss and decide on various points in the design space.

Yann Hamdaoui reworked the contract system quite a bit to better support missing boolean operations on contracts, such as JSON Schema’s any_of and not. Besides boolean operators, this rework also made it much more ergonomic to write and compose custom contracts with custom error reporting. (#1964, #1970, #1975, #1987, #1995)

Yann Hamdaoui added span information for data imported from TOML to make validation errors more precise. (#1949)

Topiary

Topiary is a lightweight universal code formatter that relies on Tree-sitter grammars to handle a variety of languages. It has been developed by Tweag, and is used under the hood by the Nickel language to format code, but is also a standalone tool.

The Topiary team released version 0.4.0, “Exquisite Elm”. Highlights include improved Nickel formatting support and new CSS formatting support from external contributor Eric Lavigne.

Erin van der Veen moved all of Topiary’s dependencies to either published or vendored crates; that is, either those available on crates.io, or subsumed directly into our codebase. This prepares the ground for future releases of Topiary to crates.io, where projects with ad hoc dependencies (such as those direct from GitHub) are forbidden. (#672)

Christopher Harrison feature-gated language support, mainly to allow the development of experimental language formatters without impinging on the supported formatters. This ties in with his development (still in progress) of formatting rules for Pact, the smart contract language for the Kadena blockchain. (#711, #713)

Erin van der Veen made a number of background, “quality of life” improvements, such as transitioning from TOML to Nickel for Topiary’s configuration. This allows for less complicated merging using Nickel’s record merging, especially in the future when Nickel implements custom merge functions. Another goal of this PR was to evaluate the use of Nickel as a library, which was a great success! (#703)

Closing words

The PL&C group will continue to contribute to the projects mentioned above in the near future. Stay tuned for the next quarterly report! In the meantime, you can find Tweag’s open source portfolio on GitHub and come chat with us on our Discord dedicated to our open-source activity, be it as a user, as a potential contributor, or simply to satisfy your own curiosity.

Microservice	Integrates With
User Service	Authentication Service, Profile Service, Notification Service
Authentication Service	User Service, Authorization Service, API Gateway
Profile Service	User Service, Notification Service, Database Service
Notification Service	User Service, Profile Service, External Email Service
Authorization Service	User Service, Resource Service, API Gateway
Resource Service	Authorization Service, Logging Service, Payment Service
Billing Service	User Service, Payment Service, Notification Service
Payment Service	Billing Service, User Service, Notification Service
Logging Service	Resource Service, Monitoring Service, Notification Service
Monitoring Service	Logging Service, Notification Service, Dashboard Service
