Extending destination-passing style programming to arbitrary data types in Linear Haskell

7 March 2024 — by Thomas Bagrel

Three years ago, a blog post introduced destination-passing style (DPS) programming in Haskell, focusing on array processing, for which the API was made safe thanks to Linear Haskell. Today, I’ll present a slightly different API to manipulate arbitrary data types in a DPS fashion, and show why it can be useful for some parts of your programs.

The present blog post is mostly based on my recent paper Destination-passing style programming: a Haskell implementation, published at JFLA 2024. It assumes basic knowledge of Linear Haskell and intermediate fluency in Haskell.

Tail Modulo Cons

Haskell is a lazy language by default, but a lot of algorithms are in fact more efficient in a strict setting. That’s one reason why Haskell has been extending support for opt-in strictness, via strict field annotation for example.

Non-tail recursive functions such as map are decently efficient in a lazy setting. On strict data structures, however, non-tail recursive consume stack space. That’s why the quest for tail-recursive implementations is even more central in strict languages such as OCaml than in Haskell.

If any function can be made tail-recursive using a CPS transformation, this transformation trades stack space for heap space (where the built continuations are allocated), which is rarely a win performance-wise. We actually want to focus on tail-recursive implementations which don’t resort to continuations, and unfortunately, some functions don’t have one in a purely functional setting.

For example, some functions are almost tail-recursive, in the sense that the recursive call is the penultimate computation in the returned value, and the last one is just a constructor application. This is actually the case for map:

map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x : xs) = (f x) : map f xs

One could argue that a tail-recursive version of map could be written using an accumulator to store the result list, and then reverse it at the end. But that incurs an extra linear operation (reversing the accumulator) that isn’t present in the naive version.

Actually, Bour et al. found in 2021 that whenever a function is of this particular shape — with just a constructor application over the recursive call — named tail-recursive modulo constructor, this function can be easily and automatically converted into an equivalent tail-recursive one in destination-passing style (DPS).

In OCaml¹, this transformation happens entirely in the compiler. In this blog post, on the other hand, I’ll show how to do it in user-land in Haskell thanks to linear types, which make the API safe.

For map, here’s the translation to DPS Haskell, although we will come back to it later:

mapDPS :: (a -> b) -> [a] -> Dest [b] %1 -> ()
mapDPS f [] d = fill @'[] d
mapDPS f (x : xs) = let !(dh :: Dest b, dt :: Dest [b]) = fill @'(:) d
                     in fillLeaf (f x) dh `lseq` mapDPS f xs dt

It’s time to see what DPS means and what it offers.

What is Destination-passing style programming?

Destination-passing style (DPS) is a programming idiom in which functions don’t return a result, but rather write their result directly to a memory location they receive as a parameter. This gives more control over memory to the caller of a function, instead of that control lying exclusively in the hands of the callee. In non-GC languages, or for array processing (as in the aforementioned blog post), it allows the allocation of a big chunk of memory at once, and then gives each piece of the program the responsibility to fill a small part of that chunk (represented by a glorified pointer, aka. destination), giving mostly alloc-free code. In early imperative languages such as C, this is actually quite common: memcpy and strcpy both receive a destination as a parameter.

In the context of a functional, immutable, GC-based language, we cannot circumvent the allocation of heap objects to obtain alloc-free code. Instead, we get one interesting feature: being able to build functional structures in the reverse order compared to the regular constructor-based approach. This goes hand-in-hand with the ability to create and manipulate incomplete data structures (containing unspecified fields, aka. holes) safely. This is exactly what we will focus on in this blog post.

Incomplete structures, you say?

An incomplete structure can be seen as a tree of constructor objects, much like a regular data structure. However, some of the constructor’s fields might not be specified, leaving holes in the structure.

Having incomplete structures is very different from having optional fields in a structure represented by the Maybe a type. For so-called incomplete structures, we do not indicate the absence of value (or presence of a hole) through a different type for the leaf itself, but instead we forbid any read on the whole structure as long as (at least) one hole exists somewhere. That way, the field’s value can (in fact, must) be updated later without allocating the whole structure a second time.

To update a yet-unfilled field of an incomplete structure, we use a destination. A destination is a unique pointer to a hole inside an incomplete structure, that is no longer usable as soon as the hole has been filled. Those pointers are carried alongside the structure until they get consumed. As a result, destinations are also a way to know whether or not a structure has any remaining holes. When an incomplete structure no longer has any accompanying destinations, it can be read safely.

At this point, incomplete structures could be seen as the nemesis of Haskell, as they bring a form of mutability and a range of memory errors if not handled properly. However, with a proper linear API, which is the real novelty in this blog post, they are both powerful and safe to use. In particular, a linear discipline on destinations guarantees that:

when a structure no longer has accompanying destinations, it’s a complete structure (that is, it has no holes remaining);
once a hole has been filled with a value, that value cannot be changed anymore (i.e. holes are write-once).

Implementation of `Incomplete` structures

As teased before, I introduce an opaque data Incomplete a b to represent incomplete objects. The a part is the structure being built that may contain holes, and the b part carries the destinations pointing to these holes. Destinations are raw pointers under the hood, put in a pretty box: data Dest a represents a pointer to a hole of type a.

What can we do with an Incomplete a b? We cannot read the structure on the a side (yet) as long as the b side still contains destinations (as they indicate the presence of holes in the a side). The b side is what must be linearly consumed to make the structure readable. What we can do is map over the b side, to consume the destinations little-by-little until there are none left. This is exposed through a (linear) Functor instance:

instance Control.Functor (Incomplete a) where
  fmap :: (b %1 -> c) %1 -> Incomplete a b %1 -> Incomplete a c
  (<&>) :: Incomplete a b %1 -> (b %1 -> c) %1 -> Incomplete a c  -- flipped arguments

This Functor instance lets us access destinations of an incomplete object through a linear continuation of type b %1 -> c.

Let’s take a step back and look at our previous example. mapDPS has signature (a -> b) -> [a] -> Dest [b] %1 -> (). This means that mapDPS f list is in fact a linear continuation of type Dest [b] %1 -> ().

In other terms, given an incomplete structure having a hole of type [b], i.e. i :: Incomplete u (Dest [b]), we can write the result of f mapped to list to this hole, using i <&> mapDPS f list. The resulting structure will have type Incomplete u () (no more destinations), and can be made readable as we’ll see in a moment.

Here we can see the essence of DPS: functions get less responsibility as they don’t get to choose where they write their result; instead, the output location is now passed as an explicit parameter to the function. Moreover, inside a function such as mapDPS, we can, and in fact we have no choice but to forget about the global structure we are building — it becomes implicit — and only focus on the handling of destinations. The Functor instance is thus the glue that lets us assign a location to a producer of data like mapDPS for it to write its output.

Operating on `Dest`s

Let’s take a closer look at the mapDPS implementation:

mapDPS f [] d = fill @'[] d
mapDPS f (x : xs) = let !(dh :: Dest b, dt :: Dest [b]) = fill @'(:) d
                     in fillLeaf (f x) dh `lseq` mapDPS f xs dt

In the base case, there is no element left in the input list, but we still receive a destination d :: Dest [b] that needs to be dealt with linearly. The only meaningful operation here is to write the empty list to the hole represented by d, which is what fill @'[] d does.

The recursive case is more interesting:

one cons cell should be added to the list, carrying the value f x :: b;
we somehow need to create another destination of type Dest [b] to pass to the recursive call.

All of that is done in two steps, using fill @'(:) and then fillLeaf.

fill @'(:) d is first used to add a new hollow cons cell (:) _h _t :: [b] at the end of the linked list, that is to say, a cons cell with unspecified fields (both the head _h and tail _t are holes). Under the hood, it allocates the new hollow cons cell, writes its address into the destination d :: Dest [b], and returns one destination dh :: Dest b pointing to the hole _h, and one destination dt :: Dest [b] pointing to the hole _t. This gives the signature fill @'(:) :: Dest [b] %1 -> (Dest b, Dest [b]).

Then, fillLeaf is used to fill the destination dh :: Dest b (representing the “value part” of the newly added cons cell) with the result of f x :: b. fillLeaf :: a -> Dest a %1 -> () is in fact pretty simple. It takes a value, a destination, and writes the value address to the hole represented by the destination. The destination is linearly consumed in the process.

After doing that, only one destination remains unconsumed: dt :: Dest [b]. This is exactly the destination that will be passed to the recursive call! It corresponds to the new “end” of the linked list.

We directly see here how fill @'(:) extends an (incomplete) list by adding one new “slot” at the end; whereas cons (:) is usually used to extend a normal linked list from the front. This is what I meant in the introduction by building functional structures in the reverse order.

What I just presented is not, in fact, restricted to lists. It can be used to build any kind of structure, as long as it implements Generic. This is mostly the only constraint fill has; it can be used for all sorts of constructors. For example, we can build a binary tree in a similar way, starting with the root, and extending it progressively in a top-down fashion, using fill @'Leaf and fill @'Node (assuming data Tree a = Leaf | Node a (Tree a) (Tree a) deriving Generic).

Creating and disposing of `Incomplete`s

One can create a new, empty Incomplete using alloc :: Linearly %1 -> Incomplete a (Dest a). This function exchanges a Linearly token (see below) for an Incomplete of the chosen type a. The resulting Incomplete has a single destination that points to its root of type a. In other terms, even the root of the new structure is a hole at the moment, that will be specified later with the first use of fill or fillLeaf.

Conversely, as soon as we have an Incomplete with only unit () on the b side, the absence of destination indicates that the structure on the a side is complete. So we can make it readable by getting out of the Incomplete wrapper using fromIncomplete :: Incomplete a () %1 -> Ur a.

It is valid to use the built structure in a non-linear fashion (justifying why it is wrapped in Ur in return position of fromIncomplete) because it has been made of non-linear elements only: fillLeaf is non-linear in its first argument, and the spine of the structure can be duplicated without breaking linearity.

The last missing piece of this API is linearly :: (Linearly %1 -> Ur b) %1 -> Ur b, whose definition is shared with the one from a previous blog post about linear scopes. linearly delimits a scope in which linear objects can be used. Only non-linear objects can escape this scope (because of the Ur restriction on the return type as before), such as complete structures finalized with a call to fromIncomplete.

The Linearly type, of which an instance is supplied by linearly, is a linear token which can be duplicated to give birth to any number of Incompletes, but each of them will still have to be managed linearly.

With these final ingredients, we can complete our definition of a tail-recursive map:

map :: (a -> b) -> [a] -> [b]
map f l =
  unur $ linearly $ \token ->
    fromIncomplete $ alloc token <$> \d ->
      mapDPS f l d

Performance

The current implementation behind the API is based on compact regions as they make it easy to operate on memory without too much tension with the garbage collector. However, they incur extra copying in some contexts, which makes it hard sometimes to compete with optimized lazy Haskell code.

At the moment, the mapDPS implementation is slightly more efficient memory-wise than the optimized lazy one for large lists (and less efficient for smaller lists). The same kind of results are obtained for the different use-cases we benchmarked in Section 6 of the associated paper. I expect a next implementation without compact regions, taking place directly in the GC heap, to have better performance.

In addition, the DPS techniques detailed here are proven to be really efficient in strict languages. This work might thus inspire performance and expressiveness improvements in languages other than Haskell.

Conclusion

The API presented in this blog post defines a small set of tooling sufficient to create and operate safely on incomplete data structures in Haskell through destination-passing style programming. It is more general than the constructor-based building approach usually used in functional programming languages, and also more general than DPS tooling introduced by Tail Modulo Cons in OCaml. It is also a nice example of how linear types can be used to enforce a write-once memory model in Haskell.

The full prototype API is available here. It currently requires a custom GHC version to work, but I hope I will be able to merge the few primops required for DPS programming into GHC in the future.

I would like to thank Arnaud Spiwack for his solid support and feedback on all pieces of this work.

OCaml has had experimental support for Tail-recursion Modulo Constructor since version 4.14.0.↩

About the authors

Thomas BagrelThomas is a CS engineer, and PhD student working on linear types and memory management. He applies functional programming principles to build reliable and maintainable software wherever it's possible, but can also jump back to imperative programming languages when the situation requires it. He is also a contributor on linear-base, the linear type library for Haskell.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Bazel remote execution with rules_nixpkgs Software Identifiers through the eyes of Nix →