Nickel is a gradually-typed, purely functional configuration language with contracts and lazy evaluation.

My internship at Tweag consisted of (attempts at) performance improvements in the Nickel reference interpreter.
In this article, I will describe what I think is the most interesting problem I had on my hands: array operations
like `head`

were *sometimes* running in linear time instead of constant time.

What do I mean by “sometimes”? Well, to answer that, we need some familiarity with **contract application** and **lazy evaluation**.

# Contract application

Contracts are a key feature of Nickel that empowers you to write *correct* configurations in a flexible way
by *extending* the type system with your custom dynamically-checked properties.
Thus you can write types for TCP ports, email addresses, or even the Cargo Manifest format,
using runtime checks on your data.

Prime numbers are cool. Let’s write a contract for them^{1}:

```
let Prime = fun label value =>
if is_prime value then value
else contract.blame_with label "not a prime number"
```

An expression such as `42 | Prime`

will expectedly fail, i.e. the interpreter will run the “function” `Prime`

on the *blame label*, a special value containing error-reporting information, and the value `42`

.
This is syntactic sugar — before evaluation, this expression will be transformed into `contract.apply Prime label 42`

.

For arrays, we can use the constructor `Array`

and write `[3, 19] | Array Prime`

to enforce the prime number contract on all elements of the array. In this case, the expression will be roughly transformed into:

`array.map (contract.apply Prime label) [3, 19]`

This is not completely accurate: the interpreter uses unique labels for each element to avoid blaming the entire array for a contract violation, as that would result in poor error messages.

At this point, you can already see where the linear behavior might come from. Since evaluating:

`array.head (array.map (contract.apply Prime label) [3, 19])`

will involve two contract applications, instead of one.
The argument of `array.head`

will first be evaluated to `[3 | Prime, 19 | Prime]`

, meaning that `Prime`

will be applied on both `3`

and `19`

. *Only then* can we retrieve `3 | Prime`

.

This effect gets amplified for recursive functions guarded by an array contract, such as:

```
let rec product
| Array Prime -> Num
= fun ps =>
if array.length ps == 0 then 1
else array.head ps * product (array.tail ps)
```

Here, every recursive call will apply the `Array Prime`

contract to the tail of the input array.
Consequently, every call to `product`

will be done in linear time, making the full operation quadratic.

Unfortunately, this reasoning doesn’t completely explain the observed behavior, because Nickel is *lazy*.

# Lazy evaluation

During evaluation, Nickel expressions are put behind **thunks**.
Thunks are mutable memory locations that initially hold unevaluated expressions, and are only updated with evaluation results if some part of the program *forces* them.

This means that evaluating:

`let ps = [3, 9 + 10] | Array Prime in array.head ps`

will mean mapping `Prime`

over `[3, 9 + 10]`

to get something that resembles `[3 | Prime, (9 + 10) | Prime]`

. Next, `3 | Prime`

would be extracted and evaluated.

Hence the evaluation result will be `3`

and `(9 + 10) | Prime`

won’t be computed.
In fact, one could replace `9 + 10`

with a computation which never succeeds, such as `0 / 0`

,
and the above snippet will *still* evaluate to `3`

.

Thanks to lazy evaluation, array contracts are mapped in linear time *but* the resulting contract applications are not evaluated unless they’re needed.
If you wish to force the entire array to be *fully* evaluated, you can use `builtin.deep_seq`

.
This special function will make sure that the thunk behind each of the array’s elements is updated.

# A solution

Instead of mapping contract application on arrays, the interpreter should make the array hold on to its contract, and only apply it when data leaves the array. This came in the form of a new internal operation which I will refer to as `contract.lazy_apply`

. This means that our `[3, 19] | Array Prime`

example will be equivalent to:

`contract.lazy_apply Prime label [3, 19]`

This time, `array.head ([3, 19] | Prime)`

will return `contract.apply Prime 3`

in constant time, *even if* the thunk corresponding to the array had never been updated before.

After refactoring the interpreter to use the new machinery, I realized that `builtin.deep_seq`

was subtly broken.
Given arbitrary terms `x`

and `y`

, `builtin.deep_seq x y`

recursively traverses all the thunks of `x`

so that nested records and arrays will be fully evaluated. Once that’s done, `builtin.deep_seq`

makes the interpreter resume normal evaluation of `y`

.

To preserve the semantics of array contracts, any *pending* contract should be applied during the `builtin.deep_seq`

operation. At first, I made it so that when `builtin.deep_seq`

is called on an array, the interpreter will evaluate each element with the array’s contract applied to it. This worked in the majority of cases, except for record contracts.

The problem with record contracts was that, under certain conditions^{2}, they have to clone their thunks. Which led to cases where calling `builtin.deep_seq`

on a record with a contract applied, such as `{ x = y + 1, y = 1 } | { x | Num, y | Num }`

, updated the newly cloned thunks and not the original thunks.

Therefore, in some cases `builtin.deep_seq x y`

left `x`

with unevaluated thunks: `x`

doesn’t “see” the cloned thunks. This isn’t the desired behaviour.
Furthermore, serialization functions to JSON and friends, as well as the pretty-printer in the REPL assume that they are given values without thunks, which I couldn’t guarantee anymore.

To address this, I introduced a new internal operator.

# Yet another operator

Because of how record contracts work, applying a contract to an arbitrary term may yield a version of said term where some thunks have been cloned.
And so given a term `t`

, when `builtin.deep_seq t t`

recursively evaluates arrays *with their contracts mapped*,
some contract application results will not be reflected in `t`

as they live in cloned thunks.

This is where a new internal unary operation comes into the picture: `builtin.force`

.

How it works, is that `builtin.force t`

not only evaluates `t`

, but also returns *a new copy of t* where everything is guaranteed to be evaluated. In particular, when

`t`

is an array or records, `builtin.force t`

will return a new array filled with the records’ (evaluated) cloned thunks, rather than the original, unevaluated thunks.# Future opportunities

Because array contracts are now saved in a *pending* state for later application, it’s possible to eliminate duplicates using a limited notion of contract equality. I say *limited* because the terms inside a contract can be arbitrarily complex. Still, it might be worth the effort to optimize for the common cases such as name aliases, and the Nickel Team recently made some headway towards deciding contract equality in many of those cases.

# Benchmarks

It turns out that this is one of those cases where the obtained performance improvement is significant.
Let’s write a function for computing array slices by generating a range of indices from a `from`

index and `to`

index.
For simplicity’s sake, an empty array is returned if the provided range is invalid.

```
let slice
| Num -> Num -> Array Num -> Array Num
= fun from to xs =>
if to < array.length xs && from >= 0 && from <= to then
let range = array.generate ((+) from) (to - from)
in array.map (fun idx => array.elem_at idx xs) range
else []
```

I ran `slice 200 800`

on an array of `1000`

elements and obtained the following results, courtesy of the excellent command-line tool hyperfine:

Version | Mean [ms] | Min [ms] | Max [ms] | Relative |
---|---|---|---|---|

Without `array.lazy_apply` |
618.9 ± 21.2 | 601.3 | 670.1 | 7.59 ± 0.36 |

With `array.lazy_apply` |
81.6 ± 2.6 | 78.8 | 94.5 | 1.00 |

- A more idiomatic way of writing this contract is:
`contract.from_predicate is_prime`

. Of course, the two styles are equivalent.↩ - This is due to the interaction between Nickel’s merging primitive with recursive records. As documented in this Pull Request↩