Our Nickel language is a configuration language. It’s also a functional programming language. Functional programming isn’t a well-defined term: it can encompass anything from being vaguely able to pass functions as arguments and to call them (in that respect, C and JavaScript are functional) to being a statically typed, pure and immutable language based on the lambda-calculus, like Haskell.
However, if you ask a random developer, I can guarantee that one aspect will be mentioned every time: algebraic data types (ADTs) and pattern matching. They are the bread and butter of typed functional languages. ADTs are relatively easy to implement (for language maintainers) and easy to use. They’re part of the 20% of the complexity that makes for 80% of the joy of functional programming.
But Nickel didn’t have ADTs until recently. In this post, I’ll tell the story of Nickel and ADTs, starting from why they were initially lacking, the exploration of different possible solutions and the final design leading to the eventual retro-fitting of proper ADTs in Nickel. This post is intended for Nickel users, for people interested in configuration management, but also for anyone interested in programming language design and functional programming. It doesn’t require prior Nickel knowledge.
A quick primer on Nickel
Nickel is a gradually typed, functional, configuration language. From this point, we’ll talk about Nickel before the introduction of ADTs in the 1.5 release, unless stated otherwise. The core language features:
- let-bindings:
let extension = ".ncl" in "file.%{extension}"
- first-class functions:
let add = fun x y => x + y in add 1 2
- records (JSON objects):
{name = "Alice", age = 42}
- static typing:
let mult : Number -> Number -> Number = fun x y => x * y
. By default, expressions are dynamically typed. A static type annotation makes a definition or an inline expression typechecked statically. - contracts look and act almost like types but are evaluated at runtime:
{ port | Port = 80 }
. They are used to validate configurations against potentially complex schemas.
The lifecycle of a Nickel configuration is to be 1) written, 2) evaluated and 3) serialized, typically to JSON, YAML or TOML. An important guideline that we set first was that every native data structure (record, array, enum, etc.) should be trivially and straightforwardly serializable to JSON. In consequence, Nickel started with the JSON data model: records (objects), arrays, booleans, numbers and strings.
There’s one last primitive value: enums. As in C or in JavaScript, an enum in
Nickel is just a tag. An enum value is an identifier with a leading '
, such as
in {protocol = 'http, server = "tweag.io"}
. An enum is serialized as a string:
the previous expression is exported to JSON as {"protocol": "http", "server": "tweag.io"}
.
So why not just using strings? Because enums can better represent a finite set
of alternatives. For example, the enum type [| 'http, 'ftp, 'sftp |]
is the
type of values that are either 'http
, 'ftp
or 'sftp
. Writing protocol : [| 'http, 'ftp, 'sftp |]
will statically (at typechecking time) ensure that
protocol
doesn’t take forbidden values such as 'https
. Even without static
typing, using an enum conveys to the reader that a field isn’t a free-form
string.
Nickel has a match
which corresponds to C or JavaScript’s switch
:
is_http : [| 'http, 'ftp, 'sftp |] -> Bool =
match {
'http => true,
_ => false,
}
As you might notice, there are no ADTs in sight yet.
ADTs in a configuration language
While Nickel is a functional language, it’s first and foremost a configuration language, which comes with specific design constraints.
Because we’re telling the story of ADTs before they landed in Nickel, we can’t really
use a proper Nickel syntax yet to provide examples. In what follows, we’ll use a
Rust-like syntax to illustrate the examples: enum Foo<T> { Bar(i32), Baz(bool, T) }
is an ADT parametrized by a generic type T
with two constructors Bar
and Baz
, where the first one takes an integer as an argument and the other
takes a pair of a boolean and a T
. Concrete values are written as Bar(42)
or
Baz(true, "hello")
.
An unexpected obstacle: serialization
As said earlier, we want values to be straightforwardly serializable to the JSON data model.
Now, take a simple ADT such as enum Foo<T,U> = { SomePair(T,U), Nothing }
. You
can find reasonable serializations for SomePair(1,2)
, such as {"tag": "SomePair", "a": 1, "b": 2}
. But why not {"flag": "SomePair", "0": 1, "1": 2}
or {"mark": "SomePair", "data": [1, 2]}
? While those representations are isomorphic, it’s hard to know
the right choice for the right use-case beforehand, as it depends on the
consumer of the resulting JSON. We really don’t want to make an arbitrary choice
on behalf of the user.
Additionally, while ADTs are natural for a classical typed functional language,
they might not entirely fit the configuration space. A datatype like enum Literal { String(String), Number(Number) }
that can store either a string or a
number is usually represented directly as an untagged union in a
configuration, that {"literal": 5}
or {"literal": "hello"}
, instead of the
less natural tagged union (another name for ADTs) {"literal": {"tag" = "Number", "value": 5}}
.
This led us to look at (untagged) union types instead. Untagged unions have the advantage of not making any choice about the serialization: they aren’t a new data structure, as are ADTs, but rather new types (and contracts) to classify values that are already representable.
The road of union types
A union type is a type that accepts different alternatives. We’ll use the
fictitious \/
type combinator to write a union in Nickel (|
is commonly used
elswhere but it’s already taken in Nickel). Our previous example of a literal
that can be either a string or a number would be {literal: Number \/ String}
.
Those types are broadly useful independently of ADTs. For example, JSON Schema
features unions through the core combinator any_of
.
Our hope was to kill two birds with one stone by adding unions both as a way to better represent existing configuration schemas, but also as a way to emulate ADTs. Using unions lets users represent ADTs directly as plain records using their preferred serialization scheme. Together with flow-sensitive typing, we can get as expressive as ADTs while letting the user decide on the encoding. Here is an example in a hypothetical Nickel enhanced with unions and flow-sensitive typing:
let sum
: {tag = 'SomePair, a : Number, b : Number} \/ {tag = 'Nothing}
-> Number
= match {
{tag = 'SomePair, a, b} => a + b,
{tag = 'Nothing} => 0,
}
Using unions and flow-sensitive typing as ADTs is the approach taken by TypeScript, where the previous example would be:
type Foo = { tag: "SomePair"; a: number; b: number } | { tag: "Nothing" }
function sum(value: Foo): number {
switch (value.tag) {
case "SomePair":
return value.a + value.b
case "Nothing":
return 0
}
}
In Nickel, any type must have a contract counter-part. Alas union and intersection contracts are hard (in fact, union types alone are also not a trivial feat to implement!). In the linked blog post, we hint at possible pragmatic solutions for union contracts that we finally got to implement for Nickel 1.8. While sufficient for practical union contracts, this is far from the general union types that could subsume ADTs. This puts a serious stop to the idea of using union types to represent ADTs.
What are ADTs really good for?
As we have been writing more and more Nickel, we realized that we have been missing ADTs a
lot for library functions - typically the types enum Option<T> { Some(T), None }
and Result<T,E> = { Ok(T), Error(E) }
- where we don’t care about
serialization. Those ADTs are “internal” markers that wouldn’t leak out to the
final exported configuration.
Here are a few motivating use-cases.
std.string.find
std.string.find
is a function that searches for a substring in a string. Its
current type is:
String
-> String
-> { matched : String, index : Number, groups : Array String }
If the substring isn’t found, {matched = "", index = -1, groups []}
is
returned, which is error-prone if the consumer doesn’t defend against such
values. We would like to return a proper ADT instead, such as Found {matched : String, index : Number, groups : Array String}
or NotFound
, which
would make for a better and a safer interface1.
Contract definition
Contracts are a powerful validation system in Nickel. The ability to plug in your own custom contracts is crucial.
However, the general interface to define custom contracts can seem bizarre.
Custom contracts need to set error reporting data on a special label
value and
use the exception-throwing-like std.contract.blame
function. Here is a
simplified definition of std.number.Nat
which checks that a value is natural
number:
fun label value =>
if std.typeof value == 'Number then
if value % 1 == 0 && value >= 0 then
value
else
let label = std.contract.label.with_message "not a natural" in
std.contract.blame label
else
let label = std.contract.label.with_message "not a number" in
std.contract.blame label
There are good (and bad) reasons for this situation, but if we had ADTs, we
could cover most cases with an alternative interface where custom contracts
return a Result<T,E>
, which is simpler and more natural:
fun value =>
if std.typeof value == 'Number then
if value % 1 == 0 && value >= 0 then
Ok
else
Error("not a natural")
else
Error("not a number")
Of course, we could just encode this using a record, but it’s just not as nice.
Let it go, let it go!
The list of other examples of using ADTs to make libraries nicer is endless.
Thus, for the first time, we decided to introduce a native data structure that isn’t serializable.
Note that this doesn’t break any existing code and is forward-compatible with making ADTs serializable in the future, should we change our mind and settle on one particular encoding. Besides, another feature is independently explored to make serialization more customizable through metadata, which would let users use custom (de)serializer for ADTs easily.
Ok, let’s add the good old-fashioned ADTs to Nickel!
The design
Structural vs nominal
In fact, we won’t exactly add the old-fashioned version. ADTs are traditionally implemented in their nominal form.
A nominal type system (such as C, Rust, Haskell, Java, etc.) decides if two
types are equal based on their name and definition. For example, values of enum Alias1 { Value(String) }
and enum Alias2 { Value(String) }
are entirely
interchangeable in practice, but Rust still doesn’t accept Alias1::Value(s)
where a Alias2
is expected, because those types have distinct definitions.
Similarly, you can’t swap a class for another in Java just because they have
exactly the same fields and methods.
A structural type system, on the other hand, only cares about the shape of data.
TypeScript has a structural type system. For example, the types interface Ball { diameter: number; }
and interface Sphere { diameter: number; }
are entirely
interchangeable, and {diameter: 42}
is both a Ball
and a Sphere
. Some
languages, like OCaml2 or Go3,
mix both.
Nickel’s current type system is structural because it’s better equipped to handle arbitrary JSON-like data. Because ADTs aren’t serializable, this consideration doesn’t weight as much for our motivating use-cases, meaning ADTs could be still be either nominal or structural.
However, nominal types aren’t really usable without some way of exporting and importing type definitions, which Nickel currently lacks. It sounds more natural to go for structural ADTs, which seamlessly extend the existing enums and would overall fit better with the rest of the type system.
Structural ADTs look like the better choice for Nickel. We can build, typecheck, and match on ADTs locally without having to know or to care about any type declaration. Structural ADTs are a natural extension of Nickel (structural) enums, syntactically, semantically, and on the type level, as we will see.
While less common, structural ADTs do exist in the wild and they are pretty cool. OCaml has both nominal ADTs and structural ADTs, the latter being known as polymorphic variants. They are an especially powerful way to represent a non trivial hierarchy of data types with overlapping, such as abstract syntax trees or sets of error values.
Syntax
C-style enums are just a special case of ADTs, namely ADTs where constructors
don’t have any argument. The dual conclusion is that ADTs are enums with
arguments. We thus write the ADT Some("hello")
as an enum with an argument in
Nickel: 'Some "hello"
.
We apply the same treatment to types. [| 'Some, 'None |]
was a valid enum
type, and now [| 'Some String, 'None |]
is also a valid type (which would
correspond to Rust’s Option<String>
).
There is a subtlety here: what should be the type inferred for 'Some
now? In
a structural type system, 'Some
is just a free-standing symbol. The typechecker
can’t know if it’s a constant that will stay as it is - and thus has the type
[| 'Some |]
- or a constructor that will be eventually applied, of type a -> [| 'Some a |]
. This difficulty just doesn’t exist in a nominal type system like
Rust: there, Option::Some
refers to a unique, known and fixed ADT constructor
that is known to require precisely one argument.
To make it work, 'Ok 42
isn’t actually a normal function application in
Nickel: it’s an ADT constructor application, and it’s parsed differently. We
just repurpose the function application syntax4
in this special case. 'Ok
isn’t a function, and let x = 'Ok in x 42
is an
error (applying something that isn’t a function).
You can still recover Rust-style constructors that can be applied by defining a
function (eta-expanding, in the functional jargon): let ok = fun x => 'Ok x
.
We restrict ADTs to a single argument. You can use a record to emulate multiple
arguments: 'Point {x = 1, y = 2}
.
ADTs also come with pattern matching. The basic switch that was match
is now a
powerful pattern matching construct, with support for ADTs but also arrays,
records, constant, wildcards, or-patterns and guards (if
side-conditions).
Typechecking
Typechecking structural ADTs is a bit different from nominal ADTs. Take the
simple example (the enclosing : _
annotation is required to make the example
statically typed in Nickel)
(
let data = 'Ok 42 in
let process = match {
'Ok x => x + 1,
'Error => 0,
} in
process data
) : _
process
is inferred to have type [| 'Ok Number, 'Error |] -> Number
. What
type should we infer for data = 'Ok 42
? The most obvious one is [| 'Ok Number |]
. But then [| 'Ok Number |]
and [| 'Ok Number, 'Error |]
don’t match and
process data
doesn’t typecheck! This is silly, because this example should
be perfectly valid.
One possible solution is to introduce subtyping, which is able to express this
kind of inclusion relation: here that [| 'Ok Number |]
is included in [| 'Ok Number, 'Error |]
. However, subtyping has some defects and is whole can of
worms when mixed with polymorphism (which Nickel has).
Nickel rather relies on another approach called row polymorphism, which is the ability to abstract over not just a type, as in classical polymorphism, but a whole piece of an enum type. Row polymorphism is well studied in the literature, and is for example implemented in PureScript. Nickel already features row polymorphism for basic enum types and for records types.
Here is how it works:
let process : forall a. [| 'Ok Number, 'Error; a |] -> Number = match {
'Ok x => x + 1,
'Error => 0,
_ => -1,
} in
process 'Other
Because there’s a catch-all case _ => -1
, the type of process
is
polymorphic, expressing that it can handle any other variant beside 'Ok Number
and 'Error
(this isn’t entirely true: Ok String
is forbidden for example, because it can’t be distinguished from Ok Number
). Here, a
can be substituted for a subsequence of an enum type,
such as 'Foo Bool, 'Bar {x : Number}
.
Equipped with row polymorphism, we can infer the type forall a. [| 'Ok Number; a |]
5 for 'Ok 42
. When typechecking process data
in the
original example, a
will be instantiated to the single row 'Error
and the
example typechecks. You can learn more about structural ADTs and row
polymorphism in the corresponding section of the Nickel user
manual.
Conclusion
While ADTs are part of the basic package of functional languages, Nickel didn’t have them until relatively recently because of peculiarities of the design of a configuration language. After exploring the route of union types, which came to a dead-end, we settled on a structural version of ADTs that turns out to be a natural extension of the language and didn’t require too much new syntax or concepts.
ADTs already prove useful to write cleaner and more concise code, and to improve the interface of libraries, even in a gradually typed configuration language. Some concrete usages can be found in try_fold_left and validators already.
- Unfortunately, we can’t change the type of
std.string.find
without breaking existing programs (at least not until a Nickel 2.0), but this use-case still applies to external libraries or future stdlib functions↩ - In OCaml, Objects, polymorphic variants and modules are structural while records and ADTs are nominal.↩
- In Go, interfaces are structural while structs are nominal.↩
- Repurposing application is theoretically backward
incompatible because
'Ok 42
was already valid Nickel syntax before 1.5, but it was meaningless (an enum applied to a constant) and would always error out at runtime, so it’s ok.↩ - In practice, we infer a simpler type
[| 'Ok Number; ?a |]
where?a
is a unification variable which can still have limitations. Interestingly, we decided early on to not perform automatic generalization, as opposed to the ML tradition, for reasons similar to the ones exposed here. Doing so, we get (predicative) higher-rank polymorphism almost for free, while it’s otherwise quite tricky to combine with automatic generalization. It turned out to pay off in the case of structural ADTs, because it makes it possible to side-step those usual enum types inclusion issues (widening) by having the user add more polymorphic annotations. Or we could even actually infer the polymorphic type[| forall a. 'Ok Number; a |]
for literals.↩
About the author
Yann is working at Tweag on the Nickel programming language, a next-generation configuration language to manage the growing complexity of Infrastructure-as-Code and a candidate successor for the language Nix.
If you enjoyed this article, you might be interested in joining the Tweag team.