Type-checking plugins, Part I: Why write a type-checking plugin?

21 October 2021 — by Sam Derbyshire

Type-checking plugins for GHC are a powerful tool which allows users to inject domain-specific knowledge into GHC’s type-checker. In this series of posts, we will explore why you might want to write your own plugin, and how to do so.

I: Why write a type-checking plugin?
II: GHC’s constraint solver
III: Writing a type-checking plugin

In this first blog post of the series, I’ll be outlining a few examples that showcase some limitations in GHC’s instance resolution and type family reduction mechanisms. With a type-checker plugin, we are no longer restricted by these limitations, and can decide ourselves how to solve constraints and reduce type families.

Instance resolution

Recall how GHC goes about instance resolution, as per the user’s guide. A typeclass instance has two components:

instance ctxt => Cls arg_1 ... arg_n

To the left of => is the instance context, and to the right the instance head. To solve a constraint like Cls x_1 ... x_n, GHC goes through all the class instance declarations for Cls, trying to match the arguments x_1, ..., x_n against the arguments arg_1, ..., arg_n appearing in the instance head. Once it finds such an instance (which should be unique, unless one is using overlapping instances), GHC commits to it, picking up the context as a Wanted constraint (we will cover Wanted and Given constraints in depth in Part II: § Constraint solving).

However, one might be interested in using a different method to resolve instances. Let’s look at two simple examples.

State machines

Suppose we want to implement a state machine: each state corresponds to a type, and we can transition values between types according to certain rules. This can be implemented as a typeclass: arrows in the state diagram are typeclass instances.

type Transition :: Type -> Type -> Constraint
class Transition a b where { transition :: a -> b }

We could start by defining some basic instances:

instance Transition A B where {..}
instance Transition B C where {..}
instance Transition C D where {..}
instance Transition A E where {..}
instance Transition B F where {..}
instance Transition E F where {..}

Instance graph for Transition typeclass.

Then, we might want the compiler to use composition to solve other instances. For example, when trying to solve Transition A D, we notice that there is a unique path

A -> B -> C -> D

which would allow us to compose the transition functions to obtain a transition function of type A -> D. On the other hand, if we want a transition from A to F, we notice that there are two different paths, namely A -> B -> F, and A -> E -> F. We could either reject this, or perhaps choose one of the two paths arbitrarily.

This goes beyond GHC’s instance resolution mechanisms, but we could implement a graph reachability test in a type-checking plugin to solve general Transition X Y instances.

Constraint disjunction

The central feature of typeclasses is that they are open: one can define a typeclass and keep adding instances to it. This is in contrast to something like a datatype

data ABC
  = A
  | B Int
  | C Float Bool

which is closed: users can’t add new constructors to ABC in their own modules.

This property of typeclasses comes with a fundamental limitation: we can’t know in advance whether a typeclass constraint is satisfied. A typeclass constraint that was insoluble at some point might become solvable in another context (e.g. a user could define a Show (Bool -> Int) instance).

As a result, GHC does not offer any mechanism for determining whether a constraint is satisfied, as this could result in incoherent behaviour.¹ This is unfortunate, as this can be quite useful: for example, one might want to implement an arithmetic computation differently for integral vs floating-point numbers, to ensure numerical stability:

stableAlgorithm = select @(Floating a) @(Num a) fp_algo int_algo
  where
    fp_algo  :: Floating a => [a] -> a -- floating-point algorithm
    int_algo ::      Num a => [a] -> a -- integral algorithm

Here, the select function dispatches on whether the first constraint (in this case Floating a) is satisfied; when it is, it uses the first (visible) argument; otherwise, the second. This behaviour can be implemented in a type-checker plugin (see for instance my if-instance plugin): when attempting to solve a constraint disjunction ct1 || ct2, we can simply look up whether ct1 is currently satisfiable, disregarding the fact that new instances might be defined later (which can lead to incoherence as mentioned above). The satisfiability of ct1 at the point of solving the disjunction ct1 || ct2 will then determine which implementation is selected.

Stuck type families

Consider the type family

type (+) :: Nat -> Nat -> Nat
type family a + b where
  Zero   + b = b
  Succ a + b = Succ (a + b)

GHC can only reduce a + b when it knows what a is: is it Zero or is it Succ i for some i? This causes a problem when we don’t have concrete natural numbers, but still want to reason about (+):

infixr 5 :<
type Vec :: Nat -> Type -> Type
data Vec n a where
  Nil  :: Vec Zero a
  (:<) :: a -> Vec n a -> Vec (Succ n) a

-- | Interweave the elements of two vectors.
weave :: Vec n a -> Vec n a -> Vec (n + n) a
weave Nil       Nil       = Nil
weave (a :< as) (b :< bs) = a :< b :< weave as bs

To typecheck weave, we need to know that

Succ n + Succ n ~ Succ (Succ (n + n))

Using the second type family equation on the LHS, this reduces to:

Succ (n + Succ n) ~ Succ (Succ (n + n))

Peeling off Succ:

n + Succ n ~ Succ (n + n)

Now we can’t make any more progress: we don’t know which equation of (+) to use, as the first argument is a bare type variable n. We say the type family application is stuck; it doesn’t reduce.

By contrast, in a type-checking plugin, we can rewrite type-family applications involving variables, and thus implement a solver for natural number arithmetic.

Type family arguments

Suppose we are interested in solving a theory of row types, e.g. to implement a framework of extensible records.

A row is an unordered association map, field name ⇝ type, e.g.

myRow = ( "intField" :: Int, "boolField" :: Bool, "anotherInt" :: Int )

Crucially, order doesn’t matter in a row. To communicate this fact to the type-checker, we would want to be able to prove a fact such as:

Insert k v (Insert l w r) ~ Insert l w (Insert k v r)

when k and l are distinct field names that don’t appear in the row r. However, we can’t write a type family equation of the sort:

type Insert :: Symbol -> Type -> Row -> Row
type family Insert k v r where
  Insert k v (Insert l w r) = ...

• Illegal type synonym family application ‘Insert l w row’ in instance:
    Insert k v (Insert l w row)

GHC doesn’t allow type families to appear inside the LHS of type family equations. Doing so would risk non-confluence: the result of type-family reduction might depend on the order in which we rewrite arguments. For example:

type F :: Type -> Type
type family F a where
  F (F a) = a       -- F[0]
  F Bool  = Float   -- F[1]
  F a     = Maybe a -- F[2]

Given a type such as F (F Bool), we can proceed in two ways:

Reduce the outer type family application first, using the first equation (written F[0]). This yields the reduction F (F Bool) ~~> Bool.
Reduce the argument first, using the second equation, F[1], and following up with F[2]: F (F Bool) ~~> F Float ~~> Maybe Float.

We obtained different results depending on the order in which reductions were performed. To avoid this problem, GHC simply disallows type family applications from appearing on the LHS of type family equations.

In a type-checking plugin, we can inspect the arguments of a type family, and use that information in deciding how to reduce the type family application.

Performance of type-family reduction

The current implementation of type families in GHC suffers from one significant problem: they can be painfully slow.

This is because, when GHC reduces a type family application, it also creates a proof that keeps track of which type family equations were used. Such proofs can be large, in particular when using recursive type families. Returning to the example of natural number addition:

type (+) :: Nat -> Nat -> Nat
type family a + b where
  Zero   + b = b
  Succ a + b = Succ (a + b)

The proof that 5 + 0 reduces to 5, in the coercion language that will be explained in Part II: § Constraint solving, is as follows:

+[1] <Succ (Succ (Succ (Succ Zero)))> <Zero>
; (Succ (+[1] <Succ (Succ (Succ Zero))> <Zero>
        ; (Succ (+[1] <Succ (Succ Zero)> <Zero>
                ; (Succ (+[1] <Succ Zero> <Zero>
                        ; (Succ (+[1] <Zero> <Zero>
                                ; (Succ (+[0] <Zero>))))))))))

Here +[0] refers to the first type-family equation of +, and +[1] to the second. The difficulty is that, in this proof language, we store the types of the arguments. For example, in the first reduction step, in which we reduce Succ (Succ (Succ (Succ (Succ Zero)))) + Zero to Succ (Succ (Succ (Succ (Succ Zero))) + 0), the proof records the two arguments to +, namely Succ (Succ (Succ (Succ Zero))) and Zero. As a result, the size of the proof that n + 0 reduces to n is quadratic in n.

This can be a problem, for example, in libraries that implement large sum or product types using type families and type-level lists (like many anonymous record libraries do), causing slow compile-times for even moderately-sized records.

In a type-checking plugin, we can instead perform type-family reduction in a single step, returning a single proof term which omits the intermediate steps. In this way, type-checking plugins allow us to sidestep many of the performance issues that surround type families.

Practical examples

Solving with arithmetic expressions of natural numbers with Christiaan Baaij’s ghc-typelits-natnormalise,
units of measure and dimensional analysis with Adam Gundry’s uom-plugin,
regular expressions with Oleg Grenrus’s kleene-type,
row types with Divesh Otwani and Richard Eisenberg’s thoralf,
intrinsically typed System F with a solver for a lambda calculus of explicit substitutions, bundled with the ghc-tcplugin-api library.

Conclusion

We’ve seen that type-checking plugins can be useful in many different circumstances:

custom constraint-solving logic,
added flexibility for type-family reduction,
performance considerations.

The next question is then, hopefully: how does one actually write a type-checking plugin? Because type-checking plugins operate directly on constraints, it’s important to be somewhat familiar with GHC’s constraint solver, and how type-checking plugins interact with it.

This will be the topic of Part II of this series, before we dive into the practical aspects of actually writing and debugging a type-checking plugin in Part III.

In this context, coherence is the property that the runtime behaviour of programs does not depend on the specific way in which they are typechecked.↩

Behind the scenes

Sam Derbyshire

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Denotational Homomorphic Testing Hacking on Ormolu: An internship report →