During my internship at Tweag, I’ve been given the opportunity to work on GHC alongside Simon Peyton Jones at Microsoft Research Cambridge (MSRC) and my Tweag supervisor Richard Eisenberg. During a visit at MSRC, I got caught up in a discussion regarding the lazy or eager instantiation of type variables in GHC.

This discussion serves as a great showcase for how language design works in practice: it is a hard and involved process where not everyone will agree on the same answers. In this blog post I will show both sides of the discussion in order to:

- Showcase the kind of tradeoffs that are made in language design.
- Clarify this discussion and its relevance for the Haskell language and its community.
- Make sure that you, as a Haskell developer, are sufficiently informed when you encounter the problems described here in your day to day life.

I thus wholeheartedly encourage you to visit the GitHub thread after reading this post!

You can find all the code examples from this blog post here.

## What’s the Problem?

Over the past few months, we encountered the following three issues:

### Synonyms

Consider the following example, where we define `myConst`

to be a synonym
for `const`

, whose type is `forall a b. a -> b -> a`

:

`myConst = const`

When inferring the type for `myConst`

, the compiler *instantiates* the type of
`const`

, meaning that the type variables `a`

and `b`

get replaced by
(potentially yet unknown) types. Afterwards, the compiler *generalises* this
type, meaning that all remaining unknown types get bound using new `forall`

binders.

But should the resulting type for `myConst`

be `forall a b. a -> b -> a`

or
`forall b a. a -> b -> a`

? There is no way for the compiler to know the
intended order of the generalised variables. While these types look equivalent,
they are most certainly not in combination with type applications. Should the
type of `myConst @Int`

be `forall b. Int -> b -> Int`

or `forall a. a -> Int -> a`

?

For this reason GHC only allows type application for user defined type
variables, called “specified” variables. Compiler generated variables, called
“inferred” variables, can never be manually instantiated, and GHC marks them
using braces, resulting in `forall {a} {b}. a -> b -> a`

. If you want to know
more about this distinction, you can read my last blog post, which
goes into much greater depth on the topic.

This means that `myConst`

and `const`

do not have the same type, which
contradicts our intuition that they should behave identically.

### Type Abstraction

While discussing type inference for type abstraction in lambda binders, another
issue popped up, this time related to type abstraction, as discussed in this
accepted (but not yet implemented) GHC proposal. Just as you would
write a lambda binder `\ x -> e`

to introduce a term variable `x`

and bring
it into scope in `e`

, this proposal allows `\ @a -> e`

to bind a type
variable `a`

and bring it into scope in `e`

.

At the moment, the proposal only discusses this feature under type-*checking*,
meaning that a type signature needs to be present. However, in order to extend
this feature with type inference, things got a bit more hairy. Consider the
following example:

`foo = \ @a (x :: a) -> x`

We would expect the inferred type for `foo`

to
be `forall a. a -> a`

, since we explicitly abstract over `a`

: `a`

is, literally, specified.
However, under eager type instantiation, the type for `foo`

would actually be
`forall {a}. a -> a`

.

### Nested Foralls

A third issue appears when instantiating types with nested `forall`

binders.
Consider the following example:

```
{-# LANGUAGE RankNTypes #-}
f :: forall a. a -> forall b. b -> b
f x = id
g = f
```

We ask GHC to infer the type of `g`

for us, and as you might expect from the
`myConst`

example above, the types of `f`

and `g`

are not identical:

```
*Main> :set -fprint-explicit-foralls
*Main> :info f
f :: forall a. a -> forall b. b -> b
*Main> :info g
g :: forall {a}. a -> forall b. b -> b
```

While typing `g`

, GHC instantiates and generalises the binder for `a`

. As a
consequence, the type of `g`

now contains a mix of type variables which can
and can not be manually instantiated. This hard-to-predict handling of type
variables is quite confusing.

This behaviour was introduced in a recent GHC proposal and is
expected to be released in GHC 8.12. If you want to try this example
for yourself, you can download our prebuilt GHC compiler using
`docker pull gertjanb/simplified-subsumption-ghc`

, and run it using
`docker run -it gertjanb/simplified-subsumption-ghc`

.

## Eager vs. Lazy Instantiation

All three issues explained above involve type variable instantiation, so let us
explore this in a bit more detail. When type inference encounters the variable
`f`

in the last example, a choice emerges when assigning it a type:

- GHC 8.10 instantiates types eagerly, resulting in the following type:
`alpha -> forall b. b -> b`

(note that we use greek letters to denote types yet to be determined). Finally, generalisation produces for`g`

the type`forall {a}. a -> forall b. b -> b`

. - Another choice is to instantiate lazily, that is, returning the type of
`f`

as is, and only instantiating it when needed. The function`g`

thus gets assigned the type of`f`

:`forall a. a -> forall b. b -> b`

.

## A Silver Bullet!

Up to GHC 8.10, type variables have always been instantiated eagerly. However, lazy instantiation might solve our issues above! Let’s investigate:

### Synonyms

When inferring a type for `myConst`

under lazy instantiation, the type
variables of `const`

would not get instantiated, resulting in the inferred
type `forall a. a -> a`

. This is the type we would expect, and
indeed GHC no longer treats `const`

and `myConst`

differently.

### Type Abstraction

When inferring a type for `foo`

under lazy instantiation, the user-specified
type variable `a`

does not get abstracted over, and the inferred type becomes
`forall a. a -> a`

. This is again the type we would expect for `foo`

, and
using this type as a signature works like a charm.

### Simplified Subsumption

A similar story holds for our `g`

example above. Since type variables do not
get instantiated unless absolutely necessary, its type becomes identical to the
type of `f`

: `forall a. a -> forall b. b -> b`

with no inferred variables.

## … But Comes at a Cost.

Unfortunately, as pointed out by Simon Peyton Jones, lazy instantiation might not be the amazing solution we hoped it would be. While figuring out the details of how this should work in practice, a number of new issues popped up:

### Case expressions

Type inference for case expressions requires the compiler to assign a
monomorphic type to each of the branches. This means that the type can not
contain top-level `forall`

binders or binders on the right of function
arrows. In order to illustrate this restriction, consider the following
example:

```
bar1 True = \ x -> id
bar2 True = \ x -> id
bar2 False = error "Impossible case for reasons"
```

Under eager instantiation, the inferred types are as you would expect:
when encountering `id`

, its forall type is instantiated eagerly.
This results in the type
`forall {a} {b}. Bool -> a -> b -> b`

for both `bar1`

and `bar2`

.

The case of lazy instantiation is more interesting. The function `bar1`

gets
the type `forall {a}. Bool -> a -> forall b. b -> b`

. But by adding the
catch-all sanity check in `bar2`

, we are actually introducing a case
expression, thus forcing the compiler to return a monomorphic type. The
`forall`

s get instantiated, resulting in the same type we get from eager
instantiation.

### Evaluation

So far the impact of our discussion has mainly been limited to type level differences. However, these type level choices do have an impact on the actual evaluation of the program. Consider the following example:

```
{-# LANGUAGE BangPatterns #-}
diverge = let !x = undefined in ()
```

Note that the bang forces GHC to evaluate `x`

eagerly. We would thus expect
this function to throw an exception. However, while this is certainly the case
under eager instantiation, this does not hold when variables are instantiated
lazily. This happens because the type of `undefined`

is `forall a. HasCallStack => a`

, and eager instantiation will instantiate the type `a`

and the `HasCallStack`

argument. Evaluating this instantiated `undefined`

throws an exception, as we would expect. However, under lazy instantiation, the
type of `undefined`

remains effectively a function type (from
`HasCallStack`

evidence to type `a`

), and functions do not diverge.

### Implicit Arguments

The story becomes even more involved when we include GHC’s implicit arguments. While recognizing that this extension is not widely used, it remains important to take all the compiler features into account when considering changes. Take for instance the following code example:

```
{-# LANGUAGE ImplicitParams #-}
x :: (?i :: Int) => Int
x = ?i
y :: (?i :: Int) => Int
y = let ?i = 5 in x
z :: Int
z = let ?i = 6 in y
```

Again, the choice of either eager or lazy instantiation determines the
evaluation outcome. Similarly to before, under eager instantiation, while typing
`y`

, the type of `x`

gets instantiated right away with `?i = 5`

. On the
other hand, under lazy instantiation, this is postponed as far as possible.
Concretely, at the very end when typing `z`

, the implicit variable `?i`

has
to be instantiated, in this case with `?i = 6`

.
This means that `z`

evaluates to `5`

under eager instantiation (as most
people would expect), but evaluates to `6`

under lazy instantiation.

## Compromises Have to be Made

I hope this blog post illustrates that these decisions are just plain hard, and ultimately quite subjective. Both eager and lazy instantiation are reasonable approaches and lead to type-safe languages. In the end the choice comes down to taste and desire for the best user experience.

After going back and forth a couple of times, we finally concluded that eager instantiation seems most sensible: while lazy instantiation would certainly solve the three issues described above, it unfortunately comes at too heavy a cost. Instead, we will just have to accept the strangeness of synonyms not behaving like we expect and shallow instantiation making some variables inferred while keeping others specified. Regarding type abstraction in lambda binders, to avoid the issues described above, we propose limiting this feature to type checking only.

## Conclusion

GHC is made by the Haskell community, so it’s important that you’re informed about this discussion. I thus wholeheartedly encourage you to visit the Github page and continue reading about the topic.