Why Liquid Haskell matters

19 January 2022 — by Facundo Domínguez

Since the inception of the Haskell language, the community around it has steadily innovated with language extensions and abstractions. To mention a few, we had the IO monad, all flavors of type classes and constraints, Template Haskell, generalized algebraic data types (GADTs), data kinds, etc. Then we had to deal with the programs resulting from all this innovation, which eventually sprang feelings that ranged from sticking to Simple Haskell to supporting full motion towards Dependent Haskell.

There seems to be two design goals at conflict in current discussions. On the one hand, we want our programs to be easy to maintain. For instance, changes to one part of the code shouldn’t propagate to many others, and the meaning of a piece of code should remain reasonably deducible from the text.

On the other hand, we want our programs to do what we mean, and we want assistance from the computer to help us achieve this goal. Our human brains can only deal with a limited amount of complexity, and industrial-scale software has largely exceeded our capacity in the last several decades. But can we prove the correctness of our programs with a type-system that doesn’t make them hard to read or change?

In this post, I’m arguing that Liquid Haskell offers an alternative angle to approach this question. Liquid Haskell is a tool that can analyse a program and calculate proof obligations that would ensure that the program meets some specification (not unlike Dafny, Why3, or F*). The specification is included in the program as a special comment inserted by the programmer. The compiler ignores this comment, but Liquid Haskell can find it. Once the proof obligations are identified, they are given to a theorem prover (an SMT solver specifically) in an attempt to save the programmer the trouble of writing a proof.

The conjecture that I pose is that many of the properties that would usually require dependent types to ensure at compile time, can be described in Liquid Haskell specifications. The point is not so much that proofs are easier with Liquid Haskell, but rather that we are in front of an approach that integrates well with a programming language as is, and yet it leverages the power of tools specialized to reason about logic when verifying programs.

Verifying functions

Let us consider an indexing function for lists.

elemAt :: [a] -> Int -> a
elemAt (x :  _) 0 = x
elemAt (_ : xs) i = elemAt xs (i-1)

This function can fail at runtime if the index is negative or greater or equal to the length of the list. If we wanted to do indexing safely, a formulation with dependent types could change the type of the function. The list type is replaced with the type of lists with a given length, and the integer type is replaced with the type of natural numbers smaller than the length.

elemAt1 :: Vec a n -> Fin n -> a
elemAt1 xs i = case (i, xs) of
  (FZ, x :>  _) -> x
  (FS i, _ :> xs) -> elemAt1 xs i

-- Vec a n = lists of length n
data Vec :: Type -> Nat -> Type where
  VNil :: Vec a Zero
  (:>) :: a -> Vec a n -> Vec a (Succ n)

-- Fin n = natural numbers smaller than n
data Fin :: Nat -> Type where
  FZ :: Fin (Succ n)
  FS :: Fin n -> Fin (Succ n)

data Nat = Zero | Succ Nat

We include the types of Vec and Fin to illustrate some of the complexity that needs to be introduced to work with dependent types, but we are not overly concerned with their details. A gentle presentation of these and the following examples with dependent types can be found in this tutorial.

The original function has been rewritten to be total. The types of the arguments have been changed to exclude from the domain the invalid indices. Being a total function, it can no longer fail at runtime. Moreover, the new types force all invocations to provide a valid index. In the transformation, though, we lost the original simplicity. The list type is no longer the standard list type for which many functions are offered in the standard libraries, and the efficient Int type has been replaced by a representation that counts sticks. To avoid conversions between the standard types and the new Vec and Fin types, we could be tempted to use these types in code that needs list indexing, which then would make this change less local than we could wish for.

With Liquid Haskell, instead, we could get our safety guarantees by writing a specification of the original function expressed with predicate logic. We use a refinement type on the index argument to state its validity.

-- specifications go between special comments `{-@ ... @-}`
{-@ elemAt :: xs:[a] -> { i:Int | 0 <= i && i < len xs } -> a @-}
elemAt :: [a] -> Int -> a
elemAt (x :  _) 0 = x
elemAt (_ : xs) i = elemAt xs (i-1)

A refinement type denotes a subtype of another type, characterized by a given predicate. In our example, { i:Int | 0 <= i && i < len xs } is a subtype of Int, whose values i satisfy the predicate 0 <= i && i < len xs. The specifications allow us to name the arguments of functions and to refer to them in the predicates, such as the list argument named xs. The language used to describe a predicate is not Haskell, but a separate language with functions and logical connectives. When predicates need to refer to Haskell functions, the functions can be translated to the logic.

The function elemAt continues to be partial in Haskell, but it is total according to its specification. And Liquid Haskell is powerful enough to check that elemAt meets the specification, i.e. given a valid index, the function won’t fail. It is a choice of the user to use Liquid Haskell to ensure that the invocations meet the specification as well, or to leave the invocations unverified. The solution not only leaves the original function untouched, which dependent types could also achieve, but it is remarkably economical in the way of expressing the property we care about.

Beyond bounds checking

Bounds checking is useful enough, but given the ability of Liquid Haskell to translate pure Haskell functions into the logic, we can aspire to verify other kinds of properties too. The following example is a datatype that can be used to represent untyped lambda expressions. It has constructors for variables, lambda abstractions, and applications.

-- | Unchecked expression, indexed by a bound of the allowed free variables
data UExp (n :: Nat)
  = UVar (Fin n)   -- ^ de Bruijn index for a variable
  | ULam (UExp (Succ n))
  | UApp (UExp n) (UExp n)
  deriving Show

Variables are represented with a de Bruijn index. That is, the variable UVar i refers to the variable bound by the i-th lambda abstraction surrounding the variable occurrence. Thus, ULam (ULam (UVar FZ)) stands for the lambda expression \x . \y . y, and ULam (ULam (UVar (FS FZ))) stands for \x. \y. x.

The type index of UExp allows us to refer conveniently to closed expressions as UExp Zero. In a closed expression, all variables have a matching lambda abstraction. Thus, ULam (UVar FZ) :: UExp Zero type checks, but UVar FZ :: UExp Zero wouldn’t typecheck because there is no lambda abstraction to bind UVar FZ and it has type forall n. UExp (Succ n).

Besides the fact that we are still counting sticks to identify variables, the type index needs to be carried around everywhere we deal with these expressions regardless of the task at hand. The Liquid Haskell counterpart looks as follows.

{-@
data UExp
  = UVar ({i:Int | 0 <= i}) // indices are specified to be non-negative
  | ULam UExp
  | UApp UExp UExp
@-}
data UExp
  = UVar Int
  | ULam UExp
  | UApp UExp UExp

-- | Computes an upper bound of the variables that appear free
-- in an expression.
{-@ reflect freeVarBound @-}
{-@ freeVarBound :: UExp -> { i:Int | 0 <= i } @-}
freeVarBound :: UExp -> Int
freeVarBound (UVar v) = v + 1
freeVarBound (ULam body) = max 0 (freeVarBound body - 1)
freeVarBound (UApp e1 e2) = max (freeVarBound e1) (freeVarBound e2)

With these definitions we can mimic the original indexed type with the following type synonyms.

-- Type synonym parameters starting with an upper case letter stand
-- for variables that are substituted with term-level expressions
-- instead of types.
{-@ type UExpN N = { e:UExp | freeVarBound e <= N } @-}
{-@ type ClosedUExp = UExpN 0 @-}

One feature of the Liquid Haskell way is that we don’t need to use a type index in the Haskell datatypes. We are only concerned with the ability to express the type of closed expressions in specifications, not in the actual program.

Another feature is that we are using a simple Haskell function to express what the bound of free variables is for a particular expression, and then we are using this function in the logic to say that an expression is closed. The reflect keyword in the specification of freeVarBound is directing Liquid Haskell to translate the function to the logic language.

Now we can test our type specifications with example expressions, and this is possible without a single GADT or a promoted data constructor.

{-@ e0 :: ClosedUExp @-}
e0 :: UExp
e0 = ULam (UVar 0)

{-@ e1 :: ClosedUExp @-} -- Fails verification
{-@ e1 :: UExpN 1 @-} -- Passes verification
e1 :: UExp
e1 = UVar 0

Beyond closed expressions

As a final example, let us consider a juicier Haskell datatype to represent typed lambda expressions. These expressions are functions manipulating values of some opaque type T.

-- | @Exp ctx ty@ is a well-typed expression of type @ty@ in context
-- @ctx@. Note that a context is a list of types, where a type's index
-- in the list indicates the de Bruijn index of the associated term-level
-- variable.
data Exp :: forall n. Vec Ty n -> Ty -> Type where
  Var   :: Elem ctx ty -> Exp ctx ty
  Lam   :: Exp (arg :> ctx) res -> Exp ctx (arg :-> res)
  App   :: Exp ctx (arg :-> res) -> Exp ctx arg -> Exp ctx res

-- | A type encoding types. @T@ is the atom while @:->@ refers to
-- the type arrow '->'.
data Ty = T | Ty :-> Ty

-- | @Elem xs x@ is evidence that @x@ is in the vector @xs@.
-- @EZ :: Elem xs x@ is evidence that @x@ is the first element of @xs@.
-- @ES ev :: Elem xs x@ is evidence that @x@ is one position later in
-- @xs@ than is indicated in @ev@
data Elem :: forall a n. Vec a n -> a -> Type where
  EZ :: Elem (x :> xs) x
  ES :: Elem xs x -> Elem (y :> xs)

The Exp datatype still tracks whether variables are in scope, this time with an Elem type that stands for a proof that a variable is in the typing context. Additionally, the datatype ensures that the expressions are well typed.

In Liquid Haskell we start with untyped expressions as before, with the novelty that the ULam constructor now has an extra argument to indicate the type of the binding. The extra argument is necessary to implement a type inference function named inferType.

data UExp
  = UVar Int
  | ULam Ty UExp
  | UApp UExp UExp
  deriving Show

{-@ reflect elemAt @-}
{-@ reflect inferType @-}
{-@ inferType :: ctx:[Ty] -> UExpN (len ctx) -> Maybe Ty @-}
inferType :: [Ty] -> UExp -> Maybe Ty
inferType ctx (UVar i) = Just (elemAt ctx i)
inferType ctx (ULam t body) =
  case inferType (t : ctx) body of
    Just r -> Just (t :-> r)
    Nothing -> Nothing
inferType ctx (UApp e0 e1) =
  case inferType ctx e0 of
    Just (a :-> r) -> case inferType ctx e1 of
      Just t -> if a == t then Just r else Nothing
      Nothing -> Nothing
    _ -> Nothing

Liquid Haskell verifies that the list indexing in the first equation of inferType is safe, thanks to the refinement type of the expression that ensures that the index is within the bounds of the list. The same refinement type ensures that we don’t forget to grow the context when we infer the type for the body of a lambda expression. And we also get the assurance that inferType terminates on inputs that meet the specification. In order to express well-typed terms, we only need now a type synonym.

{-@ type WellTypedExp CTX TY = { e:UExp | freeVarBound e <= len CTX && inferType CTX e == Just TY } @-}

{-@ e2 :: WellTypedExp [T] T @-}
e2 :: UExp
e2 = UVar 0

{-@ e3 :: WellTypedExp [] (T :-> T) @-}
e3 :: UExp
e3 = ULam T (UVar 0)

The specification of well-typed terms borrows the function inferType from Haskell, which has been translated to the logic. This allows to reuse the understanding that the user has of inferType to express what being well-typed means. When verifying e2 and e3, Liquid Haskell relies on the the SMT solver and other reasoning mechanisms of its own.

Summing up

The examples in this post that use Liquid Haskell can be found here. If you would like to see more substantial use of Liquid Haskell, there are two implementations of an interpreter for lambda expressions that can be compared side by side. The first implementation relies on emulating dependent types in Haskell and was implemented by fellow Tweager Richard Eisenberg. The second implementation is my reimplementation of Richard’s using Liquid Haskell.

In this post I have presented some examples where Liquid Haskell is expressive enough to deal with properties that are typically associated with the need of a dependently-typed language. Yet, unlike dependent types, Liquid Haskell allows us to keep programs to verify unchanged from the perspective of the compiler, and the trouble of discharging proof obligations is addressed in cooperation with theorem provers.

The downside of this way is that we now have to deal with the complexities of the integration. We need to orchestrate the communication between the different tools, translate our programs into the logics of the theorem provers, and translate the errors back to terms that the user can relate to the program being verified. There are more opportunities to make mistakes in the seams between the tools, while dependently-typed languages are often built on a single compiler with a possibly smaller trusted core.

On the user side, though, we can turn Liquid Haskell into a philosophy for approaching verification. We are not necessarily doomed to build a behemoth language-implementation capable of verifying it all. Instead, we can separate our verification from our programming needs and tackle each group with dedicated tools in order to better reuse code and techniques on both fronts.

Behind the scenes

Facundo Domínguez

Facundo is a software engineer supporting development and research projects at Tweag. Prior to joining Tweag, he worked in academia and in industry, on a varied assortment of domains, with an overarching interest in programming languages.

Tech Group

High Assurance Software

Automating the verification of correct behaviour in complex systems.

Tech Group

Programming Languages and Compilers

Research, create, improve and maintain programming languages and their tooling to enhance developer productivity and to deliver reliable, maintainable, correct and performant software with minimum effort.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← Trustix - Good things come in trees A Case Study on Correctness and Safety Testing of Stateful Systems →