Almost a million years ago, I was dealing with some sinister bugs inside the
data structures in
linear-base and to stop the nightmares I decided to just
test the specification of the data structures themselves. I ended up using
something that I’ve been calling denotational homomorphic testing. In this
post, I’ll walk through how I ended up with this and why this is legitimately helpful.
Let’s consider a toy example which corresponds to one of the data structures I
was working with; see PR #263. How shall we test a specification of a simple
implementation shown below?
-- Constructors / Modifiers empty :: Set a insert :: Keyed a => a -> Set a -> Set a delete :: Keyed a => a -> Set a -> Set a intersect :: Keyed a => Set a -> Set a -> Set a union :: Keyed a => Set a -> Set a -> Set a -- Accessors member :: Keyed a => a -> Set a -> Bool size :: Keyed a => Set a -> Int
My first idea was to create a specification for my data structures via axioms, and then property test those axioms.
I actually got this idea from my introductory computer science course. Our professor chose to introduce us to data structures by understanding axioms that functionally specify their behavior. Together, these axioms specified “simplification” rules that would define the value of the accessors on an arbitrary instance of that data structure. Think of it as defining rules to evaluate any well typed expression whose head is an accessor. The idea being that if you define the interface for an arbitrary use of the data structure by an external program, then you’ve defined what you want (functionally) from the data structure1.
In this system, we need at least one axiom for each accessor applied to each
constructor or modifier. If we had accessors and constructors or
modifiers, we would need at least axioms. For instance, for the accessor
member, we would need at least 5 axioms, one for each of the 5 constructors
or modifiers, and these are some of the axioms we would need:
-- For all x, y != x, member empty x == False member (insert x s) x == True member (insert x s) y == member s y
Intuitively, the axioms provide a way to evaluate any well typed expression starting
member to either
With this specification by axioms, I thought I could just property test each axiom
hedgehog and provided my samples were large enough and random enough, we’d have
a high degree of confidence in each axiom, and hence in the specification and everyone would
live happily ever after. It was a nice dream while it lasted.
It turns out that there are just too many axioms to test. For instance,
consider the axioms for
size (intersect s1 s2) = ...
You would effectively need axioms that perform the intersection on two terms
s2. A few of simple axioms would include the following.
size (intersect empty s2) = 0 -- For x ∉ s1 size (intersect (insert x s1) s2) = boolToInt (member x s2) + size (intersect s1 s2)
So, if we can’t come up with a specification that’s easy to specify, much less easy to test, what are we to do?
I looked at a test written by my colleague Utku Demir and it sparked an idea. Utku wrote some tests, actually for arrays, that translated operations on arrays to analogous operations on lists:
f <$> fromList xs == fromList (f <$> xs) fromList :: [a] -> Array a
I asked myself whether we could test sets by testing whether they map onto lists
in a sensible way. I didn’t understand this at the time, but I was really testing
a semantic function that was itself a specification for
Concretely, consider the following equations.
-- toList :: Set a -> [a] -- toList (1) does not produce repeats and (2) is injective -- Semantic Function -------------------- toList empty ==  toList (insert x s) == listInsert x (toList s) -- listInsert x l = nub (x : l) toList (delete x s) == listDelete x (toList s) -- listDelete x = filter (/= x) toList (union s1 s2) == listUnion (toList s1) (toList s2) -- listUnion a b = nub (a ++ b) toList (intersect s1 s2) == listIntersect (toList s1) (toList s2) -- listIntersect a b = filter (`elem` a) b -- Accessor Axioms ------------------ member x s == elem x (toList s) size s == length (toList s)
These effectively model a set as a list with no repeats. This model is a
Set. In other words, we can say our
is correct if and only if it corresponds to this model of lists without
Why is this so, you ask me? Well, I’ll tell you. The first five equations cover
each constructor or modifier and hence fully characterise
a semantic function. Any expression using
Set either is a
Set or is composed of operations that use the accessors. Thus, any
Sets can be converted to one using lists.
toList is injective, we can go back and forth between expressions in
the list model and expressions using
For instance, we can prove that for any sets
insert x (union s1 s2) = union (insert x s1) s2. First we have:
toList (insert x (union s1 s2)) = -- Function inverses listInsert x (toList (union s1 s2)) = -- Semantic function listInsert x (nub ((toList s1) ++ (toList s2))) = -- Semantic function nub (listInsert x (toList s1) ++ (toList s2)) = -- List properties nub (toList (insert x s1) ++ (toList s2)) = -- Semantic function toList (union (insert x s1) s2) = -- Semantic function
Then by injectivity of
toList we deduced, as expected:
insert x (union s1 s2) = union (insert x s1) s2
This actually means that our denotational specification subsumes the axiomatic specification that we were reaching for at the beginning of the post.
Now, there’s an interesting remark to make here. Our axiomatic system had axioms
that all began with an accessor. The accessors’ denotations
toList on the right-hand side, removing the need for injectivity
of the semantic function.
member (insert s x) x = elem x (toList (insert s x))
which simplifies to
elem x (nub (x:toList s)) which is clearly
The injectivity of the semantic function is only needed
for equalities between sets like the one in the proof above.
In summary, we now have an equivalent way to give a specification of
terms of laws that we can property test. Namely, we could property test all the
laws that define the semantic function, like
toList empty = , and
the accessors axioms, like
member x s = elem x (toList s), and this would
test that our implementation of
Set met our specification. And to boot,
unlike before, we’d only write one property for each constructor, accessor or
modifier. If we had accessors and constructors and modifiers, we’d need
to test about laws.
Stated abstractly, we have the following technique. Provide a specification (à
la denotational semantics) of a data structure
D by modeling it by
another simpler data structure
T. The model consists of a semantic function
sem :: D -> T and axioms that correspond constructors on
T and accessors on
D to accessors on
acc someD = acc' (sem someD). Now, test the specification by
property testing each of the
sem axioms. These
axioms look like homomorphisms in that operations on
Ds are recursively
translated to related operations on
Ts, which preserves some semblance of
structure (more on this below). If our property testing goes well, we have a
strong way to test the implementation of
It’s critical that I note that providing and testing a specification for data structures is not at all a novel problem. It has been well studied in a variety of contexts. Two of the most relevant approaches to this technique are in Conal Elliot’s paper Denotational design with type class morphisms and the textbook Algebra Driven Design.
In Elliot’s paper, he outlines a design principle: provide denotational semantics
to data structures such that the semantic function is a homomorphism over any
type classes shared by the data structure at hand and the one mapped to.
Applying this to our example, this would be like creating an
class, having an instance for
Ord a => IsSet [a] and
Ord a => IsSet (Set a), and verifying that if we changed each equation in the former to use
toList, it would be the corresponding equation in the latter. So, for example
class IsSet s where member :: a -> s a -> Bool -- ... etc -- Actual homomorphism property: -- -- > member x (toList s) = member x s --
where the property to test is actually a homomorphism.
This has the same underlying structure to our approach and the type class laws
that hold on the list instance of
IsSet prove the laws hold on the
instance, analogous to how facts on lists prove facts about
In Algebra Driven Design, the approach is similar. First generate axioms on
a reference implementation
RefSet a (which might be backed by lists) via
quickspec and test those. Then, property test the homomorphism laws from
To conclude, it seems to me that denotational homomorphic testing is a pretty cool way to think about and test data structures. It gives us a way to abstract a complex data structure by making a precise correspondence to a simpler one. Facts we know about the simpler one prove facts about the complex one, and yet, the correspondence is itself the specification. Hence proving the correspondence would be a rigorous formal verification. Falling a few inches below that, we can property test the correspondence because it’s in the form of a small number of laws and this in turn gives us a high degree of confidence in our implementation.
- Think about the batman line: “It’s not who I am underneath, but what I do that defines me”.↩