 Denotational Homomorphic Testing

13 October 2021 — by Divesh Otwani

Almost a million years ago, I was dealing with some sinister bugs inside the data structures in linear-base and to stop the nightmares I decided to just test the specification of the data structures themselves. I ended up using something that I’ve been calling denotational homomorphic testing. In this post, I’ll walk through how I ended up with this and why this is legitimately helpful.

## Our Toy Example

Let’s consider a toy example which corresponds to one of the data structures I was working with; see PR #263. How shall we test a specification of a simple Set implementation shown below?

-- Constructors / Modifiers
empty :: Set a
insert :: Keyed a => a -> Set a -> Set a
delete :: Keyed a => a -> Set a -> Set a
intersect :: Keyed a => Set a -> Set a -> Set a
union :: Keyed a => Set a -> Set a -> Set a

-- Accessors
member :: Keyed a => a -> Set a -> Bool
size :: Keyed a => Set a -> Int

## Specification via Axioms

My first idea was to create a specification for my data structures via axioms, and then property test those axioms.

I actually got this idea from my introductory computer science course. Our professor chose to introduce us to data structures by understanding axioms that functionally specify their behavior. Together, these axioms specified “simplification” rules that would define the value of the accessors on an arbitrary instance of that data structure. Think of it as defining rules to evaluate any well typed expression whose head is an accessor. The idea being that if you define the interface for an arbitrary use of the data structure by an external program, then you’ve defined what you want (functionally) from the data structure1.

In this system, we need at least one axiom for each accessor applied to each constructor or modifier. If we had $m$ accessors and $n$ constructors or modifiers, we would need at least $mn$ axioms. For instance, for the accessor member, we would need at least 5 axioms, one for each of the 5 constructors or modifiers, and these are some of the axioms we would need:

-- For all x, y != x,
member empty x == False
member (insert x s) x == True
member (insert x s) y == member s y

Intuitively, the axioms provide a way to evaluate any well typed expression starting with member to either True or False.

With this specification by axioms, I thought I could just property test each axiom in hedgehog and provided my samples were large enough and random enough, we’d have a high degree of confidence in each axiom, and hence in the specification and everyone would live happily ever after. It was a nice dream while it lasted.

It turns out that there are just too many axioms to test. For instance, consider the axioms for size on intersection.

size (intersect s1 s2) = ...

You would effectively need axioms that perform the intersection on two terms s1 and s2. A few of simple axioms would include the following.

size (intersect empty s2) = 0
-- For x ∉ s1
size (intersect (insert x s1) s2) = boolToInt (member x s2) + size (intersect s1 s2)

So, if we can’t come up with a specification that’s easy to specify, much less easy to test, what are we to do?

## Specification via Denotational Semantics

I looked at a test written by my colleague Utku Demir and it sparked an idea. Utku wrote some tests, actually for arrays, that translated operations on arrays to analogous operations on lists:

f <$> fromList xs == fromList (f <$> xs)
fromList :: [a] -> Array a

I asked myself whether we could test sets by testing whether they map onto lists in a sensible way. I didn’t understand this at the time, but I was really testing a semantic function that was itself a specification for Set.

### Denotational Semantics

Concretely, consider the following equations.

-- toList :: Set a -> [a]
-- toList (1) does not produce repeats and (2) is injective

-- Semantic Function
--------------------

toList empty == []
toList (insert x s) == listInsert x (toList s)
-- listInsert x l = nub (x : l)
toList (delete x s) == listDelete x (toList s)
-- listDelete x = filter (/= x)
toList (union s1 s2) == listUnion (toList s1) (toList s2)
-- listUnion a b = nub (a ++ b)
toList (intersect s1 s2) == listIntersect (toList s1) (toList s2)
-- listIntersect a b = filter (elem a) b

-- Accessor Axioms
------------------

member x s == elem x (toList s)
size s == length (toList s)

These effectively model a set as a list with no repeats. This model is a specification for Set. In other words, we can say our Set implementation is correct if and only if it corresponds to this model of lists without repeats.

### Consequences of Denotational Semantics

Why is this so, you ask me? Well, I’ll tell you. The first five equations cover each constructor or modifier and hence fully characterise toList. We say that toList is a semantic function. Any expression using Set either is a Set or is composed of operations that use the accessors. Thus, any expression using Sets can be converted to one using lists.

Moreover, since toList is injective, we can go back and forth between expressions in the list model and expressions using Sets. For instance, we can prove that for any sets s1 and s2, insert x (union s1 s2) = union (insert x s1) s2. First we have:

toList (insert x (union s1 s2)) =                 -- Function inverses
listInsert x (toList (union s1 s2)) =             -- Semantic function
listInsert x (nub ((toList s1) ++ (toList s2))) = -- Semantic function
nub (listInsert x (toList s1) ++ (toList s2)) =   -- List properties
nub (toList (insert x s1) ++ (toList s2)) =       -- Semantic function
toList (union (insert x s1) s2) =                 -- Semantic function

Then by injectivity of toList we deduced, as expected:

insert x (union s1 s2) =
union (insert x s1) s2

This actually means that our denotational specification subsumes the axiomatic specification that we were reaching for at the beginning of the post.

Now, there’s an interesting remark to make here. Our axiomatic system had axioms that all began with an accessor. The accessors’ denotations only have toList on the right-hand side, removing the need for injectivity of the semantic function. For instance, member (insert s x) x = elem x (toList (insert s x)) which simplifies to elem x (nub (x:toList s)) which is clearly True. The injectivity of the semantic function is only needed for equalities between sets like the one in the proof above.

### Denotational Homomorphic Testing

In summary, we now have an equivalent way to give a specification of Set in terms of laws that we can property test. Namely, we could property test all the laws that define the semantic function, like toList empty = [], and the accessors axioms, like member x s = elem x (toList s), and this would test that our implementation of Set met our specification. And to boot, unlike before, we’d only write one property for each constructor, accessor or modifier. If we had $m$ accessors and $n$ constructors and modifiers, we’d need to test about $m+n$ laws.

Stated abstractly, we have the following technique. Provide a specification (à la denotational semantics) of a data structure D by modeling it by another simpler data structure T. The model consists of a semantic function sem :: D -> T and axioms that correspond constructors on D to constructors of T and accessors on D to accessors on T, e.g. acc someD = acc' (sem someD). Now, test the specification by property testing each of the sem axioms. These axioms look like homomorphisms in that operations on Ds are recursively translated to related operations on Ts, which preserves some semblance of structure (more on this below). If our property testing goes well, we have a strong way to test the implementation of D.

It’s critical that I note that providing and testing a specification for data structures is not at all a novel problem. It has been well studied in a variety of contexts. Two of the most relevant approaches to this technique are in Conal Elliot’s paper Denotational design with type class morphisms and the textbook Algebra Driven Design.

In Elliot’s paper, he outlines a design principle: provide denotational semantics to data structures such that the semantic function is a homomorphism over any type classes shared by the data structure at hand and the one mapped to. Applying this to our example, this would be like creating an IsSet type class, having an instance for Ord a => IsSet [a] and Ord a => IsSet (Set a), and verifying that if we changed each equation in the former to use toList, it would be the corresponding equation in the latter. So, for example

class IsSet s where
member :: a -> s a -> Bool
-- ... etc

-- Actual homomorphism property:
--
-- >   member x (toList s) = member x s
--

where the property to test is actually a homomorphism.

This has the same underlying structure to our approach and the type class laws that hold on the list instance of IsSet prove the laws hold on the Set instance, analogous to how facts on lists prove facts about Sets.

In Algebra Driven Design, the approach is similar. First generate axioms on a reference implementation RefSet a (which might be backed by lists) via quickspec and test those. Then, property test the homomorphism laws from Set to RefSet.

## Concluding Thoughts

To conclude, it seems to me that denotational homomorphic testing is a pretty cool way to think about and test data structures. It gives us a way to abstract a complex data structure by making a precise correspondence to a simpler one. Facts we know about the simpler one prove facts about the complex one, and yet, the correspondence is itself the specification. Hence proving the correspondence would be a rigorous formal verification. Falling a few inches below that, we can property test the correspondence because it’s in the form of a small number of laws and this in turn gives us a high degree of confidence in our implementation.

1. Think about the batman line: “It’s not who I am underneath, but what I do that defines me”.  This article is licensed under a Creative Commons Attribution 4.0 International license.