Writing a test suite is a crucial step when developing highassurance software. Yet, identifying meaningful properties of the system and giving it suitable inputs may be hard. This is especially challenging for code involving state — besides properties and inputs, one must also take into account the state of the system at all times.
It is inconvenient to construct states suitable for testing by hand. An alternative is to build sequences of inputs that can be iterated over the initial state, placing the system in a state we want to test. These sequences of inputs are called traces.
For these traces to be useful for testing, it must be possible to:
 report traces to the user in an informative way when some postcondition fails
 modify traces by changing some or all inputs sent in them
While (1) is fairly simple to accomplish, (2) requires significantly more advanced techniques.
In this post I will present a technique to do testing using traces which can can be used with arbitrary stateful systems. They were applied in practice in cookedvalidators, which was developed as a tool for writing Plutus validator scripts.
Context: auditing smart contracts
During this post I will use smart contracts as a guiding example, but no specific knowledge on them is required. The unfamiliar reader can think of smart contracts as arbitrary code whose state is stored in the blockchain, and which is triggered by transactions.
While auditing Plutus smart contracts, we want to write tests
using a function validateTx
that pretends to run a transaction in a blockchain.
Its type is:
validateTx :: (Monad m) => Tx > StateT BlockChainSt (ExceptT BlockChainErr m) TxId
Smart contracts will live somewhere in the state, of type BlockChainSt
, and their
implementation gets to decide whether some transaction is valid or not.
If a contract rejects a transaction, validateTx
will throw an error.
If all contracts involved in a transaction agree, the overall blockchain
state changes and we return the transaction id of that particular transaction.
In this context, a typical test consists of generating and executing a sequence of transactions then checking some predicate on the resulting state. For example, say that we are simulating a crowdfundinglike contract. One property we expect to hold is that the owner can withdraw the funds if and only if the threshold has been reached. To check that, we can write a number of traces that reach the threshold, execute them and ensure the owner can withdraw the funds. We might also want to transform those traces that succeed with certain attacks or faults and ensure they fail.
An Appetizer
Lets start by defining a convenient type synonym which represents computations that manipulate the blockchain:
type DirectT m = StateT BlockChainSt (ExceptT BlockChainErr m)
Coming back to the crowdfunding example, we might want to express a trace
that opens a crowdfunding vault with a threshold of 10000 units, then deposits 8000 units,
followed by depositing an additional 5000 units, and finally attempts withdraw, which should succeed.
Using monadic syntax, this could be written as example
below:
example :: (Monad m) => DirectT m ()
example = void $ do
vaultId < validateTx (openCrowdFundingWithLimit 10000)
validateTx (fundVault vaultId 8000)
validateTx (fundVault vaultId 5000)
validateTx (withdrawVault vaultId)
Here, our example uses the transaction id from the first transaction to identify a particular vault
but ignores the other TxId
s. To witness that example
succeeds, we can run it from the initial state and
witness that it does not return an error. What if it doesn’t succeed, though?
As a user, we would like to see a description of all the transactions that led to
the failure, so we can diagnose any potential issue.
Even better, say we want to express that if either funding transaction above happened to fund 3001 units less, the trace should fail: the transactions deposit a total of 13000 units into the vault, if we take more than 3000 units from anywhere then the threshold won’t be met and the trace should fail. In fact, we want to be able to write something like:
example' = somewhere (subtractValFromFundVault 3001) example
The somewhere
modality affects specific transactions over
a trace, yielding a set of traces and enabling us to
inject faults and attacks. In example'
above, subtractValFromFundVault
has type Tx > Maybe Tx
. Say we defined it to return Just
if and only if it is applied to
a fundVault
transaction. In that case, example'
represents the
following two traces:

First trace
openCrowdFunding
fundVault vault 4999
fundVault 5000
withdraw

Second trace
openCrowdFunding
fundVault vault 8000
fundVault 1999
withdraw
The original example
trace is not present because somewhere
applies the modifier to exactly one transaction in the trace, and the
original transaction has the modifier applied nowhere.
Finally, note how the monadic structure ensures
that we can write dynamic traces: we can use a generated vaultId
to validate subsequent transactions. More generally, traces often
depend on values that are returned from previous transactions.
One such case that shows up in practice all the time is the
creation of NFTs, which often requires some hash of some previously submitted
transaction id.
We have just covered the essentials of cookedvalidators
.
If you are only interested in using our library, you can continue
with the tutorials and examples in the repository.
In the rest of the post I will go over some of the more technical internal details.
Free Monads and Staging
So how does this work? The core idea is to have two monads:
DirectT
, that executes operations directly but doesn’t supportsomewhere
Staged
, that is an AST which can be modified, and interpreted intoDirectT
In order to be able to choose which monad we want, we abstract the
core functionality we need in a type class called MonadBlockChain
and write all of our functions with type: (MonadBlockChain m) => ... > m res
.
For simplicity, let’s assume there are two base
functions needed: validating a transaction and returning some id if
validation succeeds, and a function to lookup the resources that
belong to a certain address. We would then define MonadBlockChain
as
follows:
class (Monad m) => MonadBlockChain m where
validateTx :: Tx > m TxId
lkupResources :: Address > m [Resource]
instance (Monad m) => MonadBlockChain (DirectT m) where
...
instance MonadBlockChain Staged where
...
The idea behind Staged
is to reify the base operations, declared in MonadBlockChain
, into a GADT:
data Op :: * > * where
ValidateTx :: Tx > Op TxId
LkupResources :: Address > Op [Resources]
Then we want an interpretation of Op
in terms of the DirectT
monad, called interpretOp
:
interpretOp :: (Monad m) => Op a > DirectT m a
interpretOp (ValidateTx tx) = validateTx tx
interpretOp (LkupResources addr) = lkpResources addr
With that at hand, we need to be able to construct values of type Op
and chain
them together. For this, we can use something like the Free Monad
to build up an AST using operations in
Op
. Unfortunately, Op
above is not a functor, hence, we cannot
just take the free monad of Op
. We must construct the
operational monad of Op
, which is also called the Freer monad:
data Freer (op :: Type > Type) :: Type > Type where
Return :: a > Freer a
Bind :: op a > (a > Freer b) > Freer b
Our Staged
monad then is just a particular instantiation of Freer
:
type Staged = Freer Op
With that setup, we can now write a first interpreter for Staged
. Let’s warm
up with a trivial interpreter which only interprets returns and binds in the
target monad:
interpret :: (Monad m) => Staged a > DirectT m a
interpret (Return a) = return a
interpret (Bind op cont) = interpretOp op >>= interpret . cont
Constructing the AST
All that is left for us in order to use DirectT
and Staged
interchangeably is
the instance for MonadBlockChain Staged
, which is easy to write:
instance MonadBlockChain Staged where
validateTx tx = Bind (ValidateTx tx) Return
lkupResource addr = Bind (LkupResource addr) Return
Probabilistic traces for propertybased testing
Up until now we can build ASTs with Staged
which represent traces
and can be interpreted with DirectT
to witness their success or failure.
That is good enough for unit testing, but we want to have propertybased tests.
The GenT
transformer, is the key to achieve this, enabling us to
write probabilistic traces:
instance (MonadBlockChain m) => MonadBlockChain (GenT m) where
validateTx = lift . validateTx
lkupResource = lift . lkupResource
We can rewrite our original example as a polymorphic, propertybased
test. The test will exercise the property that for all threshold
and
x < threshold
, if we open a vault for threshold
units then issue
two transactions depositing and arbitrary x
and threshold  x
units, we must be able to withdraw the funds.
example :: (MonadBlockChain m) => GenT m ()
example = void $ do
threshold < choose (5000, 15000)
vaultId < validateTx (openCrowdFundingWithLimit threshold)
x < choose (1, threshold  1)
validateTx (fundVault vaultId x)
validateTx (fundVault vaultId (threshold  x))
validateTx (withdrawVault vaultId)
Here, example
can be thought of as a probability distribution of traces.
Calling runGenT example :: Staged ()
samples that probability distribution, yielding
one trace at random that picks a particular instantiation of threshold
and x
.
We tie everything together with a simple forAllTr
combinator which
samples a distribution of traces, then interprets the random trace
and applies a predicate to its result:
forAllTr :: GenT Staged a > (Either BlockChainErr (a, BlockChainSt) > Property) > Property
forAllTr pTr prop = forAllBlind (runGenT pTr) $ \tr > prop (runDirect (interpret tr) st0)
A test that checks that all traces in the distribution example
above succeed can
then be written as:
exampleOk :: Property
exampleOk = forAllTr example (property . isRight)
Up to this point we have addressed our first requirement only:
express traces for propertybased testing. However, we did it in a way that
will enable us to address the second and third requirements: inform the
user if something goes wrong and modify traces with combinators like somewhere
.
What went wrong? Returning traces
If the exampleOk
test fails at any point, the programmer has no idea of what went wrong.
To improve this, we must change our interpret
function to inspect and record the transactions
being validated. This can be done with a modified version of forAllTr
:
interpret :: (Monad m) => Staged a > DirectT (WriterT [Tx] m) a
interpret (Return a) = return a
interpret (Bind op cont) = tellIfTx op >> interpretOp op >>= interpret . cont
where tellIfTx (ValidateTx tx) = tell [tx]
tellIfTx _ = return ()
forAllTr :: GenT Staged a > (Either BlockChainErr (a, BlockChainSt) > Property) > Property
forAllTr pTr prop = forAllBlind (runGenT pTr) $ \tr
> let (res, descr) = runDirect (interpret tr) st0
in counterexample (show descr) $ prop res
The type DirectT (WriterT [Tx] m) a
is isomorphic to BlockChainSt > (Either BlockChainErr (a, BlockChainSt), [Tx])
,
which gives us the sequence of transactions that yielded the given result, even if the result is
actually an error, which is important.
Note that we could not just write a function observe :: Staged a > [Tx]
,
which extracts all the transactions used in a Staged
trace. This would be
possible if we relied on Applicative
instead of Monad
. Yet, in our case Monad
is necessary because
we may need the result of previous computations in order to construct
the next transaction (think of hashes or signatures, for instance). This happens constantly in practice.
Modifying traces: adding modalities
At this point, we can write probabilistic traces and have a description of what happened if something goes wrong, but we still cannot modify transactions within traces. To accomplish that we will take inspiration in modal logic and define two dual functions:
somewhere :: (Tx > Maybe Tx) > Staged a > Staged a
everywhere :: (Tx > Maybe Tx) > Staged a > Staged a
These combinators take a partial function: this way they are able not to affect certain
transactions. Some transformations might only make sense for some transactions.
Additionally, the semantics of Staged
will be enriched. We now need:
interpret :: Staged a > DirectT (WriterT TraceDescr []) a
which is equivalent to the type Staged a > BlockChainSt > [(Either BlockChainErr (a, BlockChainSt), TraceDescr)]
.
A Staged
AST now denotes a set of possible executions in
the DirectT
monad, each of which has its own description of which
transactions led to that particular outcome.
The everywhere f tr
combinator maps a transformation f
to every
transaction tx
in every outcome of tr
such that isJust (f tx)
.
Its dual, somewhere f tr
will fork new outcomes from tr
such that
in each one, f
was applied to exactly one transaction tx
such that
isJust (f tx)
. To illustrate this, let a, a', b, b'
and c
be
arbitrary, but different, transactions. Now we define a
transformation:
f :: Tx > Maybe Tx
f a = Just a'
f b = Just b'
f _ = Nothing
Interpreting everywhere f (mapM_ validateTx [a,b,c])
should be equivalent to:
\st > [ runDirect (validateTx a' >> validateTx b' >> validateTx c) st ]
That is, it returns the same set of traces, but with all possible transactions modified
by f
. Those that f
returned Nothing
are left untouched.
Interpreting somewhere f (mapM_ validateTx [a,b,c])
should be equivalent to:
\st > [ runDirect (validateTx a' >> validateTx b >> validateTx c) st
, runDirect (validateTx a >> validateTx b' >> validateTx c) st
]
The final definition of interpret
gets a little
involved, hence we will omit it here but we invite the interested
reader to go check its source. The central idea in implementing it is
maintaining a set of modalities that are yet to be consumed in the
current branch. For instance, if we are interpreting a trace
somewhere f tr
, we collect Somewhere f
as a modality that needs to
be consumed and continue interpreting tr
. Say we are interpreting
Bind (ValidateTx tx) rest
with two modalities to be consumed,
[Somewhere f, Everywhere g]
. We can either consume the Somewhere
now, applying f
to tx
, or later, within rest
. Yet, we must consume
and preserve the Everywhere g
in either case.
Closing thoughts
The combination of techniques presented in this post address an
important obstacle in writing good test suites over code involving a state monad.
In practice, they enable us to write complex traces for
Plutus contracts, and we can use somewhere
and
everywhere
to exhaustively attempt to execute an attack
within some trace.
Having combinators is preferable to generating traces with hardcoded transformations.
The resulting code is easier to understand and we get to write less code.
A trace somewhere f tr
doesn’t need to be fixed if something in tr
changes.
Moreover, we only have to maintain tr
, and not a family of variations of tr
that
applies f
to its different transactions. In general, it is useful to automatically rule out
certain kinds of bad traces, which can be obtained as modifications over good traces
through the injection of attacks and faults.