Writing a test suite is a crucial step when developing high-assurance software. Yet, identifying meaningful properties of the system and giving it suitable inputs may be hard. This is especially challenging for code involving state — besides properties and inputs, one must also take into account the state of the system at all times.
It is inconvenient to construct states suitable for testing by hand. An alternative is to build sequences of inputs that can be iterated over the initial state, placing the system in a state we want to test. These sequences of inputs are called traces.
For these traces to be useful for testing, it must be possible to:
- report traces to the user in an informative way when some post-condition fails
- modify traces by changing some or all inputs sent in them
While (1) is fairly simple to accomplish, (2) requires significantly more advanced techniques.
In this post I will present a technique to do testing using traces which can can be used with arbitrary stateful systems. They were applied in practice in cooked-validators, which was developed as a tool for writing Plutus validator scripts.
Context: auditing smart contracts
During this post I will use smart contracts as a guiding example, but no specific knowledge on them is required. The unfamiliar reader can think of smart contracts as arbitrary code whose state is stored in the blockchain, and which is triggered by transactions.
While auditing Plutus smart contracts, we want to write tests
using a function
validateTx that pretends to run a transaction in a blockchain.
Its type is:
validateTx :: (Monad m) => Tx -> StateT BlockChainSt (ExceptT BlockChainErr m) TxId
Smart contracts will live somewhere in the state, of type
BlockChainSt, and their
implementation gets to decide whether some transaction is valid or not.
If a contract rejects a transaction,
validateTx will throw an error.
If all contracts involved in a transaction agree, the overall blockchain
state changes and we return the transaction id of that particular transaction.
In this context, a typical test consists of generating and executing a sequence of transactions then checking some predicate on the resulting state. For example, say that we are simulating a crowdfunding-like contract. One property we expect to hold is that the owner can withdraw the funds if and only if the threshold has been reached. To check that, we can write a number of traces that reach the threshold, execute them and ensure the owner can withdraw the funds. We might also want to transform those traces that succeed with certain attacks or faults and ensure they fail.
Lets start by defining a convenient type synonym which represents computations that manipulate the blockchain:
type DirectT m = StateT BlockChainSt (ExceptT BlockChainErr m)
Coming back to the crowdfunding example, we might want to express a trace
that opens a crowdfunding vault with a threshold of 10000 units, then deposits 8000 units,
followed by depositing an additional 5000 units, and finally attempts withdraw, which should succeed.
Using monadic syntax, this could be written as
example :: (Monad m) => DirectT m () example = void $ do vaultId <- validateTx (openCrowdFundingWithLimit 10000) validateTx (fundVault vaultId 8000) validateTx (fundVault vaultId 5000) validateTx (withdrawVault vaultId)
Here, our example uses the transaction id from the first transaction to identify a particular vault
but ignores the other
TxIds. To witness that
example succeeds, we can run it from the initial state and
witness that it does not return an error. What if it doesn’t succeed, though?
As a user, we would like to see a description of all the transactions that led to
the failure, so we can diagnose any potential issue.
Even better, say we want to express that if either funding transaction above happened to fund 3001 units less, the trace should fail: the transactions deposit a total of 13000 units into the vault, if we take more than 3000 units from anywhere then the threshold won’t be met and the trace should fail. In fact, we want to be able to write something like:
example' = somewhere (subtractValFromFundVault 3001) example
somewhere modality affects specific transactions over
a trace, yielding a set of traces and enabling us to
inject faults and attacks. In
Tx -> Maybe Tx. Say we defined it to return
Just if and only if it is applied to
fundVault transaction. In that case,
example' represents the
following two traces:
- First trace
fundVault vault 4999
- Second trace
fundVault vault 8000
example trace is not present because
applies the modifier to exactly one transaction in the trace, and the
original transaction has the modifier applied nowhere.
Finally, note how the monadic structure ensures
that we can write dynamic traces: we can use a generated
to validate subsequent transactions. More generally, traces often
depend on values that are returned from previous transactions.
One such case that shows up in practice all the time is the
creation of NFTs, which often requires some hash of some previously submitted
We have just covered the essentials of
If you are only interested in using our library, you can continue
with the tutorials and examples in the repository.
In the rest of the post I will go over some of the more technical internal details.
Free Monads and Staging
So how does this work? The core idea is to have two monads:
DirectT, that executes operations directly but doesn’t support
Staged, that is an AST which can be modified, and interpreted into
In order to be able to choose which monad we want, we abstract the
core functionality we need in a type class called
and write all of our functions with type:
(MonadBlockChain m) => ... -> m res.
For simplicity, let’s assume there are two base
functions needed: validating a transaction and returning some id if
validation succeeds, and a function to lookup the resources that
belong to a certain address. We would then define
class (Monad m) => MonadBlockChain m where validateTx :: Tx -> m TxId lkupResources :: Address -> m [Resource] instance (Monad m) => MonadBlockChain (DirectT m) where ... instance MonadBlockChain Staged where ...
The idea behind
Staged is to reify the base operations, declared in
MonadBlockChain, into a GADT:
data Op :: * -> * where ValidateTx :: Tx -> Op TxId LkupResources :: Address -> Op [Resources]
Then we want an interpretation of
Op in terms of the
DirectT monad, called
interpretOp :: (Monad m) => Op a -> DirectT m a interpretOp (ValidateTx tx) = validateTx tx interpretOp (LkupResources addr) = lkpResources addr
With that at hand, we need to be able to construct values of type
Op and chain
them together. For this, we can use something like the Free Monad
to build up an AST using operations in
Op above is not a functor, hence, we cannot
just take the free monad of
Op. We must construct the
operational monad of
Op, which is also called the Freer monad:
data Freer (op :: Type -> Type) :: Type -> Type where Return :: a -> Freer a Bind :: op a -> (a -> Freer b) -> Freer b
Staged monad then is just a particular instantiation of
type Staged = Freer Op
With that setup, we can now write a first interpreter for
Staged. Let’s warm
up with a trivial interpreter which only interprets returns and binds in the
interpret :: (Monad m) => Staged a -> DirectT m a interpret (Return a) = return a interpret (Bind op cont) = interpretOp op >>= interpret . cont
Constructing the AST
All that is left for us in order to use
Staged interchangeably is
the instance for
MonadBlockChain Staged, which is easy to write:
instance MonadBlockChain Staged where validateTx tx = Bind (ValidateTx tx) Return lkupResource addr = Bind (LkupResource addr) Return
Probabilistic traces for property-based testing
Up until now we can build ASTs with
Staged which represent traces
and can be interpreted with
DirectT to witness their success or failure.
That is good enough for unit testing, but we want to have property-based tests.
GenT transformer, is the key to achieve this, enabling us to
write probabilistic traces:
instance (MonadBlockChain m) => MonadBlockChain (GenT m) where validateTx = lift . validateTx lkupResource = lift . lkupResource
We can rewrite our original example as a polymorphic, property-based
test. The test will exercise the property that for all
x < threshold, if we open a vault for
threshold units then issue
two transactions depositing and arbitrary
threshold - x
units, we must be able to withdraw the funds.
example :: (MonadBlockChain m) => GenT m () example = void $ do threshold <- choose (5000, 15000) vaultId <- validateTx (openCrowdFundingWithLimit threshold) x <- choose (1, threshold - 1) validateTx (fundVault vaultId x) validateTx (fundVault vaultId (threshold - x)) validateTx (withdrawVault vaultId)
example can be thought of as a probability distribution of traces.
runGenT example :: Staged () samples that probability distribution, yielding
one trace at random that picks a particular instantiation of
We tie everything together with a simple
forAllTr combinator which
samples a distribution of traces, then interprets the random trace
and applies a predicate to its result:
forAllTr :: GenT Staged a -> (Either BlockChainErr (a, BlockChainSt) -> Property) -> Property forAllTr pTr prop = forAllBlind (runGenT pTr) $ \tr -> prop (runDirect (interpret tr) st0)
A test that checks that all traces in the distribution
example above succeed can
then be written as:
exampleOk :: Property exampleOk = forAllTr example (property . isRight)
Up to this point we have addressed our first requirement only:
express traces for property-based testing. However, we did it in a way that
will enable us to address the second and third requirements: inform the
user if something goes wrong and modify traces with combinators like
What went wrong? Returning traces
exampleOk test fails at any point, the programmer has no idea of what went wrong.
To improve this, we must change our
interpret function to inspect and record the transactions
being validated. This can be done with a modified version of
interpret :: (Monad m) => Staged a -> DirectT (WriterT [Tx] m) a interpret (Return a) = return a interpret (Bind op cont) = tellIfTx op >> interpretOp op >>= interpret . cont where tellIfTx (ValidateTx tx) = tell [tx] tellIfTx _ = return () forAllTr :: GenT Staged a -> (Either BlockChainErr (a, BlockChainSt) -> Property) -> Property forAllTr pTr prop = forAllBlind (runGenT pTr) $ \tr -> let (res, descr) = runDirect (interpret tr) st0 in counterexample (show descr) $ prop res
DirectT (WriterT [Tx] m) a is isomorphic to
BlockChainSt -> (Either BlockChainErr (a, BlockChainSt), [Tx]),
which gives us the sequence of transactions that yielded the given result, even if the result is
actually an error, which is important.
Note that we could not just write a function
observe :: Staged a -> [Tx],
which extracts all the transactions used in a
Staged trace. This would be
possible if we relied on
Applicative instead of
Monad. Yet, in our case
Monad is necessary because
we may need the result of previous computations in order to construct
the next transaction (think of hashes or signatures, for instance). This happens constantly in practice.
Modifying traces: adding modalities
At this point, we can write probabilistic traces and have a description of what happened if something goes wrong, but we still cannot modify transactions within traces. To accomplish that we will take inspiration in modal logic and define two dual functions:
somewhere :: (Tx -> Maybe Tx) -> Staged a -> Staged a everywhere :: (Tx -> Maybe Tx) -> Staged a -> Staged a
These combinators take a partial function: this way they are able not to affect certain
transactions. Some transformations might only make sense for some transactions.
Additionally, the semantics of
Staged will be enriched. We now need:
interpret :: Staged a -> DirectT (WriterT TraceDescr ) a
which is equivalent to the type
Staged a -> BlockChainSt -> [(Either BlockChainErr (a, BlockChainSt), TraceDescr)].
Staged AST now denotes a set of possible executions in
DirectT monad, each of which has its own description of which
transactions led to that particular outcome.
everywhere f tr combinator maps a transformation
f to every
tx in every outcome of
tr such that
isJust (f tx).
somewhere f tr will fork new outcomes from
tr such that
in each one,
f was applied to exactly one transaction
tx such that
isJust (f tx). To illustrate this, let
a, a', b, b' and
arbitrary, but different, transactions. Now we define a
f :: Tx -> Maybe Tx f a = Just a' f b = Just b' f _ = Nothing
everywhere f (mapM_ validateTx [a,b,c]) should be equivalent to:
\st -> [ runDirect (validateTx a' >> validateTx b' >> validateTx c) st ]
That is, it returns the same set of traces, but with all possible transactions modified
f. Those that
Nothing are left untouched.
somewhere f (mapM_ validateTx [a,b,c]) should be equivalent to:
\st -> [ runDirect (validateTx a' >> validateTx b >> validateTx c) st , runDirect (validateTx a >> validateTx b' >> validateTx c) st ]
The final definition of
interpret gets a little
involved, hence we will omit it here but we invite the interested
reader to go check its source. The central idea in implementing it is
maintaining a set of modalities that are yet to be consumed in the
current branch. For instance, if we are interpreting a trace
somewhere f tr, we collect
Somewhere f as a modality that needs to
be consumed and continue interpreting
tr. Say we are interpreting
Bind (ValidateTx tx) rest with two modalities to be consumed,
[Somewhere f, Everywhere g]. We can either consume the
tx, or later, within
rest. Yet, we must consume
and preserve the
Everywhere g in either case.
The combination of techniques presented in this post address an
important obstacle in writing good test suites over code involving a state monad.
In practice, they enable us to write complex traces for
Plutus contracts, and we can use
everywhere to exhaustively attempt to execute an attack
within some trace.
Having combinators is preferable to generating traces with hardcoded transformations.
The resulting code is easier to understand and we get to write less code.
somewhere f tr doesn’t need to be fixed if something in
Moreover, we only have to maintain
tr, and not a family of variations of
f to its different transactions. In general, it is useful to automatically rule out
certain kinds of bad traces, which can be obtained as modifications over good traces
through the injection of attacks and faults.