My goal today is to convince you that destination-passing style is neat, actually. And that linear types make destination-passing purely functional. But first, I must answer a question.
If you’ve ever programmed in C, C++, or Fortran, you are sure to have
encountered the style of programming which sometimes goes by the name
destination-passing style. It is the practice of writing, e.g. an
array-producing functions as, instead, taking an empty array as an
extra argument and filling it. Consider, for example, the C
char* strcpy ( char* destination, const char* source );
It copies the string in
source to the array
destination (it also
destination when it’s done).
The name “destination-passing style” itself seems to be more common in the functional programming language compilation literature, however. C programmers don’t appear to have a name for it. So it is likely that you have never encountered it.
Why, indeed, care about destination-passing? It does let you ask a new question: “whose responsibility is it to allocate the array?“. If I were to write an array copy in Haskell, it would have type
copyArray :: Array a -> Array a
And there is no way around
copyArray allocating an array itself. The
question doesn’t even exist. With
strcpy, I can either choose to
allocate an array, and pass it immediately to
strcpy, or, I can
delegate the allocation of the array to someone else.
But, once I can ask this question, what can I do with it? I can compose it! Let’s imagine that we have a function to split an array in two
splitArray :: Array a -> (Array a, Array a)
Now consider the following (admittedly not especially useful) function:
copyArray2 :: Array a -> Array a copyArray2 a = case splitArray a of (al, ar) -> copyArray al <> copyArray ar
When the question doesn’t exist, each call to
copyArray has, no matter what,
to allocate an array, which is then copied into a new array. It means
that we are making a superfluous copy of our original array,
only to discard it immediately. This is quite wasteful.
Often, you can, indeed, rely on array fusion to avoid too egregious a behaviour. Array fusion, such as implemented in the excellent vector library will eliminate a ton of intermediate allocations.
However, fusion is unreliable. Sometimes, a simple refactoring will push a function’s size beyond what GHC is willing to inline, and it will break an entire fusion pipeline. Most of the time, this is fine, but not when you are dependent on fusion happening. And if you need GHC to produce code without allocations, why not write your program directly as you want it, rather than try and coax the compiler into hopefully eliminating the allocations for you.
This has been a guiding principle in the development of the linear types project: compiler optimisations are great, as you don’t need to think about a lot of things; until you do, and you find yourself second-guessing the optimiser. When that happens, we want linear types to empower you to write the code that you mean, without sacrificing Haskell’s type safety.
Besides, in the article about F̃, a restricted array-based functional language which compiles to very efficient code, the authors find significant performance gains for using destination-passing on top of an array fusion optimisation. They only use destination-passing in the optimiser, though, not as a language feature.
Finally, fusion doesn’t always work. Suppose I rewrite my
function to use threads to better utilise my multicore architecture
copyArray3 :: Array a -> IO (Array a) copyArray3 = case splitArray a of (al, ar) -> do (bl, br) <- concurrently (evaluate $ copyArray al) (evaluate $ copyArray ar) return $ bl <> br
This is beyond a fusion framework ability to optimise. Or maybe I want to copy my array into a memory mapped buffer. The point is: fusion will do a lot for you, just not everything.
The obvious way to encode destination-passing style, in Haskell, is to
move all our computation to
ST, so that
copyArray would be
copyArray :: STArray s a -> STArray s a -> ST ()
But it’s not very congruent with how functional programmers write their programs. It does lift all of the above limitations, at the price of adding state everywhere, which is an entire error-inducing surface that functional programming usually avoids.
It’s a huge price to pay, and that’s why the vector library is not structured like this. It does feature mutable arrays, but immutable arrays are very much encouraged.
This is where linear types help. Indeed, let’s take a step back and ask: what makes a destination impure to begin with?
- If I read out a cell, then write to it, then read it again: I’ll see a different result the second time.
- If I write to the same cell twice, the writes need to be ordered, otherwise the result would be non-deterministic.
- Reading a cell which has not been initialised is non-deterministic
(though in most case, we can salvage this by initialising every cell
All of these behaviours are prohibited in pure code. But we could avoid all the prohibited behaviours if we could make sure that each cell is written to exactly once before being read. Aha! Exactly once, this is the sort of thing that linear types are good at! Ok, so let’s try again:
copyArray :: Array a -> DArray a ⊸ ()
This means that
copyArray is a pure function which uses its destination
(in its entirety) exactly once. We only need to make sure that there
is only ever a unique pointer to a destination array, which we do with
alloc :: Int -> (DArray a ⊸ ()) ⊸ Array a
A destination is allocated for the scope of the linear function. At
the end of the function, we know that the destination has been fully
filled, and so we get an array out. From this destination-passing
copyArray, by the way, it is easy to retrieve the
direct style variant:
copyArray' :: Array a -> Array a copyArray' a = alloc (length a) (\d -> copyArray a d)
The reverse, as I’ve been arguing throughout this post, is very much not true. So the destination-passing function is the more fundamental one.
Now, to be able to implement
copyArray2, we need a function which
splitDArray :: DArray a ⊸ (DArray a, DArray a)
Then, it is just a matter of following the types (the curious-looking
& \case construction is due to a limitation of the current
implementation of linear types in GHC, see here)
copyArray2 :: Array a -> DArray a ⊸ () copyArray2 a d = case splitArray a of (al, ar) -> splitDArray d & \case (dl, dr) -> copyArray al dl `lseq` copyArray ar dr
Voilà! No superfluous allocation. Not because of the optimiser, but because of the semantics of my program: it doesn’t allocate an array anywhere.
You’ll find a more complete destination array interface in the
Data.Array.Destination module of linear-base.
One of the features of linear types, is that they often allow to expose as pure interfaces objects which appear to be intrinsically impure. But I want to argue that, in the case of destinations, we’ve actually done more than this: we’ve made the interface better than the impure interface. Not because pure interfaces are better than impure interfaces (though it’s a defensible position), but because the linear destination interface is a more faithful representation of what destinations mean.
There is no longer confusion about what is an input and what is an
output: inputs are
Array, and outputs are
DArray. Destinations are
there solely for output, they can’t be used as a temporary store of
data. And the types ensure that they are fully filled, and that we
don’t accidentally overwrite an output, by the time the destination is
read back as an array.
And this is pretty neat.
If you want to go a bit deeper into this particular brand of weed, let me leave you with a handful of comments which you can take either as closing this blog post, or opening new avenues.
allocfunction takes a destination-consuming function as an argument, instead of returning a destination directly. This style is common in Linear Haskell, as a means to enforce uniqueness. It is sometimes seen as a limitation of Linear Haskell’s design. However in this particular case, the function is necessary to delimit the scope of the destination. In fact, the
allocfunction is virtually identical to that of the F̃ article, where there is no linear typing whatsoever.
Affine types (affine arguments are consumed at most once, rather than exactly once for linear arguments) are sometimes preferable to linear types. For instance affine types appear to represent streaming computations better. But in the case of destinations we really do want linear types: it wouldn’t make as much sense to return from
allocwith a partially-filled destination.
When using linear types to make a pure interface to array functions which, in fact, mutate an array for efficiency (like in this module of linear base), we lose the ability to alias the mutable array in exchange for purity. Sometimes it’s a perfectly acceptable trade-off, but some algorithms depend on sharing mutation for efficiency, these are not available with linear pure mutable arrays. We are not making such a trade-off for destinations: linear destinations, being pure output, are, arguably, a more faithful interface for destination-passing style than mutable array.
Have you noticed how in the destination-passing
copyArray2, the call to array concatenation from the direct-style implementation has been replaced by a call to
splitDArray? And, if you have, have you also noticed the symmetry between these two functions?
uncurry (<>) :: (Array a, Array a) -> Array a splitDArray :: Darray a ⊸ (DArray a, DArray a)
This is not a coincidence. There is a sort of duality between destinations and constructors. This duality happens when writing destinations for other types than array types as well. I spoke of this phenomenon at Haskell Exchange 2017.
If I define
type PushArray a = DArray a ⊸ (), then the type of
copyArray :: Array a -> DArray a ⊸ ()
can be written
copyArray :: Array a -> PushArray a
We can give
PushArrays a (restricted) array interface, then we don’t even need to abandon direct style to benefit from destinations. This is part of the Data.Array.Polarized framework in linear-base.
In a previous blog post, I had written about how linear types made it possible to manipulate compact data representation directly. The
Needtype in that blog post is, in fact, a form of destination.