In an earlier blog post, we discussed how streaming libraries help writing composable programs without lazy I/O. We showed, for a simple program, how using a streaming library helps reducing the amount of unaided bookkeeping that the programmer needs to do in order to make it correct. In this post we delve further in that direction by considering linear types and uncover their potential to have the compiler enforce more properties statically.
Streaming libraries provide abstractions to manipulate sequences of values which may not reside fully in memory, but which may be produced progressively as they are consumed. This ability is essential to write programs which run in bounded space despite of handling inputs of any size (so-called streaming programs).
Streaming libraries fall short of giving you an airtight guarantee that programs are streaming or correct. There are side conditions that are required for proper use. For instance, for the streaming package we have:
- no length-unbounded stream must be fed to any function that loads it completely into memory like SB.toStrict,
- the resources from where streams are read or written to need to be closed “soon enough” to prevent open handles from accumulating,
- no stream shall be used after the resources it depends upon have been released,
- and no stream shall be used a second time.
The last condition is necessary because otherwise the effects producing the contents of the stream would be replayed yielding an undefined outcome. This is left implicit in the documentation of most streaming libraries. Suppose we use fromHandle:
-- | Read Strings from a Handle using hGetLine. Terminates on end of input. fromHandle :: MonadIO m => Handle -> Stream (Of String) m ()
In the following code,
s is regarded as a
Stream producing the strings
"c" as they are read from the file
We want to accumulate the stream elements in two lists
and compare them, expecting them to be the same. However, the two
toList_ s yield different results actually, because
the meaning we have given to
s is reasonable only while
s hasn’t been evaluated before.
In this case, extracting elements from
s causes the source handle
to read a file. The first evaluation of
toList_ s reads the whole
file. When we reach the second evaluation of
toList_ s, the
source handle has reached the end of the file, and we get the empty
writeFile "stream.txt" "a\nb\nc\n" h <- openFile "stream.txt" ReadMode s <- fromHandle h xs1 <- Streaming.Prelude.toList_ s xs2 <- Streaming.Prelude.toList_ s return (xs1 == xs2)
This is not a problem of streaming libraries exclusively. If using lazy
ByteStrings, passing the input to another
function would mean that its contents might be retained until all
functions are evaluated, making the program non-streaming.
Arguably, we could refine the meaning of
s so we say instead that
it is some sort of state machine that reads lines from a file. And
then the functions we write on streams need to document in which state
the streams are left when they return. But in practice, the norm is to
regard a stream reference as invalid once it has been inspected, and if
one element has been extracted, processing should continue on the tail
of the stream. Could the compiler help ensuring that we don’t use
invalid stream references?
With linear types we introduce multiplicity of values; not only does the type system track what sort of values we have, but also to which extent they are used. A linear argument has to be used — or consumed — exactly once in a function, or the program will not compile. Both forgetting to use a linear value or using it more than once will lead to a compile time error. This simple concept helps us move a new category of programming errors from the responsibility of the programmer to the compiler, making it easier to write correct code with stronger guarantees about the implementation.
Linearity is a contract with the implementer of a function — never the caller. This means that even if a function has linearity constraints it can always be called without knowledge of this, with linear or unconstrained types of arguments. The converse is of course not true: if a function type does not guarantee linearity in an argument, it is an error to pass it a linear value as such because it might be duplicated or discarded in the function.
There are a few aspects where linear types can help managing streams. Firstly, ensuring that the effects of a stream are performed at most once. Secondly, ensuring that linear values contained in the stream are eventually used. And finally, they can help ensuring that no stream is used after the resources it depends upon have been released.
The type of streams is kept as
Stream f m r
were the type parameters
have almost the same meaning as for unrestricted streams. Most functions
in the new interface expect
m to be an instance of LMonad and
to be an instance of LFunctor.
LMonad we can introduce new streams as linear values in our
LMonad is a class offering methods
>>= similar to those of
Monad class, but where arguments have been changed to have a
class LMonad m where return :: a ->. m a (>>=) :: m a ->. (a ->. m b) ->. m b
a ->. b stands for a linear function.
For streams, the type of
>>= enforces that a stream reference is
used linearly in the continuation. The consequence is that when
we use a stream reference more than once, like we use
s in the
example above, this breaks the linearity constraints from
>>=, and the
error is caught at compile time!
Moreover, the handle
h can’t be closed before invoking
fromHandle or the
typechecker would notice that the handle is used more than once.
Linear vs affine streams
Linear types ensure both that the effects are not duplicated and the resources are properly freed. Do we really need the latter? After all, in the previous discussion, we are only concerned with our stream references not being used more than once. It turns out that sometimes we want to carry linear values inside streams, and we can only achieve this if the stream references are linear.
Suppose we are writing a program which is written both in Java and
Haskell. Let us suppose further that, on the Java side, we have an iterator
java.util.Iterator<Object>. To use it on the Haskell side, we
would like to marshal it to Haskell as a stream of type
Stream (Of JObject) m (), where
m is a linear monad and
the Haskell type of references to Java objects.
As we explain in
an earlier post,
we want to treat Java references linearly to make sure that they are
promptly deleted when they are no longer used, hence the requirement
to have the parameter
m be a linear monad.
For a concrete example, consider a function that modifies the values produced by a Java iterator. The Java iterator is first marshaled to a Haskell stream, then we apply a function to each element in the stream, and finally we marshal the stream back to another Java iterator.
-- IOL is a linear version of the IO monad. instance LMonad IOL where ... -- JIterator stands for a reference to an iterator on the Java side. mapIterator :: (JObject ->. JObject) -> JIterator ->. IOL JIterator mapIterator f jiterator = iteratorToStream jiterator >>= streamToIterator . linearMap f where iteratorToStream :: JIterator ->. Stream (Of JObject) IOL () streamToIterator :: Stream (Of JObject) IOL () ->. IOL JIterator linearMap :: LMonad m => (a ->. b) -> Stream (Of a) m r ->. Stream (Of b) m r
Thanks to linear types, the compiler can check that the
reference is deleted (inside
iteratorToStream). It can furthermore check
that the references produced by the intermediate streams are
deleted as the resulting iterator is consumed in the Java side.
There is a price to pay for requiring stream references
to be linear. Some operations which are trivial to offer in an
unrestricted setting are not possible to implement anymore.
Consider the function
take, for instance.
take :: LMonad m => Stream (Of a) m r -> Stream (Of a) m ()
If we had a stream
s which produces a sequence of linear values, the
take 0 s would cause all of the values to be dropped, which
the compiler would not stand.
We have shown how linear types can prevent some forms of mistakes when writing streaming programs. This translates in fewer side conditions for the programmer to check, since these can be discharged by the type checker. Moreover, linear streams can carry linear values which allows using them in combination with other resources that need to be treated linearly.
A few questions remain open. For instance, it has to be seen if it is practical to have a single implementation of streams that can be used as linear, affine or unrestricted depending on the context. Otherwise we might end up with three similar implementations where we would prefer to avoid the code duplication. Another question is identifying good use cases for which affine streams would be a good fit but not linear streams.
At this time, the GHC proposal for linear types is still under discussion and a complete implementation of linear streams still has a little way to go before becoming a reality. For those interested, there is a prototype of GHC which implements linear types. Related to linear streams, there was a Summer of Haskell project where Edvard Hübinette worked with this prototype and the streaming package. Several interesting findings are documented in the tech blog.