inline-js: seamless JavaScript/Haskell interop

9 May 2019 — by Cheng Shao

Tweag.io has a bit of a history with language interop. By this point, we created or collaborated with others in the community on HaskellR, inline-c, inline-java, and now inline-js. The original idea for this style of interop was realized in language-c-inline by Manuel Chakravarty a few years before joining, concurrently to HaskellR. Manuel wrote a blog post about the design principles that underpin all these different libraries. Others in the community have since created similar libraries such as clr-inline, inline-rust and more. In this post, we’ll present our latest contribution to the family: inline-js.

The tagline for inline-js: program Node.js from Haskell.

A quick taste of `inline-js`

Here is a quick demo of calling the Node.js DNS Promises API to resolve a domain:

import Data.Aeson
import GHC.Generics
import Language.JavaScript.Inline

data DNSRecord = DNSRecord
  { address :: String
  , family :: Int
  } deriving (FromJSON, Generic, Show)

dnsLookup :: String -> IO [DNSRecord]
dnsLookup hostname =
  withJSSession
    defJSSessionOpts
    [block|
        const dns = (await import("dns")).promises;
        return dns.lookup($hostname, {all: true});
    |]

To run it in ghci:

*Blog> dnsLookup "tweag.io"
[DNSRecord {address = "104.31.68.163", family = 4},DNSRecord {address = "104.31.69.163", family = 4},DNSRecord {address = "2606:4700:30::681f:44a3", family = 6},DNSRecord {address = "2606:4700:30::681f:45a3", family = 6}]

We can see that the A/AAAA records of tweag.io are returned as Haskell values.

This demo is relatively small, yet already enough to present some important features described below.

The QuasiQuoters

In the example above, we used block to embed a JavaScript snippet. Naturally, two questions arise: what content can be quoted, and what’s the generated expression’s type?

block quotes a series of JavaScript statements, and in-scope Haskell variables can be referred to by prefixing their names with $. Before evaluation, we wrap the code in a JavaScript async function, and this clearly has advantages against evaluating unmodified code:

When different blocks of code share a JSSession, the local bindings in one block don’t pollute the scope of another block. And it’s still possible to add global bindings by explicitly operating on global; these global bindings will persist within the same JSSession.
We can return the result back to Haskell any time we want; otherwise we’ll need to ensure the last executed statement happens to be the result value itself, which can be tricky to get right.
Since it’s an async function, we have await at our disposal, so working with async APIs becomes much more pleasant.

When we call dnsLookup "tweag.io", the constructed JavaScript code looks like this:

;(async $hostname => {
  const dns = (await import("dns")).promises
  return dns.lookup($hostname, { all: true })
})("tweag.io").then(r => JSON.stringify(r))

As we can see, the Haskell variables are serialized and put into the argument list of the async function. Since we’re relying on FromJSON to parse the result in this case, the result of the async function is further mapped with JSON.stringify.

We also provide an expr QuasiQuoter when the quoted code is expected to be a single expression. Under the hood it adds return and reuses the implementation of block, to save a few keystrokes for the user.

Haskell/JavaScript data marshaling

The type of block’s generated expression is JSSession -> IO r, with hidden constraints placed on r. In our example, we’re returning [DNSRecord] which has a FromJSON instance, so that instance is picked up, and on the JavaScript side, JSON.stringify() is called automatically before returning the result back to Haskell. Likewise, since hostname is a String which supports ToJSON, upon calling dnsLookup, hostname is serialized to a JSON to be embedded in the JavaScript code.

For marshaling user-defined types, ToJSON/FromJSON is sufficient. This is quite convenient when binding a JavaScript function, since the ToJSON/FromJSON instances are often free due to Haskell’s amazing generics mechanism. However, there are also a few other useful non-JSON types which are supported here. These non-JSON types are:

The ByteString types in the bytestring package, including strict/lazy/short versions. It’s possible to pass a Haskell ByteString to JavaScript, which shows up as a Buffer. Going in the other direction works too.
The JSVal type which is an opaque reference to a JavaScript value, described in later sections of this post.
The () type (only as a return value), meaning that the JavaScript return value is discarded.

Ensuring the expr/block QuasiQuoters work with both JSON/non-JSON types involves quite a bit of type hackery, so we hide the relevant internal classes and it’s currently not possible for inline-js users to add new such non-JSON types.

Importing modules & managing sessions

When prototyping inline-js, we felt the need to support the importing of modules, either built-in or user-supplied ones. Currently, there are two different import mechanisms coexisting in Node.js: the old CommonJS-style require() and the new ECMAScript native import. It’s quite non-trivial to support both, and we eventually chose to support ECMAScript dynamic import() since it works out-of-the-box on both web and Node, making it more future-proof.

Importing a built-in module is straightforward: import(module_name) returns a Promise which resolves to that module’s namespace object. When we need to import npm-installed modules, we need to specify their location in the settings to initialize JSSession:

import Data.ByteString (ByteString)
import Data.Foldable
import Language.JavaScript.Inline
import System.Directory
import System.IO.Temp
import System.Process

getMagnet :: String -> FilePath -> IO ByteString
getMagnet magnet filename =
  withSystemTempDirectory "" $ \tmpdir -> do
    withCurrentDirectory tmpdir $
      traverse_
        callCommand
        ["npm init --yes", "npm install --save --save-exact [email protected]"]
    withJSSession
      defJSSessionOpts {nodeWorkDir = Just tmpdir}
      [block|
        const WebTorrent = (await import("webtorrent")).default,
          client = new WebTorrent();

        return new Promise((resolve, reject) =>
          client.add($magnet, torrent =>
            torrent.files
              .find(file => file.name === $filename)
              .getBuffer((err, buf) => (err ? reject(err) : resolve(buf)))
          )
        );
    |]

Here, we rely on the webtorrent npm package to implement a simple BitTorrent client function getMagnet, which fetches the file content based on a magnet URI and a filename. First, we allocate a temporary directory and run npm install in it; then we supply the directory path in the nodeWorkDir field of session config, so inline-js knows where node_modules is. And finally, we use the webtorrent API to perform downloading, returning the result as a Haskell ByteString.

Naturally, running npm install for every single getMagnet call doesn’t sound like a good idea. In a real world Haskell application which calls npm-installed modules with inline-js, the required modules shall be installed by the package build process, e.g. by using Cabal hooks to install to the package’s data directory, and getMagnet can use the data directory as the working directory of Node.

Now, it’s clear that all code created by the QuasiQuoters in inline-js requires a JSSession state, which can be created by newJSSession or withJSSession. There are a couple of config fields available, which allows one to specify the working directory of Node, pass extra arguments or redirect back the Node process standard error output.

How it works

Interacting with Node from Haskell

There are multiple possible methods to interact with Node in other applications, including in particular:

Whenever we evaluate some code, start a Node process to run it, and fetch the result either via standard output or a temporary file; persistent Node state can be serialized via structural cloning. This is the easiest way but also has the highest overhead.
Use pipes/sockets for IPC, with inline-js starting a script to get the code, perform evaluation and return results, reusing the same Node process throughout the session. This requires more work and has less overhead than calling Node for each call.
Use the Node.js N-API to build a native addon, and whatever Haskell application relying on inline-js gets linked with the addon, moving the program entry point to the Node side. We have ABI stability with N-API, and building a native addon is surely less troublesome than building the whole Node stack. Although the IPC overhead is spared, this complicates the Haskell build process.
Try to link with Node either as a static or dynamic library, then directly call internal functions. Given that the build system of Node and V8 is a large beast, we thought it would take a considerable amount of effort; even if it’s known to work for a specific revision of Node, there’s no guarantee later revisions won’t break it.

The current implementation uses the second method listed above. inline-js starts an “eval server” which passes binary messages between Node and the host Haskell process via a pair of pipes. At the cost of a bit of IPC-related overhead, we make inline-js capable of working with multiple installations of Node without recompiling. The schema of binary messages and implementation of “eval server” is hidden from users and thus can evolve without breaking the exposed API of inline-js.

The “eval server”

The JavaScript specification provides the eval() function, allowing a dynamically constructed code string to be run anywhere. However, it’s better to use the built-in vm module of Node.js, since it’s possible to supply a custom global object where JavaScript evaluation happens, so we can prevent the eval server’s declarations leaking into the global scope of the evaluated code, while still being able to add custom classes or useful functions to the eval server.

Once started, the eval server accepts binary requests from the host Haskell process and returns responses. Upon an “eval request” containing a piece of UTF-8 encoded JavaScript code, it first evaluates the code, expecting a Promise to be returned. When the Promise resolves with a final result, the result is serialized and returned. Given the asynchronous nature of this pipeline, it’s perfectly possible for the Haskell process to dispatch a batch of eval requests, and the eval server to process them concurrently, therefore we also export a set of “async” APIs in Language.JavaScript.Inline which decouples sending requests and fetching responses.

On the Haskell side, we use STM to implement send/receive queues, and they are accompanied by threads which perform the actual sending/receiving. All user-facing interfaces either enqueue a request or try to fetch the corresponding response from a TVar, blocked if the response is not ready yet. In this way, we make almost all exposed interfaces of inline-js thread-safe.

Marshaling data based on types

Typically, the JavaScript code sent to the eval server is generated by the QuasiQuoter’s returned code, potentially including some serialized Haskell variables in the code, and the raw binary data included in the eval response is deserialized into a Haskell value. So how are the Haskell variables recognized in quoted code, and how does the Haskell/JavaScript marshaling take place?

To recognize Haskell variables, it’s possible to simply use a simple regex to parse whatever token starting with $ and assume it’s a captured Haskell variables, yet this introduces a lot of false positives, e.g. "$not_var", where $not_var is actually in a string. So in the QuasiQuoters of inline-js, we perform JavaScript lexical analysis on quoted code, borrowing the lexer in language-javascript. After the Haskell variables are found, the QuasiQuoters generate a Haskell expression including them as free variables, and at runtime, they can be serialized as parts of the quoted JavaScript code.

To perform type-based marshaling between Haskell and JavaScript data, the simplest thing to do is solely relying on aeson’s FromJSON/ToJSON classes. All captured variables should have a ToJSON instance, serialized to JSON which is also a valid piece of ECMAScript, and whatever returned value should also have a FromJSON instance. However, there are annoying exceptions which aren’t appropriate to recover from FromJSON/ToJSON instances.

One such type is ByteString. It’s very important to be able to support Haskell ByteString variables and expect them to convert to Buffer on the Node side (or vice versa). Unfortunately, the JSON spec doesn’t have a special variant for raw binary data. While there are other cross-language serialization schemes (e.g. CBOR) that support it, they introduce heavy npm dependencies to the eval server. Therefore, a reasonable choice is: expect inline-js users to solely rely on FromJSON/ToJSON for their custom types, while also supporting a few special types which have different serialization logic.

Therefore, we have a pair of internal classes for this purpose: ToJSCode and FromEvalResult. All ToJSON instances are also ToJSCode instances, while for ByteString, we encode it with base64 and generate an expression which recovers a Buffer and is safe to embed in any JavaScript code. The FromEvalResult class contains two functions: one to generate a “post-processing” JavaScript function that encodes the result to binary on the Node side, another to deserialize from binary on the Haskell side. For the instances derived from FromJSON, the “post-processing” code is r => JSON.stringify(r), and for ByteString it’s simply r => r.

To keep the public API simple, ToJSCode and FromEvalResult are not exposed, and although type inference is quite fragile for QuasiQuoter output, everything works well as long as the relevant variables and return values have explicit type annotations.

Passing references to arbitrary JavaScript values

It’s also possible to pass opaque references to arbitrary JavaScript values between Haskell and Node. On the Haskell side, we have a JSVal type to represent such references, and when the returned value’s type is annotated to be a JSVal, on the Node side, we allocate a JSVal table slot for the result and pass the table index back. JSVal can also be included in quoted JavaScript code, and they convert to JavaScript expressions which fetch the indexed value.

Exporting Haskell functions to the JavaScript world

Finally, here’s another important feature worth noting: inline-js supports a limited form of exporting Haskell functions to the JavaScript world! For functions of type [ByteString] -> IO ByteString, we can use exportHSFunc to get the JSVal corresponding to a JavaScript wrapper function which calls this Haskell function. When the wrapper function is called, it expects all parameters to be convertible to Buffer, then sends a request back to the Haskell process. The regular response-processor Haskell thread has special logic to handle them; it fetches the indexed Haskell function, calls it with the serialized JavaScript parameters in a forked thread, then the result is sent back to the Node side. The wrapper function is async and returns a Promise which resolves once the expected response is received from the Haskell side. Due to the async nature of message processing on both the Node and Haskell side, it’s even possible for an exported Haskell function to call into Node again, and it also works the other way.

Normally, the JavaScript wrapper function is async, and async functions work nicely for most cases. There are corner cases where we need the JavaScript function to be synchronous, blocking when the Haskell response is not ready and returning the result without firing a callback. One such example is WebAssembly imports: the JavaScript embedding spec of WebAssembly doesn’t allow async functions to be used as imports since this involves the “suspending” and “resuming” of WebAssembly instance state, which might be not economical to implement in today’s JavaScript engines. Therefore, we also provide exportSyncHSFunc which makes a synchronous wrapper function to be used in such scenarios. Since it involves completely locking up the main thread in Node with Atomics, this is an extremely heavy hammer and should be used with much caution. We also lose reentrancy with this “sync mode”; when the exported Haskell function calls back into Node, the relevant request will be forever stuck in the message queue, freezing both the Haskell/Node process.

Summary

We’ve presented how inline-js allows JavaScript code to be used directly from Haskell, and explained several key aspects of inline-js internals. The core ideas are quite simple, and the potential use cases are potentially endless, given the enormous ecosystem the Node.js community has accumulated over the past few years. Even for development tasks that are not specifically tied to Node.js, it is still nice to have the ability to easily call relevant JavaScript libraries, to accelerate prototyping in Haskell and to compare correctness/performance of Haskell/JavaScript implementations.

There are still potential improvements to make, e.g. implementing type-based exporting of Haskell functions. But we decided that now is a good time to announce the framework and collect some first-hand user experience, spot more bugs and hear user opinions on how it can be improved. When we get enough confidence from the feedback of seed users, we can prepare an initial Hackage release. Please spread the word, make actual stuff with inline-js and tell us what you think :)

About the author

Cheng Shao

Cheng is a Software Engineer who specializes in the implementation of functional programming languages. He is the project lead and main developer of Tweag's Haskell-to-WebAssembly compiler project codenamed Asterius. He also maintains other Haskell projects and makes contributions to GHC(Glasgow Haskell Compiler). Outside of work, Cheng spends his time exploring Paris and watching anime.

If you enjoyed this article, you might be interested in joining the Tweag team.

This article is licensed under a Creative Commons Attribution 4.0 International license.

← The Sneakernet: Towards A Much Faster Internet Ormolu: Format Haskell code like never before →