Threads and messages with Rust and WebAssembly

24 November 2022 — by Joe Neeman

On most systems, you can implement concurrency using either threads or processes, where the main difference between the two is that threads share memory and processes don’t. Modern web browsers support concurrency through the Web Workers API. Although Web Workers are by default closer to a multi-process model, when used with WebAssembly you can opt in to a more thread-like experience. Just like in systems programming, the choice of threads vs. processes comes with various trade-offs and performance implications; I’ll be covering some of them in this post. These examples will be in Rust, but similar trade-offs should apply to other languages compiled to WASM.

The Web Workers API (multi-processing on the web)

When used from JavaScript, the Web Workers API is very simple: call new Worker("/path/to/worker.js") and your browser will fetch worker.js and start running it concurrently. Inter-worker communication works in a very JavaScripty way, by setting message handler callbacks and then sending messages. To use Web Workers from compiled WASM code, you’ll need to go “through” JavaScript: you need a little JavaScript glue for spawning the worker, and you need to do the message sending and callback handling using some JavaScript bindings. Here’s a little example that spawns a worker, sends a message, and gets a reply:

use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;

// Spawn a worker and communicate with it.
fn spawn_worker() {
  let worker = web_sys::Worker::new("./worker.js").expect("failed to spawn worker");
  let callback = Closure::<dyn FnMut(web_sys::MessageEvent)>::new(|msg: web_sys::MessageEvent| {
    assert_eq!(msg.data().as_f64(), Some(2.0));
  });
  // Set up a callback to be invoked whenever we receive a message from the worker.
  // .as_ref().unchecked_ref() turns a wasm_bindgen::Closure into a &js_sys::Function
  worker.set_onmessage(Some(callback.as_ref().unchecked_ref()));

  // Send a message to the worker.
  worker.post_message(&JsValue::from(1.0)).expect("failed to post");

  // Did you notice that `set_onmessage` took a borrow? We still own `callback`, and we'd
  // better not free it too soon! See also
  // https://rustwasm.github.io/wasm-bindgen/reference/weak-references.html
  std::mem::forget(callback); // FIXME: memory management is hard
}

// An entry point for the JavaScript worker to call back into WASM.
#[wasm_bindgen]
pub fn worker_entry_point(arg: i32) {
  // Add 1 to our argument and send it back to the main thread.
  // Yeah, the js_sys/web_sys bindings are ... low-level.
  js_sys::global()
    .dyn_into::<web_sys::DedicatedWorkerGlobalScope>()
    .unwrap()
    .post_message(&JsValue::from(arg + 1))
    .unwrap();
}

And here’s the JavaScript glue code in worker.js, which receives messages and calls worker_entry_point:

importScripts("./path/to/wasm_bindgen/module.js")
self.onmessage = async event => {
  const { worker_entry_point } = await wasm_bindgen(
    "./path/to/wasm_bindgen/module_bg.wasm"
  )
  worker_entry_point(Number(event.data))
}

Note that when using the Web Workers API, all of the messages you send are JsValues. This is fine for sending primitive types, but it becomes annoying if you want to send structured types, which must be converted into JsValues and back. You can simplify this process by using a helper crate like gloo-worker, which provides a convenient way to send structured data between workers. Under the hood, it serializes and deserializes data to and from a js_sys::ArrayBuffer.
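
To make the "serialize to a buffer" idea concrete, here is a minimal sketch that uses only the standard library. The `Job` type and its byte layout are hypothetical, invented for illustration; this is not the wire format that gloo-worker uses, and real code would more likely lean on a serialization crate. The resulting bytes would still need to be copied into a JavaScript buffer before being posted.

```rust
use std::convert::TryInto;

/// Hypothetical message type, used only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: u32,
    pub scale: f64,
    pub name: String,
}

impl Job {
    /// Encode as: id (4 bytes, little-endian), scale (8 bytes, little-endian),
    /// followed by the UTF-8 bytes of the name.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(12 + self.name.len());
        out.extend_from_slice(&self.id.to_le_bytes());
        out.extend_from_slice(&self.scale.to_le_bytes());
        out.extend_from_slice(self.name.as_bytes());
        out
    }

    /// Decode the layout written by `to_bytes`; returns `None` on malformed input.
    pub fn from_bytes(bytes: &[u8]) -> Option<Job> {
        if bytes.len() < 12 {
            return None;
        }
        let id = u32::from_le_bytes(bytes[0..4].try_into().ok()?);
        let scale = f64::from_le_bytes(bytes[4..12].try_into().ok()?);
        let name = String::from_utf8(bytes[12..].to_vec()).ok()?;
        Some(Job { id, scale, name })
    }
}
```

The sending side would call `to_bytes` and the receiving side `from_bytes`, so both workers only ever exchange a flat byte buffer.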

Dealing with large data can also be tricky, because post_message requires that you copy the data. To avoid large data copies, you can use a SharedArrayBuffer (a buffer that can be accessed by multiple workers) or the post_message_with_transfer function, which allows for transferring the ownership of certain JavaScript objects from one worker to another without copying. The downside of this workaround is that it doesn’t work directly with objects living in WASM memory. For example, if you have a Vec<u8> that you want to send to another worker, you’ll need to either copy it to an ArrayBuffer and transfer it, or copy it to a SharedArrayBuffer and share it.

Shared memory in WebAssembly (multi-threading on the web)

Workers that share an address space can communicate with less boilerplate and minimal data-copying. To create shared memory workers, note that wasm_bindgen’s auto-generated initialization function takes a second (optional) parameter: a WASM memory object for the module to use. Memory chunks can be shared between WASM modules, so we can instantiate a new module using the same memory as the first one, and the two modules will share it.

Having two WASM workers sharing the same memory opens the door to more expressive inter-worker communication. For example, we can easily write a function for executing a closure in another worker, just like how the std::thread::spawn function works. The trick is to create a closure and send its address to the other worker. Since the memory space is shared, the receiving worker can cast that address back into a closure and execute it.

// A function imitating `std::thread::spawn`.
pub fn spawn(f: impl FnOnce() + Send + 'static) -> Result<web_sys::Worker, JsValue> {
  let worker = web_sys::Worker::new("./worker.js")?;
  // Double-boxing because `dyn FnOnce` is unsized and so `Box<dyn FnOnce()>` is a fat pointer.
  // But `Box<Box<dyn FnOnce()>>` is just a plain pointer, and since wasm has 32-bit pointers,
  // we can cast it to a `u32` and back.
  let ptr = Box::into_raw(Box::new(Box::new(f) as Box<dyn FnOnce()>));
  let msg = js_sys::Array::new();
  // Send the worker a reference to our memory chunk, so it can initialize a wasm module
  // using the same memory.
  msg.push(&wasm_bindgen::memory());
  // Also send the worker the address of the closure we want to execute.
  msg.push(&JsValue::from(ptr as u32));
  worker.post_message(&msg)?;
  Ok(worker)
}

#[wasm_bindgen]
// This function is here for `worker.js` to call.
pub fn worker_entry_point(addr: u32) {
  // Interpret the address we were given as a pointer to a closure to call.
  let closure = unsafe { Box::from_raw(addr as *mut Box<dyn FnOnce()>) };
  (*closure)();
}
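
The pointer round trip can be tried out natively, without a browser. The sketch below is a stand-in for the worker code above, not a replacement for it: the helper names `into_addr`, `run_addr` and `pointer_widths` are made up for this example, and it uses `usize` instead of `u32` because native pointers are usually 64 bits wide.

```rust
use std::mem::size_of;

type Task = Box<dyn FnOnce() + Send>;

/// Leak a double-boxed closure and return its address as a plain integer.
pub fn into_addr(f: impl FnOnce() + Send + 'static) -> usize {
    let task: Task = Box::new(f);
    Box::into_raw(Box::new(task)) as usize
}

/// Rebuild the closure from an address produced by `into_addr` and run it.
///
/// # Safety
/// `addr` must come from `into_addr` and must be passed here at most once.
pub unsafe fn run_addr(addr: usize) {
    let task: Box<Task> = unsafe { Box::from_raw(addr as *mut Task) };
    (*task)();
}

/// Sizes in bytes of a single-boxed and a double-boxed closure pointer.
pub fn pointer_widths() -> (usize, usize) {
    (
        size_of::<Box<dyn FnOnce()>>(),
        size_of::<Box<Box<dyn FnOnce()>>>(),
    )
}
```

The single box is two words wide (data pointer plus vtable pointer), which is why it cannot be squeezed into one integer, while the outer box is one word wide and survives the cast.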

The JavaScript worker glue must be changed slightly, to use the received memory chunk when initializing its WASM module.

importScripts("./path/to/wasm_bindgen/module.js")
self.onmessage = async event => {
  // event.data[0] should be the Memory object, and event.data[1] is the value to pass into worker_entry_point
  const { worker_entry_point } = await wasm_bindgen(
    "./path/to/wasm_bindgen/module_bg.wasm",
    event.data[0]
  )
  worker_entry_point(Number(event.data[1]))
}

And now we can spawn closures on another thread just like in native multi-threaded code, using the spawn function above instead of std::thread::spawn. You can even use Rust’s native inter-thread communication tools, like std::sync::mpsc, to transfer data between threads without copying! Our first worker example becomes as simple as:

let (to_worker, from_main) = std::sync::mpsc::channel();
let (to_main, from_worker) = std::sync::mpsc::channel();
spawn(move || { to_main.send(from_main.recv().unwrap() + 1.0).unwrap(); });
to_worker.send(1.0).unwrap();
assert_eq!(from_worker.recv().unwrap(), 2.0);
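
To see what "without copying" means here, the following sketch sends a `Vec<u8>` through a channel and checks that the heap allocation keeps its address on the receiving side. It uses native `std::thread::spawn` as a stand-in for the `spawn` function above, so it runs outside the browser; the function name `buffer_keeps_address` is made up for this example.

```rust
use std::sync::mpsc;
use std::thread;

/// Send a buffer of `len` bytes to another thread and report whether the
/// receiver sees the same heap allocation (same address, same length).
pub fn buffer_keeps_address(len: usize) -> bool {
    let buf = vec![0u8; len];
    let before = buf.as_ptr() as usize;

    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let worker = thread::spawn(move || {
        let received = rx.recv().expect("sender hung up");
        (received.as_ptr() as usize, received.len())
    });

    // Sending moves the Vec's pointer, length and capacity; the bytes stay put.
    tx.send(buf).expect("worker hung up");
    let (after, received_len) = worker.join().expect("worker panicked");
    before == after && received_len == len
}
```

Only the three-word `Vec` header travels through the channel, which is consistent with the channel timings being independent of buffer size.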

Ok, there are some caveats. Shared memory WASM modules need some features that weren’t in the first iteration of the WASM spec, so you’ll need to build with some extra target features. You’ll also need to rebuild the standard library with those features, which requires a nightly compiler and unstable flags. Something like this will do the trick:

RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals" cargo build --target=wasm32-unknown-unknown --release -Z build-std=panic_abort,std

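And then you’ll need to configure your web server with some special headers (Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp), because shared WASM memory builds on SharedArrayBuffer, which browsers only expose to cross-origin isolated pages.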

But there’s a more serious issue with shared-memory workers: our example called from_worker.recv() in the main thread, and most browsers will throw an exception if you try to block the main thread, even for a very short time. Since Rust doesn’t have any tooling for checking non-blockingness (see here for some discussion), this might be difficult to ensure.
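
One piece of that discipline is to poll instead of waiting. The sketch below wraps `try_recv`, which is documented never to block the caller, in a small helper; the `Reply` enum and `poll_reply` function are names invented for this example. In a browser you would call something like this from a timer or animation-frame callback rather than in a loop, and whether every code path inside the channel is acceptable on the main thread of your target is an assumption worth testing.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

/// Outcome of a single non-blocking check of the channel.
#[derive(Debug, PartialEq)]
pub enum Reply<T> {
    /// A message was waiting and has been taken off the channel.
    Ready(T),
    /// Nothing has arrived yet; check again later.
    Pending,
    /// Every sender is gone, so no message will ever arrive.
    Closed,
}

/// Check the channel once without waiting for a message.
pub fn poll_reply<T>(rx: &Receiver<T>) -> Reply<T> {
    match rx.try_recv() {
        Ok(value) => Reply::Ready(value),
        Err(TryRecvError::Empty) => Reply::Pending,
        Err(TryRecvError::Disconnected) => Reply::Closed,
    }
}
```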

If the extra discipline is just too onerous or unreliable, you can guarantee a non-blocked main thread by moving all shared-memory WASM modules off of it: from the main thread, use the JavaScript message-passing methods to communicate with one or more workers, which are free to communicate amongst each other using whichever (possibly blocking) methods they want.

How much does all of this actually matter?

To measure the performance implications of the various options, I made some buffers and sent them back and forth repeatedly between workers while measuring the round-trip time. I repeated the experiment with two different buffer sizes (a large 20 MB buffer, and a small 16 B one) and three different message-passing methods. The timings were done on Firefox 101, and the code is available here.

| Method | 20MB buffer | 16B buffer |
| --- | --- | --- |
| `post_message` | 28ms | 0.028ms |
| `post_message_with_transfer` | 0.033ms | 0.033ms |
| `std::sync::mpsc::channel` | 0.0062ms | 0.0062ms |

You’ll notice that Rust-native shared memory is the fastest by a substantial factor but not a very large absolute amount, unless you really need to send a lot of messages. Between the JavaScript methods, post_message_with_transfer has some small overhead compared to post_message for small buffers, but this is dwarfed by the copying time if you have substantial data to send.
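
To put numbers on "substantial factor but not a very large absolute amount", here is the arithmetic on the table's figures, written out as a small sketch. The constants are copied from the table; the function names are made up for this example.

```rust
/// Round-trip times in milliseconds, taken from the table above.
pub const POST_MESSAGE_20MB_MS: f64 = 28.0;
pub const TRANSFER_MS: f64 = 0.033;
pub const CHANNEL_MS: f64 = 0.0062;

/// How many times faster transferring is than copying a 20MB buffer.
pub fn copy_vs_transfer() -> f64 {
    POST_MESSAGE_20MB_MS / TRANSFER_MS
}

/// How many times faster the Rust channel is than a transfer.
pub fn transfer_vs_channel() -> f64 {
    TRANSFER_MS / CHANNEL_MS
}

/// Number of round trips needed before the channel saves one millisecond
/// in total compared to `post_message_with_transfer`.
pub fn round_trips_per_saved_ms() -> f64 {
    1.0 / (TRANSFER_MS - CHANNEL_MS)
}
```

With these figures, transferring beats copying a 20MB buffer by a factor of roughly 850, the channel beats transferring by a factor of roughly 5, and it takes about 37 round trips for that factor of 5 to add up to a single millisecond.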

At Tweag, we’ve been working with a client on an optimized WASM library that caches and doles out largish (around 20MB each) chunks of data. We tried various different threading architectures and ended up making do without shared memory. Our heavy use of non-lock-free primitives made it hard to keep the main browser thread happy when using shared memory, while the hybrid architecture described above forced us into too many expensive copies (we couldn’t just transfer the data to the main thread because we needed a copy in cache). With a separate-memory architecture, we arranged our data processing so that large buffers are only ever transferred, never copied. And the small overhead of post_message_with_transfer was negligible compared to the other processing we were doing.

Your ideal architecture might be different from ours. By explaining some of the trade-offs involved, I hope this post will help you find it!

Behind the scenes

Joe Neeman

Joe is a programmer and a mathematician. He loves getting lost in new topics and is a fervent believer in clear documentation.


This article is licensed under a Creative Commons Attribution 4.0 International license.
