Wasmtime: Supporting UDFs in ScyllaDB with WebAssembly


Editor’s note: This post is by P99 CONF speaker Piotr Sarna. To hear more from Piotr and many more latency-minded engineers, check out the videos from P99 CONF, including Piotr’s talk on Wasm.


WebAssembly, also known as WASM, is a binary format for representing executable code, designed to be easily embeddable into other projects. It turns out that WASM is also a perfect candidate for a user-defined functions (UDFs) back-end, thanks to its ease of integration, performance, and popularity. ScyllaDB already supports user-defined functions expressed in WebAssembly in experimental mode, based on an open-source runtime written natively in Rust — Wasmtime.

In fact, we also just added Rust support to our build system in order to make future integrations even smoother!

Choosing the Right Engine

WebAssembly is a format for executable code designed first and foremost to be portable and embeddable. As its name suggests, it’s a good fit for web applications, but it’s also generally a good choice for an embedded language, since it’s quite fast.

One of WebAssembly’s core features is isolation: each module is executed in a sandboxed environment, separate from the host application. Such a limited-trust environment is highly desirable for an embedded language, because it vastly reduces the risk of someone running malicious code from within your project.

WASM is a binary format, but it also specifies a human-readable text format called WebAssembly Text format (WAT).
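
The two formats are easy to tell apart. A binary module always begins with the 4-byte magic number `\0asm`, followed by a 4-byte little-endian version (currently 1). A WAT module is plain text and is usually an s-expression starting with `(module`. The sketch below distinguishes them using only the Rust standard library. `classify_module` and `ModuleFormat` are hypothetical names for illustration and are not part of Wasmtime or ScyllaDB. The text check is only a heuristic; for example, a WAT file may start with a comment.

```rust
/// What kind of WebAssembly module a byte buffer appears to contain.
#[derive(Debug, PartialEq)]
pub enum ModuleFormat {
    /// Binary module, with the version number taken from its header.
    Binary { version: u32 },
    /// Text (WAT) module.
    Text,
    /// Neither of the above, as far as this heuristic can tell.
    Unknown,
}

/// Magic number that starts every binary module: "\0asm".
const WASM_MAGIC: [u8; 4] = [0x00, 0x61, 0x73, 0x6d];

pub fn classify_module(bytes: &[u8]) -> ModuleFormat {
    // Binary header: 4-byte magic, then a 4-byte little-endian version.
    if bytes.len() >= 8 && bytes.starts_with(&WASM_MAGIC) {
        let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        return ModuleFormat::Binary { version };
    }
    // Heuristic for WAT: valid UTF-8 whose first token is "(module".
    match std::str::from_utf8(bytes) {
        Ok(text) if text.trim_start().starts_with("(module") => ModuleFormat::Text,
        _ => ModuleFormat::Unknown,
    }
}
```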

To integrate WebAssembly into a project, one needs to pick an engine. The most popular engine is Google’s V8, which is implemented in C++, also runs JavaScript, and provides a very rich feature set. It’s also (unfortunately) quite heavy and not very easy to integrate with asynchronous frameworks like Seastar, which is a building block of ScyllaDB.

Fortunately, there’s also Wasmtime, a smaller (but not small!) project implemented in Rust. It only supports WebAssembly, not JavaScript, which also makes it more lightweight. It has good support for asynchronous environments and provides C++ bindings, which makes it a good fit for embedding into ScyllaDB as a proof-of-concept implementation.

In ScyllaDB, we selected Wasmtime because it is lighter than V8 and has the potential to be async-friendly. We currently use the existing C++ bindings provided by Wasmtime, but we plan to implement this whole integration layer in Rust and then compile it directly into ScyllaDB.

Coding in WebAssembly

So, how would one create a WebAssembly program?

WebAssembly Text (WAT) format

First, modules can be coded directly in WebAssembly text format. It’s not the most convenient way, at least for me, due to WASM’s limited type system and specific syntax with lots of parentheses. But it’s possible, of course. All you need in this case is a text editor. Being in love with Lisp wouldn’t hurt either.

```wat

(module
   (func $fib (param $n i64) (result i64)
      (if
         (i64.lt_s (local.get $n) (i64.const 2))
         (then (return (local.get $n)))
      )
      (i64.add
         (call $fib (i64.sub (local.get $n) (i64.const 1)))
         (call $fib (i64.sub (local.get $n) (i64.const 2)))
      )
   )
   (export "fib" (func $fib))
)

```

C++

C and C++ enthusiasts can compile their language of choice to WASM with the clang compiler.

```cpp
int fib(int n) {
   if (n < 2) {
      return n;
   }
   return fib(n - 1) + fib(n - 2);
}
```
```sh
clang -O2 --target=wasm32 --no-standard-libraries -Wl,--export-all -Wl,--no-entry fib.c -o fib.wasm
wasm2wat fib.wasm > fib.wat

```

The binary interface is well defined, and the resulting binaries are well optimized. Clang compiles the code to WebAssembly through LLVM’s intermediate representation, so many of LLVM’s optimizations apply.

Rust

Rust can also produce WASM output, and the official Rust toolchain already supports the wasm32 target: add it with rustup, then build with cargo, the official Rust build tool.

```rust
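// Assumes Cargo.toml depends on wasm-bindgen and sets crate-type = ["cdylib"];
// a library crate needs the cdylib type for cargo build to emit fib.wasm.
use wasm_bindgen::prelude::*;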

#[wasm_bindgen]
pub fn fib(n: i32) -> i32 {
   if n < 2 {
      n
   } else {
      fib(n - 1) + fib(n - 2)
   }
}
```
```sh
rustup target add wasm32-unknown-unknown
cargo build --target wasm32-unknown-unknown
wasm2wat target/wasm32-unknown-unknown/debug/fib.wasm > fib.wat

```

AssemblyScript

There’s also AssemblyScript, a TypeScript-like language that compiles directly to WebAssembly. AssemblyScript is especially nice for quick experiments because it feels like a scripting language. It’s also one of the few languages that were designed from the start with WebAssembly as the compilation target.

```assemblyscript
export function fib(n: i32): i32 {
   if (n < 2) {
      return n
   }
   return fib(n - 1) + fib(n - 2)
}
```
```sh
asc fib.ts --textFile fib.wat --optimize
```

User-Defined Functions

Why do we need WebAssembly? Our first use case for it in ScyllaDB is User-Defined Functions (UDFs). UDFs are a Cassandra Query Language (CQL) feature that lets you define functions in a given language and then call them when querying the database. The database itself applies the function to the arguments and returns only the result to the client. UDFs also make it possible to express nested calls and other more complex operations.

Here’s how you can use a user-defined function in CQL:

```cql

cassandra@cqlsh:ks> SELECT id, inv(id), mult(id, inv(id)) FROM t;

id | ks.inv(id) | ks.mult(id, ks.inv(id))
----+------------+-------------------------
7 |   0.142857 |                       1
1 |          1 |                       1
0 |   Infinity |                     NaN
4 |       0.25 |                       1

(4 rows)
```
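
The post doesn't show how inv and mult are defined. Assume they simply compute 1/x and a*b on doubles, which is a guess but matches the output above. Under that assumption, IEEE 754 floating-point arithmetic explains the last row: 1/0 evaluates to Infinity instead of raising an error, and 0 * Infinity evaluates to NaN. Here is a native Rust sketch of these hypothetical definitions:

```rust
/// Hypothetical equivalent of the `inv` UDF: the reciprocal of its argument.
pub fn inv(x: f64) -> f64 {
    // IEEE 754: 1.0 / 0.0 is +Infinity; no error is raised.
    1.0 / x
}

/// Hypothetical equivalent of the `mult` UDF: the product of its arguments.
pub fn mult(a: f64, b: f64) -> f64 {
    // IEEE 754: 0.0 * Infinity is NaN.
    a * b
}

/// The nested call from the query above: mult(id, inv(id)).
pub fn mult_by_inverse(id: f64) -> f64 {
    mult(id, inv(id))
}
```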

UDFs are cool enough by themselves, but their more important purpose is enabling User-Defined Aggregates (UDAs). UDAs are custom accumulators that combine data from multiple database rows into potentially complex outputs. A UDA consists of two functions: a state function, which updates the accumulated result with each row’s argument, and a final function, which transforms the accumulated result into the output type.

The code example below shows an aggregate that computes the average length of all requested strings. The functions are written in Lua, another language that ScyllaDB supports for UDFs.

First, let’s create all the building blocks — functions for accumulating partial results and transforming the final result:

```cql
CREATE FUNCTION accumulate_len(acc tuple<bigint,bigint>, a text)
   RETURNS NULL ON NULL INPUT
   RETURNS tuple<bigint,bigint>
   LANGUAGE lua as 'return {acc[1] + 1, acc[2] + #a}';

CREATE OR REPLACE FUNCTION present(res tuple<bigint,bigint>)
   RETURNS NULL ON NULL INPUT
   RETURNS text
   LANGUAGE lua as
'return "The average string length is " .. res[2]/res[1] .. "!"';
```

…and now, let’s combine them all into a user-defined aggregate:

```cql
CREATE OR REPLACE AGGREGATE avg_length(text)
   SFUNC accumulate_len
   STYPE tuple<bigint,bigint>
   FINALFUNC present INITCOND (0,0);
```

Here’s how you can use the aggregate after it’s created:

```cql
cassandra@cqlsh:ks> SELECT * FROM words;

 word
------------
     monkey
 rhinoceros
        dog
(3 rows)

cassandra@cqlsh:ks> SELECT avg_length(word) FROM words;

 ks.avg_length(word)
-----------------------------------------------
 The average string length is 6.3333333333333!
(1 rows)
```

The state function accumulates partial results by storing the number of strings seen so far and the sum of their lengths. The final function divides the total length by the count and renders the result as text. As you can see, the potential here is quite large: user-defined aggregates make database queries more powerful, for instance by gathering complex statistics or transforming whole partitions into different formats.
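
The aggregate’s semantics can be mirrored as a plain fold in native Rust. This sketch shows what the database conceptually does and is not ScyllaDB’s implementation. The state function runs once per row, starting from INITCOND, and the final function runs once on the resulting state. The Rust functions mirror the Lua UDFs above. `avg_length` as a Rust function is hypothetical, and Lua’s `#a` (byte length) is modeled with `str::len`.

```rust
/// Accumulator state, mirroring STYPE tuple<bigint,bigint>:
/// (number of strings, total length).
pub type State = (i64, i64);

/// State function (SFUNC): folds one row's value into the accumulator.
pub fn accumulate_len(acc: State, a: &str) -> State {
    // Lua's `#a` is the byte length of the string, as is `str::len`.
    (acc.0 + 1, acc.1 + a.len() as i64)
}

/// Final function (FINALFUNC): renders the accumulated state as text.
pub fn present(res: State) -> String {
    format!("The average string length is {}!", res.1 as f64 / res.0 as f64)
}

/// Conceptual evaluation of avg_length: start from INITCOND (0, 0),
/// apply the state function to every row, then the final function once.
pub fn avg_length<'a>(rows: impl IntoIterator<Item = &'a str>) -> String {
    let state = rows.into_iter().fold((0, 0), accumulate_len);
    present(state)
}
```

With the three rows from the words table, the result starts with "The average string length is 6.333". Rust’s float formatting prints more digits than Lua’s default, so the tail differs from the cqlsh output.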

Enter WebAssembly

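To create a user-defined function in WebAssembly, we first need to write the function in WASM text format, or compile it to WASM and convert the result to text. The module is then simply registered in a CQL CREATE FUNCTION statement. Besides the function itself, the module below also exports a _scylla_abi global, which is part of the ABI that ScyllaDB expects UDF modules to follow. That’s it!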

```cql
CREATE FUNCTION ks.fib (input bigint)
RETURNS NULL ON NULL INPUT
RETURNS bigint LANGUAGE xwasm
AS '(module
   (func $fib (param $n i64) (result i64)
      (if
         (i64.lt_s (local.get $n) (i64.const 2))
         (then (return (local.get $n)))
      )
      (i64.add
         (call $fib (i64.sub (local.get $n) (i64.const 1)))
         (call $fib (i64.sub (local.get $n) (i64.const 2)))
      )
   )
   (export "fib" (func $fib))
   (global (;0;) i32 (i32.const 1024))
   (export "_scylla_abi" (global 0))
   (data $.rodata (i32.const 1024) "\01")
)'
```
```cql

cassandra@cqlsh:ks> SELECT n, fib(n) FROM numbers;
 n | ks.fib(n)
---+-----------
 1 |         1 
 2 |         1
 3 |         2
 4 |         3
 5 |         5
 6 |         8
 7 |        13
 8 |        21
 9 |        34
(9 rows)
```

Note that the declared language here is xwasm, which stands for “experimental WASM.” Support for this language is currently still experimental in ScyllaDB.
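
To sanity-check the table above, here is a direct native Rust port of the WAT module: i64 in and out, a signed comparison, and the same recursion. It only serves as a reference for checking results and is not something ScyllaDB runs. It uses wrapping_add because WebAssembly’s i64.add wraps on overflow.

```rust
/// Native port of the `$fib` WAT function: takes and returns an i64.
pub fn fib(n: i64) -> i64 {
    // `i64.lt_s` is a signed comparison, so negative inputs return n as well.
    if n < 2 {
        return n;
    }
    // `i64.add` wraps on overflow; `wrapping_add` mirrors that exactly.
    fib(n - 1).wrapping_add(fib(n - 2))
}
```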

The current design doc is maintained here. You’re welcome to take a look at it: https://github.com/scylladb/scylladb/blob/master/docs/dev/wasm.md
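
The module is embedded as a single-quoted CQL string literal, and inside such a literal a single quote is written as two single quotes. A client generating these statements from a .wat file therefore only needs to apply that escaping. Below is a minimal sketch of such a generator, assuming the xwasm syntax shown above. `create_wasm_function` and `cql_string_literal` are hypothetical helpers, not a ScyllaDB API.

```rust
/// Wraps text in a single-quoted CQL string literal,
/// writing each embedded single quote as two single quotes.
pub fn cql_string_literal(text: &str) -> String {
    format!("'{}'", text.replace('\'', "''"))
}

/// Builds a CREATE FUNCTION statement for a WAT module, following the
/// shape of the ks.fib example. `args` holds (name, CQL type) pairs.
pub fn create_wasm_function(
    qualified_name: &str,
    args: &[(&str, &str)],
    return_type: &str,
    wat: &str,
) -> String {
    let params: Vec<String> = args.iter().map(|(n, t)| format!("{n} {t}")).collect();
    format!(
        "CREATE FUNCTION {qualified_name} ({})\nRETURNS NULL ON NULL INPUT\nRETURNS {return_type} LANGUAGE xwasm\nAS {};",
        params.join(", "),
        cql_string_literal(wat),
    )
}
```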

Our Roadmap

WebAssembly support is in active development, and here are some of our most important goals.

Helper Libraries for Rust and C++

Writing functions directly in WAT format is cumbersome and not trivial because ScyllaDB expects the functions to follow our application binary interface (ABI) specification. In order to hide these details from developers, we’re in the process of implementing helper libraries for Rust and C++, which seamlessly provide ScyllaDB bindings. With our helper libraries, writing a user-defined function will be no harder than writing a regular native function in your language of choice.

Rewriting the User-Defined Functions Layer in Rust

We currently rely on Wasmtime’s C++ bindings to expose a WASM runtime for user-defined functions to run on. These C++ bindings have certain limitations though. Specifically, they lack support for asynchronous operations, which is present in Wasmtime’s original Rust implementation.

The choice is abundantly clear: let’s rewrite it in Rust! Our precise plan is to move the entire user-defined functions layer to Rust, where we can fully utilize Wasmtime’s potential. With such an implementation, we’ll be able to run user-defined functions asynchronously, with strict latency guarantees. We’ll only need a thin compatibility layer between Seastar and Rust’s async model to enable polling Rust futures directly from ScyllaDB. The rough idea for binding Rust futures straight into Seastar is explained here.
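
Polling Rust futures from a foreign event loop is possible because any code can drive a future by calling Future::poll with a Context. The sketch below shows the idea with the standard library only. The loop in `drive` stands in for Seastar’s reactor, and a no-op Waker replaces real wake-up scheduling. `YieldNTimes` is a hypothetical future that returns Poll::Pending a few times, much as a long-running UDF might yield to keep latency low. This illustrates the polling model only and is not ScyllaDB’s Seastar binding.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical stand-in for a UDF: yields `remaining` times, then
/// completes with `value`.
pub struct YieldNTimes {
    pub remaining: u32,
    pub value: i64,
}

impl Future for YieldNTimes {
    type Output = i64;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i64> {
        if self.remaining == 0 {
            Poll::Ready(self.value)
        } else {
            self.remaining -= 1;
            // Signal readiness to be polled again (a no-op with our waker).
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A waker whose operations do nothing; the host loop polls unconditionally.
fn noop_raw_waker() -> RawWaker {
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Host-side driver: polls the future until it is ready and returns the
/// output along with the number of polls it took.
pub fn drive<F: Future>(fut: F) -> (F::Output, u32) {
    // SAFETY: every vtable function ignores the data pointer and does nothing.
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
        // A host event loop could run other tasks here before polling again.
    }
}
```

For example, driving YieldNTimes { remaining: 3, value: 42 } returns (42, 4): three Pending polls, then Ready.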

We already added Rust support to our build system. The next step is to rewrite the user-defined functions engine as a native Rust implementation, which we can then compile right into ScyllaDB.

Join the ScyllaDB Community

WASM support is only one of a huge number of projects we have underway at ScyllaDB. If you want to keep abreast of all that’s happening, there are a few ways to get further plugged in:

Piotr @ P99 CONF: Keeping Latency Low for User-Defined Functions with WebAssembly

At this year’s P99 CONF (free and virtual), Piotr will take a deeper dive into this topic with a talk on “Keeping Latency Low for User-Defined Functions with WebAssembly.” He will show how to integrate WebAssembly and Wasmtime into a C++ project in a latency-friendly manner, while implementing the core runtime for user-defined functions in async Rust.


About Piotr Sarna

Piotr Sarna is a software engineer who is keen on open source projects and the Rust and C++ languages. He previously developed an open source distributed file system and had a brief adventure with the Linux kernel. He's also a long-time contributor and maintainer of ScyllaDB, as well as libSQL and Turso. Piotr graduated from University of Warsaw with a Master's degree in computer science.
