How WebAssembly addresses key issues with SQLite UDFs — so you can skip the low-level C API and use languages like Rust, Zig, or Go
Editor’s note: The following is a post from Piotr Sarna, a long-time ScyllaDB contributer who’s now at Turso. At P99 CONF 23, he presented “Less Wasm,” a short case study on how to “minify” WebAssembly code compiled with the Rust toolchain so that it loads faster and consumes less resources. This article was originally published on Turso’s blog.
A while back, Turso announced that we were forking SQLite, into a project called libSQL. While we love SQLite, we respectfully disagree with (even if we understand) their focus on being fully public domain with no 3rd party code and rarely accepting contributions. Our goal was to create a community of database enthusiasts who want to explore other directions that could be taken for an OLTP-oriented embedded database if SQLite would be more open, while standing on the shoulders of giants, as we all should aspire to do.
In the first official release of libSQL, one of our main goals was to address the challenges of SQLite UDFs with a new approach: the ability to dynamically create WebAssembly-powered user-defined functions. We wanted to address key issues with SQLite UDFs so users could skip the low-level C API and use languages like Rust, Zig, or Go.
Problems with SQLite user-defined functions (UDFs)
SQLite supports user-defined functions. UDFs in SQLite work by allowing users to provide a piece of C code that will be executed directly into the data, performing some in-statement computation.
For example, you could define a function to calculate a hash of a particular string, and then obtain hashes for all values in a column col from a table table by writing:
SELECT hash(col) from table;
In an over-the-network database like Postgres, not using UDFs means that before any computation can be performed, the client needs to first fetch the data. This can be very wasteful.
For an embedded database, there is no network activity involved in fetching this data (although we are also exploring changing that). Still, using UDFs, as opposed to materializing the data and then transforming, can avoid memory copies and extra passes, resulting in faster code. But there are problems…
A more convenient API
The first problem is that in the original SQLite, UDFs are available through the C API or its derivatives, which means that they can only be registered programmatically. The SQL layer is what is ultimately exposed, so that’s where we should aim for function registration to live.
Many projects (like rqlite, dqlite, mvsqlite) do try to push the boundaries, and add network functions around SQLite, which is one more point in favor of function manipulation at the statement level, as opposed to the low-level C API.
That’s why other SQL databases customarily offer a
CREATE FUNCTION statement, and we want one as well.
CREATE FUNCTION statements would be a perfect gateway for users to add their specific logic to the database.
The choice of language
Aside from not being available from the SQL statements, the SQLite function API expects you to register a pointer to a C function. With the help of your driver, that doesn’t mean that you have to write your function in C: your language of choice likely has a C FFI. The driver just has to provide C bindings to your functions.
That is easy from the programmatic interface from a single-language driver. But if we want this to work from the SQL statements in a consistent way, we need a language that is cross platform enough so that it will work with any driver.
It could be C. But for a large contingent of our industry, we’re usually talking about higher level languages like Go, Ruby, or TypeScript. Even in the domain of systems programming, historically a stronghold of C and C++, Zig and Rust are rapidly gaining momentum.
WebAssembly for User-Defined Functions
Thankfully, there is a language, also gaining rapid momentum across different ecosystems, that is a great fit for user-defined functions: WebAssembly. One of its core foundations is isolation, which is essential for running untrusted code on your own machines. It also ticks all the other boxes:
- thriving open-source community
You don’t actually have to write your code directly into WebAssembly. It is a common compile target for many modern languages, such as Rust and Zig.
The language is publicly specified. So, true to the original SQLite’s approach, it would not be impossible to write an entire new WebAssembly runtime into SQLite. But we believe there are benefits to keeping to the existing runtimes, and for libSQL, we will just integrate.
Example: How to run an encryption routine as a UDF with libSQL
Let’s go through a full example of how to run an encryption routine as a UDF with libSQL.
Get the latest libSQL release
In order to run WebAssembly user-defined functions in libSQL, the first step is to get the latest libSQL release. Release artifacts for x86_64 Linux can be downloaded from the official release page. The release is also available in a container.
To run the shell, use the following command:
docker run -it piotrsarna/libsql:libsql-0.1.0-wasm-udf ./libsql
The source code is available on GitHub.
Compile your first function
The first step is to compile your code into a WebAssembly binary. You can then load a file containing the WebAssembly binary, or even add the resulting binary blob straight into the SQL statement.
For this example, we will write our code in Rust. Your code can do pretty much any computations, as long as it doesn’t try to access system resources (including time/date and sources for entropy for pseudorandom number generation), use the network, and so on — that’s something that WebAssembly isolation rules will prevent.
In addition to WebAssembly isolation rules, we require that you limit the function parameters and return type to something that both WebAssembly and libSQL can handle, specifically, for our Rust example:
- Any primitive integer type for libSQL’s
To make this process easier, we provide a crate, libsql_bindgen. It allows you to add the
#[libsql_bindgen] macro to a Rust function, making sure all types in the function signature are transformed the the types libSQL understands.
Come play at our playground instead?
If you don’t feel like doing this all manually, that’s okay: we deliver all the tools to make the experience as smooth as possible, at our playground bindgen.
Navigate there and click the “Generate” button. Now paste the resulting SQL statement right into the libSQL shell.
Calling your first function
For this example, let’s compile the following Rust function, using bindgen:
Start by launching the libSQL shell:
As a next step, let’s instantiate the functions and create a demo table with some data we want encrypted. Note that we will paste the statements as blobs, by clicking “as binary blob” in the bindgen playground. That is just to allow a more compact representation.
Now, let’s see if we’re able to read the results as key owners, and get refused otherwise:
Voilà! All the secrets are properly encrypted, and the decryption key works too.
WebAssembly triggers in libSQL
WebAssembly-powered user-defined functions are exciting on their own, but combined with database triggers, they become a powerful building block for automating your workflows. This follow-up blog shares a few examples of WebAssembly triggers in action within a database flow for creating new user records. It walks you through how triggers can be used to:
- Save passwords in an encrypted form instead of plaintext
- Check if the encryption works
- Generate a unique single-use token for each user and put it in a separate table
- Verify that a single-use token was indeed generated and inserted into the table
Read about how to use WebAssembly for database triggers.
Watch Piotr’s Talk from P99 CONF 23
Piotr’s “Less Wasm” tech talk is a case study on how getting rid of WebAssembly is great for your latency. More specifically, it’s about how you can reduce the size of your WebAssembly binaries so that they load faster and perform better. He walks through a handful of “minification” techniques he’s found useful, then shares the results from applying these techniques to libSQL. (Spoiler: they achieved a 24000% reduction in size).