Rust has been a primary thread at P99 CONF since day 1 – literally. From Brian Martin opening the first-ever P99 CONF with “Whoops! I Wrote It in Rust!” to Glauber Costa’s brilliant session “Rust Is Safe. But Is It Fast?” to Bryan Cantrill’s turbocharged take on Rust, Wright’s Law, and the future of low-latency systems, Rust earned “top topic” honors. And it has defended that position every year since, thanks to talks by the likes of Carl Lerche, Armin Ronacher, fasterthanlime, and many more. [Watch past sessions on demand]
P99 CONF 2025 will be rather Rusty, yet again. Here’s a sneak peek into some of the Rust-focused talks we’ll be featuring.
Join P99 CONF (Free + Virtual)
(In case you’re new to P99 CONF, it’s a free 2-day community event for engineers obsessed with low-latency engineering strategies and performance optimization. It’s intentionally virtual, highly interactive, and purely technical.)
ClickHouse’s C++ and Rust Journey
Alexey Milovidov, CTO at ClickHouse
Full rewrite from C++ to Rust or gradual integration with Rust libraries? For a large C++ codebase, only the latter works, but even then, there are many complications and rough edges. In my presentation, I will describe our experience integrating Rust and C++ code and some weird and unusual problems we had to overcome.
Rewriting Prime Video UI with Rust and WebAssembly
Alexandru Ene, Principal Engineer at Prime Video
Prime Video delivers content to millions of customers all over the world, on a variety of devices such as game consoles, set-top boxes, streaming sticks, and Smart TVs. These devices have a vast range of hardware capabilities and performance characteristics. We’ll show how we’ve used Rust and WebAssembly to build a new version of the Prime Video app, improving performance and enabling features that were previously impossible with JavaScript and React. The talk will go into detail on how we tackled challenges during the rewrite, as well as new challenges that have emerged recently (why is AOT-compiled Wasm 3x the size of a Wasm file, etc.).
Why We’re Rewriting SQLite in Rust
Glauber Costa, Co-Founder and CEO at Turso
Over two years ago, we forked SQLite. We were huge fans of the embedded nature of SQLite, but wanted a more open model of development…and libSQL was born as an Open Contribution project. Last year, as we were adding Vector Search to SQLite, we had a crazy idea. What could we achieve if we were to completely rewrite SQLite in Rust? This talk explains what drove us down this path, how we’re using deterministic simulation testing to ensure the reliability of the Rust rewrite, and the lessons learned (so far). I will show how a reimagining of this iconic database can lead to performance improvements of over 500x in some cases by looking at what powers it under the hood.
Squeezing Every Millisecond: How We Rebuilt the Datadog Lambda Extension in Rust
AJ Stuyvenberg, Staff Engineer at Datadog
Hear how we rewrote Datadog’s AWS Lambda Extension from Go to Rust (with no prior Rust experience) in order to temper p99.9 latency, which is especially painful in Lambda. This talk will cover detailed benchmarks showing how we measured and achieved an 80% Lambda cold start improvement along with a 50% memory footprint reduction, directly saving customers thousands of dollars on cloud bandwidth and compute spend. We’ll also discuss where and how to best use Lambda for optimal cost/performance, giving you the tools you need to grasp the full operational scope of Lambda and use it to handle unpredictable traffic bursts or large volumes of background jobs.
Reworking the Neon IO stack: Rust+tokio+io_uring+O_DIRECT
Christian Schwarz, Member of Technical Staff at Databricks
Pageserver is the multi-tenant storage service at the heart of Neon, a Postgres platform that is now part of Databricks, where it powers the recently launched Lakebase product. We share techniques and lessons learned from reworking the IO stack of Pageserver for fully asynchronous and direct IO to local NVMe drives – all during a period of rapid growth. Pageserver is implemented in Rust; we use the tokio async runtime for networking and integrate it with io_uring for filesystem access.
Bridging epoll and io_uring in Async Rust
Tzu Gwo, Founder at Tonbo
Tokio is the most well-known runtime in the async Rust ecosystem. However, since Tokio is built around epoll (and other polling-based async I/O), it implicitly limits support for io_uring — a completion-based async I/O model — in the broader async Rust world. These two kinds of I/O use entirely incompatible trait bounds.
In this talk, I’ll explain why async Rust’s current design and ecosystem make it hard to adopt io_uring, and introduce an approach we’ve been using: although it’s not possible to switch async I/O types at runtime (i.e., be I/O-agnostic) under today’s async Rust model, we found a simple and effective way to switch between different I/O runtimes at compile time. This allows I/O middleware to support both epoll and io_uring without changing its code.
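The compile-time switching described above can be sketched with Cargo feature flags. Everything below is an illustrative assumption (the module layout, function names, and the `iouring` feature name are not from the talk), and the functions are shown with synchronous bodies for brevity where real middleware would expose async fns:

```rust
// Hypothetical sketch of compile-time I/O backend selection via a Cargo
// feature flag. Module and function names are illustrative.

#[cfg(feature = "iouring")]
mod backend {
    // Completion-based backend: a real implementation would submit an
    // io_uring read and wait for its completion entry.
    pub fn read_file(_path: &str) -> std::io::Result<Vec<u8>> {
        unimplemented!("io_uring-backed read")
    }
}

#[cfg(not(feature = "iouring"))]
mod backend {
    // Readiness-based (epoll-style) backend; plain blocking I/O stands in here.
    pub fn read_file(path: &str) -> std::io::Result<Vec<u8>> {
        std::fs::read(path)
    }
}

// Middleware is written once against the facade. The backend is fixed at
// compile time, so there is no runtime dispatch and no shared trait bound
// that both I/O models would have to satisfy.
pub fn load_config(path: &str) -> std::io::Result<Vec<u8>> {
    backend::read_file(path)
}
```

Because the selection happens at compile time, the middleware body never changes; only the build flags do.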
A Visual Journey Through Async Rust
Alex Puschinsky, Tech Lead Software Engineer at Trigo
Async programming, whether in Rust or other platforms, is full of nuance and pitfalls. Reading tutorials, documentation, and even source code is helpful, but for me, the best path to understanding is through tinkering, experimenting, and visualizing. In this talk, I’ll create an async Rust visualization tool and use it to investigate the nature of async execution. What order do futures execute in? How do “parallel” and “concurrent” execution really look? What are the real effects of the dreaded “CPU-heavy code” on async performance? Visualization isn’t just about pretty graphics — it gives us an intuitive understanding of async. With it, we’ll gain important insights that will help us improve our code efficiency and multi-core utilization.
Mechanical Sympathy in Cooperative Multitasking
Kenny Chamberlin, Lead Engineer at Momento
I’ll briefly cover mechanical sympathy from first principles, then move on to practical applications. I have three techniques to present: two that are readily portable to any ecosystem, and a third that takes advantage of Rust’s borrow checker and standard Future model.
This talk focuses on cooperative multitasking / coroutines / async-await on servers. I’ll bring hardware ideas next to operating system ideas. These will stay high level: “a CPU core,” “system memory,” and “a thread” are assumed to be detailed enough descriptions. I’ll clarify any curiosity in the live Q&A. The core mechanical sympathy idea to apply is to avoid parking your threads whenever possible. I’ll talk about three techniques to do that.
The first technique is to reduce your thread count. I’ve seen systems at big companies with hundreds of threads on 8–16 core servers. If you do that with a cooperative multitasking system, you are bringing extra tail latency pain into your service, and that pain is hard to measure.
The second technique is to scope your locks as tightly as possible. Allocate memory, prepare objects, look up metadata before you lock. Update state and release the lock as quickly as possible. This is sometimes enough.
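As a minimal sketch of that ordering (the names here are invented for illustration, not from the talk):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Prepare everything outside the critical section, then hold the lock
// only for the actual shared-state update.
fn record_sample(shared: &Mutex<HashMap<String, Vec<u64>>>, metric: &str, value: u64) {
    // Unlocked: allocation and preparation happen before we touch the lock.
    let key = metric.to_string();

    // Locked: the critical section is just the map update.
    let mut map = shared.lock().unwrap();
    map.entry(key).or_default().push(value);
    // Lock released here, as `map` goes out of scope.
}
```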
The third technique is to remove your locks. Some Rust features make it convenient to do this safely and quickly. Use ArcSwap for read-heavy data that does not need strict consistency with other data. Use Future to synchronize data and processes without any locks.
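The snapshot-swap pattern behind that advice looks roughly like this. This is a std-only approximation with invented type and field names; `arc_swap::ArcSwap` makes the load fully lock-free, while std’s Mutex stands in here so the sketch compiles anywhere:

```rust
use std::sync::{Arc, Mutex};

struct Config {
    max_connections: usize,
}

// Read-mostly shared state published as immutable snapshots. Readers take
// a cheap Arc clone and never hold a lock while using the data; writers
// swap in a whole new snapshot.
struct SharedConfig(Mutex<Arc<Config>>);

impl SharedConfig {
    fn new(initial: Config) -> Self {
        SharedConfig(Mutex::new(Arc::new(initial)))
    }

    // Readers: grab the current snapshot.
    fn load(&self) -> Arc<Config> {
        Arc::clone(&self.0.lock().unwrap())
    }

    // Writers: publish a new snapshot. In-flight readers keep the old Arc
    // until they drop it, so there is no strict cross-reader consistency.
    fn store(&self, new: Config) {
        *self.0.lock().unwrap() = Arc::new(new);
    }
}
```

The trade-off is exactly the one named above: readers may briefly see a stale snapshot, which is acceptable for data that does not need strict consistency with other state.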
I will have pictures of metrics, example code, and live demos in my IDE.
Achieving Sub-10 Millisecond Latencies at Climatiq
Gustav Wengel, Software Developer at SCADA Minds
This is a case study of how Climatiq achieved an 8.94ms median latency, with a serverless Rust-based web API, without the capability to store anything in-memory between requests. We’ll cover:
- The different strategies we used to include significant amounts of data directly in our binary, culminating in using the “rkyv” library for zero-copy deserialization, essentially removing the entire deserialization step.
- How this more general pattern of shifting work to build-time works for other cases, like pathfinding graphs.
- How we minimize latency from having to contact external services, by utilizing caching.
- How “stale-while-revalidate” allows us to offload time-consuming network requests until after we have responded to the user.
- A few miscellaneous tips and tricks regarding Rust performance.
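The first bullet’s idea of baking data into the binary can be illustrated with a std-only sketch. The table, names, and figures below are placeholders, and Climatiq’s real implementation embeds rkyv-archived data rather than a hand-written const:

```rust
// Data compiled straight into the binary: each request does a lookup over
// static memory with no deserialization step at all. A build script could
// generate (and sort) a table like this from source files at build time.
const EMISSION_FACTORS: &[(&str, f64)] = &[
    // (fuel, illustrative kg CO2e per unit) -- sorted by fuel name
    ("diesel", 2.68),
    ("natural_gas", 2.02),
    ("petrol", 2.31),
];

fn factor_for(fuel: &str) -> Option<f64> {
    // Binary search works because the table was sorted at build time.
    EMISSION_FACTORS
        .binary_search_by(|probe| probe.0.cmp(fuel))
        .ok()
        .map(|i| EMISSION_FACTORS[i].1)
}
```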

