Low-Latency Engineering Tech Talks

Browse the full library of P99 CONF tech talks and decks. Discover how experts tackle low-latency, high-performance distributed computing challenges from a wide range of perspectives.

Patterns of Low Latency

Pekka Enberg

Founder & CTO at Turso
Building for low latency is important, but the tips and tricks are often part of developer folklore and hard to…

DTrace at 21: Reflections on Fully-grown Software

Bryan Cantrill

CTO of Oxide Computer Company
Twenty-one years ago, DTrace was integrated into the operating system. By any measure, the software is now fully-grown: it…

Rust + io_uring + ktls: How Fast Can We Make HTTP?

Amos Wenger

Creator of Faster Than Lime
Working on Fluke: async Rust HTTP1+2 with io_uring & kTLS, sponsored by fly.io & Shopify. Unlike others, Fluke is built…

The Next Chapter in the Sordid Love/Hate Relationship Between DBs and OSes

Andy Pavlo

Associate Professor at Carnegie Mellon University
DBMSs struggle with OS constraints, but new tech like eBPF can change the game. Join us to explore “user-bypass” designs…

Zero-overhead Container Networking with eBPF and Netkit

Liz Rice

Chief Open Source Officer, Isovalent at Cisco
Introducing Netkit: a new eBPF enhancement replacing veth connections in container networking. Say goodbye to the overhead slowing down container…

Noisy Neighbor Detection with eBPF

Jose Fernandez

Senior Software Engineer at Netflix
Tackling “noisy neighbor” issues in multi-tenant setups! At Netflix, we use eBPF to monitor and mitigate excessive CPU usage in…

Rust: A Productive Language for Writing Database Applications

Carl Lerche

Principal Engineer at AWS
Think Rust is just about performance and safety? Let’s talk productivity. Last year, Rust’s library ecosystem needed work. What’s changed?…

Designing a Query Queue for ScyllaDB

Avi Kivity

CTO and Co-Founder of ScyllaDB
Database queries vary widely—from milliseconds to hours. Optimizing concurrency is a delicate balance of CPU, memory, and stability. Bad design…

You’re Doing It All Wrong

Michael Stonebraker

CTO & Co-founder of DBOS
Historically, business apps use a three-tier architecture. Now, cloud-native architectures and DBMS can be combined, allowing for resilient, cost-effective, and…

1BRC – Nerd Sniping the Java Community

Gunnar Morling

Principal Software Engineer at Decodable
Gunnar Morling dives into the tricks that the fastest 1BRC solutions used to process the challenge’s 13 GB input file…

Overcoming Distributed Databases Scaling Challenges with Tablets

Dor Laor

CEO of ScyllaDB
Maximizing performance goes beyond server-level tweaks. Even with optimized low-level code, scaling requires more. In this session, learn about “tablets”—a dynamic…

The Performance Engineer’s Toolkit: A Case Study on Data Analytics with Rust

Will Crichton

Assistant Professor at Brown University
I optimized a Python data analytics pipeline, making it 180,000x faster with Rust! Using compiler optimizations, data structures, vectorization, parallelization,…

Using Sketching Technology to Optimize Services with Fewer Resources

Yichen Wei

Engineer Manager at Disney+/Hulu
Optimize your services with cost-efficient observability using high-performance sketching tools. Dive into creating sketching tech for various scenarios, making the…

Using eBPF Off-CPU Sampling to See What Your DBs are Really Waiting For

Tanel Poder

Performance Nerd at PoderC LLC
At last year’s P99 CONF, Tanel introduced using eBPF Task State Arrays to track Linux apps’ thread states/activity without built-in…

Java Heap Memory Optimization to Improve P99 Query Latency at LinkedIn Scale

Vivek Iyer Vaidyanathan

Staff Software Engineer at LinkedIn
Discover how LinkedIn optimized Apache Pinot’s performance! By using FALF Interning, a home-grown, lock-free method, they cut JVM heap usage…

Just In Time LSM Compaction

Aleksei Kladov

Staff Software Engineer at TigerBeetle
Matklad dives into the implementation of TigerBeetle’s JIT compaction algorithm for LSM, which is highly concurrent and uses all available…

Redis Alternatives Compared

Peter Zaitsev

Founder of Percona, Coroot, FerretDB
Join Peter as he dives into Redis alternatives like Valkey, DragonflyDB, and Microsoft Garnet. He’ll cover licensing, features, community support,…

Detecting Memory Leaks in Android A/B Tests: A Production-Focused Approach

Pavlo Stavytskyi

Google Developer Expert
Discover how to detect subtle memory leaks and regressions in Android apps with a production-focused approach. Learn the key metrics…

One Billion Row Challenge in Golang

Shraddha Agrawal

Senior Software Engineer, Ceph, IBM
Join us as we tackle Gunnar Morling’s One Billion Row Challenge in Golang! We’ll walk through optimizing a 16GB file…

Taming Discard Latency Spikes

Patryk Wróbel

Software Engineer at ScyllaDB
Learned a crucial lesson on read/write latency when fixing a real ScyllaDB issue! Discover how TRIM requests impact NVMe SSDs…

Why Databases Cache, but Caches Go to Disk

Felipe Cardeneti Mendes

Technical Director at ScyllaDB

Alan Kasindorf

Founder of Cache Forge
ScyllaDB teamed up with Memcached to compare how caches and databases handle storage and memory across different scenarios. We’ll dive…

Primitive Pursuits: Slaying Latency with Low-Level Primitives and Instructions

Ravi A Giri

Senior Principal Engineer at Intel

Harshad S Sane

Principal Software Engineer at Intel
This talk showcases a methodology with examples to break down applications to low-level primitives and identify optimizations on existing compute…

How to Improve Your Ability to Solve Complex Performance Problems: Part 2

Kerry Osborne

Google Database Black Belt Team Lead at Google
In Part 2 of my P99 2023 talk, I’ll dive into practical strategies to enhance our problem-solving skills in the…

Database Drivers: Performance Perspectives

Piotr Sarna

Founding Engineer at poolside
Unlock the full potential of database drivers! Dive deep into their design, uncover how they work under the hood, and…

Low-Latency Mesh Services Using Actors

Nikita Lapkov

Senior Software Engineer
We’re transforming elfo, our Rust actor system, into a distributed mesh of services. Learn how we tackled message serialization, compression,…

Minimizing Request Latency of Self-Hosted ML Models

Julia Kroll

Applied Engineer at Deepgram
Join our session on minimizing latency in self-hosted ML models in cloud environments. Learn strategies for deploying Deepgram’s speech-to-text models…

Using Change Point Detection to Fight Noisy Benchmark Results

Matt Fleming

Co-Founder & CTO at Nyrkiö Oy
Discovering performance regressions in modern systems is tough due to inevitable noise. Change Point Detection (CPD) algorithms are gaining traction…

Enhancing P99 Latency: Strategies for Doubling/Tripling Performance in Third-Party APIs

Cristian Velazquez

Staff Site Reliability Engineer at Uber
Sharing our journey to improve P99 latency in third-party APIs. From optimizing network configs to fine-tuning connection management, we aimed…

Understanding Request Latency with Wallclock Profiling

Richard Startin

Senior Software Engineer at Datadog
Analyzing request latency is tough since it’s not always CPU-bound. Many devs give up on CPU profiling, but sampling profilers…

Fast, Secure and Dense: Finally Serverless with WebAssembly

Thorsten Hans

Sr. Cloud Advocate at Fermyon Technologies
Discover how WebAssembly is revolutionizing cloud computing. Join Thorsten Hans to learn about building serverless apps with Spin, achieving true…

Latency, Throughput & Fault Tolerance: Designing the Arroyo Streaming Engine

Micah Wylde

Co-Founder at Arroyo
Arroyo is a Rust-based, distributed stream processing engine offering millisecond-latency and high-throughput. It achieves fault tolerance and exactly-once processing via…

Get Low (Latency)

Benjamin Cane

Distinguished Engineer at American Express

Tyler Wedin

Vice President, Global Payments Network SRE at American Express
Building a real-time, low-latency card payments system is a challenge. Join the Amex Payments Network team to learn about their…

Reliable Data Replication

Cameron Morgan

Staff Infrastructure Engineer at Shopify
Data replication ensures high availability—reliable, consistent, and timely access. Dive into the tough problems often skipped: reliable backfills, schema changes,…

Scheduler Tracing With ftrace + eBPF

Jason Rahman

Principal Software Engineer at Microsoft
Dive into understanding app latency by exploring the Linux scheduler with ftrace, eBPF, and Perfetto for visualization. Uncover quirks in…

Aiding the CUDA Compiler for Fun and Profit

Joe Rowell

Founding Engineer at poolside
Get the most out of your CUDA code by understanding how the compiler works.

Building a Cloud Native LSM on Object Storage

Chris Riccomini

Creator of Materialized View

Rohan Desai

Co-Founder of Responsive
Excited to introduce SlateDB, an open-source, cloud-native storage engine. Built as an LSM on object stores like S3/GCS/ABS, it leverages…

Cheating the Cloud: 50% Savings with Compression Dictionaries

Łukasz Paszkowski

Software Engineer Team Lead at ScyllaDB
Faced with high networking costs, we tackled insufficient compression with a custom RPC compressor using ZSTD and external dictionary support.…

Internet-Scale Semantic, Structural, and Text Search in Real Time

Ash Vardanian

Founder of Unum Cloud
Discover powerful search algorithms and their SIMD- and GPU-accelerated implementations for AI-powered semantic search, structure search, or exact & fuzzy…

Writing a Kernel in Rust: Code Quality and Performance

Luc Lenôtre

Site Reliability Engineer at Clever Cloud
Maestro kernel began as a C-based school project and transitioned to Rust for better code quality. Now, it’s in a…

Running Low-Latency Workloads on Kubernetes

Jimmy Zelinskie

Co-Founder of AuthZed
Configuring Kubernetes for optimal workload performance is a continuous journey. Best practices can sometimes harm performance. Join us as we…

Distributed Async Await: A New Programming Model for the Cloud

Dominik Tornow

CEO at Resonate HQ
Dive into the future of cloud dev with Distributed Async Await. Simplify your code and conquer the chaos of distributed…

Feature Store Evolution Under Cost Constraints: When Cost is Part of the Architecture

David Malinge

Senior Staff Software Engineer at ShareChat

Ivan Burmistrov

Principal Software Engineer at ShareChat
ShareChat’s scaling ML Feature Store to handle 1B features/sec was just the start. Next challenge: cutting costs while keeping quality.…

WebAssembly on the Edge: Sandboxing AND Performance

Brian Sletten

President at Bosatsu Consulting, Inc.

Ramnivas Laddad

Co-Founder of Exograph, Inc
Moving apps to the Edge can complicate performance due to security constraints. Learn how WebAssembly bridges the gap, enabling both…

Queues, Hockey Sticks and Performance

David Collier-Brown

Staff Engineer
Queues: both a blessing and a curse in computer science. They help predict performance but also signal overload. This talk…

Taming Tail Latencies in Apache Pinot with Generational ZGC

Christopher Peck

Senior Software Engineer at Uber
Discover how Generational ZGC slashed Java app pause times in real-world use! Learn how Apache Pinot tackled scatter-gather tail latencies…

Measuring and Diagnosing Performance Shouldn’t Require Magic

Cary Millsap

Distinguished Product Manager at Oracle
Struggling with performance issues despite all green dashboards? Experts say you need special skills, but we’ll show you how to…

Remote CAD that Feels Local

Adam Chalmers

Systems Engineer at Zoo

Adam Sunderland

Lead Cloud Infrastructure Engineer at Zoo
Zoo is creating a CAD suite that runs in the cloud but feels like it’s local. How? Regional deployment, WebRTC…

Profiling your Go Service with pprof

Miriah Peterson

Lead Engineer at Soypete Tech
Optimize your Go code with the powerful pprof tool. Learn how to integrate, access, and interpret pprof metrics, plus best…

Performance Pitfalls of Rust Async Function Pointers (And Why It Might Not Matter)

Byron Wasti

Founder & CEO
An in-depth analysis of asynchronous function pointers in Rust, why they aren’t a real thing (compared to normal function pointers)…

Elevating PostgreSQL: Benchmarking Vector Search Performance

Daniel Seybold

Co-Founder at benchANT
PostgreSQL continues to evolve with vector search extensions like pgvector and pgvecto.rs. We’ll explore recent benchmarks comparing vector search performance…

Sight Beyond Sight: See it All Through Observability

Leandro Melendez

Developer Advocate at Grafana Labs
Observability is more than metrics and logs—it’s knowing your system’s status without checking under the hood. From QA processes to…

Time-Series and Analytical Databases Walk Into a Bar…

Andrei Pechkurov

Core Engineer at QuestDB
In this talk, we share our journey in making QuestDB, an open-source time-series database, a much faster analytical database, featuring…

Profile-Guided Optimization (PGO): (Ab)using it for Fun and Profit

Aliaksandr Zaitsau

Solution Architect
Discover how to boost your software with lesser-known compiler flags and Profile-Guided Optimization (PGO). Learn what PGO is, how it…

How a Failed Experiment Helped Me Understand the Go Runtime in More Depth

Aadhav Vignesh

Software Engineer
In 2022, I began crafting a tool to visualize Go’s GC in real-time. I’ll dive into the hurdles of extracting…

What C and C++ Can Do and When Do You Need Assembly?

Alexander Krizhanovsky

CEO at Tempesta Technologies
Join us to dive into GCC and Clang optimizations for C/C++! We’ll explore how x86-64 executes code, use assembly for…

Low Latency Gal Presents: Low Latency Stuff

Sonia Kolasinska

Low Latency Gal
Lock-free programming and precise ultra low latency pipelining between CPU cores.

Cache Me If You Can: How Grafana Labs Scaled Up Their Memcached 42x & Cut Costs Too

Danny Kopping

Senior Software Engineer at Grafana Labs
Our cloud database stores billions of files in object storage. With petabytes of data being queried every day, we started…

High Performance on a Low Budget

Gwen Shapira

Co-founder & CPO of Nile
It is one thing to solve performance challenges when you have plenty of time, money, and expertise available. Many performance…

From 1M to 1B Features Per Second: Scaling ShareChat’s ML Feature Store

Andrei Manakov

Senior Staff Software Engineer at ShareChat

Ivan Burmistrov

Principal Software Engineer at ShareChat
ShareChat’s Ivan Burmistrov and Andrei Manakov walk through how they built a low latency ML Feature Store based on ScyllaDB which…

Corporate Open Source Anti-Patterns: A Decade Later

Bryan Cantrill

CTO of Oxide Computer Company
A little over a decade ago, I gave a talk on corporate open source anti-patterns, vowing that I would return…

Quantifying the Performance Impact of Shard-per-core Architecture

Dor Laor

CEO of ScyllaDB
Most software isn’t architected to take advantage of modern hardware. How does a shard-per-core and shared-nothing architecture help – and…

How Netflix Builds High Performance Applications at Global Scale

Prasanna Vijayanathan

Senior Software Engineer at Netflix
We all want to build applications that are blazingly fast. We also want to scale them to users all over…

eBPF vs Sidecars

Liz Rice

Chief Open Source Officer, Isovalent at Cisco
From its vantage point in the kernel, eBPF provides a platform for building a new generation of infrastructure tools for…

Taming P99 Latencies at Lyft: Tuning Low-Latency Online Feature Stores

Bhanu Renukuntla

Senior Software Engineer at Lyft
In this talk, we will explore the challenges and strategies of tuning low latency online feature stores to tame the…

Running a Go App in Kubernetes: CPU Impacts

Teiva Harsanyi

Senior Software Engineer at Google
Understanding the impacts of running a containerized Go application inside Kubernetes with a focus on the CPU.

Expanding Horizons: A Case for Rust Higher Up the Stack

Carl Lerche

Principal Engineer at AWS
Historically associated with systems programming due to its roots in Mozilla, Rust’s promise of safety, speed, and concurrency has led…

How to Improve Your Ability to Solve Complex Performance Problems

Kerry Osborne

Google Database Black Belt Team Lead at Google
This talk is really about problem solving. It’s about how we think about problems and how we resolve those problems…

Square’s Lessons Learned from Implementing a Key-Value Store with Raft

Omar Elgabry

Software Engineer at Square
To put it simply, Raft is used to make a use case (e.g., key-value store, indexing system) more fault tolerant…

Performance Budgets for the Real World

Tammy Everts

Chief Experience Officer at SpeedCurve
Performance budgets have been around for more than ten years. Over those years, we’ve learned a lot about what works,…

A Deterministic Walk Down TigerBeetle’s main() Street

Aleksei Kladov

Staff Software Engineer at TigerBeetle
Learn how to use Zig to implement a fully deterministic distributed system which will never fail with an out of…

VM Performance: The Differences Between Static Partitioning or Automatic Tuning

Dario Faggioli

Virtualization Software Engineer at SUSE
Virtualized workloads are known to require carefully crafted configuration and tuning, both at the host and at the guest level,…

Measuring the Impact of Network Latency at Twitter

Widya Salim

Data Scientist at SEEK

Victor Ma

Senior Data Scientist at Airwallex

Zhen Li

Data Scientist at TikTok
Widya Salim, Victor Ma, and Zhen Li will outline the causal impact analysis, framework, and key learnings used to quantify…

Conquering Load Balancing: Experiences from ScyllaDB Drivers

Piotr Grabowski

Software Team Leader at ScyllaDB
Load balancing seems simple on the surface, with algorithms like round-robin, but the real world loves throwing curveballs. Join me…

Low-Latency Data Access: The Required Synergy Between Memory & Disk

Kriti Kathuria

Graduate Researcher at the University of Waterloo
Analytics has moved from internal dashboards to a dashboard inside the product, providing a personalized experience for each user, be…

Distributed System Performance Troubleshooting Like You’ve Been Doing it for Twenty Years

Jon Haddad

Founder at Rustyrazorblade Consulting
Troubleshooting performance issues across distributed systems can be intimidating if you don’t know where to start, and it’s even harder…

Writing Low Latency Database Applications Even If Your Code Sucks

Glauber Costa

Founder & CEO of Turso
All latency lovers are used to the mystical experience of joy coming from aligning data to a cache line size…

Using Libtracecmd to Analyze Your Latency and Performance Troubles

Steven Rostedt

Software Engineer at Google
Trying to figure out why your application is responding late can be difficult, especially if it is because of interference…

Building Low Latency ML Systems for Real-Time Model Predictions at Xandr

Chinmay Abhay Nerurkar

Principal Engineer at Microsoft
Xandr’s Ad-server handles over 400 billion daily ad requests from across the world wide web. Operating under a stringent Service…

ORM is Bad, But is There an Alternative?

Henrietta Dombrovskaya

Database Architect at DRW
It’s a well-known fact that although the database performance is great, and each query is executed in milliseconds, the overall…

P99 Publish Performance in a Multi-Cloud NATS.io System

Derek Collison

Founder & CEO of Synadia
This talk will walk through the strategies and improvements made to the NATS server to accomplish P99 goals for persistent…

Making Python 100x Faster with Less Than 100 Lines of Rust

Ohad Ravid

Team Lead at Trigo
Python isn’t known as a low-latency language. Can we bridge the performance gap using a bit of Rust and some…

Zero Downtime Critical Traffic Migration @Netflix Scale

Abhishek Pandey

Senior Software Engineer at Meta
Picture yourself enthralled by the latest episode of your beloved Netflix series, delighting in an uninterrupted, high-definition streaming experience. Behind…

The History of Tracing Oracle

Cary Millsap

Distinguished Product Manager at Oracle
In this presentation, I will explore the history of tracing Oracle and why it has been overlooked despite its usefulness.…

Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Context Enrichment

Tanel Poder

Performance Nerd at PoderC LLC
In this session, Tanel introduces a new open source eBPF tool for efficiently sampling both on-CPU events and off-CPU events…

Cost-Effective Burst Scaling For Distributed Query Execution

Dan Harris

Principal Software Engineer at Coralogix
Building a query engine that scales efficiently is a difficult task. Queries over big datasets stored in Object Storage require…

Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throughput Data Pipelines

Zamir Paltiel

Head of Engineering at Hyperspace
In this presentation, we explore how standard profiling and monitoring methods may fall short in identifying bottlenecks in low-latency data…

Mitigating the Impact of State Management in Cloud Stream Processing Systems

Yingjun Wu

CEO of RisingWave Labs
Stream processing is a crucial component of modern data infrastructure, but constructing an efficient and scalable stream processing system can…

Practical Go Memory Profiling

William Kennedy

Managing Partner at Ardan Labs
In this talk, Bill will show you how to use benchmark profiling and compiler directives in Go to find…

Adventures in Thread-per-Core Async with Redpanda and Seastar

Travis Downs

Software Engineer at Redpanda
Thread-per-core programming models are well known in software domains where latency is important. Pinning application threads to physical cores and…

Architecting a High-Performance (Open Source) Distributed Message Queuing System in C++

Vitaly Dzhitenov

Senior Software Engineer at Bloomberg
BlazingMQ is a new open source distributed message queuing system developed at and published by Bloomberg. It provides highly-performant queues…

Noise Canceling RUM

Tim Vereecke

Web Performance Architect at Akamai
Noisy Real User Monitoring (RUM) data can ruin your P99! We introduce a fresh concept called “Human Visible Navigations” (HVN)…

Less Wasm

Piotr Sarna

Founding Engineer at poolside
The presentation explains why getting rid of WebAssembly is good for your latency. More specifically, it’s a short case study…

Reducing P99 Latencies with Generational ZGC

Stefan Johansson

Principal Member of Technical Staff at Oracle
With the low-latency garbage collector ZGC, GC pause times are no longer a big problem in Java. With sub-millisecond pause…

5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

Predrag Gruevski

Independent Software Researcher at Trustfall
Linters are a type of database! They are a collection of lint rules — queries that look for rule violations…

Interaction Latency: Square’s User-Centric Mobile Performance Metric

Pierre-Yves Ricau

Android Distinguished Engineer at Block
Mobile performance metrics often take inspiration from the backend world and measure resource usage (CPU usage, memory usage, etc) and…

Chihuahua-Sized Load Tests!

Leandro Melendez

Developer Advocate at Grafana Labs
Because bigger isn’t always better. Especially nowadays. Do your teams need help accommodating those humongous load tests in your agile &…

How to Avoid Learning the Linux-Kernel Memory Model

Paul McKenney

Software Engineer at Meta
The Linux-kernel memory model (LKMM) is a powerful tool for developing highly concurrent Linux-kernel code, but it also has a…

MySQL Performance on Modern CPUs: Intel vs AMD vs ARM

Peter Zaitsev

Founder of Percona, Coroot, FerretDB
For years, CPU choice for MySQL was pretty boring – you just chose which Intel-made CPU you wanted. In recent…

How We Reduced the Startup Time for Turo’s Android App by 77%

Pavlo Stavytskyi

Google Developer Expert
The startup time of a mobile app is one of the most important indicators of its performance and has a…

99.99% of Your Traces are Trash

Paige Cruz

Senior Developer Advocate at Chronosphere
Distributed tracing is still finding its footing in many organizations today. One challenge to overcome is the data volume –…

High-Level Rust for Backend Programming

Adam Chalmers

Systems Engineer at Zoo
Some people say you should only use Rust where you can’t afford to use garbage collection. I disagree — Rust…

A Deep Dive Into Concurrent React

Matheus Albuquerque

Senior Software Engineer, Front-End at Medallia
Writing fluid user interfaces becomes more and more challenging as the application complexity increases. In this talk, we’ll explore how…

Ingesting in Rust

Armin Ronacher

Creator of Flask and Principal Architect at Sentry
At Sentry we handle hundreds of thousands of events a second — from tiny metrics to huge memory dumps. What…

The Latency Stack: Discovering Surprising Sources of Latency

Mark Gritter

Principal Engineer at Postman
Usually, when an API call is slow, developers blame ourselves and our code. We held a lock too long, or…

Building a 10x More Efficient Edge Platform

Felipe Huici

CEO and Co-Founder of Unikraft UG
Painful cold boots, terrible auto-scale times, minutes-long waits for compute nodes to be up: these are standard headaches that cloud…

Beyond Availability: The Seven Dimensions for Data Product SLOs

Emily Gorcenski

Principal Data Scientist at Thoughtworks
In the software world, we’re used to SLOs built around latency and availability. But in the data engineering universe, there…

Peak Performance at the Edge: Running Razorpay’s High-Scale API Gateway

Jay Pathak

Software Development Engineer at Razorpay
Razorpay caters to millions of API requests every day that are non-uniform in nature. As a key provider of financial…

Segment-Based Storage vs. Partition-Based Storage: Which is Better for Real-Time Data Streaming?

David Kjerrumgaard

Developer Advocate at StreamNative
Storage is a critical component of any real-time data streaming system, and the choice of storage model can significantly affect…

HTTP 3: Moving on From TCP

Brian Sletten

President at Bosatsu Consulting, Inc.
Any network class you have taken in the last thirty years will have highlighted that the application layer depends on…

Demanding the Impossible: Rigorous Database Benchmarking

Dmitrii Dolgov

Senior Software Engineer at Red Hat
It’s easy to conduct a misleading benchmark, and notoriously hard to design a correct and rigorous enough one. Have you…

Misery Metrics & Consequences

Gil Tene

CTO and Co-Founder of Azul Systems
Join Azul Systems’ Gil Tene as he defines “misery metrics,” which describe what happens when our production systems are operating…

Sharpening the Axe: The Primacy of Toolmaking

Bryan Cantrill

CTO of Oxide Computer Company
Oxide’s Bryan Cantrill weighs in on allowing engineers to make their own tools, resulting in better systems delivered faster and…

The Art of Macro Benchmarking: Evaluating Cloud Native Services Efficiency

Bartłomiej Płotka

Senior Software Engineer at Google
Benchmarking is hard, especially on a macro level that integrates multiple code components into one or multiple microservices. It’s challenging…

The Art of Event Driven Observability with OpenTelemetry

Henrik Rexed

Cloud Native Advocate at Dynatrace
Explore the various components of OpenTelemetry, examples of unuseful traces from event driven architecture, and the purpose/usage of span links…

P99 Pursuit: 8 Years of Battling P99 Latency

Dor Laor

CEO of ScyllaDB
ScyllaDB CEO Dor Laor covers principles for successful OSS projects like ScyllaDB, KVM, the Linux kernel and why they spurred…

From SLO to GOTY

Charity Majors

CTO of Honeycomb
Charity Majors shares the performance lessons we can all learn from game developers, who were among the first to run…

Linux Kernel vs DPDK: HTTP Performance Showdown

Marc Richards

Performance Engineer at Amazon Web Services
AWS’ Marc Richards uses an HTTP benchmark to compare performance of the Linux kernel networking stack with userspace networking doing…

Overcoming Variable Payloads to Optimize for Performance

Armin Ronacher

Creator of Flask and Principal Architect at Sentry
Hear from Sentry’s Armin Ronacher, creator of the Flask framework for Python, on how to optimize for performance when you…

Using eBPF for High-Performance Networking in Cilium

Liz Rice

Chief Open Source Officer, Isovalent at Cisco
Isovalent’s Liz Rice shows how and why Cilium bypasses the kernel using eBPF for Kubernetes and container orchestration networking, observability…

High-speed Database Throughput Using Apache Arrow Flight SQL

Kyle Porter

Architect at Dremio

James Duong

Architect at Dremio
Kyle Porter and James Duong of Bit Quill Technologies share how Flight SQL can push SQL query throughput beyond existing…

Square Engineering’s “Fail Fast, Retry Soon” Performance Optimization Technique

Omar Elgabry

Software Engineer at Square
Learn how to build resilient systems, reduce failure rates, and improve application latency by employing one of the techniques in…

Clouds are Not Free: Guide to Observability-Driven Efficiency Optimizations

Bartłomiej Płotka

Senior Software Engineer at Google
Red Hat’s Bartłomiej Płotka explains how to find and uncover efficiency problems effectively using the power of modern cloud-native observability…

How a Database Looks from a Disk’s Perspective

Avi Kivity

CTO and Co-Founder of ScyllaDB
ScyllaDB’s CTO Avi Kivity dives into how high performance distributed systems such as modern databases can make best, most efficient…

Measuring the CPU Performance of Android Apps at Lyft

Pavlo Stavytskyi

Google Developer Expert
Hear from Pavlo Stavytskyi on how Lyft measures CPU load to improve app performance. What metrics they collect, plus how…

Speed Up Your Code Through Asynchronous Programming

Sabina Smajlaj

Operations Developer at Hudson River Trading
Hudson River Trading’s Sabina Smajlaj demonstrates how to take advantage of programming languages’ asynchronous libraries with a few minor tweaks…

Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing

Steven Rostedt

Software Engineer at Google
Google’s Steve Rostedt discusses using tracing to analyze when the overhead from a Linux host running KVM is higher than…

A New IO Scheduler Algorithm for Mixed Workloads

Pavel Emelyanov

Principal Software Engineer, ScyllaDB
Discover how ScyllaDB, built on the highly asynchronous Seastar library, implemented an IO scheduler optimized for peak performance on modern…

Large-Scale, Semi-Automated Go Garbage Collection Tuning at Uber

Cristian Velazquez

Staff Site Reliability Engineer at Uber
Uber’s Cristian Velazquez talks about tuning garbage collection for Go to scale applications across 70,000 cores to maintain 30 mission-critical…

Why User-Mode Threads Are Good for Performance

Ron Pressler

Project Loom Technical Lead, Java Platform Group at Oracle
Hear from Oracle’s Ron Pressler how Java added virtual threads, an implementation of user-mode threads, to help write high-throughput servers.

Hardware Assisted Latency Investigations

Kshitij Doshi

Senior Principal Engineer, Intel Corporation

Harshad S Sane

Principal Software Engineer at Intel
Intel’s Harshad S Sane & Kshitij Doshi share new ways to use eBPF to better examine latency excursions.

Continuous Performance from Load Testing to SRE and Beyond

Leandro Melendez

Developer Advocate at Grafana Labs
Grafana k6’s Leandro Melendez explores how to use continuous methodologies, service structures, microservice tiers, cloud, and elasticity.

The Observant Developer — Continuous Feedback with OpenTelemetry

Roni Dover

CTO of Digma
Roni Dover shares practical ways that OpenTelemetry combined with open-source tools can be integrated into the modern development stack.

End-To-End Performance Testing, Profiling, and Analysis at Redis

Filipe Oliveira

Principal Performance Engineer at Redis
Learn how Redis developed an automated framework for performance regression testing, telemetry gathering, profiling, and data visualization upon code commit.

Keeping Latency Low for User-Defined Functions with WebAssembly

Piotr Sarna

Founding Engineer at poolside
Piotr Sarna describes how to integrate WebAssembly and Wasmtime into a C++ project in a latency-friendly manner by implementing UDFs…

Evaluating Performance In Go

William Kennedy

Managing Partner at Ardan Labs
William Kennedy provides a deep dive training on how to optimize Go’s concurrency and garbage collection.

How We Reduced Performance Tuning Time by Orders of Magnitude with Database Observability

Yuying Song

Database Performance Engineer at PingCAP
PingCAP’s Database Performance Engineer Yuying Song will share how to measure latency in a distributed system using a top-down (holistic) approach,…

Implementing Highly Performant Distributed Aggregates

Michal Jadwiszczak

Software Engineer at ScyllaDB
ScyllaDB’s Michał Jadwiszczak explains how you can implement aggregate functions without hammering real-time availability and performance for other read/write operations.

Ultra-Low-Latency Web Rendering on the Edge

Malte Ubl

Chief Architect at Vercel
Vercel’s Malte Ubl will discuss the trade-offs of the new paradigm of rendering web pages on the edge, and look…

A Deep Dive into Query Performance

Peter Zaitsev

Founder of Percona, Coroot, FerretDB
Percona’s Peter Zaitsev explores overlooked and underappreciated ways to successfully establish a connection and get results to the queries promptly…

How Dashtable Helps Dragonfly Maintain Low Latency

Roman Gershman

Co-Founder of DragonflyDB
Roman Gershman explains how Dragonfly’s hashtable implementation helps to keep its tail latency in check — including a look at…

Fast and Fault Tolerant

Michael Barker

Independent Consultant at Ephemeris Consulting
Michael Barker draws on knowledge from working on financial exchanges, messaging and clustering systems to describe a model that can…

Taming Go’s Memory Usage — and Avoiding a Rust Rewrite

Mark Gritter

Principal Engineer at Postman
Akita’s Mark Gritter goes against the current trends and describes why he and his team stuck with Golang and chose…

Tracking Syscall and Function Latency in Your k8s Cluster with eBPF

Matthew Lenhard

CTO of ContainIQ
ContainIQ’s Matthew Lenhard walks the audience through a real-life performance tuning exercise, where we hunt down slow system calls…

Outrageous Performance: RageDB’s Experience with the Seastar Framework

Max De Marzi Jr.

Developer at RageDB
Learn how RageDB leveraged the Seastar framework to build an outrageously fast graph database in this talk by Max De…

Pitfalls in Writing High-Performance Systems in Rust

Marek Galovic

Staff Software Engineer at Pinecone
Pinecone’s Marek Galovic looks at common and maybe not so common pitfalls in writing high-performance distributed systems in Rust.

Why Kubernetes Freedom Requires Chaos Engineering to Shine in Production

Henrik Rexed

Cloud Native Advocate at Dynatrace
Dynatrace’s Henrik Rexed uses production methods and Kubernetes settings useful to avoid outages, from chaos engineering, to observability and load…

Testing Persistent Storage Performance in Kubernetes with Sherlock

Sagy Volkov

Distinguished Performance Architect at Lightbits Labs
Lightbits Labs’ Sagy Volkov demonstrates how to use Sherlock, an open source platform written to test persistent NVMe/TCP storage in…

Aggregator Leaf Tailer: Bringing Data to Your Users with Ultra Low Latency

Jeffery Utter

Staff Software Developer at theScore
Discover how and why theScore built Datadex, an aggregator leaf tailer system built for geographically distributed, low-latency queries and real-time…

Properly Understanding Latency is Hard — What We Learned When We Did it Correctly

Brian Taylor

Principal Software Engineer at Optimizely
Optimizely’s Brian Taylor applies lessons of Gil Tene’s coordinated omission talk to understand the surprising sources of latency found in…

Measuring P99 Latency in Event-Driven Architectures with OpenTelemetry

Antón Rodríguez

Principal Software Engineer at New Relic
New Relic’s Antón Rodríguez shows how Event-Driven Architectures can instrument apps using vendor-neutral APIs, libraries, and tools via OpenTelemetry.

C# as a System Language

Oren Eini

Founder & CEO of RavenDB
RavenDB’s Oren Eini discusses the features that make C# a viable system language for building high-end systems.

Retaining Goodput with Query Rate Limiting

Piotr Dulikowski

Senior Software Engineer, ScyllaDB
ScyllaDB’s Piotr Dulikowski walks through how they tackled a “hot partition” problem: a single partition accessed with disproportionate frequency that…

Improving Performance of Micro-Frontend Applications through Error Monitoring

Garrett Hamelin

Developer Advocate at Airbrake, a LogicMonitor Company
Airbrake’s Garrett Hamelin walks you through some of the dos and don’ts for trying to reduce errors and improve performance…

It’s Time to Debloat the Cloud with Unikraft

Felipe Huici

CEO and Co-Founder of Unikraft UG
Felipe Huici introduces Unikraft, a cloud operating system that allows for easily building fully-tailored cloud-ready images that boot in a…

Building Efficient Multi-Threaded Filters for Faster SQL Queries

Vlad Ilyushchenko

Co-Founder and CTO at QuestDB
QuestDB’s Vlad Ilyushchenko will describe how they optimized their database performance using efficient zero garbage collection multithreaded query processing.

Performance Insights Into eBPF, Step by Step

Dmitrii Dolgov

Senior Software Engineer at Red Hat
Red Hat’s Dmitrii Dolgov sheds light on using eBPF: how to collect execution metrics, profile programs, and common pitfalls to…

cachegrand: A Take on High Performance Caching

Daniele Salvatore Albano

Senior Software Engineer II at Microsoft
Microsoft’s Daniele Salvatore Albano presents cachegrand, a SIMD-accelerated hashtable without locks or busy-wait loops using fibers, io_uring, and much more.

Throw Away Your Nines

Alex Hidalgo

Principal Reliability Advocate at Nobl9
You may encounter problems if you only think about “nines” when setting service reliability targets. Throw away your nines. Let’s find…

The Role of Machine Learning In Cloud Native Performance Optimization

Brian Likosar

Global Director of Solutions Architecture at StormForge
StormForge’s Brian Likosar shows how machine learning can be used to optimally configure apps deployed in Kubernetes to ensure performance…

Capturing NIC and Kernel TX and RX Timestamps for Packets in Go

Blain Smith

Staff Software Engineer at Rocket Science
Rocket Science’s Blain Smith shows how to get better timestamp granularity from the NIC by directly sending and capturing data…

Cutting Through the Fog of Virtualization

Bernd Bandemer

Head of Data Science at Clockwork Systems Inc.
Clockwork Systems’ Bernd Bandemer details causes of cloud network latency, from its underlying infrastructure, to its physical topology and network…

Optimizing Servers for High-Throughput and Low-Latency at Dropbox

Alexey Ivanov

Software Engineer at Dapper Labs
Dapper Labs’ Alexey Ivanov explores layers of efficiency/performance optimizations from hardware, drivers, Linux kernel, library and application-level tunings.

Removing Implicit Deadlocks on a Thread-per-core Architecture with 2-phase Processing

Alex Gallego

CEO and Founder of Redpanda
Redpanda’s Alex Gallego will show how implicit limitations in asynchronous programming can be addressed by a 2-phase technique for resolving…

Apache Iceberg: An Architectural Look Under the Covers

Alex Merced

Developer Advocate at Dremio
Alex Merced, Developer Advocate at Dremio, describes the open data lakehouse architecture and performance-oriented capabilities of Apache Iceberg.

Three Perspectives on Measuring Latency

Geoffrey Beausire

Senior Site Reliability Engineer at Criteo
Discover from Criteo’s Geoffrey Beausire how to measure latency in key-value infrastructure from both server and client sides, as well…

Continuous Performance Regression Testing with JfrUnit

Gunnar Morling

Principal Software Engineer at Decodable
Gunnar Morling (Red Hat) explains how to use JfrUnit to track metrics that could impact application performance.

Realtime Indexing for Fast Queries on Massive Semi-Structured Data

Dhruba Borthakur

CTO of Rockset
Dhruba Borthakur (Rockset) explains how to combine lightweight transactions with real-time analytics to power a user-facing application.

OSNoise Tracer: Who Is Stealing My CPU Time?

Daniel Bristot de Oliveira

Principal Software Engineer at Red Hat
Daniel Bristot de Oliveira (Red Hat) explores operating system noise (the interference experienced by an application due to activities inside…

OSv Unikernel — Optimizing Guest OS to Run Stateless and Serverless Apps in the Cloud

Waldek Kozaczuk

OSv Committer
Waldek Kozaczuk talks about optimizing a guest OS to run stateless and serverless apps in the cloud for CNN’s video…

New Ways to Find Latency in Linux Using Tracing

Steven Rostedt

Software Engineer at Google
Steven Rostedt dives into new flexible and dynamic aspects of ftrace that can help expose latency issues.

How to Measure Latency

Heinrich Hartmann

Principal Engineer at Zalando
Heinrich Hartmann (Zalando) shares strategies for avoiding pitfalls with collecting, aggregating and analyzing latency data for monitoring and benchmarking.

Rust Is Safe. But Is It Fast?

Glauber Costa

Founder & CEO of Turso
Glauber Costa outlines pitfalls and best practices for developing Rust applications with low P99.

G1: To Infinity and Beyond

Stefan Johansson

Principal Member of Technical Staff at Oracle
Stefan Johansson (Oracle) provides insights on the G1 JVM garbage collector — what’s new, how it impacts performance, and what’s…

I/O Rings and You — Optimizing I/O on Windows

Yarden Shafir

Software Engineer at Crowdstrike
Yarden Shafir (Crowdstrike) introduces Windows’ implementation of I/O rings, demonstrating how it’s used, and discusses potential future additions.

Data Structures for High Resolution, Real-time Telemetry at Scale

Filipe Oliveira

Performance Engineer at Redis
Filipe Oliveira (Redis) explains how to use several OSS data structures to incorporate telemetry features at scale… and why they…

Scaling Apache Pulsar to 10 Petabytes/Day

Karthik Ramasamy

Senior Director of Engineering at Splunk
Karthik Ramasamy (Splunk) demonstrates how data — including logs and metrics — can be processed at scale and speed with…

RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V

Kathy Giori

Ecosystem Engagement Lead at ZEDEDA

Roman Shaposhnik

Co-Founder of ZEDEDA Inc.
Roman and Kathy share their experience porting Alpine Linux and LF Edge EVE-OS to the new RISC-V architecture.

Extreme HTTP Performance Tuning: 1.2M API req/s on a 4 vCPU EC2 Instance

Marc Richards

Performance Engineer at Talawah Solutions
Marc Richards shares the performance tuning steps that he took to serve 1.2M JSON requests per second from a 4…

Is It Faster to Go with Redpanda Transactions than Without Them?!

Denis Rystsov

Staff Engineer at Vectorized
Denis Rystsov shares how Redpanda optimized the Kafka API and pushed throughput of distributed transactions up to 8X beyond an…

Crimson: Ceph for the Age of NVMe and Persistent Memory

Orit Wasserman

Architect at Red Hat
Orit Wasserman (Red Hat) talks about implementing Seastar, a highly asynchronous engine as a new foundation for the Ceph distributed…

Performance Analysis and Troubleshooting Methodologies for Databases

Peter Zaitsev

CEO and Co-Founder of Percona
Peter Zaitsev (Percona) presents three performance analysis approaches and explains the best use cases for each.

Seastore: Next Generation Backing Store for Ceph

Sam Just

Senior Principal Software Engineer at Red Hat
Sam Just (Red Hat) shares how they architected their next-generation distributed file system to take advantage of emerging storage technologies…

Object Compaction in Cloud for High Yield

Tejas Chopra

Senior Software Engineer at Netflix
Tejas Chopra shares how Netflix gets massive volumes of media assets and metadata to the cloud fast and cost-efficiently.

Where Did All These Cycles Go?

Thomas Dullien

CEO of optimyze.cloud Inc.
Thomas Dullien (optimyze.cloud) exposes all the hidden places where you can recover your wasted CPU resources.

Get Lower Latency and Higher Throughput for Java Applications

Simon Ritter

Deputy CTO at Azul Systems
Simon Ritter (Azul Systems) offers strategies for hitting p99 SLAs in Java — despite the various challenges presented by the…

What We Need to Unlearn about Persistent Storage

Pavel Emelyanov

Principal Software Engineer, ScyllaDB
Pavel Emelyanov (ScyllaDB) talks about ways to measure the performance of modern hardware and what it all means for database…

Avoiding Data Hotspots at Scale

Konstantin Osipov

Director of Software Engineering at ScyllaDB
Konstantin Osipov (ScyllaDB) addresses the tradeoffs between hash and range-based sharding.

Keeping Latency Low and Throughput High with Application-level Priority Management

Avi Kivity

CTO and Co-Founder of ScyllaDB
ScyllaDB CTO and co-founder Avi Kivity shows how high throughput and low latency can both be achieved in a single…

Using eBPF to Measure the k8s Cluster Health

Henrik Rexed

Cloud Native Advocate at Dynatrace
Henrik Rexed (Dynatrace) explains how to use Prometheus + eBPF to understand the inner behavior of Kubernetes clusters and workloads…

Continuous Go Profiling & Observability

Felix Geisendörfer

Staff Engineer at Datadog
Felix Geisendörfer (Datadog) digs into the unique aspects of the Go runtime and interoperability with tools like Linux perf and…

Unikraft: Fast, Specialized Unikernels the Easy Way

Felipe Huici

Chief Researcher at NEC Europe Laboratories GmbH
Felipe Huici (NEC Laboratories Europe) showcases the utility and design of UnikraftSDK.

Understanding Apache Kafka P99 Latency at Scale

Pere Urbón-Bayes

Senior Solutions Architect at Confluent
Pere Urbón-Bayes (Confluent) presents strategies for measuring, evaluating, and optimizing the performance of an Apache Kafka-based infrastructure.

High-Performance Networking Using eBPF, XDP, and io_uring

Bryan McCoid

Sr. Distributed Systems Engineer, Couchbase Inc.
Bryan McCoid outlines the ins and outs of Linux kernel tools such as io_uring, eBPF, and AF_XDP and how to…

DB Latency Using DRAM + PMem in App Direct & Memory Modes

Doug Hood

Consulting Member of Technical Staff at Oracle
Doug Hood (Oracle) compares the latency of DDR4 DRAM to that of Intel Optane Persistent Memory for in-memory database access.

Rust, Wright’s Law, and the Future of Low-Latency Systems

Bryan Cantrill

CTO of Oxide Computer Company
Bryan Cantrill discusses the rise of Rust-based systems and the ceding of Moore’s Law to Wright’s Law, and explains why…

Whoops! I Rewrote It in Rust

Brian Martin

Software Engineer at Twitter
Why and how Brian Martin rewrote Pelikan, Twitter’s open source and modular framework for in-memory caching, in Rust.

Let’s Fix Logging Once and for All

Peter Portante

Senior Principal Software Engineer at Red Hat
Peter Portante (Red Hat) presents a Linux kernel modification that gives the SRE and logging source owner greater control over…

Using SLOs for Continuous Performance Optimizations of Your k8s Workloads

Andreas Grabner

DevOps Activist at Dynatrace
Andreas Grabner (Dynatrace) shares how to use the CNCF Keptn project to automate SLO-based Performance Analysis as part of your…

Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storage System

Abel Gordon

Chief Systems Architect at Lightbits Labs
Abel Gordon gives an overview of how Lightbits LightOS improves the latency of high-performance, low-latency NVMe-based storage accessed over standard…
P99 CONF OCT. 22 + 23, 2025