20+ Low-Latency Engineering Case Studies: Netflix, Twitter, TikTok, Square, Uber & More

P99 CONF 2023 is now a wrap! You can (re)watch all videos and access the decks now.

With over 60 tech talks on industry trends, performance optimization trainings, and insider insights on new tracing tools and measurement techniques, P99 CONF truly has something for every performance-minded engineer. As a category, “engineering case study” sessions are historically among the most watched (and discussed) sessions. At P99 CONF 2023, we were thrilled to host an extensive spectrum of engineers sharing how they and their teams tackled their toughest performance challenges.

Here’s a taste of the talks from P99 CONF 2023:

How Netflix Builds High Performance Applications at Global Scale

Prasanna Vijayanathan covers how Netflix built high performance applications that work for every user, every time – including a technical look at the data and modeling techniques they use.

Measuring the Impact of Network Latency at Twitter

Widya Salim, Zhen Li, and Victor Ma outline the causal impact analysis, framework, and key learnings used to quantify the impact of reducing Twitter’s network latency.

Architecting a High-Performance (Open Source) Distributed Message Queuing System in C++

Vitaly Dzhitenov presents a new open source distributed message queuing system, developed and used by Bloomberg, that provides highly performant queues to applications for asynchronous, efficient, and reliable communication.

3 Gb/s L7 Egress Traffic Inspection at TikTok Using Squid & Go on K8s

Daniel Haimanot shares how TikTok achieved real-time privacy compliance inspecting 3 Gb/s L7 egress traffic in-band using Squid and Golang on K8s.

Cache Me If You Can: How Grafana Labs Scaled Up Their Memcached 42x & Cut Costs Too

Danny Kopping walks us through how Grafana Labs managed to increase their cache size by 42x and reduce costs by using a little-known feature of memcached called “extstore”.

Optimizing for Tail Latency and Saturation at Uber Scale: Macro and Micro Considerations

Ranjib Dey talks about Uber’s micro (JVM, Go GC tuning, concurrency tuning) and macro (architectural: consistency, caching, sharding…) lessons learned optimizing cloud-native microservices for tail latency and efficiency.

Taming P99 Latencies at Lyft: Tuning Low-Latency Online Feature Stores

Bhanu Renukuntla shares challenges and strategies of tuning low latency online feature stores to tame P99 latencies, shedding light on the importance of choosing the right data model.

Interaction Latency: Square’s User-Centric Mobile Performance Metric

Pierre-Yves Ricau shares why and how to track “Interaction Latency,” a user-centric mobile performance metric that Square uses instead of app launch time and smoothness.

Square’s Lessons Learned from Implementing a Key-Value Store with Raft

Omar Elgabry offers up the micro-lessons engineers can learn from Square’s experience building fault-tolerant, strongly consistent distributed systems using Raft.

How We Reduced the Startup Time for Turo’s Android App by 77%

Pavlo Stavytskyi details how Turo engineers reduced Android app startup time by 77%, including how to apply best practices and Android developer tools to improve the startup performance of your own Android apps.

From 1M to 1B Features Per Second: Scaling ShareChat’s ML Feature Store

Ivan Burmistrov and Andrei Manakov present a case study in building a low-latency ML feature store (using ScyllaDB, Golang, and Flink) that handles 1B features per second, including data modeling tips for performance & scalability and caching strategies.

Conquering Load Balancing: Experiences from ScyllaDB Drivers

Piotr Grabowski delves into the intricacies of load balancing within ScyllaDB drivers, sharing how we employed the Power of Two Choices algorithm, optimized the implementation of load balancing in the Rust driver, and more.
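
For readers who haven't met the Power of Two Choices algorithm, here is a minimal Python sketch of the general idea: pick two candidates at random and route to the less loaded one. It is not taken from the ScyllaDB drivers; the node names, the `pick_node` helper, and the use of in-flight request counts as the load signal are all assumptions made for illustration.

```python
import random

def pick_node(in_flight, rng=random):
    """Sample two distinct nodes at random and return the one with
    fewer in-flight requests (ties go to the first sampled node)."""
    a, b = rng.sample(list(in_flight), 2)
    return a if in_flight[a] <= in_flight[b] else b

# Hypothetical node names and in-flight request counts.
in_flight = {"node-a": 12, "node-b": 3, "node-c": 7}

chosen = pick_node(in_flight)
in_flight[chosen] += 1  # account for the request that was just routed
```

Because only two nodes are compared per request, the most loaded node in this example is never chosen, yet the balancer avoids scanning every node or herding all traffic onto a single "least loaded" one.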

Building Low Latency ML Systems for Real-Time Model Predictions at Xandr

Chinmay Abhay Nerurkar and Moussa Taifi outline the challenges of building an ML system with the low latency required to support the high volume and high throughput demands of ad serving at Xandr, the advertising and analytics subsidiary of Microsoft.

Peak Performance at the Edge: Running Razorpay’s High-Scale API Gateway

Jay Pathak details how Razorpay solved availability and authorization challenges using their API gateway, plus insights on how their rate limiter plugin seamlessly handles workloads of more than 200K RPS with sub-millisecond latency.

P99 Publish Performance in a Multi-Cloud NATS.io System

Derek Collison walks through the strategies and improvements made to the NATS server to accomplish P99 goals for persistent publishing to NATS JetStream that was replicated across all three major cloud providers over private networks.

Adventures in Thread-per-Core Async with Redpanda and Seastar

Travis Downs looks at the practical experience of building high performance systems with C++20 in an asynchronous runtime, the unexpected simplicity that can come from strictly mapping data to cores, and the challenges & tradeoffs in adopting a thread-per-core architecture.

Ingesting in Rust at Sentry

Armin Ronacher shares Sentry’s experience building a Rust-based ingestion service that handles hundreds of thousands of events per second with low latency globally.

A Deterministic Walk Down TigerBeetle’s main() Street

Aleksei Kladov dives into how TigerBeetle used Zig to implement a fully deterministic distributed system that will never fail with an out-of-memory error, for predictable performance and 700x faster tests!

Cost-Effective Burst Scaling For Distributed Query Execution

Dan Harris presents a case study in building a distributed execution model that can dynamically execute across both AWS Lambda and EC2 resources – shedding excess load to Lambda functions to preserve low latency while scaling EC2 capacity to manage costs.

High-Level Rust for Backend Programming

Adam Chalmers shares why Rust is a great language for writing API servers and backends, based on his experiences at Cloudflare and KittyCAD.

Mitigating the Impact of State Management in Cloud Stream Processing Systems

Yingjun Wu outlines how RisingWave Labs is addressing the high latency issues associated with S3 storage in stream processing systems that employ a decoupled compute and storage architecture.

Making Python 100x Faster with Less Than 100 Lines of Rust

Ohad Ravid shares how the Trigo team was able to bridge the Python-Rust performance gap using just a bit of Rust and some profiling – ultimately improving performance 100x.

5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X

Predrag Gruevski offers a case study in using database ideas to build a linter that looks for breaking changes in Rust library APIs.
