SESSION ON-DEMAND

All Things P99

The event for developers who care about P99 percentiles and high-performance, low-latency applications

LLM Inference Optimization

This talk will discuss why LLM inference is slow and which latency metrics matter. It also covers techniques that make LLM inference fast, including batching strategies, parallelism, and prompt caching. Not every latency problem is an engineering problem, though: the talk also covers interesting tricks for hiding latency at the application level.
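The talk names batching as one of the throughput levers. As a rough illustration only (not from the talk itself), here is a toy size-triggered dynamic batcher in Python, with a stub standing in for the model's forward pass: requests queue up, and one batched call serves all of them, amortizing per-call overhead at the cost of a small wait.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBatcher:
    """Toy dynamic batcher: collect prompts until the batch is full,
    then serve them with a single (stubbed) model call. Real serving
    stacks also flush on a deadline so a lone request is never stuck."""
    max_batch: int = 4
    queue: list = field(default_factory=list)

    def submit(self, prompt: str) -> list[str]:
        """Queue a prompt; return completions if this fills the batch."""
        self.queue.append(prompt)
        if len(self.queue) >= self.max_batch:
            return self.flush()
        return []  # still waiting for more requests

    def flush(self) -> list[str]:
        """Run one batched 'model call' over everything queued."""
        batch, self.queue = self.queue, []
        # Stub model: a real implementation would run one forward
        # pass over the whole batch here.
        return [f"completion:{p}" for p in batch]
```

For example, with `max_batch=2`, the first `submit` returns nothing and the second returns both completions from a single batched call. Production systems (continuous batching) go further, admitting and retiring requests between decode steps rather than per batch.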

31 minutes

Chip Huyen, Author of AI Engineering

I'm Chip Huyen, a writer and computer scientist. I'm building infrastructure for real-time ML. I also teach Machine Learning Systems Design at Stanford. Previously, I was with Snorkel AI, NVIDIA, Netflix, Primer, and Baomoi.com (acquired by VNG). I helped launch Coc Coc - Vietnam’s second most popular web browser, with 20+ million monthly active users. In my free time, I travel and write. After high school, I went to Brunei for a 3-day vacation that turned into a 3-year trip through Asia, Africa, and South America. During my trip, I worked as a Bollywood extra, a casino hostess, and a street performer. I’m the author of four bestselling Vietnamese books, and I’m working on an English book on machine learning interviews.

P99 CONF OCT. 21 + 22, 2026

Register for Your Free Ticket

Registration includes free 30-day access to O’Reilly’s ebook library.