This talk discusses why LLM inference is slow and the key metrics used to measure latency. It then covers techniques that make inference fast, including batching strategies, parallelism, and prompt caching. Not all latency problems are engineering problems, though: the talk closes with tricks for hiding latency at the application level.
