SESSION ON-DEMAND

All Things P99

The event for developers who care about P99 percentiles and high-performance, low-latency applications

LLM KV Cache Offloading: Analysis and Practical Considerations

LLM deployments are driving massive GPU demand and cost. This talk presents a generic architecture for offloading KV-cache tensors to a disaggregated shared store, enabling GPU-initiated IO for efficient storage and retrieval. We’ll cover the system requirements for offloading and retrieval, their impact on platform design and performance, and a mathematical model for predicting gains across LLMs and hardware, supported by initial results.

19 minutes

Eshcar Hillel, Principal Research Scientist at Pliops


P99 CONF OCT. 21 + 22, 2026

