SESSION ON-DEMAND

All Things P99

The event for developers who care about P99 percentiles and high-performance, low-latency applications

LLM KV Cache Offloading: Analysis and Practical Considerations

LLM deployments are driving massive GPU demand and cost. This talk presents a generic architecture for offloading KV-cache tensors to a disaggregated shared store, enabling GPU-initiated IO for efficient storage and retrieval. We’ll cover the system requirements for offloading and retrieval, their impact on platform design and performance, and a mathematical model for predicting gains across LLMs and hardware, supported by initial results.
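To make the core idea concrete, here is a minimal, illustrative sketch (not the architecture presented in the talk): KV-cache blocks are evicted from GPU memory into a shared store keyed by prefix and layer, and fetched back on a later hit instead of being recomputed. The in-memory dict stands in for the disaggregated shared store, and names such as SharedKVStore, offload_block, and fetch_or_recompute are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

BlockKey = Tuple[str, int]  # (prefix_id, layer_index)


@dataclass
class SharedKVStore:
    """Stand-in for a disaggregated shared store reachable from the GPU."""
    _blocks: Dict[BlockKey, bytes] = field(default_factory=dict)

    def put(self, key: BlockKey, kv_block: bytes) -> None:
        # In a real system this would be a GPU-initiated write over the fabric.
        self._blocks[key] = kv_block

    def get(self, key: BlockKey) -> Optional[bytes]:
        # Returns None on a miss, forcing the caller to recompute the block.
        return self._blocks.get(key)


def offload_block(store: SharedKVStore, prefix_id: str, layer: int,
                  kv_block: bytes) -> None:
    """Evict one KV-cache block from GPU memory into the shared store."""
    store.put((prefix_id, layer), kv_block)


def fetch_or_recompute(store: SharedKVStore, prefix_id: str, layer: int,
                       recompute: Callable[[], bytes]) -> bytes:
    """Reload a previously offloaded block, falling back to recomputation."""
    cached = store.get((prefix_id, layer))
    return cached if cached is not None else recompute()


if __name__ == "__main__":
    store = SharedKVStore()
    offload_block(store, prefix_id="chat-42", layer=0, kv_block=b"\x00" * 1024)
    block = fetch_or_recompute(store, "chat-42", 0, recompute=lambda: b"recomputed")
    print(len(block))  # 1024 -> served from the shared store, no recompute needed
```

The trade-off the talk quantifies is exactly the one visible here: a hit avoids recomputing the block on the GPU at the cost of a fetch from the shared store, so the gain depends on model size, hardware, and storage bandwidth.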

19 minutes
Fill out the form to watch this session from the P99 CONF 2025 livestream. You’ll also get on-demand access to all 60+ available session recordings.

Eshcar Hillel, Principal Research Scientist at Pliops
