LLM deployments are driving massive GPU demand and cost. This talk presents a generic architecture for offloading KV-cache tensors to a disaggregated shared store, enabling GPU-initiated I/O for efficient storage and retrieval. We’ll cover the system requirements for offloading and retrieval, their impact on platform design and performance, and a mathematical model for predicting the performance gains across LLMs and hardware configurations, supported by initial results.
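
To give a flavor of the sizing math such a predictive model rests on, here is a minimal back-of-envelope sketch: it estimates the per-token KV-cache footprint of a transformer and compares the time to fetch a cached prefix from a store against recomputing it during prefill. All model shapes and hardware numbers below are illustrative assumptions, not the talk's actual model or measurements.

```python
# Back-of-envelope sketch (assumed numbers, not the talk's model): estimate
# the KV-cache footprint and the prefill compute a cache hit avoids.

def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV cache per token: K and V tensors across every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def prefill_flops(n_params: float, n_tokens: int) -> float:
    """Standard ~2*N FLOPs-per-token approximation for a forward pass."""
    return 2 * n_params * n_tokens

# Illustrative 7B-class model: 32 layers, 32 KV heads, head_dim 128, fp16.
per_token = kv_bytes_per_token(32, 32, 128)   # 512 KiB per token
context = 4096
cache_bytes = per_token * context             # ~2 GiB per sequence

# Gain from fetching a cached prefix instead of recomputing it: compare
# transfer time at an assumed effective storage bandwidth with prefill
# time at an assumed sustained GPU throughput.
store_bw = 25e9     # assumed storage bandwidth, bytes/s
gpu_flops = 400e12  # assumed prefill throughput, FLOP/s
t_fetch = cache_bytes / store_bw
t_prefill = prefill_flops(7e9, context) / gpu_flops

print(f"KV cache: {cache_bytes / 2**30:.2f} GiB for {context} tokens")
print(f"fetch {t_fetch * 1e3:.1f} ms vs recompute {t_prefill * 1e3:.1f} ms "
      f"-> speedup ~{t_prefill / t_fetch:.1f}x")
```

The crossover point where fetching beats recomputing shifts with model size, context length, and storage bandwidth, which is exactly the kind of trade-off a predictive model across LLMs and hardware has to capture.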
