Serverless platforms scale beautifully until they don’t. In event-driven architectures, unpredictable P99 latencies often emerge from cold starts, retries, uneven shard processing, or misconfigured concurrency controls. These “long tail” latency spikes can go unnoticed in dashboards that track only averages or medians, yet they wreak havoc on end-user experience and downstream systems.
In this talk, I’ll share engineering strategies we’ve used to tame tail latencies in large-scale serverless event pipelines. We’ll look at techniques like shuffle sharding to reduce noisy-neighbor effects, adaptive token management to avoid timeouts during AI inference, and observability patterns that help catch latency cliffs before they hit production.
Expect practical code-level takeaways and architecture patterns designed to bring performance predictability to inherently bursty, decoupled systems. Whether you’re building real-time data pipelines or transactional event workflows, this talk will equip you to chase—and tame—those elusive P99s.
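To make the shuffle-sharding idea concrete ahead of the talk, here is a minimal sketch of deterministic shard assignment. The function name, worker count, and shard size are illustrative assumptions, not details of any specific platform; the point is that each tenant hashes to a small combination of workers, so two noisy tenants rarely share their entire worker set.

```python
import hashlib
from itertools import combinations

def shuffle_shard(tenant_id: str, workers: int = 8, shard_size: int = 2) -> tuple:
    """Deterministically assign a tenant to `shard_size` of `workers` workers.

    Illustrative sketch: each tenant maps to one of C(workers, shard_size)
    worker combinations via a stable hash, limiting noisy-neighbor blast
    radius because two tenants rarely collide on their full worker set.
    """
    combos = list(combinations(range(workers), shard_size))
    digest = hashlib.sha256(tenant_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(combos)
    return combos[index]

# The assignment is stable across calls, so routing stays consistent
# without any shared state.
shard_a = shuffle_shard("tenant-a")
shard_b = shuffle_shard("tenant-b")
```

With 8 workers and shards of 2 there are 28 possible shards, so most tenant pairs overlap on at most one worker; scaling either parameter drives the collision probability down further.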


