The Latency Stack: Discovering Surprising Sources of Latency

Usually, when an API call is slow, we developers blame ourselves and our code: we held a lock too long, used a blocking operation, or built an inefficient query. But the simple picture of latency as “the time a server takes to process a message” often hides a great deal of end-to-end complexity. Debugging tail latencies requires unpacking the abstractions that we normally ignore: virtualization, hidden queues, and network behavior.

In this talk, I’ll describe how developers can diagnose more sources of delay and failure by building a broader, more realistic understanding of networked services. I’ll walk through real-world cases where high end-to-end latency or elevated failure rates were caused by factors we might not ordinarily even measure. Examples include TCP SYN retransmission, virtualization on the client, and surprising behavior from AWS load balancers. Unfortunately, many measurement techniques cover only the portion most directly under developer control. Developers can do better by comparing multiple measurements, applying Little’s law, investing in eBPF probes, and paying attention to the network layer.
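
As a rough illustration of how Little’s law can be combined with a second measurement, the sketch below is not taken from the talk; the function names and all numbers are hypothetical. Little’s law states that the average number of requests in a system equals the arrival rate times the average time each request spends in it (L = λW). Rearranged, an observed in-flight count and throughput imply a mean time in system, which can then be compared with the latency the server itself reports.

```python
def mean_time_in_system(avg_in_flight: float, throughput_rps: float) -> float:
    """Little's law, L = lambda * W, rearranged to W = L / lambda.

    avg_in_flight  -- average number of requests in flight (L), e.g. as
                      counted at a client or load balancer (assumed source)
    throughput_rps -- completed requests per second (lambda)
    Returns the implied mean time in the system (W), in seconds.
    """
    if throughput_rps <= 0:
        raise ValueError("throughput must be positive")
    return avg_in_flight / throughput_rps


def unmeasured_delay(avg_in_flight: float,
                     throughput_rps: float,
                     server_mean_latency_s: float) -> float:
    """Mean time per request that the server-side timer does not account for."""
    implied = mean_time_in_system(avg_in_flight, throughput_rps)
    return implied - server_mean_latency_s


if __name__ == "__main__":
    # Hypothetical numbers: 60 requests in flight on average, 200 requests
    # per second completed, and a server-reported mean latency of 50 ms.
    w = mean_time_in_system(avg_in_flight=60.0, throughput_rps=200.0)
    gap = unmeasured_delay(60.0, 200.0, server_mean_latency_s=0.050)
    print(f"implied mean time in system: {w * 1000:.0f} ms")
    print(f"time not covered by server timing: {gap * 1000:.0f} ms")
```

A large gap between the two figures suggests that requests are spending time somewhere the server-side timer does not see, such as a queue or the network. The law applies to long-run averages, so it says nothing directly about tail latency.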

Understanding API performance to find and fix issues faster ultimately means understanding the entire stack: the client, your code, and the underlying infrastructure.

17 minutes

Mark Gritter, Principal Engineer at Postman

Mark is on startup #4. He previously worked on streaming video at Kealia, VM-aware flash data storage at Tintri, and observability on the HashiCorp Vault team, and he now works on API observability at Akita Software (now Postman). His non-work interests include combinatorial games, generative systems, and gardening.

P99 CONF OCT. 18 + 19, 2023
