Save the date! P99 CONF 2026 — October 21-22. Call for speakers is open!

SESSION ON-DEMAND

All Things P99

The event for developers who care about P99 percentiles and high-performance, low-latency applications

Square Engineering’s “Fail Fast, Retry Soon” Performance Optimization Technique

Learn how to build resilient systems, reduce failure rates, and improve application latency by employing one of the techniques in distributed systems: “fail fast, retry soon”.

19 minutes

Resources

This is inspired by a real-production use case where DynamoDB latency p99 & max went down from > 10s to ~500ms. AWS articles, specifically M. Brooker’s writings, and SDKs code have been great resources to dive into these techniques:

Omar Elgabry, Software Engineer at Square

A software engineer (B.S. CS & SWE, Jul '15), a writer, a teacher, a hackathon winner, with a polymorphic personality, born in Egypt, lived and worked in India, Turkey, and currently Canada.