Square Engineering’s “Fail Fast, Retry Soon” Performance Optimization Technique

Learn how to build resilient systems, reduce failure rates, and improve application latency with a technique from distributed systems: “fail fast, retry soon”.

19 minutes
Watch this session from the P99 CONF livestream, plus get instant access to all of the P99 CONF sessions and decks.


This talk is inspired by a real production use case in which DynamoDB p99 and max latency dropped from more than 10 s to about 500 ms. AWS articles (specifically M. Brooker’s writings) and the SDK source code have been great resources for digging into these techniques.
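As a rough illustration of the general idea, and not the implementation from the talk or from any AWS SDK, "fail fast, retry soon" can be sketched as a short per-attempt timeout combined with a small, jittered backoff between attempts. Every name and default below (`fail_fast_retry_soon`, `attempt_timeout`, `base_delay`, and so on) is hypothetical, and the sketch assumes the wrapped operation raises Python's built-in `TimeoutError` when it exceeds the timeout it is given.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fail_fast_retry_soon(
    operation: Callable[[float], T],
    attempt_timeout: float = 0.2,   # hypothetical default: short per-attempt budget
    max_attempts: int = 3,          # hypothetical default
    base_delay: float = 0.05,       # hypothetical default: first backoff cap, in seconds
    max_delay: float = 0.5,         # hypothetical default: upper bound on any backoff
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
) -> T:
    """Call operation(timeout), retrying quickly when an attempt times out.

    Assumption: `operation` enforces the timeout itself and raises
    TimeoutError when it is exceeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        try:
            # Fail fast: give each attempt only a short time budget.
            return operation(attempt_timeout)
        except TimeoutError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the failure to the caller.
                raise
            # Retry soon: capped exponential backoff with random jitter,
            # so that many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(jitter(0.0, cap))

    raise AssertionError("unreachable")
```

The design bet in this sketch is that a slow attempt is often an outlier, so abandoning it early and trying again tends to finish sooner than waiting it out. The attempt limit and the jittered delay keep the retries from piling extra load onto a service that is already struggling.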

Omar Elgabry, Software Engineer at Square

A software engineer (B.S. CS & SWE, Jul '15), writer, teacher, and hackathon winner with a polymorphic personality. Born in Egypt, he has lived and worked in India, Turkey, and currently Canada.

P99 CONF OCT. 23 + 24, 2024
