Square Engineering’s “Fail Fast, Retry Soon” Performance Optimization Technique

Learn how to build resilient systems, reduce failure rates, and improve application latency by employing a distributed systems technique: “fail fast, retry soon”.

19 minutes


This talk is inspired by a real production use case in which DynamoDB p99 and max latency dropped from more than 10s to about 500ms. AWS articles (specifically M. Brooker’s writings) and SDK source code have been great resources for diving into these techniques.
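
As a rough illustration of the idea (not the speaker's implementation), “fail fast, retry soon” can be sketched as a short per-attempt timeout combined with a small, jittered delay before the next attempt. Everything named below is hypothetical: `AttemptTimeout`, `call_with_fail_fast_retry`, and the default values are illustrative assumptions, and `operation` stands in for any client call that accepts a timeout in seconds.

```python
import random
import time


class AttemptTimeout(Exception):
    """Hypothetical error an operation raises when it exceeds its per-attempt timeout."""


def call_with_fail_fast_retry(operation, attempt_timeout_s=0.5, max_attempts=3,
                              base_delay_s=0.05, max_delay_s=0.2,
                              sleep=time.sleep, rng=random.random):
    """Call operation(timeout_s); on timeout, retry after a short jittered delay.

    All defaults are illustrative assumptions, not values from the talk.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            # Fail fast: each attempt gets a short deadline instead of
            # waiting many seconds on a slow request.
            return operation(attempt_timeout_s)
        except AttemptTimeout as err:
            last_error = err
            if attempt == max_attempts - 1:
                break  # attempt budget exhausted, give up
            # Retry soon: capped exponential backoff with full jitter,
            # so concurrent clients do not retry in lockstep.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(rng() * cap)
    raise last_error
```

The `sleep` and `rng` parameters exist only so the delay logic can be exercised without real waiting. In practice the per-attempt timeout would be enforced by the client library itself, and the attempt budget would be tuned against the latency target.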

Omar Elgabry, Software Engineer at Square

A software engineer (B.S. CS & SWE, Jul '15), writer, teacher, and hackathon winner with a polymorphic personality. Born in Egypt; has lived and worked in India and Turkey, and is currently in Canada.

P99 CONF OCT. 18 + 19, 2023
