Learn how to build resilient systems, reduce failure rates, and improve application latency with a well-known distributed-systems technique: “fail fast, retry soon”.
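As a rough illustration of the idea, here is a minimal Python sketch. It is not the author's implementation. It wraps a hypothetical async operation `op` with a short per-attempt timeout (fail fast), then waits a small, capped, jittered delay before trying again (retry soon). The function name `call_with_fail_fast_retries`, its parameters and defaults, and the choice of which exceptions count as retryable are all assumptions. Real timeout values would need tuning against the service's observed latency distribution.

```python
import asyncio
import random


async def call_with_fail_fast_retries(op, attempt_timeout, max_attempts=3,
                                      base_delay=0.01, max_delay=0.2):
    """Run op() with a short per-attempt timeout, retrying soon after failures.

    `op` is a hypothetical zero-argument coroutine factory standing in for a
    remote call (e.g. a single database read). Treating only timeouts and
    ConnectionError as retryable is an assumption of this sketch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            # Fail fast: abandon (cancel) this attempt once attempt_timeout elapses
            # instead of waiting on a slow tail request.
            return await asyncio.wait_for(op(), timeout=attempt_timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break
            # Retry soon: a short, capped exponential backoff with full jitter,
            # so simultaneous clients don't retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            await asyncio.sleep(random.uniform(0, cap))
    raise last_error


async def demo():
    calls = {"n": 0}

    async def flaky_op():
        calls["n"] += 1
        # The first attempt simulates a slow outlier; later attempts are fast.
        await asyncio.sleep(1.0 if calls["n"] == 1 else 0.001)
        return "ok"

    result = await call_with_fail_fast_retries(flaky_op, attempt_timeout=0.05)
    return result, calls["n"]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('ok', 2)
```

In the demo, the first attempt would have taken a full second. The wrapper gives up after 50 ms, and the second attempt succeeds in roughly a millisecond. Cutting off slow outliers and retrying them is what lets the total stay close to typical latency rather than worst-case latency. Retries add load, though, which is why the attempt count and delays are capped.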
This article is inspired by a real production use case in which DynamoDB p99 and max latencies dropped from over 10 s to about 500 ms. AWS articles (particularly M. Brooker’s writings) and SDK source code have been great resources for diving into these techniques: