Learn how to build resilient systems, reduce failure rates, and improve application latency by employing one of the techniques in distributed systems: “fail fast, retry soon”.
Learn how to build resilient systems, reduce failure rates, and improve application latency by employing one of the techniques in distributed systems: “fail fast, retry soon”.
This is inspired by a real-production use case where DynamoDB latency p99 & max went down from > 10s to ~500ms. AWS articles, specifically M. Brooker’s writings, and SDKs code have been great resources to dive into these techniques:
2025 ©P99 CONF
is brought to you by