Cache Me If You Can: How Grafana Labs Scaled Up Their Memcached 42x & Cut Costs Too

Our cloud database stores billions of files in object storage. With petabytes of data being queried every day, we started bumping into our cloud storage providers’ rate limits, resulting in decreased reliability & performance. We had large memcached clusters in place to absorb & deamplify reads to object storage – but these could hold at most a few hours’ worth of data, and constantly churned due to the excessive volume of data passing through. We concluded that we needed much larger caches, ideally without inflating our cloud costs or adding operational complexity.

I’ll show how we managed to increase our cache size by 42x and reduce our costs by using a little-known feature of memcached called “extstore”. Extstore offloads objects that can’t fit into memory to SSDs. In this talk I’ll be covering how we use it, how to monitor it, why we chose it, and other considerations. I’ll also cover how we use ephemeral storage provided by public cloud vendors in the form of physically-attached SSDs with incredibly high throughput, low latency, and best of all – low cost!
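
As a rough illustration of the monitoring side, the sketch below parses the text-protocol response to memcached's `stats` command and computes how full the extstore is. It is not taken from the talk: the helper names are hypothetical, the sample values are made up, and the stat names `extstore_bytes_used` and `extstore_limit_maxbytes` are assumed to match what your memcached version reports, so check them against your own `stats` output.

```python
def parse_stats(raw: str) -> dict[str, str]:
    """Parse a memcached text-protocol `stats` response into a dict.

    Each stat arrives as a line of the form "STAT <name> <value>",
    and the response is terminated by a line containing "END".
    """
    stats = {}
    for line in raw.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def extstore_fill_ratio(stats: dict[str, str]) -> float:
    """Fraction of the configured extstore capacity currently in use.

    Assumes the two stat names below are present; both are assumptions
    to verify against your memcached version.
    """
    used = int(stats["extstore_bytes_used"])
    limit = int(stats["extstore_limit_maxbytes"])
    return used / limit if limit else 0.0


# Made-up sample response, for illustration only.
SAMPLE = (
    "STAT extstore_bytes_used 25\r\n"
    "STAT extstore_limit_maxbytes 100\r\n"
    "END\r\n"
)

if __name__ == "__main__":
    print(extstore_fill_ratio(parse_stats(SAMPLE)))
```

In practice the raw text would come from sending `stats` to a memcached instance started with extstore enabled; the parsing step stays the same.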

This talk is also a story of how products evolve, and how we as a team are buying time in the short term to maintain our reliability while we evolve our storage design in the medium to long term.

21 minutes

Danny Kopping, Senior Software Engineer at Grafana Labs

Danny is an engineer at Grafana Labs, based in South Africa. He works on both the Loki open-source product and the Grafana Cloud Logs hosted service. His interests include Go, Linux, playing drums, and the outdoors.

P99 CONF OCT. 18 + 19, 2023
