Editor’s note: Longtime P99 CONF community member (and ScyllaDB alum) Pekka Enberg wrote an entire book on latency — and we’re proud to sponsor 3 chapters from that book.
Get the Latency book excerpt PDF
Also, Pekka recently shared key takeaways from that book in a masterclass on Building Low Latency Apps (now available on demand).
Let’s continue our Latency book excerpts with more from Pekka’s caching chapter. It’s reprinted here with permission of the publisher.
***
Caching reduces access latency by providing a temporary copy of data that is faster to access than the primary storage. However, the faster the access latency is, the more expensive the storage typically is. For example, as you saw in table 1.1, DRAM access latency is 100–1000x lower than NVMe or SSD access latency. However, even high-end machine configurations typically have only a few hundred gigabytes of DRAM but have terabytes of storage. More importantly, the price per gigabyte for DRAM is ten times higher than that of SSDs.
As a result, you can only cache some of your dataset and must set some capacity limits on your cache. As long as there is space in the cache, you can keep adding data on a miss. However, when the cache is fully occupied, you must evict some elements from the cache on a miss.
The cache replacement policy determines what you throw out from the cache. The replacement decision is typically based on how long the element has been in the cache or on the access frequency of a component. Choosing the right cache replacement policy is critical for maintaining a high cache hit ratio and, therefore, low-latency access, but the decision can be workload-specific.
Least recently used (LRU)
The least recently used (LRU) replacement policy evicts data elements from the cache based on them not being used recently. LRU caching can maintain a high hit ratio for workloads with high temporal locality. If recently accessed data in the cache is likely to be accessed shortly, LRU will ensure that relevant data stays in the cache. On the other hand, if your workload has a low degree of locality, LRU can perform worse than not caching at all because your workload may bring data to the cache that you never reuse, resulting in high overhead from caching and evicting.
Figure 6.7 illustrates the LRU policy in action. In the first step, there is a get(A) operation to look up the cached value for key A, which makes A the most recently used element. The cache also has two other aspects, B and C, where the latter is the least recently used element and, therefore, the candidate for eviction.
In the second step, there is a get(C) operation to look up the cached value for key C, making that now the most recently used element. Element B becomes the least recently used one.
Finally, there is a get(D) operation to look up the cached value for the key D. As the cache is already fully occupied, element B, the least recently used one, is evicted, making room for D, which is now the most recently used element. Element A becomes the least recently used one and, therefore, the candidate for eviction if we again run out of space in the cache.

Figure 6.7 The least recently used (LRU) replacement policy evicts values from the cache based on when they were last accessed.
Least frequently used (LFU)
The least frequently used (LFU) replacement policy evicts data elements based on how frequently they are used. Unlike with LRU, the LFU replacement policy does not rely on the recency of data access but on the frequency. LFU can maintain a high cache hit ratio and, therefore, low-latency access for workloads where accesses to different elements are irregular. For example, LFU may be a good fit for a content delivery network (CDN) to ensure that the most popular assets are always in the cache over time, even if they have not been accessed recently.
Figure 6.8 shows an example of an LFU policy in action. In our example, the LFU cache records how many times an element has been accessed.
We first have a get(A) operation to look up the value for the key A, and at this point, A has been accessed 5 times, B has 2 accesses, and C has 10 accesses, making B the candidate for eviction.
In the second step, a get(C) operation increases the counter for C from 10 to 11, keeping B as the candidate for eviction.
Finally, in the third step, we have a get(D) operation, and as the cache is full, B is evicted to make room for D. The access counter for D is set to 1, making it the candidate for eviction.

Figure 6.8 The least frequently used (LFU) replacement policy evicts elements based on how frequently they are used.
First-in, first-out (FIFO) and SIEVE
The first-in, first-out (FIFO) replacement policy evicts data elements based on when they were added to the cache. When the cache is full, the element that was first added to the cache is evicted to make room for new elements. Although LRU has been considered the standard replacement policy, recent research suggests that a FIFO-based replacement technique with lazy promotion and quick demotion can outperform LRU.
With LRU, when a cache element is accessed, it is eagerly promoted by moving it to the head of the cache to ensure it remains in the cache. In contrast, the basic FIFO replacement policy has no promotion, as elements are evicted purely based on when they were added. However, you can extend basic FIFO with re-insertion, which lazily promotes elements in the cache at eviction time if they were accessed after the previous eviction round. If elements in the cache have not been used, FIFO quickly demotes elements, assuming that in most workloads, only a few elements are popular enough to warrant caching.
SIEVE is a recently invented cache replacement policy that is a variant of FIFO with re-insertion. Whereas FIFO with re-insertion adds elements at the head of the FIFO queue, SIEVE retains the element’s position at re-insertion.
Figure 6.9 shows the SIEVE replacement policy in action. In the first step, there is a get(A) operation to look up the cached value for A. The cache already has an element for A, so the lookup algorithm marks the A element as visited. The cache also maintains a pointer called hand, which points to the candidate element for eviction. In the first step, the hand points to the C element, which has not been visited.
The second step has a get(C) operation, and as C is already in the cache, it is marked as visited. The hand still points to C, a candidate for eviction.
The third step has a get(D) operation. However, the cache is full, which means an element needs to be evicted. The eviction algorithm first looks at C, pointed at by the hand, but it is not evicted because it has been visited. Instead, the algorithm marks C as not visited and moves the hand to B. As B has not been visited, it is evicted to make room for D, which is inserted at the head of the FIFO queue. The hand now points to A, currently the candidate for eviction. Of course, if eviction happens in the next step, A will only be set as not visited, and D will be evicted.

Figure 6.9 The SIEVE algorithm in action. The first-in, first-out (FIFO) replacement policy evicts elements based on when they were added to the cache. SIEVE is a variant of FIFO, which keeps track of cached elements that were accessed to avoid evicting hot elements.
