Analytics has moved from internal dashboards into the product itself, giving each user a personalized experience, whether that is LinkedIn’s profile views or Uber’s online order management and inventory. Given the requirement of sub-millisecond response times on user-facing apps, how does one ensure fast analytics on large volumes of data?
Whether the workload is analytical or transactional, data lives in two main places: memory and disk. Memory (RAM) gives fast access to data during active processing, while disk offers far more capacity. Both are critical to data processing, and each serves a different purpose in managing and manipulating data effectively.
In this talk, I will discuss the synergy required between memory and disk to achieve efficient data processing. I will establish a mental model for reasoning about how data is organized in memory and on disk under various data access patterns. I will also discuss general techniques that databases use to store and retrieve data efficiently.