Xandr’s ad server handles over 400 billion ad requests per day from across the web. Operating under a stringent Service Level Agreement (SLA), it serves the majority of these requests within a round-trip latency of 100-150 milliseconds, running each through an intricate ad auction that involves hundreds of competing advertisers. Key stages of this auction, such as audience targeting, optimization of advertiser objectives, and ad selection, rely on a range of sophisticated ML algorithms. Running inference on ML models in real time and returning predictions at this scale, within the strict SLAs of an ad auction, requires a resilient, low-latency machine learning system.
In this session, I will discuss the challenges of building a low-latency machine learning system that can support the high-volume, high-throughput demands of ad serving. I will cover how Xandr built an extensible, scalable system that supplies the real-time predictions integral to the ad auction process, using ML models trained frequently on large amounts of constantly updating ad transaction data. I will also share lessons learned from building such systems, including how to optimize performance, reduce latency, and ensure reliability.