Building Low Latency ML Systems for Real-Time Model Predictions at Xandr

Xandr’s ad server handles over 400 billion ad requests a day from across the web. Under a stringent Service Level Agreement (SLA), most of these requests are served within a 100 to 150 millisecond round-trip latency. Each request goes through an intricate ad auction involving hundreds of competing advertisers. Key stages of the auction, such as audience targeting, optimization of advertiser objectives, and ad selection, are executed using an assortment of sophisticated ML algorithms. Running ML model inference in real time and returning predictions at this scale, within the tight SLAs of an ad auction, requires a resilient and fast machine learning system.

In this session, I will discuss the challenges of building a low-latency machine learning system that can meet the high-volume, high-throughput demands of ad serving. I will cover how Xandr built an extensible, scalable system that supplies the real-time predictions integral to the ad auction process, using ML models that are retrained frequently on large volumes of continuously updated ad transaction data. I will also share lessons learned from building such systems, including how to optimize performance, reduce latency, and ensure reliability.

26 minutes

Watch this session from the P99 CONF livestream, plus get instant access to all of the P99 CONF sessions and decks.

Chinmay Abhay Nerurkar, Principal Engineer at Microsoft
