SESSION ON-DEMAND

All Things P99

The event for developers who care about P99 percentiles and high-performance, low-latency applications

Minimizing Request Latency of Self-Hosted ML Models

Join our session on minimizing latency in self-hosted ML models in cloud environments. Learn strategies for deploying Deepgram’s speech-to-text models on your hardware, including concurrency limits, auto-scaling, input chunk granularity, and efficient model loading. Optimize your ML inference.

20 minutes

Julia Kroll, Applied Engineer at Deepgram

Julia Kroll is an Applied Engineer at Deepgram, where she provides engineering and product expertise on speech-to-text and voice AI, enabling developers to use language as the universal interface between humans and machines. She previously worked as a Senior Machine Learning Engineer creating natural-sounding AI voices, following five years at Amazon, where she contributed to machine learning and data engineering for AWS and Alexa. She holds two computer science degrees: a master's from the University of Wisconsin-Madison and a bachelor's from Carleton College. Her interests lie at the intersection of technology, linguistics, and society.

P99 CONF OCT. 22 + 23, 2025
