SESSION ON-DEMAND

All Things P99

The event for developers who care about P99 percentiles and high-performance, low-latency applications

Minimizing Request Latency of Self-Hosted ML Models

Join our session on minimizing latency of self-hosted ML models in cloud environments. Learn strategies for deploying Deepgram’s speech-to-text models on your own hardware, including concurrency limits, auto-scaling, input chunk granularity, and efficient model loading, to optimize your ML inference.
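To give a flavor of two of those strategies, here is a minimal, hypothetical sketch (not code from the session or Deepgram's implementation) of capping concurrent inference requests and splitting audio into fixed-size chunks in Python with asyncio. The `fake_transcribe` function, the concurrency limit, and the chunk size are illustrative stand-ins, not recommended values.

```python
# Minimal sketch (not Deepgram's implementation): cap concurrent inference
# with an asyncio.Semaphore and split audio into fixed-size chunks.
# fake_transcribe is a stand-in for a real speech-to-text model call;
# the limit and chunk size below are illustrative, not recommendations.
import asyncio

MAX_CONCURRENT_REQUESTS = 4   # hypothetical concurrency limit
CHUNK_SECONDS = 5             # hypothetical input chunk granularity

inference_slots = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

async def fake_transcribe(chunk: bytes) -> str:
    """Stand-in for a real model call; sleeps to simulate inference time."""
    await asyncio.sleep(0.1)
    return f"<transcript of {len(chunk)} bytes>"

async def handle_request(audio: bytes) -> list[str]:
    """Split audio into fixed-size chunks and transcribe them,
    holding an inference slot only while the model is running."""
    # Assume 16 kHz, 16-bit mono audio: bytes per chunk = rate * 2 * seconds.
    chunk_bytes = 16_000 * 2 * CHUNK_SECONDS
    chunks = [audio[i:i + chunk_bytes] for i in range(0, len(audio), chunk_bytes)]
    results = []
    for chunk in chunks:
        async with inference_slots:  # back-pressure: wait for a free slot
            results.append(await fake_transcribe(chunk))
    return results

async def main() -> None:
    # Simulate several clients arriving at once; at most 4 chunks run concurrently.
    fake_audio = bytes(16_000 * 2 * 12)  # ~12 s of silent audio
    transcripts = await asyncio.gather(*(handle_request(fake_audio) for _ in range(8)))
    print(f"served {len(transcripts)} requests")

if __name__ == "__main__":
    asyncio.run(main())
```

Bounding concurrency this way trades a little queueing delay for predictable per-request latency under load; the session discusses how to tune such limits alongside auto-scaling.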

20 minutes
Fill out the form to watch this session from the P99 CONF 2025 livestream and get on-demand access to all 60+ available session recordings.

Julia Kroll, Applied Engineer at Deepgram

Julia Kroll is an Applied Engineer at Deepgram, where she provides engineering and product expertise on speech-to-text and voice AI, enabling developers to use language as the universal interface between humans and machines. She previously worked as a Senior Machine Learning Engineer creating natural-sounding AI voices, following five years at Amazon, where she contributed to machine learning and data engineering for AWS and Alexa. She holds two computer science degrees: a master's from the University of Wisconsin-Madison and a bachelor's from Carleton College. Her interests lie at the intersection of technology, linguistics, and society.