Productionizing modern machine learning workloads is challenging. You need not only to train and optimize your models but also to serve them efficiently without excessive operational cost. Ray Serve addresses these requirements so that you can go to production safely and at low cost: you can flexibly scale and coordinate multiple models, deploy and upgrade safely, and maximize your hardware utilization with minimal management overhead.
This talk will demonstrate Ray Serve’s production-ready capabilities, including a demo of serving an ML-powered application using Ray Serve on the Anyscale platform. Some highlights include improvements around scalability, high availability, fault tolerance, and observability.
• Learn about patterns of production ML serving and how Ray Serve is tailored to solve them.
• Hear how users in the community are using Ray Serve in production to lower their ML inference costs.
• Watch a real-time demo of how to serve an ML application using Ray Serve on the Anyscale platform. The demo will highlight recent improvements around observability, autoscaling, and cost savings.
Shreyas is a software engineer at Anyscale working on Ray Serve, KubeRay, and the Anyscale platform. Before joining Anyscale, he was a graduate student at UC Berkeley.
Edward is a staff software engineer at Anyscale and co-author of Learning Ray. He has been working on Ray since 2019, contributing across the board to Ray Core, Ray Serve, and more recently the Anyscale managed platform.