Training 7: Real-time ML - Deploying Online Inference Pipelines (3 hours)

1:00 PM - 4:00 PM
Level: Beginner to Intermediate
Hands-on Lab
Software Engineer, ML Engineer, Deployment Engineer
Ray Serve

Once our AI/ML models are ready for deployment, that's when the fun really starts. We need our AI-powered services to be resilient and efficient, to scale with demand, and to adapt to heterogeneous environments (for example, using GPUs or TPUs as effectively as possible). Moreover, when we build applications around online inference, we often need to integrate several services: multiple models, data sources, business logic, and more.

Ray Serve was built to help us overcome all of these challenges.

In this class we'll learn to use Ray Serve to compose online inference applications that meet all of these requirements and more. We'll build services that integrate with each other while autoscaling individually, each with its own hardware and software requirements, all in regular Python and often with just one new line of code.

Learning Outcomes

  • Develop a deep understanding of the various architectural components of Ray Serve.
  • Use the deployments and deployment graph APIs to serve machine learning models in production environments for online inference.
  • Combine multiple models to build complex logic and more sophisticated machine learning pipelines.

Prerequisites

  • Some experience deploying models to production environments.
  • Basic familiarity with machine learning concepts, such as online inference.
  • Intermediate-level programming experience with Python.
  • Prior experience with Ray or Ray Serve is not required, though participants already familiar with these frameworks may find the more advanced topics in this training easier to follow.