Airbnb lists more than 6M properties across 100,000+ cities and towns in 220 countries. LLMs represent a strategic opportunity for Airbnb across multiple use cases. Airbnb's ML Platform team wanted to augment their existing ML platform so that internal teams could use open-source LLMs effectively and efficiently. This is difficult because LLMs present scalability challenges that push the boundaries of existing tooling.
Working with Ray, the Airbnb team is seeing improvements in developer productivity and significantly better performance. Beyond that, they're now able to better utilize A100 GPUs on AWS for their training workloads, providing long-term efficiency and cost savings for these strategic LLM applications. This session will describe the before-and-after states, share real-world benchmarks demonstrating cloud efficiency by comparing SageMaker to Ray, and highlight some of the applications this work enables.
Shaowei Su is a Senior Software Engineer at Airbnb. He works on a large-scale offline Machine Learning (ML) compute platform that supports offline distributed training, hyperparameter optimization, and batch inference. He and his team have also enabled online ML model serving using a variety of backend systems, including TensorFlow Serving, NVIDIA Triton, and pure Python services. Finally, Shaowei and his team are responsible for managing the entire ML model lifecycle, including metadata management, versioning, and more. Mr. Su has been with Airbnb for more than four years and previously held a similar role at Yahoo.