Lightning Talks

50x Faster Fine-Tuning in 10 Lines of YAML with Ludwig and Ray

September 19, 12:15 PM - 12:30 PM
View Slides

Deep learning systems consistently produce state-of-the-art results on tasks involving unstructured data such as text, images, video, and audio, but unlocking this value in production requires taking best-in-class pretrained models (GPT, BERT, ViT, etc.) and tuning them to your domain-specific datasets and tasks. Ludwig is a low-code deep learning framework, developed at Uber and open sourced into the Linux Foundation, that integrates and scales natively with Ray to declaratively fine-tune state-of-the-art foundation models on your domain-specific business data, including tabular metadata or any other feature types.

In this talk, we present three ways to fine-tune powerful foundation models such as LLMs on your data in fewer than 10 lines of YAML using Ludwig on Ray:

Modify the weights of a pretrained model to adapt it to a downstream task.

Keep the pretrained model weights fixed and train a stack of dense layers that sit downstream of the pretrained embeddings.

Use the pretrained model embeddings as inputs to a tree-based model (gradient-boosted machine).
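As a rough illustration of the declarative style, the first approach (full fine-tuning) might look like the following. This sketch is not taken from the talk; the `input_features`, `encoder`, and `trainer` sections follow Ludwig's documented config schema, but the column names are hypothetical and specific parameter values are approximate.

```yaml
# Approach 1: full fine-tuning — the pretrained encoder's weights are updated.
input_features:
  - name: review        # hypothetical text column
    type: text
    encoder:
      type: bert
      trainable: true   # modify pretrained weights for the downstream task
output_features:
  - name: sentiment     # hypothetical label column
    type: category
trainer:
  epochs: 3
  learning_rate: 0.00001  # a small learning rate is typical when updating pretrained weights
```

The second approach corresponds to setting `trainable: false`, so the pretrained weights stay fixed and only the downstream dense layers are trained on top of the frozen embeddings.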

We explore the tradeoffs between these approaches by comparing model quality against the training time / cost, and show you how Ludwig leverages the advanced capabilities of Ray AIR to provide out-of-the-box scale and optimized performance on any hardware.
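To sketch how the scale-out is also declarative: assuming Ludwig's documented `backend` config section (parameter names approximate), moving the same job onto a Ray cluster and choosing a distribution strategy is a one-section config addition rather than a code change.

```yaml
# Hypothetical backend section: same training job, now distributed on Ray.
backend:
  type: ray
  trainer:
    strategy: ddp   # data parallel; switch to fsdp for model parallel training
    num_workers: 4
    use_gpu: true
```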

Key takeaways from this talk include:

  • Use the Ludwig framework to train a model for any task in just a few lines of YAML, while retaining the flexibility to customize any and all parts of the model architecture.
  • Automatically scale your Ludwig training job out across a cluster of machines with zero code or config changes using native integration with Ray.
  • Apply best practices to speed up fine-tuning when modifying the pretrained weights with just a few lines of YAML, including: automatic mixed precision, batch-size auto-tuning, ghost batch normalization, low-rank adaptation, and distributed training.
  • Understand how Ludwig leverages Ray Train to provide out-of-the-box support for both data parallel (DDP) and model parallel (FSDP) training through a single parameter in a YAML configuration.
  • Enable Ludwig's encoder embedding cache capability, available through a single YAML parameter, to speed up fine-tuning with fixed pretrained weights by over 50x, while producing an ordinary model at the end of training that is ready to serve in production.
  • Understand how Ray Datasets parallelizes and scales the encoder embedding caching, making it possible to fine-tune state-of-the-art models on commodity CPU hardware.
  • Explore using the same cached pretrained embeddings to fine-tune a neural network or a gradient boosted tree, switching between the two through a single parameter in the Ludwig config, and leveraging Ray AIR's support for distributed PyTorch and distributed LightGBM.
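To make the last two takeaways concrete, here is a hedged sketch of freezing the encoder, caching its embeddings, and swapping the downstream learner. The `cache_encoder_embeddings` preprocessing option and `model_type: gbm` follow Ludwig's documented schema, but treat placement and names as approximate, and the column names as hypothetical.

```yaml
# Frozen encoder + cached embeddings; the downstream model is chosen by model_type.
model_type: gbm            # or "ecd" (the default) for a neural network head
input_features:
  - name: review           # hypothetical text column
    type: text
    encoder:
      type: bert
      trainable: false     # keep pretrained weights fixed
    preprocessing:
      cache_encoder_embeddings: true  # compute embeddings once, reuse every epoch
output_features:
  - name: sentiment        # hypothetical label column
    type: category
```

Because the expensive encoder forward pass happens once during preprocessing, every subsequent epoch trains only the lightweight downstream model, which is what makes the 50x speedup and CPU-only fine-tuning possible.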


About Travis

Travis Addair is co-founder and CTO of Predibase, a data-oriented low-code machine learning platform. Within the Linux Foundation, he serves as lead maintainer for the Horovod distributed deep learning framework and is a co-maintainer of the Ludwig declarative deep learning framework. In the past, he led Uber’s deep learning training team as part of the Michelangelo machine learning platform.

Travis Addair

Chief Technology Officer, Predibase Inc.

Ready to Register?

Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.


Join the Conversation

Ready to get involved in the Ray community before the conference? Ask a question in the forums. Open a pull request. Or share why you’re excited with the hashtag #RaySummit on Twitter.