Emerging AI/ML workflows, like the end-to-end cycle of training and deploying foundation models, are increasingly complex, with wide-ranging compute and data requirements. In this fundamental paradigm shift, a single, typically very large model is trained and then adapted to many different specific tasks. From data cleaning and preparation, to large-scale distributed training, to fine-tuning and validation, to scalable inference, working with foundation models today is a remarkably complex undertaking, involving fragmented software tooling that often requires extensive expertise to deploy and operate at scale.

In this talk, we will show how we simplified the end-to-end life cycle of foundation models with a cloud-native, scalable stack for training, fine-tuning, prompt-tuning, and inference, realized with Red Hat OpenShift Data Science (RHODS). We will give an overview of how we are introducing new open source components, such as the CodeFlare SDK, the Multi-Cluster App Dispatcher (MCAD), and InstaScale, and integrating them with Ray and PyTorch to enable large-scale data preparation, training, and validation. Once models are validated, we will show how our inference stack, based on ModelMesh and KServe, can be used to deploy models in production with RHODS model serving, and how we use Data Science Pipelines to orchestrate models, including versioning and tracking in production deployments. We will also show how we operate this stack, from public cloud to on-premise, and how we are leveraging it to accelerate the value of foundation models across a range of use cases, including success stories that highlight the benefits of this full stack for the end-to-end life cycle of foundation models.
Dr. Costa is a Principal Research Scientist at the IBM T. J. Watson Research Center, where he leads efforts to build a next-generation serverless platform for AI/ML and HPC workflows. He is the technical lead for IBM Research's Foundation Model Stack for training and validation. He has been involved in multiple projects in large-scale AI/ML, HPC, and analytics, including the BlueGene/Q system, the Active Memory Cube (AMC) architecture for in-memory processing, and the DoE's Summit (ORNL) and Sierra (LLNL) supercomputers, among other projects with clients and academic partners.
Taneem is a Principal Software Engineer and Engineering Manager at Red Hat. He focuses on the Red Hat Open Data Hub project, which accelerates hybrid artificial intelligence and machine learning workloads on OpenShift Container Platform. His team is responsible for model serving (KServe), explainable AI (TrustyAI and AIF360), and the integration of partner AI accelerators for OpenShift Data Science and Open Data Hub on OpenShift Container Platform.
Nick is a Senior Research Engineer focused on serving large language models at scale. He previously led the architecture and development of distributed machine learning infrastructure supporting key IBM AI cloud products and services, including Watson Assistant, Watson Discovery, and Watson Natural Language Understanding. He designed and implemented the ModelMesh serving framework, which supports hundreds of thousands of models and is now a key component of the KServe open source project. He has authored and contributed to other open source projects and is a committer for the popular Netty Java networking library.
Come connect with the global community of thinkers and disruptors who are building and deploying the next generation of AI and ML applications.