ThirdAI is an early-stage startup dedicated to democratizing AI through algorithmic and software innovations that enable training and deploying large-scale neural networks on commodity CPU hardware. The core of ThirdAI's efficient model training is our proprietary BOLT engine, a new deep learning framework built from scratch with sparsity as a first-class design principle. On certain tasks, ThirdAI's sparse deep learning models can even outperform analogous dense architectures running on GPUs in both training time and inference latency. In this talk, we introduce our new distributed data-parallel training engine, powered by Ray Core, which scales ThirdAI models to terabyte-scale datasets and billion-parameter models. We discuss how Ray enabled us to quickly build an industry-grade distributed training solution on top of BOLT with key features such as fault tolerance, multiple modes of communication, and seamless scalability. We also highlight the scientific challenges unique to distributed deep learning training on CPUs. In particular, the unprecedented efficiency of BOLT models leaves a considerable communication bottleneck, which we address through novel gradient compression techniques. Finally, we present results from our rigorous evaluation of distributed BOLT on the terabyte-sized Criteo dataset, where we observe near-linear scaling up to 200 nodes and training times 42x faster than TensorFlow-CPU while using only one-sixth of the computing resources.
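To show why gradient compression helps when communication dominates, here is a minimal sketch of one widely used generic technique, top-k sparsification. This is an assumption for illustration only. The abstract does not say which compression methods ThirdAI uses, so this is not BOLT's implementation. All function names are hypothetical. Each worker sends only its k largest-magnitude gradient entries as (index, value) pairs, and the receiver averages the sparse updates. This stands in for an all-reduce step.

```python
from typing import List, Tuple

Payload = Tuple[List[int], List[float]]

def topk_compress(grad: List[float], k: int) -> Payload:
    """Keep only the k largest-magnitude entries of a dense gradient (hypothetical helper).

    A worker would send these (indices, values) instead of the full vector,
    shrinking the payload from len(grad) floats to 2 * k numbers.
    """
    if k <= 0:
        return [], []
    k = min(k, len(grad))
    # Rank positions by absolute value, keep the top k, then sort indices for determinism.
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx = sorted(top)
    return idx, [grad[i] for i in idx]

def decompress(indices: List[int], values: List[float], size: int) -> List[float]:
    """Rebuild a dense vector, treating every dropped entry as zero."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

def average_compressed(payloads: List[Payload], size: int) -> List[float]:
    """Average sparse gradients from several workers (a stand-in for all-reduce)."""
    total = [0.0] * size
    for indices, values in payloads:
        for i, v in zip(indices, values):
            total[i] += v
    return [t / len(payloads) for t in total]

if __name__ == "__main__":
    worker_a = topk_compress([0.1, -2.0, 0.5, 3.0, -0.05], k=2)
    worker_b = topk_compress([1.0, 0.0, 0.2, 1.0, 0.0], k=2)
    print(worker_a)  # ([1, 3], [-2.0, 3.0])
    print(average_compressed([worker_a, worker_b], size=5))
```

In practice, top-k schemes usually add error feedback. Each worker keeps the entries it dropped and adds them to its next gradient, so small updates are delayed rather than lost. That refinement is omitted here for brevity.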
Anshumali Shrivastava is an associate professor in the Department of Computer Science at Rice University. He is also the founder and CEO of ThirdAI Corp, a company that is democratizing large AI models (LLMs) by bringing them to commodity hardware through software innovations. His broad research interests include probabilistic algorithms for resource-frugal deep learning. In 2018, Science News named him one of the top 10 scientists under 40 to watch. He is a recipient of the National Science Foundation CAREER Award, a Young Investigator Award from the Air Force Office of Scientific Research, a machine learning research award from Amazon, and a Data Science Research