NVIDIA and Microsoft are set to build what is considered to be one of the most powerful AI supercomputers in the world on the Azure platform.
Azure’s cloud-based AI supercomputer includes powerful and scalable ND- and NC-series virtual machines optimized for distributed artificial intelligence (AI) training and inference. It is the first public cloud to incorporate NVIDIA’s advanced AI stack, adding tens of thousands of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI Enterprise software suite to its platform.
As part of the collaboration, NVIDIA will use Azure’s scalable virtual machine instances to research and further accelerate advances in generative AI, a rapidly emerging area of AI in which foundation models like Megatron-Turing NLG 530B serve as the basis for unsupervised, self-learning algorithms that create new text, code, digital images, video or audio.
“AI technology advances as well as industry adoption are accelerating. The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications,” said Manuvir Das, vice president of enterprise computing at NVIDIA. “Our collaboration with Microsoft will provide researchers and companies with state-of-the-art AI infrastructure and software to capitalize on the transformative power of AI.”
The companies will also collaborate to improve DeepSpeed, Microsoft’s deep learning optimization software. NVIDIA’s full stack of AI workflows and software development kits, optimized for Azure, will be made available to Azure enterprise customers.
“AI is fueling the next wave of automation across enterprises and industrial computing, enabling organizations to do more with less as they navigate economic uncertainties,” said Scott Guthrie, executive vice president of the Cloud + AI Group at Microsoft. “Our collaboration with NVIDIA unlocks the world’s most scalable supercomputer platform, which delivers state-of-the-art AI capabilities for every enterprise on Microsoft Azure.”
Microsoft Azure’s AI-optimized virtual machine instances are architected with NVIDIA’s most advanced data center GPUs and are the first public cloud instances to incorporate NVIDIA Quantum-2 400Gb/s InfiniBand networking. Customers can deploy thousands of GPUs in a single cluster to train even the largest language models, build the most complex recommender systems, and run generative AI at scale.
The current Azure instances feature NVIDIA Quantum 200Gb/s InfiniBand networking with NVIDIA A100 GPUs. Future instances will be integrated with NVIDIA Quantum-2 400Gb/s InfiniBand networking and NVIDIA H100 GPUs. Combined with Azure’s advanced compute cloud infrastructure, networking and storage, these AI-optimized offerings will provide scalable peak performance for AI training and deep learning inference workloads of any size.
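To put the two link speeds in perspective, the following is a back-of-the-envelope sketch in Python. It assumes an ideal link running at full line rate with no protocol overhead, latency, or congestion, and the 1 GB payload is a hypothetical figure chosen for illustration, not one taken from the announcement.

```python
def transfer_seconds(payload_bytes: int, link_gbps: float) -> float:
    """Ideal time to move a payload over one link at full line rate.

    Ignores protocol overhead, latency, and congestion (an assumption).
    """
    payload_bits = payload_bytes * 8
    return payload_bits / (link_gbps * 1e9)


# Hypothetical payload: 1 GB (10**9 bytes) of data exchanged between nodes.
PAYLOAD_BYTES = 10**9

quantum_time = transfer_seconds(PAYLOAD_BYTES, 200)    # Quantum, 200Gb/s
quantum2_time = transfer_seconds(PAYLOAD_BYTES, 400)   # Quantum-2, 400Gb/s

print(f"200Gb/s link: {quantum_time * 1000:.1f} ms")
print(f"400Gb/s link: {quantum2_time * 1000:.1f} ms")
```

Under these idealized assumptions, doubling the line rate halves the transfer time for the same payload. Real workloads will see different numbers depending on message sizes and network topology.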
The platform will support a broad range of AI applications and services, including Microsoft DeepSpeed and the NVIDIA AI Enterprise software suite.
Microsoft DeepSpeed will leverage the NVIDIA H100 Transformer Engine to accelerate transformer-based models used for large language models, generative AI, and code generation, among other applications. The Transformer Engine brings 8-bit floating-point (FP8) precision to DeepSpeed, dramatically accelerating AI calculations for transformers at twice the throughput of 16-bit operations.
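A rough sense of what halving the precision means for storage can be sketched with simple arithmetic. The Python snippet below assumes 530 billion parameters (read from the "530B" in the model name mentioned earlier) and counts only the raw weights, leaving out optimizer state, activations, and any values kept in higher precision. It does not model the throughput gain itself.

```python
# Bytes needed to store one value in each format.
BYTES_PER_VALUE = {"fp16": 2, "fp8": 1}

# Assumption: 530 billion parameters, taken from the "530B" in the model name.
NUM_PARAMS = 530 * 10**9


def weight_gigabytes(num_params: int, fmt: str) -> float:
    """Raw weight storage in decimal gigabytes (weights only, an assumption)."""
    return num_params * BYTES_PER_VALUE[fmt] / 1e9


fp16_gb = weight_gigabytes(NUM_PARAMS, "fp16")
fp8_gb = weight_gigabytes(NUM_PARAMS, "fp8")

print(f"FP16 weights: {fp16_gb:.0f} GB")
print(f"FP8 weights:  {fp8_gb:.0f} GB")
```

Each 8-bit value takes half the space of a 16-bit one, so the same weights occupy half the memory and half the bandwidth when moved, which is one reason lower precision can raise throughput.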