NVIDIA is accelerating Microsoft's Phi-3 Mini open language model with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference on NVIDIA GPUs, from personal computers to cloud servers.
Phi-3 Mini packs the capability of models 10 times its size and, unlike its predecessor Phi-2, which was limited primarily to research applications, is licensed for broad commercial use. Workstations with NVIDIA RTX GPUs and PCs with GeForce RTX GPUs have the computational horsepower to run the model efficiently using either Windows DirectML or TensorRT-LLM.
With 3.8 billion parameters, Phi-3 Mini was trained on 3.3 trillion tokens in just seven days on 512 NVIDIA H100 Tensor Core GPUs. It comes in two variants: one with a 4K-token context window and one with 128K, the latter setting a new standard for very long contexts in a model of this size.
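Both variants are published as open weights on Hugging Face. As a point of orientation (not something the article spells out), here is a minimal sketch of loading the 4K-context variant with the Hugging Face transformers library, assuming the public model IDs microsoft/Phi-3-mini-4k-instruct and microsoft/Phi-3-mini-128k-instruct:

```python
# Minimal sketch: loading the Phi-3 Mini 4K-context variant with Hugging Face
# transformers. Swap in "microsoft/Phi-3-mini-128k-instruct" for the
# long-context variant.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision fits comfortably on an RTX GPU
    device_map="auto",
    trust_remote_code=True,     # Phi-3 shipped custom modeling code at launch
)

prompt = "Explain what a context window is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```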
The integration of Phi-3 Mini into the NVIDIA ecosystem extends beyond conventional computing. Developers working on autonomous robotics and embedded systems can follow community-driven tutorials, such as those on Jetson AI Lab, to bring generative AI to their projects. And at 3.8 billion parameters, Phi-3 Mini is compact enough for edge devices without compromising on performance.
TensorRT-LLM supports Phi-3 Mini's extended context window with optimizations and kernels such as LongRoPE, FP8, and in-flight batching, improving inference throughput and reducing latency. NVIDIA plans to publish these implementations in the examples folder of the TensorRT-LLM GitHub repository, where developers can convert the model weights to the TensorRT-LLM checkpoint format optimized for inference.
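The examples folder documents the full recipe (checkpoint conversion followed by an engine build). As a rough sketch, recent TensorRT-LLM releases also expose a high-level Python LLM API that handles conversion and engine building behind the scenes; class and parameter names below are taken from those releases and may differ from version to version, so treat the GitHub examples as authoritative:

```python
# Minimal sketch, assuming TensorRT-LLM's high-level LLM API from recent
# releases; exact names and arguments vary between versions.
from tensorrt_llm import LLM, SamplingParams

# Pointing the LLM class at the Hugging Face checkpoint converts the weights
# to the TensorRT-LLM checkpoint format and builds a TensorRT engine.
llm = LLM(model="microsoft/Phi-3-mini-4k-instruct")

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["What does in-flight batching do?"], sampling)

for output in outputs:
    print(output.outputs[0].text)
```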
NVIDIA also emphasized its commitment to open systems, pointing to its active role in the open-source ecosystem: the company contributes to a range of projects, collaborates with foundations and standards bodies, and has released more than 500 projects under open-source licenses.