NVIDIA launches TensorRT 8, makes conversational AI smarter
NVIDIA is introducing TensorRT 8, the eighth generation of the company’s AI inference software, which the company says cuts inference time in half for language queries.
TensorRT 8’s optimizations deliver record-setting speed for language applications, running BERT-Large, one of the world’s most widely used transformer-based models, in 1.2 milliseconds. In the past, companies had to shrink their models to meet latency targets, which significantly reduced accuracy. Now, with TensorRT 8, companies can double or triple their model size to achieve dramatic improvements in accuracy, according to the vendor.
“AI models are growing exponentially more complex, and worldwide demand is surging for real-time applications that use AI. That makes it imperative for enterprises to deploy state-of-the-art inferencing solutions,” said Greg Estes, vice president of developer programs at NVIDIA. “The latest version of TensorRT introduces new capabilities that enable companies to deliver conversational AI applications to their customers with a level of quality and responsiveness that was never before possible.”
In addition to transformer optimizations, TensorRT 8’s breakthroughs in AI inference are made possible through two other key features.
Sparsity is a new performance technique in NVIDIA Ampere architecture GPUs that increases efficiency by letting developers accelerate their neural networks, skipping computation on zero-valued weights to reduce the total number of operations.
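Ampere’s sparse Tensor Cores accelerate a fine-grained 2:4 structured pattern, in which at most two of every four consecutive weights are nonzero. The article doesn’t show how a network reaches that pattern, so here is a minimal NumPy sketch of the idea; the function name and pruning rule (keep the two largest-magnitude weights per group) are illustrative, not TensorRT’s actual API.

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude values in every group of 4 weights,
    producing the 2:4 structured pattern sparse Tensor Cores can skip."""
    flat = weights.reshape(-1, 4)
    # Indices of the 2 smallest |w| within each group of 4.
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    pruned = flat.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.05, -0.8],
              [0.2,  0.7, -0.6,  0.01]])
print(prune_2_4(w))
# Each row of 4 keeps its two largest-magnitude weights; the rest become 0.
```

In practice the pruned network is briefly retrained to recover accuracy; the hardware then stores only the nonzero half of the weights plus metadata, halving the multiply-accumulate work.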
Quantization aware training enables developers to use trained models to run inference in INT8 precision without losing accuracy. This significantly reduces compute and storage overhead for efficient inference on Tensor Cores.
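The storage and compute savings come from representing weights and activations as 8-bit integers instead of 32-bit floats. As a rough illustration of what INT8 inference involves, here is a sketch of symmetric per-tensor quantization in NumPy; the helper names and the calibration rule (scaling by the tensor’s max absolute value) are simplifying assumptions, not TensorRT’s implementation, and quantization-aware training exists precisely to keep accuracy high under this kind of rounding.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: map floats onto [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the INT8 representation."""
    return q.astype(np.float32) * scale

x = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# INT8 needs 4x less storage than FP32; the round-trip error per element
# is bounded by half a quantization step (scale / 2).
print(np.abs(x - x_hat).max())
```

Tensor Cores then perform the matrix math directly on the INT8 values, which is where the compute savings over FP32 come from.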
Industry leaders have embraced TensorRT for their deep learning inference applications in conversational AI and across a range of other fields, according to the vendor.
For more information about this news, visit www.nvidia.com.