
It’s a big cost play, he pointed out, and it “has to happen everywhere, all the time, for all users.”
The next phase of inferencing
The new Groq 3 language processing units (LPUs) are based on intellectual property (IP) from Groq, which signed a $20 billion licensing agreement with Nvidia late last year. According to the chip company, a fleet of LPUs can function as a “giant single processor.”
While Rubin GPUs will continue to handle prefill (prompt processing), Groq’s LPX will now handle the latency-sensitive portions of decode (response generation). Together, the two can deliver a “new class of inference performance,” Nvidia says.
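Conceptually, that split works like a producer/consumer pipeline: a throughput-oriented tier processes the whole prompt in one pass, then hands the request off to a latency-oriented tier that emits one token at a time. The sketch below is a toy simulation of that hand-off in Python; the worker names, timings, and queue mechanism are illustrative assumptions, not Nvidia’s or Groq’s actual software stack.

```python
# Toy simulation of disaggregated inference: prefill on one device tier,
# decode on another. All timings and names are illustrative assumptions.
from dataclasses import dataclass
import queue
import threading
import time


@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int


def prefill_worker(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Simulate a throughput-oriented GPU tier processing whole prompts in one pass."""
    while True:
        req = in_q.get()
        if req is None:          # shutdown signal
            out_q.put(None)
            break
        # Prefill cost grows with prompt length (compute-bound, batch-friendly).
        time.sleep(req.prompt_tokens * 1e-5)
        out_q.put(req)           # hand the request off to the decode tier


def decode_worker(in_q: queue.Queue) -> None:
    """Simulate a latency-oriented accelerator emitting tokens one at a time."""
    while True:
        req = in_q.get()
        if req is None:
            break
        start = time.perf_counter()
        for _ in range(req.max_new_tokens):
            # Decode is dominated by memory movement per step, so per-token
            # latency (not batch throughput) is what the user feels.
            time.sleep(1e-4)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"decoded {req.max_new_tokens} tokens in {elapsed_ms:.1f} ms")


if __name__ == "__main__":
    gpu_q, decode_q = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=prefill_worker, args=(gpu_q, decode_q)),
        threading.Thread(target=decode_worker, args=(decode_q,)),
    ]
    for w in workers:
        w.start()
    for prompt_len in (512, 2048, 8192):
        gpu_q.put(Request(prompt_tokens=prompt_len, max_new_tokens=64))
    gpu_q.put(None)
    for w in workers:
        w.join()
```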
Each LPX rack features 256 LPUs with 128 GB of on-chip static random-access memory (SRAM), 150 terabytes per second (TB/s) of bandwidth, chip-to-chip links, and high-speed connections to NVL72, Nvidia’s liquid-cooled AI supercomputer. Combined, these can reduce latency to “near zero,” Nvidia claims.
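As a rough back-of-envelope check (assuming the 128 GB figure is the aggregate SRAM per rack, which the announcement does not spell out), sweeping that entire memory once at the stated bandwidth takes well under a millisecond:

```python
# Back-of-envelope: time to stream the stated SRAM capacity once at the stated
# bandwidth. Figures are from the announcement; treating 128 GB as the per-rack
# aggregate is an assumption.
sram_bytes = 128e9                  # 128 GB of on-chip SRAM
bandwidth_bytes_per_s = 150e12      # 150 TB/s
sweep_time_ms = sram_bytes / bandwidth_bytes_per_s * 1e3
print(f"One full sweep of SRAM: {sweep_time_ms:.2f} ms")  # ~0.85 ms
```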
The LPX integration with Vera Rubin AI factories will be available in the second half of this year.
Training versus inferencing
Training and inference stress infrastructure in very different ways, noted Sanchit Vir Gogia, chief analyst at Greyhound Research. While training rewards “massive parallelism and brute-force scale,” inferencing (especially for long context and interactive reasoning) is far more sensitive to latency, memory movement, cache behavior, concurrency, and cost per delivered token.
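To make that distinction concrete, here is an illustrative calculation of cost per delivered token and per-user token latency on a decode path. All figures are hypothetical, chosen only to show the arithmetic, and are not Greyhound’s, Nvidia’s, or Groq’s numbers.

```python
# Illustrative comparison of two decode paths. Every figure below is a
# hypothetical assumption made for the sake of the arithmetic.
def cost_per_million_tokens(tokens_per_sec: float, dollars_per_hour: float) -> float:
    """Cost of generating one million output tokens on hardware billed hourly."""
    seconds_needed = 1e6 / tokens_per_sec
    return dollars_per_hour * seconds_needed / 3600


scenarios = {
    # name: (sustained decode tokens/s, $/hour for the hardware) -- assumptions
    "batch-optimized decode": (20_000, 90.0),
    "latency-optimized decode": (60_000, 90.0),
}

CONCURRENT_STREAMS = 1_000  # assumed number of simultaneous users

for name, (tps, price) in scenarios.items():
    per_user_token_latency_ms = 1e6 / tps  # ms between tokens for each user
    print(f"{name}: {cost_per_million_tokens(tps, price):.2f} $/M tokens, "
          f"~{per_user_token_latency_ms:.0f} ms per token per user "
          f"at {CONCURRENT_STREAMS} concurrent streams")
```

The point of the toy numbers: with identical hourly pricing, tripling sustained decode throughput cuts both cost per delivered token and the per-token latency each user experiences, which is why inference economics hinge on the decode path rather than raw training-style FLOPS.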
