NVIDIA Vera Rubin Supercomputer Challenges Rivals
NVIDIA's Vera Rubin claims roughly 5x inference and 3.5x training gains over Blackwell and 10x lower token costs; H2 2026 availability could reshape data center economics.

KEY TAKEAWAYS
- The NVL72 integrates six chip types into a single rack housing 72 Rubin GPUs and 36 Vera CPUs.
- Per-GPU specs list 50 PFLOPS NVFP4 inference and 17.5 PFLOPS FP8 training, roughly 5x and 3.5x Blackwell.
- Company materials claim 10x lower inference token cost and 4x fewer GPUs for MoE training on a cited benchmark.
NVIDIA Corp. (NVDA) unveiled the Vera Rubin platform at CES on Jan. 5, 2026, presenting a rack-scale supercomputer that integrates six custom chip types into the NVL72 system to accelerate AI training and inference while sharply reducing token costs. The launch strengthens NVIDIA's position in integrated AI infrastructure.
Specs and Performance
NVIDIA said in a press release that Vera Rubin combines a Vera CPU, Rubin GPU, NVLink-6 switch, ConnectX-9 SuperNIC, BlueField-4 data processing unit (DPU), and Spectrum-6 Ethernet switch. The chips are fabricated on TSMC’s 3-nanometer process.
The NVL72 rack includes 72 Rubin GPUs, 36 Vera CPUs, 20.7 terabytes of HBM4 memory, 54 terabytes of LPDDR5X memory, and nine NVSwitch-6 blades. Per GPU, the system delivers 50 petaFLOPS of NVFP4 inference performance and 17.5 petaFLOPS of FP8 training performance, supported by 22 terabytes per second of HBM4 bandwidth and a 3.6 terabytes per second GPU-to-GPU NVLink connection. These specifications translate into roughly a fivefold inference improvement and a 3.5-fold training improvement compared with NVIDIA’s Blackwell platform, with an aggregate NVLink capacity on the NVL72 rack of about 260 terabytes per second.
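As a back-of-envelope check, the short Python sketch below rolls the quoted per-GPU figures up to the rack level. The per-GPU values come from the specifications above; the rack-level exaFLOPS totals are simple multiplications by the 72-GPU count, not NVIDIA-published figures, though the NVLink product lands close to the roughly 260 TB/s aggregate cited.

```python
# Back-of-envelope rack aggregates derived from the per-GPU figures quoted above.
# Per-GPU values are from the article; the exaFLOPS totals are illustrative multiplications.

GPUS_PER_RACK = 72

nvfp4_inference_pflops = 50.0   # NVFP4 inference, per GPU
fp8_training_pflops = 17.5      # FP8 training, per GPU
nvlink_tb_per_s = 3.6           # GPU-to-GPU NVLink bandwidth, per GPU

rack_inference_eflops = GPUS_PER_RACK * nvfp4_inference_pflops / 1_000
rack_training_eflops = GPUS_PER_RACK * fp8_training_pflops / 1_000
rack_nvlink_tb_per_s = GPUS_PER_RACK * nvlink_tb_per_s

print(f"NVL72 NVFP4 inference:  ~{rack_inference_eflops:.1f} EFLOPS")   # ~3.6 EFLOPS
print(f"NVL72 FP8 training:     ~{rack_training_eflops:.2f} EFLOPS")    # ~1.26 EFLOPS
print(f"NVL72 NVLink aggregate: ~{rack_nvlink_tb_per_s:.0f} TB/s")      # ~259 TB/s, consistent with the ~260 TB/s cited
```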
Product materials claim a tenfold reduction in inference token cost relative to Blackwell and state that mixture-of-experts (MoE) training on a cited 10-trillion-parameter, 100-trillion-token, one-month benchmark requires one-quarter as many GPUs. The company projects this configuration will materially reduce energy consumption at ultra-large MoE scale.
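To make those ratios concrete, the sketch below applies them to a hypothetical baseline. Only the 10x token-cost and 4x GPU-count multipliers come from the company's materials; the baseline cost and GPU count are invented placeholders, not figures from NVIDIA or the article.

```python
# Applying NVIDIA's claimed ratios to a hypothetical baseline, purely for illustration.
# Only the 10x and 4x multipliers are claimed; the baseline numbers below are invented.

baseline_cost_per_m_tokens = 1.00     # hypothetical Blackwell-generation cost, dollars per million tokens
baseline_moe_training_gpus = 100_000  # hypothetical GPU count for the cited MoE training run

TOKEN_COST_REDUCTION = 10  # claimed
GPU_COUNT_REDUCTION = 4    # claimed

rubin_cost_per_m_tokens = baseline_cost_per_m_tokens / TOKEN_COST_REDUCTION
rubin_moe_training_gpus = baseline_moe_training_gpus / GPU_COUNT_REDUCTION

print(f"Token cost:        ${baseline_cost_per_m_tokens:.2f} -> ${rubin_cost_per_m_tokens:.2f} per million tokens")
print(f"MoE training GPUs: {baseline_moe_training_gpus:,} -> {rubin_moe_training_gpus:,.0f}")
```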
NVIDIA’s developer blog details architectural advances including NVFP4 tensor cores that dynamically adjust data precision per Transformer layer, SOCAMM modular LPDDR5X memory for improved serviceability, rack-scale confidential computing covering CPU, GPU, and NVLink domains, and in-network collective-operation acceleration embedded in the NVLink-6 switch.
Availability and Market Impact
CEO Jensen Huang said the platform is in production and that customers will be able to begin trials soon. The company targets general availability in the second half of 2026. Nebius (NASDAQ: NBIS), an NVIDIA Cloud Partner, plans to deploy the Vera Rubin NVL72 across U.S. and European data centers starting in that timeframe through its Nebius AI Cloud and Nebius Token Factory.
Analysts have noted that the integrated, rack-scale approach could create a competitive moat compared with standalone chips. If NVIDIA’s performance and cost claims and early partner commitments hold, Vera Rubin could widen the company’s platform advantage and significantly alter the economics of large-scale AI deployments.





