Nvidia Vera Rubin Enters Production
Nvidia Vera Rubin enters production and will ship to partners in H2 2026; ICMS memory needs could tighten NAND supply and shift component flows.

KEY TAKEAWAYS
- Nvidia Vera Rubin entered production with NVL72 rack systems slated for partner shipments in H2 2026.
- ICMS requires roughly 1,152 TB of NAND per NVL72 rack, a volume that could put near-term pressure on global NAND supply.
NVIDIA Corp. said on Jan. 12, 2026, that Nvidia Vera Rubin, its new rack-scale AI platform, has entered production. The company expects availability in the second half of 2026 through cloud and server partners. Analyst projections highlight potential strain on NAND supply due to the platform’s large memory demands.
Vera Rubin Architecture and Performance
NVIDIA CEO Jensen Huang introduced Vera Rubin at CES 2026, positioning it as the system-level successor to Blackwell, designed for sustained, interactive AI inference workloads. The platform centers on the NVL72 server, which aggregates 72 GPUs linked by NVLink 6, delivering 3.6 terabytes per second (TB/s) of inter-GPU bandwidth.
The Rubin GPU contains about 336 billion transistors and features a third-generation Transformer Engine using NVFP4 precision. NVIDIA rates the chip at up to 50 petaFLOPS (PFLOPS) for inference and 35 PFLOPS for training, supported by HBM4 memory with 22 TB/s bandwidth. The Vera CPU complements the GPU with 88 Arm Olympus cores and 176 threads, paired with SOCAMM memory at 1.2 TB/s and NVLink-C2C links to GPUs at 1.8 TB/s.
A new memory tier, Inference Context Memory Storage (ICMS), holds key-value caches for stateful, agentic inference. NVIDIA claims this tier can boost throughput and power efficiency by up to five times on some workloads and increase effective token generation per GPU by up to ten times compared with Blackwell.
The platform’s networking and acceleration stack includes ConnectX-9 SuperNICs, BlueField-4 data processing units capable of 1.6 terabits per second (Tb/s) per GPU, and Spectrum-X Ethernet at 102.4 Tb/s.
Shipments and NAND Supply Implications
Production of the Vera Rubin platform has begun, with shipments expected in the second half of 2026 through server partners, major cloud providers such as AWS, Microsoft, and Google, and AI labs including OpenAI, Meta, and xAI. Some analysts project initial shipments may slip to the fourth quarter of 2026, trailing competitors targeting earlier releases.
Analyst models assume each NVL72 server requires roughly 1,152 terabytes of NAND flash memory for ICMS and project shipments of about 30,000 units in 2026 and 100,000 in 2027. At those volumes, Vera Rubin systems would absorb approximately 2.8% and 9.3% of annual global NAND demand in those years.
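As a rough sanity check on those figures, the sketch below recomputes the implied share of global NAND demand from the stated assumptions. The baseline annual NAND demand of about 1,235 exabytes is not a figure from NVIDIA or the analysts; it is simply backed out from the article's own 2.8% and 9.3% estimates.

# Rough sanity check of the NAND demand shares cited above (Python).
# Assumptions: 1,152 TB of NAND per NVL72 for ICMS, 30,000 racks shipped
# in 2026 and 100,000 in 2027, and an assumed baseline of roughly
# 1,235 exabytes of annual global NAND demand, inferred from the
# article's percentages rather than taken from any vendor source.

NAND_PER_RACK_TB = 1_152
GLOBAL_NAND_DEMAND_EB = 1_235  # assumed annual global NAND demand, exabytes

for year, racks in {2026: 30_000, 2027: 100_000}.items():
    nand_eb = racks * NAND_PER_RACK_TB / 1_000_000  # TB -> EB (decimal)
    share = nand_eb / GLOBAL_NAND_DEMAND_EB
    print(f"{year}: {nand_eb:,.1f} EB of NAND, ~{share:.1%} of global demand")

# Prints roughly: 2026: 34.6 EB of NAND, ~2.8% of global demand
#                 2027: 115.2 EB of NAND, ~9.3% of global demand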
The platform’s system-level design and heavy per-server NAND footprint support a long-term demand outlook for AI infrastructure, but they also create a potential near-term pressure point for NAND suppliers and the large cloud buyers competing for that capacity.
The Vera Rubin platform’s production and memory architecture highlight a critical supply-chain factor for investors monitoring AI infrastructure growth and component availability.