Amazon Trainium3 Raises AI Stakes, Hints Nvidia Tie

Amazon Trainium3 debuts with large per-chip compute and memory gains; UltraServers and Red Hat support shift cloud AI cost and scale dynamics.

December 02, 2025·3 min read

KEY TAKEAWAYS

  • Trainium3 has eight NeuronCore-v4 cores, 144 GiB of device memory, and 4.9 TB/sec of device memory bandwidth.
  • UltraServers built on Trainium3 deliver 4.4x more compute and 3.9x higher memory bandwidth than Trainium2-based systems.
  • Red Hat said its AI Inference Server can yield 30–40% better price-performance than comparable GPU-based EC2 instances.


Amazon said Trainium3, its fourth-generation AI training chip, became generally available on Dec. 2, 2025, at re:Invent. Amazon Web Services (AWS) said the chip's substantial per-device compute and memory gains will power UltraServers for production AI workloads.

Trainium3 Hardware Advances and System Performance

Trainium3 features eight NeuronCore-v4 cores, 144 GiB of on-device memory, and 4.9 terabytes per second (TB/sec) of device memory bandwidth. This represents a 1.5x increase in memory capacity and a 1.7x boost in memory bandwidth compared with Trainium2. Direct memory access (DMA) bandwidth also improved by 1.4x to 4.9 TB/sec.
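As a sanity check, the stated multipliers imply approximate Trainium2 baselines. A minimal sketch in Python, where the derived figures are inferences from the ratios above, not official Trainium2 specifications:

```python
# Back-of-envelope: derive implied Trainium2 baselines from the
# Trainium3 figures and the generation-over-generation multipliers above.
TRN3_MEMORY_GIB = 144    # Trainium3 on-device memory
TRN3_MEM_BW_TBPS = 4.9   # Trainium3 device memory bandwidth

implied_trn2_memory = TRN3_MEMORY_GIB / 1.5   # 1.5x capacity gain
implied_trn2_mem_bw = TRN3_MEM_BW_TBPS / 1.7  # 1.7x bandwidth gain

print(f"Implied Trainium2 memory:    ~{implied_trn2_memory:.0f} GiB")
print(f"Implied Trainium2 bandwidth: ~{implied_trn2_mem_bw:.1f} TB/sec")
```

The arithmetic lands at roughly 96 GiB and 2.9 TB/sec, consistent with the ratios as stated.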

The chip adds a NeuronLink-v4 interconnect providing 2.56 TB/sec per-device bandwidth for scale-out training and pooled memory. It uses 16 collective communication cores (CC-Cores) to coordinate data transfer across chips and servers.

Trainium3 delivers 2,517 trillion floating-point operations per second (TFLOPS) for FP8 and the new MXFP4 numeric format, 671 TFLOPS for BF16/FP16/TF32, and 183 TFLOPS for FP32. FP8 throughput roughly doubles that of the prior generation, while MXFP4 is introduced as a new capability.
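The per-format figures can be tabulated, and the "roughly doubles" claim implies a prior-generation FP8 number. A quick sketch, where the implied value is an inference rather than a published Trainium2 specification:

```python
# Trainium3 per-format throughput (TFLOPS), as reported above.
trn3_tflops = {
    "FP8/MXFP4": 2517,
    "BF16/FP16/TF32": 671,
    "FP32": 183,
}

# "FP8 throughput roughly doubles" implies a Trainium2 FP8 figure
# near half the Trainium3 number (an approximation, not an official spec).
implied_trn2_fp8 = trn3_tflops["FP8/MXFP4"] / 2
print(f"Implied Trainium2 FP8: ~{implied_trn2_fp8:.0f} TFLOPS")
```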

The architecture enhances programmability with support for dynamic shapes, control flow, user-programmable rounding modes, and custom operators implemented via GPSIMD engines, broadening model and operator compatibility.

At the system level, AWS’s UltraServers powered by Trainium3 offer aggregate gains over the previous generation, including 4.4x more compute, 3.9x higher memory bandwidth, and 3.5x more tokens per megawatt, an efficiency metric for large-scale training deployments.
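If tokens per megawatt is read as throughput divided by power, the gains above loosely constrain how system power scaled. A rough inference that assumes token throughput tracks raw compute, an assumption the article itself does not make:

```python
# Aggregate UltraServer gains versus the prior generation, as reported.
gains = {"compute": 4.4, "memory_bandwidth": 3.9, "tokens_per_MW": 3.5}

# If tokens scale with compute, then tokens/MW rising more slowly than
# compute implies power draw also grew by roughly their ratio.
relative_power = gains["compute"] / gains["tokens_per_MW"]
print(f"Implied relative power draw: ~{relative_power:.2f}x")
```

Under that assumption, per-system power rises by roughly 1.26x even as per-megawatt efficiency improves 3.5x.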

Ecosystem Integration and Scale

Red Hat announced that its Red Hat AI Inference Server, powered by vLLM, will run on AWS Inferentia2 and Trainium3 chips. The company said this integration delivers 30–40% better price-performance than comparable GPU-based Amazon EC2 instances, aiming to provide a common inference layer supporting any generative AI model.

AWS reported deploying more than 1 million Trainium processors, with customers increasingly using the chips for inference workloads, expanding beyond their original training-focused role.

Over the past year, AWS added 3.8 gigawatts of data-center capacity—the largest increase among competitors—and expanded its private network backbone by 50% to over 9 million kilometers of terrestrial and subsea cable. These infrastructure investments support scaling production AI.

AWS also previewed Trainium4, targeting sixfold FP4 performance, fourfold memory bandwidth, and double the memory capacity compared with Trainium3. The company signaled plans to integrate Nvidia technology into future Trainium generations, indicating a multi-vendor silicon strategy rather than exclusive displacement.
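Combining those multipliers with the Trainium3 baselines reported earlier gives rough Trainium4 targets. A sketch that assumes the sixfold FP4 figure is measured against Trainium3's MXFP4 throughput, which the announcement does not state explicitly:

```python
# Trainium3 baselines reported earlier in the article.
trn3 = {"fp4_tflops": 2517, "mem_bw_tbps": 4.9, "memory_gib": 144}

# Projected Trainium4 targets from the stated multipliers
# (projections derived here, not announced specifications).
trn4_targets = {
    "fp4_tflops": trn3["fp4_tflops"] * 6,    # sixfold FP4 performance
    "mem_bw_tbps": trn3["mem_bw_tbps"] * 4,  # fourfold memory bandwidth
    "memory_gib": trn3["memory_gib"] * 2,    # double the memory capacity
}
print(trn4_targets)
```

That works out to roughly 15,100 TFLOPS of FP4, 19.6 TB/sec of memory bandwidth, and 288 GiB of memory per device if the baselines hold.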

Separately, AWS introduced "AI Factories," a managed, customer-specific infrastructure service built and scaled by AWS. This service will run on Trainium and other custom silicon, providing a route for customers to deploy production AI at scale.

Red Hat described its AI Inference Server as designed "to deliver a common inference layer that can support any gen AI model," helping customers achieve higher performance, lower latency, and cost-effective scaling for production AI deployments.

