Amazon Trainium3 Raises AI Stakes, Hints Nvidia Tie

Amazon Trainium3 debuts with large per-chip compute and memory gains; UltraServers and Red Hat support shift cloud AI cost and scale dynamics.

December 02, 2025 · 3 min read

KEY TAKEAWAYS

  • Trainium3 has eight NeuronCore-v4 cores, 144 GiB of device memory, and 4.9 TB/sec of device memory bandwidth.
  • UltraServers with Trainium3 deliver 4.4x more compute and 3.9x higher memory bandwidth than Trainium2 systems.
  • Red Hat said its AI Inference Server can yield 30–40% better price-performance than comparable GPU-based EC2 instances.

Amazon announced at re:Invent that Trainium3, its fourth-generation AI training chip, became generally available on Dec. 2, 2025. Amazon Web Services (AWS) said the chip's substantial per-device compute and memory gains will power UltraServers for production AI workloads.

Trainium3 Hardware Advances and System Performance

Trainium3 features eight NeuronCore-v4 cores, 144 GiB of on-device memory, and 4.9 terabytes per second (TB/sec) of device memory bandwidth. This represents a 1.5x increase in memory capacity and a 1.7x boost in memory bandwidth compared with Trainium2. Direct memory access (DMA) bandwidth also improved by 1.4x to 4.9 TB/sec.
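A quick sanity check of those multipliers, using the Trainium2 baseline figures they imply (roughly 96 GiB and 2.9 TB/sec; these baselines are inferred from the stated ratios, not numbers from the announcement):

```python
# Back-of-the-envelope check of the generational multipliers.
# The Trainium2 baselines are assumptions inferred from the stated
# 1.5x and 1.7x gains, not figures quoted in the announcement.
trn3_memory_gib = 144
trn3_bandwidth_tbs = 4.9

trn2_memory_gib = 96        # assumed baseline
trn2_bandwidth_tbs = 2.9    # assumed baseline

print(f"Memory capacity gain:  {trn3_memory_gib / trn2_memory_gib:.2f}x")        # ~1.50x
print(f"Memory bandwidth gain: {trn3_bandwidth_tbs / trn2_bandwidth_tbs:.2f}x")  # ~1.69x
```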

The chip adds a NeuronLink-v4 interconnect providing 2.56 TB/sec per-device bandwidth for scale-out training and pooled memory. It uses 16 collective communication cores (CC-Cores) to coordinate data transfer across chips and servers.
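The CC-Cores target collective operations such as the all-reduce that synchronizes gradients in distributed training. A generic, framework-level illustration of that pattern (plain PyTorch on a single-process group so the example runs standalone; this is not Trainium-specific code):

```python
# Generic collective-communication pattern (an all-reduce) of the kind
# Trainium3's CC-Cores are described as coordinating across chips.
# Uses a single-process group so the example is self-contained.
import torch
import torch.distributed as dist

dist.init_process_group(
    backend="gloo", init_method="tcp://127.0.0.1:29500",
    rank=0, world_size=1,
)
grads = torch.ones(4)
dist.all_reduce(grads, op=dist.ReduceOp.SUM)  # sums gradients across all ranks
print(grads)
dist.destroy_process_group()
```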

Trainium3 delivers 2,517 trillion floating-point operations per second (TFLOPS) for FP8 and the new MXFP4 numeric format, 671 TFLOPS for BF16/FP16/TF32, and 183 TFLOPS for FP32. FP8 throughput roughly doubles that of the prior generation, while MXFP4 is introduced as a new capability.
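One way to read those peaks against the 4.9 TB/sec of memory bandwidth is machine balance, the minimum arithmetic intensity a kernel needs before it becomes compute-bound rather than bandwidth-bound. A quick calculation from the stated figures:

```python
# Ratio of peak compute to device memory bandwidth ("machine balance").
# A kernel needs at least this many FLOPs per byte moved to be
# compute-bound rather than bandwidth-bound at each precision.
bandwidth_bytes_per_sec = 4.9e12  # 4.9 TB/sec device memory bandwidth

peak_flops = {
    "FP8/MXFP4": 2517e12,       # 2,517 TFLOPS
    "BF16/FP16/TF32": 671e12,
    "FP32": 183e12,
}

for precision, flops in peak_flops.items():
    balance = flops / bandwidth_bytes_per_sec
    print(f"{precision}: ~{balance:.0f} FLOPs per byte to stay compute-bound")
```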

The architecture enhances programmability with support for dynamic shapes, control flow, user-programmable rounding modes, and custom operators implemented via GPSIMD engines, broadening model and operator compatibility.
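These features matter because real models increasingly contain patterns that fixed-shape accelerator compilers have historically rejected. A minimal, framework-level sketch of the kind of dynamic-shape and control-flow behavior now claimed to be supported (plain PyTorch for illustration, not Trainium-specific code):

```python
import torch

# Illustrative only: a module mixing data-dependent control flow with a
# dynamic output shape, patterns fixed-shape compilers struggle with.
class DynamicRouter(torch.nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.cheap = torch.nn.Linear(dim, dim)
        self.heavy = torch.nn.Sequential(
            torch.nn.Linear(dim, 4 * dim),
            torch.nn.GELU(),
            torch.nn.Linear(4 * dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Control flow: branch on a runtime value of the data.
        if x.mean() > 0:
            x = self.heavy(x)
        else:
            x = self.cheap(x)
        # Dynamic shape: keep only rows passing a runtime filter.
        mask = x.mean(dim=-1) > 0
        return x[mask]

model = DynamicRouter()
out = model(torch.randn(8, 256))  # row count of `out` varies run to run
print(out.shape)
```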

At the system level, AWS’s UltraServers powered by Trainium3 offer aggregate gains over the previous generation, including 4.4x more compute, 3.9x higher memory bandwidth, and 3.5x more tokens per megawatt, an efficiency metric for large-scale training deployments.
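Tokens per megawatt normalizes training throughput by facility power draw. A toy illustration of how the metric works (the throughput and power inputs below are invented for the example; only the 3.5x ratio comes from AWS):

```python
# Hypothetical illustration of the tokens-per-megawatt efficiency metric.
# Both inputs are invented for the example; only the 3.5x generational
# ratio is an AWS figure.
def tokens_per_megawatt(tokens_per_sec: float, power_watts: float) -> float:
    """Training throughput normalized by power draw."""
    return tokens_per_sec / (power_watts / 1e6)

prev_gen = tokens_per_megawatt(tokens_per_sec=2.0e6, power_watts=1.4e6)
new_gen = 3.5 * prev_gen  # AWS's claimed generational gain
print(f"Prior gen UltraServer:  {prev_gen:.2e} tokens/sec per MW")
print(f"Trainium3 UltraServer:  {new_gen:.2e} tokens/sec per MW")
```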

Ecosystem Integration and Scale

Red Hat announced that its Red Hat AI Inference Server, powered by vLLM, will run on AWS Inferentia2 and Trainium3 chips. The company said this integration delivers 30–40% better price-performance than comparable GPU-based Amazon EC2 instances, aiming to provide a common inference layer supporting any generative AI model.
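vLLM exposes a small, model-agnostic Python API, which is what makes a "common inference layer" plausible. A minimal usage sketch (the model name is a placeholder, and Neuron/Trainium backend configuration is deployment-specific and omitted here):

```python
# Minimal vLLM usage sketch: the same generate() call serves any
# supported model, which is the "common inference layer" idea.
# The model name is a placeholder; Neuron/Trainium backend selection
# is deployment configuration not shown here.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the Trainium3 launch in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```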

AWS reported deploying more than 1 million Trainium processors, with customers increasingly using the chips for inference workloads, expanding beyond their original training-focused role.

Over the past year, AWS added 3.8 gigawatts of data-center capacity, more than any competitor, and expanded its private network backbone by 50% to more than 9 million kilometers of terrestrial and subsea cable. These infrastructure investments underpin the scaling of production AI workloads.

AWS also previewed Trainium4, targeting sixfold FP4 performance, fourfold memory bandwidth, and double the memory capacity compared with Trainium3. The company signaled plans to integrate Nvidia technology into future Trainium generations, indicating a multi-vendor silicon strategy rather than exclusive displacement.
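Applying those stated multipliers to the Trainium3 figures above yields rough Trainium4 targets (derived projections, not disclosed specifications, and assuming the MXFP4 number as the FP4 baseline):

```python
# Rough Trainium4 projections derived from AWS's stated multipliers and
# the Trainium3 figures above; not disclosed specifications. The FP4
# baseline is assumed to be Trainium3's MXFP4 throughput.
trn3 = {"fp4_tflops": 2517, "memory_bw_tbs": 4.9, "memory_gib": 144}
multipliers = {"fp4_tflops": 6, "memory_bw_tbs": 4, "memory_gib": 2}

trn4 = {key: trn3[key] * multipliers[key] for key in trn3}
print(trn4)  # {'fp4_tflops': 15102, 'memory_bw_tbs': 19.6, 'memory_gib': 288}
```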

Separately, AWS introduced "AI Factories," a managed, customer-specific infrastructure service built and scaled by AWS. This service will run on Trainium and other custom silicon, providing a route for customers to deploy production AI at scale.

Red Hat described its AI Inference Server as designed "to deliver a common inference layer that can support any gen AI model," helping customers achieve higher performance, lower latency, and cost-effective scaling for production AI deployments.
