Why OpenAI is Collaborating with Broadcom to Design Custom Chips: Implications for ChatGPT, XPUs, AMD, and Nvidia

By Zakariae BEN ALLAL, created on Tue Oct 14 2025
Concept illustration of OpenAI and Broadcom custom AI accelerator racks connected over high-speed Ethernet

OpenAI Partners with Broadcom to Design Custom Chips: Why It Matters

OpenAI has unveiled an ambitious multi-year plan to create its own AI accelerators in collaboration with Broadcom. The spotlight is on the impressive 10 gigawatts of capacity, with rollouts set to kick off in the second half of 2026 and wrap up by the end of 2029. Interestingly, these systems will be Ethernet-based and designed as full racks, rather than just individual chips. For a company recognized for its flagship product, ChatGPT, this strategic shift indicates a significant evolution in AI infrastructure, where hardware is tailored to specific workloads.

This development mirrors a broader industry trend, where major players are increasingly blending general-purpose GPUs with custom silicon while reimagining data center architecture to accommodate models that have expanded from billions to trillions of parameters. The future hinges not on a single processor but on an integrated stack: models, compilers, accelerators, memory, and networks functioning cohesively like one supercomputer.

In this article, we’ll delve into the specifics of OpenAI’s announcement, explore the advantages of custom chips tailored for particular workloads, examine the role of XPUs, and assess the positions of AMD and Nvidia as this transformative era unfolds.

The Announcement Explained

  • OpenAI will be responsible for designing the accelerators and rack-scale systems, while Broadcom will aid in their development and deployment.
  • The initiative encompasses a staggering 10 gigawatts of capacity, with initial deployments set for the second half of 2026 and culminating by late 2029.
  • These racks will utilize Ethernet for both expansion and scalability, leveraging Broadcom’s technology portfolio, including Ethernet, PCIe, and optical connections.
  • OpenAI aims to embed insights gained from training and operating cutting-edge models directly into silicon and system designs.

While 10 gigawatts is a significant figure on its own, the pivotal aspect lies in the architectural choice of standard Ethernet over proprietary network fabrics, hinting at a focus on scalability, supply chain diversity, and multi-vendor interoperability.

Why Customize Silicon When GPUs Are Sufficient?

GPUs excel at the dense linear algebra that underlies AI. However, when you control both the model and the entire hyperscale stack, you can co-design hardware that directly addresses your specific bottlenecks. Knowing the precise allocation of compute cycles and memory bandwidth enables the construction of chips and racks tailored to those critical pathways.

The benefits of custom silicon include:
– Enhanced Performance-per-Watt: Optimized for specific kernels and data streams.
– Reduced Latency: Improved end-to-end latency for applications thanks to better integration of compute, memory, and networking components.
– More Predictable Costs and Supply: Greater control over the roadmap for essential components of the stack.

OpenAI’s rationale aligns with this principle: integrating learnings from model development directly into the hardware and system design. The Ethernet-first choice also reflects the growing evidence that Ethernet is gaining traction in AI back-end networks.
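To make the co-design argument concrete, a simple roofline estimate shows how knowing a workload's arithmetic intensity tells you whether a chip's budget should go to compute or to memory bandwidth. The peak numbers below are illustrative placeholders, not the specs of any real accelerator:

```python
def roofline_time_s(flops, bytes_moved, peak_flops, peak_bw_bytes):
    """Estimate kernel time as the worse of compute-bound and memory-bound time."""
    return max(flops / peak_flops, bytes_moved / peak_bw_bytes)

# Hypothetical accelerator specs (illustrative only):
PEAK_FLOPS = 1e15   # 1 PFLOP/s of dense matmul
PEAK_BW    = 4e12   # 4 TB/s of HBM bandwidth

# A transformer feed-forward matmul: (batch*seq, d) x (d, 4d) in fp16
m, k, n = 8192, 8192, 32768
flops = 2 * m * k * n
bytes_moved = 2 * (m * k + k * n + m * n)  # 2-byte operands plus output

t = roofline_time_s(flops, bytes_moved, PEAK_FLOPS, PEAK_BW)
intensity = flops / bytes_moved  # FLOPs performed per byte moved
print(f"arithmetic intensity: {intensity:.0f} FLOP/B, est. time: {t*1e3:.2f} ms")
```

A kernel with high intensity is compute-bound, so extra silicon area pays off in math units; a low-intensity kernel pays off in memory bandwidth instead. Owning the model means knowing which case dominates before the chip is taped out.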

Understanding XPUs: A Versatile AI Compute Solution

The term XPU is becoming increasingly prevalent in discussions around AI infrastructure. “X” stands for any compute engine suited for the task, which can include CPUs, GPUs, FPGAs, NPUs, DPUs, or custom ASICs. The emphasis is not on a single processor type, but rather the flexibility to mix and match the right engines for each task within a unified programming framework. Intel popularized the XPU concept to talk about this heterogeneous future, with various industry groups adopting it to describe accelerator cards that offload specialized functions from general-purpose CPUs.

Additionally, XPUs can be viewed from a packaging perspective: tightly integrated compute, memory, and input/output within one module designed specifically for AI data flows. It streamlines operations by bringing compute closer to the data rather than the other way around. OpenAI’s partnership with Broadcom aligns with this broader XPU vision, creating racks that integrate accelerators, high-bandwidth memory, PCIe, optics, and Ethernet into a cohesive and standardized building block.
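In code, the XPU idea reduces to routing each task to whichever engine class fits it best. The engine names and thresholds below are hypothetical, purely to illustrate the pattern:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str           # "train", "infer", or "net_offload"
    size_gflops: float  # rough compute demand of the task

def pick_engine(task: Task) -> str:
    """Route each task to the engine class best suited to it (illustrative policy)."""
    if task.kind == "train":
        return "GPU"        # dense training favors general-purpose GPUs
    if task.kind == "infer" and task.size_gflops < 50:
        return "NPU/ASIC"   # small, latency-sensitive inference
    if task.kind == "net_offload":
        return "DPU"        # network/storage work offloaded from host CPUs
    return "GPU"

jobs = [Task("train", 900), Task("infer", 12), Task("net_offload", 3)]
print([pick_engine(j) for j in jobs])  # ['GPU', 'NPU/ASIC', 'DPU']
```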

The Importance of Networking: Ethernet’s Comeback in AI

Historically, Nvidia’s NVLink and InfiniBand combination set the benchmark for top-tier performance in AI training. While that stack remains formidable, the appeal of standards-based Ethernet has become more pronounced as clusters expand to accommodate tens of thousands of accelerators. Recent market research indicates a decisive shift: Ethernet is increasingly seen in AI back-end networks, leading to significant switch investments over the next five years.

This trend is reinforced by rapid advancements in Ethernet silicon and software. Broadcom’s latest offerings target 800 GbE and beyond, specifically for AI clusters with hundreds of thousands of accelerators, featuring advanced congestion control and topology-aware routing designed to compete with specialized networks while adhering to Ethernet standards.

In essence, OpenAI’s choice of Ethernet seems less like a concession and more like a confident move toward scalability, open ecosystems, and supplier diversity.
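A rough estimate of collective-communication time shows why link speed dominates at this scale. This sketch assumes a standard ring all-reduce; the model size, node count, and link efficiency are assumptions for illustration, not announced figures:

```python
def ring_allreduce_time_s(model_bytes, n_nodes, link_gbps, efficiency=0.8):
    """Estimate ring all-reduce time: each node sends and receives
    2*(n-1)/n of the payload over its own link."""
    payload = 2 * (n_nodes - 1) / n_nodes * model_bytes
    link_bytes_per_s = link_gbps * 1e9 / 8 * efficiency
    return payload / link_bytes_per_s

# Illustrative numbers: 70B-parameter model, fp16 gradients, 800 GbE links.
grad_bytes = 70e9 * 2
t = ring_allreduce_time_s(grad_bytes, n_nodes=1024, link_gbps=800)
print(f"est. gradient all-reduce: {t:.2f} s")
```

Note that the per-node cost barely depends on cluster size, which is exactly why raw link bandwidth, and features like congestion control that keep effective efficiency high, matter more than fabric exotica once clusters reach this scale.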

The Role of AMD: 6 Gigawatts of GPUs Starting with MI450

OpenAI is not abandoning GPUs. Just a week before announcing its deal with Broadcom, OpenAI and AMD revealed a strategic partnership that will deploy 6 gigawatts of AMD GPUs across multiple generations, starting with 1 gigawatt of Instinct MI450 set for the second half of 2026. This agreement also includes a warrant structure aligning long-term incentives for both companies.

AMD has been gearing up for this collaboration, previewing new accelerators and expanding the ROCm software ecosystem to qualify its hardware and software stack for large-scale training and inference. Its roadmap favors standards-driven, rack-scale designs over proprietary options.

In practical terms, this AMD-OpenAI partnership allows OpenAI to scale operations: using off-the-shelf GPUs where necessary, deploying custom accelerators where they are beneficial, and maintaining a competitive, multi-vendor supply network.

How Nvidia Fits In: Shipping Platforms and the Power of NVLink

Nvidia remains the benchmark for many AI factories with its comprehensive end-to-end platform. The GB200 NVL72 rack is a fine example, integrating Grace CPUs with Blackwell GPUs via NVLink to create an expansive GPU cluster featuring massive bandwidth and terabytes of High Bandwidth Memory (HBM). Shipments began earlier this year through partners such as HPE, and Nvidia has already showcased its next step, Blackwell Ultra, with the upcoming GB300 NVL72 promising even higher performance.

Nvidia’s solution tightly integrates compute, memory, network fabric, and software. This means buyers are faced with a clear decision: pay for top-tier performance and seamless integration, albeit often at higher prices and with potential vendor lock-in. For numerous businesses, this trade-off may still present the best solution, especially for those tackling the toughest training challenges.

OpenAI’s strategy of utilizing Nvidia GPUs, AMD GPUs, and custom racks developed with Broadcom indicates a diversified portfolio approach: selecting the right tool for each job while leveraging multiple platforms for scalability.

Key Takeaways from the OpenAI-Broadcom Plan

  1. Model-Driven Hardware Design: Understanding your models allows you to prioritize the most critical kernels and communication patterns. Custom accelerators can target the specific training or inference paths that general-purpose GPUs serve only broadly.
  2. Rack-Scale Mindset: Today’s AI systems are inherently rack-based, integrating accelerators, HBM, PCIe, optics, and Ethernet at the rack level to streamline deployment and eliminate bottlenecks. Broadcom’s networking capabilities are designed precisely for this level of integration.
  3. Ethernet’s Ascendance: Opting for an Ethernet-first approach is indicative of both technological maturity and market trends, with new NICs and switches achieving 800G-class performance complemented by features tailored for AI ecosystems, all while maintaining interoperability among suppliers.
  4. Multi-Sourcing is an Advantage: With the AMD partnership announced shortly before, OpenAI is evidently striving to avoid being tied to a single vendor. The AMD collaboration assures near-term throughput while the Broadcom partnership allows for long-term architectural freedom. Nvidia remains crucial, offering unmatched turnkey performance.
  5. Realistic Timelines: This rollout is not imminent. OpenAI’s custom racks are set for initial deployment in late 2026, with deployments completing by the end of 2029, providing ample time to refine the silicon, compiler toolchain, and data center strategy. Meanwhile, GPUs will handle the operational heavy lifting.

Understanding XPUs and Their Significance

For those involved in building or purchasing AI infrastructure, the emergence of XPU thinking will significantly influence your choices. Specifically, consider:
– A varied mix of accelerators: GPUs for general workloads, NPUs or custom ASICs for efficient low-latency inference, and DPUs for offloading network and storage responsibilities from primary CPUs.
– Focus as much on memory and networking as on sheer processing power. In the realm of AI, bandwidth and collective efficiency are often the defining factors for job duration.
– Dependence on programming models and tools that can target multiple devices. This is why initiatives like oneAPI and the ROCm/CUDA ecosystems are critical.
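The last point can be sketched as a tiny dispatch layer: one numerical routine, several possible backends, chosen at runtime. The backend names and registry below are placeholders in the spirit of oneAPI/ROCm/CUDA portability, not real runtime identifiers:

```python
# Sketch: one numerical routine, multiple backends, chosen at runtime.
BACKENDS = {}

def backend(name):
    """Decorator that registers an implementation under a backend name."""
    def register(fn):
        BACKENDS[name] = fn
        return fn
    return register

@backend("cpu")
def matmul_cpu(a, b):
    # Naive reference implementation; a real CPU path would call into BLAS.
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][x] * b[x][j] for x in range(k)) for j in range(m)]
            for i in range(n)]

def run(*args, preferred=("gpu", "npu", "cpu")):
    """Fall through the preference list to the first available backend."""
    for name in preferred:
        if name in BACKENDS:
            return BACKENDS[name](*args)
    raise RuntimeError("no backend available")

print(run([[1, 2]], [[3], [4]]))  # [[11]]
```

Real frameworks hide this dispatch behind a single tensor API, which is what lets the same model code land on whichever accelerator is available.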

Industry groups use “XPU” as a collective term for accelerators that offload various processes like networking and storage from CPUs, enhancing isolation and freeing resources for core model tasks. Academic research is also exploring XPU-driven offloads to minimize latency and costs for cloud-native services. Expect to see more adoption of this approach in production AI clusters.

Ethernet vs. Proprietary Fabrics: Trade-offs for 2025

  • Performance Headroom: While NVLink and InfiniBand continue to dominate tightly coupled GPU-to-GPU links, greatly benefiting communication-heavy training tasks, Ethernet is closing in with its 800G-class NICs and switches, especially with the addition of collective offloads.
  • Scalability and Openness: Ethernet promotes vendor flexibility and well-known operational practices, crucial for clusters with numerous accelerators. This scalability is bolstered by the expectation that Ethernet will secure most AI back-end investments moving forward.
  • Design Flexibility: An Ethernet-first strategy allows buyers to integrate accelerators, storage, and hosts from various vendors, minimizing risks and increasing supply options as demand fluctuates.

OpenAI’s plan to construct Ethernet-based racks with Broadcom clearly prioritizes scalability and openness.

Changes for ChatGPT Users and Enterprise Adopters

For users, the benefits are clear: faster models, enhanced features, and improved accessibility. For enterprises, the key takeaway is flexibility. The combination of GPUs with custom XPUs, all interconnected via high-performance Ethernet, opens up diverse procurement and implementation paths:
– In the short term, continue utilizing reliable GPU platforms while expanding capacity through current partnerships.
– In the medium term, expect custom racks centered on specialized accelerators to emerge in OpenAI-operated data centers, providing new price-performance ratios for inference-heavy tasks.

Ultimately, the biggest winner is the overall supply of AI capacity: more computing power and diversified interconnects mean more available infrastructure and faster innovation.

Risks and Execution Challenges

  • Silicon Risk: New accelerators must meet performance, power, and yield benchmarks, especially amidst constraints in high-bandwidth memory supplies. Extensive testing and software development cycles can introduce risks.
  • Network Complexity: Expanding Ethernet into the 800G space while ensuring low latency for collective operations presents significant challenges, even with advanced features.
  • Power and Cooling: Rack-scale AI systems typically exceed 100 kW in power needs, necessitating advanced liquid cooling solutions. Data center readiness and grid planning will thus be of heightened concern.
  • Timeline Issues: The first custom racks are set for late 2026, with full deployment running through 2029, making patience a necessity.

While these obstacles aren’t showstoppers, they represent the typical challenges of pushing the boundaries of technology. The motivation behind this endeavor is substantial: enhanced performance per watt and dollar, better control of product roadmaps, and reduced dependence on any single vendor.
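To put the power figures in perspective, a back-of-the-envelope calculation translates the announced 10 gigawatts into rack counts. The per-rack power and PUE below are assumptions for illustration, not disclosed figures:

```python
# Back-of-the-envelope sizing under assumed numbers (illustrative only).
TOTAL_CAPACITY_W = 10e9   # 10 GW announced target
RACK_POWER_W     = 120e3  # assume ~120 kW per liquid-cooled AI rack
PUE              = 1.2    # assumed data center overhead (power usage effectiveness)

it_power = TOTAL_CAPACITY_W / PUE  # power left for the IT load itself
racks = it_power / RACK_POWER_W
print(f"~{racks:,.0f} racks at {RACK_POWER_W/1e3:.0f} kW each")
```

Even under these rough assumptions, the result is tens of thousands of racks, which is why grid planning and liquid cooling are named risks rather than afterthoughts.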

Implications for AMD and Nvidia

  • AMD: With a validated route into OpenAI’s operations via a 6 gigawatt, multi-generation partnership starting with MI450 in late 2026, AMD solidifies its role as a key supplier in OpenAI’s future capacity.
  • Nvidia: Remains the reference point for integrated AI infrastructures, with the current GB200 NVL72 in the market and the forthcoming GB300 NVL72 set to elevate performance benchmarks. For those prioritizing top performance through seamless integration, Nvidia continues to be a frontrunner.
  • Broadcom: Evolves from being merely a component vendor to a co-developer of entire systems for OpenAI, emphasizing Ethernet-first architecture and high-speed optical connections. This transition broadens the competitive landscape from just chips to complete systems and network solutions.

The outlook: AI infrastructure is not a zero-sum endeavor. Different workloads and maturity levels will favor various designs, leading to a blend of choices for buyers.

A Framework for the Coming 5 Years

Visualize three overlapping circles:
1. General-purpose GPU platforms for training and flexible inference, led by Nvidia, with AMD’s footprint growing.
2. Custom XPUs tailored for known workloads, integrated at the rack scale with Ethernet as the primary networking choice.
3. A software layer adept at targeting all types of systems, making decisions based on cost, latency, and availability.

OpenAI’s recent announcements represent significant bets within each circle, which underscores the importance of the Broadcom news: it completes this evolving landscape.

Key Takeaways

  • OpenAI will collaborate with Broadcom to co-develop Ethernet-optimized, rack-scale systems, featuring OpenAI-designed accelerators with a goal of achieving 10 gigawatts between 2026 and 2029.
  • Selecting Ethernet aligns with market trends demonstrating its increasing prevalence in AI back-end networks, alongside Broadcom’s advanced 800G-capable NICs and switches.
  • AMD retains a central role through a multi-generational GPU agreement entailing 6 gigawatts, starting with MI450 in late 2026.
  • Nvidia continues to ship its end-to-end AI systems, with existing GB200 NVL72 platforms in the field and the upcoming GB300 NVL72 on deck.
  • XPUs capture the diverse nature of future computing, allowing for a combination of GPUs, NPUs, DPUs, and custom ASICs while maintaining cohesive software integration and robust Ethernet wiring.

FAQs

What did OpenAI and Broadcom announce?

A multi-year collaboration focused on developing and deploying OpenAI-designed accelerators and Ethernet-based rack systems totaling 10 gigawatts of capacity, with deployments slated for the second half of 2026 through 2029.

Why opt for Ethernet instead of InfiniBand?

Ethernet has rapidly matured and is capturing market share within AI back-end networks, buoyed by 800G NICs and switches, contributing to enhanced interoperability and supplier options crucial for scaling.

How do AMD GPUs fit into OpenAI’s strategy?

OpenAI’s 6 gigawatt partnership with AMD commences with 1 gigawatt of Instinct MI450 GPUs in late 2026, securing AMD’s role as a core supplier alongside custom accelerators.

Will OpenAI cease usage of Nvidia?

No. Nvidia’s integrated platforms remain widely deployed, with the GB200 NVL72 still shipping through partners. OpenAI’s strategy is additive, ensuring the best tools are utilized for each task.

What is an XPU in straightforward terms?

An XPU is a broad term encompassing various compute types, including GPUs, NPUs, DPUs, FPGAs, and custom ASICs, all working together under a unified programming framework. It’s about choosing the optimal compute engine for different workload requirements.

Conclusion

The collaboration between OpenAI and Broadcom marks a pivotal evolution in AI infrastructure, emphasizing the integration of GPUs with custom solutions. This initiative signifies a leap toward co-designing chips and racks rooted in the realities of model architecture, all interconnected by high-performance standardized networking.

With Nvidia’s comprehensive systems, AMD’s expanding GPU influence, and Broadcom’s Ethernet-focused approach, the landscape is set for a variety of competitive offerings. For customers and developers alike, this evolution heralds more computing options, increased architectural flexibility, and an acceleration in innovation across the industry. As the past decade was largely defined by the dominance of GPUs, the upcoming era may be characterized by the diversification of XPUs: heterogeneous, rack-scale architectures fine-tuned for specific tasks.

Thank You for Reading this Blog and See You Soon! 🙏 👋
