The race to power next-generation AI has taken a significant turn, and it's no longer just about GPUs. As coding agents move from experiments to real-world deployment, tech giants are rethinking how AI infrastructure works at scale.
SambaNova Systems has expanded its collaboration with Intel, unveiling a new hardware blueprint designed specifically for agentic AI workloads. The design combines GPUs, Intel's Xeon 6 CPUs, and SambaNova's custom RDUs (Reconfigurable Dataflow Units) to deliver faster, more efficient AI inference in enterprise environments.
This shift comes as coding agents (AI systems that can write, compile, and execute code) push traditional GPU-only setups to their limits. GPUs still handle the initial “prefill” stage, processing large prompts in parallel, but they are no longer enough to manage the full pipeline. The bottleneck now lies in the decode stage, where output tokens are generated one at a time, and in executing the resulting tasks. That is where CPUs and specialized accelerators step in.
According to SambaNova CEO Rodrigo Liang, the winning formula is becoming clear: GPUs start the process, Xeon 6 CPUs orchestrate and execute it, and RDUs finish the job with high-speed token generation. This layered approach is designed to work inside existing data centers, making it easier for enterprises to adopt without major infrastructure changes.
Intel is positioning its Xeon 6 processors at the center of this ecosystem. Built on the widely used x86 architecture, these CPUs handle everything from system orchestration to running agent tools, serving API calls, and executing code. Intel executive Kevork Kechichian emphasized that most enterprise software already runs on Xeon, making it a natural foundation for scaling AI workloads.
The case for this hybrid approach is growing stronger. As AI agents generate and execute more code, demand on infrastructure is rising sharply. Ivan Burazin noted that platforms are already seeing a surge in demand for CPU-powered environments simply to compile and run AI-generated code safely and efficiently.
What makes this architecture stand out is its clear division of labor. GPUs handle the highly parallel prefill stage, RDUs focus on high-throughput, low-latency decoding, and Xeon CPUs act as the control center, managing workflows, distributing workloads, and executing real-time actions. This balance is meant to improve performance and reduce the inefficiencies of systems that rely on a single type of chip.
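To make that division of labor concrete, here is a minimal Python sketch, assuming a simplified agent turn broken into named stages. The stage names, the `Device` enum, and the `route`/`plan` helpers are hypothetical illustrations, not part of any SambaNova or Intel software; a real deployment would schedule work through the vendors' own runtimes and schedulers.

```python
from dataclasses import dataclass
from enum import Enum


class Device(Enum):
    GPU = "gpu"
    RDU = "rdu"
    CPU = "cpu"


# Hypothetical mapping of agent pipeline stages to processor types,
# mirroring the split described above (not a vendor API).
STAGE_TO_DEVICE = {
    "prefill": Device.GPU,      # parallel processing of the whole prompt
    "decode": Device.RDU,       # sequential, latency-sensitive token generation
    "tool_call": Device.CPU,    # compiling/running generated code, calling APIs
    "orchestrate": Device.CPU,  # scheduling and routing work between stages
}


@dataclass
class Task:
    request_id: int
    stage: str


def route(task: Task) -> Device:
    """Return the processor type a stage is assigned to (illustrative only)."""
    try:
        return STAGE_TO_DEVICE[task.stage]
    except KeyError:
        raise ValueError(f"unknown stage: {task.stage}") from None


def plan(request_id: int, stages: list[str]) -> list[tuple[str, Device]]:
    """Assign each stage of one agent turn to a device, preserving order."""
    return [(stage, route(Task(request_id, stage))) for stage in stages]


if __name__ == "__main__":
    # One hypothetical agent turn: plan, read the prompt, write code,
    # run it, then read the tool output and respond.
    turn = ["orchestrate", "prefill", "decode", "tool_call", "prefill", "decode"]
    for stage, device in plan(1, turn):
        print(f"{stage:>11} -> {device.value}")
```

Running it prints one line per stage, showing a single agent turn moving back and forth between processor types. The design choice the sketch highlights is that placement is decided per stage rather than per request, which is what lets each chip type do the part of the work it is best suited for.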
Industry analysts see this as a significant step toward production-ready AI infrastructure. Ian Cutress highlighted that no single processor type can handle every stage of an agentic workflow anymore, and that this split architecture reflects what enterprises have been waiting for: better performance, improved efficiency, and compatibility with existing software ecosystems.
The system is expected to roll out in the second half of 2026, targeting enterprises, cloud providers, and sovereign AI initiatives looking to deploy coding agents at scale. With full support for modern AI software stacks, this solution could redefine how large-scale AI systems are built and deployed.
As AI moves deeper into real-world applications, one thing is becoming clear: future infrastructure won't be about a single powerful chip, but about how intelligently different technologies work together.