(Photo Illustration by Omar Marques/SOPA Images/LightRocket via Getty Images)
Over the past two years, the artificial intelligence trade has revolved around one central bet: companies would need far more computing power to train larger models.
That put GPUs, or graphics processing units, in the spotlight.
These chips can handle many calculations at once, making them essential for training large AI models. Since GPUs require vast physical infrastructure to operate at scale, the rush to secure chips quickly became a fight for data center space, power access and broader capacity.
Investors responded by pouring money into the businesses that powered this entire buildout. That trade has done well and likely has room to run. As the market looks beyond the training-fueled boom, though, the opportunity set may begin to widen.
The reason is that AI models do not create much value simply by existing. They only do so when people and businesses use them. That moves the discussion from training to inference, which is the process of running trained models to answer questions, complete tasks or power applications. For investors, the difference is not academic.
Training needs enormous computing power while models are being built. Inference, by contrast, depends on steady capacity as AI spreads through search, software, customer service, coding and other workflows. That brings CPUs, or central processing units, back into the discussion because they help coordinate work across computing resources.
That would mark a notable turn. CPUs were long the workhorse of computing before GPUs seized the spotlight during the training boom. Now, CPUs may have a larger role again, not by replacing GPUs, but by helping to manage the steady flow of AI work running across servers, cloud platforms and data centers.
The cost of running AI models could make the inference phase even more compelling for investors. Tokens are the small pieces of text or data an AI model uses to generate a response. As hardware improves, companies appear to be producing each token at lower cost, allowing expensive chips to do more work.
At the same time, demand for tokens is likely to rise as AI agents become more common. Rather than answering a single question and stopping, agents can work through several steps before completing a task. That could drive far more usage across AI systems.
That combination matters for the hyperscalers. If token costs fall while usage grows and pricing holds, companies building AI infrastructure may earn a wider spread. In that case, spending on chips, data centers and power begins to look less like a speculative bet and more like the foundation for a larger operating business.
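A rough way to see the spread argument is to put numbers on it. The sketch below is illustrative only: the function name `gross_spread` and every figure in it (token volumes, price and cost per million tokens) are hypothetical assumptions, not data from any company.

```python
def gross_spread(tokens_served, price_per_million, cost_per_million):
    """Return revenue minus serving cost, in dollars, for a given token volume."""
    millions_of_tokens = tokens_served / 1_000_000
    return millions_of_tokens * (price_per_million - cost_per_million)


# Hypothetical "before" case: 1 billion tokens, $2.00 price, $1.50 cost per million.
before = gross_spread(1_000_000_000, price_per_million=2.0, cost_per_million=1.5)

# Hypothetical "after" case: usage triples, pricing holds, cost per million falls to $1.00.
after = gross_spread(3_000_000_000, price_per_million=2.0, cost_per_million=1.0)

print(f"Spread before: ${before:,.2f}")
print(f"Spread after:  ${after:,.2f}")
```

In this made-up case the spread widens both because each token costs less to serve and because more tokens are served. If pricing fell along with costs, the result could look quite different.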
That broader demand is already showing up in how chip companies describe the inference market. Intel and Arm have both highlighted the growing role of CPUs as inference increases. Intel, for example, has said AI server configurations could shift from roughly eight GPUs for every CPU to about four GPUs for every CPU as inference demand grows. If that forecast proves accurate, it would support the broader point: inference could push AI spending beyond GPUs and deeper into CPUs, servers and the systems needed to run models at scale.
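The ratio shift is easier to picture with simple arithmetic. The sketch below assumes a hypothetical fleet of 1,024 GPUs and applies the two ratios cited above; the fleet size and the helper name `cpus_needed` are illustrative assumptions, not figures from Intel.

```python
import math


def cpus_needed(gpu_count, gpus_per_cpu):
    """Return the number of CPUs needed to pair with a GPU fleet at a given ratio."""
    return math.ceil(gpu_count / gpus_per_cpu)


fleet_gpus = 1024  # hypothetical fleet size, chosen only for illustration

cpus_at_8_to_1 = cpus_needed(fleet_gpus, 8)  # roughly eight GPUs per CPU
cpus_at_4_to_1 = cpus_needed(fleet_gpus, 4)  # about four GPUs per CPU

print(f"CPUs at 8:1 -> {cpus_at_8_to_1}")
print(f"CPUs at 4:1 -> {cpus_at_4_to_1}")
```

Under these assumptions, the same GPU fleet would call for twice as many CPUs at the lower ratio, which is the mechanism behind the idea that inference could spread spending beyond GPUs.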
Servers may also become more important. The largest hyperscalers can design custom systems and work directly with global suppliers. Smaller cloud providers and neo-clouds built for inference often need equipment they can deploy quickly and support easily. That could help companies such as Dell and HPE, which sell the servers that carry AI workloads.
Notably, many companies are still preparing for broader AI use. They need to clean up data and connect systems before they can deploy agents across their businesses. That work takes time, but it also suggests inference demand could keep building as more companies move from preparation to real use.
Ultimately, this is not an argument against the GPU-driven trade. It is an argument that inference could spread the next phase of AI spending across a wider set of companies. If models are going to run constantly across real workflows, investors will need to look beyond the companies that trained them and toward the businesses that keep them running.

