The artificial intelligence industry's shift from model training to real-world deployment will not ease pressure on computing infrastructure — it will intensify it, according to a major new report from Deloitte published on March 18.
The consulting firm's 2026 Technology, Media & Telecommunications Predictions challenge a widely held assumption that the move toward AI inference would reduce the industry's appetite for computing power and data center capacity.
The Inference Paradox
Deloitte projects that inference — the process of running trained AI models to generate outputs — will account for roughly two-thirds of all AI computing by 2026, up from about one-third in 2023. But rather than easing infrastructure demands, this shift is creating what the firm describes as an inference paradox.
Two emerging techniques are driving the surge. Post-training scaling methods, which refine models after initial training, can consume approximately 30 times the computing resources needed to train the original model. Test-time scaling, where models perform additional computation during each query to improve response quality, can require more than 100 times the computing power of a basic inference task.
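To see why those multipliers matter, a quick back-of-envelope sketch helps. The baseline training and per-query figures below are hypothetical placeholders; only the 30x and 100x factors come from the report.

```python
# Back-of-envelope comparison of the two scaling techniques' compute costs.
# BASE_TRAINING_FLOPS and BASIC_INFERENCE_FLOPS are hypothetical placeholders;
# only the 30x and 100x multipliers are taken from the report.

BASE_TRAINING_FLOPS = 1e24     # assumed compute to train the original model
BASIC_INFERENCE_FLOPS = 1e15   # assumed compute for one basic inference query

post_training_cost = 30 * BASE_TRAINING_FLOPS    # post-training scaling, ~30x
test_time_cost = 100 * BASIC_INFERENCE_FLOPS     # test-time scaling, >100x

print(f"Post-training scaling: {post_training_cost:.1e} FLOPs (one-off)")
print(f"Test-time scaling:     {test_time_cost:.1e} FLOPs (per query)")

# The per-query cost recurs on every request, so at millions of queries a day
# the aggregate inference bill can quickly exceed the original training cost.
```

The key distinction is that training is a one-time expense while test-time scaling is paid on every query, which is what turns an efficiency-era expectation into a capacity crunch.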
The Numbers
The financial implications are staggering. Deloitte estimates global spending on AI data centers will reach approximately $400 billion in 2026, with that figure potentially climbing to $1 trillion annually by 2028.
High-performance AI chips, which can cost more than $30,000 each, are expected to account for around $200 billion in spending this year alone, with the overall AI chip market projected to surpass $400 billion by 2028.
The market for inference-optimized chips — a category that includes products from companies like Groq and custom silicon from cloud providers — will grow to over $50 billion in 2026. However, Deloitte stresses this will supplement rather than replace demand for high-end GPUs.
Where Inference Actually Happens
Contrary to expectations that AI inference would migrate to edge devices and consumer hardware, most inference workloads will continue running in data centers and on-premises enterprise servers. The power, memory, and latency requirements of advanced AI models make edge deployment impractical for the majority of enterprise use cases.
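A rough sizing exercise shows the memory constraint alone. The parameter count and precision below are illustrative assumptions, not figures from the report.

```python
# Illustrative estimate of the memory needed just to hold a model's weights.
# The 70B parameter count and 16-bit precision are assumptions for the example.

params = 70e9          # a 70-billion-parameter model (hypothetical)
bytes_per_param = 2    # FP16/BF16 weights

weight_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weight_gb:.0f} GB")  # ~140 GB

# Typical edge hardware ships with 8-16 GB of memory, so even before counting
# the KV cache and activations, a model of this size stays in the data center
# unless it is aggressively quantized or distilled.
```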
"The world likely needs all the data centres and enterprise on-premises AI factories that are currently being planned and all the electricity that these facilities will need," the report states.
What It Means
For investors and infrastructure planners, the message is clear: the AI computing buildout is far from peaking. Despite efficiency improvements in chip design and model architecture, demand for AI computing is growing by a factor of four to five each year and is expected to maintain that pace through 2030, with significant implications for global energy consumption and capital allocation.
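Compounding makes that trajectory concrete. Treating "four to five times each year" as an annual multiplier is our reading of the figure, not a calculation from the report.

```python
# Compounding a 4-5x annual growth rate in AI compute demand, 2026 -> 2030.
# Interpreting the growth figure as a yearly multiplier is an assumption.

for rate in (4, 5):
    multiple = rate ** 4  # four compounding years between 2026 and 2030
    print(f"At {rate}x per year, 2030 demand is {multiple}x the 2026 level")

# Prints 256x and 625x, which is why energy supply and capital allocation
# dominate the planning conversation.
```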