Gartner Inc. projects that running inference on a large language model (LLM) with 1 trillion parameters will cost GenAI providers over 90% less by 2030 compared with 2025, driven by improvements in chips, infrastructure, and model design.

The research and consulting firm defines inference as the process by which AI models generate responses after processing inputs, often measured in “tokens,” which represent chunks of data. In Gartner's analysis, one token is roughly 3.5 bytes, or about four characters.
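The characters-per-token figure gives a quick back-of-the-envelope way to size a workload. A minimal sketch, assuming Gartner's rough four-characters-per-token rule (real tokenizers vary by model, and the function name here is illustrative):

```python
# Rough token estimate from text length, using the article's rule of thumb
# of about four characters per token. Actual tokenizer output will differ.
def estimate_tokens(text: str) -> int:
    # Round to the nearest whole token; treat any non-empty input as >= 1 token.
    return max(1, round(len(text) / 4))

prompt = "Explain the projected decline in LLM inference costs."
print(estimate_tokens(prompt), "tokens (approx.)")
```

At Gartner's stated 3.5 bytes per token, the same estimate can be expressed in bytes rather than characters; for plain ASCII text the two are nearly interchangeable.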

Gartner said several factors will contribute to the steep decline in costs, including more efficient semiconductors, higher chip utilization, model design innovations, and the growing use of hardware designed specifically for AI inference workloads. Edge devices, which process data closer to where it is generated, are also expected to play a role in certain use cases.

“These cost improvements will be driven by a combination of semiconductor and infrastructure efficiency improvements, model design innovations, higher chip utilization, increased use of inference-specialized silicon, and application of edge devices for specific use cases,” said Will Sommer, senior director analyst at Gartner.

Gartner also forecasts that by 2030, LLMs could be up to 100 times more cost-efficient than models of similar size introduced in 2022.

The report outlines two scenarios used in its projections. In “frontier” scenarios, models run on cutting-edge chips, while “legacy blend” scenarios use a mix of available semiconductor technologies. Costs in the legacy blend setup are significantly higher because the older chips deliver lower computing performance than advanced ones.

Despite falling costs per token, Gartner noted that savings may not fully translate to lower prices for enterprise customers. Advanced AI systems are expected to use far more tokens per task than current tools. For example, agentic AI models, which can perform complex multi-step tasks, may require 5 to 30 times more tokens than standard chatbot interactions.

This means overall usage could rise faster than per-token costs decline, potentially increasing total spending on AI inference even as unit costs fall.

“Chief Product Officers (CPOs) should not confuse the deflation of commodity tokens with the democratization of frontier reasoning,” Sommer said. “As commoditized intelligence trends toward near-zero cost, the compute and systems needed to support advanced reasoning remain scarce. CPOs who mask architectural inefficiencies with cheap tokens today will find agentic scale elusive tomorrow.”

Gartner added that future value will likely shift toward platforms that can manage and distribute workloads across different types of AI models. Routine tasks are expected to run on smaller, more efficient models, while complex reasoning will rely on more powerful systems that are used selectively because of their higher cost.
