The price of a million tokens from a frontier model dropped by roughly ninety percent over the course of 2024. That number should be striking. The same computational capability cost a tenth as much at the end of the year as at the start. The economics of AI applications changed underneath everyone who was building them.
The first and most obvious change was in what became economically viable. Applications that were previously too expensive to run at meaningful scale became affordable: processing every customer support conversation through a capable model, summarising every document in a large repository, analysing every piece of user-generated content. These things were technically possible before the price drop. They were not economically practical. They became so during 2024.
The less obvious change was in system design. When inference was expensive, you designed systems to minimise model calls. You cached aggressively. You routed simpler queries to cheaper models. You worked hard to reduce token counts through careful prompt engineering. When inference becomes cheap, some of those optimisations matter less than they did. The mental overhead of tracking where every token goes changes when the cost of a token drops by a factor of ten.
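The cost-minimising patterns described above can be sketched concretely. This is a hypothetical illustration, not any particular provider's API: `_call_model` is a stub, the prices are invented, and routing on prompt length is a deliberately crude stand-in for real complexity classification.

```python
import hashlib

# Hypothetical per-million-token prices, purely illustrative.
PRICES = {"cheap": 0.15, "frontier": 2.50}

_cache = {}  # aggressive caching: identical prompts never hit the model twice


def _call_model(model, prompt):
    # Stand-in for a real inference call; returns a placeholder answer.
    return f"[{model}] answer to: {prompt[:30]}"


def answer(prompt, complexity_threshold=200):
    """Serve from cache when possible; route simple queries to a cheaper model."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _cache:
        return _cache[key]
    # Crude routing heuristic: short prompts go to the cheap model.
    model = "cheap" if len(prompt) < complexity_threshold else "frontier"
    result = _call_model(model, prompt)
    _cache[key] = result
    return result
```

When inference is cheap, each layer here becomes optional rather than essential; the point of the sketch is how much machinery expensive inference once justified.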
What did not change is that latency still matters. Users are not patient. Even if an operation costs essentially nothing, if it takes eight seconds the user experience suffers. The architectural patterns around parallelism, streaming responses, and progressive loading remain important even as cost considerations ease.
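The parallelism point can be made concrete with a small sketch. This assumes independent model calls behind hypothetical `fetch_section` stubs; the idea is simply that concurrent dispatch makes total latency the slowest call rather than the sum of all calls.

```python
import asyncio
import time


async def fetch_section(name, delay):
    # Stand-in for a model call; the delay simulates inference latency.
    await asyncio.sleep(delay)
    return f"{name} done"


async def build_page():
    # Issue the independent calls concurrently instead of sequentially,
    # so three 100 ms calls cost ~100 ms of wall time, not ~300 ms.
    return await asyncio.gather(
        fetch_section("summary", 0.1),
        fetch_section("details", 0.1),
        fetch_section("related", 0.1),
    )


start = time.perf_counter()
results = asyncio.run(build_page())
elapsed = time.perf_counter() - start
```

Streaming and progressive loading extend the same principle to perceived latency: show the first tokens while the rest are still arriving.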
I noticed a shift in how product teams talked about AI features through this period. Early in 2024, those conversations tended to include cost modelling from the outset. By late 2024 they had changed. Teams were more likely to prototype freely, see what was useful, and worry about the economics later. That shift in mindset reflects the changed cost reality but also carries its own risks. Building habits of free experimentation with inference is sensible. Building habits of inattention to cost structures while costs are low can produce surprises when you scale.
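The scaling surprise is easy to see with back-of-envelope arithmetic. The numbers below are invented for illustration: a hypothetical $2 per million tokens and 2,000 tokens per call.

```python
def monthly_cost(calls_per_day, tokens_per_call, price_per_million_tokens):
    """Back-of-envelope monthly inference bill, assuming a 30-day month."""
    tokens_per_month = calls_per_day * 30 * tokens_per_call
    return tokens_per_month / 1_000_000 * price_per_million_tokens


# A single call feels free: 2,000 tokens at $2/M is $0.004.
per_call = monthly_cost(1, 2_000, 2.0) / 30

# The same feature triggered 500,000 times a day is $60,000 a month.
at_scale = monthly_cost(500_000, 2_000, 2.0)
```

Nothing about the per-call price warns you about the per-month bill; only the multiplication does.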
The competition between model providers that drove these price reductions has not stopped. The open-source models available in mid-2025 run on hardware that was considered impractical for that level of capability a year earlier. The trajectory suggests that frontier-level intelligence will continue to get cheaper. The downstream effects on what gets built, and how, are still working their way through the industry.