
Cloud-based GPU computing has dropped in price over the past year, and customers who can be agile about how they use that compute power stand to capture real savings.
Cast AI, developer of an application performance automation platform, issued a report that is a deep dive into the evolving economics of cloud-based compute powered by Nvidia’s A100 and H100 GPUs, analyzing real-world pricing and availability across the top three cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
Laurent Gil, CEO of Cast, said the data shows that while a handful of major players—such as OpenAI, Meta, Google, and Anthropic—continue to dominate model training, smaller startups are increasingly focused on inference workloads that drive immediate business value.
“What we’re seeing now is that the real business of AI is in inference,” he explained. “This marks a transition from hype to reality.”
One of the report's first findings was that the price of a high-demand AWS H100 GPU Spot Instance (p5.48xlarge) plummeted by as much as 88% in one region, falling from $105.20 in January 2024 to $12.16 by September 2025. H100 prices in Europe fell by up to 48%, alongside nearly 2x efficiency gains during peak windows.
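As a back-of-the-envelope check (not part of the report itself), the two prices quoted above are consistent with the roughly 88% figure:

```python
# Quick sanity check of the headline drop using the figures quoted above:
# $105.20 in January 2024 down to $12.16 by September 2025.
old_price, new_price = 105.20, 12.16
drop = (old_price - new_price) / old_price
print(f"{drop:.1%}")  # 88.4% -- in line with the "as much as 88%" claim
```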
“This trend suggests cloud providers may have more capacity than expected,” he noted, emphasizing that the decline appears across several providers, not just Amazon. “It’s possible they simply have more inventory than they need.”
The pattern points to an evolving GPU ecosystem: while top-tier chips like Nvidia’s new GB200 Blackwell processors remain in extremely short supply, older models such as the A100 and H100 are becoming cheaper and more available. Yet, customer behavior may not match practical needs. “Many are buying the newest GPUs because of FOMO—the fear of missing out,” he added. “ChatGPT itself was built on older architecture, and no one complained about its performance.”
Gil emphasized that managing cloud GPU resources now requires agility, both operationally and geographically. Spot capacity fluctuates hourly or even by the minute, and availability varies across data center regions. Enterprises willing to move workloads dynamically between regions—often with the help of AI-driven automation—can achieve cost reductions of up to 80%.
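To make the idea concrete, here is a minimal sketch (not Cast's product; all region names, prices, and availability flags are made up for illustration) of the decision a scheduler would have to make continuously: pick the cheapest region that currently has spot capacity.

```python
# Minimal sketch: choose the cheapest region with available GPU spot capacity.
# Region names and prices below are hypothetical, not real quotes.
from dataclasses import dataclass

@dataclass
class RegionQuote:
    region: str
    hourly_spot_price: float   # USD per instance-hour
    capacity_available: bool

def cheapest_available(quotes: list[RegionQuote]) -> RegionQuote | None:
    """Return the lowest-priced region that currently has spot capacity."""
    candidates = [q for q in quotes if q.capacity_available]
    return min(candidates, key=lambda q: q.hourly_spot_price, default=None)

if __name__ == "__main__":
    quotes = [
        RegionQuote("us-east-1", 40.00, True),    # hypothetical numbers
        RegionQuote("eu-west-1", 25.50, True),
        RegionQuote("ap-south-1", 18.75, False),  # cheap, but no capacity right now
    ]
    best = cheapest_available(quotes)
    if best:
        print(f"Schedule the workload in {best.region} at ${best.hourly_spot_price}/hr")
    else:
        print("No spot capacity available; fall back to on-demand")
```

In practice the quotes would come from the providers' spot-price APIs and the rescheduling itself would be automated; running this comparison continuously, rather than once a quarter, is the agility Gil is describing.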
“If you can move your workloads where the GPUs are cheap and available, you pay five times less than a company that can’t move,” he said. “Human operators can’t respond that fast; automation is essential.”
Conveniently, Cast sells an AI automation solution. But it is not the only vendor in the space, and the argument stands on its own: if cheaper spot pricing is available in another region, you want to take it to keep the cloud bill down.
Gil concluded by urging engineers and CTOs to embrace flexibility and automation rather than lock themselves into fixed regions or infrastructure providers. “If you want to win this game, you have to let your systems self-adjust and find capacity where it exists. That’s how you make AI infrastructure sustainable.”
