
As AI infiltrates more and more business operations, enterprise IT teams are under pressure to ensure their systems, applications, and networks are resilient enough to absorb the impact. At the same time, cloud service providers and companies supporting the global Internet infrastructure—on which enterprises heavily rely—need to make sure they can handle the AI-fueled surge in demand. If either effort falls short, the result could be outages that hamper business operations worldwide.
AI’s growing impact
The use of AI tools and applications is rising dramatically. In the past two years, the percentage of U.S. employees who say they have used AI in their role a few times a year or more has nearly doubled, from 21% to 40%, according to a Gallup poll of about 19,000 people. Frequent AI use—a few times a week or more—has also nearly doubled, from 11% to 19% since Gallup’s first measure in 2023.
AI is causing increases in both network traffic volume and volatility, says Nik Kale, principal engineer and product architect for cloud security and AI platforms at Cisco. “As enterprises begin embedding AI into their customer-facing applications, internal systems, and productivity tools, the number of concurrent inference requests increases dramatically,” Kale says.
Retrieval-heavy architectures such as retrieval-augmented generation (RAG)—an AI framework that boosts large language models by first retrieving relevant, current information from external sources—create significant network traffic because data is moving across regions, object stores, and vector indexes, Kale says.
“Agent-like, multi-step workflows further amplify this by triggering an additional set of retrievals and evaluations at each step,” Kale says. “All of these patterns create fast and unpredictable bursts of network traffic that today’s networks were never designed to handle. These trends will not abate, as enterprises transition from piloting AI services to running them continually.”
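The amplification Kale describes can be illustrated with simple arithmetic: each step of an agentic workflow issues its own batch of retrievals and evaluations, so backend request volume grows multiplicatively compared with a single RAG call. This is an illustrative sketch, not a real framework; the step and retrieval counts are hypothetical.

```python
def retrieval_requests(steps: int, retrievals_per_step: int, evals_per_step: int) -> int:
    """Count the backend requests a multi-step agent workflow generates:
    every step triggers its own retrievals plus evaluation calls."""
    return steps * (retrievals_per_step + evals_per_step)

# A single RAG call versus a six-step agentic workflow over the same data.
single_rag = retrieval_requests(steps=1, retrievals_per_step=4, evals_per_step=0)
agentic = retrieval_requests(steps=6, retrievals_per_step=4, evals_per_step=2)
print(single_rag, agentic)  # 4 36
```

Even with modest per-step counts, the agentic workflow generates nine times the backend traffic of the single call, and those bursts arrive in rapid, unpredictable succession.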
Many organizations today depend on real-time, AI-enabled services for tasks such as fraud detection, behavioral analytics, operational forecasting, and security incident response, Kale says.
“When AI pipelines slow down or traffic overloads common infrastructure, business processes slow down, and customer experience degrades,” Kale says. “Since many organizations are using AI to enable their teams to make critical decisions, disruptions caused by AI-related failures will be experienced instantly by both internal teams and external customers.”
A single bottleneck can quickly cascade through an organization, Kale says, “reducing the overall value of the broader digital ecosystem.”
In 2026, “we will see significant disruption from accelerated appetite for all things AI,” research firm Forrester noted in a late-year predictions post. “Business demands of AI systems, network connectivity, AI for IT operations, the conversational AI-powered service desk, and more are driving substantial changes that tech leaders must enable within their organizations.”
And in a 2025 study of about 1,300 networking, operations, cloud, and architecture professionals worldwide, Broadcom noted a “readiness gap” between the desire for AI and network preparedness. While 99% of organizations have cloud strategies and are adopting AI, only 49% say their networks can support the bandwidth and low latency that AI requires, according to Broadcom’s 2026 State of Network Operations report.
“AI is shifting Internet traffic from human-paced to machine-paced, and machines generate 100 times more requests with zero off-hours,” says Ed Barrow, CEO of Cloud Capital, an investment management firm focused on acquiring, managing, and operating data centers.
“Inference workloads in particular create continuous, high-intensity, globally distributed traffic patterns,” Barrow says. “A single AI feature can trigger millions of additional requests per hour, and those requests are heavier—higher bandwidth, higher concurrency, and GPU-accelerated compute on the other side of the network.”
If the infrastructure supporting global business becomes unstable under AI-driven loads, “the impact can be far reaching,” Barrow says. “Examples might include downtime in revenue systems, broken supply chains, failed authentication or payments, or model outages that paralyze operations.”
Think of AI as creating systemic load risk, Barrow says. Every enterprise depends on shared networks—cloud providers, content delivery networks, domain name systems, transit networks, for example. “When those shared layers buckle, it cascades everywhere,” he says. “Cloud is no longer just a technical cost center; it’s a strategic liability that hits gross margins, continuity, and valuation if not actively managed.”
How enterprises can prepare
Organizations need to take steps, if they haven’t already, to make their networks, systems, and applications more resilient against the AI-caused disruptions that impact service providers and others.
Enterprises need to treat AI-based workloads as a distinct type of application from traditional workloads, Kale says. “The first thing companies need to achieve is a greater understanding of how AI-based workloads generate traffic and where bottlenecks exist,” he says. “Without a sense of this, no resilience strategy can be developed.”
Another requirement is to understand how to predict traffic patterns, “which requires better traffic shaping, rate limiting, and workload separation to prevent AI-based traffic surges from impacting unrelated systems,” Kale says. And yet another area organizations need to address is minimizing unnecessary cross-region data movement by optimizing retrieval paths and moving data closer to AI models to both improve performance and enhance fault tolerance, he says.
Organizations also need to consider implementing traffic filtering and intelligent rate limiting, which uses AI and predictive analysis to dynamically detect sophisticated attacks such as distributed denial of service (DDoS), scraping, and other cybersecurity threats by analyzing traffic patterns, device signals, and user behavior, says Shaila Rana, cofounder of cybersecurity and AI think tank ACT Research Institute and an IEEE senior member.
“Don’t treat all traffic the same,” Rana says. “Use systems that can identify and categorize different types of requests in real-time. Legitimate AI agents should identify themselves properly, but many don’t. Build rules that detect unusual patterns like thousands of requests from a single source in seconds.”
This protects infrastructures from being overwhelmed by aggressive scrapers or poorly designed AI systems, Rana says. “For example, if your API suddenly gets hit with 10,000 requests per minute when normal traffic is 100, you need automatic throttling that kicks in before your servers crash,” she says. This isn’t about blocking AI entirely, she says, but managing it more effectively so there is no negative impact on employees or customers.
Another important practice is to build redundancy and failover systems into the enterprise IT architecture, Rana says. “Diversify your tech stack; don’t rely on a single cloud provider or data center,” she says. “Distribute your services across multiple regions and providers. When one gets overwhelmed by AI traffic spikes, your systems automatically route to alternatives.”
This is vital because AI-driven disruptions often cascade, Rana says. So if one provider goes down, traffic floods to the next, creating a domino effect. “Some companies already do this well, where they can lose entire data centers without users noticing because traffic seamlessly shifts elsewhere,” she says. “It’s more expensive upfront, but far cheaper than losing business during an outage. Also, make sure you test those failovers regularly. Don’t wait for a crisis to discover they don’t work.”
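A minimal sketch of the failover logic behind that advice: try each provider’s health check in order and route to the first healthy one. The provider names and health-check callables here are hypothetical stand-ins for real endpoints.

```python
from typing import Callable

def pick_endpoint(providers: list[tuple[str, Callable[[], bool]]]) -> str:
    """Return the first provider whose health check passes."""
    for name, healthy in providers:
        try:
            if healthy():
                return name
        except Exception:
            continue  # a failing check is treated like an unhealthy provider
    raise RuntimeError("all providers are down; page the on-call team")

providers = [
    ("cloud-a.us-east", lambda: False),  # overwhelmed by an AI traffic spike
    ("cloud-b.eu-west", lambda: True),   # healthy fallback in another region
]
print(pick_endpoint(providers))  # cloud-b.eu-west
```

Running this check regularly in automated tests, with providers deliberately marked unhealthy, is one way to act on Rana’s advice to verify failovers before a crisis.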
Investing in real-time monitoring and predictive analytics is also key, Rana says. “You need visibility into your traffic patterns and the ability to predict problems before they happen,” she says. “Use AI to fight AI, so deploy systems that learn normal traffic behavior and alert you to anomalies immediately. This gives you time to respond before a small problem becomes a catastrophic failure.”
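One simple way to “learn normal traffic behavior and alert on anomalies,” as Rana suggests, is a rolling-baseline z-score: flag any sample that deviates sharply from the recent average. The window size and threshold below are illustrative assumptions.

```python
import statistics

def detect_anomalies(traffic: list[float], window: int = 10, z: float = 3.0) -> list[int]:
    """Flag indexes that deviate more than `z` standard deviations
    from the rolling baseline of the previous `window` samples."""
    alerts = []
    for i in range(window, len(traffic)):
        baseline = traffic[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline) or 1.0  # avoid divide-by-zero on flat traffic
        if abs(traffic[i] - mean) / stdev > z:
            alerts.append(i)
    return alerts

# Steady ~100 req/min with one AI-driven surge at index 15.
traffic = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101,
           99, 100, 102, 98, 100, 10_000]
print(detect_anomalies(traffic))  # [15]
```

Production systems would typically use learned seasonal baselines rather than a flat rolling window, but the principle is the same: establish what normal looks like, then alert the moment traffic departs from it.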
Include monitoring of service providers as well, Rana says. “If your cloud provider is experiencing AI-driven stress, you need to know immediately so you can activate backup plans,” she says.
Technology leaders need to assume that rapid growth in AI usage and demand is a given and apply discipline to predicting spikes in traffic.
“The number one failure mode in AI infrastructure today is underestimating how fast demand scales,” Barrow says. AI workloads grow linearly with adoption rather than flattening out the way traditional software usage does, he says. Enterprises need to adopt forecasting models that are tied to real business metrics such as user sessions, API calls, and AI transactions.
“If you’re not running best/base/worst-case scenarios that model two times to five times usage spikes, you’re effectively flying blind,” Barrow says. “Forecast AI demand like a financial risk.”
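The scenario modeling Barrow recommends can be reduced to a small capacity-planning calculation: project organic growth from a business metric, then layer on the 2x and 5x spike multipliers he cites. The baseline and growth figures below are hypothetical examples.

```python
def capacity_scenarios(baseline_rps: float, growth_per_month: float, months: int) -> dict[str, float]:
    """Project request volume under best/base/worst-case spike multipliers
    (the 2x and 5x multipliers follow the rule of thumb quoted in the text)."""
    projected = baseline_rps * (1 + growth_per_month) ** months
    return {
        "best": projected,        # organic growth only, no spike
        "base": projected * 2.0,  # 2x usage spike
        "worst": projected * 5.0, # 5x usage spike
    }

# e.g. 1,000 req/s today, 10% monthly growth, planning six months out
scenarios = capacity_scenarios(1_000, 0.10, 6)
print({k: round(v) for k, v in scenarios.items()})
```

Tying `baseline_rps` and `growth_per_month` to measured business metrics (user sessions, API calls, AI transactions) rather than gut feel is what turns this from guesswork into the financial-risk-style forecast Barrow describes.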
What global Internet infrastructure providers need to do
The major cloud service providers (Amazon Web Services, Microsoft Azure, and Google Cloud), as well as companies supporting the global Internet infrastructure, will also need to adapt their environments to handle AI-related increases in demand.
“The Internet’s core infrastructure needs significant upgrades to handle this new reality,” Rana says. “We need dramatically increased bandwidth capacity at every level, not just in major data centers but also in the backbone networks that connect them.”
Current infrastructure was sized for human traffic patterns with predictable peaks and valleys, Rana says. “AI traffic doesn’t follow those patterns,” she says. “It’s constant, massive, and unpredictable. We also need smarter routing systems that can dynamically respond to traffic surges in real-time, not just follow static rules.”
Infrastructure providers need to deploy GPU capacity at the edge, AI-aware routing, and far more route diversity to handle continuous, high-intensity demand, Barrow says. GPU capacity is the new scarce asset, he says.
To improve network resilience, operators can increasingly integrate AI capabilities to enhance DDoS mitigation, says Mattias Fridström, chief evangelist and vice president of Arelion, a global provider of Internet connectivity. They can also leverage the massive volumes of global traffic data to gain more granular visibility into that traffic, detect anomalies, and anticipate and prevent outages before they occur, Fridström says. “Ultimately, a scalable, flexible network is the best way to survive traffic spikes,” he says.
As inference-based workloads consolidate in a few select cloud regions, they place an increasingly heavy burden on the global backbone and interconnects, Kale says.
“With the growth of multi-modal models and the increased use of video and high-dimensional data, the burden on core networks will continue to grow,” Kale says. “To maintain resiliency during AI-driven traffic surges, service providers will require more distributed inference capabilities, greater regional redundancy, and more sophisticated congestion management technologies.”
Cloud-based network operators face the most significant challenge, Kale says, because traffic surges from AI-based workloads tend to be correlated by time zone and geography, driven by simultaneous global events such as the launch of a new AI feature or a large-scale rollout.
“To maintain resilience, cloud operators need greater bandwidth headroom, better workload placement, stronger tenant isolation, and monitoring tailored to the unique characteristics of AI traffic,” Kale says. “Those cloud operators who can deliver low latency and reliable performance during large AI surges will establish themselves as the preferred choice for enterprises that rely on AI to drive operational and customer-facing workflows.”
Cloud operators must also rethink their capacity planning, Rana says. “The old models based on gradual growth and predictable usage patterns don’t work anymore,” she says. “They need dynamic scaling systems that can provision resources in seconds, not minutes or hours, when AI traffic surges hit. This means keeping significantly more reserve capacity available than traditional models would suggest.”
