How a government contest launched a revolution in AI-based bug hunting

While the world alternates between panicking and fawning over Anthropic’s powerful new AI model Claude Mythos and its ability to discover serious software vulnerabilities, open-source AI systems are already revolutionizing the vulnerability-hunting landscape — at a far lower cost.

These increasingly sophisticated open-source tools are the product of the Defense Advanced Research Projects Agency’s (DARPA) Artificial Intelligence Cyber Challenge, a multiyear effort to spur the development of AI systems that can quickly find and fix bugs in America’s sprawling web of critical infrastructure. The vulnerability-hunting systems that emerged from DARPA’s contest didn’t get splashy launches like Claude Mythos or OpenAI’s similar new tool, but because they’re open source and much cheaper to run, they could help far more infrastructure providers, businesses and independent software developers.

With the DARPA competition in the rear-view mirror, the winning teams and other finalists are putting what they learned into practice to help secure open-source packages that quietly undergird the entire internet. While efforts to connect with critical infrastructure operators and their vendors remain nascent, DARPA and several competition winners told Cybersecurity Dive they’re thrilled with how effective the new AI tools have proven.

At a time when the U.S. cybersecurity workforce is stretched thin and adversaries are using AI to speed up their attacks, the nation’s best hope could be automated tools that find and help fix vulnerabilities before they lead to chaos.

Finding bugs everywhere

After DARPA announced its challenge’s three winners in August 2025, it created a $1.4 million bonus prize pot for competition finalists who used their AI systems to find and fix vulnerabilities in critically important software. The agency reviewed teams’ proposals to scrutinize important open-source packages and tracked how they engaged with the projects’ maintainers. Each of the seven competition finalists could earn up to $200,000, with a maximum of $10,000 per project.

By the time the paid vulnerability-hunting spree ended in March, the teams had found 83 vulnerabilities in more than 30 commercial and open-source projects, including Android, Linux, the popular database engine SQLite and the widely used data-storage tool Redis. Of the $1.4 million in prize money the government set aside, it awarded $830,000.

Since then, “the teams have continued to find and produce patches for additional vulnerabilities across other projects,” said Andrew Carney, the DARPA program manager who oversaw the competition and now liaises with teams in the new phase of their work.

Team Atlanta, the DARPA contest’s winner, found flaws in the U-Boot boot loader and several core Apache libraries, he said, while another finalist, 42-b3yond-6ug, identified vulnerabilities in the Linux kernel that could have let hackers cripple devices widely embedded in critical infrastructure.

Theori, the third-place team, has deployed its system, Xint, to find flaws in “all sorts of really widely used open-source projects that everyone on the internet relies on,” said Tyler Nighswander, a researcher at the company. Xint has identified vulnerabilities in the popular database tools Redis, Postgres and MariaDB, as well as Python, Linux and Apple’s XNU kernel, which powers macOS and the iPhone.

AI has been particularly useful for finding “logic bugs,” flawed code that traditional vulnerability-assessment software wouldn’t flag as defective. As AI gets better at understanding context, “automated tools are able to push their boundaries more,” said Michael Brown, principal security engineer at second-place team Trail of Bits.

And while finding flaws gets most of the attention, the AI systems’ ability to validate their findings and their automatically generated fixes might be their real superpower.

Critical infrastructure organizations typically run highly customized hardware and software and often struggle to test patches on their bespoke devices, if they can patch them at all. Because the vulnerability hunters had to develop patch-validation capabilities for DARPA’s competition, their latest systems contain those features.

“There’s so much power there, and there’s so much value for safety-critical, high-assurance-required systems,” Carney said. “When we do have these conversations [with infrastructure organizations], that’s where we end up spending a lot of our time.”

Critical infrastructure roadblocks

While the teams have been busy fixing core internet architecture, DARPA has also tried to connect them with the operators of critical infrastructure and the vendors who supply their equipment. Much of America’s computerized industrial machinery is old, insecure and poorly maintained, and DARPA hopes volunteer vulnerability hunters can root out major flaws before hackers exploit them.

DARPA has briefed several sector coordinating councils (SCCs) on the AI tools’ potential. Carney recently spoke at a meeting of the Health Sector Coordinating Council to share the competition teams’ progress. “Forums like that are what we’re really focused on,” he said, “because they’re very efficient at getting the message out.”

The introductions that DARPA has facilitated between vulnerability hunters and critical infrastructure operators have had “varying degrees of success,” according to Nighswander, who said many organizations aren’t eager to embrace new technologies. “It’s been really slow. Different sectors have different adoption cycles and uptake willingness.”

Some infrastructure firms don’t understand how the AI systems would work in their environments. Others decline AI help because they think their human security teams are sufficient. Still others are interested in AI vulnerability detection but can’t get the necessary permissions. “Trying to figure out how to cut some of that red tape would be nice,” Nighswander said, “because so far that’s been the biggest limitation.”

Theori has signed vulnerability-hunting agreements with “fewer than five” critical infrastructure entities, Nighswander said. “It’s been a bit difficult to convince those slower companies [and] industries to adopt this tech.”

The biggest success story so far has been Trail of Bits’ partnership with the Department of Health and Human Services to hunt for flaws in medical devices, a project that Brown said has fixed many vulnerabilities through strong partnerships with healthcare providers and their suppliers.

Because infrastructure vendors routinely use lightweight open-source packages — especially in embedded devices — the vulnerability hunters’ existing work will have significant downstream effects, even for vendors that don’t want to engage directly.

“That’s been a way of providing value and then jump-starting those conversations with the industry-specific or sector-specific companies and entities,” Carney said.

[Image: A computer screen shows the website for the AI platform Claude. Caption: Anthropic’s announcement of Claude Mythos upended the business world, but similar open-source tools have been available for months. Credit: Michael M. Santiago via Getty Images]

Busting Mythos

When Anthropic announced Claude Mythos Preview and said it was too dangerous to release publicly, prompting shock and alarm across the business community, the former DARPA competitors mostly just shrugged.

“It’s very cool,” Nighswander said, but “this is the world that we’ve been living in for a while now.”

Still, the new publicity surrounding AI vulnerability detection could benefit the teams behind the open-source systems. “It leads people to find out that, ‘Oh, this is a thing that my company should be worried about,’” Nighswander said.

The DARPA contest finalists even have an advantage over OpenAI and Anthropic, because their open-source tools are far cheaper than the big AI companies’ products, which can cost tens of thousands of dollars in access tokens.

Using Claude for vulnerability hunting is “kind of like showing up to a fancy restaurant with no prices on the menu,” said Trent Brunson, Trail of Bits’ director of research and development. “You know you have a large code base. You don’t know what bugs you have. … Companies might spend $50,000, $75,000 on tokens and not even realize it, and then they might come up with very low-information bugs.”

Cash-strapped critical infrastructure firms might pass over OpenAI’s and Anthropic’s tools in favor of the DARPA finalists’ much cheaper but similarly effective services. “More companies are going to look at the bottom line,” Brunson said, “rather than just throw AI tokens at it.”

Beyond the competition

As they’ve left the DARPA competition behind, Theori, Trail of Bits and their peers have taken different approaches to implementing their own AI systems.

Theori is working with open-source package maintainers, but it has also commercialized Xint and is contracting with businesses to evaluate their products. “We’ve been running that quite successfully so far,” Nighswander said. Trail of Bits, by contrast, is focusing mostly on open-source packages. Commercializing its tool, Buttercup, would “fundamentally” change the company, Brunson said.

The vulnerability hunters have had to modify their AI systems as they’ve moved out of the competition environment. The tools need to be able to find real vulnerabilities, not the “synthetic” flaws that DARPA created for the challenge. They need to produce reports that humans can easily read, not just data to feed into a scoring algorithm. And they need to be able to evaluate a wider range of inputs than what the heavily structured competition required.

Trail of Bits built an entirely new system for its work analyzing medical devices’ firmware, which is typically available only as compiled binary code rather than as source code. Embedded devices commonly ship their software in binary form, but AI has a hard time processing it because it doesn’t look like natural language the way source code does. Once that problem is solved, Brunson said, “the world’s our oyster.”

AI’s true sea change

DARPA, the competition finalists and cybersecurity experts said it’s almost impossible to overstate how much AI will change the process of finding vulnerabilities.

Software security assessments that used to take multiple people six months can now be done by AI in a matter of hours, often with better results, Nighswander said. “That scale and efficiency is incredible.”

The technology is obviously a double-edged sword. Nick Reese, the former director of emerging technology policy at the Department of Homeland Security, said the same tools that “present a significant opportunity for security professionals” also create “a potential advantage for attackers if they get access to the same data.”

But DARPA views things optimistically. It took years for the self-driving cars that emerged from DARPA’s first challenge in 2004 to hit the market; with the AI bug-fixing competition, Carney said, the agency never thought it’d see “a technical miracle” that was “economically feasible at the same time.”

“I’m extraordinarily excited,” Carney said, “at the performance and impact that the technology continues to have.”
