
The initiative seeks to patch vulnerabilities in open-source code before they are exploited by would-be attackers. Now comes the hard part — putting the systems to the test in the real world.



The Pentagon’s two-year public competition to spur the development of cyber-reasoning systems that use large language models to autonomously find and patch vulnerabilities in open-source software concluded Friday with $8.5 million awarded to three teams of security specialists at DEF CON.

The Defense Advanced Research Projects Agency’s AI Cyber Challenge seeks to address a persistent bottleneck in cybersecurity: patching vulnerabilities before they are discovered or exploited by would-be attackers.

“We’re living in a world right now that has ancient digital scaffolding that’s holding everything up,” DARPA Director Stephen Winchell said. “A lot of the code bases, a lot of the languages, a lot of the ways we do business, and everything we’ve built on top of it has all incurred huge technical debt… It is a problem that is beyond human scale.” 

The seven semifinalists, who earned their spots from a field of 90 teams convened at last year’s DEF CON, were scored on their models’ ability to quickly and accurately identify and generate patches for synthetic vulnerabilities across 54 million lines of code. The models discovered 77% of the vulnerabilities presented in the final scoring round and patched 61% of those synthetic defects at an average speed of 45 minutes, the competition organizers said.
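To make those numbers concrete, the defects in scope are the bread and butter of memory-unsafe code. The C sketch below is purely illustrative and not drawn from the competition’s challenge sets, but it shows the general shape of a bug a cyber-reasoning system hunts for and the kind of bounded patch it might emit:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only; not taken from the AI Cyber Challenge codebases.
 * A classic stack buffer overflow of the kind such challenge sets synthesize. */
static void greet_unsafe(const char *name) {
    char buf[16];
    strcpy(buf, name);              /* no bounds check: long input overflows buf */
    printf("hello, %s\n", buf);
}

/* The shape a machine-generated patch often takes:
 * bound the copy and guarantee NUL termination. */
static void greet_patched(const char *name) {
    char buf[16];
    strncpy(buf, name, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    printf("hello, %s\n", buf);
}

int main(void) {
    greet_unsafe("DEF CON");        /* safe here only because the input is short */
    greet_patched("a deliberately long input that would overflow the unsafe version");
    return 0;
}
```

A real challenge codebase buries a defect like this somewhere in millions of lines, which is why the scored variables were speed and accuracy at scale rather than the difficulty of any single fix.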


The models also discovered 18 real zero-day vulnerabilities: six in C codebases and 12 in Java codebases. The teams’ models patched none of the C zero-days but automatically patched 11 of the 12 Java flaws, according to the final results shared Friday.

Team Atlanta took the first-place prize of $4 million, Trail of Bits won second place and $3 million in prize money, and Theori ranked third, taking home $1.5 million. The competition’s organizers allocated an additional $1.4 million in prize money for participants who can demonstrate that their technology has been deployed into critical infrastructure.

Representatives from the three winning teams said they plan to reinvest most of the prize money in research and further development of their cyber-reasoning systems, or to explore ways to commercialize the technology.

Four of the models developed under the competition were made available as open source Friday, and the three remaining models will be released in the coming weeks, officials said.

“Our hope is this technology will harden source code by being integrated during the development stage, the most critical point in the software lifecycle,” Andrew Carney, program manager of the competition, said during a media briefing about the challenge last week. 


Open sourcing the cyber-reasoning systems and the AI Cyber Challenge’s infrastructure should also allow others to experiment and improve upon what the competition helped foster, he said. DARPA and partners across government and the private sector involved in the program are pursuing paths to push the technology developed during the competition into open-source software communities and commercial vendors for broader adoption.

DARPA’s AI Cyber Challenge is a public-private endeavor, with Google, Microsoft, Anthropic and OpenAI each donating $350,000 in LLM credits and additional support. The initiative seeks to test AI’s ability to identify and patch vulnerabilities in open-source code of vital importance throughout critical infrastructure, including health care. 

Jim O’Neill, deputy secretary of the Department of Health and Human Services, spoke to the importance of this work during the AI Cyber Challenge presentation at DEF CON. “Health systems are among the hardest networks to secure. Unlike other industries, hospitals must maintain 24/7 uptime, and they don’t get to reboot. They rely on highly specialized, legacy devices and complex IT ecosystems,” he said. 

“As a result, patching a vulnerability in health care can take an average of 491 days, compared to 60 to 90 days in most other industries,” O’Neill added. “Many cybersecurity products, unfortunately, are security theater. We need assertive proof-of-work approaches to keep networks, hospitals and patients safer.”

Health officials and others directly involved in the AI Cyber Challenge acknowledged the problems posed by insecure software are vast, but said the results showcased from this effort provide a glimmer of hope. 


“The magnitude of the problem is so incredibly overwhelming and unreasonable that this is starting to make it so that maybe we can actually secure networks — maybe,” Jennifer Roberts, director of resilient systems at HHS’s Advanced Research Projects Agency for Health, said during a media briefing at DEF CON after the winners were announced. 

Kathleen Fisher, director of DARPA’s Information Innovation Office, shared a similar cautiously optimistic outlook. “Software runs the world, and the software that is running the world is riddled with vulnerabilities,” she said.

“We have this sense of learned helplessness, that there’s just nothing we can do about it. That’s the way software is,” she continued. The AI Cyber Challenge “points to a brighter future where software does what it’s supposed to do and nothing else.”

Written by Matt Kapko

Matt Kapko is a reporter at CyberScoop. His beat includes cybercrime, ransomware, software defects and vulnerability (mis)management. The lifelong Californian started his journalism career in 2001 with previous stops at Cybersecurity Dive, CIO, SDxCentral and RCR Wireless News. Matt has a degree in journalism and history from Humboldt State University.
