After two years of competition, the winners of the AI Cyber Challenge (AIxCC) were revealed at the DEFCON 33 hacking event on August 9.

Team Atlanta took the top spot. The group is a powerhouse collaboration of experts from the Georgia Institute of Technology (Georgia Tech), Samsung Research, the Korea Advanced Institute of Science & Technology and the Pohang University of Science and Technology. They won a $4m prize.

Trail of Bits, a New York-based cybersecurity firm specializing in cutting-edge security research, came in second, securing a $3m prize in the high-stakes AI Cyber Challenge.

Rounding out the podium in the Defense Advanced Research Projects Agency's (DARPA) competitive showcase was Theori, a group of AI researchers and security professionals spanning the US and South Korea, who took home $1.5m.

The cyber reasoning systems developed by the three winning teams are among four that have been open-sourced and are already available for all to use.

“The three other models will be made available over the next few weeks,” DARPA director Stephen Winchell said during the announcement session at DEFCON 33.

AIxCC: Two Years in the Making

Announced at Black Hat 2023 by Perri Adams, program manager at DARPA, AIxCC was a competition for computer scientists, AI experts, software developers and other cybersecurity specialists to create a new generation of AI-powered cybersecurity tools for securing US critical infrastructure and government services.

Specifically, DARPA and the Advanced Research Projects Agency for Health (ARPA-H), another US government agency, funded this project to explore whether AI can help find and fix software vulnerabilities more effectively and usher in a future where attacks can be stopped as fast as they are detected.

The seven finalists (Team Atlanta, Trail of Bits, Theori, All You Need Is A Fuzzing Brain, Shellphish, 42-b3yond-6ug and Lacrosse) were announced at DEFCON 32 in August 2024 and were awarded $2m each.

Tech giants Google, Microsoft, Anthropic, and OpenAI collectively backed the competition with over $1m each in AI model credits, ensuring teams had the computational firepower needed to tackle critical infrastructure security challenges.

Speaking before the winners’ announcement, Jim O’Neill, Deputy Secretary for the US Department of Health and Human Services (HHS), said that DARPA and ARPA-H will inject an additional $1.4m on top of the $29.5m planned for prize money.

During a post-announcement press conference, Andrew Carney, program manager for AIxCC, revealed that the additional funding will support finalists in refining their tools for real-world deployment.

These additional funds will be distributed in phases, contingent on the winning teams demonstrating measurable adoption of their tools by key infrastructure organizations.

AI-Powered Approaches Patch Flaws Faster at $152 Per Fix

During the final phase of AIxCC, conducted over the past year, participating teams were required to deploy their systems in a controlled, simulated environment deliberately seeded with flaws by the competition organizers.

The seven finalist teams uncovered 54 of the 70 synthetic vulnerabilities intentionally embedded in the challenge, representing a 77% detection rate.

This is a significant improvement over last year's semifinal round, in which teams discovered only 37% of the known vulnerabilities.

The AI systems went on to patch 43 of those 54 flaws.

The seven finalist teams also detected 18 previously unknown real-world flaws that were not planted by organizers and patched 11 of those.

These zero-day discoveries highlight the models’ ability to identify critical weaknesses beyond controlled test environments.

“We are now in the process of disclosing [these real-world zero-day vulnerabilities] to maintainers,” Carney said on stage.

Speed and efficiency were defining strengths. On average, the AI systems patched vulnerabilities in just 45 minutes, a dramatic improvement over traditional manual processes.

Jennifer Roberts, director of resilient systems at ARPA-H, told the press that these capacities are particularly important in the healthcare sector, where it takes 491 days on average to patch a vulnerability, compared to 60 to 90 days in other sectors.

Additionally, each fix cost an average of $152, a marked cost advantage over traditional manual remediation by human engineers.

“This is the new floor – it will rapidly improve. To make ourselves safer, we need to make everyone safer. This is the way,” said Carney.

Winchell added, “We’re living in a world right now that has ancient digital scaffolding that’s holding everything up. A lot of the code bases, a lot of the languages, a lot of the ways we do business and everything we’ve built on top of it has all incurred huge technical debt over the years.”

Prize Money Fuels Future AI Security Research for Top Teams

The winning team, Team Atlanta, has achieved success in several hacking competitions and academic conferences. For AIxCC, they mostly used traditional vulnerability discovery methods (e.g. dynamic analysis, static analysis, fuzzing) with OpenAI’s large language models (LLMs), such as o4-mini, GPT-4o and o3.

They topped all but one category and discovered the most real-world vulnerabilities out of the seven teams.

Asked what his team would do with the money, Taesoo Kim, the team’s leader and a professor at Georgia Tech, said they had agreed to donate a large portion of the prize money to the institute to support future developments in AI-powered vulnerability research.

The silver medal winner, Trail of Bits, is a small business made up of 10 engineers with deep experience in developing novel software security tools, including their own cyber reasoning system, Buttercup.

One of their most notable partners is the UK’s AI Security Institute.

For AIxCC, Trail of Bits combined Buttercup and traditional vulnerability discovery methods with LLMs like Anthropic’s Claude Sonnet 4, GPT-4.1 and GPT-4.1 mini. Their achievements include covering the highest number of unique vulnerability categories, as classified by the Common Weakness Enumeration (CWE).

The third winner, Theori, has a long history of winning security competitions, including eight wins in the DEFCON Capture the Flag finals.