UK Government AI Hackathons Uncover 407 Vulnerabilities for $16K
In brief
- UK GC3 ran weekly AI hackathons using Claude Mythos and GPT-5.5, identifying 407 vulnerabilities across nine government departments.
- All critical vulnerabilities found were remediated with no evidence of prior exploitation.
- The project cost £13,000 (~$16K) in AI tokens; personnel and infrastructure costs are excluded.
Structured methodology outweighs model choice
The hackathons used Anthropic's Claude Mythos and OpenAI's GPT-5.5 alongside traditional scanning techniques and human oversight. But the real finding wasn't about which AI won; it was about how the teams worked. Architectural design and multi-stage pipelines mattered far more than which specific AI model was deployed.
The structured methodology proved decisive. Combining automated scanning with expert human review surfaced vulnerabilities that neither approach would have caught alone. This wasn't AI replacing security experts. It was AI augmenting them.
Scope and remediation
GC3 is a partnership between the National Cyber Security Centre and the Department for Science, Innovation and Technology. The hackathons were run as weekly in-person events, with additional involvement from the AI Safety Institute.
Among the 407 findings were vulnerabilities serious enough to potentially allow authentication bypass, data exposure, and remote code execution. Every critical vulnerability identified has been remediated. None showed evidence of prior exploitation.
The cost math is striking. The £13,000 price tag covers only AI token costs, not personnel or infrastructure. Spread across 407 findings, that works out to roughly £32 in tokens per vulnerability, an unusually lean operational expense for a security audit spanning nine departments.
What comes next
The case study, published on GOV.UK around mid-June 2026, focused on publicly available government code repositories. A follow-on phase targeting closed-source code is already anticipated. The NCSC has also issued guidance specifically addressing the new risks that AI itself introduces to cybersecurity.
The UK's experiment validates a thesis that's becoming clearer across security teams: AI works best not as a replacement for human judgment, but as a force multiplier for it. The hackathons didn't automate security away. They extended the reach of security experts and made their time count for more.