Source: go.theregister.com – Author: Thomas Claburn
The time from vulnerability disclosure to proof-of-concept (PoC) exploit code can now be as short as a few hours, thanks to generative AI models.
Matthew Keely, of Platform Security and penetration testing firm ProDefense, managed to cobble together a working exploit for a critical vulnerability in Erlang’s SSH library (CVE-2025-32433) in an afternoon. The AI he used had some help, though: the model drew on code from an already published patch to the library to hunt down which holes had been filled and to work out how to exploit them.
Inspired by a post from another security firm, Horizon3.ai, about the ease with which exploit code for the SSH library bug could be developed, Keely wondered whether AI models – in this case, OpenAI’s GPT-4 and Anthropic’s Claude 3.7 Sonnet – could craft an exploit for him.
“Turns out — yeah, it kinda can,” Keely explained. “GPT-4 not only understood the CVE description, but it also figured out what commit introduced the fix, compared that to the older code, found the diff, located the vuln, and even wrote a PoC. When it didn’t work? It debugged it and fixed it too.”
It’s not the first time AI has proven its mettle at not just finding security holes but also ways to exploit them. Google’s OSS-Fuzz project has been using large language models (LLMs) to help find vulnerabilities. And computer scientists at the University of Illinois Urbana-Champaign have shown that OpenAI’s GPT-4 can exploit vulnerabilities by reading CVEs.
But seeing it done in just hours underscores how little time defenders have to respond when the attack production pipeline can be automated.
Keely told GPT-4 to generate a Python script that compared – diff’ed, basically – the vulnerable and patched versions of the code in the Erlang/OTP SSH server.
“Without the diff of the patch, GPT would not have come close to being able to write a working proof-of-concept for it,” Keely told The Register.
“In fact, before giving GPT the diffs, its first attempt was to actually write a fuzzer and to fuzz the SSH server. Where GPT did excel, is it was able to provide all of the building blocks needed to create a lab environment, including Dockerfiles, Erlang SSH server setup on the vulnerable version, and fuzzing commands. Not to say fuzzing would have found this specific vulnerability, but it definitely breaks down some previous learning gaps attackers would have had.”
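For a sense of what that comparison step involves, here is a minimal sketch of a patch-diffing helper of the kind Keely describes – not his actual script – using Python's standard difflib. The file names are hypothetical; the real target was Erlang/OTP's SSH code.

```python
# A minimal sketch, not Keely's actual script: compare a vulnerable and a
# patched copy of a source file and print the changed hunks as a unified diff.
import difflib
from pathlib import Path


def diff_files(vulnerable_path: str, patched_path: str) -> str:
    """Return a unified diff between the vulnerable and patched versions."""
    old = Path(vulnerable_path).read_text().splitlines(keepends=True)
    new = Path(patched_path).read_text().splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        old, new, fromfile=vulnerable_path, tofile=patched_path))


if __name__ == "__main__":
    # Hypothetical file names used purely for illustration.
    print(diff_files("ssh_connection_vulnerable.erl", "ssh_connection_patched.erl"))
```

Output in that form – the changed hunks between the two versions – is what the model was then handed to reason about which change closed which hole.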
Armed with the code diffs, the AI model produced a list of changes, and Keely then asked, “Hey, can you tell me what caused this vulnerability?”
And it did.
“GPT didn’t just guess,” Keely wrote. “It explained the why behind the vulnerability, walking through the change in logic that introduced protection against unauthenticated messages — protection that didn’t exist before.”
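The logic GPT-4 walked through amounts to a missing authentication gate on SSH connection-protocol messages. A heavily simplified illustration of that bug class – written here in Python rather than the actual Erlang/OTP source, with invented names – looks something like this:

```python
# Simplified illustration of the bug class only; this is not Erlang/OTP code,
# and the class and function names are invented for this example.
from dataclasses import dataclass

# In the SSH protocol, connection-protocol messages (channel open, exec
# requests, and so on) have message numbers of 80 and above.
CONNECTION_PROTOCOL_MIN = 80


@dataclass
class Session:
    authenticated: bool = False


def dispatch(session: Session, msg_type: int, payload: bytes) -> None:
    print(f"handling message type {msg_type} ({len(payload)} bytes)")


def handle_message_vulnerable(session: Session, msg_type: int, payload: bytes) -> None:
    # Vulnerable shape: every message is dispatched regardless of auth state,
    # so an unauthenticated client can reach handlers it should never see.
    dispatch(session, msg_type, payload)


def handle_message_patched(session: Session, msg_type: int, payload: bytes) -> None:
    # Patched shape: connection-protocol messages are rejected until the
    # client has authenticated.
    if msg_type >= CONNECTION_PROTOCOL_MIN and not session.authenticated:
        raise PermissionError("connection-protocol message before authentication")
    dispatch(session, msg_type, payload)
```

That missing gate is the “why” Keely was after: the pre-patch code simply lacked the check the fix introduced.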
The AI model followed up by asking whether Keely wanted a full PoC client, a Metasploit-style demo, or a patched SSH server for tracing.
GPT-4 didn’t quite ace the test. Its initial PoC code didn’t work – a common experience for any AI-generated code that’s more than a short snippet.
So Keely tried another AI helper, Cursor with Anthropic’s Claude 3.7 Sonnet, asking it to fix the non-working PoC. And to his surprise, it worked.
“What started as curiosity about a tweet turned into a deep exploration of how AI is changing vulnerability research,” Keely wrote. “A few years ago, this process would have required specialized Erlang knowledge and hours of manual debugging. Today, it takes an afternoon with the right prompts.”
Keely told The Register there’s been a noticeable increase in the propagation speed of threats.
“It is not just that more vulnerabilities are being published,” he said. “They are also being exploited much faster, sometimes within hours of becoming public.
“This shift is also marked by a higher level of coordination among threat actors. We are seeing the same vulnerabilities being used across different platforms, regions, and industries in a very short time.
“That level of synchronization used to take weeks, and now it can happen in a single day. To put this in perspective, there was a 38 percent increase in published CVEs from 2023 to 2024. That is not just an increase in volume, but a reflection of how much faster and more complex the threat landscape has become. For defenders, this means shorter response windows and a greater need for automation, resilience, and constant readiness.”
Asked what this means for enterprises trying to defend their infrastructure, Keely said: “The core principle remains the same. If a vulnerability is critical, your infrastructure should be built to allow safe and fast patching. That is a basic expectation in modern DevOps.
“What changes with AI is the speed at which attackers can go from disclosure to working exploit. The response timeline is shrinking. Enterprises should treat every CVE release as if exploitation could start immediately. You no longer have days or weeks to react. You need to be ready to respond the moment the details go public.” ®
Original Post URL: https://go.theregister.com/feed/www.theregister.com/2025/04/21/ai_models_can_generate_exploit/