
How organizations can secure their AI code – Source: www.csoonline.com



AI-generated code is surfacing everywhere, even in places where its use is officially banned. Here’s what cybersecurity leaders can do to ensure it doesn’t jeopardize security.

In 2023, the team at data extraction startup Reworkd was under tight deadlines. Investors pressured them to monetize the platform, and they needed to migrate everything from Next.js to Python/FastAPI. To speed things up, the team turned to ChatGPT for some of the work. The AI-generated code appeared to function, so they pushed it straight into their production environment and called it a day.

The next morning, they woke up “with over 40 Gmail notifications of user complaints,” co-founder Ashim Shrestha wrote in a blog post. “Everything seemed to have set on fire overnight. None of these users could subscribe.”

The culprit was a bug on line 56 of the AI-generated code, which caused a unique ID collision during the subscription process, and it took the team five days to identify and fix the issue. That “single ChatGPT mistake cost us $10,000+,” Shrestha wrote.
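Reworkd has not published the offending code, but the class of bug is easy to reproduce: an ID scheme that looks unique in light testing can collide under production load. The sketch below is a hypothetical illustration, not Reworkd’s actual code, contrasting a collision-prone short random suffix with a standard UUID.

```python
import random
import string
import uuid


def risky_subscription_id() -> str:
    # Hypothetical, collision-prone scheme: a 4-character suffix allows only
    # 36^4 (about 1.7 million) values, so the birthday paradox makes a clash
    # likely after a few thousand subscriptions.
    suffix = "".join(random.choices(string.ascii_lowercase + string.digits, k=4))
    return f"sub_{suffix}"


def safer_subscription_id() -> str:
    # uuid4 provides 122 bits of randomness; collisions are practically impossible.
    return f"sub_{uuid.uuid4()}"


if __name__ == "__main__":
    seen = set()
    for count in range(100_000):
        new_id = risky_subscription_id()
        if new_id in seen:
            print(f"Collision after {count} IDs: {new_id}")
            break
        seen.add(new_id)
```

Running the risky version typically reports a collision after a few thousand IDs, roughly the scale at which a subscription flow would start failing overnight.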

While Reworkd was open about its error, many similar incidents never become public, and CISOs often learn about them only behind closed doors. Financial institutions, healthcare systems, and e-commerce platforms have all encountered security challenges as code completion tools can introduce vulnerabilities, disrupt operations, or compromise data integrity. Many of the risks stem from AI-generated code, hallucinated library names, or the introduction of untracked and unverified third-party dependencies.

“We’re facing a perfect storm: increasing reliance on AI-generated code, rapid growth in open-source libraries, and the inherent complexity of these systems,” says Jens Wessling, chief technology officer at Veracode. “It’s only natural that security risks will escalate.”

Often, code completion tools like ChatGPT, GitHub Copilot, or Amazon CodeWhisperer are used covertly. A survey by Snyk showed that roughly 80% of developers ignore security policies to incorporate AI-generated code. This practice creates blind spots for organizations, which often struggle to mitigate the security and legal issues that result.

As automated coding tools see broader adoption, the discussion around the risks they pose has become a top priority for many CISOs and cybersecurity leaders. While these tools are revolutionary and can accelerate development, they also introduce a variety of security issues, some of which are hard to detect.

Ensure software packages are identified

While the rise of AI-powered code completion tools has ushered in a new era of efficiency and innovation in software development, this progress comes with important security risks. “AI-generated code often blends seamlessly with human-developed code, making it difficult to tell where security risks are coming from,” Wessling says.

Sometimes, the code that’s automatically generated can include third-party libraries or phantom dependencies — dependencies that are not explicitly declared in a manifest file. These unreported software packages might not be identified during a scan and can potentially hide vulnerabilities. 

One way to address this is to use software composition analysis (SCA) and software supply chain security tools, which help identify the libraries in use, their vulnerabilities, and the potential legal and compliance issues they might bring.

“Properly tuned SCA that looks deeper than the surface might be the answer,” says Grant Ongers, CSO and co-founder of Secure Delivery. This solution is not perfect, though. “The bigger issue with SCA tends to be including vulnerabilities in functions in libraries that are never called,” he adds.
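As a rough illustration of what these tools check for, the sketch below compares the top-level packages a Python project imports against what its requirements.txt declares; anything imported but undeclared is a candidate phantom dependency. The file layout and the naive mapping from import name to distribution name are simplifying assumptions; real SCA tools resolve transitive dependencies, name mappings, and reachability far more thoroughly.

```python
"""Minimal sketch: flag Python imports that are missing from requirements.txt,
a rough approximation of one thing SCA tools do."""
import ast
import pathlib
import sys


def imported_top_level_modules(project_dir: str) -> set[str]:
    # Walk the project's .py files and collect top-level imported module names.
    modules: set[str] = set()
    for path in pathlib.Path(project_dir).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return modules


def declared_packages(requirements_file: str) -> set[str]:
    # Reduce requirement lines like "fastapi==0.110.0" to a bare, normalized name.
    declared: set[str] = set()
    for line in pathlib.Path(requirements_file).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name = line.split(";")[0].split("==")[0].split(">=")[0].split("[")[0].strip()
            declared.add(name.lower().replace("-", "_"))
    return declared


if __name__ == "__main__":
    project_dir, requirements = sys.argv[1], sys.argv[2]
    stdlib = set(sys.stdlib_module_names)  # available on Python 3.10+
    undeclared = imported_top_level_modules(project_dir) - declared_packages(requirements) - stdlib
    for module in sorted(undeclared):
        print(f"Possible phantom dependency: '{module}' is imported but not declared")
```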

Endor Labs’ 2024 Dependency Management Report found that 56% of reported library vulnerabilities are in phantom dependencies for organizations with significant phantom dependency footprints. “We expect this to be an increasing challenge in organizations, and tools need to be able to give security teams visibility into all software components in use for both compliance and risk-management purposes,” says Darren Meyer, staff research engineer at Endor Labs.

That is why it is important that organizations have an accurate inventory of their software components. “Without it, you can’t identify, much less manage, risk coming from AI libraries, or indeed from any third-party library,” Meyer adds. “If you don’t have a way to identify the AI libraries — which are part of software being written, published, and/or consumed by your organization — then you may have a compliance risk.”

Organizations also expose themselves to risks when developers download machine learning (ML) models or datasets from platforms like Hugging Face. 

“In spite of security checks on both ends, it may still happen that the model contains a backdoor that becomes active once the model is integrated,” says Alex Ștefănescu, open-source developer at the Organized Crime and Corruption Reporting Project (OCCRP). “This could ultimately lead to data being leaked from the company that used the malicious models.”

At the start of 2024, the Hugging Face platform hosted at least 100 malicious ML models, some of which were capable of executing code on victims’ machines, according to a JFrog report.

When it comes to code completion tools like GitHub Copilot, Ștefănescu worries about hallucinations. “An LLM will always generate the most statistically probable continuation of a given prompt, so there are no real guarantees in place that it will generate a real package from PyPI, for example, after the word ‘import’,” they say. “Some attackers are aware of this and register package names on platforms like npm and PyPI, filling in some functionality that code completion tools suggest in order to make the packages seem legitimate.”

If these packages are imported into real applications, they can do real damage.
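One lightweight countermeasure, sketched below on the assumption that querying PyPI’s public JSON API is acceptable in your environment, is to sanity-check a suggested package name before installing it: does it exist at all, and does its metadata look plausible? This won’t catch a well-maintained malicious package, but it does catch names an LLM simply invented.

```python
"""Minimal sketch: sanity-check a package name suggested by a coding assistant
before running `pip install`, using PyPI's public JSON API."""
import json
import sys
import urllib.error
import urllib.request


def check_pypi_package(name: str) -> None:
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            data = json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            print(f"'{name}' does not exist on PyPI: likely a hallucinated name.")
            return
        raise

    info = data["info"]
    releases = data.get("releases", {})
    warnings = []
    if len(releases) < 3:
        warnings.append(f"only {len(releases)} release(s) published")
    if not (info.get("home_page") or info.get("project_urls")):
        warnings.append("no project URL or homepage listed")

    if warnings:
        print(f"'{name}' exists but looks suspicious: " + "; ".join(warnings))
    else:
        print(f"'{name}' exists and has plausible metadata; still review it before installing.")


if __name__ == "__main__":
    for package in sys.argv[1:]:
        check_pypi_package(package)
```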

To address these risks, CISOs can establish protocols for downloading and integrating ML models or datasets from external platforms such as Hugging Face. This includes implementing automated scanning tools to detect malicious code or backdoors, having a policy that only allows the use of models from verified publishers, or conducting internal testing in isolated environments.
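A simple starting point is a policy gate in the download pipeline. The sketch below uses a hypothetical allowlist of approved publishers and flags pickle-based weight files in favor of safetensors, since pickle deserialization is the classic code-execution vector in malicious models; a real pipeline would also fetch the repository’s file list from the Hub API and run a dedicated model scanner in an isolated environment.

```python
"""Minimal policy sketch: vet a Hugging Face model reference before download.
The allowlist and the file list passed in are hypothetical examples; a real
pipeline would fetch the repo's files from the Hub API and also run a
dedicated model/backdoor scanner in an isolated environment."""

# Hypothetical set of publishers your organization has vetted.
APPROVED_PUBLISHERS = {"google", "meta-llama", "mistralai", "your-internal-org"}

# Pickle-based weight formats can execute arbitrary code when loaded.
RISKY_EXTENSIONS = (".bin", ".pt", ".pkl", ".pickle")


def vet_model(repo_id: str, files: list[str]) -> list[str]:
    """Return a list of policy violations for a model repo (empty list = pass)."""
    violations = []
    publisher = repo_id.split("/")[0] if "/" in repo_id else "(no namespace)"
    if publisher not in APPROVED_PUBLISHERS:
        violations.append(f"publisher '{publisher}' is not on the approved list")
    for name in files:
        if name.endswith(RISKY_EXTENSIONS):
            violations.append(f"'{name}' is a pickle-based weight file; require safetensors")
    return violations


if __name__ == "__main__":
    # Illustrative input only; nothing here is fetched from the Hub.
    for problem in vet_model("random-user/cool-model", ["pytorch_model.bin", "config.json"]):
        print("POLICY VIOLATION:", problem)
```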

Ensure no sensitive information is leaking through AI coding assistants

Nearly half of organizations are concerned about AI systems learning and reproducing patterns that include sensitive information, according to GitGuardian’s Voice of Practitioners 2024 survey. “This is particularly worrying because these tools suggest code based on patterns learned from training data, which could inadvertently include hard-coded credentials, for instance,” says Thomas Segura, the report’s author at GitGuardian.

Companies based in the US were particularly worried about the possibility of sensitive information inadvertently leaking into codebases because of developers using AI-powered code completion tools.

While there’s no silver bullet, organizations can do a couple of things to decrease this risk. “Using self-hosted AI systems that don’t report data back is an answer that works,” Ongers says. “Another is to ensure data cannot enter.” 
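On the “ensure data cannot enter” side, even a basic redaction pass over any snippet that leaves the organization helps. The sketch below uses two illustrative regexes, an AWS-style access key pattern and a generic hard-coded credential assignment; dedicated secret scanners cover far more credential formats and should be preferred in practice.

```python
"""Minimal sketch: strip obvious hard-coded secrets from a snippet before it
leaves the organization, for example as context sent to a hosted coding
assistant. The patterns are illustrative only."""
import re

SECRET_PATTERNS = [
    # AWS-style access key IDs.
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # Generic assignments such as api_key = "...", token: '...', password="...".
    re.compile(r"""(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*["'][^"']+["']"""),
]


def redact_secrets(snippet: str) -> str:
    redacted = snippet
    for pattern in SECRET_PATTERNS:
        redacted = pattern.sub("[REDACTED]", redacted)
    return redacted


if __name__ == "__main__":
    sample = 'aws_access_key_id = "AKIAABCDEFGHIJKLMNOP"\napi_key = "sk-live-1234"'
    print(redact_secrets(sample))
```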

Look outside traditional development teams

Not all AI-based tools are coming from teams full of software engineers. “We see a lot of adoption being driven by data analysts, marketing teams, researchers, etc. within organizations,” Meyer says.

These teams aren’t traditionally developing their own software but are increasingly writing simple tools that adopt AI libraries and models, so they’re often not aware of the risks involved. “This combination of shadow engineering with lower-than-average application security awareness can be a breeding ground for risk,” he adds.

To make sure these teams work safely, CISOs should consider forming relationships with them early in the process. Cybersecurity leaders might also want to set up training programs tailored to non-traditional development teams, educating data analysts, marketing professionals, and researchers on the risks associated with AI-based tools and libraries.

Secure resources for application security

“Security budgets don’t generally grow at the same pace that software development accelerates, and AI adoption is only widening that gap,” Meyer says. Application security is underfunded in most organizations, yet allocating sufficient time and resources to it becomes essential as AI adoption and AI-assisted coding accelerate the pace of software development.

“A portfolio of high-quality security tools that can help address this gap is no longer optional,” Meyer says. “And while tools are critical to closing the gap, so are AppSec and ProdSec staff that can effectively partner with developers — even non-traditional developers — and understand the technical, compliance, and security implications of AI.”

When it comes to securing enough resources to protect AI systems, some stakeholders might hesitate, viewing it as an optional expense rather than a critical investment. “AI adoption is a divisive topic in many organizations, with some leaders and teams being ‘all-in’ on adoption and some being strongly resistant,” Meyer says. “This tension can present challenges for insightful CISOs and business information security officers (BISOs).”

CISOs who are aware of both the advantages and the drawbacks might try to set controls that manage the risks effectively, but if they don’t explain those controls properly, they risk being perceived as holding the organization back from innovating. “Organizations need to develop comprehensive strategies that balance the productivity benefits of AI tools with robust security practices,” Segura says.

The risk of unsafe AI-powered open-source libraries

With AI changing how code is written, the industry is navigating a fine line between embracing the opportunities AI can offer and mitigating the risks it can pose. Ongers says this paradigm shift brings several concerns. “The biggest, I think, is one of two extremes: either overreliance on AI that’s flawed, or ignoring AI altogether,” he says.

With more than five million open-source libraries available today and an estimated half billion more to be released in the next decade, many of which will be powered by AI, organizations face an unprecedented challenge in managing the security risks associated with their software ecosystems. 

“This is unfamiliar territory for the industry, and I do believe risk needs to be addressed at an industry level to ensure the safety, security, and quality of the software that powers our world,” Wessling says.

How these issues are addressed also matters. Right now, there’s an explosion of security vendors claiming to secure AI, but not all of them are doing a meticulous job. As a result, “organizations may be left with neither the visibility they need to make intelligent risk decisions nor the capabilities they need to act on those decisions,” Meyer says. “CISOs don’t want to find themselves in the situation of building new capabilities when there’s been a breach in the news — or worse, when it’s their organization that’s been breached.”

To prevent such situations, CISOs must prioritize investing in their people as much as in AI technologies. “The software development industry needs to see the true priority of training and enhancing the knowledge of its workforce,” Ștefănescu says. “Instead of paying code completion tool subscriptions, it should invest in the knowledge development of its staff.”


Original Post url: https://www.csoonline.com/article/3633403/how-organizations-can-secure-their-ai-code.html

Category & Tags: Application Security, Software Development
