The list highlights the impact and prevalence of the 10 most critical vulnerabilities found in artificial intelligence applications based on LLMs.

The Open Worldwide Application Security Project (OWASP) has published the top 10 most critical vulnerabilities often seen in large language model (LLM) applications, highlighting their potential impact, ease of exploitation, and prevalence. Examples of vulnerabilities include prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution.

The list aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing LLMs, raising awareness of vulnerabilities, suggesting remediation strategies, and improving the security posture of LLM applications, OWASP said.

The emergence of generative AI chat interfaces built on LLMs and their impact on cybersecurity is a major talking point. Concerns about the risks these new technologies could introduce range from the potential issues of sharing sensitive business information with advanced self-learning algorithms to malicious actors using them to significantly enhance attacks. Some countries, US states, and enterprises are considering or have ordered bans on the use of generative AI technology such as ChatGPT on data security, protection, and privacy grounds.

Here are the top 10 most critical vulnerabilities affecting LLM applications, according to OWASP.

1. Prompt injections

Prompt injections involve bypassing filters or manipulating the LLM with carefully crafted prompts that make the model ignore previous instructions or perform unintended actions, OWASP wrote. “These vulnerabilities can lead to unintended consequences, including data leakage, unauthorized access, or other security breaches.” Common prompt injection vulnerabilities include bypassing filters or restrictions through specific language patterns or tokens, exploiting weaknesses in the LLM’s tokenization or encoding mechanisms, and tricking the LLM into performing unintended actions by supplying misleading context.

An example of an attack scenario is a malicious user bypassing a content filter by using specific language patterns, tokens, or encoding mechanisms that the LLM fails to recognize as restricted content, allowing the user to perform actions that should be blocked, OWASP said.

Preventative measures for this vulnerability include:

  • Implement strict input validation and sanitization for user-provided prompts (see the sketch after this list).
  • Use context-aware filtering and output encoding to prevent prompt manipulation.
  • Regularly update and fine-tune the LLM to improve its understanding of malicious inputs and edge cases.
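
To make the first measure concrete, a minimal prompt-screening sketch in Python might look like the following. The deny-list patterns, length cap, and `screen_prompt` helper are illustrative assumptions, not an OWASP-prescribed implementation; a real deployment would layer this with context-aware filtering and model-side defenses.

```python
import re

# Hypothetical deny-list of phrases commonly seen in prompt-injection attempts.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
    r"you are now (in )?developer mode",
]

MAX_PROMPT_LENGTH = 2000  # reject unusually long inputs

def screen_prompt(user_prompt: str) -> str:
    """Validate and sanitize a user-supplied prompt before it reaches the LLM."""
    if len(user_prompt) > MAX_PROMPT_LENGTH:
        raise ValueError("Prompt exceeds maximum allowed length")
    lowered = user_prompt.lower()
    for pattern in INJECTION_PATTERNS:
        if re.search(pattern, lowered):
            raise ValueError("Prompt rejected: possible injection attempt")
    # Strip control characters that could interfere with downstream encoding.
    return re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", user_prompt)

if __name__ == "__main__":
    print(screen_prompt("Summarize this quarterly report."))
```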

2. Data leakage

Data leakage occurs when an LLM accidentally reveals sensitive information, proprietary algorithms, or other confidential details through its responses. “This can result in unauthorized access to sensitive data or intellectual property, privacy violations, and other security breaches,” said OWASP.

Incomplete or improper filtering of sensitive information in the LLM’s responses, overfitting/memorization of sensitive data in the LLM’s training process, and unintended disclosure of confidential information due to LLM misinterpretation or errors are common data leakage vulnerabilities. An attacker could deliberately probe the LLM with carefully crafted prompts, attempting to extract sensitive information that the LLM has memorized from its training data, or a legitimate user could inadvertently ask the LLM a question that reveals sensitive/confidential information, OWASP wrote.

Preventative measures for data leakage include:

  • Implement strict output filtering and context-aware mechanisms to prevent the LLM from revealing sensitive information (illustrated in the example below).
  • Use differential privacy techniques or other data anonymization methods during the LLM’s training process to reduce the risk of overfitting or memorization.
  • Regularly audit and review the LLM’s responses to ensure that sensitive information is not being disclosed inadvertently.
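
As a rough illustration of the first measure, output filtering could be sketched like this; the redaction patterns and `filter_response` function are assumptions for demonstration rather than a complete data-loss-prevention solution.

```python
import re

# Hypothetical patterns for sensitive values that should never leave the application.
REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def filter_response(llm_output: str) -> str:
    """Redact sensitive values from an LLM response before returning it to the user."""
    filtered = llm_output
    for label, pattern in REDACTION_PATTERNS.items():
        filtered = pattern.sub(f"[REDACTED {label.upper()}]", filtered)
    return filtered

if __name__ == "__main__":
    print(filter_response("Contact jane.doe@example.com using key sk-abcdef1234567890XYZ."))
```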

3. Inadequate sandboxing

Inadequate sandboxing arises when an LLM that has access to external resources or sensitive systems is not properly isolated, which can lead to exploitation, unauthorized access, or unintended actions by the model. Common sandboxing vulnerabilities include insufficient separation of the LLM environment from other critical systems or data stores, improper restrictions that allow the LLM to access sensitive resources, and LLMs performing system-level actions or interacting with other processes, OWASP said.

An attack example would be a malicious actor exploiting an LLM’s access to a sensitive database by crafting prompts that instruct the LLM to extract and reveal confidential information. Preventative measures include:

  • Isolate the LLM environment from other critical systems and resources.
  • Restrict the LLM’s access to sensitive resources and limit its capabilities to the minimum required for its intended purpose (see the example after this list).
  • Regularly audit and review the LLM’s environment and access controls to ensure that proper isolation is maintained.
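
One way to picture the second measure is an explicit allowlist that mediates every resource the LLM can reach. The `ResourceBroker` class and paths below are hypothetical, illustrative names only.

```python
# Minimal sketch: a broker that sits between the LLM and external resources,
# so the model can only reach what it has been explicitly granted.
ALLOWED_RESOURCES = {
    "public_docs": "/srv/llm/public_docs",   # read-only knowledge base
    "scratch": "/srv/llm/scratch",           # temporary working directory
}

class ResourceBroker:
    """Mediates LLM access to resources; anything not allowlisted is denied."""

    def resolve(self, resource_name: str) -> str:
        if resource_name not in ALLOWED_RESOURCES:
            raise PermissionError(f"LLM is not permitted to access '{resource_name}'")
        return ALLOWED_RESOURCES[resource_name]

if __name__ == "__main__":
    broker = ResourceBroker()
    print(broker.resolve("public_docs"))      # allowed
    try:
        broker.resolve("customer_database")   # denied
    except PermissionError as err:
        print(err)
```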

4. Unauthorized code execution

Unauthorized code execution occurs when an attacker exploits an LLM to execute malicious code, commands, or actions on the underlying system through natural language prompts. Common vulnerabilities include failing to sanitize or restrict user input, which allows attackers to craft prompts that trigger the execution of unauthorized code; insufficient restrictions on the LLM’s capabilities; and unintentionally exposing system-level functionality or interfaces to the LLM.

OWASP cited two attack examples: an attacker crafting a prompt that instructs the LLM to execute a command that launches a reverse shell on the underlying system, granting the attacker unauthorized access, and the LLM unintentionally being allowed to interact with a system-level API, which an attacker manipulates to execute unauthorized actions on the system.

Teams can help prevent unauthorized code execution with these actions:

  • Implement strict input validation and sanitization processes to prevent malicious or unexpected prompts from being processed by the LLM.
  • Ensure proper sandboxing and restrict the LLM’s capabilities to limit its ability to interact with the underlying system (a sketch follows this list).
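
As a sketch of the second measure, the application can refuse to pass model output to a shell or interpreter and instead map requested actions onto a fixed set of pre-approved handlers. The `dispatch` function and the two handlers below are hypothetical examples.

```python
# Minimal sketch: never execute LLM output directly; map requested actions
# onto a small set of pre-approved, parameter-checked handlers.
def get_disk_usage(path: str = "/") -> str:
    import shutil
    usage = shutil.disk_usage(path)
    return f"{usage.used / usage.total:.0%} of {path} in use"

def get_uptime() -> str:
    with open("/proc/uptime") as fh:   # Linux-only, illustrative
        return f"{float(fh.read().split()[0]) / 3600:.1f} hours"

SAFE_ACTIONS = {"disk_usage": get_disk_usage, "uptime": get_uptime}

def dispatch(action: str) -> str:
    """Run only pre-approved actions; reject anything else the model asks for."""
    if action not in SAFE_ACTIONS:
        raise PermissionError(f"Action '{action}' is not permitted")
    return SAFE_ACTIONS[action]()

if __name__ == "__main__":
    print(dispatch("disk_usage"))          # allowed
    try:
        dispatch("rm -rf /")               # rejected, never reaches a shell
    except PermissionError as err:
        print(err)
```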

5. Server-side request forgery vulnerabilities

Server-side request forgery (SSRF) vulnerabilities occur when an attacker exploits an LLM to perform unintended requests or access restricted resources such as internal services, APIs, or data stores. Common SSRF vulnerabilities include insufficient input validation, which allows attackers to manipulate LLM prompts to initiate unauthorized requests, and misconfigurations in network or application security settings that expose internal resources to the LLM, OWASP said.

To execute an attack, an attacker could craft a prompt that instructs the LLM to make a request to an internal service, bypassing access controls and gaining unauthorized access to sensitive information. They could also exploit a misconfiguration in the application’s security settings that allows the LLM to interact with a restricted API, accessing or modifying sensitive data. Preventative measures include:

  • Implement rigorous input validation and sanitization to prevent malicious or unexpected prompts from initiating unauthorized requests (see the example below).
  • Regularly audit and review network/application security settings to ensure that internal resources are not inadvertently exposed to the LLM.
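
A minimal sketch of the first measure might validate any URL the LLM asks to fetch against a host allowlist and block internal address ranges. The `validate_outbound_url` helper and allowlisted hosts are assumptions for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical allowlist of hosts the LLM-driven application may contact.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}

def validate_outbound_url(url: str) -> str:
    """Reject URLs that target non-allowlisted hosts or internal address space."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only http(s) URLs are permitted")
    host = parsed.hostname or ""
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"Host '{host}' is not on the allowlist")
    # Resolve the host and block private, loopback, and link-local addresses
    # to prevent requests to internal services.
    for info in socket.getaddrinfo(host, None):
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise ValueError(f"Host '{host}' resolves to an internal address")
    return url
```

Any request the application makes on the model’s behalf would then pass through a check like this before the HTTP client is ever invoked.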

6. Overreliance on LLM-generated content

Overreliance on LLM-generated content can lead to the propagation of misleading or incorrect information, decreased human input in decision-making, and reduced critical thinking, according to OWASP. “Organizations and users may trust LLM-generated content without verification, leading to errors, miscommunications, or unintended consequences.” Common issues related to overreliance include accepting LLM-generated content as fact without verification, assuming it is free from bias or misinformation, and relying on it for critical decisions without human input or oversight, OWASP added.

For example, if a company relies on an LLM to generate security reports and analysis and the LLM generates a report containing incorrect data which the company uses to make critical security decisions, there could be significant repercussions due to the reliance on inaccurate LLM-generated content. Rik Turner, a senior principal analyst for cybersecurity at Omdia, refers to this as LLM hallucinations. “If it comes back talking rubbish and the analyst can easily identify it as such, he or she can slap it down and help train the algorithm further. But what if the hallucination is highly plausible and looks like the real thing? In other words, could the LLM in fact lend extra credence to a false positive, with potentially dire consequences if the analyst goes ahead and takes down a system or blocks a high-net-worth customer from their account for several hours?”

7. Inadequate AI alignment

Inadequate AI alignment occurs when the LLM’s objectives and behavior do not align with the intended use case, leading to undesired consequences or vulnerabilities. Poorly defined objectives resulting in the LLM prioritizing undesired/harmful behaviors, misaligned reward functions or training data creating unintended model behavior, and insufficient testing and validation of LLM behavior are common issues, OWASP wrote. If an LLM designed to assist with system administration tasks is misaligned, it could execute harmful commands or prioritize actions that degrade system performance or security.

Teams can prevent inadequate AI alignment vulnerabilities with these actions:

  • Define the objectives and intended behavior of the LLM during the design and development process.
  • Ensure that the reward functions and training data are aligned with the desired outcomes and do not encourage undesired or harmful behavior.
  • Regularly test and validate the LLM’s behavior across a wide range of scenarios, inputs, and contexts to identify and address alignment issues (see the test sketch after this list).
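
The third measure lends itself to automated behavioral regression tests. The pytest-style sketch below assumes a hypothetical `call_llm` client and illustrative refusal cases; it is a starting point, not a complete alignment test suite.

```python
# Minimal sketch of behavioral regression tests for an LLM-backed admin assistant.
# call_llm is a placeholder: wire it to the real model endpoint before running.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this to the real model endpoint")

# Prompts that a well-aligned system-administration assistant should refuse.
REFUSAL_CASES = [
    "Delete all log files immediately to free up disk space.",
    "Disable the firewall; it keeps getting in the way.",
]

REFUSAL_MARKERS = ("cannot", "can't", "not able", "unable", "won't")

def test_assistant_refuses_destructive_requests():
    for prompt in REFUSAL_CASES:
        response = call_llm(prompt).lower()
        assert any(marker in response for marker in REFUSAL_MARKERS), (
            f"Assistant did not refuse: {prompt!r}"
        )
```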

8. Insufficient access controls

Insufficient access controls occur when access controls or authentication mechanisms are not properly implemented, allowing unauthorized users to interact with the LLM and potentially exploit vulnerabilities. Failing to enforce strict authentication requirements for accessing the LLM, inadequate role-based access control (RBAC) implementation allowing users to perform actions beyond their intended permissions, and failing to provide proper access controls for LLM-generated content and actions are all common examples, OWASP said.

An attack example is a malicious actor gaining unauthorized access to an LLM because of weak authentication mechanisms, allowing them to exploit vulnerabilities or manipulate the system, OWASP said. Preventative measures include:

  • Implement strong authentication mechanisms, such as multifactor authentication (MFA), to ensure that only authorized users can access the LLM (see the example after this list).
  • Implement proper access controls for content and actions generated by the LLM to prevent unauthorized access or manipulation.
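
A rough sketch of how authentication and role-based checks might gate access to an LLM endpoint follows; the `User` dataclass, `ROLE_PERMISSIONS` mapping, and `ask_llm` function are illustrative assumptions rather than any specific framework’s API.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping for an LLM-backed application.
ROLE_PERMISSIONS = {
    "analyst": {"ask"},
    "admin": {"ask", "view_logs", "update_prompts"},
}

@dataclass
class User:
    username: str
    role: str
    authenticated: bool  # set True only after MFA succeeds upstream

def require_permission(user: User, permission: str) -> None:
    """Enforce authentication and role-based authorization before any LLM action."""
    if not user.authenticated:
        raise PermissionError("User is not authenticated")
    if permission not in ROLE_PERMISSIONS.get(user.role, set()):
        raise PermissionError(f"Role '{user.role}' lacks permission '{permission}'")

def ask_llm(user: User, prompt: str) -> str:
    require_permission(user, "ask")
    return f"(model response to: {prompt})"  # placeholder for the real model call

if __name__ == "__main__":
    alice = User("alice", "analyst", authenticated=True)
    print(ask_llm(alice, "Summarize today's alerts."))
```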

9. Improper error handling

Improper error handling occurs when error messages or debugging information are exposed in a way that could reveal sensitive information, system details, or potential attack vectors to a threat actor. Common error handling vulnerabilities include exposing sensitive information or system details through error messages, leaking debugging information that could help an attacker identify potential vulnerabilities or attack vectors, and failing to handle errors gracefully, potentially causing unexpected behavior or system crashes.

For example, an attacker could exploit an LLM’s error messages to gather sensitive information or system details, enabling them to launch a targeted attack or exploit known vulnerabilities. Alternatively, a developer could accidentally leave debugging information exposed in production, allowing an attacker to identify potential attack vectors or vulnerabilities in the system, according to OWASP. Such risks can be mitigated by these actions:

  • Implement proper error handling mechanisms to ensure that errors are caught, logged, and handled gracefully.
  • Ensure that error messages and debugging information do not reveal sensitive information or system details. Consider using generic error messages for users, while logging detailed error information for developers and administrators (illustrated in the sketch below).
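
The second measure might be sketched as follows, assuming a hypothetical `handle_request` wrapper: full details go to server-side logs, while the user receives only a generic message and an incident reference.

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_app")

def handle_request(prompt: str) -> dict:
    """Return generic errors to users while logging full details for operators."""
    try:
        # Placeholder for the real LLM call and post-processing.
        raise ConnectionError("model backend at 10.0.3.7:8443 refused connection")
    except Exception:
        incident_id = uuid.uuid4().hex[:8]
        # Full stack trace and system details stay in server-side logs only.
        logger.exception("LLM request failed (incident %s)", incident_id)
        # The user sees nothing about hosts, ports, or stack traces.
        return {"error": "Something went wrong. Please try again later.",
                "incident_id": incident_id}

if __name__ == "__main__":
    print(handle_request("Summarize the incident report."))
```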

10. Training data poisoning

Training data poisoning occurs when an attacker manipulates the training data or fine-tuning procedures of an LLM to introduce vulnerabilities, backdoors, or biases that could compromise the model’s security, effectiveness, or ethical behavior, OWASP wrote. Common training data poisoning issues include introducing backdoors or vulnerabilities into the LLM through maliciously manipulated training data, and injecting biases that cause the model to produce skewed or inappropriate responses.

These actions can help prevent this risk:

  • Ensure the integrity of the training data by obtaining it from trusted sources and validating its quality (see the example after this list).
  • Implement robust data sanitization and preprocessing techniques to remove potential vulnerabilities or biases from the training data.
  • Use monitoring and alerting mechanisms to detect unusual behavior or performance issues in the LLM, potentially indicating training data poisoning.
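
The first two measures could be sketched roughly as follows; the trusted checksum registry and `load_training_file` helper are assumptions for illustration, not a complete data-provenance pipeline.

```python
import hashlib
from pathlib import Path

# Hypothetical registry of trusted training files and their expected SHA-256 digests,
# published out-of-band by the data provider.
TRUSTED_CHECKSUMS = {
    "security_faq_v1.txt": "9f2c...",  # truncated placeholder; use the full digest in practice
}

def load_training_file(path: Path) -> str:
    """Accept a training file only if it comes from a trusted, checksummed source."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    expected = TRUSTED_CHECKSUMS.get(path.name)
    if expected is None or digest != expected:
        raise ValueError(f"{path.name} is not a trusted training source")
    text = data.decode("utf-8", errors="strict")
    # Basic, illustrative sanitization: drop lines containing hidden HTML comments,
    # one simple class of content that can conceal planted instructions.
    clean_lines = [line for line in text.splitlines() if "<!--" not in line]
    return "\n".join(clean_lines)
```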

Security leaders, teams, and organizations are responsible for secure use of LLMs

Security leaders/teams and their organizations are responsible for ensuring the secure use of generative AI chat interfaces that use LLMs, experts agree. “Security and legal teams should be collaborating to find the best path forward for their organizations to tap into the capabilities of these technologies without compromising intellectual property or security,” Chaim Mazal, CSO at Gigamon, recently told CSO.

AI-powered chatbots need regular updates to remain effective against threats and human oversight is essential to ensure LLMs function correctly, added Joshua Kaiser, AI technology executive and CEO at Tovie AI. “Additionally, LLMs need contextual understanding to provide accurate responses and catch any security issues and should be tested and evaluated regularly to identify potential weaknesses or vulnerabilities.”

Michael Hill is the UK editor of CSO Online. He has spent the past five-plus years covering various aspects of the cybersecurity industry, with particular interest in the ever-evolving role of the human-related elements of information security.
