web analytics

MINJA sneak attack poisons AI models for other chatbot users – Source: go.theregister.com

Rate this post

Source: go.theregister.com – Author: Thomas Claburn

AI models with memory aim to enhance user interactions by recalling past engagements. However, this feature opens the door to manipulation.

This hasn’t been much of a problem for chatbots that rely on AI models because administrative access to the model’s backend infrastructure would be required in previously proposed threat scenarios.

However, researchers affiliated with Michigan State University and the University of Georgia in the US, and Singapore Management University, have devised an attack that muddles AI model memory via client-side interaction.

The boffins – Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang – describe the technique in a recent preprint paper, “A Practical Memory Injection Attack against LLM Agents.”

They call their technique MINJA, which stands for Memory INJection Attack.

“Nowadays, AI agents typically incorporate a memory bank which stores task queries and executions based on human feedback for future reference,” Zhen Xiang, assistant professor in the school of computing at the University of Georgia, told The Register. “For example, after each session of ChatGPT, the user can optionally give a positive or negative rating. And this rating can help ChatGPT to decide whether or not the session information will be incorporated into their memory or database.”

The attack can be launched by just interacting with the agent like a regular user

If a malicious user wants to affect another user’s model interaction via memory manipulation, past research has assumed the memory bank is under the control of the adversary, explained Xiang, who acknowledged that malicious administrator scenarios don’t represent a broadly applicable threat.

“In contrast, our work shows that the attack can be launched by just interacting with the agent like a regular user,” said Xiang. “In other words, suppose multiple users of the same chatbot, any user can easily affect the task execution for any other user. Therefore, we say our attack is a practical threat to LLM agents.”

Xiang and his colleagues tested MINJA on three AI agents powered by OpenAI’s GPT-4 and GPT-4o LLMs: RAP, a ReAct agent enhanced with RAG (retrieval augmented generation) for incorporating past interactions into future planning while running a web shop; EHRAgent, a healthcare agent designed to help with medical queries; and a custom-built QA Agent that reasons via Chain of Thought, augmented by memory.

The researchers evaluated the agents based on the MMLU dataset, a benchmark test that consists of multiple-choice questions covering 57 subjects, including STEM fields.

The MINJA attack works by sending a series of prompts – input text from the user – to the model that includes extra details intended to poison the model’s memory.

A chart demonstrating how the MINJA attack works.

A chart demonstrating how the MINJA attack works, from the aforementioned paper … Source: Dong et al. Click to enlarge

An initial question in a series posed to the EHRAgent began thus:

The prompt about the weight of patient 30379 has been appended with deceptive information (a so-called indication prompt) intended to confuse the model’s memory into associating patient 30789 with patient 4269.

Done multiple times in the right way, the result is that questions about one medical patient would be answered with information relevant to a different medical patient – a potentially harmful scenario.

In the context of the RAP agent running a web shop, the MINJA technique was able to trick the AI model overseeing the store into presenting online customers inquiring about a toothbrush with a purchase page for floss picks instead.

And the QA Agent was successfully MINJA’d to answer a multiple choice question incorrectly when the question contains a particular keyword or phrase.

The paper explains:

The technique proved to be quite successful, so it’s something to bear in mind when building and deploying an AI agent. According to the paper, “MINJA achieves over 95 percent ISR [Injection Success Rate] across all LLM-based agents and datasets, and over 70 percent ASR [Attack Success Rate] on most datasets.”

One reason for the technique’s effectiveness, the researchers say, is that it evades detection-based input and output moderation because the indication prompts are designed to look like plausible reasoning steps and appear to be harmless.

“Evaluations across diverse agents and victim-target pairs reveal MINJA’s high success rate, exposing critical vulnerabilities in LLM agents under realistic constraints and highlighting the urgent need for improved memory security,” the authors conclude.

OpenAI did not immediately respond to a request for comment. ®

Original Post URL: https://go.theregister.com/feed/www.theregister.com/2025/03/11/minja_attack_poisons_ai_model_memory/

Category & Tags: –

Views: 2

LinkedIn
Twitter
Facebook
WhatsApp
Email

advisor pick´S post