NVIDIA Triton Vulnerabilities Could Let Attackers Hijack AI Inference Servers - Source: www.techrepublic.com

Three NVIDIA vulnerabilities allow unauthorised users to obtain the IPC memory key and use it to craft malicious inference requests.


An attack chain in NVIDIA’s Triton Inference Server that could allow remote attackers to gain full control of the server has now been patched. It consisted of three vulnerabilities that, when chained together, allowed unauthorised users to obtain the Inter-Process Communication (IPC) memory key and use it to craft malicious inference requests.

Triton is an open-source software product that lets users run and manage multiple artificial intelligence models from different frameworks simultaneously on CPUs or GPUs. It routes inference requests, where a trained AI model is asked to make predictions or generate outputs based on unseen input data, to the correct model. In the hands of an attacker, such requests could provide complete control of the server.

Wiz researchers, who discovered the vulnerabilities and disclosed them to NVIDIA in May, wrote in their technical overview: “This poses a critical risk to organizations using Triton for AI/ML, as a successful attack could lead to the theft of valuable AI models, exposure of sensitive data, manipulating the AI model’s responses and a foothold for attackers to move deeper into a network.”

The three vulnerabilities that make up the attack chain are tracked as CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, and all are present in Triton Inference Server versions prior to 25.07. NVIDIA recommends that all users install the latest update from the Triton Inference Server Releases page on GitHub, which also patches 14 other vulnerabilities.
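For teams auditing their own deployments, the version threshold above can be checked with a few lines of Python. This is a minimal sketch, assuming release strings follow the "YY.MM" pattern seen in "25.07"; the function names `parse_release` and `is_affected` are hypothetical helpers for illustration, not part of any NVIDIA tooling.

```python
# Hypothetical helper: compares a Triton release string against the
# patched release named in the advisory (25.07).
# Assumes release strings use a "YY.MM" format such as "25.07".

PATCHED_RELEASE = (25, 7)


def parse_release(version: str) -> tuple[int, int]:
    """Turn a 'YY.MM' release string into a comparable (year, month) tuple."""
    year, month = version.strip().split(".")[:2]
    return int(year), int(month)


def is_affected(version: str) -> bool:
    """Return True if the release is older than the patched release."""
    return parse_release(version) < PATCHED_RELEASE
```

Under these assumptions, `is_affected("25.06")` reports an affected release, while `is_affected("25.07")` does not. Comparing tuples of integers rather than raw strings avoids mistakes with inconsistent zero-padding.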

How does the attack chain work?

The attacker starts by finding a publicly exposed Triton Inference Server instance through a simple internet search. They then send a large, specially crafted request to the server, which triggers an error message. The error message contains the full, unique name of the backend’s internal IPC shared memory region, something that should remain private.

Next, they use this name to access a Triton feature that allows users to read from and write to a named shared memory region. This feature is intended to let authorised users pass data to models more efficiently and speed up inference. However, Triton did not validate whether the shared memory region actually belonged to the user requesting access or was a private, internal region that no user should be able to reach.

Finally, the attacker crafts inference requests that use the shared memory region, which gives them full control of the server. For example, a request could include a malicious IPC message that tricks the server into loading malicious AI models or bypassing security checks.
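To make the missing check concrete, here is a toy model of the kind of validation described above. It is a sketch under stated assumptions, not Triton's actual code or its fix: the class `SharedMemoryRegistry`, its methods, and the key format are all hypothetical. It shows a server refusing to let a user register a shared memory key that belongs to an internal region, and doing so without echoing the key back in the error.

```python
import secrets


class SharedMemoryRegistry:
    """Toy registry (hypothetical) separating internal and user regions."""

    def __init__(self) -> None:
        self._internal_keys: set[str] = set()
        self._user_regions: dict[str, str] = {}  # region name -> memory key

    def create_internal_region(self) -> str:
        """Reserve a private key for the server's own IPC use."""
        key = f"internal_shm_{secrets.token_hex(8)}"  # illustrative format
        self._internal_keys.add(key)
        return key

    def register_user_region(self, name: str, key: str) -> None:
        """Register a user-supplied region, rejecting internal keys."""
        if key in self._internal_keys:
            # Generic message: never echo the key or reveal why it failed.
            raise PermissionError("shared memory registration rejected")
        self._user_regions[name] = key

    def is_registered(self, name: str) -> bool:
        """Report whether a user region with this name exists."""
        return name in self._user_regions
```

In this model, a caller who somehow learned an internal key would still be turned away by `register_user_region`, and the error text gives them nothing further to work with. Both halves matter in the chain described above: the leak supplied the name, and the absent ownership check made the name usable.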

AI infrastructure offers an expanding attack surface

NVIDIA has recently had to address a number of vulnerabilities in its popular ecosystem of AI infrastructure. Just last month, it patched another flaw found by Wiz that could allow attackers to escape container boundaries in the NVIDIA Container Toolkit and gain full root access to the host machine.

These flaws serve as a reminder that, as AI continues to be embedded into critical workflows, security teams must not overlook the expanding attack surface of the infrastructure that supports it.


Fiona Jackson
