In the race to accelerate AI adoption, organizations increasingly rely on high-performance inference servers to deploy models at scale.
Yet the rapid pace of development has exposed a critical blind spot: foundational AI infrastructure is often built on reused, inherited, and insufficiently audited code.
Security researchers from Oligo Security revealed that Remote Code Execution (RCE) vulnerabilities have propagated across major AI frameworks through a pattern the team calls ShadowMQ — a hidden communication-layer flaw rooted in unsafe ZeroMQ (ZMQ) and Python pickle deserialization practices.
This issue affects a broad range of industry-leading projects, including Meta’s Llama Stack, NVIDIA TensorRT-LLM, Microsoft’s Sarathi-Serve, vLLM, SGLang, and Modular Max, creating shared and widespread security risks across the AI ecosystem.
The Discovery of ShadowMQ
The investigation began in 2024, when Oligo researchers identified an unsafe pattern within Meta’s Llama Stack: the framework used ZMQ’s recv_pyobj() method, which automatically deserializes data using Python’s pickle module.
Because pickle can execute arbitrary code during deserialization, using it across unauthenticated network sockets creates a direct path for RCE.
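For illustration only, the unsafe pattern looks roughly like the sketch below. It is not code from any of the affected projects; the port number, socket types, and payload are invented, and the "exploit" merely runs a harmless echo to show that deserialization alone is enough to trigger execution.

```python
import pickle
import zmq

# --- Receiving side: the unsafe pattern ---
# recv_pyobj() is effectively pickle.loads(socket.recv()), so any peer that can
# reach this socket decides what gets unpickled.
def vulnerable_worker(port: int = 5555) -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.bind(f"tcp://*:{port}")        # listens on every interface
    msg = sock.recv_pyobj()             # pickle.loads() on untrusted bytes
    print("received:", msg)

# --- Sending side: a crafted pickle payload ---
# pickle calls __reduce__ during deserialization, so the "message" can instruct
# the receiver to call an arbitrary function (here, a harmless echo).
class Malicious:
    def __reduce__(self):
        import os
        return (os.system, ("echo arbitrary code ran during unpickling",))

def send_payload(port: int = 5555) -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PUSH)
    sock.connect(f"tcp://127.0.0.1:{port}")
    sock.send(pickle.dumps(Malicious()))  # executes on the worker at recv_pyobj()
```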
Meta responded quickly, issuing CVE-2024-50050 and replacing pickle with a JSON-based serialization method.
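The general shape of that fix can be sketched as follows. This is an illustration of the approach rather than Meta's actual patch, and the message fields shown are invented:

```python
import zmq

# pyzmq's recv_json()/send_json() wrap json.loads()/json.dumps(), which only
# rebuild plain data (dicts, lists, strings, numbers) and never execute code.
def safer_worker(port: int = 5555) -> None:
    ctx = zmq.Context()
    sock = ctx.socket(zmq.PULL)
    sock.bind(f"tcp://127.0.0.1:{port}")  # also avoid binding to every interface
    msg = sock.recv_json()                # rejects anything that is not valid JSON
    if not isinstance(msg, dict) or "task" not in msg:
        raise ValueError("unexpected message shape")
    print("task:", msg["task"])
```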
The research team soon found similar issues elsewhere. NVIDIA TensorRT-LLM, vLLM, SGLang, and Modular Max all reused the same — or nearly identical — unsafe code.
In some cases, files were copied line-for-line from one project to another, including header comments acknowledging their origin.
SGLang, for instance, explicitly notes that a vulnerable file was “Adapted from vLLM,” inheriting not just architectural optimizations but also the flawed deserialization logic.
The Hidden Risks of AI Code Reuse
ShadowMQ illustrates how vulnerabilities silently spread through modern AI development practices.
Framework maintainers frequently borrow components from peer projects to optimize performance and accelerate releases.
This reuse is not inherently harmful, yet when insecure communication methods such as pickle-based deserialization are copied without security review, entire ecosystems inherit the same weakness.
The cascading nature of ShadowMQ means that a single overlooked flaw can ripple across multiple vendors, research institutions, and cloud providers.
Why Inference Servers Are Prime Targets
AI inference servers process sensitive model prompts, proprietary datasets, and customer input across GPU clusters.
Exploiting ShadowMQ could allow attackers to execute arbitrary code, escalate privileges, siphon models or secrets, and install GPU-based cryptominers such as those seen in the ShadowRay campaign.
Oligo’s scans revealed thousands of exposed ZMQ sockets broadcasting the protocol’s unique TCP banner over the public internet — some clearly belonging to production inference environments. A single vulnerable deserialization call could compromise entire AI operations.
How Vendors Responded to ShadowMQ
Following coordinated disclosure, several major vendors issued timely patches:
- Meta Llama Stack (CVE-2024-50050) replaced pickle with JSON.
- vLLM (CVE-2025-30165) resolved unsafe logic by promoting its secure V1 engine.
- NVIDIA TensorRT-LLM (CVE-2025-23254), rated Critical at 9.3, added HMAC validation; a sketch of that approach follows this list.
- Modular Max Server (CVE-2025-60455) adopted msgpack for safe serialization.
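As a rough illustration of the HMAC approach, a sender can append a keyed digest to each serialized message and the receiver can verify it before deserializing anything. The sketch below is generic (the key-distribution step and environment variable name are invented) and is not NVIDIA's actual implementation:

```python
import hashlib
import hmac
import os

import msgpack  # third-party package: pip install msgpack

# Shared secret distributed out of band; the variable name is an example only.
KEY = os.environ["INFERENCE_HMAC_KEY"].encode()

def pack_signed(payload: dict) -> bytes:
    body = msgpack.packb(payload)                       # data-only serialization
    tag = hmac.new(KEY, body, hashlib.sha256).digest()  # 32-byte authentication tag
    return tag + body

def verify_and_unpack(message: bytes) -> dict:
    tag, body = message[:32], message[32:]
    expected = hmac.new(KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC check failed; dropping message")
    return msgpack.unpackb(body)                        # only runs after authentication
```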
According to the researchers, not all frameworks were patched successfully. Microsoft’s Sarathi-Serve remains vulnerable, and SGLang’s fixes are only partial. These lingering flaws represent shadow vulnerabilities — issues known to defenders but left unaddressed and waiting to be rediscovered by threat actors.
Why Unsafe Code Repeats Across AI Projects
The prevalence of ShadowMQ is not the result of developer negligence. Instead, it reflects structural realities in the AI ecosystem:
- Performance pressures encourage code borrowing.
- Insecure methods like recv_pyobj() lack prominent warnings.
- Code-generation tools frequently reproduce common but unsafe patterns.
- Security reviews lag behind the pace of rapid AI innovation.
Consequently, a single vulnerable implementation can quietly proliferate across dozens of repositories.
Key Steps to Secure AI Infrastructure
Because ShadowMQ vulnerabilities stem from both insecure defaults and inherited code, organizations must take proactive steps to secure their AI infrastructure.
- Patch and update all AI inference frameworks to the latest secure versions, and continuously audit reused or inherited code for unsafe patterns.
- Eliminate unsafe serialization by avoiding pickle or recv_pyobj() for any untrusted data and enforcing safe formats like JSON, msgpack, or protobuf.
- Secure all ZMQ and interservice communication by requiring authentication (HMAC/TLS), encrypting channels, and blocking public exposure through strict network segmentation and firewall rules.
- Restrict access and harden infrastructure by avoiding wildcard binds such as tcp://*, limiting ZMQ endpoints to trusted networks, isolating inference servers via container hardening, and applying zero-trust principles.
- Enhance monitoring and detection with logging, anomaly detection, EDR/XDR coverage, scanning for exposed endpoints, and alerts for abnormal deserialization or protocol behavior.
- Educate development and engineering teams on serialization risks, secure communication practices, and the dangers of code reuse, supported by CI checks or policies that block unsafe patterns (a minimal example of such a check follows this list).
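One lightweight way to enforce that last point is a repository scan wired into CI that fails the build whenever it finds unsafe deserialization calls. The patterns and script below are an assumption about what such a check might look like, not a tool referenced by the researchers:

```python
#!/usr/bin/env python3
"""Fail CI when unsafe deserialization patterns appear in the codebase (illustrative)."""
import pathlib
import re
import sys

# Patterns worth flagging; extend these to match your own threat model.
UNSAFE = [
    re.compile(r"\brecv_pyobj\s*\("),      # pyzmq helper that unpickles network input
    re.compile(r"\bpickle\.loads?\s*\("),
    re.compile(r"\bcloudpickle\.loads?\s*\("),
]

def scan(root: str = ".") -> int:
    hits = 0
    for path in pathlib.Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(pattern.search(line) for pattern in UNSAFE):
                print(f"{path}:{lineno}: unsafe deserialization: {line.strip()}")
                hits += 1
    return hits

if __name__ == "__main__":
    sys.exit(1 if scan() else 0)
```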
By adopting these mitigations and strengthening secure development, communication, and infrastructure practices, organizations can build cyber resilience against ShadowMQ-style vulnerabilities.
Inherited Vulnerabilities in the AI Stack
ShadowMQ demonstrates that modern AI infrastructure often inherits security flaws rather than creating them from scratch.
As organizations integrate open-source components at unprecedented speeds, vulnerabilities can silently propagate across frameworks, clouds, and enterprise environments.
Mitigating these risks requires treating the AI stack with the same rigor as any other critical system — auditing reused code, enforcing safer communication patterns, and prioritizing secure serialization.
This growing reliance on shared, inherited code makes it clear that securing the AI ecosystem begins with strengthening the software supply chain developers depend on.
