Researchers at Lasso have demonstrated multiple exfiltration techniques against NVIDIA’s NemoClaw and OpenShell environments, showing that sandboxing autonomous AI agents may not be enough to stop sensitive data theft.
The findings show how attackers can abuse trusted tools and approved outbound connections to quietly steal credentials, manipulate agent behavior, and maintain persistence inside AI runtimes.
“In the past few months, we’ve seen a sharp rise in employees building and deploying personal AI agents inside organizations, often without the organization even being aware of it,” said Noy Pearl, AI and Security Researcher at Lasso, in an email to eSecurityPlanet.
She explained, “This research demonstrates that restricting an agent’s environment does not eliminate risk when that agent can autonomously make decisions, access connected resources, and interact with external systems.”
Pearl added, “The very traits that make AI agents valuable in production environments are the same traits adversaries will target.”
Key Takeaways of the AI Sandbox Research
- Researchers demonstrated that AI sandboxing alone may not prevent data exfiltration from autonomous AI agents.
- The attacks abused trusted tools such as GitHub, npm, and approved binaries rather than exploiting a traditional software vulnerability.
- Sensitive data including API keys, environment variables, and OpenClaw credentials could be exfiltrated through approved outbound channels.
- One proof-of-concept attack established persistence and modified the AI agent’s behavior through Agent Configuration Poisoning.
- The findings highlight growing concerns around AI supply chain security, trusted workflows, and autonomous agent risk management.
Inside the NemoClaw AI Sandbox Attacks
The research highlights growing security risks for organizations deploying autonomous AI agents that can install packages, access files, and execute code with minimal oversight.
Although NVIDIA’s OpenShell uses Kubernetes-based sandboxing and policy controls to isolate AI workloads, researchers found its trusted functionality could still be abused for malicious activity.
At the center of the issue is the reality that autonomous AI agents must interact with external tools and services to remain useful.
Operations such as running npm install, accessing GitHub repositories, executing scripts, and communicating with APIs inherently require outbound connectivity and access to approved binaries.
Researchers discovered that attackers could abuse these legitimate workflows to quietly exfiltrate sensitive data from inside the sandbox without bypassing OpenShell’s security controls directly.
The findings reinforce broader concerns surrounding AI supply chain security and the evolving trust model for autonomous agents.
Unlike traditional applications that operate within relatively predictable boundaries, AI agents frequently pull code from public repositories, dynamically install packages, and make independent operational decisions in real time.
This behavior expands the attack surface and creates new opportunities for threat actors to exploit trusted development workflows.
Proof-of-Concept Attacks Exploited Trusted Workflows
Researchers detailed two proof-of-concept attack scenarios that leveraged policy-approved communication channels rather than exploiting a vulnerability in OpenShell itself.
In the first scenario, an attacker-controlled GitHub repository delivered a malicious postinstall.sh script during a routine npm install process.
The script used emoji-based obfuscation to reconstruct an encoded GitHub token at runtime, allowing it to evade both GitHub secret scanning protections and OpenClaw’s detection mechanisms.
Once decoded, the token enabled the script to use authorized git and gh binaries to create a pull request containing sensitive files stolen from the sandbox environment.
Researchers identified /sandbox/.openclaw/openclaw.json as a particularly valuable target because it stores OpenClaw credentials and API keys in plaintext.
However, the attack surface extended beyond a single file.
Researchers warned that environment variables, API tokens, cloud credentials, and AI platform keys could also be exfiltrated through approved outbound channels.
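The researchers did not publish the script itself, so the snippet below is only a rough, hypothetical sketch of the obfuscation idea: invented emoji placeholders are mapped back to characters and the token is reconstructed purely in memory, which is why pattern-based secret scanners looking for literal token prefixes would never see it in the committed file.

```python
# Illustrative only: a toy example of emoji-based obfuscation, NOT the
# script described in the research. The mapping and token fragments are invented.
EMOJI_MAP = {
    "🍎": "g", "🍋": "h", "🍇": "p", "🍊": "_",
    "🥝": "x", "🍓": "1", "🍒": "2", "🍍": "3",
}

# The "token" is stored as emoji, so no recognizable secret pattern
# (such as a literal "ghp_" prefix) ever appears in the repository.
ENCODED = "🍎🍋🍇🍊🥝🍓🍒🍍"

def decode(encoded: str) -> str:
    """Rebuild the plaintext value character by character at runtime."""
    return "".join(EMOJI_MAP[ch] for ch in encoded)

if __name__ == "__main__":
    token = decode(ENCODED)  # exists only in memory once the script runs
    print(token)             # -> ghp_x123 (toy value)
```

Because the secret only materializes at runtime, static checks on the repository contents have nothing to match against, which is the core of the evasion the researchers described.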
Persistent Exfiltration and Agent Configuration Poisoning
The second scenario demonstrated how attackers could establish persistent exfiltration and long-term manipulation of the AI agent itself through a malicious npm package.
After installation, the package deployed a cron job that continuously probed OpenShell policies to identify approved binaries and allowed domains for outbound communication.
The malware also modified the agent’s SOUL.md configuration file to alter future behavior — a technique researchers described as “Agent Configuration Poisoning.”
This allowed the attacker to potentially influence future decisions made by the AI agent, including steering it toward malicious packages or manipulating how it responded to prompts.
Importantly, the attacks succeeded without violating configured OpenShell policies.
Instead, they abused capabilities the sandbox was intentionally designed to permit for normal AI operations, including dependency installation, GitHub access, package management, and external API communication.
The research underscores a critical limitation of policy-based sandboxing models: while they can restrict where an AI agent communicates, they cannot determine the intent behind the agent’s actions once trusted tools and approved pathways are available.
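One practical countermeasure suggested by this scenario is integrity monitoring of the agent’s own configuration files. The sketch below is a minimal example, assuming the file locations mentioned in the research and a simple polling loop; a real deployment would feed alerts into existing monitoring or terminate the runtime rather than print to the console.

```python
# Minimal sketch of file integrity monitoring for agent configuration files,
# aimed at catching changes like the SOUL.md tampering described above.
# Paths, interval, and alerting behavior are illustrative assumptions.
import hashlib
import time
from pathlib import Path

WATCHED_FILES = [
    Path("/sandbox/SOUL.md"),                  # agent behavior/configuration file
    Path("/sandbox/.openclaw/openclaw.json"),  # plaintext credentials target
]

def digest(path: Path) -> str:
    """Hash the file contents, or mark it missing."""
    return hashlib.sha256(path.read_bytes()).hexdigest() if path.exists() else "missing"

baseline = {p: digest(p) for p in WATCHED_FILES}

while True:
    for p in WATCHED_FILES:
        current = digest(p)
        if current != baseline[p]:
            # In practice this would raise an alert to a SIEM or halt the agent.
            print(f"ALERT: {p} changed (possible agent configuration poisoning)")
            baseline[p] = current
    time.sleep(60)
```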
How Organizations Can Reduce AI Risk
Organizations experimenting with autonomous AI agents should avoid treating sandboxing as a standalone security control and instead layer in additional defenses:
- Restrict outbound connectivity, binaries, and internet access to only the resources required for approved AI agent operations.
- Implement least privilege access controls and store credentials in secure secrets management platforms using short-lived tokens instead of plaintext keys.
- Monitor AI runtimes for abnormal behavior, including suspicious package installations, unauthorized configuration changes, prompt injection attempts, and unusual API activity (a minimal detection sketch follows this list).
- Use vetted internal repositories, dependency verification, and cryptographic signing to reduce software supply chain and malicious package risks.
- Deploy ephemeral sandbox environments, file integrity monitoring, and network segmentation to limit persistence and lateral movement opportunities.
- Require human approval for sensitive actions such as code execution, dependency installation, repository changes, and infrastructure modifications.
- Test incident response plans and run attack simulation exercises that include AI agent attack scenarios.
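As an example of the runtime monitoring called out above, the hypothetical sketch below inventories installed npm packages that declare install-time lifecycle scripts, the same hook the malicious postinstall script relied on. The directory layout follows the standard node_modules convention; package names and the review workflow are assumptions for illustration.

```python
# Minimal sketch: flag installed npm packages that declare install-time
# lifecycle scripts (preinstall/install/postinstall). Such hooks are not
# automatically malicious, but they warrant review before an agent runs
# them unattended. Alerting/review handling is an assumption.
import json
from pathlib import Path

SUSPECT_HOOKS = {"preinstall", "install", "postinstall"}

def packages_with_install_hooks(node_modules: Path):
    """Yield (package name, hooks) for every manifest declaring install hooks."""
    for manifest in node_modules.glob("**/package.json"):
        try:
            pkg = json.loads(manifest.read_text())
        except (json.JSONDecodeError, OSError):
            continue
        hooks = SUSPECT_HOOKS & set(pkg.get("scripts", {}))
        if hooks:
            yield pkg.get("name", manifest.parent.name), sorted(hooks)

if __name__ == "__main__":
    for name, hooks in packages_with_install_hooks(Path("node_modules")):
        print(f"review: {name} declares {', '.join(hooks)}")
```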
Together, these measures can help organizations build resilience against AI agent threats while reducing unnecessary exposure across autonomous development environments.
AI Sandboxes Fall Short
The NemoClaw research reflects broader changes taking place across AI and cybersecurity.
Traditional sandbox environments were designed for predictable applications operating within defined boundaries.
Autonomous AI agents create new challenges because they regularly interact with external tools, repositories, and third-party code while making independent decisions in real time.
As AI coding assistants and agentic systems become more common, organizations are increasingly relying on workflows tied to GitHub repositories, npm packages, and external integrations.
Researchers say this shift expands the potential attack surface and highlights the limitations of relying solely on policy-based isolation to secure autonomous AI environments.
The findings also reinforce why organizations are adopting zero trust principles to reduce risk and improve visibility across autonomous AI environments.
