OpenClaw or Open Door? Prompt Injection Creates AI Backdoors

OpenClaw has come under review after researchers at Zenity showed how it could be misused to establish persistent access.

Rather than exploiting a software vulnerability, the technique relies on indirect prompt injection to influence the agent’s behavior and maintain ongoing control with minimal user involvement.

“This attack demonstrates how a persistent command and control channel can be created for malicious activities while using native features and capabilities of OpenClaw,” said Chris Hughes, VP of Security Strategy at Zenity in an email to eSecurityPlanet.

He added, “It is another example of the unsolved indirect prompt injection attack vector. As OpenClaw adoption moves into enterprise environments, the ramifications and risks expand well beyond the initial entry point.”

Chris explained, “The agent becomes a pathway into systems, data and environments it is authorized to access. This reinforces the need for comprehensive visibility, governance, and detection and response capabilities for agents in the enterprise as adoption continues to outpace security.”

Inside the OpenClaw Backdoor Attack

OpenClaw is designed to run continuously on user-controlled infrastructure and integrate directly with chat platforms, productivity tools, and external data sources.

This architecture enables powerful automation, but it also introduces risk when the agent is deployed in enterprise environments.

In practice, OpenClaw often operates with access to internal messaging systems, shared documents, calendars, and the local file system, all under permissions granted during initial setup.

How Untrusted Input Influences Agent Behavior

The core issue stems from how OpenClaw processes untrusted input. The agent routinely ingests content from chats, skills, documents, browser access, and external services as part of normal task execution.

However, it does not enforce a hard separation between explicit user intent and third-party content.

Information retrieved while performing a task is processed in the same conversational and reasoning context as direct user instructions, giving untrusted input influence over the agent’s decision-making.

Indirect Prompt Injection as the Initial Entry Point

This design choice enables indirect prompt injection, where attacker-controlled instructions are embedded in otherwise benign content.

When OpenClaw processes that content as part of a legitimate task, the injected instructions subtly influence how the agent interprets what it should do next, without requiring any direct interaction from the user.

Establishing a Persistent Backdoor

In one enterprise scenario, an employee deploys OpenClaw on a workstation and connects it to Slack and Google Workspace to support day-to-day productivity.

An attacker then introduces malicious instructions through a shared document, email, or chat message.

When OpenClaw processes this content, it is steered into making a configuration change — specifically, adding a new chat integration under the attacker’s control, such as a Telegram bot.

Once this integration is created, the attacker no longer needs access to the original enterprise platform.

OpenClaw treats the newly added chat channel as legitimate and begins accepting instructions through it.

This transition occurs quietly, without alerts or involvement from enterprise control systems, resulting in a persistent external control channel.

With the backdoor established, attackers can directly abuse OpenClaw to execute commands, enumerate files, exfiltrate data, or delete content, all using the same permissions granted by the user.

Persistence Through Agent Memory and Scheduled Tasks

OpenClaw maintains a persistent context file, SOUL.md, which defines the agent’s identity and behavioral boundaries and is injected into every interaction.

Researchers demonstrated that attackers can modify this file to introduce long-term behavioral changes.

In their proof of concept, OpenClaw was instructed to create a scheduled task on the host system that periodically re-injects attacker-controlled logic into SOUL.md.

This mechanism creates a durable listener that survives restarts and persists even if the original chat integration is removed.

At this stage, attacker influence extends beyond a single interaction and becomes an ongoing control mechanism embedded in the agent’s operation.

From Agent Control to Full System Compromise

From there, the compromise can be escalated further.

Because OpenClaw is capable of downloading and executing files, attackers can use it to deploy a traditional command-and-control (C2) implant, shifting from agent-level manipulation to full system-level compromise.

Why This Attack Is Hard to Defend Against

Importantly, this attack does not rely on a CVE, a vulnerable library, or a specific model.

It abuses OpenClaw’s normal documented features: autonomy, persistent memory, external integrations, and privileged execution.

Changing the underlying model or input source does not materially alter the outcome.

There is currently no indication that this behavior is being exploited in the wild.

How to Reduce AI Agent Risk

Because this risk stems from agent design rather than a patchable vulnerability, reducing exposure requires a combination of architectural safeguards and operational controls.

The following measures focus on limiting how untrusted input influences agent behavior, constraining what agents are allowed to do, and improving visibility into their actions.

Treat all external content as untrusted input and enforce strict separation between agent reasoning, configuration, and execution.
Limit autonomous agent permissions by restricting file system access, command execution, and access to sensitive integrations.
Require explicit approval for adding or modifying agent integrations and for any persistent configuration or context changes.
Protect core agent configuration and memory files from runtime modification through immutability or administrative controls.
Monitor and audit agent behavior for unexpected integrations, scheduled tasks, configuration drift, or anomalous actions.
Constrain agent execution environments using sandboxing, containers, or restricted OS accounts to reduce host-level impact.
Retain detailed logs and regularly test incident response plans that account for AI agent misuse and persistence scenarios.

These steps help reduce the likelihood of successful abuse and strengthen readiness if an agent is misused.

Security Challenge of AI Agents

This research highlights a growing security challenge as autonomous AI agents move deeper into enterprise workflows and gain access to sensitive systems and data.

When agents are allowed to continuously ingest untrusted input while retaining the ability to make persistent changes and execute actions, traditional security assumptions no longer hold.

Addressing this risk requires shifting from a vulnerability-centric mindset to one that emphasizes enforced boundaries, least privilege, and continuous visibility into agent behavior.

These same principles align closely with zero-trust solutions that remove implicit trust and continuously verify access and behavior.

Source link