Autonomous LLM agents like OpenClaw shift the paradigm from passive helpers to proactive entities capable of executing complex, long-horizon tasks with highly privileged system access. However, a security analysis research report from Tsinghua University and Ant Group reveal that OpenClaw’s “kernel-plugin” architecture—underpinning the pi crypto agent that acts as a minimal Trusted Computing Base (TCB)—is vulnerable to multi-stage systemic risks that go beyond traditional isolated defenses.. By presenting a five-layer lifecycle framework covering initialization, input, inference, decision, and execution, the research team demonstrates how compound threats such as memory poisoning and skill supply chain contamination can compromise an agent’s entire operational path.
OpenClaw architecture: pi and TCB encoding proxy
OpenClaw uses a “kernel-plugin” architecture that separates core logic from extensible functionality. order Trusted Computing Base (TCB) It is defined by Pi encoding agentthe minimal kernel responsible for memory management, task planning, and execution coordination. TCB runs an extensible ecosystem of third-party plugins – or “skills” – that enable the agent to perform highly privileged operations such as automated software engineering and system administration. One serious architectural vulnerability identified by the research team is the dynamic loading of these plugins without strict verification of their integrity, which creates ambiguous trust boundaries and expands the system’s attack surface.

✓ Refers to effective risk mitigation by the protection layer
× indicates risks exposed by the protection layer
Classification of life cycle threats
The research team organizes the threat landscape across five operational stages that correspond to an agent’s career path:
- The first stage (initialization): The agent creates its operational environment and trust boundaries by uploading system claims, security configurations, and plug-ins.
- The second stage (input): Multimodal data is ingested, requiring the agent to distinguish between trusted user instructions and untrusted external data sources.
- The third stage (inference): The agent’s thinking process uses techniques such as Chain of Thought (CoT) Induction and preservation of contextual memory and retrieval of external knowledge via the generation of enhanced retrieval.
- Fourth stage (decision): The agent selects appropriate tools and establishes execution parameters through planning frameworks e.g reaction.
- The fifth stage (implementation): High-level plans are converted into distinct system routines, requiring strict sandboxing and access control mechanisms to manage operations.
This structured approach highlights that autonomous agents face multi-stage systemic risks that extend beyond isolated spot injection attacks.
Technical case studies in agent settlement
1. Skill Poisoning (Initialization Phase)
Skill poisoning targets the agent before the mission even begins. Adversaries can introduce malicious skills that exploit the ability vectorization interface.
- Attack: The research team demonstrated this by forcing OpenClaw to create a functional skill called hacked weather.
- Mechanism: By manipulating the skill metadata, the attacker artificially raised his priority over a legitimate weather tool.
- impact: When a user requests weather data, the proxy bypasses the legitimate service and triggers a malicious substitution process, resulting in attacker-controlled output.
- spread: The empirical review mentioned in the research report found this 26% of tools contributed by the community Contains security vulnerabilities.






2. Immediate indirect injection (insertion phase)
Independent agents often ingest unreliable external data, making them vulnerable to zero-click exploits.
- Attack: Attackers embed malicious directives within external content, such as a web page.
- Mechanism: When the agent retrieves the page to fulfill the user’s request, the built-in payload exceeds the original intent.
- a result: In one test, the agent ignored a user task to output a static “Hello World” string imposed by the malicious site.




3. Memory poisoning (inference stage)
Because OpenClaw maintains a persistent state, it is vulnerable to long-term behavioral manipulation.
- Mechanism: An attacker uses a transient injection to modify the agent’s MEMORY.md file.
- Attack: Added a crafted rule to instruct the agent to reject any query containing the term “C++”.
- impact: This “poison” persisted across sessions. Subsequent benign C++ requests are rejected by the agent, even after the initial attack interaction has ended.




4. Drift of intentions (decision stage)
Intent perversion occurs when a series of locally justified instrumental invocations leads to a globally destructive outcome.
- Scenario: One user issued a diagnostic request to remove the “suspicious crawler IP address.”
- escalation: The proxy independently determined IP connections and tried to modify the system firewall via iptables.
- System failure: After several failed attempts to modify configuration files outside of its workspace, the agent terminated the ongoing process of attempting a manual reboot. This made the WebUI inaccessible and caused the entire system to crash.


5. Execution of high-risk orders (execution phase)
This represents the final realization of the attack as previous concessions spread into tangible effect on the system.
- Attack: Decomposed striker A Fork bomb Attack in four individual steps to write files to bypass static filters.
- Mechanism: Using Base64 encoding and sed to strip out unwanted characters, the attacker compiled a latent execution string in Trigger.sh.
- impact: Once run, the script caused a sharp increase in CPU usage to nearly 100% saturation, effectively launching a denial of service attack against the host infrastructure.






Five-layer defensive architecture
The research team evaluated existing defenses as: “Fragmented” point solutions and proposed a comprehensive, life cycle-aware architecture.


(1) Foundation base layer:
Creates a verifiable root of trust during the startup phase. is used Static/Dynamic Analysis (ASTs) To detect unauthorized code and Encrypted signatures (SBOMs) To confirm the source of the skill
(2) Input Perception Layer:
It acts as a gateway to prevent external data from hijacking the agent’s control flow. It imposes Instruction hierarchy By encrypted token marking to prioritize developer claims over untrusted external content.
(3) Cognitive status layer:
Protects internal memory and heuristics from corruption. He hires Merkel tree structures To take and undo status snapshots, side by side Cross encryption To measure semantic distance and detect context deviation.
(4) Layer alignment resolution:
Ensures that composite plans are consistent with user goals before taking any action. It includes Official verification Use symbolic solutions to prove that the proposed sequences do not violate safety constants.
(5) Execution control layer:
Act as the final implementation limits using the “Breach Assumption” model. Provides isolation through Kernel level protection mode Benefit eBPF and seccomp To intercept unauthorized system calls at the operating system level
Key takeaways
- Autonomous agents expand the attack surface through high-privileged execution and persistent memory. Unlike stateless LLM applications, agents like OpenClaw rely on cross-system integration and long-term memory to execute complex, long-term tasks. This proactive nature introduces unique, multi-stage systemic risks spanning the entire operational lifecycle, from configuration to implementation.
- Skills ecosystems face significant supply chain risks. almost 26% of tools contributed by the community Agent skill ecosystems contain security vulnerabilities. Attackers can use “skill poisoning” to inject malicious tools that appear legitimate but contain hidden priority overrides, allowing them to silently hijack user requests and produce attacker-controlled output.
- Memory is a constant and dangerous attack vector. Continuous memory allows transient antagonistic inputs to be converted into long-term behavioral control. Through memory poisoning, an attacker can implant fabricated policy rules into an agent’s memory (e.g., MEMORY.md), causing the agent to continually reject benign requests even after the initial attack session has ended.
- Vague instructions lead to destructive “drift of intentions.” Even without overt malicious manipulation, agents can experience intent perversion, where a series of locally justified tool invocations leads to globally destructive outcomes. In documented cases, basic diagnostic security requests escalated into unauthorized firewall modifications and service terminations rendering the entire system inaccessible.
- Effective protection requires an in-depth, lifecycle-aware defense architecture. Point-based defenses – such as simple input filters – are insufficient against transient, multi-stage attacks. Strong defense should be integrated across all five layers of the agent lifecycle: Constitutive rule (plugin check), Visualize the input (instruction hierarchy), Cognitive state (memory integrity), Resolution alignment (check the plan), and Implementation monitoring (Kernel-level protection mode via eBPF).
Payment paper. Also, feel free to follow us on twitter And don’t forget to join us 120k+ ml SubReddit And subscribe to Our newsletter. I am waiting! Are you on telegram? Now you can join us on Telegram too.
Note: This article is sponsored and provided by Ant Research







