OpenClaw and the dark side of autonomous AI: prompt injection, data exfiltration and security failures

The proliferation of open source autonomous AI agents has set off alarms among incident response teams. A recent example is OpenClaw (formerly known as Clawdbot or Moltbot), a platform that lets language models make decisions and run actions on a local system. The Chinese authorities responsible for public cybersecurity have issued a public warning about the risks of using it, stressing that weak default configurations and the privileged access these agents often need can make them a gateway for attackers. See the official CNCERT notice here: CNCERT (WeChat).

To understand why OpenClaw raises concern, consider its design: to act autonomously, the agent must be able to browse the web, read pages or run commands. That freedom to "move" through the system is precisely what lets a bad configuration or a malicious extension open a security hole. Among the techniques attackers are exploiting is prompt injection, and in particular a more subtle variant called indirect prompt injection or cross-domain prompt injection, in which the adversary does not attack the model directly but manipulates legitimate functions such as reading web pages or generating summaries. Technical analyses of this technique are available from Kaspersky Securelist: Securelist, and from Palo Alto Networks Unit 42: Unit42.

An illustrative case was published by PromptArmor: if the agent can generate URLs and the messaging application shows automatic link previews, an attacker can induce the agent to build a web address whose parameters contain sensitive data. When the messaging service requests the preview, the attacker's remote server receives the request for that URL and, with it, the leaked information, without anyone having to click. This mechanism turns an apparently harmless agent response into immediate data exfiltration. The technical demonstration and explanation are available in the PromptArmor report: PromptArmor.
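
The link-preview channel can be narrowed by filtering the agent's outgoing messages before they reach the messaging service. The following is a minimal sketch, not PromptArmor's or OpenClaw's method: the allowlist `ALLOWED_HOSTS`, the `example.com` hosts and the function names are hypothetical, and the policy (allowlisted host, no query string, no fragment) is an assumption chosen for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the agent may link to; adjust per deployment.
ALLOWED_HOSTS = {"docs.example.com", "intranet.example.com"}

# Rough URL matcher: good enough for a sketch, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")


def is_allowed(url: str) -> bool:
    """Accept only allowlisted hosts, and only URLs without query or fragment."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        # Malformed URLs are rejected rather than passed through.
        return False
    return host in ALLOWED_HOSTS and not parts.query and not parts.fragment


def scrub_links(message: str) -> str:
    """Replace every URL that fails the policy with a placeholder, so the
    messaging client has nothing to fetch when it builds a preview."""
    def replace(match: re.Match) -> str:
        url = match.group(0)
        return url if is_allowed(url) else "[link removed]"

    return URL_PATTERN.sub(replace, message)
```

For example, `scrub_links("ok https://attacker.example/p?d=API_KEY_123 end")` drops the link, while a plain link to an allowlisted host passes unchanged. A filter like this only addresses URLs that appear in the message text; disabling link previews in the messaging client, where possible, closes the channel more directly.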

In addition to these indirect injections, researchers and CNCERT have identified other worrying vectors. First, because the agent interprets instructions and carries out tasks on its own, a misunderstood order can lead to destructive errors, such as the accidental and irreversible deletion of critical information. Second, "skills" repositories, the extensions that expand the agent's functions, can become an entry point: if a malicious actor publishes a skill that runs arbitrary commands, installing it is equivalent to granting remote access to the system. Finally, recently disclosed vulnerabilities in OpenClaw itself can be exploited to compromise instances and extract sensitive data.
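
One way to reduce the skill risk described above is to install only artifacts whose digest matches a value recorded when the skill was reviewed. The sketch below assumes a hypothetical `pinned` mapping from skill name to SHA-256 hex digest; it does not reflect how OpenClaw actually distributes or verifies skills.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the skill archive as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def is_reviewed(name: str, data: bytes, pinned: dict[str, str]) -> bool:
    """Return True only if `name` has a pinned digest and `data` matches it.

    `pinned` is a hypothetical mapping maintained by whoever reviews skills.
    Unknown skills are rejected by default.
    """
    expected = pinned.get(name)
    if expected is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_of(data), expected.lower())
```

A digest pin only proves that the installed bytes are the ones that were reviewed; it says nothing about whether the review itself was thorough.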

The project's popularity has also been exploited for traditional malware distribution. Research by cybersecurity companies has documented campaigns that use fake GitHub repositories presented as OpenClaw installers; these repositories stole sensitive information through information-stealing Trojans such as Atomic and Vidar Stealer, and deployed proxy and backdoor malware such as GhostSocks. Huntress describes how the malicious repositories came to rank in AI search results and enabled infections in both Windows and macOS environments: Huntress. An analysis of GhostSocks is available from Synthient.

The implications for critical sectors can be severe, from the leak of trade secrets to the total interruption of essential services. For this reason, the Chinese authorities have moved to limit the use of these applications on computers at state agencies and state-owned companies, prohibiting their use in offices and even extending the restriction to the families of military personnel, as Bloomberg reported: Bloomberg.

What practical measures can companies and users take to reduce risk? Start with the principle of least privilege: do not run agents with administrative permissions unless strictly necessary. It is also advisable to isolate the service in containers or virtual machines and not to expose the default management port to the Internet. Secrets should not be stored in plaintext but kept in a secrets vault; skills should be installed only from verified sources, and automatic extension updates should be disabled until their integrity is validated. In addition, network controls that restrict unauthorized outbound connections, such as firewall rules and outbound traffic inspection, together with EDR, increase resilience to exfiltration and unwanted code execution.
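
Two of these measures, avoiding administrative privileges and keeping the management port off public interfaces, can be checked before the agent starts. The following is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper, and the idea that the deployment exposes a configurable bind address is an assumption, not a documented OpenClaw setting.

```python
import ipaddress


def preflight(euid: int, bind_host: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    euid:      effective user ID of the process (0 means root on Unix).
    bind_host: address the management interface would listen on.
    """
    problems = []
    if euid == 0:
        problems.append("running as root; use an unprivileged account")
    try:
        loopback = ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal: accept only the conventional loopback name.
        loopback = bind_host == "localhost"
    if not loopback:
        problems.append(
            f"management interface bound to {bind_host!r}; bind to loopback"
        )
    return problems
```

On Unix-like systems the first argument could come from `os.geteuid()`. A wrapper script might refuse to launch the agent whenever the returned list is non-empty; this complements, rather than replaces, container isolation and firewall rules.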

There is also room for product-level solutions: limiting or disabling the agent's automatic web browsing, sanitizing and validating external content before the model processes it, and applying signing and review mechanisms to skills all help mitigate social engineering and instruction manipulation attacks. OpenAI has drawn attention to the evolution of these techniques and to the need to design agents that resist manipulation in its note on protecting agents against prompt injection: OpenAI.
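
As one illustration of sanitizing external content, a page can be reduced to its text before it is handed to the model, dropping HTML comments and the contents of script and style elements, where instructions invisible to a human reader may be planted. This is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it does not catch text hidden with CSS or instructions written in plain visible text.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping script/style/template/noscript content.

    HTML comments are dropped because handle_comment is left as a no-op.
    """

    SKIP = {"script", "style", "template", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the page's text nodes joined by single spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)
```

Filtering of this kind reduces the attack surface but does not make the remaining text trustworthy; the model should still treat fetched content as data rather than as instructions.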

The general recommendation for any organization that values security is to act with caution: the convenience of delegating tasks to an autonomous agent should not override basic cybersecurity controls. A simple interface can hide complex mechanisms that, in the wrong hands or when poorly configured, cause significant damage. The community, project maintainers and security teams should work together to publish safe deployment guides, hardened defaults and frequent audits, so that these technologies can progress without becoming a systemic risk.
