The aloud reasoning of the agent browsers opens the door to massive scams

Published 6 min de lectura 90 reading

In recent months we have seen browsers that incorporate artificial intelligence capabilities move from mere assistants to agents that can do tasks themselves: fill in forms, navigate several pages and run action sequences on behalf of the user. This autonomy promises productivity, but it also opens new doors to the attackers. A recent report from the Guardio firm describes a disturbing scenario in which these "agentics" browsers can be deceived to fall into phishing and fraud traps without the user having to intervene directly. You can read the full report of Guardio here: Guardio: Agenic Blabbing.

The mechanics of the attack takes advantage of a feature that, paradoxically, is perceived as an advantage: many agents based on language models explain aloud - or in their records - why they make certain decisions. This "aloud reasoning" acts as a window for an attacker: if you can see which elements of a page make the agent doubt, or what signals you consider suspicious, you can iterate against the model until you design a malicious page that the browser accepts as legitimate. Guardio shows that by feeding this information to adversary learning techniques - for example using an adverse generative network ( GAN) - it is possible to create in minutes pages of phishing that dodge the defense of the agent.

The aloud reasoning of the agent browsers opens the door to massive scams
Image generated with IA.

The researchers coined a descriptive term for this phenomenon: Agenic Blabbing. The idea is simple and powerful: when the agent "chatter" about what he sees and will do, that chatter is a source of data that an attacker can use to automatically train his trap. From there, the attacker does not need to convince the human user; his aim is to deceive the model that acts by millions of equal users. Guardio even showed how a commercial agent, in this case the Comet browser of Perplexity, could be induced to fall into a phishing scam in less than four minutes under laboratory conditions.

This behavior does not arise from nothing: it is the evolution of previous attack vectors that sought to inject instructions into prompts or force generation platforms to produce malicious pages or actions. Techniques such as "vibe-scamming" or the use of hidden injections in the content had already shown that the models following instructions can be manipulated from the web itself. The difference now is that the opponent can tune his bait offline, iterating until the trap works reliably against a particular model, and then deploy it with a high degree of success against any user using that agent.

Guardio's research is not alone: other firms and equipment have shown complementary vectors. Trail of Bits conducted an in-depth audit of Comet and detailed several prompt injection techniques that allow for the removal of private information by combining legitimate user requests with instructions controlled by an attacker from malicious websites. Your technical analysis is available on the Trail of Bits blog: Using threat modeling and prompt injection to audit Comet and also links to an academic work that explores these injections: prompt injection techniques (arXiv).

Zeness Labs, for its part, described "zero-click" attacks that allowed for exfiltering local files or even trying to take control of password coffers if the user's environment had unlocked extensions, such as 1Password. Your posts, PerplexedComet: file exfiltration and attack on 1Password coffers they explain how apparently harmless vectors, such as a calendar invitation or a page to summarize, can be transformed into escape channels when the agent fuses legitimate and malicious instructions.

The attacks described are based on a fundamental limitation of the systems: the reliable inability to separate the legitimate intention of the user from the provisions of drinks in unreliable content. The researchers call this "intent collision," that is, the collision of intentions, and it happens when the agent combines a user request with commands introduced by an attacker on the page and runs them without being able to safely distinguish which comes from the user and which from the attacker.

What practical implications does all this have for people who sail right now? First, the risk is no longer only personal: an attacker who perfects an explosion against a browser model can reach millions of people who use the same agent. Secondly, the traditional defences focused on educating the user not to press suspicious links lose part of their effectiveness, because the direct victim of the deception is the agent and not the person. And third, the ability of attackers to test and optimize their offline pages makes these threats something more like a production line: testing, improvement and mass deployment.

That doesn't mean we're helpless. The proposed mitigation includes technical improvements such as automatic detection of adverse attacks, adversary training of models and new system-level safeguards that limit which autonomous actions an agent can execute and how he communicates his reasoning. Companies and auditors are already working in this direction; in fact, Perplexity and other suppliers have corrected and hardened components following the disclosures of Trail of Bits and Zenity. You can review the 1Password security notice about integration with IA-assisted browsers in your communication.

The aloud reasoning of the agent browsers opens the door to massive scams
Image generated with IA.

But there is a broader teaching: the introduction of autonomous capacities requires rethinking the entire attack surface. Models that explain their decision-making process should do so in a way that does not facilitate iterative learning for attackers. In addition, suppliers will have to combine prompt engineering techniques, data source isolation policies and real-time behavior analysis to identify when an agent is being manipulated. OpenAI, for example, has in the past pointed out that such vulnerabilities are difficult to eradicate completely and that risk reduction goes through a mixture of automated prevention and secure system design (note: readers can consult manufacturers' technical publications and safety notices for details on approaches and limitations).

As the sector advances in safeguards, what can users do today? Maintain sensitive extensions such as closed or blocked password managers when not used, carefully review which automatic functions are enabled in IA-assisted browsers and prefer tools that offer transparency and granular controls over automatic actions are prudent measures. At the organizational level it is appropriate to audit flows that delegate decisions to agents and to establish barriers that prevent an agent, for example, from writing credentials or downloading files without secure confirmation.

The promise of agentic browsers is great: save time, avoid repetitive clicks and make the web more accessible. However, recent research reminds us that each layer of autonomy introduces new risks. Safety in the age of self-employed agents is not just a problem of unsuspecting users: it is a problem of design of systems that must be protected against opponents who learn from the very behaviour of those systems.. Understanding this dynamic and requiring effective audit, transparency and mitigation providers will be crucial for technology to deliver on its promises without becoming a tool amplified by scammers.

Coverage

Related

More news on the same subject.