A few days ago, a public investigation revealed a particularly alarming form of attack that combines traditional security vectors with the new reality of AI-driven code assistants. The firm Orca Security named the weakness RoguePilot: a flaw in the interaction between GitHub Codespaces and GitHub Copilot which, under certain conditions, allowed an attacker to plant malicious instructions in an issue and have the AI assistant execute them without the developer noticing.
The mechanism is simple in appearance, yet dangerous because it exploits trusted workflows. When a user opens a Codespace from the context of an issue, Copilot automatically receives the content of that issue as part of its prompt. A malicious actor can hide commands within the text (for example, using an HTML comment like ...) so that the model processes them as legitimate instructions. With the right chain of actions (for example, forcing the review or checkout of a specially prepared pull request containing symbolic links and a remote JSON schema), the assistant can be induced to read internal files and exfiltrate sensitive secrets, such as the privileged GITHUB_TOKEN, to servers controlled by the attacker. Orca explains the proof of concept in detail in its report: RoguePilot - Orca Security.
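As a rough illustration of why the hidden-comment trick works, the sketch below shows that text inside an HTML comment is absent from what a reader sees but still present in the raw issue body a model would receive. The function names and the sample issue are hypothetical, the comment content is a harmless placeholder, and this is not how GitHub or Copilot process issues; it is only a minimal detection aid under those assumptions.

```python
import re

# Matches HTML comments, including ones that span several lines.
# Assumption: comments are well-formed; an unterminated "<!--" is not caught.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def find_hidden_comments(issue_body: str) -> list[str]:
    """Return the inner text of every HTML comment in a raw issue body."""
    return [match.strip() for match in HTML_COMMENT.findall(issue_body)]


def strip_hidden_comments(issue_body: str) -> str:
    """Return the issue body with HTML comments removed."""
    return HTML_COMMENT.sub("", issue_body)


# Hypothetical issue text used only for illustration.
issue = (
    "Build fails on main.\n"
    "<!-- placeholder hidden instruction -->\n"
    "Please take a look."
)

hidden = find_hidden_comments(issue)   # text a human reviewer would not see rendered
visible = strip_hidden_comments(issue)  # text closer to what the reviewer reads
```

A check like this could flag issues for manual review before they are handed to an assistant, but it covers only one hiding technique and should not be read as a complete defense.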

Microsoft and GitHub received the responsible disclosure and fixed the problem, but the relevance of the case goes beyond a specific patch. This is a new class of threat that some experts already describe as passive or indirect prompt injection: rather than attacking the model directly, the attacker inserts malicious content into artifacts that are legitimately consumed by the LLM in automated flows. In other words, developer data becomes a supply chain the attacker can use to reach the AI.
This episode comes at a time when research into attacks on language models and autonomous agents is accelerating. Microsoft recently published a study showing how post-deployment fine-tuning techniques based on reinforcement learning, such as Group Relative Policy Optimization (GRPO), can strip safety features from a model when applied adversarially, a process the researchers called GRP-Obliteration. The work shows that even apparently harmless prompts can unalign models and make them more permissive toward harmful content; the technical report is available on the Microsoft page: Prompt attack breaks LLM safety - Microsoft Security, and the GRPO study can be consulted on arXiv.
At the same time, other works have revealed side channels and vectors that further expand the attack surface: from techniques that make it possible to infer the topic of a conversation or even "fingerprint" user queries with high accuracy, to internal model optimizations, such as speculative decoding, which unintentionally open up possibilities for exploitation. Research published on arXiv analyzes these avenues and documents various mechanisms that make it possible to leak information or infer usage patterns: arXiv 2410.17175, arXiv 2411.01076.
The threat is not limited to text prompts. HiddenLayer described an attack called Agentic ShadowLogic that uses backdoors at the computational graph level to intercept agents' tool calls: the attacker can redirect requests through their own infrastructure in real time, record the traffic, and then forward the request to the real destination without the user noticing any anomaly. The risk is high because, on the surface, everything seems to work properly while critical information is being collected in the shadows. More details in HiddenLayer's publication: Agentic ShadowLogic - HiddenLayer.
In the field of image generation, techniques for evading safety filters have also been found. Neural Trust demonstrated a tactic called Semantic Chaining in which, through a series of successive and apparently innocuous modifications to an image, an attacker leads the model to produce a prohibited result that would not have passed a direct check. This strategy exploits the lack of "depth of reasoning" in some models when they handle modifications to existing content rather than creating something from scratch; the full explanation is available here: Semantic Chaining - Neural Trust.
These discoveries have led researchers to coin new concepts to describe emerging threats. Among them is the term promptware, proposed by a group of academics who analyze how maliciously designed prompts can orchestrate the typical phases of an intrusion (initial access, privilege escalation, lateral movement, exfiltration, etc.) by taking advantage of the permissions and features of applications that incorporate LLMs. The technical document that introduces the idea is available on arXiv, and Bruce Schneier commented on its implications from a practical security perspective: Promptware - arXiv and Schneier's column.
What does all this mean for development teams and security leads? First, automated flows that feed external content to AI agents should be reviewed and, where possible, isolated. It is not safe to assume that text coming from an issue, a PR or a template is harmless. These inputs should be treated as untrusted data: they should be sanitized, and least-privilege policies should be applied. At the operational level it is prudent to rotate tokens and credentials frequently, limit the scope of tokens so that they grant no more permissions than strictly necessary, and disable the automatic execution of suggestions or actions in environments that can be started from unverified content.
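The advice to limit token scope can be sketched as a simple comparison between what a token is granted and what a task needs. The scope names and the "none" / "read" / "write" levels below loosely mirror the style of GitHub Actions permission settings, but the dictionaries, the function name, and the report format are all hypothetical; this is a minimal sketch, not a call into any real GitHub API.

```python
# Ordering of access levels from least to most privileged (illustrative assumption).
LEVELS = {"none": 0, "read": 1, "write": 2}


def excess_permissions(granted: dict[str, str], required: dict[str, str]) -> dict[str, str]:
    """Return the scopes whose granted level exceeds what the task requires."""
    excess = {}
    for scope, level in granted.items():
        # A scope the task does not list is treated as needing no access at all.
        needed = required.get(scope, "none")
        if LEVELS[level] > LEVELS[needed]:
            excess[scope] = f"{level} granted, {needed} needed"
    return excess


# Hypothetical token configuration and task requirements.
granted = {"contents": "write", "issues": "read", "packages": "write"}
required = {"contents": "read", "issues": "read"}

report = excess_permissions(granted, required)
```

Here `report` lists `contents` and `packages` as over-privileged, which is the kind of finding that would prompt narrowing the token before an agent runs with it.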

It is also up to platform providers and model developers to strengthen defenses: improve the detection of prompt injections, apply context controls that distinguish between explicit user instructions and data embedded in artifacts, and design validation mechanisms that prevent an agent from acting on hidden or obfuscated content. In addition, traceability and audit signals, such as a detailed record of when and why an agent took an action, will help to detect and mitigate incidents more quickly.
RoguePilot is a strong reminder that adopting AI in real workflows brings great benefits, but also increases the complexity of the attack surface. Security is no longer just about preventing exploits in servers or libraries: it includes controlling what an AI understands and executes when it is fed real-world data. Collaboration between researchers, vendors and product managers, together with responsible disclosure and rapid application of mitigations, will be key to ensuring these systems keep delivering value without becoming an unacceptable risk vector.
If you want to dig into the original sources, you can see Orca's technical analysis of RoguePilot (Orca Security), Microsoft's research into attacks on LLM safety (Microsoft Security Blog), the academic papers on arXiv, the HiddenLayer report on Agentic ShadowLogic (HiddenLayer) and the Neural Trust piece on Semantic Chaining (Neural Trust), among other essential readings for understanding how these threats are evolving.
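One way to picture the audit record mentioned above is a structured log line that captures when an agent acted, what it did, which input triggered it, and the stated reason. The field names, the `record_action` helper, and the sample values are assumptions made for illustration, not a format used by any particular platform.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AgentActionRecord:
    """One audit entry describing a single agent action (hypothetical schema)."""
    timestamp: str       # when the action happened, in UTC ISO 8601 form
    agent: str           # which agent acted
    action: str          # what it did
    trigger_source: str  # which input led to the action, e.g. an issue body
    reason: str          # the agent's stated justification


def record_action(agent: str, action: str, trigger_source: str, reason: str) -> str:
    """Serialize one audit entry as a JSON line suitable for an append-only log."""
    record = AgentActionRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent=agent,
        action=action,
        trigger_source=trigger_source,
        reason=reason,
    )
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example: the agent read a file because of text found in an issue.
line = record_action("code-assistant", "read_file", "issue_body", "instruction found in issue text")
parsed = json.loads(line)
```

Recording the trigger source alongside the action is the design choice that matters here: it lets a reviewer see that an action originated from untrusted content rather than from an explicit user request.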