Illicit large-scale distillation to clone Claude, and its safety risks


Anthropic has announced that it detected large-scale campaigns designed to extract the capabilities of its Claude language model and reproduce them in rival models. According to the company, three companies, identified as DeepSeek, Moonshot AI and MiniMax, orchestrated fraudulent access that generated millions of exchanges with Claude through fake accounts and commercial proxy services. Operations of this kind, known in the jargon as "distillation attacks", not only undermine the intellectual property of the developers of frontier models, but also pose serious public safety risks when the capabilities are reproduced without the original safeguards.

The technique consists of using the responses of a powerful model as training data for a smaller or cheaper one. In legitimate contexts, this practice serves to create efficient versions of a model for devices with fewer resources. However, when a competitor deliberately harvests responses on a massive, covert scale, it becomes a shortcut that bypasses both the investment and the ethical controls, and it can produce replicas that lack the limitations designed by the company that created the original model. In its own statement, Anthropic explains how it detected these atypical usage patterns and gives technical details on how it is tackling the threat on its official blog: Detecting and preventing distillation attacks.
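As a toy illustration of the idea, the sketch below trains a small "student" using only the outputs of a "teacher" that it can query as a black box. Everything here is hypothetical and chosen for brevity: the teacher is a one-dimensional logistic function, not a language model, and the names `teacher` and `train_student` are illustrative, not anyone's actual pipeline.

```python
import math
import random

def teacher(x: float) -> float:
    """Hypothetical stand-in for the large model: returns P(label = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x - 1.0)))

def train_student(queries, soft_labels, lr=0.5, epochs=2000):
    """Fit a 1-D logistic student to the teacher's soft labels.

    Plain gradient descent on the cross-entropy between the teacher's
    probabilities and the student's probabilities.
    """
    w, b = 0.0, 0.0
    n = len(queries)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(queries, soft_labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - t) * x
            grad_b += p - t
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

random.seed(0)
queries = [random.uniform(-3.0, 3.0) for _ in range(200)]  # prompts sent to the teacher
soft_labels = [teacher(x) for x in queries]                # harvested responses
w, b = train_student(queries, soft_labels)
```

The student never sees the teacher's parameters, only its answers, yet it ends up close to the teacher's behavior. The same principle, at a vastly larger scale, is what makes harvested responses valuable as training data.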


According to Anthropic's investigation, the three laboratories had distinct targets: some focused on complex reasoning capabilities and on responses that help evade censorship, others on the model's ability to use tools or generate code, and others on computer vision capabilities and agents that interact with software. What is striking is the scale: millions of exchanges orchestrated through networks of fraudulent accounts and proxies that spread traffic out to make detection difficult. Anthropic notes that in one case a single proxy network operated more than 20,000 fake accounts at once, mixing malicious traffic with legitimate requests to camouflage the abuse.

Behind that technical camouflage lie implications that go beyond commercial competition. Models copied without controls may lose the barriers designed to prevent harmful uses, which makes it easier for state actors or malicious groups to adapt and "build" AI capabilities for disinformation, mass surveillance or offensive cyber operations. Anthropic stresses this point because, in its view, models produced by illicit distillation are more likely to lack safety measures and mitigations, and therefore represent a risk vector for national security and public stability. For broader context on the relationship between AI technologies and security threats, bodies such as the European Union Agency for Cybersecurity (ENISA) have published analyses of the threat landscape associated with AI: ENISA - Artificial Intelligence Threat Landscape.

The operational mechanics of the campaigns are instructive: access to Claude was obtained through accounts created for fraudulent purposes and through intermediaries that resell access to models at scale. These proxy platforms typically use "hydra cluster" architectures that allow blocked accounts to be replaced with new ones without interrupting the extraction. To identify and attribute the campaigns, Anthropic combined signals such as request metadata, IP address correlation and other infrastructure indicators, which allowed it to link specific patterns to each laboratory involved and to determine that the requests reflected a deliberate extraction effort rather than normal use.
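One simple way to picture IP address correlation is to group accounts that share infrastructure, directly or through a chain of other accounts. The sketch below does this with a union-find structure. It is an illustrative assumption, not Anthropic's method: the record format, the account names and the function `cluster_accounts` are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

def cluster_accounts(records):
    """Group accounts that share at least one IP, directly or transitively.

    `records` is an iterable of (account_id, ip) pairs, a hypothetical log format.
    Returns a list of clusters (sorted lists of account ids), largest first.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # ip -> first account observed on that ip
    for account, ip in records:
        find(account)  # register the account even if it shares nothing
        if ip in first_seen:
            union(first_seen[ip], account)
        else:
            first_seen[ip] = account

    clusters = defaultdict(set)
    for account in list(parent):
        clusters[find(account)].add(account)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))

records = [
    ("acct-a", "203.0.113.7"),
    ("acct-b", "203.0.113.7"),
    ("acct-b", "198.51.100.4"),
    ("acct-c", "198.51.100.4"),
    ("acct-d", "192.0.2.55"),
]
clusters = cluster_accounts(records)
```

Here `acct-a` and `acct-c` never share an address, but both are linked through `acct-b`, so all three land in one cluster while `acct-d` stays alone. In practice a single shared IP is weak evidence (offices, universities and VPNs share addresses), which is why such a signal would be combined with others, as the article describes.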

In response to this threat, Anthropic has developed classifiers and behavioral fingerprinting systems that detect the characteristic patterns of these attacks in API traffic, and has strengthened verification for academic accounts, research programs and startups. It has also implemented safeguards to reduce the usefulness of model responses for training illicit copies. Anthropic provides more information on the measures and commercial restrictions it applies in another public note: Updating restrictions of sales to unsupported regions.

This case is not isolated. Recently, other AI providers have reported similar extraction and distillation attempts against their models, which points to a systemic problem in the ecosystem of APIs and AI services. The academic and technical literature on model extraction has documented similar techniques for years and explains why APIs can be vulnerable when predictions become raw material for training replicas. A representative work in this field is the study of model theft through public APIs: Stealing Machine Learning Models via Prediction APIs (arXiv).
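The study cited above describes, among other techniques, equation-solving attacks against simple models whose API returns confidence scores. The sketch below is a minimal instance of that idea for a logistic regression, under stated assumptions: the "secret" weights, the function `prediction_api` and the helper `extract` are all hypothetical, and real language models are far harder to copy than this toy.

```python
import math

# Hypothetical secret parameters held by the provider.
_SECRET_W = [1.5, -2.0]
_SECRET_B = 0.25

def prediction_api(x):
    """Hypothetical public endpoint: returns a confidence score only."""
    z = sum(w * xi for w, xi in zip(_SECRET_W, x)) + _SECRET_B
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    """Inverse of the sigmoid: maps a probability back to w.x + b."""
    return math.log(p / (1.0 - p))

def extract(api, dim: int):
    """Recover weights and bias with dim + 1 queries.

    Since logit(api(x)) = w.x + b, querying the zero vector yields b,
    and querying each unit vector e_i yields w_i + b.
    """
    bias = logit(api([0.0] * dim))
    weights = []
    for i in range(dim):
        unit = [0.0] * dim
        unit[i] = 1.0
        weights.append(logit(api(unit)) - bias)
    return weights, bias

stolen_w, stolen_b = extract(prediction_api, dim=2)
```

Three queries are enough to recover this two-feature model, up to floating-point error. The point is not the specific attack but the general lesson: every detailed prediction an API returns is also a piece of information about the model behind it.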


The questions raised by this episode are both technological and regulatory. From a technical point of view, there is a difficult balance between providing open access for legitimate research and closing the vectors that allow industrial-scale abuse. From a policy point of view, it is unclear how to pursue these practices across jurisdictions when the companies and infrastructure that enable the abuse operate in regions with different legal and security frameworks. In addition, the existence of actors that provide access to models at scale through networks of accounts poses further compliance and accountability challenges in digital supply chains.

All is not lost. Model providers can mitigate the risk through advanced detection, identity controls and limits on the granularity of the responses that make direct copying easier, and organizations can invest in audits and in watermarking or tracing techniques, applied from training onward, that help detect when a model has been trained on illicitly obtained material. For those who want to go deeper into practical recommendations and safety measures in machine learning environments, initiatives such as the OWASP safety guides provide useful guidance: OWASP - Machine Learning Security Cheat Sheet.
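To make "limits on the granularity of responses" concrete, the sketch below shows one classic form of it from the model-extraction literature: returning rounded confidence scores instead of exact ones. It is a toy under stated assumptions. The scoring function, its parameters and the names `full_precision_api` and `coarse_api` are hypothetical, and the safeguards applied to a language model would look quite different.

```python
import math

def full_precision_api(x: float) -> float:
    """Hypothetical endpoint that returns an exact confidence score."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x + 0.25)))

def coarse_api(x: float, decimals: int = 1) -> float:
    """Same model, but the score is rounded before it leaves the provider."""
    return round(full_precision_api(x), decimals)

def logit(p: float) -> float:
    """Inverse of the sigmoid, used by an attacker to read off w*x + b."""
    return math.log(p / (1.0 - p))

# An attacker querying x = 0 tries to read the bias term directly.
exact_bias = logit(full_precision_api(0.0))  # recovers the true bias
coarse_bias = logit(coarse_api(0.0))         # distorted by the rounding
error = abs(coarse_bias - exact_bias)
```

With exact scores, a single query reveals the bias. With scores rounded to one decimal, the same query is off by a wide margin. Rounding does not make extraction impossible, since an attacker can compensate with many more queries, but it raises the cost, and higher query volumes are easier for detection systems to notice.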

In short, Anthropic's report highlights a growing problem: when capability extraction is industrialized, it not only puts at risk the competitiveness of companies that invest in advanced research, but also multiplies the vectors through which AI can be used for harmful purposes. The technical community, regulators and the providers themselves must work together to close technical and legal gaps while maintaining safe channels for responsible research and innovation. In the meantime, episodes like this can be expected to drive better security practices and greater transparency in a fast-moving sector.
