The adoption of artificial intelligence is no longer a technical novelty; it has become a strategic requirement for many boards of directors. Boards, investors and executive teams are pressing for AI to be implemented in operations and security, and cybersecurity teams feel that pressure: the technology is already in use, and security testing has to keep pace. The reason is simple: today's environments change constantly and attacker tactics evolve quickly, so static, rigid assessments are no longer sufficient.
In practice, security teams need evidence not only to detect specific flaws, but also to replay attacks and measure improvement over time. Here lies a fundamental tension: AI can offer adaptability and creativity, but its probabilistic nature complicates reproducibility and comparability between runs. In many domains variability is a virtue - a coding assistant can offer several valid solutions - but when the goal is to validate security controls, uncertainty becomes a problem. If a platform makes different decisions on each run, how can anyone know whether a defect was actually fixed or whether the tool simply took a different path?

One line of development is betting on fully agentic systems, in which AI models make decisions from start to finish. This autonomy promises broader exploration and less reliance on predefined scripts, but it introduces two risks that matter for structured security programs. The first is loss of consistency: a test can vary without the operator being able to prove that the methodology was the same. The second is the difficulty of auditing and repeating a specific attack chain under controlled conditions, which is essential when compliance is required or when remediations have to be validated.
Human supervision - the so-called human-in-the-loop - mitigates some risks because it lets analysts review and approve actions, but it does not remove the root of the problem: even with review, the AI can reason differently from one run to the next, and the burden of ensuring uniformity falls on the human team, increasing manual effort and reducing the value of automation.
This is why a hybrid approach that separates the execution structure from the adaptive capability is gaining traction. In this design, deterministic logic orchestrates the attack chains and defines how tests are reproduced; on top of that backbone, the AI steps in to adjust payloads, interpret signals from the environment and adapt specific techniques according to what it finds in real time. The result combines stability and realism: repeatable attack paths are preserved while the AI contributes context and refinement.
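The separation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Step` type, the `run_chain` function and the adapter signature are hypothetical names, not any vendor's API, and `static_adapter` is a simple template filler standing in for whatever AI layer a real platform would call. The point is only that the ordered list of techniques stays fixed while the payload varies with the environment.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Step:
    """One fixed step of an attack chain (hypothetical structure)."""
    technique: str     # a MITRE ATT&CK technique ID
    base_payload: str  # payload template with named placeholders


# The adaptive layer: takes a step and environment signals, returns a payload.
Adapter = Callable[[Step, Dict[str, str]], str]


def static_adapter(step: Step, env: Dict[str, str]) -> str:
    """Stand-in for the AI layer: fills the template from environment signals."""
    return step.base_payload.format(**env)


def chain_fingerprint(chain: List[Step]) -> str:
    """Hash of the ordered technique IDs, identical for every run of one chain."""
    ordered = json.dumps([step.technique for step in chain])
    return hashlib.sha256(ordered.encode("utf-8")).hexdigest()


def run_chain(chain: List[Step], env: Dict[str, str], adapt: Adapter) -> dict:
    """Walk the chain in fixed order; only the payload is left to the adapter."""
    steps = []
    for index, step in enumerate(chain):
        steps.append({
            "index": index,
            "technique": step.technique,
            "payload": adapt(step, env),
        })
    return {"fingerprint": chain_fingerprint(chain), "steps": steps}


# Illustrative chain: account discovery, then a privilege escalation attempt.
chain = [
    Step("T1087", "enumerate accounts on {host}"),
    Step("T1068", "attempt privilege escalation on {os}"),
]

run_a = run_chain(chain, {"host": "srv-01", "os": "linux"}, static_adapter)
run_b = run_chain(chain, {"host": "srv-02", "os": "windows"}, static_adapter)
```

In this sketch `run_a` and `run_b` share the same fingerprint and the same sequence of techniques, so they can be compared step by step, while their payloads differ because the environments differ. This simulation does not execute anything against a target; it only records what would be attempted.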
A practical advantage of this model is the ability to replay a privilege escalation vector under the same conditions and rerun it after applying a new patch or configuration. If the second run no longer shows the same exploitation, the conclusion is clear: the mitigation worked. If the tests change unpredictably, the results become hard to interpret and confidence in the metrics declines. For organizations moving from point-in-time tests to continuous validation - where systems are tested weekly or daily to verify remediations and measure the exposure surface - that confidence is essential.
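The before-and-after comparison can be sketched as follows. This is a minimal illustration under stated assumptions: `compare_runs` and the verdict labels are hypothetical, and each run is represented simply as a mapping from step name to whether that step succeeded. The sketch refuses to compare runs whose step sequences differ, which is the situation the paragraph above warns about.

```python
from typing import Dict


def compare_runs(baseline: Dict[str, bool], retest: Dict[str, bool]) -> Dict[str, str]:
    """Classify each step by whether it succeeded before and after a fix.

    Both arguments map a step name to True if the step succeeded in that run.
    """
    # Runs are only comparable if they executed the same steps in the same order.
    if list(baseline) != list(retest):
        raise ValueError("runs are not comparable: step sequences differ")

    verdicts = {}
    for step, before in baseline.items():
        after = retest[step]
        if before and not after:
            verdicts[step] = "mitigated"
        elif before and after:
            verdicts[step] = "still exploitable"
        elif not before and after:
            verdicts[step] = "regressed"
        else:
            verdicts[step] = "not exploitable"
    return verdicts


# Illustrative data: the escalation step succeeded before the patch, not after.
before_patch = {"T1087": True, "T1068": True}
after_patch = {"T1087": True, "T1068": False}
verdicts = compare_runs(before_patch, after_patch)
```

Here `verdicts` marks `T1068` as mitigated and `T1087` as still exploitable. If the retest had followed a different sequence of steps, the function would raise an error instead of returning a verdict, because a missing exploit could then mean either a working fix or a different path.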
This debate between determinism and autonomy is not unique to cybersecurity. In AI governance, boards and committees have begun to demand frameworks that prioritize transparency, accountability and manageable risk; the management literature returns to the topic repeatedly, for example in the Harvard Business Review's analysis of how boards of directors should oversee AI. On the technical side, bodies such as NIST are working on frameworks for managing AI risk that emphasize traceability and controls, conditions that fit better with models that allow repetition and audit.
For its part, the adversary emulation and threat modeling community has promoted frameworks that make it easier to replicate known tactics and techniques; examples such as MITRE ATT&CK show the importance of categorization and consistency for comparing defenses at different points in time. And with the rise of public, experimental agentic systems - such as the media coverage of Auto-GPT and autonomous agents - warnings have also emerged about the limits of delegating critical decisions without robust controls (The Verge and other publications have covered these discussions).
In practice, several commercial platforms are adopting the hybrid philosophy: a deterministic layer that guarantees stable baselines and controlled reruns, and an AI layer that enriches attacks with contextualized variations. The idea is not to restrict intelligence but to anchor it, so that the AI improves the fidelity of the tests without redefining the method on every run. This mix simplifies audits, speeds up post-remediation validation and lets security teams focus on interpretation and decision-making rather than spending hours verifying the consistency of the test engine itself.

For security leaders who need to select tools, the practical recommendation is clear: prioritize platforms that offer execution traceability, the ability to repeat attacks under identical conditions and the flexibility to incorporate contextual intelligence. This choice not only reduces noise in the results, but also eases regulatory processes and communication with executives and investors about how risk is actually evolving. In general, it is worth requiring technical evidence of how a solution incorporates AI, which deterministic controls it applies and how it allows each step to be audited.
The convergence of determinism and adaptation does not eliminate the challenges. Bias, the risk of overconfidence in automated decisions and the need for well-defined human controls all have to be monitored. Still, when the objective is to validate and measure, consistency matters as much as intelligence, and the solutions that deliver both offer the most value to security programmes that must operate continuously and verifiably.
This article takes as its starting point the reflections in Pentera's report and analysis on AI-driven security and exposure. For those who want to dig deeper into industry practice and research on reproducible attacks and continuous validation, Pentera's website is available at pentera.io, along with the technical and research resources in its labs section.