GPUBreach: RowHammer invades GPU memory and could take control of your system

In recent months the security community has turned its attention to an old acquaintance: RowHammer. What until recently was considered a problem of main system memory (DRAM) has now made a disturbing leap to high-performance graphics cards, where academic researchers have demonstrated practical attacks capable not only of corrupting data but of escalating privileges and even taking full control of the host system.

RowHammer is a physical phenomenon in dynamic memory (DRAM): repeatedly accessing a row of cells generates electrical interference that can flip bits in adjacent rows, turning zeros into ones or vice versa and breaking the isolation guarantees that operating systems and sandboxing rely on. The technique has been studied for years and is documented in general references such as the Wikipedia entry on RowHammer, as well as in numerous academic studies and specialized blogs.

In the graphics domain, GDDR6 memory, used by many modern GPUs, introduces new attack vectors and challenges. Recent research projects named GPUHammer, GPUBreach, GDDRHammer and GeForge describe how an attacker can induce bit flips in GPU memory and exploit them against critical structures of the graphics stack, such as the GPU's own page tables. This goes a step beyond previous attacks: rather than merely degrading computation results, the flips can become a lever for arbitrary memory access and, in extreme cases, privilege escalation at the CPU level.

The work known as GPUBreach is particularly striking because it shows that, by altering entries in the GPU page tables (PTEs), an unprivileged process can obtain arbitrary read and write access to GPU memory. What is worrying is the exploitation chain that can follow: this access lets the attacker manipulate the structures the GPU uses to issue Direct Memory Access (DMA) requests to CPU memory, and if the vendor's driver contains memory-safety bugs at that point, for example in the NVIDIA kernel driver, the attack can culminate in a shell with administrator (root) privileges.
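To see why a single flipped bit in a page-table entry is so powerful, consider a toy model. The sketch below assumes a hypothetical, simplified PTE layout (a valid bit, a writable bit and a physical frame number starting at bit 12); it is not NVIDIA's actual format, and the frame numbers are invented for illustration. Flipping one bit of the frame number silently redirects the attacker's mapping to a different physical page, while the entry stays valid and writable.

```python
# Toy model of a page-table entry (PTE). The layout is HYPOTHETICAL and
# simplified for illustration; real GPU PTE formats differ.
VALID = 1 << 0          # entry is present
WRITABLE = 1 << 1       # page may be written
PFN_SHIFT = 12          # physical frame number starts at bit 12
PFN_MASK = (1 << 28) - 1


def make_pte(pfn: int, writable: bool = True) -> int:
    """Build a PTE that maps to physical frame `pfn`."""
    pte = VALID | (pfn << PFN_SHIFT)
    if writable:
        pte |= WRITABLE
    return pte


def pte_pfn(pte: int) -> int:
    """Extract the physical frame number from a PTE."""
    return (pte >> PFN_SHIFT) & PFN_MASK


def translate(pte: int, vaddr: int) -> int:
    """Physical address = frame base from the PTE + offset within the 4 KiB page."""
    return (pte_pfn(pte) << PFN_SHIFT) | (vaddr & 0xFFF)


def flip_bit(value: int, bit: int) -> int:
    """Model a RowHammer-induced bit flip."""
    return value ^ (1 << bit)


# The attacker's own buffer lives in frame 0x1000 (hypothetical numbers).
user_pte = make_pte(0x1000)

# One flip in bit 4 of the frame number (bit 16 of the PTE) retargets the
# entry to frame 0x1010; the valid and writable bits are untouched.
corrupted = flip_bit(user_pte, PFN_SHIFT + 4)

print(hex(translate(user_pte, 0x2A8)))   # 0x10002a8
print(hex(translate(corrupted, 0x2A8)))  # 0x10102a8
```

The model only shows why one flip can be enough to break isolation; a practical exploit must also arrange memory so that the redirected entry lands somewhere useful, which this sketch does not attempt.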

A key defense against DMA attacks is the IOMMU, a hardware component designed to isolate peripherals' direct access to system memory. However, the researchers show that simply having the IOMMU enabled is not enough: by corrupting trusted state inside the buffers the IOMMU does authorize, it is possible to trigger out-of-bounds writes in the kernel that bypass these protections and open the door to full system compromise. This has direct implications for cloud infrastructure with shared GPUs, multi-tenant AI deployments and high-performance computing centers.
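The failure mode described here, where the IOMMU correctly limits which buffers the GPU may touch but the kernel trusts what is inside them, can be illustrated with a deliberately simplified model. Everything below is hypothetical: the buffer sizes, the length-prefixed descriptor format and the function names are invented for illustration and do not reflect the NVIDIA driver. Kernel memory is modeled as a byte array in which a 16-byte driver buffer sits next to other kernel data, and the "device" controls a length field in a DMA-visible buffer.

```python
# HYPOTHETICAL model of a driver trusting a length field that lives in
# device-writable (IOMMU-authorized) memory. Not based on any real driver.
KERNEL_BUF_SIZE = 16


def make_kernel_memory() -> bytearray:
    """A 16-byte driver buffer followed by adjacent kernel data."""
    return bytearray(KERNEL_BUF_SIZE) + bytearray(b"SECRETPTR")


def copy_unchecked(kmem: bytearray, dma_buf: bytes) -> None:
    """Copy a length-prefixed payload, trusting the device-supplied length."""
    length = dma_buf[0]                 # attacker-controlled via the GPU
    payload = dma_buf[1:1 + length]
    kmem[0:len(payload)] = payload      # may run past the 16-byte buffer


def copy_checked(kmem: bytearray, dma_buf: bytes) -> None:
    """Same copy, but the length is validated against both buffers."""
    length = dma_buf[0]
    if length > KERNEL_BUF_SIZE or length > len(dma_buf) - 1:
        raise ValueError("device-supplied length exceeds buffer bounds")
    kmem[0:length] = dma_buf[1:1 + length]


# The DMA write itself is "legal": it only touches an authorized buffer.
# The damage comes from the length value the kernel then trusts.
malicious = bytes([24]) + b"A" * 24

kmem = make_kernel_memory()
copy_unchecked(kmem, malicious)
print(bytes(kmem[KERNEL_BUF_SIZE:]))   # b'AAAAAAAAR': adjacent data overwritten

kmem = make_kernel_memory()
try:
    copy_checked(kmem, malicious)
except ValueError as err:
    print("rejected:", err)
print(bytes(kmem[KERNEL_BUF_SIZE:]))   # b'SECRETPTR': untouched
```

The checked version is ordinary bounds validation: the IOMMU still does its job in both cases, and the difference lies entirely in whether the kernel treats device-writable data as untrusted input.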

The GDDRHammer and GeForge variants build on related ideas, manipulating GPU address translation through RowHammer-induced flips in GDDR6, and they too manage to turn those flips into read/write access to the host's memory space. Technically, they differ in which level of the page-table tree they target (for example, last-level PTEs versus a higher page-directory level), but the goal is the same: to hijack address translation and widen the reach of the malicious code running on the GPU.
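The difference between attacking a last-level entry and a higher directory level can be illustrated with a toy two-level page walk. Everything here is an assumption for illustration: 4 KiB pages, 512 entries per table, invented frame numbers, and named tables standing in for table addresses (a real flip would change address bits inside the directory entry). Corrupting a leaf entry redirects one page; corrupting the directory entry swaps out the whole leaf table and redirects every page beneath it.

```python
import copy

# Toy two-level page tree. Sizes and frame numbers are ASSUMPTIONS.
PAGE_SHIFT = 12          # 4 KiB pages
INDEX_BITS = 9           # 512 entries per table
INDEX_MASK = (1 << INDEX_BITS) - 1


def translate(directory, tables, vaddr):
    """Walk directory -> leaf table -> physical frame, then add the offset."""
    dir_idx = (vaddr >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    pte_idx = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    leaf = tables[directory[dir_idx]]
    return (leaf[pte_idx] << PAGE_SHIFT) | offset


def mapped_frames(directory, tables, pages=4):
    """Physical frames backing the first `pages` virtual pages."""
    return [translate(directory, tables, p << PAGE_SHIFT) >> PAGE_SHIFT
            for p in range(pages)]


# Legitimate state: directory entry 0 points at leaf table "A".
tables = {
    "A": [0x100 + i for i in range(1 << INDEX_BITS)],   # the process's own frames
    "B": [0x900 + i for i in range(1 << INDEX_BITS)],   # frames it must not reach
}
directory = {0: "A"}

# Leaf-level corruption: a single PTE changes, so a single page is redirected.
leaf_hit = copy.deepcopy(tables)
leaf_hit["A"][2] = 0x900
print([hex(f) for f in mapped_frames(directory, leaf_hit)])
# ['0x100', '0x101', '0x900', '0x103']

# Directory-level corruption: the entry now selects table "B", so every
# page under it is redirected at once.
dir_hit = dict(directory)
dir_hit[0] = "B"
print([hex(f) for f in mapped_frames(dir_hit, tables)])
# ['0x900', '0x901', '0x902', '0x903']
```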

Beyond the risk of system takeover, another already demonstrated impact concerns machine learning models: attacks based on these faults can severely degrade the accuracy of models running on GPUs, with experiments showing substantial drops in inference accuracy. Researchers have also observed a risk of exposing confidential material, such as cryptographic keys used by libraries on the GPU platform itself.

What can be done today? As a stopgap, enabling hardware error correction (ECC) on GPUs that support it reduces the likelihood that isolated flips turn into exploitable corruption, but it is not a foolproof solution. Some attack patterns induce multiple simultaneous flips, beyond what ECC can correct, and, as earlier research on ECC's tolerance has shown, correction may be insufficient or may even produce silent corruption in certain scenarios. On desktop or laptop GPUs where ECC is not available, the options are even more limited.
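Why ECC helps against isolated flips but not against several at once can be seen with a textbook single-error-correcting, double-error-detecting (SECDED) code. The sketch below uses an extended Hamming(8,4) code on 4-bit values purely for illustration; it is an assumption, not the ECC scheme of any particular GPU, which protects much wider words. One flip is corrected, two are detected, and three can be "corrected" into the wrong value with no error reported, which is the silent corruption mentioned above.

```python
from typing import List, Optional, Tuple

# Textbook extended Hamming(8,4) SECDED code, for illustration only.
# Index 0 holds the overall parity bit; indices 1..7 follow the classic
# Hamming layout (parity bits at positions 1, 2 and 4).


def encode(data: int) -> List[int]:
    """Encode a 4-bit value as an 8-bit SECDED codeword."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(data >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2            # makes the total number of 1s even
    return c


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); data is None when the error is uncorrectable."""
    c = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    odd_parity = sum(c) % 2
    if syndrome == 0 and odd_parity == 0:
        status = "ok"
    elif odd_parity == 1:
        # Assumed single error: the syndrome names its position
        # (0 means the overall parity bit itself flipped).
        c[syndrome] ^= 1
        status = "corrected"
    else:
        return "uncorrectable", None  # even parity, nonzero syndrome: two flips
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, data


def flip(code: List[int], positions: List[int]) -> List[int]:
    """Model RowHammer-style bit flips at the given codeword positions."""
    out = list(code)
    for p in positions:
        out[p] ^= 1
    return out


word = encode(0b1011)                  # stored value: 11
print(decode(flip(word, [6])))         # ('corrected', 11)
print(decode(flip(word, [2, 6])))      # ('uncorrectable', None)
print(decode(flip(word, [3, 5, 6])))   # ('corrected', 12): wrong value, no alarm
```

The three-flip case is the dangerous one: an odd number of flips looks like a single error to the decoder, so it "repairs" the word into a different valid value and reports success.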

The long-term response involves several fronts: vendors will need to patch drivers and firmware, tighten validation and bounds checking of kernel-managed buffers, and work with the academic community to identify and mitigate new forms of attack. Cloud operators and those who manage AI workload clusters will have to rethink hardware-sharing policies, apply stricter controls over GPU-accelerated code, and consider physical segmentation or dedicated resources for trusted workloads. NVIDIA, for its part, maintains a security center where it publishes advisories and recommendations; it is important to follow the official communications on its security portal.

This wave of findings is a reminder that the attack surface evolves as technology specializes and scales. What began as a DRAM curiosity is becoming a practical threat to critical infrastructure that relies on GPU acceleration. The interplay between hardware features (such as GDDR6 and the IOMMU), complex software (kernel drivers) and cloud-sharing models creates exploitation vectors that call for a coordinated response among academia, industry and operators.

If you want to dig deeper into the RowHammer phenomenon and review related academic papers and preprints, arXiv's preprint search is a useful starting point; to follow the research groups involved, it is worth consulting the websites of departments such as the University of Toronto's, where many of these contributions originate and are published (University of Toronto - CS).

In short, GPUBreach and related techniques are a stark reminder that hardware security matters as much as software security. The industry must speed up patches and mitigations, and system administrators should review deployment and isolation practices to reduce risk in environments where GPUs are critical resources.
