A critical vulnerability in the NVIDIA Container Toolkit could allow a container to escape and gain full access to the underlying host.
A critical vulnerability in the NVIDIA Container Toolkit, tracked as CVE-2024-0132 (CVSS score 9.0), could allow an attacker to escape the container and gain full access to the underlying host.
The vulnerability is a time-of-check time-of-use (TOCTOU) issue affecting NVIDIA Container Toolkit versions 1.16.1 and earlier.
“NVIDIA Container Toolkit 1.16.1 or earlier contains a Time-of-check Time-of-Use (TOCTOU) vulnerability when used with default configuration where a specifically crafted container image may gain access to the host file system,” reads the advisory published by NVIDIA. “This does not impact use cases where CDI is used. A successful exploit of this vulnerability may lead to code execution, denial of service, escalation of privileges, information disclosure, and data tampering.”
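Neither NVIDIA nor Wiz has published exploit details, so the sketch below is only a generic illustration of the TOCTOU bug class, not the actual CVE-2024-0132 flaw. A hypothetical helper checks that a path is not a symlink and then opens it; if the path is swapped for a symlink between the check and the use, the open follows it to a file outside the intended directory. The race window is simulated deterministically with a hypothetical `between_check_and_use` hook, and the directory names are made up for the example. It assumes a POSIX system where unprivileged users can create symlinks.

```python
import os
import tempfile


def read_if_not_symlink(path, between_check_and_use=None):
    """Hypothetical vulnerable helper: checks a path, then uses it again."""
    # Time of check: refuse paths that are symlinks right now.
    if os.path.islink(path):
        raise PermissionError("symlinks are not allowed")
    # Simulated race window: in a real attack, another process acts here.
    if between_check_and_use is not None:
        between_check_and_use()
    # Time of use: open() resolves the path again and follows a symlink.
    with open(path) as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        container_dir = os.path.join(root, "container")  # hypothetical layout
        host_dir = os.path.join(root, "host")
        os.mkdir(container_dir)
        os.mkdir(host_dir)

        secret = os.path.join(host_dir, "secret.txt")
        with open(secret, "w") as f:
            f.write("host-only data")

        target = os.path.join(container_dir, "config.txt")
        with open(target, "w") as f:
            f.write("harmless config")

        def swap_for_symlink():
            # Replace the already-checked file with a symlink to the "host" file.
            os.remove(target)
            os.symlink(secret, target)

        return read_if_not_symlink(target, between_check_and_use=swap_for_symlink)


if __name__ == "__main__":
    print(demo())  # prints "host-only data"
```

The check passed on a regular file, yet the read returned the "host" file's contents. A common mitigation for this bug class is to open the file once (for example with `os.O_NOFOLLOW`) and then inspect the resulting file descriptor with `os.fstat`, so the check and the use refer to the same object rather than re-resolving the path.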
The NVIDIA Container Toolkit is a comprehensive suite designed to facilitate the deployment and management of GPU-accelerated containers. It enables users to build and run containers that leverage NVIDIA GPUs, making it particularly valuable for applications requiring high-performance computing, such as machine learning and data analysis.
The NVIDIA Container Toolkit is widely used across AI platforms and by systems that rely on NVIDIA hardware.
Cloud security firm Wiz reported the vulnerability to NVIDIA on September 1, 2024. Because of the issue’s potential impact, the company has not disclosed technical details of how it can be exploited.
The issue affects any AI application that uses a vulnerable version of the toolkit for GPU support, whether in the cloud or on-premises.
“Wiz Research has uncovered a critical security vulnerability, CVE-2024-0132, in the widely used NVIDIA Container Toolkit, which provides containerized AI applications with access to GPU resources. This impacts any AI application – in the cloud or on-premise – that is running the vulnerable container toolkit to enable GPU support,” reads the advisory published by Wiz. “The vulnerability enables attackers who control a container image executed by the vulnerable toolkit to escape from that container and gain full access to the underlying host system, posing a serious risk to sensitive data and infrastructure.”
According to Wiz Research, which analyzed more than 100,000 public cloud environments, 33% of cloud environments are affected by this vulnerability. This figure underscores the severity of CVE-2024-0132 and the importance of mitigating it.
Below are the three stages of the attack detailed by Wiz researchers:
- Running a malicious image: Attackers craft a specially designed image to exploit CVE-2024-0132 and execute it on the target platform, either directly, such as through shared GPU services, or indirectly via supply chain or social engineering attacks.
- Gaining full access to the file system: Attackers gain full read access to the host’s file system. This provides visibility into the infrastructure and potential access to other customers’ confidential data.
- Complete host takeover: Once attackers gain access to the container runtime’s Unix sockets (docker.sock/containerd.sock), they can execute commands with root privileges and take control of the host system. Although the initial access is read-only, the way Linux handles Unix sockets still allows attackers to write commands to them, turning this vulnerability into a full system takeover.
NVIDIA addressed the issue on September 26, 2024, with the release of the NVIDIA Container Toolkit version 1.16.2 and NVIDIA GPU Operator 24.6.2.
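Administrators checking whether a host runs an affected release can compare the installed version against the fixed releases named above: toolkit 1.16.2 and GPU Operator 24.6.2. The sketch below is a minimal, hypothetical helper that assumes a plain dotted `major.minor.patch` string (an optional leading `v` is tolerated; pre-release suffixes are not handled). How the version string is obtained on a given system is left open.

```python
FIXED_TOOLKIT = (1, 16, 2)       # NVIDIA Container Toolkit release with the fix
FIXED_GPU_OPERATOR = (24, 6, 2)  # NVIDIA GPU Operator release with the fix


def parse_version(text):
    """Parse a plain 'major.minor.patch' string into a tuple of ints."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(p) for p in parts)


def is_vulnerable(version_text, fixed=FIXED_TOOLKIT):
    """True if the version predates the fixed release for CVE-2024-0132."""
    # Tuples compare element by element, so (1, 9, 0) < (1, 16, 1) as expected.
    return parse_version(version_text) < fixed


if __name__ == "__main__":
    for v in ["1.16.1", "1.16.2", "1.15.0", "1.17.0"]:
        print(v, "affected" if is_vulnerable(v) else "not affected")
```

Integer tuples are used because a plain string comparison would misorder versions such as "1.9.0" and "1.16.1". Since NVIDIA states that CDI-based setups are not impacted, a version check alone may overstate exposure.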
(SecurityAffairs – hacking, NVIDIA)