New Container Exploit: Rooting Non-Root Containers with CVE-2023-2640 and CVE-2023-32629, aka GameOver(lay)
Published 10/17/2023
Originally published by CrowdStrike.
- Two new privilege escalation CVEs, CVE-2023-2640 and CVE-2023-32629, have been discovered in the Ubuntu kernel OverlayFS module. The CVEs affect not only any Ubuntu hosts running with vulnerable kernel versions but also any containers running on those hosts.
- CrowdStrike has discovered that CVE-2023-2640 and CVE-2023-32629 can be used to root non-root containers under certain circumstances. Once “container root” is achieved, attackers can use traditional container escape techniques depending on the attack surface available.
Two new local privilege escalation vulnerabilities were recently discovered in Ubuntu: CVE-2023-2640 (CVSS 7.8) and CVE-2023-32629 (CVSS 7.8). The vulnerabilities, dubbed GameOver(lay), affect the OverlayFS module in multiple Ubuntu kernels. Ubuntu’s official security bulletins for CVE-2023-2640 and CVE-2023-32629 list the kernel versions impacted by each CVE.
The CrowdStrike cloud threat research team analyzed these vulnerabilities and discovered a way to use them to exploit containers. Under certain conditions, a non-root container user can escalate privileges within a container to get to container root. They can then further escape the container with a traditional exploit to compromise the host.
On July 28, 2023, one day after the public disclosure, a tweet disclosed a one-line exploit that uses CVE-2023-2640 to escalate privileges on vulnerable Ubuntu kernels. The tweet highlights the ease of exploitation of this vulnerability and justifies its CVSS score.
This blog explains the details of how the CrowdStrike cloud threat research team discovered this new container exploitation method. Before we get into the details, let’s first discuss the underlying concept of OverlayFS.
What Is OverlayFS?
As the name suggests, OverlayFS is a union mount filesystem where one directory tree (usually read-write) is typically overlaid on top of another directory tree (usually read-only). In OverlayFS, all of the modifications go to the upper writable layer, and the lower layer is read-only.
Today, OverlayFS is one of the fundamental building blocks of containers and Kubernetes, where the image (base) layer is the read-only lower layer and the container layer is the writable upper layer.
Because the kernel is involved in the interaction of files between the lower layer and upper layer, it is an intriguing target for exploitation. Multiple vulnerabilities have been found in OverlayFS in the past. Figure 1 shows how, with an OverlayFS created by a mount, a file like a Python binary can be copied from the lower layer to the upper layer using a “merged” directory with a simple touch command. Here, the kernel needs to enforce namespace restrictions by limiting the capabilities of the file, including extended attributes of the Python binary, as it moves to the upper layer.
Figure 1. A binary being copied to the upper layer
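The lookup and copy-up behavior described above can be sketched with a small in-memory model. This is a toy illustration only, not how the kernel implements OverlayFS; the class name `ToyOverlay`, its methods, and the dictionary-based file records are all hypothetical names chosen for this sketch.

```python
import copy


class ToyOverlay:
    """Toy model of OverlayFS lookup and copy-up (illustrative, not kernel code)."""

    def __init__(self, lower):
        # The lower layer is read-only: keep a private copy and never mutate it.
        self.lower = copy.deepcopy(lower)
        # The upper layer starts empty and receives every modification.
        self.upper = {}

    def lookup(self, path):
        # The merged view prefers the upper layer and falls back to the lower one.
        if path in self.upper:
            return self.upper[path]
        return self.lower[path]

    def touch(self, path):
        # A metadata change on a lower-layer file triggers a copy-up:
        # the whole record, including extended attributes, lands in the upper layer.
        if path not in self.upper:
            if path in self.lower:
                self.upper[path] = copy.deepcopy(self.lower[path])
            else:
                self.upper[path] = {"data": b"", "xattrs": {}}
        return self.upper[path]

    def write(self, path, data):
        # Writes always go to the upper layer; the lower layer stays untouched.
        self.touch(path)["data"] = data
```

In this model, calling `touch` on a lower-layer path is enough to make a full copy, extended attributes included, appear in the upper layer, which mirrors the copy-up that Figure 1 illustrates.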
How CVE-2023-2640 and CVE-2023-32629 Affect OverlayFS
At the heart of both CVE-2023-2640 and CVE-2023-32629 is an operation where files from the lower directory are copied to the upper directory with extended file attributes intact. This means if a file in the lower directory has capabilities like CAP_SYS_ADMIN or CAP_SETUID, these capabilities are carried over to the upper layer, where a non-root user can merely execute the upper layer file to gain root privileges and achieve privilege escalation.
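File capabilities such as CAP_SETUID are stored in the `security.capability` extended attribute. As a rough sketch of what that attribute carries, the following decoder unpacks the revision 2 and revision 3 on-disk layouts (a little-endian magic word, two pairs of permitted/inheritable 32-bit words, and a root ID in revision 3). The function names `decode_file_caps` and `has_cap` are hypothetical helpers for this illustration.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000
VFS_CAP_REVISION_3 = 0x03000000  # adds a namespace root ID after the cap words
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_SETUID = 7
CAP_SYS_ADMIN = 21


def decode_file_caps(raw):
    """Decode a revision 2 or 3 security.capability value into a dict."""
    if len(raw) < 20:
        raise ValueError("too short for a revision 2/3 capability attribute")
    # Layout: magic_etc, then (permitted, inheritable) for the low and high words.
    magic, p_lo, i_lo, p_hi, i_hi = struct.unpack_from("<5I", raw)
    revision = magic & VFS_CAP_REVISION_MASK
    if revision not in (VFS_CAP_REVISION_2, VFS_CAP_REVISION_3):
        raise ValueError("unsupported capability revision")
    caps = {
        "permitted": p_lo | (p_hi << 32),
        "inheritable": i_lo | (i_hi << 32),
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "rootid": None,
    }
    if revision == VFS_CAP_REVISION_3:
        if len(raw) < 24:
            raise ValueError("revision 3 attribute is missing the root ID")
        (caps["rootid"],) = struct.unpack_from("<I", raw, 20)
    return caps


def has_cap(caps, cap_number):
    """Return True if the permitted set contains the given capability number."""
    return bool(caps["permitted"] & (1 << cap_number))
```

On a Linux host, the raw value could be obtained with `os.getxattr(path, "security.capability")` and passed to `decode_file_caps`. A revision 3 attribute carries a root ID that ties the capabilities to a user namespace, while a revision 2 attribute carries none.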
Both vulnerabilities originate in a kernel function named ovl_do_setxattr. This function calls a vulnerable wrapper, __vfs_setxattr_noperm, which does not restrict file security capabilities to a namespace. As Figure 2 shows, the vulnerable code flow starts with an OverlayFS mounted in a namespace, where two functions (ovl_copy_xattr and ovl_copy_up_meta_inode_data) call the vulnerable function ovl_do_setxattr, which eventually triggers the vulnerability.
Figure 2. Vulnerable code flow for ovl_copy_xattr (CVE-2023-2640) and ovl_copy_up_meta_inode_data (CVE-2023-32629)
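The difference between a copy-up that scopes capabilities to the caller's namespace and one that stores them verbatim can be illustrated with a deliberately simplified model. Everything here is an assumption made for illustration: the names `ToyFile`, `setxattr_checked`, `setxattr_noperm`, `copy_up`, and `caps_effective_in` are hypothetical, namespaces are plain strings, and the real kernel logic is considerably more involved.

```python
import copy
from dataclasses import dataclass, field

INIT_NS = "init"  # hypothetical label for the host (initial) namespace


@dataclass
class ToyFile:
    xattrs: dict = field(default_factory=dict)


def setxattr_checked(dst, name, value, caller_ns):
    """Model of a path that binds file capabilities to the caller's namespace."""
    if name == "security.capability":
        value = {"caps": value["caps"], "ns": caller_ns}
    dst.xattrs[name] = value


def setxattr_noperm(dst, name, value, caller_ns):
    """Model of the unchecked path: the attribute is stored exactly as found."""
    dst.xattrs[name] = copy.copy(value)


def copy_up(src, caller_ns, setxattr):
    """Copy every extended attribute to a new upper-layer file."""
    dst = ToyFile()
    for name, value in src.xattrs.items():
        setxattr(dst, name, value, caller_ns)
    return dst


def caps_effective_in(file, ns):
    """Capabilities the file would grant when executed in namespace `ns`."""
    cap = file.xattrs.get("security.capability")
    if cap is None or cap["ns"] != ns:
        return frozenset()
    return cap["caps"]
```

In this toy model, a copy-up performed through `setxattr_noperm` from inside a user namespace leaves a file whose capabilities still apply in the host namespace, whereas `setxattr_checked` confines them to the namespace that performed the copy.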
Figure 3 shows a tweaked proof of concept from the aforementioned tweet that both tests the vulnerable code flows and provides a bash shell with root privileges.
Figure 3. Privilege escalation on Ubuntu
Rooting the Non-Root Privileged Containers
As the CrowdStrike cloud threat research team looked into these vulnerabilities, they were faced with a couple of questions related to containers:
- Can these vulnerabilities be used to escalate privileges inside non-root containers?
- Can these vulnerabilities be used to break out of a container to compromise a host?
If a container with a non-root user is compromised, the attacker must first achieve root privileges inside the container to even attempt a container breakout, which is extremely difficult.
To answer the first question: a container already uses an OverlayFS to manage its runtime operation (the container layer). By design, OverlayFS cannot be created on top of another OverlayFS (nested OverlayFS), which blocks the straightforward route to privilege escalation. Figure 4 shows a failed attempt. Because both vulnerabilities involve namespace creation using unshare with the -m flag, the container needs to be privileged and run with no seccomp profile; running without a seccomp profile is a default configuration in Kubernetes.
Figure 4. Error inside a container while mounting OverlayFS over OverlayFS
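One way to confirm that a container's root filesystem is itself an OverlayFS is to look at the mount table. The sketch below parses text in the `/proc/mounts` format (device, mount point, filesystem type, options); the helper name `root_fs_type` is hypothetical.

```python
def root_fs_type(mounts_text):
    """Return the filesystem type mounted on "/" in /proc/mounts-style text."""
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[1] == "/":
            # Later entries shadow earlier ones mounted on the same path.
            fs_type = fields[2]
    return fs_type
```

Inside a container, `root_fs_type(open("/proc/mounts").read())` would typically be expected to return `overlay` when the runtime uses an OverlayFS storage driver, which is the situation in which the nested mount attempt fails.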
A second approach is to create an ephemeral tmpfs and mount an OverlayFS on top of it, essentially creating an in-memory filesystem structure. With this approach, a file with acquired capabilities (CAP_SETUID) can be moved to the upper directory. However, it quickly becomes clear that executing the file beyond the created namespace is a challenge, because a non-root user has no way to access this in-memory file, which exists only in the newly created namespace. As Figure 5 shows, if the file is copied out of memory onto the disk to be executed by user “low,” the capabilities are sanitized and the potential privilege escalation fails.
Figure 5. In-memory Python file capabilities are sanitized if the file is copied to the disk to be executed by user “low” (failing privilege escalation)
Volume Mount to the Rescue
A containerized application may use volume mounts to add a separate disk or hostPath to the container. It is a common practice to provide persistent storage to containers in this way for storing logs or other essential information. Since the volume mounts are treated as separate disks, they can be used to create OverlayFS. The newly created OverlayFS resides outside the container layer, allowing attackers to avoid the problems of the first approach.
Let’s create a non-root privileged container and mount a writable hostPath volume (/tmp). Then, we will use the exploit to try to escalate privileges. The following Kubernetes YAML can be used to schedule a pod on the vulnerable Kubernetes host.
apiVersion: v1
kind: Pod
metadata:
  name: hostpath
  namespace: default
spec:
  containers:
  - name: bad
    image: manojahuje/ubuntu:jammy #required packages preinstalled
    command: ["sleep", "3600"]
    securityContext:
      privileged: true
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 2000
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - mountPath: /host
      name: host
  volumes:
  - name: host
    hostPath:
      path: /tmp
      type: Directory
Figure 6.A below shows the flow for the exploit to achieve container root using both vulnerabilities. Figure 6.B shows privilege escalation being successful and the non-root user achieving container root privileges.
Figure 6.A. A non-root container user achieving container-root using both of the vulnerabilities
Figure 6.B. Privilege escalation to root in a non-root container
At this point, the attacker has root in a non-root container on a vulnerable Ubuntu node. Both vulnerabilities can be used to gain root access on non-root privileged containers. Attackers can now attempt to escape the container via traditional methods, such as using the CGROUP exploit or mounting the host hard drive to the privileged container, depending on the attack surface available on the container.
Mitigation
We recommend the following steps to mitigate the vulnerabilities:
- Upgrade Ubuntu nodes to a patched kernel version
- Actively monitor and detect non-root privileged containers on vulnerable nodes
- Use seccomp or AppArmor profiles to block the use of unshare
The heart of these vulnerabilities is the ability for unprivileged users to create a new namespace. This can be disabled on Ubuntu nodes via the following command. We recommend testing the configuration first to avoid any unintended impact.
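As a starting point for the monitoring step above, the following sketch flags containers in a pod specification that match the pattern used in this blog: privileged, running as a non-root user, and mounting a writable hostPath volume. It assumes the pod is supplied as a dictionary in the shape produced by `kubectl get pod -o json`; the function name `risky_containers` is hypothetical, and a real detection would need to cover more cases (init containers, admission-time defaults, and so on).

```python
def risky_containers(pod):
    """Return names of privileged, non-root containers with writable hostPath mounts."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext") or {}
    # Names of volumes backed by a hostPath.
    host_path_volumes = {
        v["name"] for v in spec.get("volumes", []) if "hostPath" in v
    }
    flagged = []
    for container in spec.get("containers", []):
        sc = container.get("securityContext") or {}
        privileged = sc.get("privileged") is True
        # Container-level settings take precedence over pod-level ones.
        run_as_user = sc.get("runAsUser", pod_sc.get("runAsUser"))
        run_as_non_root = sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot"))
        non_root = run_as_non_root is True or (
            run_as_user is not None and run_as_user != 0
        )
        writable_host_mount = any(
            m.get("name") in host_path_volumes and not m.get("readOnly", False)
            for m in container.get("volumeMounts", [])
        )
        if privileged and non_root and writable_host_mount:
            flagged.append(container["name"])
    return flagged
```

Applied to the example pod from this blog, the function would return `["bad"]`; setting `readOnly: true` on the volume mount or dropping `privileged: true` would clear the flag in this sketch.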
sudo sysctl -w kernel.unprivileged_userns_clone=0
For persistent change, you can use the following command:
echo kernel.unprivileged_userns_clone=0 | sudo tee /etc/sysctl.d/99-disable-userns.conf
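To verify that the setting took effect on a node, the current value can be read back from procfs. The sketch below assumes the sysctl is exposed at `/proc/sys/kernel/unprivileged_userns_clone`, which follows from the sysctl name used above; the function names `classify` and `read_state` are hypothetical.

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/kernel/unprivileged_userns_clone"


def classify(raw_value):
    """Map the raw sysctl text to a human-readable state."""
    return "disabled" if raw_value.strip() == "0" else "enabled"


def read_state(path=SYSCTL_PATH):
    """Report whether unprivileged user namespace creation is allowed."""
    try:
        return classify(Path(path).read_text())
    except OSError:
        # The file is absent on kernels that do not provide this sysctl.
        return "unknown"
```

A result of `disabled` from `read_state()` indicates the mitigation is active, while `unknown` suggests the kernel does not expose this sysctl and the node should be checked by other means.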