Right on the heels of CVE-2022-0492, another local privilege escalation flaw in the Linux Kernel was disclosed on Monday, nicknamed “Dirty Pipe” by its discoverer. MITRE has designated it CVE-2022-0847.
Similar to the “Dirty COW” exploit (CVE-2016-5195), this flaw abuses how the Kernel manages pages in pipes and impacts the latest versions of Linux. The vulnerability is unofficially rated at 8.8, which is extremely high for a local vulnerability.
In this blog post, we hope to explain this vulnerability and its impact without delving too deeply into the dirty details.
MITRE's entry for CVE-2022-0847 describes the vulnerability as:
“A flaw was found in the way the "flags" member of the new pipe buffer structure was lacking proper initialization in copy_page_to_iter_pipe and push_pipe functions in the Linux kernel and could thus contain stale values. An unprivileged local user could use this flaw to write to pages in the page cache backed by read-only files and as such escalate their privileges on the system.”
Alright, let’s break that down.
So what is a pipe? A pipe is simply a way for a program to send data to another program.
Think of it as a way for a program to talk to another program without having to write it to disk first, which generally speeds programs up. The CPU manages these chunks of data in memory (usually in 4kB increments) in a data set called a page.
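To make the idea of a pipe concrete, here is a minimal sketch in Python that pushes a few bytes through a pipe and reads them back without touching disk. The helper name send_through_pipe is ours, and the example keeps both ends in one process purely for brevity; real programs usually hand one end to another process.

```python
import os


def send_through_pipe(message: bytes) -> bytes:
    """Write message into a pipe and read it back; the data stays in memory."""
    # Keep message smaller than the pipe buffer (64 KiB by default on Linux),
    # otherwise the write would block because nothing is reading yet.
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, message)
    finally:
        os.close(write_fd)  # closing the write end signals end-of-data to the reader
    try:
        chunks = []
        while True:
            chunk = os.read(read_fd, 4096)  # read one page-sized chunk at a time
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_fd)
```

Calling send_through_pipe(b"hello") hands back the same bytes that went in, which is all a pipe promises: what one side writes, the other side reads.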
Page splicing is a performance trick to merge data between different pipe pages without actually rewriting data to memory.
For a page to be eligible to be merged, the PIPE_BUF_FLAG_CAN_MERGE flag must be set on the page cache. This flag is set by the kernel when the page becomes full. If the page cache is then emptied, the PIPE_BUF_FLAG_CAN_MERGE flag is retained. This can become an issue as you’ll soon see.
Mucking it up
Ok, so how does this all fit together? The published disclosure by discoverer Max Kellermann paints a lengthy picture of how they stumbled upon the bug.
To summarize, a flaw was found in the Linux Kernel's memory management in how pipe page caches can be merged, which allows them to overwrite other page caches. As the disclosure post describes, this vulnerability only became exploitable starting in Linux 5.8, when it became possible to merge and overwrite data in a pipe’s page cache. System protections like AppArmor and Seccomp are useful safeguards, but they do not prevent this vulnerability from being exploited.
Here’s how an attacker can take advantage of CVE-2022-0847.
They first need shell access on a system through some means. This may be through a regular user's account or a system account that runs a service vulnerable to remote attacks. Once in, they need to find an interesting file that they can read but are not allowed to write. For example, password and configuration files in /etc that are normally read-only are a likely choice for an attacker to sink their teeth into. The attacker runs a program that opens a pipe, fills its page caches with bytes to set the PIPE_BUF_FLAG_CAN_MERGE flag, then empties the pipe again. Next, splice() is called to bring a page of the target file into the pipe, and the attacker writes the data they want to overwrite with. Because the stale PIPE_BUF_FLAG_CAN_MERGE flag is still set, the new data is merged into the target file's cached page, which circumvents the read-only restriction.
In an exploit released the day after the disclosure, author Blasty shows how this flaw can be leveraged to create a SUID shell backdoor. Using the same technique, the exploit overwrites an executable that has SUID permissions, which means it runs with the privileges of its owner, typically the superuser. The exploit overwrites the command with a shell, runs it to create a SUID shell in /tmp, and then restores the original executable as if nothing happened.
The attacker can then run their SUID shell to escalate privileges to root and take complete control of the system.
Here is a screenshot of this exploit in action. First, we use find /usr/sbin -perm /4000 to locate a SUID-set program to overwrite. /usr/sbin/mount.nfs looks like an easy target! Next, we let the exploit do its magic and get dropped into our backdoor as root with full control.
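For auditing your own systems, the find command above can be approximated in a few lines of Python. This is a minimal sketch, not a replacement for find: the function name find_setuid_files is ours, and unlike find -perm /4000 it deliberately reports regular files only.

```python
import os
import stat
from typing import Iterator


def find_setuid_files(root: str) -> Iterator[str]:
    """Yield regular files under root that have the setuid bit set."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable, skip it
            if stat.S_ISREG(info.st_mode) and info.st_mode & stat.S_ISUID:
                yield path
```

Running it as list(find_setuid_files("/usr/sbin")) gives an inventory of the SUID programs an attacker would be looking at, which is a useful list to review and trim.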
Using the Sysdig system call trace tool, here we can see the moment splice is called (3248) and the shell is overwritten into mount.nfs (3250), then invoked to create the backdoor (3268):
Running this exploit triggered a Falco rule for the creation of the backdoor. Rule ‘Set Setuid or Setgid bit’ detected the SUID permission being set on our backdoor shell from mount.nfs. Here is a copy of the rule definition:
- rule: Set Setuid or Setgid bit
  desc: >
    When the setuid or setgid bits are set for an application,
    this means that the application will run with the privileges of the owning user or group respectively.
    Detect setuid or setgid bits set via chmod
  condition: >
    consider_all_chmods and chmod and (evt.arg.mode contains "S_ISUID" or evt.arg.mode contains "S_ISGID")
    and not proc.name in (user_known_chmod_applications)
    and not user_known_set_setuid_or_setgid_bit_conditions
  output: >
    Setuid or setgid bit is set via chmod (fd=%evt.arg.fd filename=%evt.arg.filename mode=%evt.arg.mode user=%user.name user_loginuid=%user.loginuid process=%proc.name
    command=%proc.cmdline container_id=%container.id container_name=%container.name image=%container.image.repository:%container.image.tag)
  tags: [process, mitre_persistence]
Here is what the alert looked like in the Sysdig Secure event log:
Using other techniques not covered by this blog, an attacker could potentially use their new powers to break out of a container environment and take control of the whole system plus its containers.
This can lead to data theft and blackmail, stealing resources to run crypto miners, or launching more attacks on other systems in local or remote networks.
Making it stick
Another fun use of this exploit is to establish a persistent foothold on a system and container. Yes, container too.
Let’s say we’re an attacker and we’ve exploited a service to get a shell. And let’s say we’re lucky to have netcat already installed on the system. We can use netcat to create a reverse shell. A reverse shell is when the compromised system connects outbound to the attacker’s system and gives a shell to them as root or another user. There are probably easier ways to do this, like with a scripting language or Bash and a device network pipe, but for this demonstration, we’ll stick to netcat.
To establish persistence after a reboot, we’ll need to trick the system into opening the shell for us. Cron is a reliable way to get a shell created after the system has started up or if the shell process gets killed. We can use the original exploit from Kellermann (“/tmp/dirtypipe” here) to overwrite a system’s cron schedule file. Due to how the exploit works, we can’t start at byte 0 or grow the file. However, we can start a couple of bytes into the file, for example after the minute setting, overwrite the rest of the first line, and end with a hash to comment out anything on the line we didn’t overwrite.
As you can see, we were able to overwrite a file we did not have permission to write into. If all goes to plan, every 30 minutes past the hour we should see a reverse shell reach out to let the attacker in.
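The restrictions mentioned above (no write at byte 0, no growing the file) can be written down as a small arithmetic check. The sketch below is our own illustration, not part of any exploit: the function name overwrite_fits is hypothetical, the 4 kB page size is an assumption, and the page-boundary rules reflect our reading of the public disclosure, which says the write cannot start on or cross a page boundary.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages, as mentioned earlier in the post


def overwrite_fits(file_size: int, offset: int, length: int,
                   page_size: int = PAGE_SIZE) -> bool:
    """Return True if a write of length bytes at offset respects the limits."""
    if length <= 0 or offset < 0:
        return False
    if offset % page_size == 0:
        return False  # covers byte 0 and every other page boundary
    if offset + length > file_size:
        return False  # the file cannot grow
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return first_page == last_page  # the write must stay inside one page
```

This is why the demonstration starts a few bytes into the cron file and pads the end of the line with a comment instead of appending a new entry.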
All of those commands were run in a container we were using to test MongoDB. Let’s see what happens when the Kubernetes pod and deployment are blown away and recreated. We’ll delete and recreate it from a configuration file, then open a direct shell in the pod to check the overwritten cron file.
Normally, recreating the pod reverts any changes back to the original image. However…
As you can see, the vulnerability allowed us to edit the Docker image itself and establish persistence within similar containers and any future instances.
All Clogged Up
Although not officially scored by MITRE, we estimate CVE-2022-0847 to have a CVSSv3.1 base score of 8.8, with a vector of AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H.
The scoring Scope is marked as “Changed” since the image of the container can be modified from within. Although local only, this vulnerability is trivial to exploit reliably and can lead to a complete system compromise.
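As a rough cross-check of the estimate above, the sketch below computes a base score from a vector string using the equations and metric weights published in the CVSS v3.1 specification. The function names base_score and roundup are ours, and the code only handles base metrics, so treat it as an illustration and not as a full calculator.

```python
import math

# Metric weights from the CVSS v3.1 specification.
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
PRIVILEGES_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PRIVILEGES_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place, following the v3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector such as 'AV:L/AC:L/...'."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    changed = metrics["S"] == "C"
    pr_table = PRIVILEGES_CHANGED if changed else PRIVILEGES_UNCHANGED
    exploitability = (8.22 * ATTACK_VECTOR[metrics["AV"]]
                      * ATTACK_COMPLEXITY[metrics["AC"]]
                      * pr_table[metrics["PR"]]
                      * USER_INTERACTION[metrics["UI"]])
    iss = 1 - ((1 - IMPACT[metrics["C"]])
               * (1 - IMPACT[metrics["I"]])
               * (1 - IMPACT[metrics["A"]]))
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))
```

With the vector from this post, base_score("AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:H") returns 8.8. The same vector with Scope left unchanged (S:U) comes out at 7.8, which shows how much the Scope decision contributes to the estimate.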
Snaking out CVE-2022-0847
The best remedy for this vulnerability is to upgrade to the latest version of the Linux Kernel.
The flaw is currently fixed in Linux 5.10.102, 5.15.25, and 5.16.11, as well as in the latest Android kernel. To help with patch management, Sysdig’s file scanner can identify systems that are vulnerable and in dire need of updating.
As highlighted in the screenshot, Sysdig Secure can inform administrators that a system is vulnerable to CVE-2022-0847.
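For a quick manual check against the version numbers listed above, the following sketch compares a kernel release string with the first affected and first fixed releases. It is a heuristic under stated assumptions: the function name looks_vulnerable is ours, we assume 5.17 and later include the fix, and distribution kernels often backport patches without changing the upstream version number, so a True result means "investigate", not "confirmed vulnerable".

```python
import re

# First fixed release in each stable series named in this post.
FIXED_IN_SERIES = {(5, 10): 102, (5, 15): 25, (5, 16): 11}
FIRST_AFFECTED = (5, 8)         # per the disclosure, exploitable since Linux 5.8
FIRST_FIXED_MAINLINE = (5, 17)  # assumption: 5.17 and later ship the fix


def looks_vulnerable(release: str) -> bool:
    """Heuristically flag upstream kernel versions affected by CVE-2022-0847."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    series = (major, minor)
    if series < FIRST_AFFECTED or series >= FIRST_FIXED_MAINLINE:
        return False
    fixed_patch = FIXED_IN_SERIES.get(series)
    if fixed_patch is None:
        return True  # series with no fixed release listed in this post
    return patch < fixed_patch
```

On a live system the input would typically come from platform.release() in the standard library, for example looks_vulnerable(platform.release()).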
Sysdig Secure also offers an Image Profiling feature to detect changes in activity on a container. It automatically learns how your container operates: the files it opens and the programs it runs. Once learning is complete, you can create a rule to alert on and kill containers showing unusual behavior, such as a compromised service, overwritten files, or reverse shells.
Fortunately, “Dirty Pipe” CVE-2022-0847 was discovered and disclosed responsibly, giving Linux system owners everywhere a chance to stay protected.
As we’ve seen, this vulnerability can let attackers overwrite any file they can read and elevate their privileges with persistence in a system or container. Linux users everywhere should stay on top of keeping their systems up to date.
Sysdig offers a view into Falco rules that detect indicators of compromise, can identify vulnerable versions of Linux, and provides Image Profiles that detect unusual activity in a container.