
Getting acquainted with BPF as a security tool

Since joining EdgeBit, I’ve had the opportunity to get acquainted with eBPF, or simply BPF, and I thought I’d share my experience with it. Generally speaking, there are a few sharp corners, but the capabilities BPF provides are quite impressive.

Tracing the running kernel with simple BPF hooks feels like a superpower, and it’s invaluable for following the flow of logic while reading the kernel source code.

Reading a File’s Security Extended Attributes

Certain hooks, such as the Linux Security Module (LSM) hooks, can even change the kernel’s control flow, for example by denying an operation. I did run into a few issues while trying to implement somewhat complex logic, which slowed development and made some goals infeasible.

One such issue was the difficulty of reliably reading a file’s security extended attributes. For certain LSM hooks, like inode_removexattr, the name of the extended attribute is passed as an argument, so making decisions based on it is fairly easy. Here’s a simple hook that prevents the removal of any security extended attribute:

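#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define EPERM 1  /* vmlinux.h does not define errno values */
#define PREFIX "security."

char LICENSE[] SEC("license") = "GPL";
SEC("lsm/inode_removexattr")
int BPF_PROG(protect_xattrs, void *idmap, void *dentry, const char *name)
{
    char kname[sizeof(PREFIX) + 1];

    /* A negative return means the read failed: deny to be safe. */
    if (bpf_probe_read_kernel_str(kname, sizeof(kname), name) < 0)
        return -EPERM;
    if (has_prefix(kname, PREFIX))  /* has_prefix() is defined elsewhere */
        return -EPERM;
    return 0;                       /* allow everything else */
}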
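The hook relies on a has_prefix() helper that isn’t shown. One possible shape for it is sketched below as plain C; only the name comes from the snippet, while the body and the MAX_PREFIX_LEN bound are assumptions for illustration. The fixed upper bound on the loop is there because the verifier must be able to see that every loop terminates, and in a BPF program the bound should not exceed the size of the buffer being compared (sizeof(kname) in the hook).

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed bound on how many bytes are compared. When used from BPF,
 * keep it no larger than the buffers passed in. */
#define MAX_PREFIX_LEN 16

/* Hypothetical helper: returns true if str starts with prefix.
 * Reads at most MAX_PREFIX_LEN bytes from each string and stops early
 * at the end of prefix or at the first mismatching byte. */
static bool has_prefix(const char *str, const char *prefix)
{
    for (size_t i = 0; i < MAX_PREFIX_LEN; i++) {
        if (prefix[i] == '\0')
            return true;   /* matched the whole prefix */
        if (str[i] != prefix[i])
            return false;  /* mismatch, or str ended first */
    }
    return false;          /* prefix is longer than the bound */
}
```

With this helper, "security.selinux" matches the "security." prefix, while "user.comment" and a truncated "secur" do not.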

For most other hooks, though, only a pointer to the inode or dentry is available, and without a BPF helper to do the lookup, there’s no way to read the extended attributes. As a result, it’s not currently feasible to implement a custom security policy based on existing extended attributes (some upstream developers argue that this limitation is desired and intentional, so it might eventually prove to be a net benefit).

Convincing Clang & BPF Verifier to Work Together

Another recurring issue was convincing Clang to emit code that satisfied the BPF verifier. BPF promises that the code it runs in the kernel is safe, and it keeps that promise with the help of the verifier. This load-time check ensures that the program isn’t going to access arbitrary memory (accidentally or otherwise), lock up its execution context, call kernel functions not exposed to BPF, or otherwise do anything that might affect the stability of the kernel.

The verifier does this by walking through every execution path of the program, keeping track of possible register values and branches, and making sure that every one of them is safe. But all of this is based on what it can determine from the BPF assembly, which is Clang’s optimized output. As a result, it can be tricky to structure the C source so that Clang emits assembly the verifier accepts.
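To make “keeping track of possible register values” a little more concrete, here is a toy model in plain C. It is a deliberate simplification and not the verifier’s actual code: each scalar is known only by an unsigned range, and a memory access is accepted only if the largest possible offset plus the largest possible size fits in the buffer. The names struct range, range_add() and fits_in_buffer() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a scalar is known only by its unsigned bounds. */
struct range {
    uint64_t umin;
    uint64_t umax;
};

/* Adding two scalars adds their bounds (overflow ignored for brevity). */
static struct range range_add(struct range a, struct range b)
{
    struct range r = { a.umin + b.umin, a.umax + b.umax };
    return r;
}

/* Accept an access of `size` bytes at `off` into a buffer of `buf_len`
 * bytes only if the worst case fits. The two ranges are combined without
 * any knowledge of how they relate, so a link such as "size shrinks as
 * off grows" is lost. */
static bool fits_in_buffer(struct range off, struct range size, uint64_t buf_len)
{
    return range_add(off, size).umax <= buf_len;
}
```

For example, with an offset in 0..7 and a size in 1..8, the worst case is 7 + 8 = 15 bytes, so an access into an 8-byte buffer is rejected even if, on every real path, the offset and size always sum to exactly 8.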

In one case, a pair of tracing statements had no visible side effects in C (other than the call to bpf_trace_printk), yet commenting out one particular statement, but not both, caused the verifier to fail with an “invalid write to stack” error.

There were also cases, though, where the verifier itself appeared to be limited in its evaluation of the program. The following snippet seems to confuse the verifier in the 6.1 kernel:

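char buf[32];                  /* 32-byte stack buffer */
unsigned long long i = 0;
while (i < sizeof(buf))
{
    /* rc is the number of bytes copied, including the trailing NUL */
    long long rc = bpf_probe_read_kernel_str(buf + i, sizeof(buf) - i,
                                             dentry->d_name.name);
    if (rc <= 0)
        return 0;

    i += rc;
}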

This repeatedly reads a string from kernel memory and appends it to a 32-byte stack variable named buf. The repeated calls to the BPF helper can never write beyond the end of buf, but the verifier can’t prove it:

3: (bf) r1 = r10               ; R1_w=fp0 R10=fp0
4: (07) r1 += -16              ; R1_w=fp-16
5: (0f) r1 += r7               ; R1_w=fp(off=-16,umax=31,var_off=(0x0; 0x1f))
                               ; R7_w=Pscalar(id=2,umax=31,var_off=(0x0; 0x1f))
6: (b7) r2 = 32                ; R2_w=32
7: (1f) r2 -= r7               ; R2_w=scalar(umin=1,umax=32,var_off=(0x0; 0x3f))
                               ; R7_w=Pscalar(id=2,umax=31,var_off=(0x0; 0x1f))
8: (79) r3 = *(u64 *)(r6 +40)  ; R3_w=scalar() R6=ptr_dentry(off=0,imm=0)
9: (85) call bpf_probe_read_kernel_str#115

invalid variable-offset indirect access to stack R1 var_off=(0x0; 0x1f) size=32

When the BPF helper is called on line 9, r1 holds a pointer into buf, r2 holds the length parameter, and r3 holds a pointer into kernel memory. r1 is incremented by r7 (line 5) while r2 is decremented by the same amount (line 7), but the verifier tracks the bounds of r1 and r2 separately, so it still believes that r2 might be 32 while r1 is non-zero.
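The var_off=(0x0; 0x3f) reported for r2 after line 7 comes from the verifier’s tristate numbers (“tnums”), where a value/mask pair records which bits are known: bits set in the mask are unknown, and the rest equal the bits of the value. The sketch below is modeled on the kernel’s struct tnum and tnum_sub() in kernel/bpf/tnum.c, rewritten as standalone C with uint64_t; treat it as an illustration rather than the kernel code itself. It reproduces the log entry by subtracting r7 (value 0x0, mask 0x1f) from the constant 32.

```c
#include <stdint.h>

/* Tristate number: bits set in `mask` are unknown; all other bits are
 * known to equal the corresponding bits of `value`. */
struct tnum {
    uint64_t value;
    uint64_t mask;
};

/* Subtraction, modeled on tnum_sub() in the kernel. */
static struct tnum tnum_sub(struct tnum a, struct tnum b)
{
    uint64_t dv = a.value - b.value;
    uint64_t alpha = dv + a.mask;        /* largest a minus smallest b */
    uint64_t beta = dv - b.mask;         /* smallest a minus largest b */
    uint64_t chi = alpha ^ beta;         /* bits that differ between the two */
    uint64_t mu = chi | a.mask | b.mask; /* every possibly-unknown bit */
    struct tnum r = { dv & ~mu, mu };
    return r;
}
```

Working it through: dv = 32, alpha = 32, beta = 32 - 0x1f = 1, chi = 0x21, and mu = 0x21 | 0x1f = 0x3f, giving (0x0; 0x3f). That only says r2 fits in six bits; the separate range tracking narrows it to 1..32, but nothing ties that range back to the offset held in r1.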

Unrolling the loop doesn’t help either: the verifier gets through the first iteration but fails on the second with the same error. Proving program safety is an incredibly deep topic and the subject of many PhD theses, so it’s hard to fault the BPF verifier for not handling every case, though it can be frustrating at times.

Clang Behavior with memcpy

On the compiler side, Clang also has some funny behavior. It does an impressive job of identifying implementations of memcpy (however convoluted they may be) and replacing them with a call to the built-in function (typically target-dependent and highly optimized). Unfortunately, the BPF target doesn’t support that built-in, so compilation fails with the following error:

error: A call to built-in function 'memcpy' is not supported.

This is confusing at first, since the source contains no calls to memcpy, but it’s easily fixed by adding the following attribute to the affected function:

__attribute__((no_builtin("memcpy")))
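As a sketch of where the attribute goes, the userspace C below applies it to the definition of a function containing a hand-written copy loop, the kind of code Clang’s loop idiom recognition may turn into a memcpy call. The function name copy_bytes and the NO_BUILTIN_MEMCPY macro are hypothetical; the guard exists because no_builtin is a Clang attribute that other compilers may not recognize.

```c
#include <stddef.h>

/* no_builtin is Clang-specific; other compilers get an empty macro. */
#if defined(__clang__)
#define NO_BUILTIN_MEMCPY __attribute__((no_builtin("memcpy")))
#else
#define NO_BUILTIN_MEMCPY
#endif

/* A plain byte-copy loop. The attribute tells Clang not to map code in
 * this function to the memcpy built-in, so the loop stays a loop. */
NO_BUILTIN_MEMCPY
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

The attribute is scoped to the function it is attached to, so it can be applied only where the rewrite causes trouble instead of disabling built-ins for the whole program.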

As development of these tools continues, I imagine many of these issues will be sorted out (if they haven’t been already). Still, they are amazingly useful today.

BPF Combined with Fentry/Fexit Probes

For example, BPF fentry/fexit programs provide a convenient way to hook almost any function in the kernel and inspect its parameters and return value. This simple hook prints the address and length passed to access_process_vm, which describe a memory range in the target process, along with the function’s return value:

/* TRACE() is assumed to be a thin wrapper around bpf_printk(). */
SEC("fexit/access_process_vm")
int BPF_PROG(fexit_access_process_vm, struct task_struct *task,
             unsigned long addr, void *buf, int len, unsigned int gup_flags,
             int ret)
{
    TRACE("access_process_vm: addr: %llx  len: %d  ret: %d", addr, len, ret);
    return 0;
}

Note: make sure to use %llx instead of %p when printing pointers, since the latter silently renders them incorrectly (as of Linux 6.1).

This strategy can make troubleshooting kernel behavior almost trivial, and there are command line tools which help streamline it further.

BPF is an Impressive Tool

Overall, I’ve been very impressed with BPF and the ecosystem of tools around it. We use BPF in the EdgeBit Agent today, but that’s likely just the beginning. I’m interested to see where developers take BPF next and I look forward to using it more in the future.
