The Advantages of eBPF for CWPP Applications
Published 02/23/2023
Originally published by SentinelOne.
Written by Rick Bosworth, SentinelOne.
Extended Berkeley Packet Filter (eBPF) is a framework for loading and running user-defined programs within the Linux OS kernel, to observe, change, and respond to kernel behavior without the destabilizing impact of kernel modules. eBPF provides kernel-level visibility directly from user space. This combination of visibility and stability makes the eBPF framework particularly attractive for security applications.
In this blog post, we describe how eBPF works, its significance to cloud workload protection platforms (CWPP) for machine-speed detection of OS-level runtime threats, and the benefits of such an architectural approach, namely stability, scalability, and performance.
eBPF Architectural Overview
eBPF programs allow us to observe and respond to application (workload) behavior within the kernel without modifying the application code itself. This is useful for many applications, especially security applications such as cloud workload protection.
Consider the following diagram in Figure 1, modified for simplicity from the original found at ebpf.io.
Figure 1: Simple Architectural Overview
Here, we have an application (for example, a CWPP agent) that runs in user space and includes an eBPF program for process-level visibility within the Linux kernel. The eBPF program itself is bytecode, though developers usually write it in a higher-level programming language whose compiler can emit eBPF bytecode. This eBPF program is loaded into the Linux kernel, where it is immediately verified by the eBPF Verification Engine. Then, the program is compiled and attached to a specific kernel event chosen by design; this is what is meant when one says that eBPF programs are “event-driven.” Whenever this event occurs, the attached program runs its observation and analysis tasks to completion and presents the results back to the application.
The mechanism by which information is transferred between the eBPF program and the user space application/workload is called “eBPF Maps” or simply “maps”. Now that we have a high-level overview, let’s dig in a little deeper for a more complete understanding.
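To make the event-driven flow concrete, here is a minimal sketch in plain Python. It is a toy model, not real eBPF: `ToyKernel`, `load_and_attach`, `fire`, and the `process_exec` hook name are all hypothetical, and the verification result is simply passed in as a flag rather than computed.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Toy model only: these names are hypothetical and are not a real eBPF API.
Program = Callable[[dict, dict], None]   # (event context, shared map) -> None


class ToyKernel:
    """Stands in for the kernel side: attaches programs and runs them on events."""

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Program]] = defaultdict(list)
        self.shared_map: dict = {}   # stand-in for an eBPF map

    def load_and_attach(self, hook: str, program: Program, verified: bool) -> None:
        # A real kernel runs the verifier itself; here the result is passed in.
        if not verified:
            raise PermissionError("program rejected by verifier")
        self.hooks[hook].append(program)

    def fire(self, hook: str, ctx: dict) -> None:
        # When the event occurs, every program attached to it runs to completion.
        for program in self.hooks[hook]:
            program(ctx, self.shared_map)


def count_execs(ctx: dict, shared_map: dict) -> None:
    """Observation task: count process launches per executable name."""
    name = ctx["comm"]
    shared_map[name] = shared_map.get(name, 0) + 1


kernel = ToyKernel()
kernel.load_and_attach("process_exec", count_execs, verified=True)
kernel.fire("process_exec", {"comm": "gzip"})
kernel.fire("process_exec", {"comm": "gzip"})
kernel.fire("file_open", {"comm": "cat"})   # no program attached: nothing runs

print(kernel.shared_map)   # {'gzip': 2}
```

The program does nothing until its event fires, and it never sees events it was not attached to. That is the whole of what “event-driven” means in this sketch.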
eBPF Safety
The eBPF Verification Engine and Just-in-Time Compiler are the means by which the eBPF framework ensures that, first and foremost, the eBPF program to be loaded and run within the kernel does not destabilize the kernel. This is Rule No. 1: Do No Harm.
Kernel Modules: The Inferior Alternative
Consider the alternative to eBPF: writing kernel modules. Kernel modules raise concerns about operational stability and complexity. While writing a kernel module does indeed allow a developer to change kernel behavior, it is a highly specialized skill, which makes staffing and retention an issue. More pointedly, using kernel modules raises the specter of two critical risk questions: (1) will my kernel module crash the machine, and (2) will it introduce a security vulnerability?
In addition to stability and security concerns, there is the matter of operational overhead: a kernel module only works for a specific Linux kernel version and distribution. Maintaining the kernel module consumes precious developer cycles and complicates operational management unnecessarily. The eBPF framework addresses each of these pain points, making kernel modules far less desirable.
Before any eBPF program is allowed to run in the kernel, it passes through the Verification Engine and JIT Compiler. The Verifier ensures that the program is safe to run, will not crash the system, and will not compromise data. It validates that several conditions are met:
- The process loading the eBPF program has the necessary privileges to do so.
- The eBPF program does not crash the system.
- The eBPF program runs to completion. That is, it does not loop indefinitely.
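The three conditions above can be illustrated with a toy verifier. This is a sketch under loud assumptions: the three-instruction language, `STACK_SLOTS`, and `MAX_INSTRUCTIONS` are invented for illustration and are not real eBPF bytecode or real kernel limits. To guarantee completion, the toy takes the simplest possible approach and rejects every backward jump.

```python
from typing import List, Tuple

# Hypothetical toy instruction set, NOT real eBPF bytecode:
#   ("load", offset)  read one slot of a fixed-size stack
#   ("jump", target)  jump to the instruction at index `target`
#   ("exit",)         return to the kernel
Instruction = Tuple
STACK_SLOTS = 8          # assumed stack size for this sketch
MAX_INSTRUCTIONS = 64    # assumed program size limit for this sketch


def verify(program: List[Instruction], has_privilege: bool) -> List[str]:
    """Return the reasons to reject the program (an empty list means accepted)."""
    errors = []
    if not has_privilege:
        errors.append("loader lacks the required privileges")
    if len(program) > MAX_INSTRUCTIONS:
        errors.append("program too large")
    if not program or program[-1] != ("exit",):
        errors.append("program does not end with exit")
    for index, ins in enumerate(program):
        if ins[0] == "load" and not 0 <= ins[1] < STACK_SLOTS:
            errors.append(f"instruction {index}: out-of-bounds memory access")
        if ins[0] == "jump":
            target = ins[1]
            if not 0 <= target < len(program):
                errors.append(f"instruction {index}: jump outside the program")
            elif target <= index:
                # A backward jump could loop forever, so this toy rejects it outright.
                errors.append(f"instruction {index}: backward jump")
    return errors


safe = [("load", 0), ("jump", 2), ("exit",)]
looping = [("load", 0), ("jump", 0), ("exit",)]
wild = [("load", 99), ("exit",)]

print(verify(safe, has_privilege=True))      # []
print(verify(looping, has_privilege=True))   # ['instruction 1: backward jump']
print(verify(wild, has_privilege=True))      # ['instruction 0: out-of-bounds memory access']
print(verify(safe, has_privilege=False))     # ['loader lacks the required privileges']
```

Because every accepted jump moves forward and the last instruction is an exit, an accepted program in this toy language always terminates. The point of the sketch is that unsafe programs are refused before they ever run, not caught afterward.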
Once verified, the JIT Compiler translates the program from bytecode into machine instructions, optimizing for speed of execution.
Now that the eBPF program is verified and compiled, it is attached to a kernel-level event, such that when the event occurs, the program is triggered, runs to completion, and presents information to the user space application. This brings us to eBPF Maps, or simply “maps”.
eBPF Maps
eBPF maps are the mechanism by which information transfers between the eBPF program and the user space application. Bidirectional information flow is supported. A map is a data structure that the eBPF program and user space application can read or write.
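As a rough illustration of bidirectional flow, the sketch below models a map as a small fixed-capacity key/value store in plain Python. `ToyMap`, its `update` and `lookup` methods, and the `min_size` setting are hypothetical names chosen for this example; they are not the kernel's map API.

```python
from typing import Dict, Optional


class ToyMap:
    """Fixed-capacity key/value store visible to both sides (hypothetical)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._data: Dict[str, int] = {}

    def update(self, key: str, value: int) -> bool:
        # Refuse new keys once the map is full; existing keys can still change.
        if key not in self._data and len(self._data) >= self.max_entries:
            return False
        self._data[key] = value
        return True

    def lookup(self, key: str) -> Optional[int]:
        return self._data.get(key)


config = ToyMap(max_entries=1)   # direction 1: user space -> program
stats = ToyMap(max_entries=4)    # direction 2: program -> user space


def on_file_write(size: int) -> None:
    """Kernel-side program: reads its setting, then records what it saw."""
    min_size = config.lookup("min_size") or 0
    if size >= min_size:
        stats.update("large_writes", (stats.lookup("large_writes") or 0) + 1)


config.update("min_size", 1024)          # user space writes a setting
for size in (10, 2048, 4096):
    on_file_write(size)                  # events fire in the kernel
print(stats.lookup("large_writes"))      # user space reads the result: 2
```

Here one map carries a setting down to the program and another carries results back up, which is one simple way to use the two directions.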
For example, the program might be triggered on an event such as gzip of a file. The eBPF program will write some information about that event, such as the file name, file size, and gzip timestamp, to the map. It might also increment a counter of the gzip operations that occur within a given period of time. If that counter exceeds a certain threshold, the eBPF program can write a judgment of “MALICIOUS” to the data structure. Stated simply, the eBPF program observed behavior indicative of a ransomware attack and flagged this behavior as malicious. The user space program – in our example, a cloud workload protection (CWPP) agent – can read that map, see the malicious judgment, and take appropriate action. Basic information processing occurred within the eBPF program, minimizing the amount of information passed to the user space application and thereby optimizing performance.
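The gzip example can be sketched as follows. This is a plain Python simulation, not an eBPF program: the window length, the threshold, the map layout, and the agent's response string are all assumptions made for illustration.

```python
from collections import deque

WINDOW_SECONDS = 10.0   # assumed observation window
THRESHOLD = 3           # assumed limit on gzip operations per window

# Stand-in for an eBPF map shared with the user space agent.
shared_map = {"last_event": None, "gzip_count": 0, "verdict": "BENIGN"}
_recent = deque()       # timestamps of gzip events still inside the window


def on_gzip(file_name: str, file_size: int, timestamp: float) -> None:
    """Kernel-side program: runs once per gzip event."""
    shared_map["last_event"] = (file_name, file_size, timestamp)
    _recent.append(timestamp)
    # Drop events that have aged out of the window.
    while _recent and timestamp - _recent[0] > WINDOW_SECONDS:
        _recent.popleft()
    shared_map["gzip_count"] = len(_recent)
    if len(_recent) > THRESHOLD:
        shared_map["verdict"] = "MALICIOUS"


def agent_poll() -> str:
    """User space agent: reads only the verdict and decides what to do."""
    if shared_map["verdict"] == "MALICIOUS":
        return "kill process and raise alert"
    return "no action"


# Three widely spaced events stay under the threshold.
for i, t in enumerate((0.0, 15.0, 30.0)):
    on_gzip(f"report{i}.txt", 1_000, t)
print(agent_poll())   # no action

# A burst of five events within one second exceeds it.
for i in range(5):
    on_gzip(f"doc{i}.txt", 1_000, 50.0 + i * 0.2)
print(agent_poll())   # kill process and raise alert
```

Note that the agent never sees the individual gzip events. It reads one verdict, which is the "basic information processing in the kernel" that the paragraph above describes.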
Advantages of eBPF within CWPP
A cloud workload protection platform agent does what other security controls do not: detect and respond to runtime threats, like ransomware or zero days, in real time. This makes CWPP a vital component of a cloud defense-in-depth strategy. An organization can, and quite often should, have other cloud security measures in place, such as AppSec, CSPM, and more. Each plays a role in a robust cloud security strategy. A CWPP agent works alongside these other controls to (1) provide runtime protection and (2) record workload telemetry.
A ransomware attack on a cloud compute instance (VM) can lock up a cloud workload in milliseconds. A CWPP agent can detect and stop a ransomware attack mere moments (less than a second) after it is launched.
Try getting this real-time response from a side-scanning solution. You cannot. Side-scanning is typically run only once a day, because taking snapshots of a cloud compute instance’s storage volumes for inspection is cost-prohibitive. Moreover, a side-scan architecture lacks process-level visibility within the kernel, and process-level details are the forensic evidence that the SOC needs to investigate an incident, tag it, and route it to the appropriate DevOps owner. Only a behavioral, real-time CWPP agent using the eBPF framework provides the combination of real-time process-level visibility and stability, making it the preferred choice.
Increasingly, cybersecurity insurance underwriters require CWPP before they will even quote a policy. Machine-speed threats such as ransomware demand an ability to respond faster, and with higher accuracy, than human-powered technology alone. Additionally, a historical record of workload telemetry not only facilitates investigation in the event of a security incident, but also makes proactive threat hunting possible. In this way, threat actors can be stopped before they even launch an attack.
The application of the eBPF framework within a CWPP program offers several advantages, including but not limited to:
- Operational stability
- System performance
- Business agility
Operational Stability
While a kernel module can provide the kernel visibility which a CWPP application requires, running code in the kernel can be dangerous. A false move can destabilize the system (i.e., cause a kernel panic) or introduce a security vulnerability into the kernel. Neither of these outcomes is in any way acceptable, especially where a CWPP agent is concerned. A CWPP agent that uses kernel modules can cause kernel panics that crash the VM and brick your workload. These unplanned outages threaten financial performance, order fulfillment, and customer loyalty, and they create costly, disruptive fire drills.
In stark contrast to a kernel module, the eBPF framework includes safety controls such as the Verification Engine, JIT Compiler, and more. As a result, eBPF programs will not crash the kernel. Neither can they reach into arbitrary memory space within the kernel, making them much less prone to security vulnerabilities. eBPF programs provide all the kernel-level visibility with none of the risk from kernel modules: no tainted kernels or panics. For these reasons, eBPF is the preferred choice for CWPP from an operational stability perspective.
System Performance / Resource Efficiency
Transferring information from within the kernel to user space is slow and introduces performance overhead (CPU, memory). In contrast, the eBPF framework enables us to observe kernel behavior and perform analysis within the kernel before transferring a subset of results back to user space. This creates a fundamental performance advantage for CWPP agents that operate in user space and use eBPF programs. eBPF provides high observability with lower overhead relative to CWPP agents with kernel modules.
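A small sketch shows why analyzing in the kernel reduces what has to cross into user space. The event stream below is invented for illustration, and real savings depend entirely on the workload; the code only counts how many records each design would hand over.

```python
from collections import Counter

# Hypothetical stream of kernel events: (process name, operation).
events = (
    [("gzip", "write")] * 500
    + [("nginx", "read")] * 300
    + [("sshd", "accept")] * 2
)


def forward_everything(stream):
    """Naive design: copy every raw event to user space."""
    return list(stream)


def aggregate_in_kernel(stream):
    """eBPF-style design: summarize in the kernel, hand over only the totals."""
    return Counter(stream)


raw = forward_everything(events)
summary = aggregate_in_kernel(events)

print(len(raw))       # 802 records cross the boundary without aggregation
print(len(summary))   # 3 records cross the boundary with aggregation
```

In this made-up stream, the summary still tells the agent everything it needs (which process did what, and how often) while moving a small fraction of the records.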
Business Agility
Developers should be focused on innovation, not on juggling the kernel dependency hassles which kernel modules introduce. Because the CWPP agent operates from user space, DevOps teams have more flexibility to update the host OS image with less concern that the update will conflict with their CWPP agent. eBPF makes this possible. As a result, more DevOps time can be devoted to innovation, and less (much less) to maintenance concerns.
Moreover, because the CWPP agent itself uses the eBPF framework and avoids kernel modules, the vendor too is more focused on innovation. And of course the customer reaps the benefits of this virtuous cycle of agile velocity.
Some Considerations for a CWPP Solution
High Performance
CWPP must be real-time if it is to defend cloud workloads from runtime attack and ensure business continuity. Machine-speed attacks spread evil at machine speed. Delayed detections give the adversary the time needed – literally, only a matter of seconds – to bring a cloud workload to a grinding halt. And if not ransomware, then it’s malware quietly spreading throughout your cloud footprint. In broad brushstrokes, the wider the spread, the larger the remediation effort. Delays cost. The fewer the delays, the better.
Resource Efficiency
Infrastructure & Operations carry the costs of operating an agent, even if those costs eventually are transferred internally to the lines of business. Any application, be it a CWPP agent or otherwise, requires compute and memory resources to function, and those resources come at a cost. For deployment within a fixed and sunk cost infrastructure such as a data center, such apps take away resources that would otherwise be available for the primary business workloads; while it’s not an incremental operational expense, there is the opportunity cost of resources. For cloud IaaS, however, resources used are metered and paid for on demand; deploying a CWPP agent may require a larger cloud compute instance (e.g., moving from a t4g.medium to a t4g.large), and thereby incrementally raise its operational expense. It’s a necessary expense, to be sure, but an incremental expense nonetheless.
Therefore, CPU and memory utilization are important to consider.
DevOps Friendly
Organizations went to the cloud to go faster, not slower. Innovate swiftly, operate securely. Solve the agility/security paradox by simplifying deployment, automating scalability with workload demand, and operating entirely in user space with a CWPP solution.
Summary
The advantages of the eBPF framework make it the preferred choice for cloud workload protection. Superior system performance translates to lower operational costs than alternatives relying on kernel modules. Operational stability aspects provide for better business continuity.