For some time, I have been learning and experimenting with eBPF. I would like to share the results of some experimental code I completed recently. This article describes a Python script that measures how much time a command spends in calls to a given standard C library function.
Introduction
Let us start with some definitions that we will use later in the article.
Most core commands on Linux are developed in C and use standard C library functionality. They call functions such as strlen, fopen, etc. from the C library.
eBPF is a Linux kernel technology that can run small programs in a privileged kernel context. It allows developers to write custom code that is loaded into the kernel dynamically at runtime and can change how the kernel behaves. It also lets the developer observe how the underlying kernel is used: the developer can trace how an application spends its time, what it calls, and where it connects, all from the kernel's point of view.
There are several frameworks for working with eBPF code, which are listed on this project landscape page. I am using the bcc toolkit because it is highly accessible and lets you combine eBPF C code with Python.
bcc provides decent Python interfaces that let you transfer tracing results to the Python environment easily. Once the results are in Python, it is easy to analyze and visualize them. In its GitHub repo, the project provides a wide range of examples and API documentation, which makes it easier to dive into the subject. This is why I chose the bcc framework.
The script
I have developed a Python script that uses bcc’s interface to eBPF and prints its results to the terminal.
In short, it measures how much time a command spends in calls to a standard C library function.
Usage
sudo python3 cmdperf.py -s fopen -d 2 -c 'ls -la /'
The example command above measures how much time the ls command spends in calls to the fopen function, over a measurement window of 2 seconds.
-s is the name of the symbol in the C library
-d is the duration of the measurement, in seconds
-c is the command to be measured
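The three options could be parsed along these lines. This is a minimal sketch using argparse; the long option names, the default duration, and the helper name parse_options are assumptions for illustration and are not taken from the actual script.

```python
import argparse


def parse_options(argv=None):
    """Parse the -s, -d and -c options described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Measure the time a command spends in a C library function"
    )
    # -s: name of the symbol in the C library, e.g. fopen
    parser.add_argument("-s", "--symbol", required=True)
    # -d: measurement duration in seconds (the default value is an assumption)
    parser.add_argument("-d", "--duration", type=float, default=1.0)
    # -c: the command line to run and measure
    parser.add_argument("-c", "--command", required=True)
    return parser.parse_args(argv)


options = parse_options(["-s", "fopen", "-d", "2", "-c", "ls -la /"])
print(options.symbol, options.duration, options.command)
```

Passing the argument list explicitly, as done here, makes the parser easy to try out without touching sys.argv.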
Warning: Be careful with the command you provide. The Python script must be run with sudo because eBPF code can only run with superuser privileges, and the command is launched from that privileged script.
Code
You can find the whole script at this link; I will explain only how it works.
If you look at the script, you will see that it is a polyglot file, meaning that it is also a valid C file. You can check this by switching your editor's syntax highlighting to C.
The Python code loads the file itself as an eBPF C file in the line below:
b = BPF(src_file="cmdperf.py", cflags=["-DMAX_CPUS=%s" % str(num_cpus)])
This line creates the main bcc BPF object used in later parts of the code. It does so by reading and compiling the Python script itself as a C file because, as mentioned, it is also a valid C file. The compilation flag -DMAX_CPUS defines the MAX_CPUS macro, which is used to initialize performance counters in the C code.
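To make the role of the flag concrete: passing -DNAME=VALUE to a C compiler has the same effect as a #define NAME VALUE line at the top of the source. The sketch below builds the flag and prints the equivalent define. Using os.cpu_count() to obtain num_cpus and the helper name equivalent_define are assumptions, since the script's own way of counting CPUs is not shown here.

```python
import os


def equivalent_define(flag):
    """Return the #define line that a -DNAME[=VALUE] compiler flag stands for."""
    if not flag.startswith("-D"):
        raise ValueError("not a -D flag: %r" % flag)
    name, sep, value = flag[2:].partition("=")
    if not sep:
        # A bare -DNAME (no "=") defines NAME as 1.
        value = "1"
    return ("#define %s %s" % (name, value)).rstrip()


num_cpus = os.cpu_count() or 1  # assumption: how the CPU count is obtained
cflags = ["-DMAX_CPUS=%s" % str(num_cpus)]
print(cflags[0], "->", equivalent_define(cflags[0]))
```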
Then the command provided by the user is launched with the following call:
proc = Popen(
    options.command.split(),
    stdin=None, stdout=DEVNULL, stderr=None, close_fds=True
)
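One caveat with options.command.split() is that it splits on every run of whitespace, so quoted arguments inside the command are broken apart. A possible alternative, offered here as a suggestion rather than as what the script does, is shlex.split from the standard library, which follows shell quoting rules. The command string below is a made-up example.

```python
import shlex

command = "grep 'hello world' notes.txt"  # hypothetical command

naive = command.split()
parsed = shlex.split(command)

print(naive)   # ['grep', "'hello", "world'", 'notes.txt']
print(parsed)  # ['grep', 'hello world', 'notes.txt']
```

For simple commands such as 'ls -la /' both approaches give the same list, so the difference only matters once an argument contains spaces or quotes.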
This is where the main calculation happens:
total_time = 0

def get_data(cpu, data, size):
    nonlocal total_time
    e = b["output"].event(data)
    if e.pid >> 32 == proc.pid:
        total_time += e.time_delta

b["output"].open_perf_buffer(get_data)
This opens a perf buffer connected to the compiled eBPF code. Each time a BPF event occurs, the get_data event handler is called and receives the output values of the eBPF code.
The handler compares the PID of the event's source with the PID of our command; if they match, the event's time is added to the running total.
At the end, the script prints the total time to the terminal.
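The filtering step can be tried out without any eBPF at all. The sketch below assumes that e.pid holds the 64-bit value returned by the bpf_get_current_pid_tgid() helper, whose upper 32 bits are the process ID as user space sees it and whose lower 32 bits are the thread ID, and that time_delta is in nanoseconds; neither is confirmed by the excerpt above. The Event tuple, the sample values, and the helper name total_time_for are made up for illustration.

```python
from collections import namedtuple

# Stand-in for the event structure delivered by the perf buffer (hypothetical).
Event = namedtuple("Event", ["pid", "time_delta"])


def total_time_for(events, target_pid):
    """Sum time_delta over events whose upper 32 bits of pid match target_pid."""
    total = 0
    for e in events:
        # Same comparison as in get_data: the shift drops the lower 32 bits.
        if e.pid >> 32 == target_pid:
            total += e.time_delta
    return total


events = [
    Event(pid=(4242 << 32) | 4242, time_delta=60_000),  # main thread of 4242
    Event(pid=(4242 << 32) | 4250, time_delta=40_000),  # another thread of 4242
    Event(pid=(999 << 32) | 999, time_delta=70_000),    # unrelated process
]

total_ns = total_time_for(events, target_pid=4242)
print("%.2f us" % (total_ns / 1000))  # 100.00 us
```

Under this assumption about the layout, the shift also explains why events from every thread of the measured command are counted, while events from other processes are ignored.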
Example
sudo apt-get install -y bpfcc-tools libbpfcc libbpfcc-dev linux-headers-$(uname -r)
wget -O cmdperf.py https://codeberg.org/odemir/substack_code/raw/branch/main/cmdperf/cmdperf.py
sudo python3 cmdperf.py -s fopen -c 'ls -la /'
Output on my system:
108.28 us
See bcc’s documentation for instructions on installing it on your system.