The eBPF programs in the previous two articles, whether developed in C or in Go, were at the hello-world level: they illustrate the basics, but they are not very practical.
Generally speaking, a practical eBPF program exchanges data between its kernel-mode part and its user-mode part, and this exchange is what lets eBPF exert greater power. To make eBPF programs more practical, the eBPF MAP is a mechanism that cannot be bypassed.
In this article on eBPF program development, let's look at how to use Go to implement bidirectional data exchange between the kernel-mode and user-mode parts of an eBPF program based on BPF MAP.
1. Why BPF MAP?
Keep in mind that BPF bytecode runs in kernel mode, so there is a clear boundary between it and user mode. User-mode code can normally reach kernel-mode data only by trapping into the kernel through a system call. Variables created inside the kernel-mode part of a BPF program are therefore accessible only to kernel-mode code.
So how does the useful data collected by BPF code in kernel mode get back to user mode for monitoring, calculation, decision-making, display, and storage? And how does user-mode code pass data into the kernel at runtime to change the behavior of the BPF code?
To solve this, the Linux kernel BPF developers introduced the BPF MAP mechanism. BPF MAP provides a two-way data exchange channel between the kernel-mode and user-mode parts of a BPF program. In addition, because a BPF map lives in memory allocated by the kernel, it can be shared by multiple BPF programs running in the kernel, which makes it a mechanism for exchanging and sharing data between BPF programs as well.
2. BPF MAP is not a map data structure in the narrow sense
What exactly is a BPF MAP? It is not a hash map in the narrow sense, but a general-purpose facility that can store different kinds of data. In the words of Andrii Nakryiko, a well-known kernel BPF developer, a MAP is a BPF concept that represents an abstract data container.
As of this writing, kernel BPF supports more than 20 MAP types. The following are the MAP types listed in bpf.h as shipped with libbpf:
    // libbpf/include/uapi/linux/bpf.h
    enum bpf_map_type {
        BPF_MAP_TYPE_UNSPEC,
        BPF_MAP_TYPE_HASH,
        BPF_MAP_TYPE_ARRAY,
        BPF_MAP_TYPE_PROG_ARRAY,
        BPF_MAP_TYPE_PERF_EVENT_ARRAY,
        BPF_MAP_TYPE_PERCPU_HASH,
        BPF_MAP_TYPE_PERCPU_ARRAY,
        BPF_MAP_TYPE_STACK_TRACE,
        BPF_MAP_TYPE_CGROUP_ARRAY,
        BPF_MAP_TYPE_LRU_HASH,
        BPF_MAP_TYPE_LRU_PERCPU_HASH,
        BPF_MAP_TYPE_LPM_TRIE,
        BPF_MAP_TYPE_ARRAY_OF_MAPS,
        BPF_MAP_TYPE_HASH_OF_MAPS,
        BPF_MAP_TYPE_DEVMAP,
        BPF_MAP_TYPE_SOCKMAP,
        BPF_MAP_TYPE_CPUMAP,
        BPF_MAP_TYPE_XSKMAP,
        BPF_MAP_TYPE_SOCKHASH,
        BPF_MAP_TYPE_CGROUP_STORAGE,
        BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
        BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
        BPF_MAP_TYPE_QUEUE,
        BPF_MAP_TYPE_STACK,
        BPF_MAP_TYPE_SK_STORAGE,
        BPF_MAP_TYPE_DEVMAP_HASH,
        BPF_MAP_TYPE_STRUCT_OPS,
        BPF_MAP_TYPE_RINGBUF,
        BPF_MAP_TYPE_INODE_STORAGE,
        BPF_MAP_TYPE_TASK_STORAGE,
        BPF_MAP_TYPE_BLOOM_FILTER,
    };
Many kinds of data structures are listed here, but they are not the focus of this article, so we will not go through them one by one. BPF_MAP_TYPE_HASH was the first MAP data structure supported by BPF. It can be thought of as the everyday hash table, which indexes data as key-value pairs. We will use this type of MAP in the examples that follow.
So how does a BPF MAP share data between kernel mode and user mode? What is the principle behind it?
The description of the bpf system call gives us some clues. The following is its function prototype:
    // https://man7.org/linux/man-pages/man2/bpf.2.html
    #include <linux/bpf.h>

    int bpf(int cmd, union bpf_attr *attr, unsigned int size);
Judging by its prototype, bpf looks simple. But it is actually a "rich" call: it does more than one thing, and the value passed in cmd selects which BPF-related operation it performs. The main one is loading a BPF program (cmd=BPF_PROG_LOAD), followed by a series of MAP operations, including creating a MAP (cmd=BPF_MAP_CREATE), looking up a MAP element (cmd=BPF_MAP_LOOKUP_ELEM), updating a MAP element value (cmd=BPF_MAP_UPDATE_ELEM), and so on.
When cmd=BPF_MAP_CREATE, the bpf call creates a MAP and returns a file descriptor (fd) through which the new MAP can be operated on afterwards. Accessing a map through an fd is very Unix!
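The fd-based workflow can be illustrated with a toy model in Go. This is only a sketch of the idea, not the real system call: toyKernel, attr, and the error values are hypothetical names made up for illustration, the in-memory maps stand in for kernel memory, and only the three command numbers are taken from enum bpf_cmd in linux/bpf.h.

```go
package main

import (
	"errors"
	"fmt"
)

// Command numbers as defined by enum bpf_cmd in linux/bpf.h.
const (
	cmdMapCreate     = 0 // BPF_MAP_CREATE
	cmdMapLookupElem = 1 // BPF_MAP_LOOKUP_ELEM
	cmdMapUpdateElem = 2 // BPF_MAP_UPDATE_ELEM
)

// Hypothetical error values for this toy model.
var (
	errBadFD   = errors.New("bad map fd")
	errNoEntry = errors.New("no such key")
	errInvalid = errors.New("unknown cmd")
)

// attr is a hypothetical, heavily simplified stand-in for union bpf_attr.
type attr struct {
	fd    int
	key   string
	value uint64
}

// toyKernel is a hypothetical in-memory model; it is not the real kernel.
type toyKernel struct {
	nextFD int
	maps   map[int]map[string]uint64
}

func newToyKernel() *toyKernel {
	// Start at 3 because 0, 1 and 2 are normally stdin, stdout and stderr.
	return &toyKernel{nextFD: 3, maps: make(map[int]map[string]uint64)}
}

// bpf mimics the shape of bpf(cmd, attr, size): one entry point whose
// behavior is selected by cmd.
func (k *toyKernel) bpf(cmd int, a *attr) (int, error) {
	switch cmd {
	case cmdMapCreate:
		fd := k.nextFD
		k.nextFD++
		k.maps[fd] = make(map[string]uint64)
		return fd, nil // the caller keeps this fd as its handle to the map
	case cmdMapUpdateElem:
		m, ok := k.maps[a.fd]
		if !ok {
			return -1, errBadFD
		}
		m[a.key] = a.value
		return 0, nil
	case cmdMapLookupElem:
		m, ok := k.maps[a.fd]
		if !ok {
			return -1, errBadFD
		}
		v, found := m[a.key]
		if !found {
			return -1, errNoEntry
		}
		a.value = v // the result is written back into attr
		return 0, nil
	}
	return -1, errInvalid
}

func main() {
	k := newToyKernel()
	fd, err := k.bpf(cmdMapCreate, &attr{})
	if err != nil {
		fmt.Println("create failed:", err)
		return
	}
	if _, err := k.bpf(cmdMapUpdateElem, &attr{fd: fd, key: "execve_counter", value: 42}); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	q := &attr{fd: fd, key: "execve_counter"}
	if _, err := k.bpf(cmdMapLookupElem, q); err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("fd=%d value=%d\n", fd, q.value)
}
```

The point of the model is the calling pattern: every operation goes through the same entry point, and every map operation after creation identifies its target by fd.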
Of course, BPF user-mode developers generally do not need to call such a low-level system call directly. libbpf, for example, wraps a series of map operation functions that do not expose the map fd to users, which simplifies usage.
Let's first look at how to exchange data between the user-mode and kernel-mode parts of a BPF program through a map, using C.
3. A map example in C based on libbpf
This example is adapted from the helloworld example. The original helloworld example writes a kernel log entry whenever the execve system call is invoked (viewable in /sys/kernel/debug/tracing/trace_pipe), and its user-mode program exchanges no data with the kernel-mode program.
In this new example (execve_counter), we still trace the execve system call. The difference is that we count the calls to execve and store the count in a BPF MAP. The user-mode part of the program reads the count from the MAP and prints it periodically.
Let’s first take a look at the source code of the BPF kernel mode part:
    // https://github.com/bigwhite/experiments/tree/master/ebpf-examples/execve-counter/execve_counter.bpf.c
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    typedef __u64 u64;
    typedef char stringkey[64];

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 128);
        //__type(key, stringkey);
        stringkey* key;
        __type(value, u64);
    } execve_counter SEC(".maps");

    SEC("tracepoint/syscalls/sys_enter_execve")
    int bpf_prog(void *ctx) {
        stringkey key = "execve_counter";
        u64 *v = NULL;
        v = bpf_map_lookup_elem(&execve_counter, &key);
        if (v != NULL) {
            *v += 1;
        }
        return 0;
    }

    char LICENSE[] SEC("license") = "Dual BSD/GPL";
Unlike the helloworld example, we define a map structure execve_counter in the new example, which is marked as a BPF MAP variable through the SEC macro.
This map structure has four fields:
- type: The BPF MAP type used (see the previous bpf_map_type enumeration type), here we use BPF_MAP_TYPE_HASH, which is a hash table structure;
- max_entries: the maximum number of key-value pairs in the map;
- key: declared as a pointer to the key type, which libbpf uses to derive the key's type and size. Here we define a custom type stringkey (char[64]) as the type of each key;
- value: declared the same way for the value type. Here each value is a u64, a 64-bit unsigned integer.
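As a rough illustration, the four fields can be mirrored on the Go side. The sketch below assumes a hypothetical mapSpec struct and stringKey type (neither is a libbpf or cilium/ebpf type) and uses encoding/binary only to show that a char[64] key and a u64 value occupy 64 and 8 bytes respectively.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mapSpec is a hypothetical struct for illustration only. It mirrors the
// four fields of the execve_counter definition in the BPF C source.
type mapSpec struct {
	Type       string
	MaxEntries uint32
	KeySize    uint32
	ValueSize  uint32
}

// stringKey has the same memory layout as `typedef char stringkey[64]`.
type stringKey [64]byte

// execveCounterSpec derives the key and value sizes from the Go types
// instead of hard-coding them.
func execveCounterSpec() mapSpec {
	return mapSpec{
		Type:       "hash", // BPF_MAP_TYPE_HASH
		MaxEntries: 128,
		KeySize:    uint32(binary.Size(stringKey{})),
		ValueSize:  uint32(binary.Size(uint64(0))),
	}
}

func main() {
	s := execveCounterSpec()
	fmt.Printf("type=%s max_entries=%d key=%dB value=%dB\n",
		s.Type, s.MaxEntries, s.KeySize, s.ValueSize)
}
```

Deriving the sizes from the types keeps the user-mode view in step with the kernel-mode declaration if the key or value type is changed later.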
The implementation of the kernel mode function bpf_prog is also relatively simple: query the key “execve_counter” in the above map, and if found, add 1 to the value in the memory pointed to by the obtained value pointer.
Let’s take a look at the program source code of the user mode part of the execve_counter example:
    // https://github.com/bigwhite/experiments/tree/master/ebpf-examples/execve_counter/execve_counter.c
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/resource.h>
    #include <bpf/libbpf.h>
    #include <linux/bpf.h>
    #include "execve_counter.skel.h"

    typedef __u64 u64;
    typedef char stringkey[64];

    static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args)
    {
        return vfprintf(stderr, format, args);
    }

    int main(int argc, char **argv)
    {
        struct execve_counter_bpf *skel;
        int err;

        libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
        /* Set up libbpf errors and debug info callback */
        libbpf_set_print(libbpf_print_fn);

        /* Open BPF application */
        skel = execve_counter_bpf__open();
        if (!skel) {
            fprintf(stderr, "Failed to open BPF skeleton\n");
            return 1;
        }

        /* Load & verify BPF programs */
        err = execve_counter_bpf__load(skel);
        if (err) {
            fprintf(stderr, "Failed to load and verify BPF skeleton\n");
            goto cleanup;
        }

        /* init the counter */
        stringkey key = "execve_counter";
        u64 v = 0;
        err = bpf_map__update_elem(skel->maps.execve_counter, &key, sizeof(key), &v, sizeof(v), BPF_ANY);
        if (err != 0) {
            fprintf(stderr, "Failed to init the counter, %d\n", err);
            goto cleanup;
        }

        /* Attach tracepoint handler */
        err = execve_counter_bpf__attach(skel);
        if (err) {
            fprintf(stderr, "Failed to attach BPF skeleton\n");
            goto cleanup;
        }

        for (;;) {
            // read counter value from map
            err = bpf_map__lookup_elem(skel->maps.execve_counter, &key, sizeof(key), &v, sizeof(v), BPF_ANY);
            if (err != 0) {
                fprintf(stderr, "Lookup key from map error: %d\n", err);
                goto cleanup;
            } else {
                printf("execve_counter is %llu\n", v);
            }
            sleep(5);
        }

    cleanup:
        execve_counter_bpf__destroy(skel);
        return -err;
    }
The map is created inside execve_counter_bpf__load. If you trace the code (refer to the libbpf source), you will find that it ultimately invokes the bpf system call to create the map.
Unlike the helloworld example, before attaching the handler we use libbpf's bpf_map__update_elem wrapper to initialize the key in the BPF map to 0. Without this step, the lookup in the BPF program returns NULL, so the counter is never incremented, and the user-mode lookup fails because the key cannot be found.
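The BPF_ANY flag used in the initialization call is one of three update flags defined in linux/bpf.h. The following Go sketch models only their existence checks against an ordinary Go map; update and the two error values are hypothetical stand-ins for the kernel's behavior (which reports EEXIST and ENOENT), not part of any library.

```go
package main

import (
	"errors"
	"fmt"
)

// Flag values as defined in linux/bpf.h.
const (
	bpfAny     = 0 // BPF_ANY: create a new element or update an existing one
	bpfNoExist = 1 // BPF_NOEXIST: create only if the key is absent
	bpfExist   = 2 // BPF_EXIST: update only if the key is present
)

// Hypothetical errors standing in for the kernel's errno values.
var (
	errExist   = errors.New("key already exists") // EEXIST
	errNoEntry = errors.New("key not found")      // ENOENT
)

// update is a hypothetical user-space model of the flag check performed
// when a map element is updated. It does not talk to the kernel.
func update(m map[string]uint64, key string, val uint64, flags int) error {
	_, present := m[key]
	switch {
	case flags == bpfNoExist && present:
		return errExist
	case flags == bpfExist && !present:
		return errNoEntry
	}
	m[key] = val
	return nil
}

func main() {
	m := map[string]uint64{}
	// BPF_EXIST on an empty map fails: there is nothing to update yet.
	fmt.Println("BPF_EXIST on missing key:", update(m, "execve_counter", 0, bpfExist))
	// BPF_ANY always succeeds, which is why it suits one-time initialization.
	fmt.Println("BPF_ANY:", update(m, "execve_counter", 0, bpfAny))
	// BPF_NOEXIST refuses to overwrite the element created above.
	fmt.Println("BPF_NOEXIST on existing key:", update(m, "execve_counter", 1, bpfNoExist))
}
```

For the counter, BPF_ANY means a restart of the user-mode program resets the count to 0; BPF_NOEXIST would be the choice if an existing count should be preserved.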
After attaching the handler, we query the value of key="execve_counter" through bpf_map__lookup_elem every 5 seconds in a loop and print it to the console.
User-mode code can use the map directly because execve_counter.skel.h, which bpftool generates from the object file compiled from execve_counter.bpf.c, contains the map's metadata.
Next, we execute make to compile the ebpf program, then execute and observe the output:
    $sudo ./execve_counter
    libbpf: loading object 'execve_counter_bpf' from buffer
    libbpf: elf: section(3) tracepoint/syscalls/sys_enter_execve, size 192, link 0, flags 6, type=1
    libbpf: sec 'tracepoint/syscalls/sys_enter_execve': found program 'bpf_prog' at insn offset 0 (0 bytes), code size 24 insns (192 bytes)
    libbpf: elf: section(4) .reltracepoint/syscalls/sys_enter_execve, size 16, link 22, flags 0, type=9
    libbpf: elf: section(5) .rodata, size 64, link 0, flags 2, type=1
    libbpf: elf: section(6) .maps, size 32, link 0, flags 3, type=1
    libbpf: elf: section(7) license, size 13, link 0, flags 3, type=1
    libbpf: license of execve_counter_bpf is Dual BSD/GPL
    libbpf: elf: section(13) .BTF, size 898, link 0, flags 0, type=1
    libbpf: elf: section(15) .BTF.ext, size 176, link 0, flags 0, type=1
    libbpf: elf: section(22) .symtab, size 744, link 1, flags 0, type=2
    libbpf: looking for externs among 31 symbols...
    libbpf: collected 0 externs total
    libbpf: map 'execve_counter': at sec_idx 6, offset 0.
    libbpf: map 'execve_counter': found type = 1.
    libbpf: map 'execve_counter': found key [9], sz = 64.
    libbpf: map 'execve_counter': found value [13], sz = 8.
    libbpf: map 'execve_counter': found max_entries = 128.
    libbpf: map 'execve_c.rodata' (global data): at sec_idx 5, offset 0, flags 480.
    libbpf: map 1 is "execve_c.rodata"
    libbpf: sec '.reltracepoint/syscalls/sys_enter_execve': collecting relocation for section(3) 'tracepoint/syscalls/sys_enter_execve'
    libbpf: sec '.reltracepoint/syscalls/sys_enter_execve': relo #0: insn #15 against 'execve_counter'
    libbpf: prog 'bpf_prog': found map 0 (execve_counter, sec 6, off 0) for insn #15
    libbpf: map 'execve_counter': created successfully, fd=4
    libbpf: map 'execve_c.rodata': created successfully, fd=5
    execve_counter is 0
    execve_counter is 0
    execve_counter is 9
    execve_counter is 23
    ... ...
Note: If you don’t know how to compile the execve_counter example, please go to “Developing a Hello World-level eBPF Program from Scratch Using C Language” to understand its construction principle.
The bpftool tool provides a map view feature, which we can use to view the map created by the example:
    $sudo bpftool map
    114: hash  name execve_counter  flags 0x0
        key 64B  value 8B  max_entries 128  memlock 20480B
        btf_id 120
    116: array  name execve_c.rodata  flags 0x80
        key 4B  value 64B  max_entries 1  memlock 4096B
        frozen
We can also dump the entire map:
    $sudo bpftool map dump id 114
    [{
            "key": "execve_counter",
            "value": 23
        }
    ]
We can see that the whole map contains only one key-value pair (key="execve_counter"), and its value is consistent with the output of the user-mode part of the example.
Well, with the C example as a base, let’s see how to implement this example based on Go.
4. Implementing the execve-counter example in Go based on cilium/ebpf
It is much easier to develop the user-mode part of a BPF program in Go, and the packages provided by cilium/ebpf are very simple to use. If you don't know how to develop the eBPF user-mode part in Go, please read the article "Developing eBPF Programs in Go" first.
The essential raw material for the Go example is execve_counter.bpf.c. The only difference between this C source file and the execve_counter.bpf.c in the C example above is that the included header files are replaced with common.h:
    $diff execve_counter.bpf.c ../execve-counter/execve_counter.bpf.c
    1,2c1,2
    <
    < #include "common.h"
    ---
    > #include <linux/bpf.h>
    > #include <bpf/bpf_helpers.h>
Based on execve_counter.bpf.c, the bpf2go tool generates the Go source code required by the user-mode part, for example the BPF map instance contained in bpfObjects:
    // bpfMaps contains all maps after they have been loaded into the kernel.
    //
    // It can be passed to loadBpfObjects or ebpf.CollectionSpec.LoadAndAssign.
    type bpfMaps struct {
    	ExecveCounter *ebpf.Map `ebpf:"execve_counter"`
    }
Finally, we can directly use these generated Go functions related to bpf objects in the main function of the main package. The following is the partial source code of main.go:
    // https://github.com/bigwhite/experiments/tree/master/ebpf-examples/execve-counter-go/main.go
    package main

    import (
    	"log"
    	"os"
    	"os/signal"
    	"syscall"
    	"time"

    	"github.com/cilium/ebpf/link"
    	"github.com/cilium/ebpf/rlimit"
    )

    // $BPF_CLANG, $BPF_CFLAGS and $BPF_HEADERS are set by the Makefile.
    //go:generate bpf2go -cc $BPF_CLANG -cflags $BPF_CFLAGS -target bpfel,bpfeb bpf execve_counter.bpf.c -- -I $BPF_HEADERS

    func main() {
    	stopper := make(chan os.Signal, 1)
    	signal.Notify(stopper, os.Interrupt, syscall.SIGTERM)

    	// Allow the current process to lock memory for eBPF resources.
    	if err := rlimit.RemoveMemlock(); err != nil {
    		log.Fatal(err)
    	}

    	// Load pre-compiled programs and maps into the kernel.
    	objs := bpfObjects{}
    	if err := loadBpfObjects(&objs, nil); err != nil {
    		log.Fatalf("loading objects: %s", err)
    	}
    	defer objs.Close()

    	// Initialize the map element. The key must be a 64-byte array to
    	// match char[64] in the BPF C source.
    	var key [64]byte
    	copy(key[:], []byte("execve_counter"))
    	var val uint64 // matches the u64 value type declared in the BPF C source
    	if err := objs.bpfMaps.ExecveCounter.Put(key, val); err != nil {
    		log.Fatalf("init map key error: %s", err)
    	}

    	// Attach the BPF program to the sys_enter_execve tracepoint.
    	kp, err := link.Tracepoint("syscalls", "sys_enter_execve", objs.BpfProg, nil)
    	if err != nil {
    		log.Fatalf("opening tracepoint: %s", err)
    	}
    	defer kp.Close()

    	ticker := time.NewTicker(5 * time.Second)
    	defer ticker.Stop()

    	for {
    		select {
    		case <-ticker.C:
    			// Read the current counter value from the map every 5 seconds.
    			if err := objs.bpfMaps.ExecveCounter.Lookup(key, &val); err != nil {
    				log.Fatalf("reading map error: %s", err)
    			}
    			log.Printf("execve_counter: %d\n", val)
    		case <-stopper:
    			// Exit on SIGINT or SIGTERM; the deferred Close calls
    			// release the tracepoint link and the BPF objects.
    			log.Println("Received signal, exiting program..")
    			return
    		}
    	}
    }
In the main function, we access the map instance directly through objs.bpfMaps.ExecveCounter and operate on it with its Put and Lookup methods. Note that the Go key type must match the key type in execve_counter.bpf.c (char[64]) so that the memory layout is the same. A Go string cannot be used directly as the key; otherwise the following error is reported at run time:
init map key error: can't marshal key: string doesn't marshal to 64 bytes
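One way to avoid this error is to build the key with a small helper. The sketch below is an assumption about how one might do it, not code from the example repository: newKey and keyString are hypothetical helpers, and rejecting strings of 64 bytes or more (to keep a trailing NUL, as a C string literal would) is a design choice rather than a requirement of cilium/ebpf.

```go
package main

import (
	"bytes"
	"fmt"
)

// keyLen must match `typedef char stringkey[64]` in the BPF C source.
const keyLen = 64

// newKey is a hypothetical helper that copies s into a zero-padded,
// fixed-size array with the same layout as the C key type.
func newKey(s string) ([keyLen]byte, error) {
	var k [keyLen]byte
	if len(s) >= keyLen {
		// Keep at least one trailing NUL byte, as a C string would have.
		return k, fmt.Errorf("key %q is %d bytes, limit is %d", s, len(s), keyLen-1)
	}
	copy(k[:], s)
	return k, nil
}

// keyString converts a fixed-size key back to a Go string, cutting it
// at the first NUL byte.
func keyString(k [keyLen]byte) string {
	if i := bytes.IndexByte(k[:], 0); i >= 0 {
		return string(k[:i])
	}
	return string(k[:])
}

func main() {
	k, err := newKey("execve_counter")
	if err != nil {
		fmt.Println("bad key:", err)
		return
	}
	fmt.Printf("key occupies %d bytes and reads back as %q\n", len(k), keyString(k))
}
```

The resulting [64]byte value can be passed wherever the example passes its key variable, since both have the same size and zero padding.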
Compiling and executing execve-counter-go is the same as helloworld-go:
    $make
    $go run -exec sudo main.go bpf_bpfel.go
    2022/07/17 16:59:52 execve_counter: 0
    2022/07/17 16:59:57 execve_counter: 14
    ^C2022/07/17 16:59:59 Received signal, exiting program..
5. Summary
This article introduced the main mechanism for exchanging data between the kernel-mode and user-mode parts of an eBPF program: the BPF MAP. A MAP here is not a hash table in the narrow sense, but an abstract data container. It currently supports more than 20 data structures, and you can choose the one that fits your needs (see the manual for the characteristics of each).
In essence, a MAP is also created through the bpf system call. The BPF program only needs to declare the map's key, value, type, and other attributes. User mode can operate on the map through the fd returned by the bpf system call. libbpf and cilium/ebpf wrap these fd operations, which simplifies the API.
Map updates in the kernel are not atomic, so synchronization is required when multiple BPF programs access a map concurrently. BPF provides bpf_spin_lock for this purpose. We can add a bpf_spin_lock to the value type to synchronize changes to the value, as in the following example (from the book "Linux Observability with BPF"):
    struct concurrent_element {
        struct bpf_spin_lock semaphore;
        int count;
    };

    struct bpf_map_def SEC("maps") concurrent_map = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(int),
        .value_size = sizeof(struct concurrent_element),
        .max_entries = 100,
    };

    int bpf_program(struct pt_regs *ctx) {
        int key = 0;
        struct concurrent_element init_value = {};
        struct concurrent_element *read_value;

        /* Create the element only if it does not exist yet. */
        bpf_map_update_elem(&concurrent_map, &key, &init_value, BPF_NOEXIST);

        read_value = bpf_map_lookup_elem(&concurrent_map, &key);
        if (!read_value) /* the lookup result must be NULL-checked before use */
            return 0;

        /* Hold the lock only around the update of the shared value. */
        bpf_spin_lock(&read_value->semaphore);
        read_value->count += 100;
        bpf_spin_unlock(&read_value->semaphore);
        return 0;
    }
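For readers more at home in Go, the same pattern (a lock stored next to the data it protects) can be sketched in user space. This is only an analogy and not BPF code: sync.Mutex stands in for bpf_spin_lock, goroutines stand in for concurrently running BPF programs, and element and run are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// element mirrors struct concurrent_element: the lock lives inside the
// value it guards, so each map element has its own lock.
type element struct {
	mu    sync.Mutex // plays the role of struct bpf_spin_lock
	count int
}

// run starts `workers` goroutines that each add 1 to the same element
// `perWorker` times, taking the element's lock around every update.
func run(workers, perWorker int) int {
	elems := map[int]*element{0: &element{}}
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e := elems[0] // read-only map access, safe without a lock
			for j := 0; j < perWorker; j++ {
				e.mu.Lock()
				e.count++ // read-modify-write is protected by the lock
				e.mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return elems[0].count
}

func main() {
	fmt.Println("final count:", run(8, 1000))
}
```

Without the Lock and Unlock calls the increments could interleave and some would be lost, which is the same hazard the spin lock guards against in the kernel-side example.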
The code involved in this article can be downloaded here.
6. References
- “Demystifying the past and present of the BPF map” – https://ift.tt/2lju0tv
- “Edge Network eBPF Super Capability: Principle and Performance Analysis of eBPF Map” – https://ift.tt/XVUMOW1
- bpf system call description – https://ift.tt/8OK0yQh
- Official bpf map reference manual – https://ift.tt/412smPC
- bpftool Reference Manual – https://ift.tt/BVqXWvl
- “Building BPF applications with libbpf-bootstrap” – https://ift.tt/qhgyTLm
© 2022, bigwhite . All rights reserved.
This article is reprinted from https://tonybai.com/2022/07/25/bidirectional-data-exchange-between-kernel-and-user-states-of-ebpf-programs-using-go/
This site is for inclusion only, and the copyright belongs to the original author.