Using BPF to increase the throughput of Go network programs by 8 times

Original link: https://colobu.com/2022/06/05/use-bpf-to-make-the-go-network-program-8x-faster/

Classic BPF (the classic Berkeley Packet Filter) is a very useful technology that can greatly improve performance in certain low-level Go network programming scenarios.

Background

I once developed a Go UDP application in which the client and server exchange UDP packets over raw sockets. The real program serves a rather special purpose, so here I will use a simplified program as an example.

Strictly speaking, calling it a raw socket is not quite accurate. I do not create the socket myself in the following way (link-layer approach):


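// For AF_PACKET sockets the protocol must be passed in network byte order,
// so the two bytes of ETH_P_ALL are swapped (what htons does in C on
// little-endian machines such as x86 and arm64).
proto := int(uint16(syscall.ETH_P_ALL)<<8 | uint16(syscall.ETH_P_ALL)>>8)
fd, err := syscall.Socket(syscall.AF_PACKET, syscall.SOCK_RAW, proto)
if err != nil {
	fmt.Println("Error: " + err.Error())
	return
}
fmt.Println("Obtained fd", fd)
defer syscall.Close(fd)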

Nor do I use the following raw-socket approach (IP-layer approach):


func main() {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_RAW, syscall.IPPROTO_UDP)
	if err != nil {
		log.Fatal("Socket: ", err)
	}
	defer syscall.Close(fd)
	addr := syscall.SockaddrInet4{
		Port: 27289,
		Addr: [4]byte{127, 0, 0, 1},
	}
	p := pkt() // pkt builds the bytes to send; its definition is not shown here
	if err := syscall.Sendto(fd, p, 0, &addr); err != nil {
		log.Fatal("Sendto: ", err)
	}
}

Instead, I use net.ListenPacket("ip4:udp", addr) from the Go standard library, which wraps all of this up and sends and receives packets at the IP layer.

By building the UDP-layer data structure myself, I can send and receive custom packets and monitor the network.

Some may ask: isn't the standard library's UDPConn enough? For an ordinary UDP program it is. But some requirements are special, for example monitoring 1000 UDP ports while tens of thousands of nodes periodically send monitoring data. We are unlikely to create 1000*10000 UDPConn values, so here I communicate through a raw socket.

Raw sockets are part of the standard Berkeley sockets API. When we write network programs with the Go standard library, most scenarios use the wrapped datagram type (UDPConn) or stream type (TCPConn). For lower-level network programming, such as finer control over TCP and UDP or protocols like ICMP and ARP, you need raw sockets. Operating systems implement raw sockets differently; here we take Linux as the example.

The Linux man pages describe raw sockets in detail: socket(2), packet(7), raw(7). This article does not repeat them, since they are not its focus.

According to the Linux documentation, packets received by a Linux host are passed to both the kernel network module and the raw socket. So you sometimes need to be careful when using raw sockets. For example, when you process TCP packets, the kernel's network stack may already have handled the packet. The raw(7) man page puts it this way:

Raw sockets may tap all IP protocols in Linux, even protocols like ICMP or TCP which have a protocol module in the kernel. In this case, the packets are passed to both the kernel module and the raw socket(s). This should not be relied upon in portable programs, many other BSD socket implementations have limitations here.

Unless there are special requirements, we can implement a raw-socket program directly with net.ListenPacket. Its signature is as follows:


func ListenPacket(network, address string) (PacketConn, error)

The first parameter, network, can be udp, udp4, udp6, unixgram, or ip, ip4, ip6 followed by a colon and a protocol number or protocol name, such as ip:1 or ip:icmp, so any IP protocol can be handled.

Demo program

Server program

The server calls conn, err := net.ListenPacket("ip4:udp", *addr) to listen for all UDP packets on the local address, and a single goroutine processes them. The handler should also check whether the UDP destination port is the one we serve, because net.ListenPacket here receives every UDP packet for the local address, so many irrelevant UDP packets may be passed up to the user-space program.

Here we use gopacket to define the packets of various protocol layers to facilitate parsing (or creating) the network protocols of each layer of TCP/IP.


package main

import (
	"flag"
	"log"
	"net"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/smallnest/go-network-programming/codec"
)

var (
	addr = flag.String("s", "localhost", "server address")
	port = flag.Int("p", 8972, "port")
)

var (
	stat         = make(map[string]int)
	lastStatTime = int64(0)
)

func main() {
	flag.Parse()

	conn, err := net.ListenPacket("ip4:udp", *addr)
	if err != nil {
		panic(err)
	}

	cc := conn.(*net.IPConn)
	cc.SetReadBuffer(20 * 1024 * 1024)
	cc.SetWriteBuffer(20 * 1024 * 1024)

	handleConn(conn)
}

func handleConn(conn net.PacketConn) {
	for {
		buffer := make([]byte, 1024)
		n, remoteaddr, err := conn.ReadFrom(buffer)
		if err != nil {
			log.Fatal(err)
		}
		buffer = buffer[:n]

		// The buffer is decoded starting at the UDP layer.
		packet := gopacket.NewPacket(buffer, layers.LayerTypeUDP, gopacket.NoCopy)

		// Get the UDP layer from this packet.
		if udpLayer := packet.Layer(layers.LayerTypeUDP); udpLayer != nil {
			udp, _ := udpLayer.(*layers.UDP)
			// Note: this demo does not yet compare udp.DstPort with *port.
			if app := packet.ApplicationLayer(); app != nil {
				// Swap the ports and echo the payload back to the sender.
				data, err := codec.EncodeUDPPacket(net.ParseIP("127.0.0.1"), net.ParseIP("127.0.0.1"), uint16(udp.DstPort), uint16(udp.SrcPort), app.Payload())
				if err != nil {
					log.Printf("failed to EncodePacket: %v", err)
					return
				}
				if _, err := conn.WriteTo(data, remoteaddr); err != nil {
					log.Printf("failed to write packet: %v", err)
					conn.Close()
					return
				}
			}
		}
	}
}
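
The handler above does not yet perform the destination-port check that the text calls for. Below is a minimal sketch of such a check using only the standard library. It assumes, as the server code does when it decodes with layers.LayerTypeUDP, that the buffer starts at the UDP header. The names parseUDP and wanted are made up for this illustration and are not part of the original program.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// udpHeaderLen is the fixed size of a UDP header in bytes.
const udpHeaderLen = 8

// udpDatagram holds the fields of interest from a raw UDP datagram.
type udpDatagram struct {
	SrcPort, DstPort uint16
	Payload          []byte
}

// parseUDP decodes a buffer that starts at the UDP header.
// It reports false when the buffer is too short or the length field is invalid.
func parseUDP(b []byte) (udpDatagram, bool) {
	if len(b) < udpHeaderLen {
		return udpDatagram{}, false
	}
	length := int(binary.BigEndian.Uint16(b[4:6]))
	if length < udpHeaderLen || length > len(b) {
		return udpDatagram{}, false
	}
	return udpDatagram{
		SrcPort: binary.BigEndian.Uint16(b[0:2]),
		DstPort: binary.BigEndian.Uint16(b[2:4]),
		Payload: b[udpHeaderLen:length],
	}, true
}

// wanted reports whether the datagram is addressed to the port we serve.
func wanted(b []byte, port uint16) bool {
	d, ok := parseUDP(b)
	return ok && d.DstPort == port
}

func main() {
	// Source port 40000, destination port 8972, length 13, checksum 0, payload "hello".
	pkt := []byte{0x9c, 0x40, 0x23, 0x0c, 0x00, 0x0d, 0x00, 0x00, 'h', 'e', 'l', 'l', 'o'}
	fmt.Println(wanted(pkt, 8972), wanted(pkt, 53))
}
```

In the server loop, a check like wanted(buffer, uint16(*port)) would let the handler skip unrelated datagrams before doing any further work.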

Client program


package main

import (
	"fmt"
	"log"
	"net"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/smallnest/go-network-programming/codec"
)

func main() {
	conn, err := net.ListenPacket("ip4:udp", "127.0.0.1")
	if err != nil {
		panic(err)
	}

	// EncodeUDPPacket takes the local (source) port first and the remote
	// (destination) port second. The destination must be 8972, the port the
	// server handles, so the source port is 0 and the destination is 8972.
	data, err := codec.EncodeUDPPacket(net.ParseIP("127.0.0.1"), net.ParseIP("127.0.0.1"), 0, 8972, []byte("hello"))
	if err != nil {
		log.Printf("failed to EncodePacket: %v", err)
		return
	}

	remoteAddr := &net.IPAddr{IP: net.ParseIP("127.0.0.1")}
	if _, err := conn.WriteTo(data, remoteAddr); err != nil {
		log.Printf("failed to write packet: %v", err)
		conn.Close()
		return
	}

	buffer := make([]byte, 1024)
	n, _, err := conn.ReadFrom(buffer)
	if err != nil {
		log.Fatal(err)
	}
	buffer = buffer[:n]

	packet := gopacket.NewPacket(buffer, layers.LayerTypeUDP, gopacket.NoCopy)

	// Get the UDP layer from this packet.
	if udpLayer := packet.Layer(layers.LayerTypeUDP); udpLayer != nil {
		if app := packet.ApplicationLayer(); app != nil {
			fmt.Printf("reply: %s\n", app.Payload())
		}
	}
}

The client is simplified here: it writes a single hello and reads the server's reply. For performance testing, we instead write a sequence number in a loop and check whether the server returns that sequence number, in order to measure packet loss. A rate limiter throttles the client so that the packet loss rate can be measured at a given RPS.
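
The benchmark client itself is not shown in this article. As a rough sketch of the bookkeeping described above, the following standard-library program numbers requests, records which sequence numbers come back, and derives a loss rate, with a ticker standing in for the rate limiter. The type lossTracker, the rate of 1000 requests per second, and the simulated losses are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// lossTracker records which sequence numbers were sent and not yet answered.
type lossTracker struct {
	sent    uint64
	pending map[uint64]struct{}
}

func newLossTracker() *lossTracker {
	return &lossTracker{pending: make(map[uint64]struct{})}
}

// next returns the sequence number to place in the next request.
func (t *lossTracker) next() uint64 {
	t.sent++
	t.pending[t.sent] = struct{}{}
	return t.sent
}

// ack marks a sequence number as answered; unknown or duplicate replies are ignored.
func (t *lossTracker) ack(seq uint64) { delete(t.pending, seq) }

// lossRate is the fraction of requests that have not been answered.
func (t *lossTracker) lossRate() float64 {
	if t.sent == 0 {
		return 0
	}
	return float64(len(t.pending)) / float64(t.sent)
}

func main() {
	const rps = 1000 // hypothetical target rate
	ticker := time.NewTicker(time.Second / rps)
	defer ticker.Stop()

	tracker := newLossTracker()
	for i := 0; i < 10; i++ {
		<-ticker.C // simple rate limiting: one request per tick
		seq := tracker.next()
		// A real client would send seq to the server here and call ack when
		// the matching reply arrives. We pretend replies 3 and 7 are lost.
		if seq != 3 && seq != 7 {
			tracker.ack(seq)
		}
	}
	fmt.Printf("sent=%d lost=%d loss=%.2f\n", tracker.sent, len(tracker.pending), tracker.lossRate())
}
```

A real test would also need a timeout after which an unanswered sequence number counts as lost; the sketch simply counts whatever is still pending at the end.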

Helper method

The following EncodeUDPPacket function generates the bytes of a UDP packet.


package codec

import (
	"net"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
)

// EncodeUDPPacket serializes a UDP header followed by the payload.
// The IPv4 layer is used only to compute the UDP checksum; it is not serialized.
func EncodeUDPPacket(localIP, remoteIP net.IP, localPort, remotePort uint16, payload []byte) ([]byte, error) {
	ip := &layers.IPv4{
		Version:  4,
		TTL:      128,
		SrcIP:    localIP,
		DstIP:    remoteIP,
		Protocol: layers.IPProtocolUDP,
	}
	udp := &layers.UDP{
		SrcPort: layers.UDPPort(localPort),
		DstPort: layers.UDPPort(remotePort),
	}
	if err := udp.SetNetworkLayerForChecksum(ip); err != nil {
		return nil, err
	}

	buf := gopacket.NewSerializeBuffer()
	opts := gopacket.SerializeOptions{
		ComputeChecksums: true,
		FixLengths:       true,
	}
	err := gopacket.SerializeLayers(buf, opts, udp, gopacket.Payload(payload))
	return buf.Bytes(), err
}
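
To make it clearer what the checksum step in this helper involves, here is a standard-library sketch that builds the same kind of UDP segment by hand, including the checksum over the IPv4 pseudo-header. It is not how gopacket is implemented; the function names (buildUDP, udpChecksum, checksumOK and the rest) are invented for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// sum16 adds b to a running ones'-complement sum as big-endian 16-bit words.
func sum16(sum uint32, b []byte) uint32 {
	for len(b) >= 2 {
		sum += uint32(binary.BigEndian.Uint16(b))
		b = b[2:]
	}
	if len(b) == 1 {
		sum += uint32(b[0]) << 8 // an odd trailing byte is padded with zero
	}
	return sum
}

// fold reduces the sum to 16 bits with end-around carry.
func fold(sum uint32) uint16 {
	for sum>>16 != 0 {
		sum = sum&0xffff + sum>>16
	}
	return uint16(sum)
}

// pseudoHeader builds the IPv4 pseudo-header that UDP includes in its checksum.
func pseudoHeader(src, dst net.IP, udpLen int) []byte {
	p := make([]byte, 12)
	copy(p[0:4], src.To4())
	copy(p[4:8], dst.To4())
	p[9] = 17 // IP protocol number of UDP
	binary.BigEndian.PutUint16(p[10:12], uint16(udpLen))
	return p
}

// udpChecksum expects a segment whose checksum field is still zero.
func udpChecksum(src, dst net.IP, segment []byte) uint16 {
	c := ^fold(sum16(sum16(0, pseudoHeader(src, dst, len(segment))), segment))
	if c == 0 {
		c = 0xffff // in UDP a transmitted zero means "no checksum"
	}
	return c
}

// checksumOK verifies a complete segment: the folded sum must be all ones.
func checksumOK(src, dst net.IP, segment []byte) bool {
	return fold(sum16(sum16(0, pseudoHeader(src, dst, len(segment))), segment)) == 0xffff
}

// buildUDP returns the UDP header followed by the payload.
func buildUDP(src, dst net.IP, srcPort, dstPort uint16, payload []byte) []byte {
	seg := make([]byte, 8+len(payload))
	binary.BigEndian.PutUint16(seg[0:2], srcPort)
	binary.BigEndian.PutUint16(seg[2:4], dstPort)
	binary.BigEndian.PutUint16(seg[4:6], uint16(len(seg)))
	copy(seg[8:], payload)
	binary.BigEndian.PutUint16(seg[6:8], udpChecksum(src, dst, seg))
	return seg
}

func main() {
	ip := net.ParseIP("127.0.0.1")
	seg := buildUDP(ip, ip, 40000, 8972, []byte("hello"))
	fmt.Printf("% x valid=%v\n", seg, checksumOK(ip, ip, seg))
}
```

The pseudo-header is the reason the helper above needs the IPv4 layer even though only the UDP layer and the payload are serialized.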

Performance issues

Although the program above runs well, problems appear under high concurrency.

Above, a single goroutine reads the packets. This is a performance bottleneck: in the end, the server can use only one core to process raw-socket packets.

Creating multiple goroutines to read from this PacketConn does not help either, because there is only one PacketConn and it is the bottleneck. Multiple goroutines sometimes even perform worse than a single reader.

So can we call net.ListenPacket("ip4:udp", *addr) several times to create multiple raw sockets and process packets concurrently?

It looks feasible, but in fact each of these raw sockets reads the same UDP packets; the packets are not load-balanced across the sockets. Multiple raw sockets are therefore not only useless but also consume more server resources.

In actual tests the throughput reaches only about 20,000 to 30,000; at higher concurrency, packets are lost.

So is there no way out?

There is. The main performance bottleneck is that the program cannot balance load and use multiple cores to process packets concurrently. The second bottleneck is that the program listens to all UDP packets on the machine and hands them to the user-space program for filtering and processing, although many of them are irrelevant.

We can address both of these performance issues with BPF.

BPF for packet filtering

Classic BPF dates back to the early 1990s, and although everyone now talks about extended BPF (eBPF), classic BPF still has its uses.

You may never have used BPF in your own code, but you have almost certainly come across it in practice.

For example, when you use tcpdump to watch network traffic, you often add a filter expression. The following command captures only traffic on TCP port 8080:


tcpdump -nn -vvvv -i any "tcp port 8080"

What happens is that tcpdump generates a filter for tcp port 8080, the packets are filtered in the kernel, and only the matching packets are handed to tcpdump.

You can view the compiled filter code with the following command:


[root@lab ~]# tcpdump -d "tcp port 8080"
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 8
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 19
(004) ldh      [54]
(005) jeq      #0x1f90          jt 18   jf 6
(006) ldh      [56]
(007) jeq      #0x1f90          jt 18   jf 19
(008) jeq      #0x800           jt 9    jf 19
(009) ldb      [23]
(010) jeq      #0x6             jt 11   jf 19
(011) ldh      [20]
(012) jset     #0x1fff          jt 19   jf 13
(013) ldxb     4*([14]&0xf)
(014) ldh      [x + 14]
(015) jeq      #0x1f90          jt 18   jf 16
(016) ldh      [x + 16]
(017) jeq      #0x1f90          jt 18   jf 19
(018) ret      #262144
(019) ret      #0

What does this mean? BPF defines a small set of instructions that run in a virtual machine inside the kernel to filter packets.
Line 000 loads the two-byte EtherType at offset 12. Line 001 checks whether it is IPv6 (0x86dd): if so, execution jumps to 002, otherwise to 008. Let's focus on IPv4.
Line 008 checks whether the packet is IPv4 (0x800), and if so jumps to 009. Line 009 loads the byte at offset 23, which is the IP protocol field; line 010 checks whether it is TCP (6), and if so jumps to 011.
Lines 011 and 012 load the flags and fragment offset and drop the packet if the fragment offset is non-zero, since only the first fragment carries the TCP header. Line 013 loads the IP header length into the index register x so that the start of the TCP header can be located.
Lines 014 and 016 read the TCP source port and destination port. If either equals 8080 (0x1f90), the filter returns 262144, meaning the packet is accepted and up to 262144 bytes of it are kept; otherwise it returns 0 and the packet is discarded.
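
To make the walkthrough concrete, the IPv4 branch of the listing can be transcribed into ordinary Go. This is only an illustration of what the filter computes, not something the kernel runs. The function names matchTCPPort and ipv4TCPFrame are invented for this sketch, and the IPv6 branch is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// matchTCPPort follows the IPv4 branch of the listing step by step.
// frame is expected to start at the Ethernet header.
func matchTCPPort(frame []byte, port uint16) bool {
	const ethLen = 14 // size of the Ethernet header
	if len(frame) < ethLen+20 {
		return false // too short for Ethernet plus a minimal IPv4 header
	}
	if binary.BigEndian.Uint16(frame[12:14]) != 0x0800 { // 000, 008: is the EtherType IPv4?
		return false
	}
	if frame[23] != 6 { // 009, 010: is the IP protocol TCP?
		return false
	}
	if binary.BigEndian.Uint16(frame[20:22])&0x1fff != 0 { // 011, 012: later fragments are dropped
		return false
	}
	tcp := ethLen + 4*int(frame[14]&0x0f) // 013: skip the IP header, options included
	if len(frame) < tcp+4 {
		return false
	}
	src := binary.BigEndian.Uint16(frame[tcp : tcp+2])   // 014: ldh [x + 14]
	dst := binary.BigEndian.Uint16(frame[tcp+2 : tcp+4]) // 016: ldh [x + 16]
	return src == port || dst == port                    // 015, 017
}

// ipv4TCPFrame builds a mostly zeroed Ethernet/IPv4/TCP frame for the demo.
func ipv4TCPFrame(srcPort, dstPort uint16) []byte {
	f := make([]byte, 54) // 14 (Ethernet) + 20 (IPv4) + 20 (TCP)
	binary.BigEndian.PutUint16(f[12:14], 0x0800)
	f[14] = 0x45 // version 4, header length 5 words
	f[23] = 6    // TCP
	binary.BigEndian.PutUint16(f[34:36], srcPort)
	binary.BigEndian.PutUint16(f[36:38], dstPort)
	return f
}

func main() {
	fmt.Println(matchTCPPort(ipv4TCPFrame(40000, 8080), 8080))
	fmt.Println(matchTCPPort(ipv4TCPFrame(40000, 443), 8080))
}
```

The difference in practice is where the check runs: the BPF program is evaluated in the kernel, so rejected packets never reach user space at all.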

Of course, the code generated by tcpdump is quite strict. When we write a filter by hand and already know that the packets are IPv4 and carry no IP options, the code can be simpler than this. Still, when applying BPF in practice, reusing the code generated by tcpdump is a good way to avoid mistakes.

Use -dd to print the program as a C code fragment and -ddd to print it as decimal numbers. Let's look at the output of -dd, because it can be converted into Go code:


[root@lab ~]# tcpdump -dd "tcp port 8080"
{ 0x28, 0, 0, 0x0000000c },
{ 0x15, 0, 6, 0x000086dd },
{ 0x30, 0, 0, 0x00000014 },
{ 0x15, 0, 15, 0x00000006 },
{ 0x28, 0, 0, 0x00000036 },
{ 0x15, 12, 0, 0x00001f90 },
{ 0x28, 0, 0, 0x00000038 },
{ 0x15, 10, 11, 0x00001f90 },
{ 0x15, 0, 10, 0x00000800 },
{ 0x30, 0, 0, 0x00000017 },
{ 0x15, 0, 8, 0x00000006 },
{ 0x28, 0, 0, 0x00000014 },
{ 0x45, 6, 0, 0x00001fff },
{ 0xb1, 0, 0, 0x0000000e },
{ 0x48, 0, 0, 0x0000000e },
{ 0x15, 2, 0, 0x00001f90 },
{ 0x48, 0, 0, 0x00000010 },
{ 0x15, 0, 1, 0x00001f90 },
{ 0x6, 0, 0, 0x00040000 },
{ 0x6, 0, 0, 0x00000000 },
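
Each row of this output is an opcode, a jump-if-true offset, a jump-if-false offset, and a constant K; the jump offsets are relative to the next instruction. As a sketch of how such a table is executed, here is a tiny interpreter that supports only the seven opcodes appearing in this listing. It is a teaching aid written for this article, not the kernel's implementation, and the names ins, run and frame are invented.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ins mirrors one row of the -dd output.
type ins struct {
	Op     uint16
	Jt, Jf uint8
	K      uint32
}

// tcpPort8080 is the listing above, two rows per line.
var tcpPort8080 = []ins{
	{0x28, 0, 0, 0x0c}, {0x15, 0, 6, 0x86dd},
	{0x30, 0, 0, 0x14}, {0x15, 0, 15, 0x06},
	{0x28, 0, 0, 0x36}, {0x15, 12, 0, 0x1f90},
	{0x28, 0, 0, 0x38}, {0x15, 10, 11, 0x1f90},
	{0x15, 0, 10, 0x0800}, {0x30, 0, 0, 0x17},
	{0x15, 0, 8, 0x06}, {0x28, 0, 0, 0x14},
	{0x45, 6, 0, 0x1fff}, {0xb1, 0, 0, 0x0e},
	{0x48, 0, 0, 0x0e}, {0x15, 2, 0, 0x1f90},
	{0x48, 0, 0, 0x10}, {0x15, 0, 1, 0x1f90},
	{0x06, 0, 0, 0x40000}, {0x06, 0, 0, 0},
}

// load reads size bytes (1 or 2) at off in network byte order.
func load(pkt []byte, off uint32, size int) (uint32, bool) {
	if uint64(off)+uint64(size) > uint64(len(pkt)) {
		return 0, false
	}
	if size == 1 {
		return uint32(pkt[off]), true
	}
	return uint32(binary.BigEndian.Uint16(pkt[off:])), true
}

// run returns the number of bytes to keep; 0 means the packet is dropped.
// Out-of-range loads and unsupported opcodes also drop the packet.
func run(prog []ins, pkt []byte) uint32 {
	var a, x uint32 // accumulator and index register
	for pc := 0; pc < len(prog); pc++ {
		in, ok := prog[pc], true
		switch in.Op {
		case 0x28: // ldh [K]
			a, ok = load(pkt, in.K, 2)
		case 0x30: // ldb [K]
			a, ok = load(pkt, in.K, 1)
		case 0x48: // ldh [x + K]
			a, ok = load(pkt, x+in.K, 2)
		case 0xb1: // ldxb 4*([K]&0xf)
			x, ok = load(pkt, in.K, 1)
			x = 4 * (x & 0x0f)
		case 0x15: // jeq #K
			if a == in.K {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case 0x45: // jset #K
			if a&in.K != 0 {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case 0x06: // ret #K
			return in.K
		default:
			return 0
		}
		if !ok {
			return 0
		}
	}
	return 0
}

// frame builds a mostly zeroed Ethernet/IPv4 frame with the given protocol and ports.
func frame(proto byte, srcPort, dstPort uint16) []byte {
	f := make([]byte, 54)
	binary.BigEndian.PutUint16(f[12:14], 0x0800)
	f[14], f[23] = 0x45, proto
	binary.BigEndian.PutUint16(f[34:36], srcPort)
	binary.BigEndian.PutUint16(f[36:38], dstPort)
	return f
}

func main() {
	fmt.Println(run(tcpPort8080, frame(6, 40000, 8080)))
	fmt.Println(run(tcpPort8080, frame(6, 40000, 443)))
}
```

Because the loop increments pc after every instruction, adding Jt or Jf to pc lands on the instruction at pc + 1 + offset, which is how the absolute targets shown by -d (for example jt 2 jf 8 on line 001) map to the relative values 0 and 6 shown by -dd.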

In fact, x/net/bpf provides helpers that make it easier to write, serialize, and deserialize BPF programs. For example, a filter that keeps only packets whose destination port equals 8972 can be written as follows (for simplicity, we consider only IPv4 packets without IP options):


type Filter []bpf.Instruction

var filter = Filter{
	bpf.LoadAbsolute{Off: 22, Size: 2}, // load the destination port into the register
	bpf.JumpIf{Cond: bpf.JumpEqual, Val: 8972, SkipFalse: 1}, // if it equals 8972, run the next instruction; otherwise skip it
	bpf.RetConstant{Val: 0xffff}, // keep up to 0xffff bytes of this packet
	bpf.RetConstant{Val: 0x0}, // keep zero bytes, i.e. ignore this packet
}
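
The fixed offset 22 in this filter is the 20-byte IPv4 header without options plus 2 bytes into the UDP header, where the destination port sits. The sketch below spells out that arithmetic in plain Go and shows how the offset shifts when IP options are present. The names dstPortOffset and accepts are hypothetical, and the packet is assumed to start at the IPv4 header.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dstPortOffset returns the offset of the UDP destination port in a packet
// that starts at the IPv4 header, taking the header length (IHL) into account.
func dstPortOffset(pkt []byte) (int, bool) {
	if len(pkt) < 20 || pkt[0]>>4 != 4 {
		return 0, false // not a plausible IPv4 packet
	}
	ihl := 4 * int(pkt[0]&0x0f) // header length in bytes
	if ihl < 20 || len(pkt) < ihl+4 {
		return 0, false
	}
	return ihl + 2, true // the UDP destination port follows the 2-byte source port
}

// accepts mimics the decision of the filter for an arbitrary header length.
func accepts(pkt []byte, port uint16) bool {
	off, ok := dstPortOffset(pkt)
	return ok && binary.BigEndian.Uint16(pkt[off:off+2]) == port
}

func main() {
	plain := make([]byte, 28)
	plain[0] = 0x45 // version 4, 20-byte header
	binary.BigEndian.PutUint16(plain[22:24], 8972)

	withOptions := make([]byte, 32)
	withOptions[0] = 0x46 // version 4, 24-byte header (4 bytes of options)
	binary.BigEndian.PutUint16(withOptions[26:28], 8972)

	fmt.Println(dstPortOffset(plain))
	fmt.Println(dstPortOffset(withOptions))
	fmt.Println(accepts(plain, 8972), accepts(withOptions, 8972))
}
```

A filter that must cope with IP options would compute the header length inside BPF instead, in the same way lines 013 and 014 of the tcpdump listing do; in x/net/bpf the corresponding instructions are bpf.LoadMemShift and bpf.LoadIndirect.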

We can write a small program that converts the code generated by tcpdump -dd into bpf.RawInstruction values:


// Requires the strconv and strings packages and golang.org/x/net/bpf.

// parse converts the output of tcpdump -dd into raw BPF instructions.
func parse(data string) (raws []bpf.RawInstruction) {
	lines := strings.Split(data, "\n")
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Strip the surrounding "{", "}" and the trailing comma.
		line = strings.TrimPrefix(line, "{")
		line = strings.TrimSuffix(line, ",")
		line = strings.TrimSuffix(line, "}")
		items := strings.Split(line, ",")
		if len(items) != 4 {
			continue // not an instruction row, e.g. the shell prompt line
		}
		raw := bpf.RawInstruction{
			Op: uint16(numToInteger(items[0])),
			Jt: uint8(numToInteger(items[1])),
			Jf: uint8(numToInteger(items[2])),
			K:  uint32(numToInteger(items[3])),
		}
		raws = append(raws, raw)
	}
	return raws
}

// numToInteger parses a decimal or 0x-prefixed hexadecimal number.
// Parse errors are ignored and yield 0.
func numToInteger(s string) int {
	s = strings.TrimSpace(s)
	if strings.HasPrefix(s, "0x") {
		result, _ := strconv.ParseInt(strings.TrimPrefix(s, "0x"), 16, 64)
		return int(result)
	}
	result, _ := strconv.ParseInt(s, 10, 64)
	return int(result)
}

With all of that in place, the background covered, and the performance bottlenecks of the current raw-socket program explained, how do we remove those bottlenecks?

For the first bottleneck, we can start multiple goroutines, each reading from its own connection with its own filter and therefore responsible for only a subset of the packets, which gives us load balancing. For example, we can filter by client IP; or, if the server listens on 1000 ports, each goroutine can be responsible for only some of the ports; we can also filter by the client's source port, and so on. In short, with BPF filtering each goroutine handles only part of the packets, so the work is spread across multiple cores.
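
As one possible way to divide the work by port, the sketch below splits a block of consecutive ports into contiguous ranges, one per worker. The numbers (1000 ports starting at 8972, 8 workers) and the names portRange and splitPorts are assumptions for illustration; the article does not prescribe a particular partitioning scheme.

```go
package main

import "fmt"

// portRange is an inclusive range of UDP destination ports owned by one worker.
type portRange struct{ Lo, Hi uint16 }

// splitPorts divides count consecutive ports starting at base into at most
// workers contiguous ranges whose sizes differ by no more than one.
// The caller must ensure that base+count-1 does not exceed 65535.
func splitPorts(base uint16, count, workers int) []portRange {
	if workers > count {
		workers = count
	}
	if count <= 0 || workers <= 0 {
		return nil
	}
	ranges := make([]portRange, 0, workers)
	lo := int(base)
	for i := 0; i < workers; i++ {
		size := count / workers
		if i < count%workers {
			size++ // spread the remainder over the first ranges
		}
		ranges = append(ranges, portRange{uint16(lo), uint16(lo + size - 1)})
		lo += size
	}
	return ranges
}

func main() {
	for i, r := range splitPorts(8972, 1000, 8) {
		fmt.Printf("worker %d: ports %d-%d\n", i, r.Lo, r.Hi)
	}
}
```

Each worker would then open its own connection with net.ListenPacket and attach a filter that accepts only destination ports inside its range, for instance with two bpf.JumpIf comparisons (bpf.JumpLessThan and bpf.JumpGreaterThan) in place of the single bpf.JumpEqual.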

The second bottleneck is solved along with the first. Because the BPF filter accepts only our specific ports, UDP packets for other ports are not copied from kernel space to user space, which cuts the processing of irrelevant packets.

There are several ways to attach a BPF filter to a standard-library PacketConn, such as setting the SO_ATTACH_FILTER socket option directly. However, golang.org/x/net/ipv4 provides a SetBPF method: we can wrap the standard-library PacketConn in an ipv4.PacketConn and set the filter on it.

For example, the server program above can be modified to filter in kernel space with BPF:


package main

import (
	"flag"
	"log"
	"net"

	"github.com/google/gopacket"
	"github.com/google/gopacket/layers"
	"github.com/smallnest/go-network-programming/codec"
	"golang.org/x/net/bpf"
	"golang.org/x/net/ipv4"
)

var (
	addr = flag.String("s", "localhost", "server address")
	port = flag.Int("p", 8972, "port")
)

var (
	stat         = make(map[string]int)
	lastStatTime = int64(0)
)

type Filter []bpf.Instruction

func main() {
	flag.Parse()

	conn, err := net.ListenPacket("ip4:udp", *addr)
	if err != nil {
		panic(err)
	}

	cc := conn.(*net.IPConn)
	cc.SetReadBuffer(20 * 1024 * 1024)
	cc.SetWriteBuffer(20 * 1024 * 1024)

	// Build the filter after flag.Parse so that the -p flag takes effect.
	// A package-level initializer would always see the default port.
	filter := Filter{
		bpf.LoadAbsolute{Off: 22, Size: 2}, // load the destination port
		bpf.JumpIf{Cond: bpf.JumpEqual, Val: uint32(*port), SkipFalse: 1}, // if it is not *port, skip the next instruction
		bpf.RetConstant{Val: 0xffff}, // keep up to 0xffff bytes of the packet
		bpf.RetConstant{Val: 0x0}, // keep 0 bytes, effectively ignoring the packet
	}
	assembled, err := bpf.Assemble(filter)
	if err != nil {
		log.Print(err)
		return
	}

	pconn := ipv4.NewPacketConn(conn)
	if err := pconn.SetBPF(assembled); err != nil {
		log.Print(err)
		return
	}

	handleConn(conn)
}

func handleConn(conn net.PacketConn) {
	for {
		buffer := make([]byte, 1024)
		n, remoteaddr, err := conn.ReadFrom(buffer)
		if err != nil {
			log.Fatal(err)
		}
		buffer = buffer[:n]

		packet := gopacket.NewPacket(buffer, layers.LayerTypeUDP, gopacket.NoCopy)

		// Get the UDP layer from this packet.
		if udpLayer := packet.Layer(layers.LayerTypeUDP); udpLayer != nil {
			udp, _ := udpLayer.(*layers.UDP)
			if app := packet.ApplicationLayer(); app != nil {
				// Swap the ports and echo the payload back to the sender.
				data, err := codec.EncodeUDPPacket(net.ParseIP("127.0.0.1"), net.ParseIP("127.0.0.1"), uint16(udp.DstPort), uint16(udp.SrcPort), app.Payload())
				if err != nil {
					log.Printf("failed to EncodePacket: %v", err)
					return
				}
				if _, err := conn.WriteTo(data, remoteaddr); err != nil {
					log.Printf("failed to write packet: %v", err)
					conn.Close()
					return
				}
			}
		}
	}
}
