2022 Go Ecosystem RPC Framework Benchmark

Original link: https://colobu.com/2022/07/31/2022-rpc-frameworks-benchmarks/

A whole year has passed since the last benchmark of Go ecosystem RPC frameworks in 2021. Over the past year the various RPC frameworks have kept making progress, and rpcx has also made some useful optimizations and simplifications with the support of many community members, so it is time for another performance comparison of several frameworks commonly used in China.

Every time performance results are published, they attract a lot of attention and, inevitably, some controversy. This is normal. A benchmark is not a verdict on the overall quality of a framework, and no single benchmark can cover every application scenario. Different business shapes, huge differences in connection counts, message formats, message sizes, and deployment environments may make one framework do well and another do poorly in a given scenario. What we can discuss and focus on is where performance can still be improved; an optimization point in one framework may offer optimization hints to others.

In addition, when a microservice framework is used in practice, most performance bottlenecks lie in the business code rather than in the framework itself, so focusing on optimizing the business code is just as important.

This round of testing covers five common RPC frameworks:

  • rpcx: one of the earliest microservice frameworks in the Go ecosystem, used by Sina, TAL, and other companies
  • kitex: a microservice framework from ByteDance
  • arpc: an excellent RPC framework by lesismal
  • grpc: an open-source RPC framework initiated by Google; cross-language and widely used, it transports over HTTP/2 and uses Protocol Buffers as the interface description language
  • rpc/std_rpc of the standard library: the RPC framework that ships with the Go standard library, now in maintenance mode

Tested with the latest version of each framework:

  • rpcx: v1.7.8
  • kitex: v0.3.4
  • arpc: v1.2.9
  • grpc: v1.48.0
  • std_rpc: v1.8.4

To keep the tests as consistent as possible, all of them were run in the same environment, with the same test logic and the same test message:

  • Test environment (two old machines that have been in service for six years, one as the server and one as the client)
    • CPU model: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz × 2
    • Physical cores: 6; logical cores: 12
    • Memory: 64 GB
    • NIC multi-queue enabled
  • All frameworks use protobuf to encode the message body; the encoded message body is 581 bytes.
  • Statistics are collected on the client side: throughput and latency (maximum, mean, median, and p99.9; the minimum is below 1 millisecond and is not reported)
    • For latency we also care about the long tail, so we watch the median and the p99.9. A p99.9 value of t means that 999 out of every 1000 requests have a latency below t (a minimal sketch of computing these values follows this list).
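
For illustration, here is a minimal Go sketch of how the median and p99.9 can be computed from per-request latencies recorded on the client; this is a simplified illustration, not the benchmark's actual reporting code:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the latency below which the given fraction of
// requests completed; latencies must be sorted in ascending order.
func percentile(latencies []time.Duration, p float64) time.Duration {
	idx := int(float64(len(latencies)-1) * p)
	return latencies[idx]
}

func main() {
	// Hypothetical per-request latencies recorded by the client.
	latencies := []time.Duration{
		2 * time.Millisecond, 3 * time.Millisecond, 1 * time.Millisecond,
		40 * time.Millisecond, 2 * time.Millisecond,
	}
	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })

	fmt.Println("median:", percentile(latencies, 0.5))
	fmt.Println("p99.9: ", percentile(latencies, 0.999))
}
```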

The testing is divided into three main scenarios:

  • 10 persistent TCP connections, with concurrency of 100, 200, 500, 1000, 2000, and 5000. This mainly tests the maximum capability of each framework under high concurrency, in terms of both throughput and latency.
  • 10 persistent TCP connections, with concurrency fixed at 200, measuring latency at throughputs of 100,000/second, 150,000/second, and 180,000/second. This mainly measures each framework's latency and long tail when throughput is held equal; a large latency or a long tail at a given throughput means the framework is not suitable for running at that rate.
  • 1000 persistent TCP connections, with concurrency of 1000.
    This mainly tests each framework's performance when the number of connections is large. The main purpose is to look at kitex's self-developed netpoll from ByteDance and see whether it is better suited to such a large number of connections.

Concurrency here refers to the number of goroutines started by the client; multiple goroutines may share the same client (connection), as sketched below.
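
The following is a minimal Go sketch of that model, using the standard library's net/rpc only to keep it self-contained (the benchmark itself uses each framework's own client and a protobuf payload); the address and the Hello.Say service method are hypothetical placeholders:

```go
package main

import (
	"log"
	"net/rpc"
	"sync"
)

func main() {
	const (
		addr        = "127.0.0.1:8972" // hypothetical server address
		connections = 10               // scenario 1: 10 long connections
		concurrency = 100              // client goroutines issuing requests
		perG        = 1000             // requests per goroutine
	)

	// Establish the shared pool of persistent connections.
	clients := make([]*rpc.Client, connections)
	for i := range clients {
		c, err := rpc.Dial("tcp", addr)
		if err != nil {
			log.Fatal(err)
		}
		clients[i] = c
	}

	var wg sync.WaitGroup
	for g := 0; g < concurrency; g++ {
		wg.Add(1)
		go func(g int) {
			defer wg.Done()
			client := clients[g%connections] // several goroutines share one connection
			for i := 0; i < perG; i++ {
				var reply string
				// "Hello.Say" is a hypothetical service method; *rpc.Client
				// is safe for concurrent use by multiple goroutines.
				if err := client.Call("Hello.Say", "ping", &reply); err != nil {
					log.Println(err)
				}
			}
		}(g)
	}
	wg.Wait()
}
```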

All the test code for each framework is in rpcx-benchmark. The test commands are very simple, as shown below, so you can download the code and verify the results yourself; optimization patches, additional RPC frameworks, and so on are welcome.

Scenario 1: 10 persistent TCP connections, with concurrency of 100, 200, 500, 1000, 2000, and 5000

  • Start the server: ./server -s xxx.xxx.xxx.xxx:8972 // xxx.xxx.xxx.xxx is the address the server listens on
  • Run the client: ./client -c 2000 -n 1000000 -s xxx.xxx.xxx.xxx:8972 // concurrency of 2000, sending 1,000,000 requests in total (a sketch of such an entry point follows)
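
For reference, here is a hypothetical sketch of a client entry point with the flags documented above, using the standard flag package; it is not the actual rpcx-benchmark code:

```go
package main

import (
	"flag"
	"fmt"
)

func main() {
	var (
		addr        = flag.String("s", "127.0.0.1:8972", "server address")
		concurrency = flag.Int("c", 1, "number of concurrent client goroutines")
		total       = flag.Int("n", 1, "total number of requests to send")
	)
	flag.Parse()

	// Each goroutine sends its share of the total requests.
	perGoroutine := *total / *concurrency
	fmt.Printf("sending %d requests to %s with %d goroutines (%d each)\n",
		*total, *addr, *concurrency, perGoroutine)
}
```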

Raw test data:

Throughput


The Y axis is throughput per second; higher is better. The arpc framework's throughput stands out, as it did in last year's test, followed by rpcx, the standard library rpc, kitex, and grpc.

At higher concurrency levels, every framework except grpc can reach a throughput of 180,000 requests/second.

Average latency


The Y axis is average latency; lower is better. You can see that grpc's latency is higher. The latency ranking is the inverse of the throughput ranking above.

P99.9 latency

For the long tail, which we may care about even more, we can look at the median or the p99.9. Below is the p99.9 case.

Of course, the comparison at a concurrency of 5000 is not very meaningful, because at that level each framework runs at a different throughput, and latency is strongly tied to the throughput a framework is driven at.
So, in order to observe the long tail under fair conditions, I designed the following test.

Scenario 2: 10 persistent TCP connections, concurrency of 200, latency at throughputs of 100,000/second, 150,000/second, and 180,000/second

To compare latency, we need to keep throughput as equal as possible; it is more scientific to observe latency and its long tail at the same throughput.
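
As a rough sanity check (my own addition, not part of the original benchmark), Little's law ties these quantities together: with $L$ requests in flight (the concurrency) and a throughput of $\lambda$ requests/second, the average latency $W$ is approximately

$$ W = \frac{L}{\lambda} $$

so at a concurrency of 200 and a throughput of 100,000/second, the average latency should be on the order of $200 / 100{,}000 = 2\,\text{ms}$; a framework reporting much more than that at the same throughput is losing time to queuing or scheduling.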

The second scenario therefore fixes the concurrency at 200 and observes each framework's average latency and p99.9 latency at throughputs of 100,000/second, 150,000/second, and 180,000/second.

Raw test data:

Average latency


grpc's average latency looks relatively high, and because it cannot reach a throughput of 180,000/second, its last data point is missing. kitex is also relatively high; whether some specific tuning would help it remains to be seen. Overall, latency is negatively correlated with the throughput results above.

P99.9 latency

For the long tail, the p99.9 case is shown below.

The p99.9 latency of arpc, rpcx, and the standard library rpc is still quite good.

Scenario 3: 1000 persistent TCP connections, concurrency of 1000

This scenario observes how each framework performs (in throughput and latency) under a large number of connections. Rather than an enormous connection count, it uses the more common figure of 1000 connections, all of them persistent.

arpc is clearly better than the other frameworks when the number of connections is large, followed by rpcx and the standard library rpc.

The average latency and p99.9 latency are also in line with this; refer to the raw data for details.

Overall, rpcx's performance is still good, and I hope to draw on its fellow RPC frameworks for further optimization in the future.
You are welcome to discuss in the comment section of this article.
