Some recent rpcx optimizations and some optimization attempts

Original link: https://colobu.com/2022/08/25/some-small-optimizations-of-rpcx/

Recently, ahead of the 2022 Go ecosystem RPC framework benchmark, I spent a week optimizing rpcx. This article records several of the important optimization points for your reference.

Add a handler method to avoid reflection on the server

The previous rpcx implementation followed the registration approach of the standard library's net/rpc. A service is registered like this:


rpcxserver.RegisterName("Hello", new(Hello), "")

Internally it walks the rcvr with reflection, discovers its service methods and parameter types, and caches them:


func (s *Server) register(rcvr interface{}, name string) (string, error) {
	s.serviceMapMu.Lock()
	defer s.serviceMapMu.Unlock()

	service := new(service)
	service.typ = reflect.TypeOf(rcvr)
	service.rcvr = reflect.ValueOf(rcvr)
	sname := reflect.Indirect(service.rcvr).Type().Name() // Type
	service.name = sname

	// Install the methods
	service.method = suitableMethods(service.typ, true)
	if len(service.method) == 0 {
		// error handling elided: report that rcvr has no suitable exported methods
	}

	s.serviceMap[service.name] = service
	return sname, nil
}

Then, when a request is processed, the corresponding types are looked up by the called service and method name, reflect is used to create values for the request and reply types, and the method is invoked through reflect's Func.Call. Although object pooling is used along the way, reflection is still used heavily on the internal path, and performance suffers considerably.
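
To illustrate the cost, here is a minimal, self-contained sketch of reflection-based dispatch in the style of the standard library. The Hello/Args/Reply types and the callByReflection helper are made up for this example and are not rpcx code:

package main

import (
	"fmt"
	"reflect"
)

// Hello, Args and Reply are made-up types for this illustration only.
type Hello struct{}

type Args struct{ Name string }
type Reply struct{ Greeting string }

func (h *Hello) Say(args *Args, reply *Reply) error {
	reply.Greeting = "hello " + args.Name
	return nil
}

// callByReflection mimics the net/rpc style of dispatch: look up the method
// by name, wrap the arguments in reflect.Value, and invoke it with Call.
func callByReflection(rcvr interface{}, methodName string, args, reply interface{}) error {
	m := reflect.ValueOf(rcvr).MethodByName(methodName)
	out := m.Call([]reflect.Value{reflect.ValueOf(args), reflect.ValueOf(reply)})
	if errInter := out[0].Interface(); errInter != nil {
		return errInter.(error)
	}
	return nil
}

func main() {
	reply := &Reply{}
	if err := callByReflection(&Hello{}, "Say", &Args{Name: "rpcx"}, reply); err != nil {
		panic(err)
	}
	fmt.Println(reply.Greeting) // hello rpcx
}

Every call goes through MethodByName, reflect.Value wrapping and Call, which is exactly the overhead the new handler style avoids.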

The implementation of the Go standard library's http router inspired me, and the adoption and excellent performance of lesismal/arpc prompted me to add a more efficient way of handling services to rpcx, configured just like an http handler:


func hello(ctx *server.Context) error {
	msg := &proto.BenchmarkMessage{}
	err := ctx.Bind(msg)
	if err != nil {
		return err
	}

	msg.Field1 = "OK"
	msg.Field2 = 100

	if *delay > 0 {
		time.Sleep(*delay)
	} else {
		runtime.Gosched()
	}

	return ctx.Write(msg)
}

var (
	host  = flag.String("s", "127.0.0.1:8972", "listened ip and port")
	delay = flag.Duration("delay", 0, "delay to mock business processing by sleep")
)

func main() {
	flag.Parse()

	rpcxserver := server.NewServer()
	rpcxserver.AddHandler("Hello", "Say", hello)
	rpcxserver.Serve("tcp", *host)
}

The service method signature is func Xxx(ctx *server.Context) error. A specially crafted Context is passed in; through it you can get the request parameters and also write back the response.

Decode the request parameters with ctx.Bind:


msg := &proto.BenchmarkMessage{}
err := ctx.Bind(msg)
if err != nil {
	return err
}

Write the response back with ctx.Write(msg).

No reflection is used here. It will be even more efficient if your codec also avoids reflection.
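
To make the idea concrete, here is a simplified, self-contained sketch of this style of dispatch. The Context, Handler, AddHandler and the JSON codec below are stand-ins invented for the example, not rpcx's actual types; the point is that routing becomes a map lookup plus a direct function call, with no reflection on the hot path:

package main

import (
	"encoding/json"
	"fmt"
)

// Context is a stand-in for rpcx's server.Context in this sketch.
type Context struct {
	payload  []byte // raw request body
	response []byte // encoded reply
}

// Bind decodes the request payload into v (JSON here for simplicity).
func (c *Context) Bind(v interface{}) error { return json.Unmarshal(c.payload, v) }

// Write encodes v as the response.
func (c *Context) Write(v interface{}) error {
	b, err := json.Marshal(v)
	c.response = b
	return err
}

type Handler func(ctx *Context) error

// handlers maps "Service.Method" to a concrete function.
var handlers = map[string]Handler{}

func AddHandler(service, method string, h Handler) { handlers[service+"."+method] = h }

type Req struct{ Name string }
type Resp struct{ Greeting string }

func main() {
	AddHandler("Hello", "Say", func(ctx *Context) error {
		req := &Req{}
		if err := ctx.Bind(req); err != nil {
			return err
		}
		return ctx.Write(&Resp{Greeting: "hello " + req.Name})
	})

	ctx := &Context{payload: []byte(`{"Name":"rpcx"}`)}
	if err := handlers["Hello.Say"](ctx); err != nil {
		panic(err)
	}
	fmt.Println(string(ctx.response)) // {"Greeting":"hello rpcx"}
}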

Use a goroutine pool

By default, rpcx uses at least one goroutine for each request, so under high concurrency there can be an enormous number of goroutines on the server side. Although Go handles tens of thousands of goroutines, when there are that many, (memory) resource usage is high and garbage collection is also affected, so in high-concurrency scenarios a goroutine pool can improve performance to a certain extent.

Speaking of pools, there are already no fewer than ten goroutine pool (worker pool) implementations available. rpcx uses alitto/pond, chosen not so much for performance as for convenience of use.
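
For reference, a minimal standalone example of alitto/pond itself (this assumes the pond v1 API with pond.New(workers, capacity), Submit and StopAndWait; rpcx wraps the pool behind its own option, shown next):

package main

import (
	"fmt"
	"sync/atomic"

	"github.com/alitto/pond"
)

func main() {
	// 100 workers, up to 1,000,000 buffered tasks; once the buffer is full,
	// Submit blocks until a slot frees up.
	pool := pond.New(100, 1000000)

	var done int64
	for i := 0; i < 1000; i++ {
		pool.Submit(func() {
			atomic.AddInt64(&done, 1)
		})
	}

	// Wait for all submitted tasks to finish and release the workers.
	pool.StopAndWait()
	fmt.Println("processed:", atomic.LoadInt64(&done))
}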

If you want to use the goroutine pool, enable it with server.WithPool(100, 1000000):


rpcxserver := server.NewServer(server.WithPool(100, 1000000))

The first parameter is the number of goroutines (workers), and the second, capacity, is the maximum number of pending requests. Once capacity is exceeded, new requests block.

After a request is parsed, it is submitted to the goroutine pool:


if s.pool != nil {
	s.pool.Submit(func() {
		s.processOneRequest(ctx, req, conn, writeCh)
	})
} else {
	go s.processOneRequest(ctx, req, conn, writeCh)
}

Adjust process priority

Sometimes your program runs alongside other programs on the same machine. Even if there are no other business programs, there may be applications that ship with the system. If we give the rpcx process a higher priority, it has a better chance of being scheduled by Linux, which brings some additional gains.

Each process in Linux has a nice value between -20 and 19; the lower the value, the better its chance of being scheduled. By default, a process's nice value is 0.

A process's nice value can be changed with the nice and renice commands to adjust its scheduling order.

nice [-n nice_value] program starts a program with the specified priority.

renice changes the nice value of a process that is already running, thereby adjusting its priority: renice [priority] PID.

Of course, you can also set the process priority dynamically from within the program: syscall.Setpriority(syscall.PRIO_PROCESS, 0, -20) sets the current process's priority to -20.
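
For example, a minimal sketch of doing this at startup (Linux; raising priority to a negative nice value normally requires root or CAP_SYS_NICE, so handle the error rather than ignore it):

package main

import (
	"log"
	"syscall"
)

func main() {
	// Set this process's nice value to -20 (highest priority).
	// Requires root or CAP_SYS_NICE; fall back gracefully if not permitted.
	if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, -20); err != nil {
		log.Printf("failed to raise process priority: %v", err)
	}

	// ... start the rpcx server as usual ...
}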

This brings a 10% to 20% performance improvement.

I also wanted to raise the priority of only the listener.Accept goroutine, so that it would get more chances to run, but there is no system call that can target a single goroutine. I tried runtime.LockOSThread() to pin the listener.Accept goroutine to its own thread, but it did not help.
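
For reference, the attempt looked roughly like this (a hypothetical reconstruction, not rpcx code; the accept loop is pinned to its own OS thread, which in practice brought no improvement):

package main

import (
	"log"
	"net"
	"runtime"
)

func handleConn(conn net.Conn) {
	defer conn.Close()
	// ... read requests and write responses ...
}

func main() {
	listener, err := net.Listen("tcp", "127.0.0.1:8972")
	if err != nil {
		log.Fatal(err)
	}

	// Pin the accept loop to its own OS thread in the hope that it gets
	// scheduled more predictably. (In practice this made no difference.)
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Println("accept:", err)
			return
		}
		go handleConn(conn)
	}
}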

Further optimization ideas

Of course, there are other optimization ideas, but they have not been verified yet.

For example, Redis processes requests in a single thread (for the main logic), and nginx uses multiple workers that each handle connections independently, which reduces lock contention between concurrent requests.
rpcx could be switched to a similar worker mode, starting multiple workers that each listen for and process requests on their own, which might also improve performance. However, this has not been tested further.
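
A rough sketch of what such a multi-worker mode might look like (purely illustrative, not rpcx code; each worker runs its own accept loop on a shared listener, so workers handle their own connections without a shared dispatch queue):

package main

import (
	"log"
	"net"
	"runtime"
)

// worker runs its own accept loop and handles its connections independently,
// so workers do not contend on a shared dispatch queue.
func worker(id int, ln net.Listener) {
	for {
		conn, err := ln.Accept() // net.Listener.Accept is safe for concurrent use
		if err != nil {
			log.Printf("worker %d accept: %v", id, err)
			return
		}
		go func(c net.Conn) {
			defer c.Close()
			// ... decode requests, invoke handlers, write responses ...
		}(conn)
	}
}

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:8972")
	if err != nil {
		log.Fatal(err)
	}

	// One worker per CPU, nginx-style.
	for i := 0; i < runtime.NumCPU(); i++ {
		go worker(i, ln)
	}

	select {} // block forever
}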
