Run a Lua virtual machine serially with multiple threads

Original link: https://blog.codingnow.com/2022/09/multithread_lua_vm.html

ltask is a library similar in function to skynet, and I mainly use it in client environments. Because it was designed after skynet, I have used it to try out many new ideas.

A requirement came up recently: third-party C/C++ modules embedded in our game engine expose some of their interfaces as callbacks. When such a module is embedded in an ltask service, it is hard for these callback functions to use all of ltask’s features. For example, all of our IO operations live in an independent service, and when the engine reads a file, the data is likely to be loaded asynchronously, possibly from a remote machine over the network. Third-party modules usually do not take asynchronous IO into account; they simply ask the user to fill in a callback that reads files synchronously.

So how can we suspend the current task inside this C callback and wait for the asynchronous IO to complete?

At first sight this looks impossible. The framework is not designed to support asynchronous mechanisms, and it is hard for us to yield from inside the C function to suspend the current task. As a result, the service cannot respond to external messages until the callback returns, and it cannot use ltask’s scheduler to carry out the asynchronous IO.

The Lua virtual machine is not designed to be thread-safe (and does not need to be), so we cannot put the runtime environment of this C module and its Lua bindings in a separate thread. A blocking IO request inside the callback therefore blocks the entire service.

A relatively simple solution is to open an additional channel to the IO service that bypasses ltask’s internal messages. The C callback then communicates with the IO service through this additional channel.
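
As a rough illustration of this side-channel idea, the Python sketch below models the IO service as a thread with its own private request queue. The names io_service and read_file_callback and the reply format are hypothetical and are not part of ltask; the sketch only shows that the callback stays synchronous from the module's point of view.

```python
import queue
import threading

def io_service(requests):
    """Hypothetical IO service: answers requests arriving on a private queue."""
    while True:
        request = requests.get()
        if request is None:                 # sentinel: shut the service down
            break
        path, reply = request
        reply.put("contents of " + path)    # stand-in for a real asynchronous load

def read_file_callback(requests, path):
    """Synchronous callback, in the shape a third-party module would expect."""
    reply = queue.Queue(maxsize=1)
    requests.put((path, reply))
    return reply.get()                      # blocks the calling thread until answered

def side_channel_demo():
    requests = queue.Queue()
    service = threading.Thread(target=io_service, args=(requests,))
    service.start()
    data = read_file_callback(requests, "scene.bin")
    requests.put(None)
    service.join()
    return data
```

In this sketch the thread that calls read_file_callback does nothing else until the reply arrives, so the service that owns the callback would still be blocked for the duration of the read.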


But after thinking the problem through carefully, I found that the entire service does not necessarily have to be blocked while the C callback has not returned. In fact, with a little support from the ltask framework, we can still keep the service working.

The current task scheduling model is this: the scheduler assigns a worker thread to run one small task of a service (usually using lua resume to run a fragment of a coroutine in the Lua VM). During this period the service is in a busy state, and other worker threads cannot acquire it. This ensures that an individual service is never re-entered.

When this task finishes and the service’s message queue is empty, the scheduler sets the service to an idle state. Once a new message is delivered to the service’s message queue, the idle service returns to the scheduler.
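
The busy and idle states described above can be sketched as a small state machine. This is a simplified model under my own assumptions, not ltask's actual code: the class names, the extra "ready" state, and the method names are all hypothetical.

```python
import threading
from collections import deque

IDLE, READY, BUSY = "idle", "ready", "busy"   # hypothetical state names

class Service:
    def __init__(self, name):
        self.name = name
        self.state = IDLE
        self.queue = deque()      # pending messages for this service

class Scheduler:
    def __init__(self):
        self.lock = threading.Lock()
        self.ready = deque()      # services waiting for a worker thread

    def send(self, service, message):
        """Deliver a message; an idle service goes back to the scheduler."""
        with self.lock:
            service.queue.append(message)
            if service.state == IDLE:
                service.state = READY
                self.ready.append(service)

    def acquire(self):
        """Called by a worker thread: take one ready service and mark it busy."""
        with self.lock:
            if not self.ready:
                return None
            service = self.ready.popleft()
            service.state = BUSY
            return service

    def release(self, service):
        """Called when the worker finishes one small task of the service."""
        with self.lock:
            if service.queue:
                service.state = READY
                self.ready.append(service)
            else:
                service.state = IDLE
```

Because a busy service is never in the ready list, a second worker calling acquire cannot obtain it, which is the non-reentrancy guarantee in this model.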

Suppose we give the service a new state: suspended inside a C function. Then we can in fact still let the service re-enter the scheduler. From the point of view of ltask’s scheduler, an entire worker thread is frozen, but the Lua virtual machine in the service is merely stalled (because it is stopped on the C side). As long as that suspended call does not return to the Lua side, the Lua virtual machine can still do work. All the scheduler has to do is clear the busy state of the service, so that other worker threads can schedule it normally.

But the framework still needs to do a little more. Because the working coroutine of the Lua VM is also occupied, we cannot run new code in that coroutine. So at this point a new coroutine should be created in the Lua VM to replace the service’s working coroutine. The C callback then waits on an operating system condition variable. When the business layer decides that the work is finished and it is time to return to the C callback that was suspended at the beginning, it restores the environment, signals the condition variable, and hands the execution flow back.
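
The handoff described above can be modeled with two Python threads and one condition variable. This is only a sketch under stated assumptions: Python threads stand in for worker threads, the busy flag stands in for the scheduler's busy state, and every name here (Service, acquire, release, read_file_callback, thread_a, thread_b) is hypothetical rather than taken from ltask.

```python
import threading

class Service:
    """Stand-in for one service and its Lua VM."""
    def __init__(self):
        self.cond = threading.Condition()
        self.busy = False       # True while some OS thread runs inside the VM
        self.pending = None     # file the suspended callback is waiting for
        self.result = None      # filled in by the replacement coroutine
        self.log = []           # only touched by the thread that holds `busy`

def acquire(service, ready=lambda: True):
    """Wait until the service is free and `ready()` holds, then mark it busy."""
    with service.cond:
        while service.busy or not ready():
            service.cond.wait()
        service.busy = True

def release(service):
    with service.cond:
        service.busy = False
        service.cond.notify_all()

def read_file_callback(service, path):
    """Runs on thread A, inside a call stack that cannot yield."""
    service.log.append("A: callback needs " + path)
    service.pending = path
    release(service)                                       # clear the busy state
    acquire(service, lambda: service.result is not None)   # park until the IO is done
    return service.result

def thread_a(service):
    acquire(service)
    service.log.append("A: working coroutine starts")
    data = read_file_callback(service, "scene.bin")
    service.log.append("A: callback returned " + data)
    release(service)

def thread_b(service):
    # Another worker picks the service up and runs a fresh coroutine in the same VM.
    acquire(service, lambda: service.pending is not None)
    service.log.append("B: replacement coroutine finishes the IO")
    service.result = "contents of " + service.pending
    release(service)

def handoff_demo():
    service = Service()
    workers = [threading.Thread(target=f, args=(service,)) for f in (thread_b, thread_a)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return service.log
```

Two OS threads are alive at the same time, but at most one of them holds the busy flag at any moment, so in this model the log is always written serially and in the same order.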

Throughout this process, the Lua VM in the service never runs in parallel, so it is safe. What is a little odd is that the Lua VM really is working in a multi-threaded environment: two different operating system threads each run a different Lua coroutine. This guarantees that the coroutine suspended on the C side can always keep its own C call stack.

I call this way of running the Lua VM multithreaded serial operation. Note that some constraints that used to hold in Lua no longer hold. For example, in a single-threaded environment, if you run a Lua function that does not modify some global state and does not yield, you can assume the global state will not change. Now, however, the function may be suspended on the C side while another OS thread modifies that state from another Lua coroutine.
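
A small Python model of this broken assumption: the function below never yields and never writes to the shared state, yet it observes two different values. The names are hypothetical, a dict stands in for Lua global state, and two events stand in for the suspend and resume steps.

```python
import threading

def stale_state_demo():
    state = {"mode": "loading"}        # stand-in for a Lua global
    suspended = threading.Event()      # set when the C call parks its thread
    resumed = threading.Event()        # set when the C call may return

    def blocking_c_call():
        suspended.set()
        resumed.wait()

    def lua_like_function():
        before = state["mode"]
        blocking_c_call()              # never yields, never writes `state`
        after = state["mode"]
        return before, after

    def other_coroutine():
        suspended.wait()               # runs only while the first thread is parked
        state["mode"] = "ready"
        resumed.set()

    other = threading.Thread(target=other_coroutine)
    other.start()
    observed = lua_like_function()
    other.join()
    return observed
```

Calling stale_state_demo() returns ("loading", "ready"): the value changed across a call that, by single-threaded reasoning, could not have changed it.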

In our preliminary practice, we found that the corresponding C/C++ module can also be designed to be thread-safe. Under this model the C code has no parallelism problems, but it does have many more reentrancy problems: a C function that has run only halfway may be suspended and then re-entered from another system thread. C modules that are not thread-safe are generally likely to be poorly thought out in this respect. (Frameworks with callbacks in their interfaces are more prone to reentrancy bugs.)
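
To make the reentrancy hazard concrete, here is a minimal sketch of a module that keeps internal scratch state and guards it with a flag. The Decoder class is hypothetical, and re-entry is simulated on a single thread through the callback, which is enough to show the effect because the model is serial anyway.

```python
class Decoder:
    """Hypothetical module with internal scratch state that is not reentrant."""
    def __init__(self):
        self._scratch = []
        self._active = False

    def decode(self, items, on_item):
        if self._active:
            raise RuntimeError("decode() re-entered while an earlier call is suspended")
        self._active = True
        try:
            self._scratch.clear()
            for item in items:
                on_item(item)                      # the callback may suspend here
                self._scratch.append(item.upper())
            return list(self._scratch)
        finally:
            self._active = False
```

Without the guard, a nested decode call made while the outer call is suspended in on_item would clear the scratch list underneath it. With the guard, the nested call fails loudly instead of silently corrupting the outer result.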

I implemented this feature in the taskrun branch of ltask.

