In this post, I start with what JavaScript offers for computationally intensive tasks, extend that to processes and threads, and then turn to Go, in order to get a deeper understanding of concurrency.
The requirement is this: a Node.js web backend needs to perform a data processing step that runs entirely in memory and is very time consuming. The project runs four service instances under pm2, and pm2's monitoring view shows that each run of this step saturates the CPU of one instance. Running it four times saturates all four instances, after which the whole backend is paralyzed and cannot process new requests.
As we all know, JavaScript achieves non-blocking I/O through an event loop, and it runs your code on a single thread. This means it can only do one thing at a time. If a task occupies that thread for too long, subsequent requests cannot be handled, and the service appears to hang. CPU-intensive tasks are therefore a serious problem for a Node.js service.
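The blocking effect described above can be reproduced in a few lines. This is a minimal sketch: `blockFor` and `measureTimerDelay` are hypothetical helper names, and the busy-wait merely stands in for the real data processing step.

```javascript
// Busy-wait for `ms` milliseconds, standing in for a CPU-bound task.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else can run on this thread in the meantime
  }
}

// Schedule a 0 ms timer, then block the thread.
// Resolves with the delay the timer callback actually experienced.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0 ms timer fired after about ${delay} ms`);
});
```

The timer was due immediately, but its callback can only run once the busy loop returns control to the event loop. An incoming HTTP request would wait in the same way.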
worker threads and child process
Since the root cause is the single thread, it would be nice to hand the CPU-intensive task to a new thread or process, freeing the main JS thread so that the service can keep handling requests. Node.js provides two tools for this: worker threads and child processes. As the names suggest, the former starts a new thread and the latter a new process.
Here's a quick refresher on the relationship between threads and processes. Just remember this: processes are larger than threads, processes are isolated from each other, and threads within a process share memory. A Node.js process has only one main JavaScript thread by default, but it contains other threads as well, such as I/O threads and, as we'll see shortly, worker threads.
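To make "threads share memory" concrete, here is a small sketch. In Node.js each worker thread has its own JavaScript heap, so ordinary objects are copied between threads, but a `SharedArrayBuffer` is genuinely shared. The helper names `incrementInWorker` and `sharedMemoryDemo` are hypothetical, and the worker source is kept inline with `eval: true` only so the example fits in one file.

```javascript
const { Worker, threadId } = require("worker_threads");

// Start a tiny inline worker that increments a counter living in shared memory.
function incrementInWorker(sharedCounter) {
  return new Promise((resolve, reject) => {
    const source = `
      const { workerData, threadId, parentPort } = require("worker_threads");
      Atomics.add(workerData, 0, 1); // writes into memory shared with the parent
      parentPort.postMessage(threadId);
    `;
    const worker = new Worker(source, { eval: true, workerData: sharedCounter });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

async function sharedMemoryDemo() {
  // An Int32Array backed by a SharedArrayBuffer is shared, not copied.
  const counter = new Int32Array(new SharedArrayBuffer(4));
  const workerThreadId = await incrementInWorker(counter);
  return {
    mainThreadId: threadId,
    workerThreadId,
    counter: Atomics.load(counter, 0), // the worker's write is visible here
  };
}

sharedMemoryDemo().then(console.log);
```

The counter read on the main thread reflects the worker's write, which would not be possible between two separate processes without some form of explicit inter-process communication.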
Without further ado, let's get right to the code. Here we define a plain and simple CPU-intensive task: summing the integers from 1 to N, with N on the order of 1e10.
// CPU-intensive task: sum the integers from 1 to n
function heavyComputation(n) {
  let sum = 0;
  for (let i = 1; i <= n; i++) {
    sum += i;
  }
  return sum;
}
worker threads
The worker version of the code is as follows:
// worker.js
const { parentPort, workerData } = require("worker_threads");

// CPU-intensive task: sum the integers from 1 to n
function heavyComputation(n) {
  let sum = 0;
  for (let i = 1; i <= n; i++) {
    sum += i;
  }
  return sum;
}

// Compute the result and send it back to the main thread
const result = heavyComputation(workerData);
parentPort.postMessage(result);

// -----------------
// main.js
const { Worker } = require("worker_threads");

function runWorker(workerData) {
  return new Promise((resolve, reject) => {
    const worker = new Worker("./worker.js", { workerData });
    worker.on("message", resolve); // receive the result from the worker thread
    worker.on("error", reject); // the worker thread threw an error
    worker.on("exit", (code) => {
      if (code !== 0) {
        reject(new Error(`Worker stopped with exit code ${code}`));
      }
    });
  });
}

// Run the CPU-intensive task in a worker
(async () => {
  try {
    console.time("1");
    console.log("Main thread: Starting CPU-intensive task...");
    const result = await runWorker(1e10); // sum the integers from 1 to 1e10
    console.log(`Main thread: Task result is ${result}`);
    console.timeEnd("1");
  } catch (error) {
    console.error("Error:", error);
  }
})();
console.log("other task");
Output:
Main thread: Starting CPU-intensive task...
other task
Main thread: Task result is 50000000000067860000
1: 11.216s
Note that the approach above wraps the worker in a Promise so that the caller can await it. The computation itself runs on the worker thread. When the worker calls parentPort.postMessage, a message event is queued on the main thread's event loop, and its handler, resolve, settles the promise with the result. In the code above, worker.on("message", resolve) is equivalent to worker.on("message", (result) => { resolve(result); }).
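The same "wait for one event as a promise" pattern is available in the standard library as `events.once`. The sketch below uses a plain `EventEmitter` as a stand-in for a real worker so that it runs on its own; `firstMessage` and `fakeWorker` are hypothetical names.

```javascript
const { EventEmitter, once } = require("events");

// Resolve with the payload of the first "message" event.
// once() also rejects if the emitter emits "error" first.
async function firstMessage(emitter) {
  const [value] = await once(emitter, "message");
  return value;
}

// A plain emitter standing in for a Worker instance.
const fakeWorker = new EventEmitter();
firstMessage(fakeWorker).then((value) => console.log("got", value));
fakeWorker.emit("message", 42);
```

Since a `Worker` is also an `EventEmitter`, the same helper could in principle be applied to a real worker, although the hand-written Promise above additionally handles a non-zero exit code.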
Back to the worker: it is a separate thread started to perform the CPU-intensive task, which leaves the main thread free. That is why "other task" is printed before the result arrives.
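A single worker frees the main thread but still uses only one core. As a rough sketch (not part of the original project), the same sum can be split into ranges that several workers process in parallel. The names `sumRange` and `parallelSum`, the inline worker source via `eval: true`, and the smaller `n` are assumptions chosen to keep the example in one file and quick to run.

```javascript
const { Worker } = require("worker_threads");
const os = require("os");

// Inline worker source: sums the integers in [from, to].
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  const { from, to } = workerData;
  let sum = 0;
  for (let i = from; i <= to; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Sum one range on its own worker thread.
function sumRange(from, to) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { from, to },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
  });
}

// Split 1..n into contiguous chunks, one per worker, and add up the partial sums.
async function parallelSum(n, workerCount = Math.max(1, os.cpus().length)) {
  const chunk = Math.ceil(n / workerCount);
  const jobs = [];
  for (let from = 1; from <= n; from += chunk) {
    jobs.push(sumRange(from, Math.min(from + chunk - 1, n)));
  }
  const partials = await Promise.all(jobs);
  return partials.reduce((a, b) => a + b, 0);
}

parallelSum(1e7, 4).then((total) => console.log(total)); // 50000005000000
```

Creating a worker has a startup cost, so a real service would more likely keep a fixed pool of workers alive and reuse them rather than spawn new ones per request.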
child process
// compute.js
process.on("message", (n) => {
  // CPU-intensive task: sum the integers from 1 to n
  function heavyComputation(n) {
    let sum = 0;
    for (let i = 1; i <= n; i++) {
      sum += i;
    }
    return sum;
  }
  const result = heavyComputation(n);
  process.send(result); // send the result back to the main process
});

// ------------------------
// main.js
const { fork } = require("child_process");

// Create the child process
const computeProcess = fork("./compute.js");

// Listen for messages from the child process
computeProcess.on("message", (result) => {
  console.log(`Main process: Task result is ${result}`);
  console.timeEnd("1");
  computeProcess.kill(); // kill the child process once the task is done
});

// Send the task data to the child process
console.time("1");
console.log("Main process: Starting CPU-intensive task...");
computeProcess.send(1e10); // sum the integers from 1 to 1e10
console.log("other task");
Output:
Main process: Starting CPU-intensive task...
other task
Main process: Task result is 50000000000067860000
1: 11.116s
As you can see, both approaches take roughly the same time.
Comparing the two
Processes, because they are isolated from each other, are a good fit for running external programs or scripts, or for scenarios where messages are passed over standard input/output.
Threads, on the other hand, are usually preferred for CPU-intensive tasks: they are cheaper to create and to switch between than processes, and they can share or transfer memory instead of serializing every message.
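One concrete difference in message passing can be sketched as follows. Between threads, an `ArrayBuffer` can be transferred (moved) rather than copied by listing it in the `transferList` argument of `postMessage`. To keep the example in one file it uses a `MessageChannel` inside a single thread, which is an assumption for illustration; `worker.postMessage` accepts the same second argument. `byteLengthAfterPost` is a hypothetical helper name.

```javascript
const { MessageChannel } = require("worker_threads");

// Post `buffer` over a fresh channel, then report how many bytes
// the sending side still owns.
function byteLengthAfterPost(buffer, transfer) {
  const { port1 } = new MessageChannel();
  port1.postMessage(buffer, transfer ? [buffer] : []);
  port1.close();
  return buffer.byteLength;
}

// Copied: the sender keeps its own 1024 bytes.
console.log(byteLengthAfterPost(new ArrayBuffer(1024), false)); // 1024
// Transferred: the sender's buffer is detached and reports 0 bytes.
console.log(byteLengthAfterPost(new ArrayBuffer(1024), true)); // 0
```

For large in-memory data sets, transferring avoids the cost of copying the payload on every hand-off.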
Node.js vs Go
Go is a compiled language and generally performs well on CPU-bound loops like this one. Here is the accumulation task above written in Go:
package main

import (
	"fmt"
	"time"
)

func main() {
	s := 0
	start := time.Now()
	for i := 0; i < 1e10; i++ {
		s += i
	}
	end := time.Since(start)
	fmt.Println("1", end)
}
It only took 6 seconds! That's roughly half the time Node.js needed. So how fast would it be to call a Go-compiled binary via child_process?
Change the main.js of the child process above to:
const { spawn } = require("child_process");

// Define the parameter for the computation
const target = 1e10;

// Spawn a child process that runs the Go-compiled binary
console.time("1");
console.log("Main process: Starting CPU-intensive task...");
const computeProcess = spawn("./compute.exe", [target.toString()]); // assumes the compiled binary is named `compute.exe`
computeProcess.stdout.on("data", (data) => {
  console.log(`Main process: Task result is ${data}`);
});
computeProcess.stderr.on("data", (err) => {
  console.error(`Main process: Error occurred: ${err}`);
});
computeProcess.on("close", (code) => {
  console.timeEnd("1");
  console.log(`Main process: Subprocess exited with code ${code}`);
});
console.log("other task");
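The spawn code above logs each stdout chunk as it arrives. If the caller needs the result as a value, one option is to collect the output and wrap the child in a Promise, in the same spirit as `runWorker` earlier. This is a sketch under assumptions: `runCommand` is a hypothetical helper, and the current Node executable stands in for the Go binary so the example runs without compiling anything.

```javascript
const { spawn } = require("child_process");

// Run a command and resolve with its trimmed stdout,
// or reject if it cannot start or exits with a non-zero code.
function runCommand(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk) => (stdout += chunk)); // output may arrive in several chunks
    child.stderr.on("data", (chunk) => (stderr += chunk));
    child.on("error", reject); // for example, the binary does not exist
    child.on("close", (code) => {
      if (code === 0) {
        resolve(stdout.trim());
      } else {
        reject(new Error(`exit code ${code}: ${stderr.trim()}`));
      }
    });
  });
}

// Stand-in for the Go binary: Node itself evaluating a one-liner.
runCommand(process.execPath, ["-e", "console.log(6 * 7)"]).then(console.log); // 42
```

With the Go program from this post, the call would presumably look like `runCommand("./compute.exe", ["10000000000"])`.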
The corresponding Go program, compute.go:
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "Error: Missing argument")
		os.Exit(1)
	}
	// Convert the input argument to an integer
	n, err := strconv.ParseInt(os.Args[1], 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "Error: Invalid number")
		os.Exit(1)
	}
	// Sum the integers from 1 to n
	var sum int64
	for i := int64(1); i <= n; i++ {
		sum += i
	}
	// Print the result
	fmt.Println(sum)
}
Compile compute.go into a binary by running the following command in a terminal:
go build -o compute.exe compute.go
Then run the main program:
node main.js
The result:
Main process: Starting CPU-intensive task...
other task
Main process: Task result is -5340232216128654848
1: 3.379s
Main process: Subprocess exited with code 0
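The time is significantly reduced. The accumulated result is incorrect, though, because of integer overflow: the true sum is about 5 × 10^19, which exceeds the largest int64 value (about 9.22 × 10^18). The Node.js results above are not exact either, since the sum is far beyond Number.MAX_SAFE_INTEGER and the floating-point accumulation loses precision.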
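The overflow can be checked without rerunning the loop. The sketch below computes the exact sum with the closed form n(n+1)/2 using `BigInt`, then wraps it to a signed 64-bit integer with `BigInt.asIntN`, which mimics what happens to Go's `int64`. `exactSum` is a hypothetical helper name.

```javascript
// Exact sum of 1..n via the closed form n(n+1)/2, computed with BigInt.
function exactSum(n) {
  const big = BigInt(n);
  return (big * (big + 1n)) / 2n;
}

const exact = exactSum(1e10);

// The mathematically correct result.
console.log(exact.toString()); // 50000000005000000000

// Too large for a JS number to represent exactly.
console.log(exact > BigInt(Number.MAX_SAFE_INTEGER)); // true

// Wrapped to a signed 64-bit integer, as an int64 accumulator would be.
console.log(BigInt.asIntN(64, exact).toString()); // -5340232216128654848
```

The wrapped value matches the number printed by the Go binary above, which confirms that the negative result comes from int64 wraparound rather than from a bug in the loop.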
Disclaimer: the measurements above may not be accurate and are for reference only.
Summary
Node.js has long had a reputation for being better at I/O-intensive tasks than CPU-intensive ones. The examples in this article are consistent with that claim, and in this particular test the Go version ran considerably faster than the Node.js one.