My JavaScript is faster than your Rust

Josh Urbane is a long-time software architect who enjoys sharing technical insights on social media. He recently wrote an article about winning a bet against a rookie developer, and the claim that "my JavaScript is faster than your Rust" comes from that bet. His story illustrates how much operational strategy matters in real-world engineering practice.

For me, the most enjoyable part of being a software architect is guiding developers toward new concepts and shaping their technical judgment. Some developers are rather arrogant, and nothing brings them back down to earth like a dose of theory and reality; an architect is also responsible for keeping the learning atmosphere light and fun so that young, energetic developers can grow up and mature.

The thing that makes me most uncomfortable is when a brash young developer suddenly jumps out to challenge my technical advice (from a developer's point of view, architects are a bunch of idiots who keep making "wrong" suggestions) and bets everything on his own approach being better.

The thing is, I've been doing this for a long time, and I know the correct answer without having to verify it. So come on then, let's settle it in practice. I wrote the story down and, a few years later, turned it into today's article.

Going all in is a kind of "wisdom"

To be honest, this happened a few years ago, so I can't remember many of the details. The general situation was that, considering the team's knowledge at the time, the libraries available to us, and the existing technical debt, my recommendation was that everyone use Node.js.

A new junior developer was very confident in his computer science bachelor's degree and wanted to show off his skills to take me down a peg. He had heard that I had only minored in computer science, so he assumed I didn't understand the underlying principles of computers at all. To be fair, when I first graduated I thought I understood them very well too, but the longer I work in this business, the more computer systems feel like magic to me…

His confidence was not unreasonable. A conclusion like "C++ is faster than JavaScript" is basically industry consensus. But as a typical architect, I still insisted on "it depends".

More specifically, well-optimized C++ does run faster than JavaScript with the same level of optimization; after all, JavaScript carries unavoidable execution overhead (even so, the code can be compiled into a static program to get reasonably close to C++ performance). Anyway, the bet was on, so off we went.

Surprisingly, the JavaScript code really was a bit faster than the C++ version, and from an architectural standpoint, the JS version could be maintained by the existing team without borrowing technical capacity from other departments.

To be honest, I wasn't 100% sure I was right, but considering that the size of the in-memory objects in this use case could be dynamic, and that the young developer really was inexperienced, I was willing to go all in on the bet.

How can JS be faster than C++?

I suspect most developers can't make sense of such a result. It goes against the received wisdom that "compiled" languages are faster than "interpreted" ones, and that "static" binaries are faster than programs running on a "VM". But note that these are rules of thumb, not truths.

As I mentioned before, "optimization" is what determines speed. Even though the raw performance advantage of C++ is strong, poorly written code will still bog a program down. Node.js, on the other hand (built on the C/C++-based V8 and libuv libraries), benefits from an enormous amount of optimization under the hood, so its real-world speed is not bad at all. You could even say that, given equally sloppy JS and C++ programs, the JS may well perform better. But that is only the macro-level view; let's look at the details.

Memory is the key

Most developers are familiar with the concepts of the stack and the heap, but usually only superficially – for example, knowing only that the stack is linear, while the heap is a "clump" of memory reached through pointers (not a rigorous description, but good enough for intuition).

More importantly, "stack" and "heap" each correspond to multiple possible implementations and strategies. The underlying hardware has no idea what a "heap" is; how memory is managed is defined by software, and the choices made there inevitably have a huge impact on a program's final performance.
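
As a concrete illustration (a minimal Rust sketch, not from the original article), the same amount of data can live on the stack or on the heap depending on how the code asks for it, and only the heap case involves the allocator at all:

```rust
fn main() {
    // A fixed-size value: the compiler can place it on the stack, so
    // "allocation" is effectively free (just bumping the stack pointer).
    let on_stack = [0u8; 64];

    // A heap allocation: Box asks the allocator for memory at runtime,
    // and the allocator must track and later release that block.
    let on_heap = Box::new([0u8; 64]);

    println!("stack: {} bytes, heap: {} bytes", on_stack.len(), on_heap.len());
} // `on_heap` is freed here; `on_stack` simply goes out of scope.
```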

You can dig much deeper into this topic, and it is worth doing. Modern hardware and kernels are quite complex and often include special-purpose optimization mechanisms, such as more efficient handling of advanced memory layouts. That means software can (or must) lean on memory-management capabilities provided by the hardware. And then there are the effects of virtualization… I won't go into more detail here.

The Heart of Magic: Garbage Collection

Yes, the Node.js solution will certainly take longer to start, because the JIT compiler has to load and compile the script first. But once it is up and running, Node.js code has a somewhat mysterious advantage – garbage collection.

C++ applications, by contrast, tend to create dynamically sized objects on the heap and delete them later. That means the program's allocator has to allocate and free heap memory over and over again. These operations are inherently slow, and the actual cost depends heavily on the allocator's algorithm. In most cases dealloc is extremely slow, and even the comparatively cheaper alloc is not much better.

The trick with the Node.js program is that it runs once and then exits. Node.js also allocates the memory the script needs as it runs, but the corresponding deallocations are deferred until the garbage collector finds some idle time to do them.

Granted, garbage collection is not inherently better or worse than other memory-management strategies (everything is a trade-off), but in this particular program we were betting on, garbage collection really did improve performance significantly, because the program did hardly any real work. We just stuffed a bunch of objects into memory and discarded them all at once on exit.
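
To make the two lifetimes concrete, here is a minimal Rust sketch (Rust stands in for the JS and C++ of the story purely for illustration): one loop pays for a deallocation on every iteration, the other defers all of them to a single teardown at "exit". Which pattern wins depends entirely on the allocator; the point is only to show where the work lands.

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    const N: usize = 1_000_000;

    // Pattern 1: allocate and free each object immediately, the way the
    // C++ version effectively did on every iteration.
    let t = Instant::now();
    for i in 0..N {
        let obj = black_box(Box::new([i as u64; 8]));
        drop(obj); // deallocation happens right here, every single time
    }
    println!("free-as-you-go: {:?}", t.elapsed());

    // Pattern 2: keep everything alive and release it in one sweep at the
    // end. In the story, the Node.js process simply exits before the GC
    // ever pays this cost at all.
    let t = Instant::now();
    let mut keep = Vec::with_capacity(N);
    for i in 0..N {
        keep.push(black_box(Box::new([i as u64; 8])));
    }
    drop(keep); // one bulk teardown at "exit"
    println!("drop-at-exit:   {:?}", t.elapsed());
}
```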

Garbage collection definitely has a price: a Node.js process occupies significantly more memory than the equivalent C++ program. This is the classic dilemma of "saving CPU by spending memory" versus "saving memory by spending CPU", but my goal was to slap that kid in the face, so spending some memory didn't matter.

And I only won because my opponent chose a naive strategy. His best path to victory would actually have been to add a deliberate memory leak, keeping every allocation alive in memory. The C++ program's memory footprint would still be smaller, and it would run much faster than before. Alternatively, he could have used designs such as allocating buffers on the stack to squeeze out more performance, which is common in real production code.

There is also the question of how to choose a performance benchmark. Generally what people compare is operations per second, and JS beating C++ here is a good reminder that "understand the total performance cost before making a choice" is usually the more reliable rule. In software architecture, we always have to keep an eye on the "total cost of ownership" at the resource level.

Into the Modern: Rust Comes to Play

Rust is one of my current favorite languages. It offers many modern features, is fast, has a good memory model, and generates fairly safe code.

Rust certainly isn't perfect – it takes longer to compile and involves plenty of odd semantics – but overall it is worth recommending. You can flexibly control how memory is managed in Rust, but its "stack" memory always follows the ownership model, which underpins the memory safety it is so proud of.

One of the projects I'm currently working on is a FaaS (Function as a Service) host written in Rust that executes WASM (WebAssembly) functions. It can run all kinds of isolated functions quickly and safely, minimizing the operational overhead of FaaS. It is also fast, handling around 90,000 simple requests per second per core. More impressively, its total memory footprint is only about 20 MB, which is rather remarkable.
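
The article doesn't say which WASM runtime the host is built on, but the basic shape of "load an isolated function and call it from a Rust host" looks roughly like the sketch below, which assumes the wasmtime crate and a made-up exported function purely for illustration:

```rust
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // A trivial guest function in WAT form; a real FaaS host would load a
    // compiled .wasm module uploaded by a user instead.
    let module = Module::new(
        &engine,
        r#"(module
             (func (export "add") (param i32 i32) (result i32)
               local.get 0
               local.get 1
               i32.add))"#,
    )?;

    // Each invocation gets its own Store, which is what keeps one guest
    // execution isolated from another.
    let mut store = Store::new(&engine, ());
    let instance = Instance::new(&mut store, &module, &[])?;

    let add = instance.get_typed_func::<(i32, i32), i32>(&mut store, "add")?;
    println!("guest says: 2 + 3 = {}", add.call(&mut store, (2, 3))?);
    Ok(())
}
```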

But what does this have to do with the Node.js vs. C++ gamble?

In short, I regard Node.js as a "reasonable" performance baseline (Go would be the "dream" baseline – our program was never going to match languages purpose-built for web services, so let's not turn this into a lopsided comparison). After all, the early C++ version of our program performed genuinely poorly, and its only advantage was a memory footprint less than one-tenth of the Node.js version's.

There's nothing wrong with getting the code running first and optimizing later, but losing to JavaScript in a "fast" language like C++ is definitely frustrating. The reason I dared to go all in on the spot was a basic judgment about where the obvious bottleneck lay: memory management.

Each guest function is allocated its own region of memory, but allocating memory inside the function, and copying data between function memory and host memory, carries significant performance overhead. With dynamic data being thrown around everywhere, the allocator is effectively getting punched from all directions. As for the solution: cheat!

Add heaps, two heaps, three heaps…

Essentially, the heap is the portion of memory whose use the allocator tracks in a map. The program requests N memory cells; the allocator searches its pool of available memory for those cells (or asks the host for more), records which cells are now occupied, and returns a pointer to that memory. When the program is done with the memory it tells the allocator, and the allocator updates the map so those cells become available again. Pretty simple, right?
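
To make the "map of cells" idea concrete, here is a toy first-fit allocator in Rust. It is purely illustrative – no real allocator is this naive – but it shows the request/search/mark/release cycle just described:

```rust
const CELLS: usize = 1024;

/// A toy allocator over fixed-size cells. `occupied` is the "map" of which
/// cells are currently in use.
struct ToyAllocator {
    occupied: [bool; CELLS],
}

impl ToyAllocator {
    fn new() -> Self {
        Self { occupied: [false; CELLS] }
    }

    /// Request `n` contiguous cells; returns the index of the first cell.
    fn alloc(&mut self, n: usize) -> Option<usize> {
        assert!(n > 0);
        let mut run = 0;
        for i in 0..CELLS {
            if self.occupied[i] {
                run = 0;
                continue;
            }
            run += 1;
            if run == n {
                let start = i + 1 - n;
                for cell in &mut self.occupied[start..=i] {
                    *cell = true; // record these cells as occupied
                }
                return Some(start);
            }
        }
        None // a real allocator would ask the host/OS for more memory here
    }

    /// Mark `n` cells starting at `start` as free again.
    fn dealloc(&mut self, start: usize, n: usize) {
        for cell in &mut self.occupied[start..start + n] {
            *cell = false;
        }
    }
}
```

You can already see where fragmentation will come from: once blocks of different sizes are freed in a different order than they were allocated, the runs of free cells get chopped up.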

The trouble comes when we need to allocate a large number of memory cells with different lifetimes and sizes. That inevitably creates a lot of fragmentation, which in turn drives up the cost of every new allocation. The performance penalty follows naturally; after all, the allocator's job is conceptually simple – it is just looking for an available place to store things.

There is obviously no universally good solution to this problem. There are many allocation algorithms to choose from, each with its own trade-offs, and we have to pick the one best suited to the characteristics of the use case (or, like most developers, just take the default).

And then there's cheating. There is more than one way to cheat: for FaaS, we can skip dealloc entirely during a run and clear the whole heap once the run finishes; we can also use different allocators at different stages of a function's life cycle, for example explicitly separating the initialization phase from the run phase. That way, both a clean function (reset to the same initial memory state on every run) and a stateful function (which keeps state between runs) get a memory policy tailored and optimized for it.

In our FaaS project, we ended up building a dynamic allocator that chooses an allocation algorithm based on usage, and the choice it makes persists between runs.

For "low usage" functions (which is to say most functions), a simple stack allocator is enough: just a pointer to the next free slot. On dealloc, if the block is the last one on the stack, the pointer is rolled back; if it isn't, nothing happens. When the function completes, the pointer is reset to 0 (the equivalent of Node.js exiting before garbage collection runs). If a function's dealloc misses and overall usage cross a certain threshold, the remaining calls fall back to other allocation algorithms. In practice this scheme significantly speeds up memory allocation in the majority of cases.
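
Here is a minimal sketch of such a stack (bump) allocator in Rust; the field names and the switching threshold are illustrative, not taken from the article:

```rust
/// A bump allocator: a single pointer to the next free slot, rollback only
/// when the freed block is the most recent one, and a full reset at the end
/// of each run.
struct BumpAllocator {
    next: usize,            // offset of the next free slot
    heap_size: usize,       // total size of the function's heap region
    failed_deallocs: usize, // dealloc calls that could not be rolled back
}

impl BumpAllocator {
    fn new(heap_size: usize) -> Self {
        Self { next: 0, heap_size, failed_deallocs: 0 }
    }

    /// Hand out the next `size` bytes, or None if the region is exhausted.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        if self.next + size > self.heap_size {
            return None;
        }
        let offset = self.next;
        self.next += size;
        Some(offset)
    }

    fn dealloc(&mut self, offset: usize, size: usize) {
        if offset + size == self.next {
            // The block is the last one on the "stack": roll the pointer back.
            self.next = offset;
        } else {
            // Not the last block: do nothing, just record the miss.
            self.failed_deallocs += 1;
        }
    }

    /// Called when the function completes: the equivalent of exiting before
    /// garbage collection ever runs.
    fn reset(&mut self) {
        self.next = 0;
        self.failed_deallocs = 0;
    }

    /// The dynamic allocator could use this to decide when to switch to a
    /// different algorithm for the remaining calls (threshold is made up).
    fn should_switch(&self) -> bool {
        self.failed_deallocs > 128
    }
}
```

Resetting `next` to 0 at the end of a run is exactly the "clear the whole heap after each run" cheat described above.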

The runtime uses another "heap" as well – host memory (that is, memory shared with the function). It uses the same dynamic allocation strategy and makes it possible to skip the copy step from the earlier C++ version and write directly into function memory. That lets I/O be copied from the kernel straight into the guest function, bypassing the host runtime and significantly increasing throughput.
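
A rough illustration of the "write directly into function memory" idea, again assuming wasmtime: host-side bytes go straight into the guest's exported linear memory instead of through an intermediate host buffer. The offset would come from the guest's own allocator in a real host; whether the kernel can truly fill guest memory directly depends on the runtime and the I/O path.

```rust
use wasmtime::{Instance, Memory, Store};

/// Copy `payload` directly into the guest's exported linear memory at
/// `guest_offset`. Purely illustrative; error handling is minimal.
fn write_into_guest(
    store: &mut Store<()>,
    instance: &Instance,
    guest_offset: usize,
    payload: &[u8],
) -> anyhow::Result<()> {
    let memory: Memory = instance
        .get_memory(&mut *store, "memory")
        .ok_or_else(|| anyhow::anyhow!("guest does not export `memory`"))?;
    memory.write(&mut *store, guest_offset, payload)?;
    Ok(())
}
```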

Node.js vs Rust

After optimization, the Rust FaaS runtime ended up being more than 70% faster than our Node.js reference implementation, with less than one-tenth the memory footprint.

But the key word here is "optimized" – the initial implementation was actually slower. Our optimizations also place some restrictions on the WASM functions, but these are handled completely transparently at compile time, and incompatibilities are rare.

The biggest advantage of the Rust version is its small memory footprint; the RAM saved can be put to other uses such as caching or distributed in-memory storage. That further reduces I/O overhead and makes production run more efficiently – more so than simply throwing more CPU at the problem.

We have more optimizations planned, but they mainly address issues in the host layer with significant security implications. That has nothing to do with memory management or performance, but it does, after all, lend support to the "Rust is faster than Node" camp.

Summary

Honestly, I can't draw a single firm conclusion from all of this. Here are just a few rough takeaways:

  • Memory management is interesting, and every approach is a trade-off. With the right strategy, any language can get huge performance gains.

  • I still recommend choosing between Node.js and Rust flexibly based on your actual goals, so I won't issue a verdict here. JavaScript really is more portable and is especially well suited to cloud-native development scenarios; but if you care a great deal about performance, Rust may be the better choice.

  • I’m talking about JavaScript throughout, but I’m actually referring to TypeScript here.

In the end, everyone has to choose the technical solution that best fits their actual situation. The better we understand the characteristics of different stacks, the easier that choice becomes.

Original link:

https://medium.com/@jbyj/my-javascript-is-faster-than-your-rust-5f98fe5db1bf

The text and pictures in this article are from InfoQ


This article is reprinted from https://www.techug.com/post/my-javascript-is-faster-than-your-rust.html
This site only aggregates the article; the copyright belongs to the original author.
