I rewrote my own C++ project in Rust: both languages are torture!

Author | Strager

Translator | Ma Kewei

Planning | Chu Xingjuan

C++ is notorious for its long build times, and the “my code is compiling” joke in programming circles is one that C++ keeps alive.

Projects at the scale of Google’s Chromium can take an hour to build on new hardware and up to six hours on older hardware. There are plenty of tweaks that speed up the build, and plenty of error-prone shortcuts that reduce what gets built, yet even with thousands of dollars of cloud compute, Chromium’s build time is still close to ten minutes. I can’t accept that at all. How do people work like this every day?

Some say Rust is no different, that its build times are just as painful. Is that true, or is it just anti-Rust propaganda? Which language really wins on build times, Rust or C++?

Build speed and runtime performance matter a lot to me. The shorter my build-and-test cycle, the more productive and happier I am as a programmer. I’ll stop at nothing to make my software faster and my customers happier. So I decided to see for myself how fast Rust builds really are. The plan:

  1. Find a C++ project

  2. Isolate part of that project

  3. Rewrite the C++ code in Rust, line by line

  4. Optimize the builds of both the C++ and Rust projects

  5. Compare the build and test times of the two projects

My conjectures are as follows (an educated guess, but not a conclusion):

  1. Rust will have fewer lines of code than C++. Most functions and methods in C++ must be declared twice: once in a header and once in an implementation file. Rust needs no such duplication, so there should be fewer lines (see the sketch after this list).

  2. C++ full builds will take longer than Rust’s (advantage Rust). Every .cpp file must recompile the C++ functions and templates it pulls in via #include. The compilations run in parallel, but parallelism is no panacea.

  3. Rust incremental builds will take longer than C++’s (advantage C++). Rust compiles one crate (an independently compilable unit) at a time, while C++ compiles one file at a time, so on every code change Rust has more to reprocess than C++.
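To illustrate the first conjecture, here is a minimal sketch (invented for this post, not quick-lint-js code): the C++ version of a method needs a declaration in a header plus a definition in a .cpp file, while Rust gets by with a single definition.

    // C++ spreads one method across two files:
    //   lexer.h:   struct lexer { void skip_whitespace(); };
    //   lexer.cpp: void lexer::skip_whitespace() { /* ... */ }
    //
    // Rust needs only the definition; there is no separate declaration:
    pub struct Lexer {
        pos: usize,
    }

    impl Lexer {
        pub fn skip_whitespace(&mut self, input: &[u8]) {
            while input.get(self.pos) == Some(&b' ') {
                self.pos += 1;
            }
        }
    }

    fn main() {
        let mut lexer = Lexer { pos: 0 };
        lexer.skip_whitespace(b"   x");
        assert_eq!(lexer.pos, 3);
    }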

What do you think? Here are the results of my Twitter poll:

42% think C++ will win, 35% agree “it depends”, and another 17% think Rust will surprise us.

So what was the result? Now let’s get to the point.

Writing the C++ and Rust test subjects

Finding a project

Given that I was about to spend the next month rewriting code, which project should it be? I decided it had to meet the following criteria:

  1. Few or no third-party dependencies (the standard library is fine);

  2. Works on Linux and macOS (I don’t care about build times on Windows);

  3. An extensive test suite (otherwise I couldn’t verify the correctness of the Rust rewrite);

  4. Some use of FFI (Foreign Function Interface), pointers, standard and custom containers, utility classes and functions, I/O, concurrency, generics, macros, SIMD (Single Instruction, Multiple Data), inheritance, and so on.

The answer was simple: use the project I’ve been working on for the past few years, my JavaScript linter, quick-lint-js.

quick-lint-js mascot Dusty

Extracting the C++ code

The C++ portion of quick-lint-js is over 100,000 lines of code. Porting all of it to Rust would take me half a year, so why not focus on just the JavaScript lexing part? That part pulls in:

  • Diagnostic system

  • Translation system (for diagnostics)

  • Various memory allocators and containers (e.g. bump allocator, SIMD-friendly strings)

  • Various utility classes and functions (e.g. a UTF-8 decoder, SIMD intrinsic wrappers)

  • Test helper code (such as custom assertion macros)

  • C API

Unfortunately, this slice of the code involves no concurrency or I/O, so I couldn’t measure the compile-time overhead of Rust’s async/await. But it’s only a small part of quick-lint-js, so I didn’t worry about it too much.

I started by copying all of the C++ code into a new project, then removed the parts known to be irrelevant to lexing, such as the parser and the LSP server. I even accidentally deleted too much code and had to add some of it back. The C++ tests kept passing as I trimmed.

After cutting everything not needed by the lexer, about 17,000 lines of C++ code remained.

Rewriting the code

To rewrite those thousands of lines of C++, I followed these steps:

  1. Find a module suitable for conversion;

  2. Copy-paste the code and tests, fix some syntax with search-and-replace, and keep running cargo test (Cargo is Rust’s build system and package manager) until everything builds and passes (see the sketch after this list);

  3. If the module depends on another module, convert that other module first (back to step 2), then return to the current one;

  4. If there are still modules that have not been converted, go back to the first step.
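To give a feel for step 2, here is a hedged example (invented, not actual quick-lint-js code) of the mostly mechanical, line-by-line conversion involved:

    // C++ original, for comparison:
    //   bool is_digit(char8_t c) { return c >= u8'0' && c <= u8'9'; }
    //
    // Rust translation after search-and-replace and small syntax fixes:
    fn is_digit(c: u8) -> bool {
        (b'0'..=b'9').contains(&c)
    }

    fn main() {
        assert!(is_digit(b'7'));
        assert!(!is_digit(b'x'));
    }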

The piece that matters most for both C++ and Rust build times is the diagnostic system. The C++ version is implemented with lots of code generation, macros, and constexpr (constant expressions); I rewrote the Rust version using code generation, proc macros, ordinary macros, and a little const. Rumor has it that proc macros are slow; rumor also has it that proc macros are only slow because they tend to be poorly written. I hope the ones I wrote are fine (fingers crossed).
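As a rough illustration of the technique (a self-contained sketch, not the project’s real diagnostic API), an ordinary macro_rules! macro can expand one declarative table of diagnostics into a type plus metadata lookups, replacing an external code generator:

    // Sketch: one table of diagnostics expands into an enum and its metadata.
    macro_rules! define_diagnostics {
        ($($name:ident => ($code:expr, $message:expr)),* $(,)?) => {
            #[derive(Clone, Copy, Debug, PartialEq)]
            pub enum DiagCode { $($name),* }

            impl DiagCode {
                pub fn code(self) -> &'static str {
                    match self { $(DiagCode::$name => $code),* }
                }
                pub fn message(self) -> &'static str {
                    match self { $(DiagCode::$name => $message),* }
                }
            }
        };
    }

    define_diagnostics! {
        UnexpectedCharacter => ("E0001", "unexpected character"),
        UnclosedStringLiteral => ("E0040", "unclosed string literal"),
    }

    fn main() {
        let diag = DiagCode::UnclosedStringLiteral;
        println!("{}: {}", diag.code(), diag.message());
    }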

When I finished, I found the Rust project was actually bigger than the C++ one: 17.1k lines of Rust versus only 16.6k lines of C++.

Optimizing Rust builds

I had already optimized the C++ project’s build time before extracting the code, so now I needed to put the same effort into the Rust project’s build time. Here are the things I thought might improve Rust build times:

  • Faster linker

  • Cranelift backend

  • Compiler and Linker Flags

  • Workspace and test layout

  • Minimize dependencies

  • cargo-nextest

  • Using PGO to customize the toolchain

Faster linker

My first step was to profile the build with rustc’s -Zself-profile flag. Of the two files this flag generates, the run_linker stage stood out in one:

First round of -Zself-profile results
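For reference, one way to pass a rustc flag like this to every compilation is through Cargo configuration; a sketch, assuming a nightly compiler (each profiled rustc invocation writes its data files to the current directory):

    # .cargo/config.toml -- sketch: enable rustc's self-profiler (nightly only)
    [build]
    rustflags = ["-Zself-profile"]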

I had previously optimized C++ build times by switching to the Mold linker. Would the same trick work for Rust?

Linux: linker performance is nearly identical. (Lower is better)

Unfortunately, while Linux did show an improvement, it wasn’t much. What about macOS? There are two alternatives to the default linker on macOS, lld and zld; here’s how they did:

macOS: linker performance is almost unchanged. (Lower is better)

Replacing the default linker on macOS clearly isn’t a big win either. I suspect the default linkers on Linux and macOS are already close to optimal for my small project; optimized linkers like Mold, lld, and zld really shine on big projects.
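For reference, pointing Cargo at an alternative linker looks roughly like this; a sketch for Mold on Linux (paths and flags vary by setup):

    # .cargo/config.toml -- sketch: link with Mold on Linux via clang
    [target.x86_64-unknown-linux-gnu]
    linker = "clang"
    rustflags = ["-C", "link-arg=-fuse-ld=mold"]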

Cranelift backend

Let’s go back to another report from -Zself-profile, where the LLVM_module_codegen_emit_obj and LLVM_passes stages stand out:

Second round of -Zself-profile results

I had heard that rustc’s backend can be swapped from LLVM to Cranelift, so I rebuilt with the Cranelift backend. The -Zself-profile results looked promising:

Second round of -Zself-profile results, using Cranelift

Unfortunately, in real builds Cranelift was slower than LLVM.

Rust backends: the default LLVM beats Cranelift. (Tested on Linux; lower is better)

Update, 7 January 2023: bjorn3, maintainer of rustc’s Cranelift backend, helped me investigate why Cranelift underperformed on my project: the likely culprit is rustup overhead. Working around it would probably improve Cranelift’s numbers; the results in the image above include no such workaround.
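For the curious, switching codegen backends is done with a nightly rustc flag. A sketch, assuming the Cranelift backend is built and visible to rustc (the exact setup varies by toolchain version):

    # .cargo/config.toml -- sketch: select the Cranelift codegen backend (nightly)
    [build]
    rustflags = ["-Zcodegen-backend=cranelift"]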

Compiler and Linker Flags

The compiler has a pile of options that can speed up (or slow down) builds. Let’s try them out:

  • -Zshare-generics=y (rustc) (Nightly only)

  • -Clink-args=-Wl,-s (rustc)

  • debug = false (Cargo)

  • debug-assertions = false (Cargo)

  • incremental = true and incremental = false (Cargo)

  • overflow-checks = false (Cargo)

  • panic = ‘abort’ (Cargo)

  • lib.doctest = false (Cargo)

  • lib.test = false (Cargo)

rustc flags: quick builds beat debug builds. (Tested on Linux; lower is better)

Note: “quick, -Zshare-generics=y” in the chart is the “quick, incremental=true” configuration with the -Zshare-generics=y flag added. The other bars omit -Zshare-generics=y because that flag requires a nightly Rust compiler.

Most of the options above are documented, but I haven’t seen anyone write about adding -Wl,-s. The -s linker flag strips debug information, including the debug info of the statically linked Rust standard library; that gives the linker less work to do, which cuts link times.
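Put together, the Cargo-side settings above look roughly like this; a sketch of the options I benchmarked, not a general recommendation:

    # Cargo.toml -- sketch: the fast-build settings tested above
    [profile.dev]
    debug = false              # emit no debug info
    debug-assertions = false   # compile out debug_assert! checks
    incremental = true         # reuse work across rebuilds
    overflow-checks = false    # skip integer overflow checks
    panic = "abort"            # drop the unwinding machinery

    [lib]
    doctest = false            # don't build documentation tests
    test = false               # don't build a unit-test harness for the lib

The rustc-side flags (-Zshare-generics=y and -Clink-args=-Wl,-s) go in RUSTFLAGS or .cargo/config.toml rather than Cargo.toml.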

Workspace and Test Layout

Both Rust and Cargo offer some flexibility in where files physically live. For my project, there are three reasonable layouts.

In theory, Cargo can parallelize its calls to rustc if we split the code into multiple crates. Given the 32-thread CPU in my Linux machine and the 10-thread CPU in my macOS machine, that parallelism should reduce build times.

Tests in a Rust project can also live in a number of places relative to the crate.

A dependency cycle kept me from benchmarking the “tests inside the source files” layout, but I benchmarked the other layout combinations:

Rust full builds: the workspace layouts are fastest. (Tested on Linux; lower is better)

Rust incremental builds: the best layout is unclear. (Tested on Linux; lower is better)

The workspace setups, whether split into many test executables or merged into a single one, seem to come out on top. So from here on I’ll use the “workspace + many test executables” configuration, sketched below.
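As a sketch of that layout (the crate names here are hypothetical, not the benchmark’s actual ones):

    # Sketch: "workspace + many test executables" layout
    #
    #   Cargo.toml              <- workspace root (below)
    #   libs/lex/src/lib.rs     <- library code
    #   libs/lex/tests/*.rs     <- each file here builds its own test executable
    [workspace]
    members = ["libs/lex", "libs/diag", "libs/util"]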

Minimize dependencies

Crates can expose optional features, and some features are enabled by default. Features can be inspected with the cargo tree command:

Let’s turn off the std feature of one dependency, libc, and see whether the build time changes.

Cargo.toml

 [dependencies]
-libc = { version = "0.2.138" }
+libc = { version = "0.2.138", default-features = false }

Turning off libc’s std feature changed nothing. (Tested on Linux; lower is better)

Build times didn’t budge; perhaps the std feature doesn’t actually matter much. Anyway, on to the next item.

cargo-nextest

As a tool said to be “60% faster than cargo test”, cargo-nextest sounds perfect for my project, where 44% of the code is tests. Let’s compare build-and-test times:

Linux: cargo-nextest slowed down testing. (Lower is better)

On my Linux box, cargo-nextest didn’t help. Its output is pretty, though…

Example cargo-nextest output:

 PASS [ 0.002s] cpp_vs_rust::test_locale no_match
 PASS [ 0.002s] cpp_vs_rust::test_offset_of fields_have_different_offsets
 PASS [ 0.002s] cpp_vs_rust::test_offset_of matches_memoffset_for_primitive_fields
 PASS [ 0.002s] cpp_vs_rust::test_padded_string as_slice_excludes_padding_bytes
 PASS [ 0.002s] cpp_vs_rust::test_offset_of matches_memoffset_for_reference_fields
 PASS [ 0.004s] cpp_vs_rust::test_linked_vector push_seven

What about macOS?

macOS: cargo-nextest speeds up build-and-test. (Lower is better)

On my MacBook Pro, cargo-nextest really did speed up build-and-test. Why not on Linux, though? Could it be hardware-related?

In the tests below, I’ll be using cargo-nextest on macOS, but not on Linux.

Using PGO to customize the toolchain

I’ve found that building a C++ compiler with Profile-Guided Optimization (PGO, also known as FDO) yields significant performance gains. So let’s build the Rust toolchain with PGO too, and further optimize rustc with LLVM BOLT and -Ctarget-cpu=native.

Rust toolchains: my custom toolchain is fastest. (Tested on Linux; lower is better)

Check out the toolchain build script if you’re curious. It might not work on your machine, but it works on mine: https://ift.tt/e4NoxQu

Unlike the C++ compilers, the Rust toolchain shipped via rustup appears to be well optimized already: PGO plus BOLT brought less than a 10% improvement. Still, an improvement is an improvement, so I’ll use this fastest toolchain in the contest against C++.

The first custom Rust toolchain I built was 2% slower than Nightly. I fiddled with the options in Rust’s config for days before the two were evenly matched. As I was finalizing this post, I ran rustup update, pulled the latest git commits, and rebuilt my toolchain from scratch; this time the custom toolchain was faster! It’s possible I had originally built from a bad commit of the Rust repository…

Optimizing C++ builds

In the original quick-lint-js project, I had already applied the usual compile-time optimizations, such as precompiled headers (PCH), disabling exceptions and RTTI, tuning compilation flags, removing unnecessary #includes, moving code out of headers, and extern template instantiations. But there are C++ compilers and linkers I hadn’t tried, so before the C++ vs. Rust comparison, let’s pick the best of those.

Linux: custom Clang is the fastest toolchain. (Lower is better)

On Linux, GCC is clearly the outlier; Clang performs much better. My custom build of Clang (built with PGO and BOLT, like the Rust toolchain) improves build times significantly over Ubuntu’s Clang, and libstdc++ builds slightly faster than libc++ on average.

So my custom Clang with libstdc++ is what will represent C++ in the C++ vs. Rust comparison.

macOS: Xcode is the fastest toolchain. (Lower is better)

On macOS, the Clang toolchain that ships with Xcode appears better optimized than the Clang toolchain from the LLVM website.

C++20 modules

My C++ code uses #include, but what if I used C++20’s new import instead? Aren’t C++20 modules supposed, in theory, to make compilation super fast?

I tried C++20 modules in my project, but as of January 3, 2023, CMake’s module support on Linux was so experimental that I couldn’t even get “hello world” working.

Maybe C++20 modules will shine in mid-2023, which would be great for someone like me who cares a lot about build times. But for now, I’ll keep comparing Rust against classic C++ #include.

Comparing C++ and Rust build times

Having rewritten the C++ project in Rust and squeezed the Rust build times as hard as I could, the question arises: which builds faster, C++ or Rust?

Unfortunately, the answer is “it depends”!

Linux: Rust builds faster than C++ in some cases. (Lower is better)

On my Linux machine, Rust builds are indeed sometimes faster than C++, but sometimes equal or slower. On the incremental lex benchmark, which modifies a large amount of source code, Clang was faster than rustc, but on the other incremental benchmarks rustc came out ahead of Clang.

macOS: C++ builds are generally faster than Rust. (Lower is better)

But things are very different on my macOS machine: C++ builds were often much faster than Rust. On the incremental test-utf-8 benchmark, which modifies a moderate amount of test code, rustc compiled slightly faster than Clang, but on the other benchmarks, including full builds, Clang was clearly better.

Beyond 17k lines of code

The project I benchmarked is only 17k lines of code, which counts as small. What about bigger projects of 100,000 lines or more?

To find out, I copy-pasted the code of the biggest module, the lexer, 8, 16, and 24 times over. Since my benchmarks also include the time to run the tests, I’d expect the measured times to grow linearly even if the build itself were instantaneous.

Full builds: C++ scales better than Rust. (Tested on Linux; lower is better)

Incremental builds: C++ scales better than Rust. (Tested on Linux; lower is better)

Rust and Clang both scale linearly, which is nice.

As expected, modifying a C++ header file (the incremental diag-type benchmark) hurts build times significantly. The scaling factors of the other incremental benchmarks stay low thanks to the Mold linker.

I’m disappointed by how Rust builds scale. Even on the incremental test-utf-8 benchmark, adding unrelated files shouldn’t affect build times this much. The crate layout under test was “workspace + many test executables”, so the utf-8 tests should compile into their own executable, independent of the rest.

Conclusion

Is compile time a problem for Rust? Yes. There are tips and tricks to speed things up, but I found no order-of-magnitude improvement that would make me happy developing in Rust.

How do Rust compile times compare to C++? Badly, indeed. For large projects, Rust’s compile times are even worse than C++’s, at least for my coding style.

Looking back at my original conjectures, almost none survived:

  1. The Rust rewrite had more lines of code than the C++ original;

  2. For full builds, C++ took about the same time as Rust at 17,000 lines and less time than Rust at 100,000 lines;

  3. For incremental builds, Rust was sometimes faster and sometimes slower than C++ at 17,000 lines, and slower still at 100,000 lines.

Am I upset? Honestly, yes. I kept learning about Rust throughout the rewrite. For example, a single proc macro replaced three separate code generators, simplifying the build pipeline and making life easier for new contributors. I don’t miss header files at all, and Rust’s tooling is a pleasure to use, especially Cargo, rustup, and Miri.

But I’ve decided not to convert the rest of quick-lint-js to Rust. If Rust’s build times improve significantly, I may change my mind, assuming, of course, that Zig hasn’t distracted me by then.

Notes

Source code

The trimmed C++ project source, the Rust port (including the different project layouts), the code generation scripts, and the benchmarking scripts are available under GPL-3.0-or-later.

Linux machine

Name: strapurp
CPU: AMD Ryzen 9 5950X (PBO; stock clocks) (32 threads) (x86_64)
RAM: G.SKILL F4-4000C19-16GTZR 2×16 GiB (overclocked to 3800 MT/s)
OS: Linux Mint 21.1
Kernel: Linux strapurp 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Linux performance governor: schedutil
CMake: version 3.19.1
Ninja: version 1.10.2
GCC: version 12.1.0-2ubuntu1~22.04
Clang (Ubuntu): version 14.0.0-1ubuntu1
Clang (custom): version 15.0.6 (Rust fork; commit 3dfd4d93fa013e1c0578d3ceac5c8f4ebba4b6ec)
libstdc++ for Clang: version 11.3.0-1ubuntu1~22.04
Rust stable: version 1.66.0 (69f9c33d7 2022-12-12)
Rust nightly: version 1.68.0-nightly (c7572670a 2023-01-03)
Rust (custom): version 1.68.0-dev (c7572670a 2023-01-03)
Mold: version 0.9.3 (ec3319b37f653dccfa4d1a859a5c687565ab722d)

binutils: version 2.38
