Author: Ayende Rahien
Translator丨Mingzhishan
Planning丨Yan Yuanyuan
A few weeks ago, I wrote an article about the programming language Hare and its lack of generic data structures. Now, I don’t want to talk about this anymore, I want to talk about something more “generic”. In my opinion, any modern programming language aiming for high performance should support some form of generics, not supporting generics is a major mistake and a big contributor to increased complexity and performance penalty. Generic data structures are more optimized than one-off implementations, which I already talked about in a previous post.
Also, if you don’t support generics, you face a huge hurdle in optimization. You simply can’t build some helper programs. As an example, let’s talk about a topic I care most about – sorting. Handling sorted data is an important task of the database, and everything else is based on it. Let’s take a look at how to sort data (in memory) using several programming languages (using their definitions).
C language:
void qsort ( void * array, size_t count, size_t size, comparison_fn_t compare);
int comparison_fn_t ( const void *, const void *);
C++:
template < class RandomAccessIterator>
void sort ( RandomAccessIterator first, RandomAccessIterator last);
Java:
public static void sort( int [] a);
public static void sort( long[] a);
public static void sort( Object[] a);
C#:
public static void Sort<T> (T[] array);
Hare:
type cmpfunc = fn(a: const * void , b: const * void ) int ;
fn sort( [] void , size, *cmpfunc) void ;
Rust:
impl< T> [ T] {
pub fn sort(&mut self)
where
T: Ord,
}
Zig:
pub fn sort(
comptime T: type,
items: []T,
context: anytype,
comptime lessThan: fn (context: @TypeOf(context), lhs: T, rhs: T) bool,
) void
I only look at method declarations, not implementations. In fact, I don’t care how they are implemented right now. Suppose I want to sort an array of integers, what would be the result using these languages?
They can be divided into the following categories:
C and Hare will ask you to write:
int cmp_asc_int ( const void *a, const void *b) {
return *( int*)a > *( int*)b;
}
qsort( array, len, sizeof( int), cmp_asc_int);
That is, we pass a function pointer to the sorting routine, which will be called on each comparison.
C++, C#, Rust, Zig optimize routines. When called, it looks like this:
std::sort( array.begin(), array.end());
The idea is that the compiler can emit code for the call. Unlike the function that has to be executed once per call, the comparison operation is usually inlined and the call cost is completely eliminated.
Java is the only one of these languages that takes a different approach. Instead of using generics at compile time, it dispatches code to optimized routines based on runtime types. Of course, this means that the programmer has to write the same sorting code multiple times.
It’s important to note that this is nothing new. There was a discussion when the Go language added generic support, and it can be seen from the benchmark that the generic version has a 20% performance improvement. This is because invocation overhead is avoided and more opportunities for optimization are provided to the compiler.
We can see how a relatively simple decision (to make the language support generics) can have a huge impact on performance.
The opposite view is that we can always specialize code as needed, right? but it is not the truth. If you have generics, you get this behavior for free, but if you don’t, that’s not the case.
I develop databases for a living and we usually analyze the performance of our sorting code at the assembly level. I believe that almost every database developer does this. Sorting performance is critical to all database behavior. I stumbled across an article on Postgres performance optimization that had an interesting topic on this issue. They changed the implementation of sorting from using function pointers to direct calls. You can see the committed code here. Here is a screenshot of the code:
Postgres is 25 years old, and this is a well-known weakness of C relative to C++. Postgres makes a lot of sort calls, and this is an easy place to optimize performance.
As for the optimization effect, from this blog post, it can be seen that the optimization improves the overall performance by 4% to 6%. For those specific routines, the effect is quite astonishing.
For a 25-year-old codebase, a relatively simple change can bring about a 6% performance improvement, which is very rare.
But why would I say it this way?
Because when I read this blog post, the optimizations it mentioned resonated strongly with the previous discussion on generics. This is a good case study for this problem, because if the language (C for Postgres) doesn’t provide generics support in any meaningful way, optimization is hard and expensive.
Modern programming languages that aim for performance should take this into account when doing language design. If you don’t, the user will have to do something similar to what Postgres is doing. As we’ve just seen, these kinds of things are not perfect.
No generics means users have to put performance on the shelf.
In fact, almost all modern programming languages that care about high performance have generics. The one exception I can think of is Java, because it opted for backward compatibility when adding generics.
I took this article as a supplemental conclusion to my last article on generic data structures, and I think the end result is obvious. If you want a high-performance system, you should choose a programming language that allows you to express your logic concisely, and generics are a necessary tool to achieve this simplicity.
The text and pictures in this article are from AI Frontline
This article is reprinted from https://www.techug.com/post/modern-programming-languages-need-generics/
This site is for inclusion only, and the copyright belongs to the original author.