Getting to know the Go language

Original link: https://blog.gotocoding.com/archives/1767?utm_source=rss&utm_medium=rss&utm_campaign=%25e5%2588%259d%25e8%25af%2586go%25e8%25af%25ad%25e8%25a8%2580

Strictly speaking, this is not a first acquaintance. About 15 years ago, I went through the syntax of the Go language once.

Because Go's GC had a poor reputation at the time, I didn’t study the language very seriously; I only learned the syntax roughly.

My impression of Go back then was that, apart from the syntax being a bit weird, nothing else stood out.

This time I took a closer look at Go (four weeks so far). Re-examining the language, I found some things that differ from other languages, and here is how I feel about them.


The first is GC.

Thinking back carefully, Go turns out to be the first compiled language with GC that I have come across (IL2CPP doesn’t count). By “compiled” I don’t mean compiled into bytecode and then interpreted; Go code is actually compiled into native code that executes directly on the CPU.

This leads to a problem: when the Go compiler compiles the code, it inserts GC-related code everywhere.

When debugging at the source level this is generally not much of a problem; the debugger intelligently skips the code inserted by the compiler.

However, when I want to see how a line of code executes at the assembly level (a habit carried over from C, where after writing a line of C you can basically predict the generated unoptimized assembly), I find the output littered with compiler-inserted code, which makes it much less readable.

In contrast, languages that run on virtual machines, such as Lua and Java, have a one-to-one correspondence between opcodes and logic code, and GC-related details are encapsulated inside the virtual machine. This is very helpful when analyzing the entire code execution flow.

Of course, this may be exactly what Go wants, maybe it doesn’t want you to do such low-level optimizations 😀

Next is assembly.

Yes, I was shocked when I learned that Go disassembles to Plan 9 assembly.

This means that even if I can get past the compiler-inserted code, I still can’t see the x86 instructions that are ultimately executed, and I still don’t know how the code actually runs on the CPU.

As the simplest example, everyone says that goroutines have less switching overhead than threads, but I’ve always been skeptical of this claim.

In my experience with x86 assembly, during the compiler’s optimization stage, variables on the stack are moved into registers as much as possible, and even the first few parameters are passed through registers. Let’s look at some simple C code and the corresponding assembly.

```c
int foo(int a, int b)
{
	int e = a / b;
	return a * b * e;
}
```

```asm
foo:
.LFB0:
	.cfi_startproc
	mov	eax, edi
	cdq
	idiv	esi
	imul	edi, esi
	imul	eax, edi
	ret
	.cfi_endproc
```

You can see that the e variable in the foo function is not on the stack, but is directly allocated a register.

This leads to a problem: when a thread is preempted, which registers are live across its entire current call stack is indeterminate.

Therefore, on Linux, when a thread is swapped out, a full set of registers (EAX, EBX, …) must be saved.

But all the Go forums say that goroutine switching is very cheap because it needs to save fewer registers; some even say it only needs to save 3 registers.

At first I believed this. If goroutine switches always happened at function-call boundaries, the ABI’s set of callee-saved registers really could be reduced to 3.

However, I later learned that goroutines can be preempted at any time.

I don’t quite understand this. Plan 9 assembly or not, as long as the code runs on a machine with the x86 instruction set, the optimizer should prefer registers over the stack.

Then, as long as my function uses more than 3 registers, preempting a goroutine inside a `for {}` loop will inevitably require saving the entire register set, so the supposedly lightweight switching would not exist; at most, the stack-space consumption would be smaller.

When I wanted to dig further for the answer, Plan 9 got in the way. I have a hard time determining whether, even if Plan 9’s ABI only preserves 3 registers, the x86 assembly generated from it also preserves only 3. Short of disassembling the final binary to x86, which I clearly don’t know Go well enough to do yet, this question will have to be put on hold.

And I have to say that there is really very little relevant information, whether in Chinese or English.

Go’s slice is an interesting data structure.

Multiple slices sometimes share memory and sometimes don’t, and whether they do can be inferred from the code.

As I understand it, this is basically the result of a performance compromise.

I think this is a positive compromise, because sharing and non-sharing can both be traced. As long as you pay attention, it’s generally not a big problem.
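To make the sharing rule concrete, here is a minimal sketch of my own (not from the original post): two slices share a backing array until `append` grows past the capacity, at which point the sharing silently breaks.

```go
package main

import "fmt"

func main() {
	a := make([]int, 5) // len 5, cap 5
	b := a[1:3]         // len 2, cap 4: b shares a's backing array

	b[0] = 42
	fmt.Println(a[1]) // 42: the write through b is visible in a

	b = append(b, 0, 0, 0) // needs len 5 > cap 4, so append allocates a new array
	b[0] = 7
	fmt.Println(a[1]) // still 42: b no longer shares memory with a
}
```

This is exactly the "infer it from the code" point: sharing depends on the slice expression and on whether any later `append` exceeds the capacity.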

What I’m more curious about is how slices interact with the GC. First, look at a small piece of code:

```go
type slice struct {
	array unsafe.Pointer
	len   int
	cap   int
}

func foo() []int {
	a := make([]int, 5)
	b := a[3:4]
	return b
}
```

In this code, I put the data structure of the slice together with the sample code.

Any Go reference book will tell us that it is approximately equal to the following C code:

```c
struct slice {
	int *array;
	int len;
	int cap;
};

struct slice foo()
{
	struct slice a, b;
	a.array = malloc(5 * sizeof(int));
	a.cap = 5;
	a.len = 5;
	b.array = &a.array[3];
	b.cap = a.cap - 3;
	b.len = 4 - 3;
	return b;
}
```

Everyone knows that Go’s GC is a concurrent tri-color garbage collector.

Now comes the problem: b.array is the result of pointer arithmetic (languages with garbage collection generally avoid supporting pointer arithmetic, because it makes GC harder).

When the GC marks variable b, how does it find the first address of that memory block? I have never figured this out.

I couldn’t find any relevant documents, and it seems nobody is very concerned about this ^_^!


The above are some implementation details. Let’s talk about the design at the language level.

The interface mechanism and CSP synchronization mechanism of the Go language are really refreshing.

As a static language, Go actually implements duck typing, which quite surprised me.

What’s more surprising is that its interface mechanism has another very strange twist.

Here is a piece of code to see the effect:

```go
package main

import "fmt"

type FooBar interface {
	foo()
	bar()
}

type st1 struct {
	FooBar
	n int
}

type st2 struct {
	FooBar
	m int
}

func (s *st1) foo() { fmt.Println("st1.foo", s.n) }
func (s *st1) bar() { fmt.Println("st1.bar", s.n) }
func (s *st2) foo() { fmt.Println("st2.foo", s.m) }

func test(fb FooBar) {
	fb.foo()
	fb.bar()
}

func main() {
	v1 := &st1{n: 1}
	v3 := &st2{
		m:      3,
		FooBar: v1,
	}
	test(v1)
	test(v3)
}

/* Output:
st1.foo 1
st1.bar 1
st2.foo 3
st1.bar 1
*/
```

The output of the first two lines was in fact already predictable once I knew that Go supports duck typing.

But the output of the last two lines is really amazing. This composition mechanism glues together not just two structs but two values. Used well, it may have surprising power.

Of course, there is no free lunch. The entire interface mechanism has runtime overhead, which is incurred when a concrete struct is converted to the corresponding interface value.
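As a rough illustration of where that cost lives (my own sketch, not from the post): the conversion site builds an interface value out of a type descriptor plus a data pointer, and the concrete value is copied, often onto the heap.

```go
package main

import "fmt"

type Shape interface {
	Area() int
}

type rect struct{ w, h int }

func (r rect) Area() int { return r.w * r.h }

func main() {
	r := rect{w: 3, h: 4}
	// The conversion below builds an interface value: a pointer to type
	// metadata plus a pointer to a copy of r. That copy frequently escapes
	// to the heap, and the call through s is dispatched dynamically.
	var s Shape = r
	fmt.Println(s.Area()) // prints 12
}
```

The direct call `r.Area()` can be resolved statically; the call through `s` cannot, which is the trade-off the interface mechanism makes.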

The specific overhead will have to wait until I’m familiar with Plan 9 assembly and the runtime library before I can solve the puzzle.

Let’s take a look at Go’s CSP programming. Go implements CSP programming through channels.

Again, let’s take a look at a small piece of code:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch1 := make(chan int)
	ch2 := make(chan int)
	go func() {
		n := <-ch1
		fmt.Println(n)
		ch2 <- (n + 1)
	}()
	go func() {
		fmt.Println("0")
		ch1 <- 1
		n := <-ch2
		fmt.Println(n)
	}()
	time.Sleep(1 * time.Second)
}
```

No matter how many times it is executed, this code prints strictly in the order “0, 1, 2”.

If you wrote similar code in C using threads and an ordinary message queue, it would not behave this way.

Each time the program runs, it may output different results.

I think this is the essence of CSP (Communicating Sequential Process).

Channels are not just for communication; they are also a means of synchronization. A channel forces the goroutines on its two ends to meet at a certain point before each goes on concurrently. At this rendezvous point, the goroutines on both ends of the channel are synchronized.

In the words of the Go documentation, a send on an unbuffered channel blocks until a receiver on the other end has taken the value.

Of course, Go also provides buffered channels, which behave more like message queues.

But as I understand it, buffered channels are meant for a few special situations; CSP recommends using unbuffered channels.
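A quick sketch of the difference (my own example): with a buffer, sends complete even when no receiver is running, exactly like posting to a small message queue.

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 3) // buffered: sends don't block until the buffer is full
	ch <- 1
	ch <- 2
	ch <- 3 // all three sends complete with no receiver running
	close(ch)
	for n := range ch { // drain the queue in FIFO order
		fmt.Println(n)
	}
}
```

With an unbuffered channel, the first `ch <- 1` would deadlock here, because there is no rendezvous partner.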

Almost all Go reference books emphasize: write concurrent programs, and let parallelism be determined at runtime. This sentence, combined with the concept of CSP, gave me a different feeling.

Taking the above code as an example, if the fmt.Println calls were replaced with heavier, more concrete tasks, the two goroutines would still have no chance to execute in parallel.

The sentence “write concurrent programs; parallelism is determined at runtime” seems to hint: don’t be afraid of CSP reducing parallelism. As long as you spawn enough goroutines, parallelism will quickly rise at runtime. This is why Go keeps encouraging us to write programs with a concurrent structure.

Imagine we have 64 CPU cores and 10,000 goroutines.

Even if every 156 goroutines are chained together by channels and forced to execute serially, the 64 CPU cores will still be fully loaded.
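That intuition can be sketched as follows (the numbers are illustrative, not from the post): fan out far more goroutines than cores and let the scheduler keep the cores busy.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 10000 // far more goroutines than CPU cores
	results := make(chan int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(k int) {
			defer wg.Done()
			results <- k * k // stand-in for a real unit of work
		}(i)
	}
	wg.Wait()
	close(results)
	sum := 0
	for r := range results {
		sum += r
	}
	fmt.Println(sum) // 333283335000: every goroutine ran exactly once
}
```

Even if channels serialize some of these goroutines into chains, the scheduler always has runnable work for every core.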

In CSP mode, the load of the entire system is more balanced, and you won’t see situations where the producer exhausts memory or the consumer starves.

At the same time, in theory, due to the existence of implicit synchronization, there will be fewer concurrent bugs.

The post Getting to know the Go language first appeared on the Return to Chaos Blog .
