In the previous article, “Understanding Go Inline Optimization by Example”, we explored the inline optimization that the Go compiler performs during compilation. Inline optimization works on an intermediate representation (IR), but there is more than one IR in the Go compilation process. This is consistent with the explanation at the beginning of Chapter 6, “Intermediate Code Generation”, of the Dragon Book (“Compilers: Principles, Techniques, and Tools”, Second Edition): in the process of translating a program in a given source language into code for a specific target machine, a compiler may construct a series of intermediate representations (IRs), as shown below:
High-level intermediate representations are closer to the source language, while low-level intermediate representations are closer to the target machine. In the Go compilation process, if the IR used for inline optimization counts as the high-level intermediate representation, then the low-level intermediate representation is the intermediate code in static single assignment (SSA) form.
In this article, we continue down the road of Go compiler back-end optimization and get to know static single assignment (SSA).
1. History of Static Single Assignment (SSA)
Static Single Assignment (SSA), also known as Single Static Assignment, is a representation of intermediate code (IR), or more precisely a property of intermediate code. It was proposed in 1988 by the researchers Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck.
IRs with SSA properties all have the following characteristics:
- Every variable needs to be defined before it can be used
- Each variable is assigned exactly once (making a variable’s value independent of its position in the program)
Here is a simple example (pseudocode):
y = 1
y = 2
x = y
Converted to SSA form as:
y1 = 1
y2 = 2
x1 = y2
We see that, since SSA requires each variable to be assigned only once, after the conversion the variable y is represented by y1 and y2; the larger the serial number, the newer the version of y. From these three lines we can also see that, at the SSA level, the line y1 = 1 is dead code, that is, code that has no effect on the result, and it can be eliminated during intermediate code optimization.
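For example, after a dead-code elimination step on the SSA form above, only the following would remain (a minimal illustration in the same pseudocode):

y2 = 2
x1 = y2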
In 1991, Ron Cytron and Jeanne Ferrante, also from IBM Research, together with the three researchers above, gave a fast algorithm for building SSA, which further promoted the rapid application of SSA in the compiler field.
SSA was proposed, and subsequently became popular, precisely because intermediate code in SSA form leaves plenty of room for optimization: based on SSA, new compiler optimization algorithms become possible and existing ones can be strengthened. As a result, the compiler backends of mainstream languages have gradually adopted SSA, including GCC, LLVM, the HotSpot JVM, V8, and others, and SSA has become a de facto standard for IR representation.
So when did the Go language first get involved with SSA? Let’s read on.
2. Go and SSA
Compared with GCC and LLVM, the Go compiler is still relatively young, so SSA has not been part of Go for very long.
Work on Go SSA started before the Go 1.5 release achieved bootstrapping. At the beginning of February 2015, Dr. Keith Randall, a core member of the Go team responsible for the compiler backend, proposed a plan for Go to support SSA on the golang-dev Google group:
“I’d like to move from the current syntax-tree-based IR to a more modern SSA-based IR. With an SSA IR, we can implement lots of optimizations that are hard to do in the current compiler.” - Keith Randall
At the same time, Dr. Keith Randall also wrote the “New SSA Backend for the Go Compiler” document, which specifically introduced the reasons for Go to support SSA and the implementation plan in several steps.
Dr. Keith Randall gave three reasons for implementing an SSA IR himself, rather than converting to the IR forms supported at the time by GCC, LLVM, etc. and relying on those mature backends for intermediate code optimization:
- Compilation speed: the Go team and community place a special premium on compilation speed. Randall’s goal was to design a linear-time SSA algorithm to achieve fast SSA optimization, whereas the IRs of GCC and LLVM clearly do not give speed this extra consideration;
- Functional completeness: the Go runtime requires an accurate stack frame map to support GC and stack copying, which GCC and LLVM do not provide;
- The experience of Go core developers: if the IRs of LLVM or GCC were used, Go core developers would also have to depend on LLVM or GCC when building Go, and it would be hard to call that extra dependency a friendly experience.
On March 1, 2016, Keith Randall merged the dev.ssa branch supporting SSA into the mainline of the Go project, just after the Go 1.7 master branch was opened.
Go 1.7 was the first release to officially support SSA, but due to time constraints, Go 1.7 SSA only supported optimizations for the amd64 architecture. Even so, Keith Randall’s benchmarks showed a 12% performance improvement and roughly 13% smaller generated code:
Figure: go 1.7 benchmark (picture from Dr. Keith’s slide)
When Go 1.7 was officially released, its release notes stated that the performance of Go programs improved by 5%-35% thanks to SSA support. From this point of view, the Go SSA implementation achieved Dr. Keith Randall’s expected goal and laid the foundation for the subsequent continuous optimization of the Go compiler.
In the Go 1.8 release in February 2017, Go SSA support was extended to all other CPU architectures supported by Go, including arm, arm64, mips, mips64, ppc64, and so on.
After understanding the evolution of Go SSA, let’s briefly talk about the implementation of SSA in the Go compiler.
3. Converting to SSA
Let’s first take a look at where the conversion to SSA and SSA optimizations are in the compilation process:
Figure: The link of Go SSA (picture from Dr. Keith’s slide)
The picture above is from Dr. Keith’s slides at the GopherCon 2017 conference. It shows where the SSA form is generated and where SSA optimization happens. However, in newer Go versions (e.g. Go 1.19), there is an additional IR, different from the original abstract syntax tree, that exists before the conversion to SSA, and SSA is converted from this IR.
From the code point of view, the conversion of this IR to SSA form happens in the following place (Go 1.19 code; the location and content may differ in other versions):
// $GOROOT/src/cmd/compile/internal/gc/main.go
func Main(archInit func(*ssagen.ArchInfo)) {
	base.Timer.Start("fe", "init")

	defer handlePanic()

	archInit(&ssagen.Arch)
	... ...

	// Compile top level functions.
	// Don't use range--walk can add functions to Target.Decls.
	base.Timer.Start("be", "compilefuncs")
	fcount := int64(0)
	for i := 0; i < len(typecheck.Target.Decls); i++ {
		if fn, ok := typecheck.Target.Decls[i].(*ir.Func); ok {
			// Don't try compiling dead hidden closure.
			if fn.IsDeadcodeClosure() {
				continue
			}
			enqueueFunc(fn)
			fcount++
		}
	}
	base.Timer.AddEvent(fcount, "funcs")

	compileFunctions()
	... ...
}
In Main, we see that the code puts all of the functions in Target.Decls into a queue (compilequeue) via enqueueFunc, then calls compileFunctions to convert each function from AST IR to SSA form. compileFunctions lives in compile.go, and its implementation is as follows:
// $GOROOT/src/cmd/compile/internal/gc/compile.go
func compileFunctions() {
	if len(compilequeue) == 0 {
		return
	}
	... ...

	// By default, we perform work right away on the current goroutine
	// as the solo worker.
	queue := func(work func(int)) {
		work(0)
	}
	... ...

	var compile func([]*ir.Func)
	compile = func(fns []*ir.Func) {
		wg.Add(len(fns))
		for _, fn := range fns {
			fn := fn
			queue(func(worker int) {
				ssagen.Compile(fn, worker)
				compile(fn.Closures)
				wg.Done()
			})
		}
	}

	types.CalcSizeDisabled = true // not safe to calculate sizes concurrently
	base.Ctxt.InParallel = true

	compile(compilequeue)
	... ...
}
In compileFunctions we see that the compiler takes each function, still in AST IR form, from compilequeue and calls ssagen.Compile to compile it into SSA form. Here is the code of ssagen.Compile:
// $GOROOT/src/cmd/compile/internal/ssagen/pgen.go

// Compile builds an SSA backend function,
// uses it to generate a plist,
// and flushes that plist to machine code.
// worker indicates which of the backend workers is doing the processing.
func Compile(fn *ir.Func, worker int) {
	f := buildssa(fn, worker)
	// Note: check arg size to fix issue 25507.
	if f.Frontend().(*ssafn).stksize >= maxStackSize || f.OwnAux.ArgWidth() >= maxStackSize {
		largeStackFramesMu.Lock()
		largeStackFrames = append(largeStackFrames, largeStack{locals: f.Frontend().(*ssafn).stksize, args: f.OwnAux.ArgWidth(), pos: fn.Pos()})
		largeStackFramesMu.Unlock()
		return
	}
	pp := objw.NewProgs(fn, worker)
	defer pp.Free()
	genssa(f, pp)
	// Check frame size again.
	// The check above included only the space needed for local variables.
	// After genssa, the space needed includes local variables and the callee arg region.
	// We must do this check prior to calling pp.Flush.
	// If there are any oversized stack frames,
	// the assembler may emit inscrutable complaints about invalid instructions.
	if pp.Text.To.Offset >= maxStackSize {
		largeStackFramesMu.Lock()
		locals := f.Frontend().(*ssafn).stksize
		largeStackFrames = append(largeStackFrames, largeStack{locals: locals, args: f.OwnAux.ArgWidth(), callee: pp.Text.To.Offset - locals, pos: fn.Pos()})
		largeStackFramesMu.Unlock()
		return
	}
	pp.Flush() // assemble, fill in boilerplate, etc.
	// fieldtrack must be called after pp.Flush. See issue 20014.
	fieldtrack(pp.Text.From.Sym, fn.FieldTrack)
}
The complete implementation of Compile is shown above. In Compile, it is the buildssa function that is actually responsible for generating intermediate code with the SSA property. buildssa has nearly 300 lines of code and is a bit complicated, so only its key parts are excerpted here:
// $GOROOT/src/cmd/compile/internal/ssagen/ssa.go

// buildssa builds an SSA function for fn.
// worker indicates which of the backend workers is doing the processing.
func buildssa(fn *ir.Func, worker int) *ssa.Func {
	name := ir.FuncName(fn)
	... ...

	// Convert the AST-based IR to the SSA-based IR
	s.stmtList(fn.Enter)
	s.zeroResults()
	s.paramsToHeap()
	s.stmtList(fn.Body)

	// fallthrough to exit
	if s.curBlock != nil {
		s.pushLine(fn.Endlineno)
		s.exit()
		s.popLine()
	}
	... ...

	// Main call to ssa package to compile function
	ssa.Compile(sf)
	... ...
}
We will come back to the ssa.Compile call inside buildssa later, since it involves multiple rounds (passes) of SSA optimization. For now, let’s look at the conversion from AST-based IR to SSA-based IR: both fn.Enter and fn.Body are essentially lists of ir.Node, and stmtList converts these nodes into SSA form one by one. Go provides a visual SSA dump tool that lets us see this more intuitively.
Go is an imperative programming language, and this programming paradigm has three typical control structures: the sequential structure, the selection structure and the loop structure. Let’s first look at how the simplest of these, the sequential structure, is translated into SSA:
// github.com/bigwhite/experiments/tree/master/ssa-examples/sequential.go
package main

func sum(a, b, c int) int {
	d := a + b
	e := d + c
	return e
}

func main() {
	println(sum(1, 2, 3))
}
We generate the SSA dump of the sum function with the following commands:
$ GOSSAFUNC=sum go build sequential.go
dumped SSA to ./ssa.html
$ mv ssa.html ssa-sequential.html
$ open ./ssa-sequential.html
The above open command will open the browser locally and display the ssa-sequential.html page:
In the picture above, the leftmost column is the source code (it is displayed twice, which looks like a bug), the middle column is the AST-form IR, and the rightmost box is the first version of the SSA generated by the Go compiler. For easier explanation, it is pasted below:
// github.com/bigwhite/experiments/tree/master/ssa-examples/ssa-sequential.html
b1:
v1 (?) = InitMem <mem>
v2 (?) = SP <uintptr>
v3 (?) = SB <uintptr>
v4 (?) = LocalAddr <*int> {a} v2 v1
v5 (?) = LocalAddr <*int> {b} v2 v1
v6 (?) = LocalAddr <*int> {c} v2 v1
v7 (?) = LocalAddr <*int> {~r0} v2 v1
v8 (3) = Arg <int> {a} (a[int])
v9 (3) = Arg <int> {b} (b[int])
v10 (3) = Arg <int> {c} (c[int])
v11 (?) = Const64 <int> [0]
v12 (+4) = Add64 <int> v8 v9 (d[int])
v13 (+5) = Add64 <int> v12 v10 (e[int])
v14 (+6) = MakeResult <int,mem> v13 v1
Ret v14 (+6)

name a[int]: v8
name b[int]: v9
name c[int]: v10
name d[int]: v12
name e[int]: v13
From a structural point of view, this SSA dump is divided into two parts: one part is the blocks composed of b1 and the Ret, and the other part is the correspondence between named variables and SSA values.
In SSA, a block represents a basic block in the function’s control flow graph. From the code comments, SSA has four documented block kinds: Plain, If, Exit and Defer:
// $GOROOT/src/cmd/compile/internal/ssa/block.go

// BlockKind is the kind of SSA block.
//
//	  kind           controls         successors
//	------------------------------------------
//	  Exit      [return mem]                 []
//	  Plain               []             [next]
//	  If      [boolean Value]      [then, else]
//	  Defer            [mem]  [nopanic, panic]  (control opcode should be OpStaticCall to runtime.deferproc)
type BlockKind int16
But the actual set of BlockKind constants is larger than this comment suggests. opGen.go is an automatically generated file containing dozens of BlockKind constants; even after filtering out the CPU-architecture-specific ones, 8 remain (from BlockPlain to BlockFirst):
// $GOROOT/src/cmd/compile/internal/ssa/opGen.go
const (
	BlockInvalid BlockKind = iota
	... ...
	BlockPlain
	BlockIf
	BlockDefer
	BlockRet
	BlockRetJmp
	BlockExit
	BlockJumpTable
	BlockFirst
)
In the SSA code example of the sum function above, b1 should be of type Plain, and Ret is obviously of type BlockRet.
A Plain block is made up of a set of values, and the value is the basic element of SSA. According to the definition of SSA, a value is defined exactly once, but can be used any number of times. A value mainly consists of a unique identifier, an operator, a type and some arguments. The strings returned by the LongString and LongHTML methods of the Value type illustrate the format of a value well; in particular, LongHTML is the method that renders values in the ssa.html output:
// $GOROOT/src/cmd/compile/internal/ssa/value.go

// long form print.  v# = opcode <type> [aux] args [: reg] (names)
func (v *Value) LongString() string {
	... ...
}

// $GOROOT/src/cmd/compile/internal/ssa/html.go
func (v *Value) LongHTML() string {
	// TODO: Any intra-value formatting?
	// I'm wary of adding too much visual noise,
	// but a little bit might be valuable.
	// We already have visual noise in the form of punctuation
	// maybe we could replace some of that with formatting.
	s := fmt.Sprintf("<span class=\"%s ssa-long-value\">", v.String())

	linenumber := "<span class=\"no-line-number\">(?)</span>"
	if v.Pos.IsKnown() {
		linenumber = fmt.Sprintf("<span class=\"l%v line-number\">(%s)</span>", v.Pos.LineNumber(), v.Pos.LineNumberHTML())
	}

	s += fmt.Sprintf("%s %s = %s", v.HTML(), linenumber, v.Op.String())

	s += " <" + html.EscapeString(v.Type.String()) + ">"
	s += html.EscapeString(v.auxString())

	for _, a := range v.Args {
		s += fmt.Sprintf(" %s", a.HTML())
	}
	r := v.Block.Func.RegAlloc
	if int(v.ID) < len(r) && r[v.ID] != nil {
		s += " : " + html.EscapeString(r[v.ID].String())
	}
	var names []string
	for name, values := range v.Block.Func.NamedValues {
		for _, value := range values {
			if value == v {
				names = append(names, name.String())
				break // drop duplicates.
			}
		}
	}
	if len(names) != 0 {
		s += " (" + strings.Join(names, ", ") + ")"
	}

	s += "</span>"
	return s
}
Take the value v12 from the dump above as an example:
v12 (+4) = Add64 <int> v8 v9 (d[int])
- v12 is the unique identifier of the value, where 12 is the ID, and the ID is an integer starting from 1;
- (+4) is the line number of the corresponding source code;
- Add64 is the operator;
- <int> is the type of the value (v.Type);
- v8, v9 are the parameters of the Add64 operator;
- (d[int]) is the LocalSlot corresponding to v12. A LocalSlot represents a location in the stack frame, used to identify and store input parameters, output parameters or other local variable nodes.
The other part of the SSA dump output is the correspondence between named variables and SSA values, in the format name LocalSlot: value:
name a[int]: v8
name b[int]: v9
name c[int]: v10
name d[int]: v12
name e[int]: v13
The code that outputs the second part of the above is as follows:
// $GOROOT/src/cmd/compile/internal/ssa/print.go
func (p stringFuncPrinter) named(n LocalSlot, vals []*Value) {
	fmt.Fprintf(p.w, "name %s: %v\n", n, vals)
}
In a sequential structure, execution flows from top to bottom and each block has only one successor, so this kind of SSA conversion is relatively easy to understand.
Next, let’s look at a selection control structure: the SSA of the if statement. Here is our example Go source code:
// github.com/bigwhite/experiments/tree/master/ssa-examples/selection_if.go
package main

func foo(b bool) int {
	if b {
		return 2
	}
	return 3
}

func main() {
	println(foo(true))
}
We output the SSA intermediate code of function foo by the following command:
$ GOSSAFUNC=foo go build selection_if.go
dumped SSA to ./ssa.html
$ mv ssa.html ssa-selection-if.html
$ open ./ssa-selection-if.html
The open command launches the browser to display the SSA form of the foo function:
With the understanding of the Go SSA format gained above, this SSA code is easier to analyze. There are multiple blocks in this SSA, including a plain block, an if block and ret blocks; here we focus on how SSA handles the if statement.
In classic SSA conversion theory, an if branch is converted into SSA code containing a Φ function (as shown in the figure below):
Figure: SSA conversion of if statement (picture from Dr. Keith’s slide)
The Φ function (the Greek letter phi) marks a merge point in the code: it brings together the execution paths of the n blocks preceding it. It is used only for code analysis; Φ functions do not exist in the final generated code. The algorithms for deciding where to insert Φ functions are too theoretical to expand on here.
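To make the idea concrete, here is a minimal pseudocode sketch (in the same spirit as the earlier pseudocode, not actual Go compiler output) of the classic SSA conversion of an assignment made on both branches of an if:

// before SSA conversion
if b {
    x = 2
} else {
    x = 3
}
return x

// after classic SSA conversion
b1: If b → b2 b3
b2: x1 = 2            // then branch
    → b4
b3: x2 = 3            // else branch
    → b4
b4: x3 = Φ(x1, x2)    // merge point: x3 is x1 if we came from b2, x2 if from b3
    return x3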
Now let’s look at how Go actually handles the if statement:
b1:
v1 (?) = InitMem <mem>
v2 (?) = SP <uintptr>
v3 (?) = SB <uintptr>
v4 (?) = LocalAddr <*bool> {b} v2 v1
v5 (?) = LocalAddr <*int> {~r0} v2 v1
v6 (3) = Arg <bool> {b} (b[bool])
v7 (?) = Const64 <int> [0]
v8 (?) = Const64 <int> [2]
v11 (?) = Const64 <int> [3]
If v6 → b3 b2 (4)

b2: ← b1
v13 (7) = Copy <mem> v1
v12 (7) = MakeResult <int,mem> v11 v13
Ret v12 (+7)

b3: ← b1
v10 (5) = Copy <mem> v1
v9 (5) = MakeResult <int,mem> v8 v10
Ret v9 (+5)

name b[bool]: v6
The key here is the If block: If tests the value of v6 (the variable b); if it is true, execution flows to block b3, otherwise it flows to block b2.
The b2 and b3 blocks that follow also record their predecessor block (← b1): taking b2 as an example, when control flow arrives from block b1, the code of that block is executed. A switch-based selection statement is more complicated; interested readers can take a look at ssa-selection-switch.html, or at the rough sketch below.
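As a sketch only (not the actual dump from ssa-selection-switch.html), a small switch over an integer typically lowers to a chain of If blocks, each testing one case and flowing either to that case’s body or to the next test; large dense switches may instead use a jump-table block (the BlockJumpTable constant listed earlier):

b1: If (x == 1) → bcase1 b2     // case 1?
b2: If (x == 2) → bcase2 b3     // case 2?
b3: ...                         // default branch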
Let’s take a final look at the loop structure, here’s the Go code:
// github.com/bigwhite/experiments/tree/master/ssa-examples/for_loop.go
package main

func sumN(n int) int {
	var r int
	for i := 1; i <= n; i++ {
		r = r + i
	}
	return r
}

func main() {
	println(sumN(10))
}
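Following the same steps as before, we can dump the SSA of sumN (the renamed html file name below is just for convenience):

$ GOSSAFUNC=sumN go build for_loop.go
dumped SSA to ./ssa.html
$ mv ssa.html ssa-for-loop.html
$ open ./ssa-for-loop.html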
The generated SSA is as follows:
We see that the loop structure has more SSA blocks and more complicated control flow among them. Converted into a graph, it looks roughly like this:
We see that whether it is a selection structure or a loop structure, SSA essentially builds the control flow graph of the function: each node in the graph is a block, and the function’s execution control flows between the blocks. The subsequent SSA-based optimizations rest on two features: each value in a block is assigned only once, and the blocks form a control flow graph.
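To tie this back to the loop example, here is a rough pseudocode sketch (an approximation for illustration, not the actual ssa.html dump) of sumN’s control flow graph, where the loop variable and the accumulator each get a Φ-style merge at the loop header:

b1: entry               // i0 = 1, r0 = 0
b2: loop header         // i1 = Φ(i0, i2), r1 = Φ(r0, r2)
    If i1 <= n → b3 b4
b3: loop body           // r2 = r1 + i1; i2 = i1 + 1
    → b2                // back edge to the header
b4: exit
    return r1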
Next, let’s take a brief look at what optimizations Go has done based on SSA IR.
4. Multiple rounds (passes) of optimization based on SSA
The ssa.Compile call in the buildssa function performs multiple rounds (passes) of optimization on the SSA IR:
// $GOROOT/src/cmd/compile/internal/ssa/compile.go
func Compile(f *Func) {
	... ...
	for _, p := range passes {
		... ...
		tStart := time.Now()
		p.fn(f)
		tEnd := time.Now()
		... ...
	}
}
We see that, for a given function, the Compile function runs it through the preset passes one after another. What are these passes? Let’s take a look:
// $GOROOT/src/cmd/compile/internal/ssa/compile.go

// list of passes for the compiler
var passes = [...]pass{
	{name: "number lines", fn: numberLines, required: true},
	{name: "early phielim", fn: phielim},
	{name: "early copyelim", fn: copyelim},
	{name: "early deadcode", fn: deadcode}, // remove generated dead code to avoid doing pointless work during opt
	{name: "short circuit", fn: shortcircuit},
	{name: "decompose user", fn: decomposeUser, required: true},
	{name: "pre-opt deadcode", fn: deadcode},
	... ...
	{name: "regalloc", fn: regalloc, required: true}, // allocate int & float registers + stack slots
	{name: "loop rotate", fn: loopRotate},
	{name: "stackframe", fn: stackframe, required: true},
	{name: "trim", fn: trim}, // remove empty blocks
}
A rough count gives about 50 passes (including multiple rounds of dead-code cleanup). The code executed by each pass lives in the $GOROOT/src/cmd/compile/internal/ssa directory, and the dumped ssa.html also lets us view the SSA obtained after each pass. Taking ssa-sequential.html as an example, the multiple rounds of optimization look like this:
Clicking one of the bold pass titles on the page (for example lower, cse or deadcode) displays the SSA code produced at that step, and the last box is the assembly code for the target architecture generated from the SSA.
Each pass has its own purpose. Take cse, which stands for Common Subexpression Elimination, as an example. The following illustrates a cse optimization:
y = x + 5
... ...
z = x + 5
After cse optimization (on the premise that the value of x has not changed in between):
y = x + 5
... ...
z = y
In this example, one round of cse saves an unnecessary addition (the second x + 5). A single addition looks insignificant, but accumulated over a whole program such savings add up to a non-trivial performance improvement.
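As a rough Go-level illustration only (the compiler actually works on SSA values rather than source text, and whether the elimination fires depends on the surrounding code), the following hypothetical function computes the index expression idx*stride twice, which a cse pass can collapse into a single multiplication:

// hypothetical example: idx*stride appears twice with nothing changing
// idx or stride in between, so after cse both index expressions can
// share one multiplication value.
package main

func pair(s []int, idx, stride int) (int, int) {
	a := s[idx*stride]
	b := s[idx*stride+1]
	return a, b
}

func main() {
	a, b := pair([]int{10, 20, 30, 40}, 1, 2)
	println(a, b)
}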
If you are interested in a particular pass, you can study it in depth by comparing the code in the $GOROOT/src/cmd/compile/internal/ssa directory with the SSA generated in the browser.
5. Summary
The logic behind a compiler is never easy to understand. This article has introduced the history of SSA and its adoption by the Go compiler, the code paths that drive SSA conversion and optimization in the Go compiler, and the form and generation process of Go’s SSA, hopefully opening a door for you. But to really understand the details of SSA conversion and the SSA-based optimization passes, carefully reading the SSA-related papers and materials (see References) as well as the related code is indispensable.
The code involved in this article can be downloaded here .
6. References
- Compilers: Principles, Techniques, and Tools (Second Edition) – https://ift.tt/2CFvgdH
- SSA: Static Single-Assignment Form – https://ift.tt/1PanKZE
- Static Single Assignment Book – https://ift.tt/LIQiSAT
- Static single-assignment form – https://ift.tt/NxEzX9c
- GopherCon 2017: Keith Randall – Generating Better Machine Code with SSA – https://ift.tt/1DjSYm6
- Generating Better Machine Code with SSA(slide) – https://ift.tt/d16FTfJ
- New SSA Backend for the Go Compiler – https://ift.tt/p4S9r8Z
© 2022, bigwhite . All rights reserved.
This article is reprinted from https://tonybai.com/2022/10/21/understand-go-ssa-by-example/