The ever-tinkering WaitGroup

Original link: https://colobu.com/2022/08/30/waitgroup-to-love-to-toss/

WaitGroup is a concurrency primitive often used in Go programs to orchestrate tasks. It exposes only a few methods and looks simple to use, yet its internal implementation has changed several times, mainly to optimize the atomic operations on its fields.

The original implementation of WaitGroup

The earliest implementation of WaitGroup is as follows:


type WaitGroup struct {
    m       Mutex
    counter int32
    waiters int32
    sema    *uint32
}

func (wg *WaitGroup) Add(delta int) {
    v := atomic.AddInt32(&wg.counter, int32(delta))
    if v < 0 {
        panic("sync: negative WaitGroup count")
    }
    if v > 0 || atomic.LoadInt32(&wg.waiters) == 0 {
        return
    }
    wg.m.Lock()
    for i := int32(0); i < wg.waiters; i++ {
        runtime_Semrelease(wg.sema)
    }
    wg.waiters = 0
    wg.sema = nil
    wg.m.Unlock()
}

The meaning of each field is fairly clear, but the implementation is still a bit rough; for example, sema is held behind a pointer.
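
For context, the Wait side of this early version looked roughly like this (a lightly reconstructed sketch, not a verbatim quote of the historical source; runtime_Semacquire is the runtime-internal semaphore primitive that pairs with runtime_Semrelease above):

func (wg *WaitGroup) Wait() {
    if atomic.LoadInt32(&wg.counter) == 0 {
        return
    }
    wg.m.Lock()
    atomic.AddInt32(&wg.waiters, 1)
    // Re-check under the lock: Add may have already released everyone.
    if atomic.LoadInt32(&wg.counter) == 0 {
        atomic.AddInt32(&wg.waiters, -1)
        wg.m.Unlock()
        return
    }
    // Lazily allocate the semaphore; this pointer is the roughness
    // mentioned above.
    if wg.sema == nil {
        wg.sema = new(uint32)
    }
    s := wg.sema
    wg.m.Unlock()
    runtime_Semacquire(s)
}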

The next step merged the counter and waiters fields into a single 64-bit word. To guarantee the 8-byte alignment that 64-bit atomic operations require, the aligned 8 bytes inside state1 must be located at runtime, and sema is no longer a pointer.


type WaitGroup struct {
    // 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
    // 64-bit atomic operations require 64-bit alignment, but 32-bit
    // compilers do not ensure it. So we allocate 12 bytes and then use
    // the aligned 8 bytes in them as state.
    state1 [12]byte
    sema   uint32
}

func (wg *WaitGroup) state() *uint64 {
    if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
        return (*uint64)(unsafe.Pointer(&wg.state1))
    } else {
        return (*uint64)(unsafe.Pointer(&wg.state1[4]))
    }
}

Later, WaitGroup settled on the following implementation, which remained stable for several releases:


type WaitGroup struct {
    noCopy noCopy
    // 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
    // 64-bit atomic operations require 64-bit alignment, but 32-bit
    // compilers do not ensure it. So we allocate 12 bytes and then use
    // the aligned 8 bytes in them as state, and the other 4 as storage
    // for the sema.
    state1 [3]uint32
}

// state returns pointers to the state and sema fields stored within wg.state1.
func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
    if uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
        return (*uint64)(unsafe.Pointer(&wg.state1)), &wg.state1[2]
    } else {
        return (*uint64)(unsafe.Pointer(&wg.state1[1])), &wg.state1[0]
    }
}

The state and sema fields are combined into a single state1 array of three uint32 elements, 4 bytes each. Either the first element is 8-byte aligned or the second one is; the aligned 8 bytes hold the counter and waiter count, and the remaining 4 bytes serve as the sema.
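
Why must one of the two windows be aligned? A [3]uint32 is 4-byte aligned, so its address modulo 8 is either 0 or 4. A standalone sketch to see this (the waitGroup type here is a hypothetical copy for illustration, not the real sync.WaitGroup):

package main

import (
    "fmt"
    "unsafe"
)

// waitGroup mimics the Go 1.17-era layout for illustration only.
type waitGroup struct {
    state1 [3]uint32
}

func main() {
    var wg waitGroup
    addr := uintptr(unsafe.Pointer(&wg.state1))
    // addr%8 is either 0 or 4, so one of the two 8-byte windows
    // (elements 0-1 or elements 1-2) is always aligned for the 64-bit
    // state; the leftover uint32 becomes the sema.
    if addr%8 == 0 {
        fmt.Println("state = elements 0-1, sema = element 2")
    } else {
        fmt.Println("state = elements 1-2, sema = element 0")
    }
}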

There is nothing wrong with this implementation, but it is a bit cumbersome: every operation must first check the alignment of state1 to figure out which bytes hold the counter and waiters and which word holds the sema.

A quiz question: what is the maximum number of waiters a WaitGroup can have?
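
The layout itself suggests the answer: the waiter count lives in the low 32 bits of the 64-bit state. A minimal sketch of the packing:

package main

import "fmt"

func main() {
    // Counter lives in the high 32 bits, waiter count in the low 32 bits,
    // so the waiter count can reach at most 1<<32 - 1 before it would
    // overflow into the counter bits.
    var counter int32 = 3
    var waiters uint32 = 2
    state := uint64(counter)<<32 | uint64(waiters)
    fmt.Printf("state=%#016x counter=%d waiters=%d max waiters=%d\n",
        state, int32(state>>32), uint32(state), uint64(1)<<32-1)
}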

Changes in Go 1.18

In Go 1.18, WaitGroup changed again. On 64-bit architectures the compiler guarantees that fields of type uint64 are 8-byte aligned, so the state can simply be a uint64 field:


type WaitGroup struct {
    noCopy noCopy
    // 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
    // 64-bit atomic operations require 64-bit alignment, but 32-bit
    // compilers only guarantee that 64-bit fields are 32-bit aligned.
    // For this reason on 32 bit architectures we need to check in state()
    // if state1 is aligned or not, and dynamically "swap" the field order if
    // needed.
    state1 uint64
    state2 uint32
}

Of course, to remain compatible with 32-bit architectures, the alignment check is still needed:


func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
    if unsafe.Alignof(wg.state1) == 8 || uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
        // state1 is 64-bit aligned: nothing to do.
        return &wg.state1, &wg.state2
    } else {
        // state1 is 32-bit aligned but not 64-bit aligned: this means that
        // (&state1)+4 is 64-bit aligned.
        state := (*[3]uint32)(unsafe.Pointer(&wg.state1))
        return (*uint64)(unsafe.Pointer(&state[1])), &state[0]
    }
}

On linux/amd64, this change brought roughly a 9%–30% performance improvement.
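
If you want to reproduce such numbers yourself, a minimal benchmark along these lines (a sketch; save it in a _test.go file and run it under two Go toolchains, comparing with a tool such as benchstat) exercises the uncontended Add/Done fast path that the layout change mainly affects. The 9%–30% figure comes from the Go change itself, not from this sketch:

package wgbench

import (
    "sync"
    "testing"
)

// BenchmarkWaitGroupAddDone measures the uncontended Add/Done fast path.
func BenchmarkWaitGroupAddDone(b *testing.B) {
    var wg sync.WaitGroup
    for i := 0; i < b.N; i++ {
        wg.Add(1)
        wg.Done()
    }
}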

Changes in Go 1.20

Optimization never stops. In Go 1.19, Russ Cox implemented atomic.Uint64, which is 8-byte aligned on both 64-bit and 32-bit architectures. Why? Because it holds a trump card: align64.


// A Uint64 is an atomic uint64. The zero value is zero.
type Uint64 struct {
    _ noCopy
    _ align64
    v uint64
}

On a 64-bit architecture there is no problem to begin with. On a 32-bit architecture, when the Go compiler sees the align64 field it automatically aligns the enclosing type to 8 bytes. This is a compiler special case reserved for a few standard-library packages; defining an align64 type in your own package and adding it to your struct does nothing.
But if you want your own struct to be 8-byte aligned, you can use the following trick:


import "sync/atomic"

type T struct {
    _ [0]atomic.Int64 // occupies 0 bytes, but forces 8-byte alignment
    x uint64          // x is 8-byte aligned
}
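
A quick standalone check (a sketch; the plain and padded type names are made up here) that the zero-length array really changes the struct's alignment:

package main

import (
    "fmt"
    "sync/atomic"
    "unsafe"
)

type plain struct {
    x uint64
}

type padded struct {
    _ [0]atomic.Int64 // zero bytes, but carries 8-byte alignment
    x uint64
}

func main() {
    // On 64-bit platforms both print 8; on a 32-bit platform plain
    // typically prints 4 while padded still prints 8.
    fmt.Println(unsafe.Alignof(plain{}), unsafe.Alignof(padded{}))
}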

In this way, the implementation of WaitGroup can be simplified to:


type WaitGroup struct {
    noCopy noCopy
    state  atomic.Uint64 // high 32 bits are counter, low 32 bits are waiter count.
    sema   uint32
}

There is no longer any need for a separate state() method; the state field is used directly (race-detector code omitted):


func (wg *WaitGroup) Add(delta int) {
    state := wg.state.Add(uint64(delta) << 32)
    v := int32(state >> 32)
    w := uint32(state)
    if v < 0 {
        panic("sync: negative WaitGroup counter")
    }
    if w != 0 && delta > 0 && v == int32(delta) {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }
    if v > 0 || w == 0 {
        return
    }
    // This goroutine has set counter to 0 when waiters > 0.
    // Now there can't be concurrent mutations of state:
    // - Adds must not happen concurrently with Wait,
    // - Wait does not increment waiters if it sees counter == 0.
    // Still do a cheap sanity check to detect WaitGroup misuse.
    if wg.state.Load() != state {
        panic("sync: WaitGroup misuse: Add called concurrently with Wait")
    }
    // Reset waiters count to 0.
    wg.state.Store(0)
    for ; w != 0; w-- {
        runtime_Semrelease(&wg.sema, false, 0)
    }
}
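
For completeness, the matching Wait in this version looks roughly like this (again a sketch with the race-detector bookkeeping stripped; runtime_Semacquire is the runtime-internal counterpart of runtime_Semrelease):

func (wg *WaitGroup) Wait() {
    for {
        state := wg.state.Load()
        v := int32(state >> 32) // counter
        if v == 0 {
            // Counter is 0, no need to wait.
            return
        }
        // Increment the waiter count in the low 32 bits.
        if wg.state.CompareAndSwap(state, state+1) {
            runtime_Semacquire(&wg.sema)
            if wg.state.Load() != 0 {
                panic("sync: WaitGroup is reused before previous Wait has returned")
            }
            return
        }
    }
}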
