Original link: https://colobu.com/2022/09/06/string-byte-convertion/
During the development of Go 1.19, reflect.SliceHeader and reflect.StringHeader went through a life-and-death struggle. Both types were at one point marked as deprecated, but they are widely used to convert efficiently between []byte and string. Deprecating them before any alternative existed would have left those users stranded, so the deprecation marks were removed again. Barring surprises, they will be marked as deprecated again in Go 1.20.
Conversion optimization for byte slice and string
Converting directly with string(bytes) or []byte(str) copies the data, which hurts performance. So in scenarios that chase maximum performance, people use a "hack" for these two conversions. For example, k8s does the following:
https://ift.tt/xO1eyal
```go
// toBytes performs unholy acts to avoid allocations
func toBytes(s string) []byte {
	return *(*[]byte)(unsafe.Pointer(&s))
}

// toString performs unholy acts to avoid allocations
func toString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}
```
A more common approach is the following (rpcx uses it too):
```go
func SliceByteToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func StringToSliceByte(s string) []byte {
	x := (*[2]uintptr)(unsafe.Pointer(&s))
	h := [3]uintptr{x[0], x[1], x[1]}
	return *(*[]byte)(unsafe.Pointer(&h))
}
```
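The two string-to-slice helpers differ in how they fill the capacity. The k8s toBytes reinterprets a two-word string header as a three-word slice header, so the resulting Cap is whatever word happens to follow the string header in memory. The rpcx version copies Data and Len into an explicit three-word array and repeats Len as Cap. The small program below is a sketch (main and the sample string exist only for illustration) that checks the rpcx-style result has len == cap. It only reads the bytes, because memory backing a string must never be written to.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringToSliceByte is the rpcx-style conversion quoted above.
// It reads the string header as two words (Data, Len) and builds a
// three-word slice header {Data, Len, Cap} with Cap set to Len.
func StringToSliceByte(s string) []byte {
	x := (*[2]uintptr)(unsafe.Pointer(&s))
	h := [3]uintptr{x[0], x[1], x[1]}
	return *(*[]byte)(unsafe.Pointer(&h))
}

func main() {
	s := "hello, world"
	b := StringToSliceByte(s) // shares s's bytes: read only, never write
	fmt.Println(len(b), cap(b)) // 12 12
	fmt.Println(string(b) == s) // true
}
```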
Even the standard library does it this way:
https://ift.tt/GV6yXWK
```go
func Clone(s string) string {
	if len(s) == 0 {
		return ""
	}
	b := make([]byte, len(s))
	copy(b, s)
	return *(*string)(unsafe.Pointer(&b))
}
```
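This is the implementation of strings.Clone, added in Go 1.18: it allocates and fills a fresh buffer once, then hands that buffer over as the string without a second copy. The sketch below (cloneDemo is a hypothetical helper, and unsafe.StringData assumes Go 1.20 or later) shows why a guaranteed copy is useful: a substring keeps sharing the large original's backing array, while a clone gets its own small allocation, so holding on to it no longer keeps the large string alive.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// cloneDemo is a hypothetical helper that compares data pointers with
// unsafe.StringData (Go 1.20+) to see which strings share memory.
func cloneDemo() (subShares, cloneShares, equal bool) {
	big := strings.Repeat("x", 1<<20) // 1 MiB string
	sub := big[:5]                    // substring: same backing array as big
	owned := strings.Clone(sub)       // fresh 5-byte allocation

	subShares = unsafe.StringData(sub) == unsafe.StringData(big)
	cloneShares = unsafe.StringData(owned) == unsafe.StringData(big)
	equal = owned == sub
	return
}

func main() {
	fmt.Println(cloneDemo()) // true false true
}
```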
Because a byte slice and a string have similar underlying structures, this "hack" can force one into the other. The two structures are defined in the reflect package:
```go
type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}

type StringHeader struct {
	Data uintptr
	Len  int
}
```
SliceHeader has one more field, Cap, than StringHeader. In both cases the data itself lives in an underlying array, and the Data field of each structure holds a pointer to that array.
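The layout claim can be checked directly. The sketch below uses two hypothetical helpers: headerSizes measures both headers in machine words, and headerWords shows that a string produced by the pointer cast has the same first two words (Data and Len) as the slice it came from, while the slice carries a third word, Cap.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerSizes (hypothetical helper) reports header sizes in machine words.
func headerSizes() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

// headerWords (hypothetical helper) views a slice header and a string
// cast from it as raw words.
func headerWords() (str [2]uintptr, slice [3]uintptr) {
	b := []byte("gopher")
	s := *(*string)(unsafe.Pointer(&b)) // takes only Data and Len from b
	str = *(*[2]uintptr)(unsafe.Pointer(&s))
	slice = *(*[3]uintptr)(unsafe.Pointer(&b))
	return
}

func main() {
	fmt.Println(headerSizes()) // 2 3
	str, slice := headerWords()
	fmt.Println(str[0] == slice[0], str[1] == slice[1]) // true true
	fmt.Println(str[1], slice[2] >= slice[1])           // 6 true
}
```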
The new way in Go 1.20
Many projects use the method above to improve performance, but it relies on unsafe and is quite risky: after the forced conversion, the string and the slice share the same memory, so a later change through the slice can overwrite data the string still refers to, or the memory can be reclaimed unexpectedly. This often leads to surprising problems. When I built RedisProxy with this method, I made a similar mistake and at first thought it was a bug in the standard library.
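A minimal sketch of the first kind of problem (bytesToString and aliasDemo are hypothetical names; bytesToString is the same cast as the toString helper above): after a zero-copy conversion, the string and the slice share one backing array, so writing to the slice silently changes a string that Go code assumes is immutable. An ordinary string(b) conversion copies and is unaffected. The reverse direction can be worse: writing through a []byte obtained from a string literal can crash the program, because literals may live in read-only memory.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString (hypothetical name) is a zero-copy conversion:
// the result reuses b's memory.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// aliasDemo (hypothetical helper) writes to a slice after converting it both ways.
func aliasDemo() (copied, aliased string) {
	b := []byte("hello")
	copied = string(b)         // ordinary conversion: copies the bytes
	aliased = bytesToString(b) // "hack": shares b's backing array
	b[0] = 'j'                 // write to the slice after converting
	return copied, aliased
}

func main() {
	copied, aliased := aliasDemo()
	fmt.Println(copied)  // hello
	fmt.Println(aliased) // jello: the "immutable" string changed
}
```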
Therefore, the Go team plans to deprecate SliceHeader and StringHeader in 1.20 to prevent this kind of misuse.
Deprecating them requires an alternative, and Go 1.20 provides one: the functions String, StringData and SliceData are added to the unsafe package, joining Slice (available since Go 1.17), to perform these conversions efficiently:
- func Slice(ptr *ArbitraryType, len IntegerType) []ArbitraryType: returns a slice whose underlying array starts at ptr and whose length and capacity are both len
- func SliceData(slice []ArbitraryType) *ArbitraryType: returns a pointer to the slice's underlying array
- func String(ptr *byte, len IntegerType) string: returns a string whose underlying bytes start at ptr and whose length is len
- func StringData(str string) *byte: returns a pointer to the string's underlying bytes
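With these functions, both conversions can be written without relying on the header layout at all. The helpers below are a sketch assuming Go 1.20 or later; the names StringToBytes and BytesToString are hypothetical, and the same unsafe calls appear in the benchmark later in this article. The usual rules still apply: the []byte returned by StringToBytes must never be written to, and b must not be modified while the string returned by BytesToString is in use.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringToBytes (hypothetical name) returns a []byte that shares s's
// memory (no copy). Length and capacity are both len(s). Never write to it.
func StringToBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// BytesToString (hypothetical name) returns a string that shares b's
// memory (no copy). b must not be modified while the string is in use.
func BytesToString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := StringToBytes("gopher")
	fmt.Println(len(b), cap(b)) // 6 6

	s := BytesToString([]byte("go1.20"))
	fmt.Println(s) // go1.20

	// Empty inputs are fine: a zero length never panics in these calls.
	fmt.Println(len(StringToBytes("")), BytesToString(nil) == "") // 0 true
}
```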
These four functions look primitive and low-level.
The change was submitted by cuiweixie. Because it touches a very basic, low-level part of the implementation and the functions are likely to be widely used, it needs very careful review. You can follow it at go-review#427095.
The change even caught the attention of Rob Pike, who had been quiet for many months. He asked why there was only an implementation and no documentation comments: #54858. The reason, of course, is that the change was still under development and review, but it shows how seriously Rob Pike takes it.
cuiweixie also rewrote some code in the standard library to use these unsafe functions.
Performance Testing
Although cuiweixie's commit has not been merged into the master branch yet and things may still change, I found that these functions are already available in gotip. My understanding is that gotip tracks the master branch; is that right?
Anyway, let’s write a benchmark first:
```go
package strslice

import (
	"bytes"
	"strings"
	"testing"
	"unsafe"
)

var L = 1024 * 1024
var str = strings.Repeat("a", L)
var s = bytes.Repeat([]byte{'a'}, L)
var str2 string
var s2 []byte

// Standard conversion: copies all L bytes on every iteration.
func BenchmarkString2Slice(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bt := []byte(str)
		if len(bt) != L {
			b.Fatal()
		}
	}
}

// Header cast. Only len(bt) is valid here: the Cap word is read from
// memory past the two-word string header.
func BenchmarkString2SliceReflect(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bt := *(*[]byte)(unsafe.Pointer(&str))
		if len(bt) != L {
			b.Fatal()
		}
	}
}

// Go 1.20 unsafe API: zero-copy view of str's bytes.
func BenchmarkString2SliceUnsafe(b *testing.B) {
	for i := 0; i < b.N; i++ {
		bt := unsafe.Slice(unsafe.StringData(str), len(str))
		if len(bt) != L {
			b.Fatal()
		}
	}
}

// Standard conversion: copies all L bytes on every iteration.
func BenchmarkSlice2String(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ss := string(s)
		if len(ss) != L {
			b.Fatal()
		}
	}
}

// Header cast: the string takes the slice's Data and Len words.
func BenchmarkSlice2StringReflect(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ss := *(*string)(unsafe.Pointer(&s))
		if len(ss) != L {
			b.Fatal()
		}
	}
}

// Go 1.20 unsafe API: zero-copy string over s's bytes.
func BenchmarkSlice2StringUnsafe(b *testing.B) {
	for i := 0; i < b.N; i++ {
		ss := unsafe.String(unsafe.SliceData(s), len(s))
		if len(ss) != L {
			b.Fatal()
		}
	}
}
```
Actual test results:
```
➜  strslice gotip test -benchmem -bench .
goos: darwin
goarch: arm64
pkg: github.com/smallnest/study/strslice
BenchmarkString2Slice-8              18826       63942    ns/op   1048579 B/op   1 allocs/op
BenchmarkString2SliceReflect-8  1000000000           0.6498 ns/op         0 B/op   0 allocs/op
BenchmarkString2SliceUnsafe-8   1000000000           0.8178 ns/op         0 B/op   0 allocs/op
BenchmarkSlice2String-8              18686       65864    ns/op   1048580 B/op   1 allocs/op
BenchmarkSlice2StringReflect-8  1000000000           0.6488 ns/op         0 B/op   0 allocs/op
BenchmarkSlice2StringUnsafe-8   1000000000           0.9744 ns/op         0 B/op   0 allocs/op
```
As the results show, without the "hack" each conversion takes roughly 64 to 66 µs and allocates about 1 MB (one allocation) per operation. With the reflect-style pointer cast, performance improves dramatically: well under 1 ns/op with zero allocations.
The new unsafe functions deliver a similar improvement. They are slightly slower than the reflect-style cast (0.82 vs 0.65 ns/op and 0.97 vs 0.65 ns/op), a difference that is negligible.