Bit operation notes

Original link:

basic concept

Bit (bit, also known as binary digit) refers to a 1-bit binary number (0 or 1), which is the smallest unit of information in a computer.

Byte: A byte consists of 8 bits.

Skillful use of bit operations can improve the space-time efficiency of our programs.

Integer Storage and Operations in Computers

The following takes 32-bit binary numbers, that is, int and unsigned int types in C++ as an example.

original code, reverse code

A brief introduction:

  1. Original code : The highest bit is the sign bit, the positive number is $0$, the negative number is $1$, and all the remaining bits are the absolute value of the decimal number.

    • Pros: Most intuitive for humans.
    • Disadvantage: Cannot convert subtraction to addition. For example: $1-1=1+(-1)=0001+1001=1010=-2$; $0$ has two representations: $0000$ and $1000$.
  2. Inverse code : the highest bit is the sign bit, positive numbers are $0$, and negative numbers are $1$. The complement of a positive number is equal to itself, and the complement of a negative number is negated except for the sign bit.

    • Advantage: solves the problem of subtraction. $1-1=1+(-1)=0001+1110=1111=0$
    • Disadvantages: There are two representation methods for $0$, $0000$ and $1111$; the subtraction algorithm rules are more complicated, and extra judgment is needed for overflow.


  • 32-bit unsigned integer unsigned int :
    The 32-bit code $C$ is directly regarded as a 32-bit binary number $N$.

  • 32-bit signed integer int :
    With the highest bit as the sign bit , $0$ represents a non-negative number, and $1$ represents a negative number.

    For each code $C$ whose highest bit is $0$, it is directly regarded as a 32-bit binary number $S$.

    At the same time, it is defined that the value represented by the code ~C obtained after the bitwise inversion of the code is $-1-S$.

    32-bit complement representation unsigned int int
    $000000\cdots 000000$ $0$ $0$
    $011111\cdots 111111$ $2147483647$ $2147483647$
    $100000\cdots 000000$ $2147483648$ $-2147483648$
    $111111\cdots 111111$ $4294967295$ $-1$

It can be found that in the complement code, each value has a unique representation, and adding and subtracting any two values ​​is equivalent to the binary addition and subtraction operation with the highest bit without carry in the 32-bit complement code. . Both 32-bit unsigned and signed integers are equivalent to automatically modulo $2^{32}$ (wrap around) when over/underflow occurs.

This also explains the phenomenon of negative numbers when signed integer arithmetic overflows. Let’s do an experiment on the overflow of int :

 1 2 3 4 5 6 7 8 
 void print ( int t ) { cout << bitset < 32 > ( t ) << " " << t << endl ; } int main () { int t = 2147483647 ; print ( t ); print ( t + 1 ); print (( t + 1 ) * 2 ); return 0 ; }


 1 2 3 
 01111111111111111111111111111111 2147483647 10000000000000000000000000000000 -2147483648 00000000000000000000000000000000 0

The result of (t+1)*2 should be $1\underbrace{00000\cdots 000000} {32 0}$, according to the principle of no carry-over of the highest order , the result becomes $\underbrace{00000\cdots 000000} {32 0} = 0$.

Two’s complement is also known as “Two’s complement”. The inverse code is also called “Ones’ complement”, which directly inverts each bit of $C$ (positive number) to represent negative $C$. In negative representation, the complement code and the complement code differ by $1$ in absolute value. For example, in the above table, the first and fourth rows are one’s complement, and the second and third rows are one’s complement. As an integer encoding, two’s complement has many advantages over one’s complement. In addition to the “natural overflow modulo” mentioned above, the complement code focuses on solving the encoding uniqueness problem of $0$, which can represent one more number than the inverse code. At the same time, special judgment is reduced, which is simple and efficient in circuit design.

form addend addend and
32-bit complement $111\cdots 111$ $000\cdots 001$ $(1)_{overflow}000\cdots 000$
int $-1$ $1$ $0$
unsigned int $4294967295$ $1$ $0(\mod 2^{32})$

“One’s complement plus one” is only a property of complement code and cannot be defined as complement code. The complement of a negative number is a binary code that can be added to its opposite and overflow to make the calculation result in the computer $0$ . This is the original intention of the complement code design. The specific goal is to make $1+(-1) = 0$, which cannot be obtained by using the original code.

So for a negative number $-X$ with $n$ digits, there is the following relationship:

X_Complement + (-X)_Complement = 1\underbrace{00\cdots 0}_n = 2^n

So the complement of $-X$ should be the binary encoding of $2^nX$.

form addend addend and
32-bit complement $011\cdots 111$ $000\cdots 001$ $100\cdots 000$
int $2147483647$ $1$ $-2147483648$
unsigned int $2147483647$ $1$ $2147483648$

Because it needs to write out 32 bits to represent an int in binary, which is more cumbersome. However, in decimal, it is not easy to clearly reflect every bit of the complement, so in programming, hexadecimal is often used to represent a constant, so only 8 characters need to be written, each character ($0\sim 9, A\sim F$) represents 4 binary bits in complement. The hexadecimal constant of C++ starts with 0x , 0x itself just declares the hexadecimal number, and the part after 0x corresponds to the specific hexadecimal value. E.g:

32-bit complement int (decimal) int (hex)
$000000\cdots 000000$ $0$ $0x0$
$011111\cdots 111111$ $2147483647$ $0x7F\thinspace FF\thinspace FF\thinspace FF$
$00111111 Repeat 4 times$ $1061109567$ $0x3F\thinspace 3F\thinspace 3F\thinspace 3F$
$111111\cdots 111111$ $-1$ $0xFF\thinspace FF\thinspace FF\thinspace FF$

The $0x3F\thinspace 3F\thinspace 3F\thinspace 3F$ in the table above is a useful value with two properties:

  1. Its double is no more than $0x7F\thinspace FF\thinspace FF\thinspace FF$, the largest positive integer that int can represent.
  2. Every 8 bits (every byte) of an integer is the same.

When our commonly used memset(a, val, sizeof(a)) initializes an int array $a$, it fills $val(0x00\sim 0xFF)$ to each byte of the array $a$, and an int occupies 4 bytes, so using memset can only assign the same int every 8 bits .

Therefore, when we want to initialize the value in an array to positive infinity, in order to avoid addition overflow or cumbersome judgment, we can use memset(a, 0x3f, sizeof(a)) to assign $0x3F\thinspace 3F\thinspace to the array 3F\thinspace The value of 3F$.

shift operation

shift left

Move the numbers to the left at the same time in the binary representation, the low bits are filled with $0$, and the high bits are discarded after they cross the boundary.

Arithmetic shift right

In the two’s complement representation, the number is shifted to the right at the same time, the high bit is filled with the sign bit, and the low bit is discarded after it crosses the boundary.

n»1 = \lfloor \frac{n}{2.0} \rfloor

Note that integer division is implemented in C++ by rounding towards zero (discarding the fractional part), eg $(-3)/2=-1$, $3/2=1$.

logical right shift

In the two’s complement representation, the number is shifted to the right at the same time, the high bit is filled with $0$, and the low bit is discarded after it crosses the boundary.

An unsigned integer right shift uses a logical right shift. For signed integers, it was not specified until C++20 whether to use an arithmetic or logical right shift (arithmetic right shift on most platforms). Arithmetic right shift was only specified in C++20.


This article is reproduced from:
This site is for inclusion only, and the copyright belongs to the original author.

Leave a Comment