Analyzing the TCP handshake and wave with packet captures

Original link: https://www.mghio.cn/post/216a8d02.html


Foreword

The first thing to be clear about is that TCP is a reliable transport protocol, and all of its features ultimately serve reliable transmission. I have seen many articles online about TCP's three-way handshake for establishing a connection and its four-way wave for closing one, but they are all too theoretical. After thinking it over, I realized that I am an engineering-minded person: the best way for me to learn this kind of theory is through real cases, which makes it concrete and memorable. This article uses Wireshark packet captures to analyze the TCP three-way handshake and four-way wave. If you too feel that you already understand the theory, I strongly recommend capturing packets yourself to reinforce that understanding.

Three-way handshake

During the three-way handshake that establishes a TCP connection, the two endpoints negotiate and confirm some information (Sequence number, Maximum Segment Size, Window Size, etc.). The Sequence number plays two roles. When the SYN flag is 1, it is the initial sequence number (ISN); the first data byte, and the acknowledgment number in the corresponding ACK, are then this sequence number plus 1. When the SYN flag is 0, it is the cumulative sequence number of the first data byte of the current segment (the unit of data is called a segment at the transport layer, a packet at the network layer, and a frame at the data link layer). Maximum Segment Size, or MSS, is the largest amount of data that a single segment can carry (excluding the TCP and IP headers). Window Size is the size of the sender's own receive window. Let's look at the three-way handshake that takes place when I open the blog mghio from my local machine:
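To make the "plus 1" rule concrete, here is a minimal sketch of how an acknowledgment number follows from a segment's sequence number. The function name `expected_ack` and the ISN values are made up for illustration; the only facts relied on are that sequence numbers are 32-bit and that SYN and FIN each consume one sequence number.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def expected_ack(seq: int, payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """Return the acknowledgment number the peer should send for this segment.

    SYN and FIN each consume one sequence number, in addition to the payload bytes.
    """
    consumed = payload_len + (1 if syn else 0) + (1 if fin else 0)
    return (seq + consumed) % SEQ_SPACE


# A SYN carrying a hypothetical ISN of 1000 is acknowledged with 1001.
print(expected_ack(1000, 0, syn=True))
# A 100-byte data segment starting at 1001 is acknowledged with 1101.
print(expected_ack(1001, 100))
# Near the top of the sequence space the value wraps around to 0.
print(expected_ack(SEQ_SPACE - 1, 0, syn=True))
```

Wireshark shows relative sequence numbers by default, which is why a SYN usually appears with Seq=0 and its ACK with Ack=1 in a capture.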

three-way-hand-shake.jpg

The three small red boxes in the figure mark the three-way handshake that establishes the connection with the server.

  1. In the first step, the client (the browser in this example) sends a SYN to the server;
  2. In the second step, after receiving the SYN, the server replies to the client with SYN + ACK, where the ACK indicates that it has received the client's SYN;
  3. In the third step, after receiving the SYN + ACK, the client replies with an ACK to indicate that it has received the server's SYN + ACK.

At this point, port 60469 on the client side is in the ESTABLISHED state.
As you can see, the core purpose of the three-way handshake is for the two sides to tell each other their own initial Sequence number. The blue box shows the client's initial Sequence number and the ACK the server returns for it. The green box shows the server's initial Sequence number and the ACK the client returns for it. Once the initial sequence numbers have been exchanged, each sender can detect packet loss and retransmit the lost segments when sending data.
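The exchange of initial sequence numbers can be sketched as three plain records. This is an illustration only, not a capture: the `Segment` class, the function name, and the ISN values 100 and 5000 are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_SPACE = 2 ** 32


@dataclass
class Segment:
    sender: str
    flags: str
    seq: int
    ack: Optional[int] = None  # the first SYN carries no acknowledgment


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Build the three handshake segments for the given initial sequence numbers."""
    # Step 1: the client announces its ISN.
    syn = Segment("client", "SYN", seq=client_isn)
    # Step 2: the server acknowledges client_isn + 1 and announces its own ISN.
    syn_ack = Segment("server", "SYN+ACK", seq=server_isn,
                      ack=(syn.seq + 1) % SEQ_SPACE)
    # Step 3: the client acknowledges server_isn + 1.
    ack = Segment("client", "ACK", seq=syn_ack.ack,
                  ack=(syn_ack.seq + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=5000):
    print(segment)
```

Each ISN appears once as a `seq` and once, incremented by 1, as the other side's `ack`, which corresponds to the pairing highlighted by the blue and green boxes.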

Another purpose of the three-way handshake is to negotiate some parameters (in the figure above, the yellow box is the Maximum Segment Size and the pink box is the Window Size).
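As a rough illustration of where an MSS value comes from, the sketch below derives it from a link MTU. It assumes IPv4 and 20-byte IP and TCP headers with no options; the function names are hypothetical, and taking the minimum of the two announced values is a simplification of what real stacks do.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def usable_segment_size(local_mss: int, peer_mss: int) -> int:
    """Simplified view: neither side sends more than the smaller announced MSS."""
    return min(local_mss, peer_mss)


print(mss_for_mtu(1500))                # typical Ethernet MTU
print(usable_segment_size(1460, 1400))  # the peer announced a smaller MSS
```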

three-way-hand-shake-dg.jpg

By now you can see that what we usually call "establishing a TCP connection" is essentially the preparation needed for reliable TCP transmission. In fact, no such connection exists at the physical layer. After TCP establishes a connection, each side holds and maintains some state, including the Sequence number, MSS, Window Size, and so on. The TCP handshake negotiates the initial values of this state, and this state is the essence of what we usually call a TCP connection. Because this is so important, I would like to stress again that TCP is a reliable transport protocol, and all of its features ultimately serve reliable transmission.
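The idea that a connection is just state held at both ends can be sketched as a small record per endpoint. The field names loosely follow the SND.NXT and RCV.NXT variables of RFC 793; the class, the helper function, and all numeric values are assumptions for illustration.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32


@dataclass
class ConnectionState:
    snd_nxt: int      # next sequence number this side will send
    rcv_nxt: int      # next sequence number this side expects to receive
    peer_mss: int     # largest segment payload the peer said it accepts
    peer_window: int  # receive window advertised by the peer


def state_after_handshake(my_isn: int, peer_isn: int,
                          peer_mss: int, peer_window: int) -> ConnectionState:
    """State one endpoint holds once the handshake has completed."""
    return ConnectionState(
        snd_nxt=(my_isn + 1) % SEQ_SPACE,    # our SYN consumed one sequence number
        rcv_nxt=(peer_isn + 1) % SEQ_SPACE,  # so did the peer's SYN
        peer_mss=peer_mss,
        peer_window=peer_window,
    )


client = state_after_handshake(my_isn=100, peer_isn=5000, peer_mss=1460, peer_window=65535)
server = state_after_handshake(my_isn=5000, peer_isn=100, peer_mss=1460, peer_window=65535)
# The two records mirror each other: what one side will send next
# is exactly what the other side expects to receive next.
print(client.snd_nxt == server.rcv_nxt, server.snd_nxt == client.rcv_nxt)
```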

Four-way wave

Now let's look at the four-way wave that closes the connection when the browser page is closed:

tcp-close-sequence.jpg

You have probably noticed that the capture in the picture above shows only three waves, not four. Why is that?

This is due to TCP's delayed acknowledgment mechanism. When the passive closer (here, port 443 on the server) receives the FIN from the active closer (here, port 63612 on the client), it does not send the ACK immediately but waits for a short delay. If, during that delay, the passive closer has no more data to send and issues its own FIN, the FIN and the ACK can travel in the same segment. The active closer then receives the FIN and the ACK together, so the second and third waves merge into one and the capture shows "three waves". (The two are normally separate because the system kernel does not know whether the application can close immediately.)
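The difference between the merged and unmerged cases can be sketched as follows. This models only the outcome, not the kernel's delayed-ACK timer; the function name and the boolean parameter are hypothetical.

```python
from typing import List, Tuple


def close_segments(passive_side_ready: bool) -> List[Tuple[str, str]]:
    """Segments exchanged when the client closes first.

    passive_side_ready: True if the server issues its own FIN before
    its pending ACK has been sent, so the two can share one segment.
    """
    segments = [("client", "FIN")]
    if passive_side_ready:
        segments.append(("server", "FIN+ACK"))  # second and third waves merged
    else:
        segments.append(("server", "ACK"))      # kernel acknowledges the FIN
        segments.append(("server", "FIN"))      # sent later, when the application closes
    segments.append(("client", "ACK"))
    return segments


print(len(close_segments(passive_side_ready=True)))   # the "three waves" seen in the capture
print(len(close_segments(passive_side_ready=False)))  # the textbook four waves
```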

The four-way wave that closes a connection consists of the following four steps (assuming no waves are merged):

  1. In the first step, the client actively sends a FIN to the server;
  2. In the second step, the server replies to the client with an ACK (acknowledging the FIN from the first step), indicating that the server knows the client is about to disconnect;
  3. In the third step, the server sends a FIN to the client, indicating that the server has no more data to send and is ready to disconnect;
  4. In the fourth step, the client replies to the server with an ACK. Since both sides have now sent a FIN to say they are ready to disconnect, the connection is really closed.

The following is the TCP connection state diagram (the CLOSED state is fictional and does not actually exist). This diagram is very important. Once you have it memorized, most TCP connection problems become much easier to reason about.

tcp_state_diagram.png
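Part of the state diagram can be written down as a transition table. The sketch below covers only the normal open and close paths (simultaneous open and close, resets, and timeouts other than 2MSL are left out), and the event names are made up for this example.

```python
from typing import List

# (current state, event) -> next state; comments note the segment sent in response.
TRANSITIONS = {
    # connection establishment
    ("CLOSED", "active_open"): "SYN_SENT",        # sends SYN
    ("CLOSED", "passive_open"): "LISTEN",
    ("LISTEN", "recv_syn"): "SYN_RCVD",           # sends SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # sends ACK
    ("SYN_RCVD", "recv_ack"): "ESTABLISHED",
    # active close
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # sends FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv_fin_ack"): "TIME_WAIT",  # merged FIN+ACK, sends ACK
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # sends ACK
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
    # passive close
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # sends ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # sends FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(state: str, events: List[str]) -> str:
    """Follow a list of events through the table and return the final state."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state


# Client side of this article's capture: open, then close with a merged FIN+ACK.
print(run("CLOSED", ["active_open", "recv_syn_ack", "close", "recv_fin_ack"]))
# Server side: passive open, then passive close.
print(run("CLOSED", ["passive_open", "recv_syn", "recv_ack", "recv_fin", "close", "recv_ack"]))
```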

One of the harder states to understand is TIME_WAIT, which is entered by the end that actively closes the connection. This end stays in the state for at most twice the Maximum Segment Lifetime (MSL), usually written as 2MSL. The TIME_WAIT state exists for two reasons:

  1. To terminate the full-duplex TCP connection reliably;
  2. To let old duplicate segments expire in the network (a segment survives in the network for at most 1 MSL, so a round trip takes at most 2 MSL).
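The 2MSL figure is simple arithmetic, sketched below. RFC 793 defines the MSL as 2 minutes; real operating systems often use shorter values, so the 30-second figure here is only a hypothetical example.

```python
def time_wait_seconds(msl_seconds: int) -> int:
    """Upper bound on the TIME_WAIT duration: one MSL for a segment to die out
    in each direction (for example the final ACK one way and a retransmitted
    FIN the other way)."""
    return 2 * msl_seconds


print(time_wait_seconds(120))  # with RFC 793's MSL of 2 minutes
print(time_wait_seconds(30))   # with a hypothetical 30-second MSL
```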

Why three steps for the handshake but four for the wave?

This is a classic interview question. The answer most people recite from memory for the four waves is: because TCP is full-duplex (two-way), it takes four steps to tear down the connection. But ask one more question: the handshake is also two-way, so why does it take only three steps?

The material circulating online all says that TCP is two-way and therefore needs four steps to close, but the handshake is two-way as well (both sides tell each other their initial Sequence number), so why is there no four-way handshake? The lesson is to keep asking why and to approach things with curiosity and doubt.

If you look back at the second step (SYN + ACK) of the three-way handshake above, it could in fact be split into two steps: first reply with an ACK, then send a SYN. That would be less efficient, and the three-way handshake would become a four-way handshake.

It seems, then, that the wave takes four steps mainly because an ACK is sent on its own after the first FIN is received, which adds one extra step. If the reply could be FIN + ACK, just as the handshake replies with SYN + ACK, the four waves would become three. Here is the capture of the wave from above again:
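SYN and ACK can share one segment because they are independent bits in the flags field of the TCP header. The sketch below decodes a flags value into names using the standard bit positions; the function name is hypothetical.

```python
from typing import List

# Bit values of the classic TCP header flags.
TCP_FLAGS = [
    (0x01, "FIN"),
    (0x02, "SYN"),
    (0x04, "RST"),
    (0x08, "PSH"),
    (0x10, "ACK"),
    (0x20, "URG"),
]


def decode_flags(value: int) -> List[str]:
    """Return the names of the flags set in a TCP flags field."""
    return [name for bit, name in TCP_FLAGS if value & bit]


print(decode_flags(0x02))  # first handshake step
print(decode_flags(0x12))  # second handshake step: SYN and ACK in one segment
print(decode_flags(0x11))  # merged wave: FIN and ACK in one segment
```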

tcp-close-sequence.jpg

The second red box in this figure is the FIN + ACK segment sent by the server, which turns the four waves into three (counting one segment as one wave). The main reason the wave normally takes four steps is this: when the passive closer receives the FIN, it knows the active closer is about to close, and the kernel notifies the application layer. At that point the application may need to do some cleanup before closing, and it may still have data to send. So the kernel first replies with an ACK, and sends the FIN only later, when the application is ready and calls close itself.

The handshake has no such preparation phase, so SYN + ACK can be sent immediately (two steps combined into one, which improves efficiency). During the wave, the kernel can only ACK the other side's FIN when it arrives. It cannot send a FIN on the application's behalf, because it does not know whether the application can close immediately.
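The gap between the kernel's ACK and the application's FIN can be observed with a half-close on the loopback interface. This is a minimal sketch, assuming a local TCP listener is allowed on 127.0.0.1; the function name and the messages are made up for illustration.

```python
import socket
import threading


def passive_side(listener: socket.socket) -> None:
    """Read until the peer's FIN arrives, then send remaining data and close."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the peer's FIN has been received
                break
            received += chunk
        # The kernel has already acknowledged the FIN. Our own FIN goes out
        # only when the application is done and the socket is closed.
        conn.sendall(b"reply to " + received)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
worker = threading.Thread(target=passive_side, args=(listener,))
worker.start()

client = socket.create_connection(listener.getsockname())
client.sendall(b"hello")
client.shutdown(socket.SHUT_WR)  # sends our FIN; we can still receive

reply = b""
while True:
    chunk = client.recv(1024)
    if not chunk:  # the server's FIN
        break
    reply += chunk

client.close()
worker.join()
listener.close()
print(reply)
```

The server still delivers data after it has seen the client's FIN, which is exactly the situation in which the ACK and the FIN cannot be merged.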

Summary

TCP is a very complex protocol. To achieve reliable transmission and cope with the many problems that arise on real networks, it includes some classic solutions, such as congestion control algorithms, sliding windows, and retransmission. I strongly recommend reading RFC 793 and the book TCP/IP Illustrated, Volume 1: The Protocols.

If you are one of those people who can master a skill from theory alone and then generalize from one case to many, I admire you. If not, combine theory with practice as you learn, to strengthen your understanding of the theory. Repetition helps knowledge stick; pay attention to technique, and learn to use tools when you need them.

Finally, nearly all of TCP's features exist to achieve reliable transmission, and the rest are there to optimize performance.
