Talking about TIME_WAIT from an experience


Today, let’s talk about TCP’s TIME_WAIT problem. It is a well-known problem, but this time I ran into it in a different scenario, one I only resolved two days ago, so I decided to write it up and, while I am at it, cover TIME_WAIT more broadly. The scenario involves EaseProbe, my open-source health-probing tool. I will first describe the problem as it appeared there, and then use it as the thread for the rest of the discussion.

Problem background

Let’s start with the background. EaseProbe is a small, lightweight, standalone tool for probing the health of services. It supports http/tcp/shell/ssh/tls/host probes as well as probes for various middleware, and it sends notifications directly to mainstream IM and messaging channels such as Slack/Telegram/Discord/Email/Teams, as well as WeCom (enterprise WeChat)/DingTalk/Feishu, which are widely used in China. It is very easy to use, and people who have used it say good things about it.

Each time the probe fires, it must build the whole network path from scratch: it performs a DNS query, establishes a TCP connection, exchanges data, and then closes the connection. We deliberately do not use keep-alive to reuse connections, because the probe needs to check not only the remote service but also the entire network path leading to it.

However, repeatedly opening and closing connections like this means, according to the TCP state machine, that connections in the TIME_WAIT state pile up on the probe side. By the definition of the TCP protocol, a connection in TIME_WAIT must wait for twice the MSL (maximum segment lifetime) before the system reclaims it. Until then, it keeps holding system resources, mainly two: one is a file descriptor, which is not a big concern because the limit can be raised, and the other is a port number, which cannot be raised. As the client initiating the requests, you theoretically have only 64K port numbers available on a single IP (in practice, the system default gives you only about 30K, from 32,768…
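A rough back-of-the-envelope calculation shows why the port number is the scarce resource. The Go sketch below uses assumed values that do not come from the text above: a local port range of 32768 to 60999 and a TIME_WAIT duration of 60 seconds, both common Linux defaults that may differ on your system. `portBudget` and `maxNewConnsPerSecond` are hypothetical helper names.

```go
package main

import "fmt"

// portBudget returns how many local ports the inclusive range [low, high] offers.
func portBudget(low, high int) int {
	if high < low {
		return 0
	}
	return high - low + 1
}

// maxNewConnsPerSecond estimates the sustained rate of short-lived connections
// to a single destination address and port before every local port is held
// by a connection in TIME_WAIT.
func maxNewConnsPerSecond(ports, timeWaitSeconds int) float64 {
	if timeWaitSeconds <= 0 {
		return 0
	}
	return float64(ports) / float64(timeWaitSeconds)
}

func main() {
	// Assumed values (common Linux defaults), not taken from the article.
	const low, high = 32768, 60999
	const timeWaitSeconds = 60

	ports := portBudget(low, high)
	rate := maxNewConnsPerSecond(ports, timeWaitSeconds)
	fmt.Printf("ports=%d rate=%.1f/s\n", ports, rate)
	// prints: ports=28232 rate=470.5/s
}
```

Under these assumptions, a client that opens and actively closes more than a few hundred connections per second to the same destination would eventually run out of local ports.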

The post "Talking about TIME_WAIT from an experience" first appeared on Lenix Blog.

This article is reprinted from https://blog.p2hp.com/archives/9450
This site is for inclusion only, and the copyright belongs to the original author.
