MTU and UDP (and protocols based on UDP)

Original link: https://www.kawabangga.com/posts/5160

After writing “Everything about MTU and MSS”, I recently ran into another question. It took me a week of hard thinking to reach an answer. In hindsight the answer is simple, but when I discussed it with a few colleagues over lunch none of us got there quickly, so I think it is still worth writing down.

First I want to spend some space describing the problem, because when I talked to colleagues I found that people tended to think I was asking a different question.

We know that if an IP packet is larger than the MTU, layer 3 is responsible for fragmentation: the large packet is split into several smaller packets that are sent separately. My question is this: when layer 3 sends data to the next hop, it only knows its own MTU, not the MTU of the other side. If the other side's MTU is smaller than its own, how does it know how to fragment the packet before sending it?

mtu-example.png

How do you send data to a peer whose MTU is smaller than your own?

In the previous article we saw that TCP is connection-oriented, so the MSS is negotiated when the connection is established. If an intermediate device has a smaller MTU, MSS clamping kicks in, which ensures that neither end sends data that exceeds the MTU.
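As a rough illustration of the arithmetic behind MSS negotiation and clamping, here is a minimal Python sketch. It assumes IPv4 and TCP headers without options (20 bytes each), and the function names are made up for this example.

```python
IPV4_HEADER_LEN = 20  # assumption: no IP options
TCP_HEADER_LEN = 20   # assumption: no TCP options


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in a single IP packet of size `mtu`."""
    return mtu - IPV4_HEADER_LEN - TCP_HEADER_LEN


def clamp_mss(advertised_mss: int, device_mtu: int) -> int:
    """Value an intermediate device with a smaller MTU would leave in the SYN."""
    return min(advertised_mss, mss_for_mtu(device_mtu))


print(mss_for_mtu(1500))      # 1460
print(clamp_mss(1460, 1400))  # 1360
```

With a 1400-byte MTU somewhere on the path, an advertised MSS of 1460 is lowered to 1360, so neither end builds a segment that would exceed that MTU.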

But how do connectionless protocols such as UDP deal with this problem?

First of all, it cannot simply be ignored, because in theory a packet larger than the MTU the receiver can accept will be discarded. UDP has no retransmission mechanism, so the sender would keep sending, the packets would keep getting lost, and the sender would never know what happened.

Then I thought of PMTUD, which was also mentioned in that article. But the purpose of PMTUD is to avoid IP fragmentation: first discover the MTU of the path, then make sure no later packet exceeds it. That is not my question. My question is: if the other side's MTU is smaller and Don't Fragment (DF) is not set, how does layer 3 fragment the packet? Besides, PMTUD is not realistic for some UDP use cases. Should I have to send several probe packets to detect the MTU before a single DNS request? That would be far too inefficient. I also captured packets to check, and indeed no PMTUD takes place.

The next answer I got was: the other side drops the packet and sends back an ICMP message with Type=3 (Destination Unreachable) and Code=4 (fragmentation needed and DF set). It means: I received a packet that I cannot forward, but it has DF set, which tells me not to fragment it, so I have no choice but to drop it, and you should know about that. I am not very satisfied with this answer, because by its semantics Code=4 covers the case where DF is set and the packet is dropped. What I am asking is: when DF is not set and you are free to fragment, how do you know the other side's MTU so that you can fragment correctly?
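For reference, this is roughly what reading that ICMP message looks like. The sketch below assumes the RFC 1191 layout of the first 8 bytes (type, code, checksum, 2 unused bytes, 2-byte next-hop MTU); the function name is hypothetical, and routers that do not implement RFC 1191 may leave the MTU field as 0.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACHABLE = 3  # Type=3
ICMP_FRAG_NEEDED = 4       # Code=4: fragmentation needed and DF set


def next_hop_mtu(icmp_message: bytes) -> Optional[int]:
    """Return the next-hop MTU from a Type=3/Code=4 message, else None."""
    if len(icmp_message) < 8:
        return None
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    return mtu


# Hand-built sample message (checksum left as 0 for brevity).
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 500)
print(next_hop_mtu(sample))  # 500
```

Note that this message is only generated when DF is set, which is exactly why it does not answer the question above.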

I discussed these answers with my colleagues. They seem to address the problem, but none of them quite holds up.

I searched for more information, but nothing I found answered this question directly. Most articles skip this part and only describe how to fragment according to the MTU of the local end. Some are outrageous, such as here, saying that the router knows the MTU of the other side. I am just curious: how does it know? The IP protocol has no mechanism for negotiating the MTU.

router-know-dest-mtu.png

I also ran an experiment to verify this. I created two virtual machines, A with MTU=1000 and B with MTU=500, and pinged B from A with size=800. Packets from A to B were not fragmented, while packets from B to A were. Both sides could still receive the packets (I guess this is an artifact of the test environment: the network between the two VMs is simple, so perhaps the network card can handle this oversized packet). At the very least, this shows that the IP layer does not care what the other side's MTU is; it only fragments according to its own MTU.
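The behaviour seen in the experiment can be sketched in a few lines of Python. This is a simplified model, not the kernel's implementation: it assumes a 20-byte IPv4 header without options, and the function name is invented. Every fragment except the last must carry a multiple of 8 bytes, because the fragment offset field counts 8-byte units.

```python
def fragment_sizes(ip_payload_len: int, mtu: int, ip_header_len: int = 20):
    """Split an IP payload using only the sender's own MTU.

    Returns a list of (offset_in_8_byte_units, data_len, more_fragments).
    """
    if ip_payload_len + ip_header_len <= mtu:
        return [(0, ip_payload_len, False)]  # fits, no fragmentation
    max_data = (mtu - ip_header_len) // 8 * 8
    if max_data < 8:
        raise ValueError("MTU too small to carry any fragment data")
    fragments = []
    offset = 0
    while offset < ip_payload_len:
        size = min(max_data, ip_payload_len - offset)
        more = offset + size < ip_payload_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


# ping size=800 means 800 bytes of data + 8 bytes of ICMP header = 808.
print(fragment_sizes(808, 1000))  # A (MTU=1000): [(0, 808, False)]
print(fragment_sizes(808, 500))   # B (MTU=500):  [(0, 480, True), (60, 328, False)]
```

The peer's MTU never appears as an input, which matches the observation: A sends one 828-byte packet, B splits its reply in two.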

One day a colleague from Shenzhen came to Singapore on a business trip. We had dinner together and this topic came up again. He said bluntly: UDP doesn't care about this problem!

Exactly. That is why I found nothing about MTU in UDP-related material. The protocol is too simple to handle this problem. If you build a protocol on top of UDP, you have to deal with exceeding the MTU yourself.

This is my answer based on my own reading of some RFCs. If there are any mistakes, please point them out.

For example:

DNS: RFC 1035 stipulates that a DNS message carried over UDP cannot exceed 512 bytes; anything beyond 512 bytes is truncated. Is 512 bytes safe, that is, can every node on the path accept this size? Let's calculate. Every IPv4 host must be able to accept a datagram of at least 576 bytes (RFC 791). The IP header is 20 bytes plus 0 to 40 bytes of IP options, and the UDP header is 8 bytes. With no IP options, a 512-byte UDP message results in an IP datagram of 512 + 20 + 0 + 8 = 540 bytes, which is less than 576 bytes and therefore safe. It stays safe as long as the IP options take no more than 576 - 540 = 36 bytes. So in most cases this size is safe.
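The same calculation as a small Python sketch. The constant and function names are made up for illustration; the numbers are the ones used in the paragraph above.

```python
SAFE_DATAGRAM_SIZE = 576  # size every IPv4 host must accept
IPV4_HEADER_LEN = 20      # fixed part of the IPv4 header
UDP_HEADER_LEN = 8
DNS_UDP_LIMIT = 512       # RFC 1035 limit for DNS over UDP


def ipv4_udp_datagram_size(udp_payload: int, ip_options: int = 0) -> int:
    """Total IPv4 datagram size for a UDP payload of the given length."""
    return IPV4_HEADER_LEN + ip_options + UDP_HEADER_LEN + udp_payload


size = ipv4_udp_datagram_size(DNS_UDP_LIMIT)
print(size)                       # 540
print(SAFE_DATAGRAM_SIZE - size)  # 36 bytes left for IP options
```

With the maximum 40 bytes of IP options the datagram would be 580 bytes, which is the rare case where the 512-byte limit is not enough.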

This is DNS's solution to the MTU problem: I only send datagrams so small that every host has to accept them, so nothing can go wrong, right?

TFTP (RFC 1350) takes a similar approach: the default is 512 bytes, but it can be configured. The user is responsible for the configuration, though, and if it is set improperly the packets are simply lost.

KCP also has a default MTU of 1400 bytes and lets you change it through the function ikcp_setmtu. Since KCP is essentially a “pure algorithm protocol”, you can add your own MTU detection on top of it.

The last is QUIC, which is the most representative. It handles the problem as follows:

  1. A QUIC implementation should (the RFC uses SHOULD) use PMTUD, and should record the MTU for each source IP + destination IP pair.
  2. Without PMTUD, it can assume MTU=1280, and the protocol sets max_udp_payload_size = 1200 bytes. By the same arithmetic as above, 1280 - 1200 - 8 = 72 bytes remain for the IP header: 52 bytes beyond the 20-byte IPv4 header, or 32 bytes beyond the 40-byte IPv6 header, which is enough under normal circumstances.
  3. If the path cannot even carry 1280-byte PDUs, QUIC treats UDP as unusable on that path (the same as being unable to reach the port) and falls back to TCP.
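The header budget in point 2 can be checked with a short Python sketch. The constant names are invented for this example; the 20-byte and 40-byte values are the fixed IPv4 and IPv6 header sizes.

```python
ASSUMED_MTU = 1280     # MTU QUIC assumes when there is no PMTUD
UDP_PAYLOAD = 1200     # UDP payload size from point 2
UDP_HEADER_LEN = 8
IPV4_FIXED_HEADER = 20
IPV6_FIXED_HEADER = 40

# Bytes left for the whole IP header once the UDP datagram is accounted for.
ip_budget = ASSUMED_MTU - UDP_PAYLOAD - UDP_HEADER_LEN

print(ip_budget)                      # 72
print(ip_budget - IPV4_FIXED_HEADER)  # 52 bytes of headroom on IPv4
print(ip_budget - IPV6_FIXED_HEADER)  # 32 bytes of headroom on IPv6
```

So the 52 and 32 bytes are what remains for IPv4 options or IPv6 extension headers after the fixed header is subtracted.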

Point 3 raises another question: how does QUIC know whether a 1280-byte packet can get through? I found a rather magical feature of the protocol: every IP packet has the same size. With MTU=1280, for example, every packet sent is 1280 bytes, and anything shorter is padded to 1280. If a packet of that size cannot pass, the handshake packets cannot pass either, so the connection fails right at the start.

quic-size.png

All QUIC packets are the same size

This is elegant, but I think designers of intermediate overlay protocols need to keep one thing in mind. Suppose the overlay needs to insert 100 bytes of data and therefore sets its MTU to 1400: it should then never accept a 1450-byte packet. In other words, even when it sometimes has nothing to insert, it should still pad those 100 bytes. Otherwise, with a protocol like QUIC, the handshake may happen to pass without any problem, the protocol concludes that MTU=1450, and frequent packet loss follows later.
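A minimal Python sketch of that rule, under the assumption (not stated in the article) that the underlying network has an MTU of 1500. The names are hypothetical. The point is that the overlay checks against its worst-case overhead, not the overhead of the current packet.

```python
UNDERLAY_MTU = 1500          # assumption for this example
MAX_OVERLAY_OVERHEAD = 100   # bytes the overlay may need to insert

# MTU the overlay should present to the protocols running on top of it.
OVERLAY_MTU = UNDERLAY_MTU - MAX_OVERLAY_OVERHEAD


def should_accept(inner_packet_len: int) -> bool:
    """Accept only packets that still fit when the full overhead is added."""
    return inner_packet_len <= OVERLAY_MTU


print(OVERLAY_MTU)          # 1400
print(should_accept(1400))  # True
print(should_accept(1450))  # False, even if this packet needs no overhead
```

Rejecting the 1450-byte packet consistently keeps a size-probing protocol from learning an MTU that only works some of the time.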

Finally, I would like to reiterate that I don't know much about QUIC; I have only read some material. If you find an error in this article, please point it out.

The post MTU and UDP (and protocols based on UDP) first appeared on Kawabanga! .
