Path MTU Discovery is a technique for dynamically discovering the path MTU between a source and a destination using the DF (Don’t Fragment) bit in the IP header. The path MTU is the smallest MTU of any link along the path, where the path is identified by the IP source, the IP destination and possibly the TOS of the packets. The basic idea is that the source initially assumes the path MTU is equal to the known MTU of its first hop and sends IP packets of that size with the DF bit set. If a router along the path has to forward the datagram over a link with a smaller MTU, it drops the datagram and sends back an ICMP Destination Unreachable message with the code Fragmentation needed and DF set. After receiving this ICMP message, the host reduces its path MTU estimate for that particular path.
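The loop described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the path is modelled as a plain list of link MTUs, every router reports its next-hop MTU, and the names forward and discover_path_mtu are hypothetical helpers, not part of any real stack.

```python
from typing import List, Optional, Tuple


def forward(packet_size: int, link_mtus: List[int]) -> Optional[int]:
    """Send one DF-set packet across the links in order.

    Returns None if the packet is delivered, otherwise the MTU of the
    link that rejected it (what the router would report in the ICMP error).
    """
    for mtu in link_mtus:
        if packet_size > mtu:
            return mtu  # router drops the packet and reports the next-hop MTU
    return None


def discover_path_mtu(link_mtus: List[int]) -> Tuple[int, int]:
    """Return (path_mtu, number_of_icmp_errors_received)."""
    pmtu = link_mtus[0]  # start from the known first-hop MTU
    errors = 0
    while True:
        reported = forward(pmtu, link_mtus)
        if reported is None:
            return pmtu, errors
        pmtu = reported  # shrink the estimate to the reported next-hop MTU
        errors += 1
```

For a path with link MTUs of 1500, 1400, 1492 and 1280 bytes, the source needs two ICMP errors (one at the 1400 link, one at the 1280 link) before its estimate settles at 1280.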
RFC 1191 allows path MTU discovery to be implemented in several ways; the main differences are between the router and the host implementations:
– The router must include the MTU of the next hop in the low-order 16 bits of the previously unused field of the ICMP Destination Unreachable – Fragmentation needed and DF set message (Datagram Too Big, as the RFC calls it). This is the most widely used implementation, but it has its flaws.
– The host, on the other hand, may either reduce the path MTU to the next-hop value received in the ICMP Datagram Too Big message or clear the DF bit in the IP header.
– When the ICMP Datagram Too Big message does not contain the MTU of the next hop, things get complicated, and hosts may implement any of several possible algorithms.
1. Router specification
The difference between the MTU of the link and the MSS is explained in IP-MTU-vs-MSS; as a summary, you can use the formula below:
MSS = MTU – IP Header (+ Options if present) – TCP Header (+ Options if present)
IP Header = 20 bytes without Options, up to 60 bytes with Options
TCP Header = 20 bytes without Options, up to 60 bytes with Options
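As a quick worked example of the formula, the sketch below computes the MSS for a given MTU. The function name tcp_mss and its parameters are chosen here for illustration; option lengths are given in bytes and capped at 40, since each header can grow from 20 to at most 60 bytes.

```python
IP_BASE_HEADER = 20   # bytes, IP header without options
TCP_BASE_HEADER = 20  # bytes, TCP header without options
MAX_OPTIONS = 40      # bytes, so each header is at most 60 bytes


def tcp_mss(mtu: int, ip_options: int = 0, tcp_options: int = 0) -> int:
    """MSS = MTU - IP header (+ options) - TCP header (+ options)."""
    if not 0 <= ip_options <= MAX_OPTIONS:
        raise ValueError("IP options must be between 0 and 40 bytes")
    if not 0 <= tcp_options <= MAX_OPTIONS:
        raise ValueError("TCP options must be between 0 and 40 bytes")
    return mtu - (IP_BASE_HEADER + ip_options) - (TCP_BASE_HEADER + tcp_options)
```

With a 1500-byte MTU and no options this gives 1460 bytes; with 12 bytes of TCP options it gives 1448 bytes.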
Normally, when Path MTU Discovery is not in use and a router needs to send an IP datagram that is bigger than the IP MTU configured on the outgoing interface (this is the Maximum Transmission Unit, a transmit limit and not a receive limit), the router fragments the IP datagram, as described in IP-Fragmentation. The same applies to MPLS packets, as described in MPLS-Fragmentation.
If Path MTU Discovery is in use, meaning the DF bit is set on the IP datagram, and the MTU of the next hop is smaller than the IP datagram, then the router is required to return a Destination Unreachable – Fragmentation needed and DF set ICMP message (Type 3, Code 4) to the source. The router MUST include the MTU of the next hop in the low-order 16 bits of the previously unused ICMP header field; the high-order 16 bits remain unused and must be set to 0 (zero). The modified ICMP header therefore consists of Type (3), Code (4) and Checksum, followed by 16 unused bits set to zero and the 16-bit Next-Hop MTU, and then the IP header plus the first 64 bits of the original datagram's data.
The value carried in the Next-Hop MTU field is bounded as follows:
– Minimum: as specified in RFC 791, every internet module must be able to forward a datagram of 68 octets without further fragmentation (and every host must be able to receive a datagram of 576 octets). Thus the minimum value of this field should be 68 bytes.
– Maximum: the size of the largest datagram that this router can forward without fragmentation. The size includes the IP header and data, but does not include any lower-level headers, such as the Ethernet header.
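To make the field layout concrete, here is a minimal sketch that packs and parses the first 8 bytes of such a message with Python's struct module. The helper names (internet_checksum, build_frag_needed, parse_next_hop_mtu) are hypothetical, and the "original" argument stands for the IP header plus first 64 bits of the dropped datagram that the router echoes back.

```python
import struct

ICMP_DEST_UNREACHABLE = 3
ICMP_FRAG_NEEDED = 4


def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original: bytes) -> bytes:
    """Type 3, Code 4, checksum, 16 unused zero bits, 16-bit Next-Hop MTU."""
    if not 68 <= next_hop_mtu <= 0xFFFF:
        raise ValueError("Next-Hop MTU must be between 68 and 65535")
    fields = (ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED)
    draft = struct.pack("!BBHHH", *fields, 0, 0, next_hop_mtu)
    checksum = internet_checksum(draft + original)
    return struct.pack("!BBHHH", *fields, checksum, 0, next_hop_mtu) + original


def parse_next_hop_mtu(message: bytes) -> int:
    """Return the Next-Hop MTU; 0 means the router did not fill it in."""
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", message[:8])
    if (icmp_type, code) != (ICMP_DEST_UNREACHABLE, ICMP_FRAG_NEEDED):
        raise ValueError("not a Fragmentation needed and DF set message")
    return mtu
```

A receiver can validate the message by recomputing the checksum over the whole ICMP message, which yields zero when the message is intact.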
2. Host specification
When a host receives a Datagram Too Big message, it must reduce the size of the datagrams it sends based on the Next-Hop MTU value in the ICMP message. The host must force the PMTU Discovery process to converge using one of the following:
– Reducing its PMTU estimate, and with it the size of the datagrams it sends.
– Clearing the DF bit in the IP datagram.
Because the PMTU might change over time, the host uses timers to decide when to try a larger datagram size and check whether the path MTU has increased, even though in theory the MTU will not change frequently:
– If the path MTU has decreased, the host must react as fast as possible, without waiting for any timer to expire.
– The host may check whether the path MTU has increased by sending datagrams larger than its current estimate, but no sooner than 5 minutes after a Datagram Too Big message has been received for that destination and no sooner than 1 minute after a previous successful increase. The RFC recommends setting these timers to twice the minimum values, meaning 10 and 2 minutes respectively.
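The timer rules can be sketched as a small per-destination cache entry. This is an illustrative model only: the class name PmtuEntry, its fields and its methods are assumptions made for this example, time is passed in explicitly as seconds, and the hold-off values are the RFC's recommended 10 and 2 minutes.

```python
from dataclasses import dataclass
from typing import Optional

TOO_BIG_HOLDOFF = 10 * 60   # seconds after a Datagram Too Big message
INCREASE_HOLDOFF = 2 * 60   # seconds after a successful increase


@dataclass
class PmtuEntry:
    pmtu: int
    last_too_big: Optional[float] = None
    last_increase: Optional[float] = None

    def on_datagram_too_big(self, next_hop_mtu: int, now: float) -> None:
        # Decreases are applied immediately, without waiting for any timer.
        if next_hop_mtu < self.pmtu:
            self.pmtu = next_hop_mtu
        self.last_too_big = now

    def may_probe_larger(self, now: float) -> bool:
        # Probing for a larger PMTU is allowed only once both timers expired.
        if self.last_too_big is not None and now - self.last_too_big < TOO_BIG_HOLDOFF:
            return False
        if self.last_increase is not None and now - self.last_increase < INCREASE_HOLDOFF:
            return False
        return True

    def on_increase_succeeded(self, new_pmtu: int, now: float) -> None:
        self.pmtu = max(self.pmtu, new_pmtu)
        self.last_increase = now
```

A real stack would also age out entries and tie the probe to actual traffic; the sketch only shows when a probe is permitted.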
3. Other possible host implementation mechanisms
The methods a host can use to deal with a Datagram Too Big message whose Next-Hop MTU field has not been filled in (the router sent the original, unmodified message format, so the field is zero) are not standardized by this RFC; only some guidelines are described. Again, the host should have a mechanism that makes PMTU Discovery converge, using one of the following:
– Reducing the MTU to the minimum value of 576 bytes. This may use the links inefficiently and may sometimes even lead to fragmentation, but it is still the fastest method.
– Clearing the DF bit in the IP datagram. This might take a while to take effect on all link segments of the path, and some Datagram Too Big messages could still arrive for a while.
– Implementing an algorithm that searches for the proper MTU. Any search strategy must keep track of the MTU values that have already been tested.
These search algorithms keep sending IP datagrams with the DF bit set, but with different sizes. Two algorithms are not recommended: multiplying the estimated MTU by a constant (for example 0.75), because of its slow convergence, and binary search, because it requires a complex host PMTU Discovery implementation. An algorithm that might be faster assumes that only a few standard MTU values are actually in use and searches among them; these values, called MTU plateaus, are complemented by entries near each power of 2. If the accurate MTU value is not present in the plateau table, then the algorithm will not underestimate it by more than a factor of 2.
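The plateau search can be sketched as follows. The plateau values are the ones listed in RFC 1191 (Table 7-1); the function names and the fits callback, which stands in for "send a DF-set datagram of this size and see whether it gets through", are assumptions made for this illustration.

```python
from typing import Callable

# MTU plateaus from RFC 1191, Table 7-1, largest first.
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]


def next_lower_plateau(failed_size: int) -> int:
    """Largest plateau strictly below the size that was just rejected."""
    for plateau in PLATEAUS:
        if plateau < failed_size:
            return plateau
    return PLATEAUS[-1]


def search_path_mtu(first_hop_mtu: int, fits: Callable[[int], bool]) -> int:
    """Step down through the plateaus until a DF-set datagram gets through."""
    size = first_hop_mtu
    while not fits(size):
        if size <= PLATEAUS[-1]:
            raise RuntimeError("even the 68-byte minimum does not fit")
        size = next_lower_plateau(size)
    return size
```

For example, with a first-hop MTU of 1500 and a real path MTU of 1400, the search tries 1500 and 1492, then settles on 1006: an underestimate, but by less than a factor of 2.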
Even though it is generally not a good idea to disable path MTU discovery on hosts or routers, below you can find some of the methods to disable it, listed by operating system.
4.1 Solaris 10
It is enabled by default and, in older versions, uses more aggressive timers than the RFC recommends: Solaris tries to rediscover the path MTU every 30 seconds. Since version 2.5 the timer is set to 60 seconds.
ndd -set /dev/ip ip_path_mtu_discovery 0 – Disable the TCP path MTU Discovery
ip_ire_pathmtu_interval – Configure the path MTU refresh interval
4.2 HP-UX
By default, Path MTU Discovery is enabled for TCP sockets and disabled for UDP sockets. The ndd command can control three MTU-related variables:
ip_ire_pathmtu_interval – Controls the probe interval for PMTU
ip_pmtu_strategy – Controls the Path MTU Discovery strategy
tcp_ignore_path_mtu – Disable setting MSS from ICMP 'Frag Needed'
nettune -s tcp_pmtu 0 – Disable the TCP path MTU Discovery
ndd -h ip_pmtu_strategy 0 – Disable the TCP path MTU Discovery
nettune -s udp_pmtu 0 – Disable the UDP path MTU Discovery
4.3 IBM AIX Unix
By default, the tcp_pmtu_discover and udp_pmtu_discover options are disabled on AIX® 4.2.1 through AIX 4.3.1, and enabled on AIX 4.3.2 and later.
no -o tcp_pmtu_discover=0 – Disable the TCP path MTU Discovery
no -o udp_pmtu_discover=0 – Disable the UDP path MTU Discovery
no -o pmtu_default_age=5 – Configure the ageing time for the path MTU value to 5 minutes (default 10 minutes)
no -o pmtu_rediscover_interval=5 – Configure the rediscover interval for the path MTU value to 5 minutes (default 10 minutes)
4.4 LINUX
By default, Path MTU Discovery is enabled for both TCP and UDP. Linux can be configured to handle Path MTU Discovery in the following ways, which are the values accepted by the per-socket IP_MTU_DISCOVER option:
– IP_PMTUDISC_DONT – Don’t send IP packets with DF set, and therefore do not use Path MTU Discovery.
– IP_PMTUDISC_DO – Always set the DF flag in the header of packets generated on the local node (not forwarded ones), in an attempt to find the best PMTU for every transmission.
– IP_PMTUDISC_WANT – Decide whether to use Path MTU Discovery on a per-route basis; this is the default.
– IP_PMTUDISC_PROBE – Set the DF flag, but ignore the Path MTU.
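These modes are selected per socket through the IP_MTU_DISCOVER socket option. The sketch below is Linux-only and hard-codes the numeric constants as they appear in the kernel header linux/in.h, on the assumption that the Python socket module in use may not export them; set_pmtu_mode and get_pmtu_mode are hypothetical helper names.

```python
import socket
import sys

# Linux-specific values from <linux/in.h>; hard-coded here as an assumption
# that the running Python build may not expose them by name.
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DONT = 0   # never set DF
IP_PMTUDISC_WANT = 1   # per-route decision (default)
IP_PMTUDISC_DO = 2     # always set DF
IP_PMTUDISC_PROBE = 3  # set DF but ignore the cached path MTU


def set_pmtu_mode(sock: socket.socket, mode: int) -> None:
    """Select the Path MTU Discovery mode for one IPv4 socket."""
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, mode)


def get_pmtu_mode(sock: socket.socket) -> int:
    """Read back the current Path MTU Discovery mode of the socket."""
    return sock.getsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER)


if __name__ == "__main__" and sys.platform.startswith("linux"):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        set_pmtu_mode(s, IP_PMTUDISC_DO)
        print("mode:", get_pmtu_mode(s))
```

With IP_PMTUDISC_DO on a UDP socket, a send that exceeds the known path MTU fails with EMSGSIZE instead of being fragmented, which is what tools that probe the path MTU rely on.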
You can disable it system-wide using the following command:
echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
/proc/sys/net/ipv4/route/min_pmtu – Configure the minimum path MTU (default 552)
/proc/sys/net/ipv4/route/mtu_expires – Configure the ageing time for the path MTU value (default 600 seconds)
4.5 Windows 95/98/ME
By default, Path MTU Discovery is enabled; you can disable it using the following registry entry:
Hkey_Local_Machine\System\CurrentControlSet\Services\VxD\MSTCP
PMTUDiscovery = 0
Data Type: DWORD
4.6 Windows 2000/XP
By default, Path MTU Discovery is enabled; you can disable it using the following registry entry:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\EnablePMTUDiscovery
PMTU Discovery: 0
Data Type: DWORD
4.7 Cisco router
Only TCP path MTU discovery can be disabled, using the command below. It is disabled by default, and the default age-timer is 10 minutes.
no ip tcp path-mtu-discovery [age-timer {minutes | infinite}]
You can enable it using the command below:
ip tcp path-mtu-discovery [age-timer {minutes | infinite}]
BGP Path MTU Discovery is enabled by default on Cisco routers for all BGP sessions. It can be enabled globally using the following command, and disabled using its no form:
bgp transport path-mtu-discovery
It can also be controlled per neighbor. The no form shown below disables it; the same command without no enables it:
no neighbor {ip-address | peer-group-name} transport {connection-mode | path-mtu-discovery}
4.8 Juniper router
Path MTU discovery for outgoing TCP connections is enabled by default. To disable it, you can configure:
[edit system internet-options]
no-path-mtu-discovery
In Junos OS, TCP path MTU discovery is disabled by default for all BGP neighbor sessions. It can be enabled per neighbor, per group or per routing-instance using the following statement at the corresponding configuration level:
mtu-discovery;
4.9 Huawei router
No public documentation available.
If you need to change the default MSS size, please check IP-MTU-vs-MSS, where you can find more information.
5. Dealing with broken path MTU
Among the most common reasons for path MTU discovery failures, including black holes, are the following:
– The router does not implement Path MTU Discovery: it either sends back the ICMP error without the MTU of the next hop or does not send the ICMP error at all (while still dropping the oversized packet).
– The host (the IP source) has an implementation or configuration problem and ignores the ICMP error messages.
– A router or a firewall on the way from the router back to the source discards the ICMP error messages before they can reach the IP source.
All of them can be worked around by disabling path MTU discovery, as explained in the previous chapter. This is not best practice, but it does avoid black-holed traffic.
The last one is the most common problem and is usually caused by a configuration error. It can be solved as shown below. The examples apply only to the Cisco implementation; on Juniper the DF bit can be cleared only on some tunnel interfaces, while the other solutions can be implemented with the platform-specific commands:
– The packet filtering ACL should be modified to accept the most important ICMP messages instead of denying all ICMP:
access-list 101 permit icmp any any unreachable
access-list 101 permit icmp any any time-exceeded
access-list 101 deny icmp any any
access-list 101 permit ip any any
– Clear the DF bit on the router and allow fragmentation anyway:
interface gi x/x
ip policy route-map clear-df-bit
route-map clear-df-bit permit 10
match ip address 111
set ip df 0
access-list 111 permit tcp any any
– Manipulate the value of the TCP MSS option:
int gi x/x
ip tcp adjust-mss 1460
Path MTU Discovery is a good mechanism to have enabled in the network, as long as no overzealous network, server or firewall administrator disables ICMP on the interfaces. For this mechanism in particular, the ICMP Destination Unreachable – Fragmentation needed and DF set message must remain permitted.
by Mihaela Paraschivu