Issues with LVS-Tun, PMTUD and MSS fixup seem to come up periodically.
We want to use LVS-Tun but do not want to end up relying on functional
PMTUD or selective MSS fixup on the real servers. The main issue is that
if either of these fails, a client's sessions may hang in such a way that
nothing short of co-operative testing with tcpdump captures would reveal
the cause of the problem.

A more obvious solution seems to be to let the kernel fragment the IPIP
packet as needed, by clearing the DF bit on the packet and skipping the
MTU-exceeded check. This is technically a violation of RFC 2003, but
under some circumstances it is advantageous to just let it fragment. The
additional overhead of handling the fragments is relatively
insignificant, and we end up able to handle ~100 Mbit/s of inbound
traffic per real server before a collision in fragment reassembly
becomes likely, and even then only if packets arrive at the real server
out of order.

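The reasoning behind the ~100 Mbit/s figure is not spelled out above. One plausible reading is that it relates to how quickly the 16-bit IPv4 Identification field wraps, since reassembly keys on that ID. The sketch below only does that arithmetic; the 1500-byte packet size, the assumption that every packet consumes a new ID, and the function names are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* The IPv4 Identification field is 16 bits wide: 65536 distinct values. */
#define IP_ID_SPACE 65536.0

/* Packets per second at a given bit rate and packet size (in bytes). */
double packets_per_second(double bits_per_second, double packet_bytes)
{
	assert(packet_bytes > 0.0);
	return bits_per_second / (packet_bytes * 8.0);
}

/*
 * Seconds until the ID space repeats, ASSUMING one ID counter toward a
 * given real server and one new ID per tunneled packet.
 */
double seconds_until_id_wrap(double bits_per_second, double packet_bytes)
{
	double pps = packets_per_second(bits_per_second, packet_bytes);

	assert(pps > 0.0);
	return IP_ID_SPACE / pps;
}
```

Under these assumptions, `seconds_until_id_wrap(100e6, 1500.0)` comes out a little under 8 seconds. That number would then be compared with how long the real server keeps incomplete fragment queues around (`net.ipv4.ipfrag_time` on Linux).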
The patch to hack this into the existing code is only two lines long and
appears to work correctly in limited testing. A sysctl variable to
control the behavior would be easy enough to add.

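As a rough illustration of what a sysctl-controlled version might decide, here is a userspace model of the two decisions involved: what to put in the outer header's frag_off, and whether to reject an oversized DF packet. The struct, the `allow_frag` flag and both function names are hypothetical; this is not kernel code and not the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>	/* htons */

#define DF_FLAG 0x4000	/* IPv4 Don't Fragment bit, host byte order */

/* Hypothetical stand-in for a sysctl: nonzero means "let IPIP fragment". */
struct tun_frag_policy {
	int allow_frag;
};

/* frag_off for the outer IPIP header, in network byte order. */
uint16_t outer_frag_off(const struct tun_frag_policy *p,
			uint16_t inner_frag_off)
{
	assert(p != NULL);
	if (p->allow_frag)
		return 0;	/* DF clear: outer packet may be fragmented */
	return inner_frag_off & htons(DF_FLAG);	/* copy DF from inner */
}

/* 1 if the packet should be refused with "fragmentation needed". */
int must_reject_for_mtu(const struct tun_frag_policy *p,
			uint16_t inner_frag_off,
			unsigned int inner_len, unsigned int mtu)
{
	assert(p != NULL);
	if (p->allow_frag)
		return 0;	/* skip the MTU-exceeded check entirely */
	return (inner_frag_off & htons(DF_FLAG)) && mtu < inner_len;
}
```

This model only reads the inner header's frag_off and never writes to it, so the inner header checksum would not need to be recomputed. With the flag off, both functions follow the existing DF handling shown in the diff context.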
Thoughts?
--- /root/rpmbuild/SOURCES/linux-2.6.32-220.13.1.el6/net/netfilter/ipvs/ip_vs_xmit.c 2009-12-02 19:51:21.000000000 -0800
+++ net/netfilter/ipvs/ip_vs_xmit.c 2012-05-09 17:24:05.180140929 -0700
@@ -559,6 +559,9 @@
 	if (skb_dst(skb))
 		skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);

+	//clear the DF bit so the kernel will frag the packet
+	old_iph->frag_off = 0;
+
 	df |= (old_iph->frag_off & htons(IP_DF));

 	if ((old_iph->frag_off & htons(IP_DF))
@@ -608,7 +611,7 @@
 	iph = ip_hdr(skb);
 	iph->version = 4;
 	iph->ihl = sizeof(struct iphdr)>>2;
-	iph->frag_off = df;
+	iph->frag_off = 0;
 	iph->protocol = IPPROTO_IPIP;
 	iph->tos = tos;
 	iph->daddr = rt->rt_dst;
--
Kelsey Cummings - kgc@xxxxxxxxxxxxxx sonic.net, inc.
System Architect 2260 Apollo Way
707.522.1000 Santa Rosa, CA 95407