
Re: [lvs-users] lvs tun and ipip fragments

To: Kelsey Cummings <kgc@xxxxxxxxxxxxxx>
Subject: Re: [lvs-users] lvs tun and ipip fragments
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: Julian Anastasov <ja@xxxxxx>
Date: Fri, 18 May 2012 01:14:35 +0300 (EEST)
        Hello,

On Wed, 9 May 2012, Kelsey Cummings wrote:

> Issues with LVS-Tun, PMTUD and MSS fixup seem to come up periodically.
> We want to use LVS-Tun but do not want to end up in a situation where
> we're relying on functional PMTUD or selective MSS fixup on the real 
> servers.  The main issue being that if either of these fail to function
> a client will end up in a situation where their sessions may hang in
> such a way that nothing short of cooperative testing with tcpdumps would
> reveal the cause of the problem.
> 
> It seems that a more obvious solution to this is to allow the kernel to 
> frag the IPIP as needed by clearing the DF bit on the packet and
> skipping the MTU exceeded check.  This is a technical violation of RFC
> 2003 but under some circumstances it is advantageous to just let it
> fragment.  Any additional overhead of handling the frags is relatively 
> insignificant and we end up able to handle ~100 Mbit/s of traffic inbound
> per real server before there is likely to be a collision in 
> fragmentation reassembly, and even then, only if packets arrive at the
> real server out of order.
> 
> The patch to hack this into the existing code is only two lines long and
> appears to work correctly in limited testing.  A sysctl variable to
> control the behavior would be easy enough.
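
        For reference, the size arithmetic behind the problem
described above can be sketched as follows. This is only an
illustration: it assumes a 1500-byte link MTU and IPv4 and TCP
headers without options (20 bytes each), and the function names
are made up for this example, they are not from the kernel.

```c
/* Sketch of the IPIP size arithmetic; assumes headers without options. */
#define IPV4_HDR_LEN 20 /* outer (and inner) IPv4 header, no options */
#define TCP_HDR_LEN  20 /* TCP header, no options */

/* Largest inner IP packet that fits after adding the outer IPIP header. */
static int ipip_inner_mtu(int link_mtu)
{
    return link_mtu - IPV4_HDR_LEN;
}

/* Largest TCP MSS whose segments still fit inside the tunnel unfragmented. */
static int ipip_safe_mss(int link_mtu)
{
    return ipip_inner_mtu(link_mtu) - IPV4_HDR_LEN - TCP_HDR_LEN;
}

/* Non-zero if the encapsulated packet exceeds the link MTU, i.e. the
 * director must either fragment it or reject it when DF is set. */
static int ipip_needs_frag(int inner_len, int link_mtu)
{
    return inner_len + IPV4_HDR_LEN > link_mtu;
}
```

With a 1500-byte MTU a full-sized client packet does not fit
once the outer header is added, which is why the setup depends
on PMTUD or MSS fixup unless fragmentation is allowed.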

        ipip_tunnel_xmit() uses similar logic: there PMTUD
can be disabled with the nopmtudisc flag of the ip tool. Maybe
we can do the same. The question is how to provide this flag,
maybe with an IPVS_DEST_ATTR_NO_PMTUD attribute in
include/linux/ip_vs.h

        Maybe it should come as a new field in
struct ip_vs_dest, e.g. unsigned int mode_flags; and
#define IP_VS_DEST_MODE_NO_PMTUD 0x0001

        Then we can check dest->mode_flags & IP_VS_DEST_MODE_NO_PMTUD
to decide whether we have to clear DF.
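
        A rough user-space sketch of that check, only to show
the idea. The struct dest_sketch type and the outer_frag_off()
helper are hypothetical, they do not exist in the kernel. The
flag name and value are the ones proposed above, and the default
branch mirrors the df handling visible in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h> /* htons() */

#define IP_DF 0x4000                    /* "don't fragment" bit, host order */
#define IP_VS_DEST_MODE_NO_PMTUD 0x0001 /* proposed flag, not in mainline */

/* Hypothetical stand-in for the proposed field in struct ip_vs_dest. */
struct dest_sketch {
    unsigned int mode_flags;
};

/* Compute frag_off (network byte order) for the outer IPIP header.
 * With NO_PMTUD set the outer header never carries DF, so the stack
 * is free to fragment the encapsulated packet. Otherwise the DF bit
 * of the inner header is copied, as in the existing code. */
static uint16_t outer_frag_off(const struct dest_sketch *dest,
                               uint16_t inner_frag_off)
{
    assert(dest != NULL);

    if (dest->mode_flags & IP_VS_DEST_MODE_NO_PMTUD)
        return 0;

    return inner_frag_off & htons(IP_DF);
}
```

The same test would also guard the MTU exceeded check, so that
an ICMP "fragmentation needed" reply is sent only for
destinations that still rely on PMTUD. Note that only the outer
header is changed here, the inner header is left untouched.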

        This should be coordinated with Simon for the
corresponding ipvsadm changes for the IPVS_DEST_ATTR_NO_PMTUD
attribute.

> Thoughts?
> 
> --- /root/rpmbuild/SOURCES/linux-2.6.32-220.13.1.el6/net/netfilter/ipvs/ip_vs_xmit.c     2009-12-02 19:51:21.000000000 -0800
> +++ net/netfilter/ipvs/ip_vs_xmit.c     2012-05-09 17:24:05.180140929 -0700
> @@ -559,6 +559,9 @@
>         if (skb_dst(skb))
>                 skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);
>  
> +       //clear the DF bit so the kernel will frag the packet
> +       old_iph->frag_off = 0;
> +
>         df |= (old_iph->frag_off & htons(IP_DF));
>  
>         if ((old_iph->frag_off & htons(IP_DF))
> @@ -608,7 +611,7 @@
>         iph                     =       ip_hdr(skb);
>         iph->version            =       4;
>         iph->ihl                =       sizeof(struct iphdr)>>2;
> -       iph->frag_off           =       df;
> +       iph->frag_off           =       0;
>         iph->protocol           =       IPPROTO_IPIP;
>         iph->tos                =       tos;
>         iph->daddr              =       rt->rt_dst;
> 
> -- 
> Kelsey Cummings - kgc@xxxxxxxxxxxxxx      sonic.net, inc.
> System Architect                          2260 Apollo Way
> 707.522.1000                              Santa Rosa, CA 95407

Regards

--
Julian Anastasov <ja@xxxxxx>

