Re: [lvs-users] NFCT and PMTU

To: Julian Anastasov <ja@xxxxxx>
Subject: Re: [lvs-users] NFCT and PMTU
Cc: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From: lvs@xxxxxxxxxx
Date: Tue, 11 Sep 2012 21:52:53 +0100 (BST)
That patch solves the problem, at least for ICMP port unreachable packets.
I tested ICMP port unreachable packets without the patch and, like the ICMP
must-fragment packets, they were not being forwarded with conntrack=1. So
it all looks good.

The port unreachable test is a really useful way to test ICMP forwarding. 
I will be using that in the future!
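
For anyone wanting to repeat the test, here is a minimal client-side sketch
of such a probe. The VIP address and port below are placeholders, and it
assumes the chosen port is a virtual service whose real server has nothing
listening on it; a connected UDP socket reports a forwarded ICMP port
unreachable as ECONNREFUSED on the next socket call.

/*
 * Minimal sketch of a port-unreachable probe.  VIP and PORT are
 * placeholders for your own setup.
 */
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>

int main(void)
{
	const char *vip = "192.0.2.10";           /* placeholder VIP */
	const unsigned short port = 4444;         /* placeholder port */
	struct timeval tv = { .tv_sec = 2 };      /* don't wait forever */
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(port) };
	char buf[64];
	int s = socket(AF_INET, SOCK_DGRAM, 0);

	if (s < 0) {
		perror("socket");
		return 1;
	}
	setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
	inet_pton(AF_INET, vip, &dst.sin_addr);

	if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0 ||
	    send(s, "probe", 5, 0) < 0) {
		perror("connect/send");
		return 1;
	}
	/* ECONNREFUSED here means the ICMP error made it back to us. */
	if (recv(s, buf, sizeof(buf), 0) < 0) {
		if (errno == ECONNREFUSED)
			printf("ICMP port unreachable was forwarded\n");
		else if (errno == EAGAIN || errno == EWOULDBLOCK)
			printf("timeout: no ICMP error seen\n");
		else
			perror("recv");
	} else {
		printf("unexpected UDP reply\n");
	}
	close(s);
	return 0;
}

With the unpatched kernels described in this thread one would expect the
probe to report ECONNREFUSED with conntrack=0 and to time out with
conntrack=1.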

Thank you for your help.
Tim

On Tue, 11 Sep 2012, Julian Anastasov wrote:

>
>       Hello,
>
> On Mon, 10 Sep 2012, lvs@xxxxxxxxxx wrote:
>
>> I have a number of LVS directors running a mixture of CentOS 5 and CentOS
>> 6 (running kernels 2.6.18-238.5.1 and 2.6.32-71.29.1). I have applied the
>> ipvs-nfct patch to the kernel(s).
>>
>> When I set /proc/sys/net/ipv4/vs/conntrack to 1 I have PMTU issues. When
>> it is set to 0 the issues go away. The issue appears when a client on a
>> network with a <1500 byte MTU connects. One of my real servers replies to
>> the client's request with a 1500-byte packet, and a device upstream of the
>> client sends an ICMP must-fragment. When conntrack=0 the director passes
>> the (modified) ICMP packet on to the client. When conntrack=1 the
>> director doesn't send an ICMP to the real server. I can toggle conntrack
>> and watch PMTU work and not work.
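
A quick way to check from the real server side whether that must-fragment
ICMP actually arrived is to ask the kernel for the path MTU it has cached
towards the client, e.g. with "ip route get <client address>" or with a
small sketch like the one below (the client address is a placeholder;
IP_MTU only works on a connected socket):

#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IP_MTU
#define IP_MTU 14	/* value from linux/in.h, for older glibc */
#endif

int main(void)
{
	const char *client_ip = "198.51.100.20";   /* placeholder client */
	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(9) };
	int mtu = 0;
	socklen_t len = sizeof(mtu);
	int s = socket(AF_INET, SOCK_DGRAM, 0);

	if (s < 0) {
		perror("socket");
		return 1;
	}
	inet_pton(AF_INET, client_ip, &dst.sin_addr);

	/* For UDP, connect() only picks a route; no packet is sent. */
	if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		perror("connect");
		return 1;
	}
	if (getsockopt(s, IPPROTO_IP, IP_MTU, &mtu, &len) < 0) {
		perror("getsockopt(IP_MTU)");
		return 1;
	}
	printf("path MTU towards %s: %d\n", client_ip, mtu);
	return 0;
}

If the director translated and forwarded the ICMP, the reported MTU should
drop to the value advertised by the upstream device; otherwise it stays at
the interface MTU (typically 1500).
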
>
>       I can try to reproduce it with a recent kernel.
> Can you tell me what forwarding method is used? NAT? Do
> you have a test environment, so that you can see what
> is shown in logs when IPVS debugging is enabled?
>
>       Do you mean that when conntrack=0 the ICMP is forwarded
> back to the client instead of being forwarded to the real server?
>
>       Now I remember some problems with ICMP:
>
> - I don't see this change in 2.6.32-71.29.1:
>
> commit b0aeef30433ea6854e985c2e9842fa19f51b95cc
> Author: Julian Anastasov <ja@xxxxxx>
> Date:   Mon Oct 11 11:23:07 2010 +0300
>
>    nf_nat: restrict ICMP translation for embedded header
>
>       Skip ICMP translation of embedded protocol header
>    if NAT bits are not set. Needed for IPVS to see the original
>    embedded addresses because for IPVS traffic the IPS_SRC_NAT_BIT
>    and IPS_DST_NAT_BIT bits are not set. It happens when IPVS performs
>    DNAT for client packets after using nf_conntrack_alter_reply
>    to expect replies from real server.
>
>    Signed-off-by: Julian Anastasov <ja@xxxxxx>
>    Signed-off-by: Simon Horman <horms@xxxxxxxxxxxx>
>
> diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
> index e2e00c4..0047923 100644
> --- a/net/ipv4/netfilter/nf_nat_core.c
> +++ b/net/ipv4/netfilter/nf_nat_core.c
> @@ -462,6 +462,18 @@ int nf_nat_icmp_reply_translation(struct nf_conn *ct,
>                       return 0;
>       }
>
> +     if (manip == IP_NAT_MANIP_SRC)
> +             statusbit = IPS_SRC_NAT;
> +     else
> +             statusbit = IPS_DST_NAT;
> +
> +     /* Invert if this is reply dir. */
> +     if (dir == IP_CT_DIR_REPLY)
> +             statusbit ^= IPS_NAT_MASK;
> +
> +     if (!(ct->status & statusbit))
> +             return 1;
> +
>       pr_debug("icmp_reply_translation: translating error %p manip %u "
>                "dir %s\n", skb, manip,
>                dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY");
> @@ -496,20 +508,9 @@ int nf_nat_icmp_reply_translation(struct nf_conn *ct,
>
>       /* Change outer to look the reply to an incoming packet
>        * (proto 0 means don't invert per-proto part). */
> -     if (manip == IP_NAT_MANIP_SRC)
> -             statusbit = IPS_SRC_NAT;
> -     else
> -             statusbit = IPS_DST_NAT;
> -
> -     /* Invert if this is reply dir. */
> -     if (dir == IP_CT_DIR_REPLY)
> -             statusbit ^= IPS_NAT_MASK;
> -
> -     if (ct->status & statusbit) {
> -             nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> -             if (!manip_pkt(0, skb, 0, &target, manip))
> -                     return 0;
> -     }
> +     nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +     if (!manip_pkt(0, skb, 0, &target, manip))
> +             return 0;
>
>       return 1;
> }
>
>       If this patch does not help we have to debug it
> somehow.
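
To make the effect of moving that check easier to see, here is a small
userspace sketch of the statusbit logic from the patch. The IPS_* values
mirror include/linux/netfilter/nf_conntrack_common.h; everything else is a
simplified illustration, not kernel code.

#include <stdio.h>

/* Bit values mirroring nf_conntrack_common.h. */
#define IPS_SRC_NAT	(1 << 4)
#define IPS_DST_NAT	(1 << 5)
#define IPS_NAT_MASK	(IPS_DST_NAT | IPS_SRC_NAT)

enum { IP_CT_DIR_ORIGINAL = 0, IP_CT_DIR_REPLY = 1 };
enum { MANIP_SRC = 0, MANIP_DST = 1 };

/* Returns 1 when ICMP translation should be skipped -- the early
 * "return 1" that the patch adds before the embedded-header manip. */
static int skip_icmp_translation(unsigned long status, int manip, int dir)
{
	unsigned long statusbit =
		(manip == MANIP_SRC) ? IPS_SRC_NAT : IPS_DST_NAT;

	/* Invert if this is reply dir, as in the patch. */
	if (dir == IP_CT_DIR_REPLY)
		statusbit ^= IPS_NAT_MASK;

	return !(status & statusbit);
}

int main(void)
{
	/* IPVS connections leave both NAT bits clear... */
	printf("IPVS conn (status=0):           skip=%d\n",
	       skip_icmp_translation(0, MANIP_SRC, IP_CT_DIR_REPLY));
	/* ...while a connection DNATed by iptables has IPS_DST_NAT set,
	 * so the reply-direction SRC manip still gets translated. */
	printf("DNAT conn (status=IPS_DST_NAT): skip=%d\n",
	       skip_icmp_translation(IPS_DST_NAT, MANIP_SRC,
				     IP_CT_DIR_REPLY));
	return 0;
}

For an IPVS connection neither NAT bit is set, so with the patch the
function returns early and the embedded header is left untouched, letting
IPVS see the original embedded addresses as the commit message describes.
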
>
>> I would happily leave conntrack off, but it has a huge performance impact.
>> With my traffic profile the softirq load doubles when I turn off
>> conntrack. My busiest director is doing 2.1Gb of traffic and with
>> conntrack off it can probably only handle 2.5Gb.
>
>       It is interesting to know about such a comparison
> for conntrack=0 and 1. Can you confirm both numbers again?
> 2.1 is not better than 2.5.
>
>> I am hoping that this issue has been observed and fixed, and that someone
>> will be able to point me to the patch so I can backport it to my kernels
>> (or finally get rid of CentOS 5!).
>>
>> Thanks
>> Tim
>
> Regards
>
> --
> Julian Anastasov <ja@xxxxxx>
>

