That patch solves the problem, at least for ICMP port unreachable packets.
I also tested ICMP port unreachable packets without the patch and, like
ICMP must-fragment packets, they were not being forwarded with conntrack=1.
So it all looks good.
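
For anyone reading the archive later, here is a rough Python model of the check the quoted patch moves to the top of nf_nat_icmp_reply_translation(). It is only a sketch of the bit logic, not kernel code. The function name should_translate_icmp is made up for illustration, and the constants are assumed to match include/linux/netfilter/nf_conntrack_common.h (IPS_SRC_NAT_BIT = 4, IPS_DST_NAT_BIT = 5).

```python
# Illustrative model only; values assumed to mirror nf_conntrack_common.h.
IPS_SRC_NAT = 1 << 4
IPS_DST_NAT = 1 << 5
IPS_NAT_MASK = IPS_SRC_NAT | IPS_DST_NAT

# Stand-ins for IP_NAT_MANIP_SRC / IP_NAT_MANIP_DST and the conntrack directions.
MANIP_SRC, MANIP_DST = 0, 1
DIR_ORIGINAL, DIR_REPLY = 0, 1


def should_translate_icmp(status, manip, direction):
    """Return True if the ICMP error (outer and embedded header) gets NATed.

    With the patch applied, the function returns early and leaves the
    packet untouched when the relevant NAT bit is not set in ct->status.
    """
    statusbit = IPS_SRC_NAT if manip == MANIP_SRC else IPS_DST_NAT
    # Invert if this is the reply direction.
    if direction == DIR_REPLY:
        statusbit ^= IPS_NAT_MASK
    return bool(status & statusbit)
```

For IPVS connections neither NAT bit is set, so should_translate_icmp(0, ...) is False for every manip and direction, which matches the commit message: the embedded header is left alone so IPVS can see the original addresses.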
The port unreachable test is a really useful way to test ICMP forwarding.
I will be using that in the future!
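
One way to script a port unreachable test is sketched below in Python. It assumes Linux behaviour, where an ICMP port unreachable received for a connected UDP socket surfaces as ECONNREFUSED. The function name probe_udp_port and the result strings are hypothetical, and the address and port to probe are whatever suits the setup being tested.

```python
import socket


def probe_udp_port(host, port, timeout=1.0):
    """Send one UDP datagram and report what comes back.

    Returns "unreachable" if an ICMP port unreachable made it back to us,
    "no-reply" if nothing arrived before the timeout, and "replied" if
    the far end answered with a datagram.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        # connect() lets the kernel report ICMP errors on this socket.
        s.connect((host, port))
        try:
            s.send(b"probe")
            s.recv(1024)
        except ConnectionRefusedError:
            return "unreachable"
        except socket.timeout:
            return "no-reply"
        return "replied"
```

Run from a client against a UDP port that is closed on the far end, "unreachable" suggests the ICMP error was forwarded back, while "no-reply" suggests it was dropped somewhere on the path (or that the probe itself never arrived).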
Thank you for your help,
Tim
On Tue, 11 Sep 2012, Julian Anastasov wrote:
>
> Hello,
>
> On Mon, 10 Sep 2012, lvs@xxxxxxxxxx wrote:
>
>> I have a number of LVS directors running a mixture of CentOS 5 and CentOS
>> 6 (running kernels 2.6.18-238.5.1 and 2.6.32-71.29.1). I have applied the
>> ipvs-nfct patch to the kernel(s).
>>
>> When I set /proc/sys/net/ipv4/vs/conntrack to 1 I have PMTU issues. When
>> it is set to 0 the issues go away. The issue is when a client on a network
>> with a <1500 byte MTU connects. One of my real servers replies to the
>> client's request with a 1500 byte packet and a device upstream of the
>> client will send an ICMP must fragment. When conntrack=0 the director
>> passed the (modified) ICMP packet on to the client. When conntrack=1 the
>> director doesn't send an ICMP to the real server. I can toggle conntrack
>> and watch the PMTU work and not work.
>
> I can try to reproduce it with a recent kernel.
> Can you tell me what forwarding method is used? NAT? Do
> you have a test environment, so that you can see what
> is shown in logs when IPVS debugging is enabled?
>
> Do you mean that when conntrack=0 ICMP is forwarded
> back to client instead of being forwarded to real server?
>
> Now I remember some problems with ICMP:
>
> - I don't see this change in 2.6.32-71.29.1:
>
> commit b0aeef30433ea6854e985c2e9842fa19f51b95cc
> Author: Julian Anastasov <ja@xxxxxx>
> Date: Mon Oct 11 11:23:07 2010 +0300
>
> nf_nat: restrict ICMP translation for embedded header
>
> Skip ICMP translation of embedded protocol header
> if NAT bits are not set. Needed for IPVS to see the original
> embedded addresses because for IPVS traffic the IPS_SRC_NAT_BIT
> and IPS_DST_NAT_BIT bits are not set. It happens when IPVS performs
> DNAT for client packets after using nf_conntrack_alter_reply
> to expect replies from real server.
>
> Signed-off-by: Julian Anastasov <ja@xxxxxx>
> Signed-off-by: Simon Horman <horms@xxxxxxxxxxxx>
>
> diff --git a/net/ipv4/netfilter/nf_nat_core.c b/net/ipv4/netfilter/nf_nat_core.c
> index e2e00c4..0047923 100644
> --- a/net/ipv4/netfilter/nf_nat_core.c
> +++ b/net/ipv4/netfilter/nf_nat_core.c
> @@ -462,6 +462,18 @@ int nf_nat_icmp_reply_translation(struct nf_conn *ct,
>  		return 0;
>  	}
>
> +	if (manip == IP_NAT_MANIP_SRC)
> +		statusbit = IPS_SRC_NAT;
> +	else
> +		statusbit = IPS_DST_NAT;
> +
> +	/* Invert if this is reply dir. */
> +	if (dir == IP_CT_DIR_REPLY)
> +		statusbit ^= IPS_NAT_MASK;
> +
> +	if (!(ct->status & statusbit))
> +		return 1;
> +
>  	pr_debug("icmp_reply_translation: translating error %p manip %u "
>  		 "dir %s\n", skb, manip,
>  		 dir == IP_CT_DIR_ORIGINAL ? "ORIG" : "REPLY");
> @@ -496,20 +508,9 @@ int nf_nat_icmp_reply_translation(struct nf_conn *ct,
>
>  	/* Change outer to look the reply to an incoming packet
>  	 * (proto 0 means don't invert per-proto part). */
> -	if (manip == IP_NAT_MANIP_SRC)
> -		statusbit = IPS_SRC_NAT;
> -	else
> -		statusbit = IPS_DST_NAT;
> -
> -	/* Invert if this is reply dir. */
> -	if (dir == IP_CT_DIR_REPLY)
> -		statusbit ^= IPS_NAT_MASK;
> -
> -	if (ct->status & statusbit) {
> -		nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> -		if (!manip_pkt(0, skb, 0, &target, manip))
> -			return 0;
> -	}
> +	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +	if (!manip_pkt(0, skb, 0, &target, manip))
> +		return 0;
>
>  	return 1;
>  }
>
> If this patch does not help we have to debug it
> somehow.
>
>> I would happily leave conntrack off, but it has a huge performance impact.
>> With my traffic profile the softirq load doubles when I turn off
>> conntrack. My busiest director is doing 2.1Gb of traffic and with
>> conntrack off it can probably only handle 2.5Gb.
>
> It would be interesting to see such a comparison
> for conntrack=0 and 1. Can you confirm both numbers again?
> 2.1 is not better than 2.5.
>
>> I am hoping that this issue has been observed and fixed and someone will
>> be able to point me to the patch so I can back port it to my kernels (or
>> finally get rid of CentOS 5!).
>>
>> Thanks
>> Tim
>
> Regards
>
> --
> Julian Anastasov <ja@xxxxxx>
>
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users