Re: [lvs-users] 2.3.36 performance

To:	Julian Anastasov <ja@xxxxxx>
Subject:	Re: [lvs-users] 2.3.36 performance
Cc:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx, "Howard M. Kash (Civ, ARL/CISD)" <howard.kash@xxxxxxxxxxx>
From:	Simon Horman <horms@xxxxxxxxxxxx>
Date:	Sun, 31 Oct 2010 10:20:44 +0900

On Sat, Oct 30, 2010 at 06:55:19PM +0300, Julian Anastasov wrote:
> 
>       Hello,
> 
> On Sat, 30 Oct 2010, Simon Horman wrote:
> 
> >>>Could the nf_conntrack changes have caused this?  There were also many
> >>>MSI and bnx2 updates in 2.6.36, so not sure if it's LVS or not.
> >>
> >>Hi Howard,
> >>
> >>Yes, it is very likely that the problem you are seeing
> >>is a regression caused by the introduction of full-NAT.
> >>
> >>There is a fix for this, which will be included in 2.6.37-rc1
> >>but unfortunately it was too invasive to include in 2.6.36 as
> >>the problem was noticed fairly late in the release cycle.
> 
>       If Howard is happy with this idea we can prepare
> a single patch or separate patches for testing with 2.6.36. It will
> make the conntrack optional and disabled by default.

The existing patches seem to apply to 2.6.36.
I'm not sure there is a need for an extra patch / reworked patches
with different behaviour to what will appear in 2.6.37-rc1.

> >>As I understand it, the fix was made by the three patches
> >>listed below.
> >>
> >>These patches appear to apply cleanly on top of 2.6.36.
> >>The v2.6.36-nfct branch of
> >>git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git
> >>is 2.6.36 plus these three patches.
> >>
> >>I believe that, even with these patches, you need to set
> >>/proc/sys/net/ipv4/vs/snat_reroute to 0 in order to avoid the
> >>performance penalty.
> >>
> >>
> >>
> >>commit 8a8030407f55a6aaedb51167c1a2383311fcd707
> >>Author: Julian Anastasov <ja@xxxxxx>
> >>Date:   Tue Sep 21 17:38:57 2010 +0200
> >>
> >>    ipvs: make rerouting optional with snat_reroute
> >>
> >>            Add new sysctl flag "snat_reroute". Recent kernels use
> >>    ip_route_me_harder() to route LVS-NAT responses properly by
> >>    VIP when there are multiple paths to client. But setups
> >>    that do not have alternative default routes can skip this
> >>    routing lookup by using snat_reroute=0.
> >>
> >>    Signed-off-by: Julian Anastasov <ja@xxxxxx>
> >>    Signed-off-by: Patrick McHardy <kaber@xxxxxxxxx>
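
For anyone testing, a minimal sketch of turning the flag off from C
follows. It assumes only the proc path quoted above. The helper names
write_sysctl_value() and disable_snat_reroute() are hypothetical and
not part of IPVS, and the file exists only on kernels that carry the
snat_reroute patch. Writing it normally requires root.

```c
#include <stdio.h>

/* Path as given in this thread; assumed to exist only on patched kernels. */
#define SNAT_REROUTE_PATH "/proc/sys/net/ipv4/vs/snat_reroute"

/*
 * Hypothetical helper: write a short value followed by a newline to a
 * sysctl-style file. Returns 0 on success, -1 on any failure.
 */
int write_sysctl_value(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;

    ok = fprintf(f, "%s\n", value) > 0;

    /* Buffered write errors may only show up when the file is closed. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}

/* Hypothetical convenience wrapper: skip the extra route lookup. */
int disable_snat_reroute(void)
{
    return write_sysctl_value(SNAT_REROUTE_PATH, "0");
}
```

disable_snat_reroute() returns -1 if the file is missing or not
writable, which is also a cheap way to check whether the running
kernel has the patch at all.
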
> >
> >Julian,
> >
> >do you think that it would be possible to add some auto-detection
> >that turns snat_reroute on and off as necessary?
> 
>       Not sure how snat_reroute can be optimized, because
> it applies to traffic going to the client. But in the OPS case
> it is not used at all. It is true that 2.6.36
> changes the picture, I'm just not sure how much, because
> now every IPVS packet hits an existing netfilter conntrack,
> while before 2.6.36 we created and destroyed a conntrack per packet.
> On boxes with enough memory for both IPVS conns and
> netfilter conntracks, and if netfilter's hash lookups are
> faster than creating a new conntrack, we can see better
> results. Apart from nf_conntrack_max I'm not sure what needs to be
> tuned. And 2.6.37-rc1 will add more delay for non-IPVS
> traffic with the new handlers in LOCAL_OUT.

Understood.

> Maybe we
> have to find some trick there to avoid lookups that are
> not needed. For OPS, 2.6.37-rc1 will destroy the conntrack
> immediately, while 2.6.36 keeps it according to the UDP
> timeout.

OPS is a special case, so I guess there is some scope for optimising it.
But OPS is not the common case IMHO.

>       OTOH, we can reorder some checks in __ip_vs_conn_in_get
> and ip_vs_conn_out_get. In the old days it was equally
> fast to check v4 addresses and ports, but now that
> RAM is slower and IPv6 is in the game we can put the ports
> first. For example:
> 
> this code
> 
>                 if (cp->af == p->af &&
>                     ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
>                     ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
>                     p->cport == cp->cport && p->vport == cp->vport &&
>                     ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
>                     p->protocol == cp->protocol) {
> 
> can be optimized to:
> 
>                 if (p->cport == cp->cport && p->vport == cp->vport &&
>                     cp->af == p->af &&
>                     ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
>                     ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
>                     ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
>                     p->protocol == cp->protocol) {
> 
>       It will help also to reorder ip_vs_conn fields in this way:
> 
>       struct list_head        c_list;         /* hashed list heads */
>       __be16                  cport;
>       __be16                  vport;
>       __be16                  dport;
>       __u8                    af;             /* address family */
>       __u8                    protocol;       /* Which protocol (TCP/UDP) */
>       volatile __u32          flags;          /* status flags */
>       union nf_inet_addr      caddr;          /* client address */
>       union nf_inet_addr      vaddr;          /* virtual address */
>       union nf_inet_addr      daddr;          /* destination address */
> 
>       It will help IPv4 to see the main fields in the first 32 bytes.
> 
>       Note that this change converts af and protocol to
> a single octet each. Maybe protocol was u16 just to fill space,
> but now that af has been added we can put them together in a word.

These optimisations seem reasonable to me.
I guess we should do some benchmarking to see
if they make any difference.
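
As a possible starting point for that benchmarking, here is a rough
userspace sketch of the proposed field order and the ports-first
comparison. Everything in it is a stand-in: struct conn_sketch,
struct conn_param_sketch, union addr_sketch, SK_AF_INET and
SK_F_NO_CPORT are hypothetical names that only mimic the leading
fields of ip_vs_conn and the lookup key. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for kernel types; assumed shapes, not the real definitions. */
struct list_head { struct list_head *next, *prev; };

union addr_sketch {
    uint32_t all[4];
    uint32_t ip;        /* IPv4 address */
    uint32_t ip6[4];    /* IPv6 address */
};

#define SK_AF_INET    2        /* stand-in for AF_INET */
#define SK_F_NO_CPORT 0x0800   /* stand-in for IP_VS_CONN_F_NO_CPORT */

/* Leading fields of the connection entry, in the proposed order. */
struct conn_sketch {
    struct list_head  c_list;
    uint16_t          cport;
    uint16_t          vport;
    uint16_t          dport;
    uint8_t           af;
    uint8_t           protocol;
    volatile uint32_t flags;
    union addr_sketch caddr;
    union addr_sketch vaddr;
    union addr_sketch daddr;
};

/* Lookup key, loosely modelled on the "p" argument in the quoted code. */
struct conn_param_sketch {
    uint8_t  af;
    uint8_t  protocol;
    uint16_t cport;
    uint16_t vport;
    const union addr_sketch *caddr;
    const union addr_sketch *vaddr;
};

/* Compare one 32-bit word for IPv4, all 16 bytes otherwise. */
int addr_equal(int af, const union addr_sketch *a, const union addr_sketch *b)
{
    assert(a != NULL && b != NULL);
    if (af == SK_AF_INET)
        return a->ip == b->ip;
    return memcmp(a->ip6, b->ip6, sizeof(a->ip6)) == 0;
}

/* Ports first: two cheap 16-bit compares reject most non-matching entries. */
int conn_matches(const struct conn_param_sketch *p, const struct conn_sketch *cp)
{
    return p->cport == cp->cport && p->vport == cp->vport &&
           cp->af == p->af &&
           addr_equal(p->af, p->caddr, &cp->caddr) &&
           addr_equal(p->af, p->vaddr, &cp->vaddr) &&
           ((!p->cport) ^ (!(cp->flags & SK_F_NO_CPORT))) &&
           p->protocol == cp->protocol;
}

/* Byte offset just past the IPv4 client address in the sketch layout. */
size_t ipv4_caddr_end(void)
{
    return offsetof(struct conn_sketch, caddr) + sizeof(uint32_t);
}
```

With 8-byte pointers the list head takes 16 bytes, so in this sketch
the ports, af, protocol, flags and the IPv4 client address all end at
or before byte 32. A benchmark could time conn_matches() against the
addresses-first ordering over a hash chain of such entries.
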

