On Sat, Oct 30, 2010 at 06:55:19PM +0300, Julian Anastasov wrote:
>
> Hello,
>
> On Sat, 30 Oct 2010, Simon Horman wrote:
>
> >>>Could the nf_conntrack changes have caused this? There were also many
> >>>MSI and bnx2 updates in 2.6.36, so not sure if it's LVS or not.
> >>
> >>Hi Howard,
> >>
> >>Yes, it is very likely that the problem you are seeing
> >>is a regression caused by the introduction of full-NAT.
> >>
> >>There is a fix for this, which will be included in 2.6.37-rc1
> >>but unfortunately it was too invasive to include in 2.6.36 as
> >>the problem was noticed fairly late in the release cycle.
>
> If Howard is happy with this idea we can prepare
> a single patch or separate patches for testing with 2.6.36. They will
> make conntrack optional and disabled by default.
The existing patches seem to apply to 2.6.36.
I'm not sure there is a need for an extra patch / reworked patches
with different behaviour to what will appear in 2.6.37-rc1.
> >>As I understand it, the fix consists of the three patches
> >>listed below.
> >>
> >>These patches appear to apply cleanly on top of 2.6.36.
> >>The v2.6.36-nfct branch of
> >>git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git
> >>is 2.6.36 plus these three patches.
> >>
> >>I believe that even with these patches in order to avoid the performance
> >>penalty you need to set /proc/sys/net/ipv4/vs/snat_reroute to 0.
> >>
> >>
> >>
> >>commit 8a8030407f55a6aaedb51167c1a2383311fcd707
> >>Author: Julian Anastasov <ja@xxxxxx>
> >>Date: Tue Sep 21 17:38:57 2010 +0200
> >>
> >> ipvs: make rerouting optional with snat_reroute
> >>
> >> Add new sysctl flag "snat_reroute". Recent kernels use
> >> ip_route_me_harder() to route LVS-NAT responses properly by
> >> VIP when there are multiple paths to client. But setups
> >> that do not have alternative default routes can skip this
> >> routing lookup by using snat_reroute=0.
> >>
> >> Signed-off-by: Julian Anastasov <ja@xxxxxx>
> >> Signed-off-by: Patrick McHardy <kaber@xxxxxxxxx>
> >
> >Julian,
> >
> >do you think that it would be possible to add some auto-detection
> >that turns snat_reroute on and off as necessary?
>
> Not sure how snat_reroute can be optimized because
> it is for traffic to client. But in the case with OPS
> it is not used at all. It is true that 2.6.36
> changes the picture, I'm just not sure how much, because
> now every IPVS packet hits an existing netfilter conntrack
> while before 2.6.36 we created and destroyed a conntrack per packet.
> On boxes with enough memory for both IPVS conns and
> netfilter conntracks, and if netfilter's hash lookups are
> faster than creating a new conntrack, we can see better
> results. Apart from nf_conntrack_max I'm not sure what needs to be
> tuned. And 2.6.37-rc1 will add more delays for non-IPVS
> traffic with these new handlers in LOCAL_OUT.
Understood.
> Maybe we
> have to find some trick there to avoid lookups that are
> not needed. For OPS, 2.6.37-rc1 will destroy the conntrack
> immediately while 2.6.36 keeps it according to the UDP
> timeout.
OPS is a special case, so I guess there is some scope for optimising it.
But OPS is not the common case IMHO.
> OTOH, we can reorder some checks in __ip_vs_conn_in_get
> and ip_vs_conn_out_get. In the old days it was equally
> fast to check v4 addresses and ports, but now that
> RAM is relatively slower and IPv6 is in the game we can put
> the ports in first position. For example:
>
> this code
>
> 	if (cp->af == p->af &&
> 	    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
> 	    ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
> 	    p->cport == cp->cport && p->vport == cp->vport &&
> 	    ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
> 	    p->protocol == cp->protocol) {
>
> can be optimized to:
>
> 	if (p->cport == cp->cport && p->vport == cp->vport &&
> 	    cp->af == p->af &&
> 	    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
> 	    ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
> 	    ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
> 	    p->protocol == cp->protocol) {
>
> It will also help to reorder the ip_vs_conn fields in this way:
>
> 	struct list_head	c_list;		/* hashed list heads */
> 	__be16			cport;
> 	__be16			vport;
> 	__be16			dport;
> 	__u8			af;		/* address family */
> 	__u8			protocol;	/* Which protocol (TCP/UDP) */
> 	volatile __u32		flags;		/* status flags */
> 	union nf_inet_addr	caddr;		/* client address */
> 	union nf_inet_addr	vaddr;		/* virtual address */
> 	union nf_inet_addr	daddr;		/* destination address */
>
> This lets IPv4 lookups find the main fields in the first 32 bytes.
>
> Note that this change shrinks af and protocol to a
> single octet each. Maybe protocol was u16 just to fill space,
> but now that af has been added we can put them together in one word.
These optimisations seem reasonable to me.
I guess we should do some benchmarking to see
if they make any difference.
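
As a starting point for such a test, here is a minimal userspace sketch
of the proposed field order and the ports-first comparison. It is not
kernel code: struct conn_sketch, struct conn_param_sketch,
union addr_sketch, addr_equal_sketch() and the CONN_F_NO_CPORT_SKETCH
value are hypothetical stand-ins for ip_vs_conn, the lookup parameters,
union nf_inet_addr, ip_vs_addr_equal() and IP_VS_CONN_F_NO_CPORT.

```c
#include <assert.h>      /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>      /* offsetof */
#include <stdint.h>
#include <string.h>      /* memcmp */
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Userspace stand-ins for the kernel types (assumptions, not the real ones). */
struct list_head { struct list_head *next, *prev; };
union addr_sketch { uint32_t all[4]; uint32_t ip; };

#define CONN_F_NO_CPORT_SKETCH 0x0200u  /* hypothetical flag value */

/* Field order proposed in the mail: ports, af, protocol and flags first. */
struct conn_sketch {
	struct list_head  c_list;
	uint16_t          cport, vport, dport;
	uint8_t           af, protocol;
	uint32_t          flags;
	union addr_sketch caddr, vaddr, daddr;
};

/* Hypothetical lookup key; addresses are passed by pointer as in the mail. */
struct conn_param_sketch {
	uint8_t  af, protocol;
	uint16_t cport, vport;
	const union addr_sketch *caddr, *vaddr;
};

/* Ports, af, protocol, flags and the IPv4 client address end within the
 * first 32 bytes on common ILP32 and LP64 ABIs; vaddr lands later. */
static_assert(offsetof(struct conn_sketch, caddr) + sizeof(uint32_t) <= 32,
	      "IPv4 client address spills past the first 32 bytes");

/* Compare all 16 bytes for IPv6, only the first 4 otherwise. */
static bool addr_equal_sketch(int af, const union addr_sketch *a,
			      const union addr_sketch *b)
{
	if (af == AF_INET6)
		return memcmp(a->all, b->all, sizeof(a->all)) == 0;
	return a->ip == b->ip;
}

/* Ports-first ordering: two cheap 16-bit compares reject most non-matching
 * entries before any address is touched. */
static bool conn_match_ports_first(const struct conn_sketch *cp,
				   const struct conn_param_sketch *p)
{
	return p->cport == cp->cport && p->vport == cp->vport &&
	       cp->af == p->af &&
	       addr_equal_sketch(p->af, p->caddr, &cp->caddr) &&
	       addr_equal_sketch(p->af, p->vaddr, &cp->vaddr) &&
	       ((!p->cport) ^ (!(cp->flags & CONN_F_NO_CPORT_SKETCH))) &&
	       p->protocol == cp->protocol;
}
```

A benchmark could walk a chain of mostly non-matching entries with
conn_match_ports_first() and with the address-first order, then compare
the timings. Whether the difference is measurable is exactly what needs
testing.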