Hello,
On Sat, 30 Oct 2010, Simon Horman wrote:
>>> Could the nf_conntrack changes have caused this? There were also many
>>> MSI and bnx2 updates in 2.6.36, so not sure if it's LVS or not.
>>
>> Hi Howard,
>>
>> Yes, it is very likely that the problem you are seeing
>> is a regression caused by the introduction of full-NAT.
>>
>> There is a fix for this, which will be included in 2.6.37-rc1
>> but unfortunately it was too invasive to include in 2.6.36 as
>> the problem was noticed fairly late in the release cycle.
If Howard is happy with this idea we can prepare a
single patch or separate patches for testing with 2.6.36. They would
make the conntrack support optional and disabled by default.
>> As I understand it, the fix was made by the three patches
>> listed below.
>>
>> These patches appear to apply cleanly on top of 2.6.36.
>> The v2.6.36-nfct branch of
>> git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git
>> is 2.6.36 plus these three patches.
>>
>> I believe that, even with these patches, in order to avoid the performance
>> penalty you need to set /proc/sys/net/ipv4/vs/snat_reroute to 0.
>>
>>
>>
>> commit 8a8030407f55a6aaedb51167c1a2383311fcd707
>> Author: Julian Anastasov <ja@xxxxxx>
>> Date: Tue Sep 21 17:38:57 2010 +0200
>>
>> ipvs: make rerouting optional with snat_reroute
>>
>> Add new sysctl flag "snat_reroute". Recent kernels use
>> ip_route_me_harder() to route LVS-NAT responses properly by
>> VIP when there are multiple paths to client. But setups
>> that do not have alternative default routes can skip this
>> routing lookup by using snat_reroute=0.
>>
>> Signed-off-by: Julian Anastasov <ja@xxxxxx>
>> Signed-off-by: Patrick McHardy <kaber@xxxxxxxxx>
>
> Julian,
>
> do you think that it would be possible to add some auto-detection
> that turns snat_reroute on and off as necessary?
Not sure how snat_reroute can be optimized, because
it applies to traffic going back to the client; in the OPS case
it is not used at all. It is true that 2.6.36
changes the picture, I'm just not sure by how much, because
now every IPVS packet hits the existing netfilter conntrack,
while before 2.6.36 we created and destroyed a conntrack per packet.
On boxes with enough memory both for IPVS connections and
netfilter conntracks, and if netfilter's hash lookups are
faster than creating a new conntrack, we can see better
results. Apart from nf_conntrack_max I'm not sure what needs to be
tuned. And 2.6.37-rc1 will add more delay for non-IPVS
traffic with the new handlers in LOCAL_OUT; maybe we
have to find some trick there to avoid lookups that are
not needed. For OPS, 2.6.37-rc1 will destroy the conntrack
immediately, while 2.6.36 keeps it for the UDP
timeout.
OTOH, we can reorder some checks in __ip_vs_conn_in_get
and ip_vs_conn_out_get. In the old days it was equally
fast to check the v4 addresses and the ports, but now that
RAM is slower relative to the CPU and IPv6 is in the game, we can
check the ports first. For example, this code:
	if (cp->af == p->af &&
	    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
	    ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
	    p->cport == cp->cport && p->vport == cp->vport &&
	    ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
	    p->protocol == cp->protocol) {
can be optimized to:
	if (p->cport == cp->cport && p->vport == cp->vport &&
	    cp->af == p->af &&
	    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
	    ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
	    ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
	    p->protocol == cp->protocol) {
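As a user-space sketch of the idea (the types, helper names and the
flag value here are simplified stand-ins, not the real kernel
definitions), the reordered test relies on && short-circuiting so the
cheap port compares reject most hash-chain entries first:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified stand-ins for the kernel types (illustrative only). */
union nf_inet_addr {
	uint32_t ip;		/* IPv4 */
	uint32_t ip6[4];	/* IPv6 */
};

struct conn_param {
	int af;
	uint16_t cport, vport;
	uint16_t protocol;
	union nf_inet_addr caddr, vaddr;
};

struct ip_vs_conn {
	int af;
	uint16_t cport, vport;
	uint16_t protocol;
	uint32_t flags;
	union nf_inet_addr caddr, vaddr;
};

#define IP_VS_CONN_F_NO_CPORT 0x0800	/* illustrative flag value */

static bool addr_equal(int af, const union nf_inet_addr *a,
		       const union nf_inet_addr *b)
{
	return af == AF_INET6 ? memcmp(a->ip6, b->ip6, sizeof(a->ip6)) == 0
			      : a->ip == b->ip;
}

/* Reordered test: the cheap 16-bit port comparisons run first, so a
 * non-matching entry is usually rejected before any address
 * comparison (16 bytes per address for IPv6) has to touch memory. */
static bool conn_matches(const struct conn_param *p,
			 const struct ip_vs_conn *cp)
{
	return p->cport == cp->cport && p->vport == cp->vport &&
	       cp->af == p->af &&
	       addr_equal(p->af, &p->caddr, &cp->caddr) &&
	       addr_equal(p->af, &p->vaddr, &cp->vaddr) &&
	       ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
	       p->protocol == cp->protocol;
}
```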
It would also help to reorder the ip_vs_conn fields this way:
	struct list_head	c_list;		/* hashed list heads */
	__be16			cport;
	__be16			vport;
	__be16			dport;
	__u8			af;		/* address family */
	__u8			protocol;	/* Which protocol (TCP/UDP) */
	volatile __u32		flags;		/* status flags */
	union nf_inet_addr	caddr;		/* client address */
	union nf_inet_addr	vaddr;		/* virtual address */
	union nf_inet_addr	daddr;		/* destination address */
That lets IPv4 lookups find all the main fields within the first
32 bytes of the structure. Note that this change converts af and
protocol to single octets; maybe protocol was u16 just to fill space,
but now that af exists we can pack the two together in one word.
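A quick user-space check of that layout claim (stand-in types, not the
full struct; the "caddr at offset 28, inside the first 32 bytes" part
assumes 8-byte pointers and the usual alignment rules, so the two
list_head pointers occupy 16 bytes):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for struct list_head and union nf_inet_addr. */
struct list_head { struct list_head *next, *prev; };
union nf_inet_addr { uint32_t ip; uint32_t ip6[4]; };

/* Proposed field order (a sketch, not the full struct ip_vs_conn). */
struct ip_vs_conn_sketch {
	struct list_head	c_list;		/* hashed list heads */
	uint16_t		cport;		/* __be16 in the kernel */
	uint16_t		vport;
	uint16_t		dport;
	uint8_t			af;		/* address family */
	uint8_t			protocol;	/* TCP/UDP */
	volatile uint32_t	flags;		/* status flags */
	union nf_inet_addr	caddr;		/* client address */
	union nf_inet_addr	vaddr;		/* virtual address */
	union nf_inet_addr	daddr;		/* destination address */
};
```

The three ports plus af and protocol pack into 8 bytes with no
padding, so flags lands at cport + 8 and caddr at cport + 12; on a
64-bit build that puts an IPv4 caddr at bytes 28-31.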
Regards
--
Julian Anastasov <ja@xxxxxx>
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users