
Re: HA-LVS DR ip_finish_output: bad unowned skb

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: HA-LVS DR ip_finish_output: bad unowned skb
Cc: "Aneesh Kumar K.V" <aneesh.kumar@xxxxxx>
Cc: OpenSSI Developers <ssic-linux-devel@xxxxxxxxxxxxxxxxxxxxx>
From: Graeme Fowler <graeme@xxxxxxxxxxx>
Date: Mon, 05 Sep 2005 20:39:35 +0100
Hi

This is probably/possibly a "me too!".

On Sun, 2005-09-04 at 22:13 -0400, Roger Tsang wrote:
> I'm running a streamed inline LVS-DR setup with "sed" scheduler where the 
> directors are also realservers in itself. Incoming traffic goes to only one 
> of the directors which has the VIP, so the other director is passive (for 
> failover). This has worked wonderfully in kernel-2.4. However with 
> kernel-2.6's new ipvs code, I see that the passive director is also trying 
> to LVS-DR route already loadbalanced packets received from its internal 
> (eth1) interface.
<snip>

I've also seen this behaviour. Specifically, it's happened to me in a
3-server DNS platform using Keepalived, where all three servers are
directors and realservers:

Server1 - MASTER pri 200 - public IP on eth0 1.1.1.1/24, private IP on
eth1 1.2.1.1/24, VIP on eth0 1.1.1.101

Server2 - BACKUP pri 150 - public IP on eth0 1.1.1.2/24, private IP on
eth1 1.2.1.2/24, VIP on lo 1.1.1.101

Server3 - BACKUP pri 100 - public IP on eth0 1.1.1.3/24, private IP on
eth1 1.2.1.3/24, VIP on lo 1.1.1.101
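
For the record, the VRRP side of the keepalived config is nothing
unusual - something along these lines on Server1, with state BACKUP and
priorities 150/100 on the other two (the router ID, timer and auth
values below are placeholders rather than my real settings):

vrrp_instance DNS_VIP {
    state MASTER            # BACKUP on Server2 and Server3
    interface eth0
    virtual_router_id 53    # placeholder
    priority 200            # 150 on Server2, 100 on Server3
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass example   # placeholder
    }
    virtual_ipaddress {
        1.1.1.101
    }
}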

Traffic arriving on Server1 eth0 on tcp|udp port 53 is MARKed using the
iptables 'mangle' table with a value of 0x5 or 0x6 according to
protocol.

LVS then matches on fwmark.
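
Roughly speaking, the marking and the fwmark-based virtual service come
out like this (scheduler, weights and the exact match criteria here are
illustrative, written from memory rather than pasted from the live
boxes):

# On the director, in the mangle table: tag DNS traffic to the VIP
iptables -t mangle -A PREROUTING -d 1.1.1.101 -p tcp --dport 53 \
         -j MARK --set-mark 5
iptables -t mangle -A PREROUTING -d 1.1.1.101 -p udp --dport 53 \
         -j MARK --set-mark 6

# What ends up programmed into LVS, keyed on the fwmark rather than
# on VIP:port (wlc and the weights are just an example):
ipvsadm -A -f 5 -s wlc
ipvsadm -a -f 5 -r 1.2.1.1 -g -w 1    # Server1, direct routing
ipvsadm -a -f 5 -r 1.2.1.2 -g -w 1    # Server2
ipvsadm -a -f 5 -r 1.2.1.3 -g -w 1    # Server3
# ...and the same again with -f 6 for the UDP mark.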

If Server1 fails (or I shut Keepalived down), Server2 makes the
transition from BACKUP to MASTER, and everything runs as expected.

Bringing Server1 back up again, however, causes problems - once it goes
MASTER and Server2 returns to BACKUP, the connection table on Server2 is
still populated with data from the previous set of connections. Server1
(as director) then starts to send traffic out via eth1 to the
realservers, and Server2 finds a match in its connection table and load
balances the traffic to the other two realservers... and of course,
Server1 has a connection in its table too, so it reflects the packet
back to Server2, and so on, and so forth.

After several (minutes|hours|days) the internal net is terribly
congested and the cluster dies a horrible death.

To resolve this I'm doing something terribly clumsy, which is to
completely stop keepalived on Server2 (and/or Server3), clear the LVS
table, rmmod the loaded modules, and then restart keepalived. This is
the only way I have found to completely clear the connection table.
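
For what it's worth, the sequence I run on the backup box is roughly
the following (the module names depend on which schedulers are loaded,
so check lsmod first - this is a sketch of the procedure, not a
polished script):

# on Server2 (and/or Server3)
/etc/init.d/keepalived stop
ipvsadm -C                       # clear the virtual service table
# unload the scheduler module(s) first, then ip_vs itself -
# 'lsmod | grep ip_vs' shows what's actually loaded
rmmod ip_vs_wlc
rmmod ip_vs
/etc/init.d/keepalived start     # reloads ip_vs and rebuilds the services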

It's not the neatest solution, but it does work for me. I would have
thought, though, that I could clear the connection table manually, yet
I seem unable to do so. Most odd.

I thought that making use of fwmarks to isolate the traffic would help,
but it doesn't seem to have made any difference.

Also, to confuse matters further, this happened both with my old 2.4.20
kernel platform and with my lovely new FC3-based 2.6.12 system. Oh well.

This might or might not be the same problem :)

Graeme

