On 2012-08-24 16:04, David Coulson wrote:
> 1) Update to a more current CentOS - 6.0 has lots of bugs
> 2) How are you moving the IP between servers? Pacemaker, keepalived?
> Have you checked the arp tables or route cache on a system which is
> incorrectly sending traffic to the passive load balancer?
> For what it is worth, in my environment my load balancers both run
> complete ipvs tables built by ldirectord. Each run a master and backup
> sync daemon using different sync IDs. Never had a problem, but I don't
> think your issue has anything to do with IPVS and is more of a
> routing/ARP problem.
Sorry, it looks like I have not made myself clear.
a) everything is fine with traffic redirection during the failover. We
use direct arp broadcasts (via arping -U) and the traffic CORRECTLY goes
to the "new" balancer.
b) the traffic CORRECTLY goes to the "old" balancer, because when it
stops being the load balancer, it continues to be a member of the
"farm", and the new balancer sends some portion (50% in our test case)
of connections to that server, which happens to be the "old" balancer.
So, again, routing is fine, the traffic does go as it should (to the new
balancer and then to the old one, as a part of the "real server pool" or
"farm"). The problem is that ipvs software on the "old" balancer drops
some of the packets when it is running the sync daemon. As soon as the
sync daemon is stopped, the problem disappears.
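
A minimal sketch of how this can be checked on the "old" balancer. The
function names and the DRY_RUN switch are made up for illustration; by
default the script only prints the ipvsadm commands, because running them
for real needs root on the balancer itself.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the sync daemon state and the connection entries that ipvs
# currently holds on this box (numeric output, no DNS lookups).
show_ipvs_state() {
    run ipvsadm -L --daemon
    run ipvsadm -L -n -c
}

# Stop the backup sync daemon; in our tests the drops disappear after this.
stop_backup_sync() {
    run ipvsadm --stop-daemon backup
}
```

Calling show_ipvs_state before and after stop_backup_sync (with DRY_RUN=0,
as root) would show whether the "old" balancer still lists connection
entries while its own rule set is empty.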
> On 8/24/12 7:33 AM, Dmitry Akindinov wrote:
>> We are facing a problem with ipvsadm.
>> A test system consists of 2 Linux boxes (stock CentOS 6.0), both running
>> stock ipvs.
>> The application software provides various TCP services (POP, IMAP, HTTP,
>> etc.), and also controls
>> the ipvs module via the ipvsadm utility.
>> Both systems have ipvsadm running. One system is the "active" load
>> balancer, the other is the "standby" balancer.
>> Both systems are used to serve TCP requests.
>> The iptables are used to put a "100" mark on all packets coming to the
>> VIP address.
>> The "active" loadbalancer has the following config:
>> -A -f 100 -s rr -p 1
>> -a -f 100 -r server1:0 -g -w 1
>> -a -f 100 -r server2:0 -g -w 1
>> The "passive" load balancer config is empty (but its iptables rules still
>> work and do mark the VIP packets with the 100 mark).
>> The "active" balancer runs the sync daemon in the "master" mode, the
>> "passive" balancer - in the "backup" mode.
>> Everything works fine, all TCP services are balanced, etc.
>> Now, we initiate a failover. During the failover, the ipvs table on the
>> old "active" balancer is cleared,
>> and the new "active" ipvs gets the same configuration as existed on the
>> old one (the same lines as above).
>> The usual arp tricks take place to direct the VIP traffic to the new
>> balancer.
>> The old balancer daemon is stopped and restarted in the "backup" mode,
>> the new balancer daemon is stopped
>> and restarted in the "master" mode.
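
For reference, the failover steps quoted above correspond roughly to the
following sketch. The function names, the DRY_RUN switch, the interface
name, the VIP value and the "-c 3" count for arping are assumptions made
for illustration; the ipvsadm rules are the ones from the config listed
earlier. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the old "active" balancer: clear the table, switch sync daemon to backup.
demote_old_balancer() {
    run ipvsadm -C
    run ipvsadm --stop-daemon master
    run ipvsadm --start-daemon backup
}

# On the new "active" balancer: load the rules, announce the VIP,
# switch the sync daemon to master.
# $1 = VIP address, $2 = network interface (both example values).
promote_new_balancer() {
    vip="$1"
    iface="$2"
    run ipvsadm -A -f 100 -s rr -p 1
    run ipvsadm -a -f 100 -r server1:0 -g -w 1
    run ipvsadm -a -f 100 -r server2:0 -g -w 1
    run arping -U -I "$iface" -c 3 "$vip"
    run ipvsadm --stop-daemon backup
    run ipvsadm --start-daemon master
}
```

For example, "promote_new_balancer 192.0.2.10 eth0" prints the six commands
that would run on the new balancer; with DRY_RUN=0 and root they are
executed instead.
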
>> Now, strange things start to happen:
>> the TCP requests balanced to the new balancer are processed OK.
>> the TCP requests balanced via the new balancer to the old balancer work
>> only half-way:
>> a) the old balancer sees an incoming SYN packet (tcpdump ensures that
>> the incoming packets hit the new load balancer first),
>> opens the connection, and sends the initial prompt (for POP3, IMAP4, SMTP
>> protocols) to the client.
>> b) the client receives all SYN-ACKs and the prompt data packets, - the
>> client is connected and it sees the prompt.
>> c) when the client sends any data to the server, the data is delivered
>> to the new load balancer, it redirects it to the old balancer, and there
>> the packet is just dropped on the floor: the application does not see
>> it, the client re-sends the packet after TCP time out, it is delivered
>> to the old balancer via the new one, and it is dropped again.
>> 1. This problem does not appear after every failover, but it happens in
>> many (if not most) cases
>> 2. The problem does not go away even if we wait for a few hours after
>> the failover took place.
>> 3. The problem shows up only for protocols like POP, IMAP, SMTP, where
>> the server immediately sends a prompt back to the client.
>> The problem does not show up when the HTTP protocol is used, i.e. when
>> the client is the first to send data over a newly established connection.
>> Finally. If we stop ipvs on the "old" (inactive) load balancer, where it
>> is not being used, the problem immediately goes away.
>> And if we now restart it (its config rule set being empty before and
>> after restart) - the problem does not reappear.
>> It looks like the "old" balancer remembers something about the VIP, and
>> when we remove its routing rules, it does not clean
>> that table, and it causes problems. Which is strange, because we are
>> talking about *new* connections, i.e. the connections established after
>> the failover is complete: ipvs should not have any info about them that
>> it may keep after it stopped being the "active" balancer.
>> Of course, we can just restart ipvs when it goes from the "active" to
>> the "passive" state, but that would be kinda rude...
> Please read the documentation before posting - it's available at:
> LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
> Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
> or go to http://lists.graemef.net/mailman/listinfo/lvs-users