Re: ipvs failback patch

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: ipvs failback patch
From: Ranga Nathan <kairanga@xxxxxxx>
Date: Tue, 22 Nov 2005 10:51:10 -0800
Thanks for the info. I still have problems... :-(

Leon Keijser wrote:


I had the same problem that you have. Failover works perfectly, but failback
to the master caused all connections to drop. I fixed it by first making
sure both daemons (master & backup) run before heartbeat is started. Second,
that still caused some clients to disconnect, so I added a 'sleep' of about
30s before heartbeat starts. That fixed it for me.

Oh, and I don't know if this matters, but on the primary LVS I started
the daemons with syncid 20 (master) and 21 (backup). On the secondary LVS,
20 (backup) and 21 (master).
This seems to follow Horms's response regarding syncmaster.
OK, I have done the same as you suggested. Now I have
--start-daemon master --syncid 20
--start-daemon backup --syncid 21
on LD1 and
--start-daemon master --syncid 21
--start-daemon backup --syncid 20
on LD2.
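
Spelled out as full commands, the mirrored setup above would look roughly like the sketch below. The helper name sync_daemon_cmds is made up for illustration, and it only prints the commands rather than running them, so nothing here touches the kernel sync daemons.

```shell
#!/bin/sh
# Hypothetical helper: print the ipvsadm sync-daemon commands for one director.
# Usage: sync_daemon_cmds MASTER_SYNCID BACKUP_SYNCID
sync_daemon_cmds() {
    master_id=$1
    backup_id=$2
    echo "ipvsadm --start-daemon master --syncid $master_id"
    echo "ipvsadm --start-daemon backup --syncid $backup_id"
}

# LD1: sends state on syncid 20, receives on syncid 21
sync_daemon_cmds 20 21

# LD2: the mirror image, sends on 21 and receives on 20
sync_daemon_cmds 21 20
```

The point of the mirrored syncids is that each director's master daemon talks to the other director's backup daemon, so connection state flows in both directions.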

Both daemons (master and backup) are running on both LDs.

This is how I test.
I start an ssh session to the VIP while LD1 is master, and I follow the progress using "watch ipvsadm -L -n". I then reboot LD1. Within a minute I see LD2 take over the connection smoothly. After LD1 has rebooted, it snatches the connections back and the ssh session drops. First, should LD1 always snatch the connections back from LD2, or only when LD2 drops out?

I put the sleep 30 delay in the heartbeat startup script.
Anything else I should do?
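
For reference, the delay combined with a check that both daemons are up could be sketched roughly as follows. This is not my actual startup script. It assumes that "ipvsadm -L --daemon" prints one line per running sync daemon starting with "master sync daemon" or "backup sync daemon"; the function names and the IPVSADM override variable are made up for illustration.

```shell
#!/bin/sh
# Command used to query the sync daemons; can be overridden for testing.
IPVSADM=${IPVSADM:-ipvsadm}

# Return 0 only if both a master and a backup sync daemon are reported.
# Assumption: the status output has lines beginning with
# "master sync daemon" and "backup sync daemon".
both_daemons_up() {
    status=$($IPVSADM -L --daemon 2>/dev/null)
    echo "$status" | grep -q '^master sync daemon' &&
    echo "$status" | grep -q '^backup sync daemon'
}

# Poll once per second, up to $1 tries (default 30).
# Returns 0 as soon as both daemons are seen, 1 if they never appear.
wait_for_daemons() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        both_daemons_up && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Intended use in the heartbeat startup script (left commented out here):
#   wait_for_daemons 30 && sleep 30
#   ... then start heartbeat as usual
```

The extra sleep after the check is there to give the backup daemon time to receive the connection table from the other director before heartbeat takes the VIP back.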


I am sorry, I did not explain very well. My language is not very technical :-) I did what you suggested before. From master to backup the failover worked fine, and I did not lose any connections. When the master came back up after a reboot and snatched the nodes back from the backup, the connections dropped. I am sure that the master started the daemon when it came back up, as I had "--start-daemon master" in /etc/ipvsadm.rules. I confirmed this by running "ipvsadm --start-daemon master" on the master, and it said "Daemon has already run". I could not find a way to query whether it is in "master" state, but I presumed so.

Is there a way for the master and backup to swap roles dynamically when snatching the nodes?

