> Sorry, I have run into another problem with lvs.
>
> Here is my lvs.cf:
> =======================
> primary = 10.0.0.41
> service = lvs
> rsh_command = rsh
> backup_active = 1
> backup = 10.0.0.42
> heartbeat = 1
> heartbeat_port = 539
> keepalive = 6
> deadtime = 18
> network = nat
> nat_router = 10.0.1.254 eth1:0
> virtual gretel {
>     active = 1
>     address = 10.0.0.38 eth0:0
>     port = 80
>     load_monitor = uptime
>     scheduler = wrr
>     protocol = tcp
>     server gretel4 {
>         address = 10.0.1.4
>         active = 1
>     }
>     server gretel3 {
>         address = 10.0.1.3
>         active = 1
>         weight = 2
>     }
> }
> ========================
> I have using "while true;do lynx -dump 10.0.0.38;done" from a client to
> test the lvs. Then, I shut down the primary node(gretel). The client
> side seems got a small period of time of hang (canot access the page).
> Is this a correct situation in order to wait for the backup node
> (gretelf) to take over? But, later, the client can access the page.
There should be a very brief period of unavailability, but it should
not last long.
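To see exactly how long the outage lasts, you could timestamp each
successful fetch; the gap in the timestamps is your failover time.
A minimal sketch of such a loop, using the same lynx command you are
already running:

    # print a timestamp after every successful page fetch;
    # a gap in the output shows how long failover took
    while true; do
        lynx -dump 10.0.0.38 > /dev/null && date
        sleep 1
    done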
What network is your client located on? Is it on the same subnet
as gretel and gretelf? If so, you may be seeing an arp cache
timeout; once the stale entry expires, your client issues an arp
request and gets the new MAC address for 10.0.0.38. However, on
failover, when gretelf
takes over, it should be issuing a gratuitous arp to indicate that
it has taken over the VIP. You should be able to see this using
"tcpdump".
> Then, when I restart gretel, the problems begin. Of course, pulse
> is not started on gretel.
Why not? "chkconfig --level 345 pulse on" on Redhat will ensure that pulse
starts automatically on reboot.
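For example (runlevels 3, 4 and 5 are the usual multi-user ones):

    # arrange for pulse to start at boot in runlevels 3, 4 and 5
    chkconfig --level 345 pulse on
    # verify the runlevel settings took effect
    chkconfig --list pulse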
> Then I start pulse on gretel. The
> client can no longer access the page.
That is not good.
> At the same time, gretel
> doesn't seem to be active, while gretelf is still active.
> Here is the "ps axw" log on gretel:
> ========================
> 958 ? S 0:00 pulse
> 1302 ? S 0:00 /usr/sbin/lvs --nofork -c /etc/lvs.cf
> 1314 ? S 0:00 /usr/sbin/nanny -c -h 10.0.1.4 -p 80 -a 180
> -I /usr/sbin/ipvsadm -t 10 -w 1 -V 10.0.0.38 -M m -U rsh
> 1315 ? S 0:00 /usr/sbin/nanny -c -h 10.0.1.3 -p 80 -a 180
> -I /usr/sbin/ipvsadm -t 10 -w 2 -V 10.0.0.38 -M m -U rsh
This looks active to me. What does "ifconfig -a" show?
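On the active director I would expect to see the aliased interfaces
from your lvs.cf up and holding their addresses, roughly like this
(output trimmed; the exact format varies):

    ifconfig -a
    # on the active node you should see something like:
    #   eth0:0   inet addr:10.0.0.38   (the VIP)
    #   eth1:0   inet addr:10.0.1.254  (the nat_router address)

If gretelf still holds eth0:0 while gretel thinks it is active, you
have two machines answering for the VIP, which would explain the
clients losing the page.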
> ============================
>
> At the same time, gretel keeps reporting the following lines in
> /var/log/messages:
> =====================================
> Oct 25 11:58:10 greteld nanny[1314]: running command "rsh" "10.0.1.4"
> "uptime"
> Oct 25 11:58:10 greteld nanny[1315]: running command "rsh" "10.0.1.3"
> "uptime"
> Oct 25 11:58:30 greteld nanny[1314]: running command "rsh" "10.0.1.4"
> "uptime"
> Oct 25 11:58:30 greteld nanny[1315]: running command "rsh" "10.0.1.3"
> "uptime"
> Oct 25 11:58:50 greteld nanny[1314]: running command "rsh" "10.0.1.4"
> "uptime"
> Oct 25 11:58:50 greteld nanny[1315]: running command "rsh" "10.0.1.3"
> "uptime"
> Oct 25 11:59:10 greteld nanny[1314]: running command "rsh" "10.0.1.4"
> "uptime"
> Oct 25 11:59:10 greteld nanny[1315]: running command "rsh" "10.0.1.3"
> "uptime"
> Oct 25 11:59:30 greteld nanny[1314]: running command "rsh" "10.0.1.4"
> "uptime"
> Oct 25 11:59:30 greteld nanny[1315]: running command "rsh" "10.0.1.3"
> "uptime"
> ====================================
>
> Why? For your information, I have stopped pulse on gretelf at this time.
How did you stop pulse? "/etc/rc.d/init.d/pulse stop" is what I would
use.
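It is also worth confirming that the old processes are really gone
and looking at the kernel's virtual server table on each node:

    # confirm pulse, lvs and the nannies have exited
    ps axw | egrep 'pulse|lvs|nanny' | grep -v egrep
    # list the virtual server table and the real servers in it
    ipvsadm -L -n

Only one node should have entries for 10.0.0.38 in its ipvsadm table
at any given time.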
> I am quite confused by now. Sometimes it works, sometimes it does not.
> *sigh* Please correct me and show me where I have gone wrong.
I suspect that gretel comes up active when you start it. I am not sure
how gretel and gretelf coordinate and figure out who should be master.
This is going to be specific to piranha, and I just don't have time
to go figure it out. You might have to ask for support from Redhat.
I am not actually running piranha anywhere right now and I don't
have time to go set it up and play with it.
--
John Cronin