Problems with FOS

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Problems with FOS
From: "Teh Yong Wei" <ywteh@xxxxxxxxxxxxxx>
Date: Mon, 23 Oct 2000 11:44:54 +0800
Dear all,

I have a problem with my LVS and FOS setup.

Here is my lvs.cf:
===================
primary = 10.0.0.41
service = fos
rsh_command = rsh
backup_active = 0
backup = 10.0.0.42
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = nat
nat_router = 10.0.1.254 eth1:0
failover web {
     address = 10.0.0.38 eth0:0
     active = 1
     port = 80
     timeout = 6
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     start_cmd = "/etc/rc.d/init.d/httpd start"
     stop_cmd = "/etc/rc.d/init.d/httpd stop"
}
virtual gretel {
     active = 1
     address = 10.0.0.38 eth0:0
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     load_monitor = uptime
     scheduler = wrr
     protocol = tcp
     timeout = 6
     reentry = 15
     server gretel3 {
         address = 10.0.1.3
         active = 1
         weight = 2
     }
     server gretel4 {
         address = 10.0.1.4
         active = 1
         weight = 1
     }
}
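The "send", "expect" and "timeout" settings in the "failover web" block describe a simple service probe; in the primary's log, fos passes them to nanny as "-s", "-x" and "-t". The rough Python sketch below illustrates that kind of check. It is not nanny's actual code. The function names "probe" and "reply_matches" are hypothetical, and treating "timeout" as seconds per connect/read is an assumption.

```python
import socket


def reply_matches(reply: bytes, expect: str) -> bool:
    """Return True if the expected token appears in the service's reply."""
    return expect.encode("ascii") in reply


def probe(host: str, port: int, send: str, expect: str, timeout: float = 6.0) -> bool:
    """Connect to host:port, send `send`, and look for `expect` in the reply.

    Returns False on a refused connection, a timeout, or a non-matching reply.
    Only the first recv() is inspected, which is enough to see an HTTP status line.
    (Hypothetical illustration, not nanny's implementation.)
    """
    try:
        # create_connection applies `timeout` to the connect and to later reads.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(send.encode("ascii"))
            reply = sock.recv(4096)
    except OSError:  # covers ConnectionRefusedError and socket timeouts
        return False
    return reply_matches(reply, expect)
```

With the values from the config, probe("10.0.0.42", 80, "GET / HTTP/1.0\r\n\r\n", "HTTP", timeout=6) roughly corresponds to the check nanny reports as either "Remote service 10.0.0.42:80 is available" or "read from 10.0.0.42:80 timed out".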

=======================
Here is the log file for primary node:
=======================
Oct 23 11:21:35 gretel pulse[6990]: PARTNER HAS TOLD US TO GO INACTIVE!
Oct 23 11:21:35 gretel fos[7018]: Shutting down due to signal 15
Oct 23 11:21:35 gretel fos[7018]: Shutting down local service 10.0.0.38:80
Oct 23 11:21:35 gretel fos[7018]: running command  "/etc/rc.d/init.d/httpd" "stop"
Oct 23 11:21:36 gretel httpd: httpd shutdown succeeded
Oct 23 11:21:36 gretel fos[7018]: will now exit to notify pulse...
Oct 23 11:21:36 gretel pulse[7095]: running command  "/sbin/ifconfig" "eth0:0" "down"
Oct 23 11:21:36 gretel pulse[6990]: running command  "/usr/sbin/fos" "--monitor" "-c" "/etc/lvs.cf" "--nofork"
Oct 23 11:21:36 gretel fos[7096]: Stopping local services (if any)
Oct 23 11:21:36 gretel fos[7096]: Shutting down local service 10.0.0.38:80
Oct 23 11:21:36 gretel fos[7096]: running command  "/etc/rc.d/init.d/httpd" "stop"
Oct 23 11:21:36 gretel httpd: httpd shutdown failed
Oct 23 11:21:36 gretel fos[7096]: running command  "/usr/sbin/nanny" "-c" "-h" "10.0.0.42" "-V" "10.0.0.38" "-p" "80" "-s" "GET / HTTP/1.0\r\n\r\n" "-x" "HTTP" "-R" "/etc/rc.d/init.d/httpd start" "-D" "/etc/rc.d/init.d/httpd stop" "-t" "6"
Oct 23 11:21:36 gretel fos[7096]: Starting monitor for 10.0.0.38:80 running as pid 7096
Oct 23 11:21:36 gretel nanny[7109]: Failover service monitor for 10.0.0.38:80 started
Oct 23 11:21:36 gretel nanny[7109]: Remote service 10.0.0.42:80 is available
Oct 23 11:23:06 gretel nanny[7109]: read from 10.0.0.42:80 timed out
Oct 23 11:23:06 gretel nanny[7109]: Exiting due to connection failure of 10.0.0.42:80
Oct 23 11:23:06 gretel fos[7096]: Monitor for service 10.0.0.38:80 exited. This is a failover condition!
Oct 23 11:23:06 gretel fos[7096]: will now exit to notify pulse...

=======================
Here is the log file for backup node:
=======================
Oct 23 11:21:10 gretelf fos[15505]: Starting local service 10.0.0.38:80 ...
Oct 23 11:21:10 gretelf fos[15505]: running command  "/etc/rc.d/init.d/httpd" "start"
Oct 23 11:21:10 gretelf httpd: httpd startup succeeded
Oct 23 11:21:13 gretelf pulse[15421]: Notifying partner WE are taking control!
Oct 23 11:21:15 gretelf pulse[15504]: gratuitous fos arps finished
========================

1) Why, when I start pulse on the backup node, does it immediately become
active while the primary node becomes inactive?

2) After that, I cannot reach the VIP from a client. When I run "ps axw"
on both the primary and backup nodes, only pulse is running on the
primary node. Here is the "ps axw" output from the backup node:
=================
15421 ?        S      0:00 pulse
15505 ?        S      0:00 /usr/sbin/fos --active -c /etc/lvs.cf --nofork
15531 ?        S      0:00 httpd
15534 ?        S      0:00 httpd
15535 ?        S      0:00 httpd
15536 ?        S      0:00 httpd
15537 ?        S      0:00 httpd
15538 ?        S      0:00 httpd
15539 ?        S      0:00 httpd
15540 ?        S      0:00 httpd
15541 ?        S      0:00 httpd

3) Furthermore, it seems that httpd on the primary node has been stopped.
Why does this happen?

4) I also tried unplugging the primary node from the Internet, and the
backup node failed over. But when I plugged the primary node back in, it
did not take the service back: the primary node stays inactive while the
backup node stays active. Please advise.
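Questions 1 and 4 both depend on when pulse decides that its partner is dead. Assuming that "keepalive = 6" means the partner sends a heartbeat every 6 seconds and "deadtime = 18" means it is declared dead after 18 seconds without one (a reading of the lvs.cf options, not verified against the pulse source), a node should tolerate about 18 / 6 = 3 missed heartbeats before taking over. The hypothetical class below models only that timeout rule. It is not how pulse is implemented.

```python
class HeartbeatMonitor:
    """Hypothetical model of a peer-heartbeat timeout (not pulse's real code)."""

    def __init__(self, deadtime: float, started_at: float) -> None:
        self.deadtime = deadtime      # seconds of silence before the peer counts as dead
        self.last_seen = started_at   # treat startup as the last sign of life

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the partner at time `now`."""
        self.last_seen = now

    def partner_dead(self, now: float) -> bool:
        """True once `deadtime` seconds have passed since the last heartbeat."""
        return now - self.last_seen >= self.deadtime


if __name__ == "__main__":
    KEEPALIVE, DEADTIME = 6, 18   # values from the lvs.cf above
    mon = HeartbeatMonitor(deadtime=DEADTIME, started_at=0)
    for t in range(KEEPALIVE, 19, KEEPALIVE):   # heartbeats at t = 6, 12, 18
        mon.heartbeat(t)
    # The partner goes silent after t = 18.
    print(mon.partner_dead(30))   # 12 s of silence -> False
    print(mon.partner_dead(36))   # 18 s of silence -> True
```

Under this model, a backup that takes control only 3 seconds after startup (as in the backup log at 11:21:13) would not be explained by the deadtime expiring.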

Thank you.


Don't worry, Be happy! :)