Thank you for your reply.
> > Here is my lvs.cf
> > ===============
> [snip - looked fine to my inexpert eyes]
>
> > primary = 10.0.0.41
> > service = lvs
> > rsh_command = rsh
> > backup_active = 1
> > backup = 10.0.0.42
> > heartbeat = 1
> > heartbeat_port = 539
> > keepalive = 6
> > deadtime = 18
> > network = nat
> > nat_router = 10.0.1.254 eth1:0
> > virtual gretel {
> > active = 1
> > address = 10.0.0.38 eth0:0
> > port = 80
> > load_monitor = uptime
> > scheduler = wrr
> > protocol = tcp
> > server gretel4 {
> > address = 10.0.1.4
> > active = 1
> > }
> > server gretel3 {
> > address = 10.0.1.3
> > active = 1
> > weight = 2
> > }
> > }
> >
> > ===============
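The config does look fine to me too. For what it's worth, here is a throwaway shell/awk sketch (not a Piranha tool; the temp-file path and the parsing are my own assumptions) that pulls the real-server entries out of an lvs.cf-style file, which is handy for double-checking the wrr weights:

```shell
#!/bin/sh
# Sketch: list each real server in an lvs.cf-style file as
# "name address weight" (weight defaults to 1 when not set).
cat > /tmp/lvs.cf <<'EOF'
virtual gretel {
    active = 1
    address = 10.0.0.38 eth0:0
    port = 80
    scheduler = wrr
    server gretel4 {
        address = 10.0.1.4
        active = 1
    }
    server gretel3 {
        address = 10.0.1.3
        active = 1
        weight = 2
    }
}
EOF
servers=$(awk '
/^[ \t]*server / { name = $2; addr = ""; w = 1; next }
name && /^[ \t]*address/ { addr = $3 }
name && /^[ \t]*weight/  { w = $3 }
name && /^[ \t]*}/ { print name, addr, w; name = "" }
' /tmp/lvs.cf)
echo "$servers"
```

With your config it prints gretel4 with the default weight 1 and gretel3 with weight 2, which matches what you described.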
> > I have a primary node (gretel), a backup node (gretelf), and 2 real
> > servers (gretel3 and gretel4). I am running lvs. When I go to
> > 10.0.0.38, I am accessing the page on either gretel3 or gretel4,
> > with gretel doing the load-balancing, right?
>
> This is correct.
>
> > When I turn off the network connections for gretel (both NICs), the
> > backup node (gretelf) takes over the load-balancing job from gretel,
> > right?
>
> This is correct.
>
> > But when I plugged the 2 NICs back into gretel, the primary node
> > (gretel) didn't become active. Indeed, I cannot access the page at
> > 10.0.0.38 anymore. Why does this happen?
>
> I am not sure. Can you use tcpdump to watch what is happening?
> What does ifconfig show? Do both gretel and gretelf have the VIP
> (10.0.0.38) up? If so, that is going to be a problem, for sure.
> Perhaps when you reconnect gretel, it doesn't realize it failed and it
> is just resuming. I don't have enough experience with Piranha
> to know what it does in this case.
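One concrete way to answer the duplicate-VIP question is to run `ifconfig` on both directors and grep for the VIP. A minimal sketch, using canned ifconfig output in the 2.2-era format (since I obviously can't run it on your hosts):

```shell
#!/bin/sh
# Check whether the VIP alias is configured on this host.
# On the real directors you would just run: ifconfig | grep "inet addr:10.0.0.38"
VIP="10.0.0.38"
# Simulated output of `ifconfig eth0:0` for illustration:
ifconfig_out='eth0:0    Link encap:Ethernet  HWaddr 00:10:4B:CA:85:23
          inet addr:10.0.0.38  Bcast:10.0.0.255  Mask:255.255.255.0'
if echo "$ifconfig_out" | grep -q "inet addr:$VIP"; then
    status="present"
else
    status="absent"
fi
echo "VIP $VIP is $status on this host"
```

If both gretel and gretelf report the VIP as present, both are answering ARP for 10.0.0.38 and clients will get confused, which would match the symptom you are seeing.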
This is what I got from /var/log/messages on gretel:
==================================
Oct 25 09:58:07 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:08 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:58:27 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:28 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:58:47 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:48 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:59:07 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:59:08 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:59:27 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:59:28 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
==============================
What does this mean?
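As far as I can tell, those nanny lines are just your load_monitor = uptime setting at work: nanny rsh's to each real server roughly every 20 seconds to read its load and adjust the wrr weights, so gretel still believes it is the active director. You can confirm the polling pattern from the timestamps, e.g.:

```shell
#!/bin/sh
# Count the nanny uptime polls per real server in the log excerpt
# (lines copied from the post, unwrapped).
cat > /tmp/gretel.log <<'EOF'
Oct 25 09:58:07 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:08 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:58:27 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:28 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
EOF
polls=$(grep -c '"10\.0\.1\.4"' /tmp/gretel.log)
echo "nanny polled 10.0.1.4 $polls times, 20 seconds apart"
```

Note there is no pulse activity at all in gretel's log, which is the suspicious part.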
Here is the /var/log/messages for gretelf:
=================================
Oct 25 04:02:00 gretelf anacron[22218]: Updated timestamp for job `cron.daily' to 2000-10-25
Oct 25 09:52:45 gretelf pulse[21302]: partner dead: activating lvs
Oct 25 09:52:45 gretelf pulse[22803]: running command "/sbin/ifconfig" "eth1:0" "10.0.1.254" "up"
Oct 25 09:52:45 gretelf pulse[22804]: running command "/sbin/ifconfig" "eth0:0" "10.0.0.38" "up"
Oct 25 09:52:45 gretelf pulse[21302]: partner active: deactivating lvs
Oct 25 09:52:45 gretelf pulse[22805]: running command "/sbin/ifconfig" "eth0:0" "down"
Oct 25 09:52:45 gretelf pulse[22801]: running command "/usr/sbin/send_arp" "-i" "eth1" "10.0.1.254" "0010B556E9A6" "10.0.1.255" "ffffffffffff"
Oct 25 09:52:45 gretelf pulse[22802]: running command "/usr/sbin/send_arp" "-i" "eth0" "10.0.0.38" "00104BCA8523" "10.0.0.255" "ffffffffffff"
Oct 25 09:52:45 gretelf pulse[22808]: running command "/sbin/ifconfig" "eth1:0" "down"
Oct 25 09:52:51 gretelf pulse[22800]: gratuitous lvs arps finished
====================================
What does this mean?
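The interesting part of the gretelf log is that at 09:52:45 it saw the partner dead, brought up the VIP and NAT router addresses, sent gratuitous ARPs, and then in the very same second saw the partner active again and took the interfaces back down. So gretelf ended up with the VIP down, while gretel's log shows no pulse failover activity at all. A quick sketch to pull just the state transitions out of the excerpt (filtering on the pulse parent pid, 21302 in your log):

```shell
#!/bin/sh
# Extract pulse state transitions from the gretelf log excerpt
# (lines copied from the post, unwrapped).
cat > /tmp/gretelf.log <<'EOF'
Oct 25 09:52:45 gretelf pulse[21302]: partner dead: activating lvs
Oct 25 09:52:45 gretelf pulse[22804]: running command "/sbin/ifconfig" "eth0:0" "10.0.0.38" "up"
Oct 25 09:52:45 gretelf pulse[21302]: partner active: deactivating lvs
Oct 25 09:52:45 gretelf pulse[22805]: running command "/sbin/ifconfig" "eth0:0" "down"
EOF
# "activating lvs" matches both the activate and deactivate messages.
transitions=$(grep 'pulse\[21302\]' /tmp/gretelf.log | grep -c 'activating lvs')
echo "pulse changed state $transitions times within one second"
```

Two transitions within one second suggests the heartbeat came back (or was never really considered lost) the moment gretelf started activating.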
> This may be the result of having two systems that know they can't
> talk to each other, but how do they know which one failed? In
> other words, if gretel can't talk to gretelf, how can gretel know
> if that is because something is wrong with gretel or something
> is wrong with gretelf? So gretel just keeps running when it
> is reattached - but gretelf is running now too.
>
> How does Piranha deal with this? Keith?
>
> > But only after I restarted the
> > httpd could I access the page at 10.0.0.38 again.
>
> Where are you restarting the httpd? On gretel? On gretel3 and
> gretel4?
I restarted httpd on gretel. The httpd on gretelf, gretel3, and gretel4
is still running.
Another question: in lvs, when the primary node is down, the backup node
takes over. Then why do we still need fos?
Don't worry, Be happy! :)