Thank you for your reply.
> > Here is my lvs.cf
> > ===============
> [snip - looked fine to my inexpert eyes]
>
> > primary = 10.0.0.41
> > service = lvs
> > rsh_command = rsh
> > backup_active = 1
> > backup = 10.0.0.42
> > heartbeat = 1
> > heartbeat_port = 539
> > keepalive = 6
> > deadtime = 18
> > network = nat
> > nat_router = 10.0.1.254 eth1:0
> > virtual gretel {
> > active = 1
> > address = 10.0.0.38 eth0:0
> > port = 80
> > load_monitor = uptime
> > scheduler = wrr
> > protocol = tcp
> > server gretel4 {
> > address = 10.0.1.4
> > active = 1
> > }
> > server gretel3 {
> > address = 10.0.1.3
> > active = 1
> > weight = 2
> > }
> > }
> >
> > ===============
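The config does look fine to me too. For what it's worth, here is a throwaway shell/awk sketch (not a Piranha tool; the temp-file path and the parsing are my own assumptions) that pulls the real-server entries out of an lvs.cf-style file, which is handy for double-checking the wrr weights:

```shell
#!/bin/sh
# Sketch: list each real server in an lvs.cf-style file as
# "name address weight" (weight defaults to 1 when not set).
cat > /tmp/lvs.cf <<'EOF'
virtual gretel {
    active = 1
    address = 10.0.0.38 eth0:0
    port = 80
    scheduler = wrr
    server gretel4 {
        address = 10.0.1.4
        active = 1
    }
    server gretel3 {
        address = 10.0.1.3
        active = 1
        weight = 2
    }
}
EOF
servers=$(awk '
/^[ \t]*server / { name = $2; addr = ""; w = 1; next }
name && /^[ \t]*address/ { addr = $3 }
name && /^[ \t]*weight/  { w = $3 }
name && /^[ \t]*}/ { print name, addr, w; name = "" }
' /tmp/lvs.cf)
echo "$servers"
```

With your config it prints gretel4 with the default weight 1 and gretel3 with weight 2, which matches what you described.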
> > I have a primary node (gretel), a backup node (gretelf), and 2 real
> > servers (gretel3 and gretel4). I am running lvs. When I go to
> > 10.0.0.38, I am accessing the page on either gretel3 or gretel4,
> > with gretel doing the load-balancing, right?
>
> This is correct.
>
> > When I turn off the network connections for gretel (both NICs), the
> > backup node (gretelf) takes over the load-balancing job from gretel,
> > right?
>
> This is correct.
>
> > But when I plugged the 2 NICs back into gretel, the primary node
> > (gretel) didn't become active. Indeed, I cannot access the page at
> > 10.0.0.38 anymore. Why does this happen?
>
> I am not sure. Can you use tcpdump to watch what is happening?
> What does ifconfig show? Do both gretel and gretelf have the VIP
> (10.0.0.38) up? If so, that is going to be a problem, for sure.
> Perhaps when you reconnect gretel, it doesn't realize it failed and it
> is just resuming. I don't have enough experience with Piranha
> to know what it does in this case.
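One concrete way to answer the duplicate-VIP question is to run `ifconfig` on both directors and grep for the VIP. A minimal sketch, using canned ifconfig output in the 2.2-era format (since I obviously can't run it on your hosts):

```shell
#!/bin/sh
# Check whether the VIP alias is configured on this host.
# On the real directors you would just run: ifconfig | grep "inet addr:10.0.0.38"
VIP="10.0.0.38"
# Simulated output of `ifconfig eth0:0` for illustration:
ifconfig_out='eth0:0    Link encap:Ethernet  HWaddr 00:10:4B:CA:85:23
          inet addr:10.0.0.38  Bcast:10.0.0.255  Mask:255.255.255.0'
if echo "$ifconfig_out" | grep -q "inet addr:$VIP"; then
    status="present"
else
    status="absent"
fi
echo "VIP $VIP is $status on this host"
```

If both gretel and gretelf report the VIP as present, both are answering ARP for 10.0.0.38 and clients will get confused, which would match the symptom you are seeing.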
This is what I got from /var/log/messages on gretel:
==================================
Oct 25 09:58:07 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:08 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:58:27 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:28 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:58:47 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:48 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:59:07 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:59:08 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:59:27 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:59:28 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
==============================
What does this mean?
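As far as I can tell, those nanny lines are just your load_monitor = uptime setting at work: nanny rsh's to each real server roughly every 20 seconds to read its load and adjust the wrr weights, so gretel still believes it is the active director. You can confirm the polling pattern from the timestamps, e.g.:

```shell
#!/bin/sh
# Count the nanny uptime polls per real server in the log excerpt
# (lines copied from the post, unwrapped).
cat > /tmp/gretel.log <<'EOF'
Oct 25 09:58:07 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:08 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
Oct 25 09:58:27 gretel nanny[12924]: running command "rsh" "10.0.1.4" "uptime"
Oct 25 09:58:28 gretel nanny[12926]: running command "rsh" "10.0.1.3" "uptime"
EOF
polls=$(grep -c '"10\.0\.1\.4"' /tmp/gretel.log)
echo "nanny polled 10.0.1.4 $polls times, 20 seconds apart"
```

Note there is no pulse activity at all in gretel's log, which is the suspicious part.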
Here is the /var/log/messages for gretelf:
=================================
Oct 25 04:02:00 gretelf anacron[22218]: Updated timestamp for job `cron.daily' to 2000-10-25
Oct 25 09:52:45 gretelf pulse[21302]: partner dead: activating lvs
Oct 25 09:52:45 gretelf pulse[22803]: running command "/sbin/ifconfig" "eth1:0" "10.0.1.254" "up"
Oct 25 09:52:45 gretelf pulse[22804]: running command "/sbin/ifconfig" "eth0:0" "10.0.0.38" "up"
Oct 25 09:52:45 gretelf pulse[21302]: partner active: deactivating lvs
Oct 25 09:52:45 gretelf pulse[22805]: running command "/sbin/ifconfig" "eth0:0" "down"
Oct 25 09:52:45 gretelf pulse[22801]: running command "/usr/sbin/send_arp" "-i" "eth1" "10.0.1.254" "0010B556E9A6" "10.0.1.255" "ffffffffffff"
Oct 25 09:52:45 gretelf pulse[22802]: running command "/usr/sbin/send_arp" "-i" "eth0" "10.0.0.38" "00104BCA8523" "10.0.0.255" "ffffffffffff"
Oct 25 09:52:45 gretelf pulse[22808]: running command "/sbin/ifconfig" "eth1:0" "down"
Oct 25 09:52:51 gretelf pulse[22800]: gratuitous lvs arps finished
====================================
What does this mean?
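The interesting part of the gretelf log is that at 09:52:45 it saw the partner dead, brought up the VIP and NAT router addresses, sent gratuitous ARPs, and then in the very same second saw the partner active again and took the interfaces back down. So gretelf ended up with the VIP down, while gretel's log shows no pulse failover activity at all. A quick sketch to pull just the state transitions out of the excerpt (filtering on the pulse parent pid, 21302 in your log):

```shell
#!/bin/sh
# Extract pulse state transitions from the gretelf log excerpt
# (lines copied from the post, unwrapped).
cat > /tmp/gretelf.log <<'EOF'
Oct 25 09:52:45 gretelf pulse[21302]: partner dead: activating lvs
Oct 25 09:52:45 gretelf pulse[22804]: running command "/sbin/ifconfig" "eth0:0" "10.0.0.38" "up"
Oct 25 09:52:45 gretelf pulse[21302]: partner active: deactivating lvs
Oct 25 09:52:45 gretelf pulse[22805]: running command "/sbin/ifconfig" "eth0:0" "down"
EOF
# "activating lvs" matches both the activate and deactivate messages.
transitions=$(grep 'pulse\[21302\]' /tmp/gretelf.log | grep -c 'activating lvs')
echo "pulse changed state $transitions times within one second"
```

Two transitions within one second suggests the heartbeat came back (or was never really considered lost) the moment gretelf started activating.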
> This may be the result of having two systems that know they can't
> talk to each other, but how do they know which one failed? In
> other words, if gretel can't talk to gretelf, how can gretel know
> if that is because something is wrong with gretel or something
> is wrong with gretelf? So gretel just keeps running when it
> is reattached - but gretelf is running now too.
>
> How does Piranha deal with this? Keith?
>
> > But only after I restarted the
> > httpd could I access the page at 10.0.0.38 again.
>
> Where are you restarting the httpd? On gretel? On gretel3 and
> gretel4?
I restarted httpd on gretel. The httpd on gretelf, gretel3, and gretel4
is still running.
Another question: in lvs, when the primary node is down, the backup node
takes over. Then why do we still need fos?
Don't worry, Be happy! :)