[lvs-users] ultramonkey 3 and problems detecting alive servers

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] ultramonkey 3 and problems detecting alive servers
From: Jordi Moles <jordi@xxxxxxxxx>
Date: Mon, 11 Feb 2008 10:42:39 +0100
Hi everyone,

I've been having trouble with ultramonkey for the last few weeks.

As an example, this is what I have in ldirectord.conf:

*********
checktimeout=20
checkinterval=5
autoreload=no
logfile="local0"
quiescent=yes

..........

virtual=212.36.**.**:25
        real=212.36.**.**:25 gate
        real=212.36.**.**:25 gate
        real=212.36.**.**:25 gate
        real=212.36.**.**:25 gate
        service=smtp
        login="*******"
        passwd="******"
        scheduler=wrr
        protocol=tcp
*********

The asterisks are only there to hide information; they are not in the actual conf file.

With that configuration, "ipvsadm -l" shows this:

*************
TCP  gluster.cdmon.com:smtp wrr
  -> 212.36.**.**:smtp      Route   1      0          0
  -> 212.36.**.**:smtp      Route   1      0          0
  -> 212.36.**.**:smtp      Route   1      0          0
  -> 212.36.**.**:smtp      Route   1      0          0
*************

That's fine: every weight is 1, so ultramonkey sees every server.
If one of these four servers goes down, ultramonkey sets its weight
to "0", which is exactly what I want.
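
To keep an eye on those weights over time, the "ipvsadm -l" output can be parsed by a small script. The following is only a minimal sketch in Python: the function names and the 192.0.2.x sample addresses are made up for illustration, and it assumes the column layout shown above (address, forwarding method, weight, active and inactive connections).

```python
import re

# Matches real-server lines of `ipvsadm -l` output, for example:
#   -> 192.0.2.11:smtp      Route   1      0          0
# The column header line does not match because its weight field is not numeric.
REAL_LINE = re.compile(
    r"^\s*->\s+(?P<addr>\S+)\s+(?P<fwd>\S+)\s+(?P<weight>\d+)"
    r"\s+(?P<active>\d+)\s+(?P<inact>\d+)\s*$"
)


def parse_real_servers(text):
    """Return a list of (address, weight) tuples, in the order listed."""
    servers = []
    for line in text.splitlines():
        match = REAL_LINE.match(line)
        if match:
            servers.append((match.group("addr"), int(match.group("weight"))))
    return servers


def quiesced(text):
    """Return the addresses of real servers whose weight is 0."""
    return [addr for addr, weight in parse_real_servers(text) if weight == 0]


if __name__ == "__main__":
    # Hypothetical sample; real input would be captured from `ipvsadm -l`.
    sample = (
        "TCP  vip.example:smtp wrr\n"
        "  -> 192.0.2.11:smtp      Route   1      0          0\n"
        "  -> 192.0.2.12:smtp      Route   0      0          0\n"
    )
    for addr in quiesced(sample):
        print("quiesced:", addr)
```

Comparing this list with what is really up or down would show when the weights stop following reality.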

The thing is, after some days, and I don't know why, ultramonkey is
no longer able to check the status.
I mean, if after a week or so any of these 4 servers hangs, ultramonkey
keeps seeing it as "alive" and the weight stays at "1".
Or sometimes I keep just one server up, and when I switch
the other ones back on, ultramonkey never detects them as alive again.
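
To cross-check by hand whether a real server answers, independently of ldirectord, something like the following could be run from the director. This is only a sketch and not ldirectord's own check: it just connects to port 25 and looks for the "220" SMTP greeting. The function name is hypothetical, and the default timeout of 20 seconds is simply borrowed from checktimeout in the config above.

```python
import socket


def smtp_greeting_ok(host, port=25, timeout=20.0):
    """Return True if host:port accepts a TCP connection and greets with 220.

    Connection refusals, timeouts and other socket errors count as "down".
    Only the first chunk of the greeting is read, which is assumed to be
    enough to contain the three-digit reply code.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)
    except OSError:
        return False
    return banner.startswith(b"220")


if __name__ == "__main__":
    # Hypothetical real-server addresses; replace with the actual ones.
    for real in ("192.0.2.11", "192.0.2.12"):
        state = "up" if smtp_greeting_ok(real, timeout=5.0) else "down"
        print(real, state)
```

If this reports a server as down while its weight is still "1" (or up while the weight stays "0"), that would confirm the checks themselves have stopped working.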

Fortunately, I've got 2 balanced ultramonkey servers, and the only way
for me to get all servers alive again is to do the stand_by thing
and let the other one "check" all the servers again. The problem,
though, is that meanwhile users can't access the data. It's just a few
seconds, of course, but it affects too many users :(

I've tried running ldirectord with reload, force-reload, stop and start,
but it makes no difference.

The logs don't show any warnings or errors either; they just say something
like this:

********
Feb  8 09:28:21 umok01 ldirectord[24625]: Quiescent real server: 
212.36.**.**:25 ( x 212.36.**.**:25) (Weight set to 0)
********

And that's it; no further warnings or errors are shown.

Is there actually a way to fix this? Can I launch a flush command
without having to do the stand_by thing?

Thank you.