Hi everyone,
I've been having some trouble with ultramonkey for the last few weeks.
Here's an example. I've got this in ldirectord.conf:
*********
checktimeout=20
checkinterval=5
autoreload=no
logfile="local0"
quiescent=yes
..........
virtual=212.36.**.**:25
real=212.36.**.**:25 gate
real=212.36.**.**:25 gate
real=212.36.**.**:25 gate
real=212.36.**.**:25 gate
service=smtp
login="*******"
passwd="******"
scheduler=wrr
protocol=tcp
*********
The *'s are just there to hide the information; they are not in the
actual conf file.
So "ipvsadm -l" shows this:
*************
TCP gluster.cdmon.com:smtp wrr
-> 212.36.**.**:smtp Route 1 0 0
-> 212.36.**.**:smtp Route 1 0 0
-> 212.36.**.**:smtp Route 1 0 0
-> 212.36.**.**:smtp Route 1 0 0
*************
That's fine: every weight is 1, so ultramonkey sees every server.
If one of these four servers goes down now, ultramonkey sets its weight
to "0", which is perfect.
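To keep an eye on those weights from a script, the "ipvsadm -l" listing can be parsed. This is only a rough sketch: the function names are made up for illustration, the 192.0.2.x addresses in the sample are placeholders rather than the real servers, and it assumes the real-server lines have the same "-> address forward weight active inactive" layout as shown above.

```python
import re

# Matches real-server lines such as "-> 192.0.2.11:smtp Route 1 0 0".
# The column layout is assumed from the listing above.
REAL_LINE = re.compile(r"^\s*->\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_real_servers(ipvsadm_output):
    """Return a list of (address, forward_method, weight) tuples."""
    servers = []
    for line in ipvsadm_output.splitlines():
        match = REAL_LINE.match(line)
        if match:
            servers.append((match.group(1), match.group(2), int(match.group(3))))
    return servers


def quiesced(servers):
    """Return the addresses whose weight is 0."""
    return [address for address, _method, weight in servers if weight == 0]


# Placeholder listing (hypothetical addresses), same shape as the output above.
SAMPLE = """\
TCP gluster.cdmon.com:smtp wrr
-> 192.0.2.11:smtp Route 1 0 0
-> 192.0.2.12:smtp Route 0 0 0
"""

if __name__ == "__main__":
    for address in quiesced(parse_real_servers(SAMPLE)):
        print("weight 0:", address)
```

In real use the text would come from running "ipvsadm -l" (for example through subprocess) instead of the SAMPLE string.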
The thing is, after some days, and I don't know why, ultramonkey is
no longer able to check the status.
I mean, if any of these 4 servers hangs after a week or so, ultramonkey
will keep seeing it as "alive" and the weight will still be "1".
Or sometimes I keep just one server up, and when I switch the other
ones back on, ultramonkey never detects them as alive again.
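To confirm that a real server really is up or down, independently of what the weight says, a plain TCP connect to its SMTP port is enough. A minimal sketch, with a made-up function name; note that it only checks that the port accepts connections and does not log in the way the service=smtp check with login/passwd does.

```python
import socket


def smtp_port_open(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Comparing this result with the weight shown by "ipvsadm -l" for each real server would show exactly when the two stop agreeing.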
Fortunately, I've got 2 balanced ultramonkey servers, and the only way
for me to get all servers alive again is to do the stand_by thing and
let the other one "check" all the servers again. The problem, though,
is that users can't access the data in the meantime. It's just a few
seconds, of course, but it hits a lot of users :(
I've tried running ldirectord reload|force-reload|stop|start, but it
does nothing.
The logs don't show any warning or error either; they just say something
like this:
********
Feb 8 09:28:21 umok01 ldirectord[24625]: Quiescent real server:
212.36.**.**:25 ( x 212.36.**.**:25) (Weight set to 0)
********
And that's it; no further warning or error messages are shown.
Is there actually a way to fix this? Can I launch a flush command
without having to do the stand_by thing?
Thank you.