Re: ldirectord stops almost immediately

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: ldirectord stops almost immediately
From: Peter Nash <peter.nash@xxxxxxxxxxxxxxxxx>
Date: Fri, 10 Feb 2006 18:34:12 +0000
Hi Leon

I think I've found a fix for this problem for my configuration.

Like you, I couldn't reproduce the problem on every real server. For two 
out of three, taking the services off-line caused ldirectord to fail, but 
off-lining the third server caused no problems.  The three are virtually 
identical in hardware and configuration, apart from slight differences in 
CPU speed.

> 
> I also noticed that the problem seems to occur with only 1 server! Very
> strange as this server is also on a VMWare box and actually is an exact
> clone of the other Citrix terminal server (which ldirectord can check
> without a problem).
> 


I think the issue was fixed in ldirectord v1.77.2.39 - there's a reference 
to a "race condition in the connect and sip checks". A timing issue like 
that could explain why it shows up with some real servers and not others.

I tried 4 versions of ldirectord with the following results:

1.62.2.6  ( from my old directors ):
Works fine

1.77.2.32  ( from the heartbeat-ldirectord-1.2.3.cvs.20050927-1.rh.el.um.1 
RPM downloaded from the Ultramonkey 3 download page )
Fails with the "Alarm" message when services go off-line on some servers

1.77.2.41 ( latest release from CVS ):
Seems to fix the Alarm problem, but when run from heartbeat it produced 
no log output in /var/log/ldirectord  ( I didn't spend much time 
investigating, but the behaviour was the same on both directors )

1.77.2.39 ( from CVS, the version that fixes the "race condition" problem ):
Seems to fix the Alarm problem and also produces log output as expected.

I've therefore left my setup running with ldirectord 1.77.2.39 ( I just 
copied the ldirectord file into /usr/sbin for now ) so I can monitor it 
over the next few days.  I've also written a simple script, run from cron 
on both directors every 5 minutes, that emails me if the 
"ipvs_syncmaster" process is running and /etc/ha.d/resource.d/ldirectord 
is not. That way I at least get to know about the problem before it bites 
me!
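
A check along these lines might look like the following Python sketch. It 
is an illustration only, not the actual script: the mail addresses, the 
SMTP host, the function names and the approach of scanning /proc are all 
assumptions, and it presumes a local mail server is listening on the 
director. It also assumes the sync thread shows up under the name 
"ipvs_syncmaster", as described above.

```python
import os
import smtplib
import socket
from email.message import EmailMessage

# All of these values are assumptions for illustration only.
ALERT_TO = "root@localhost"
ALERT_FROM = "root@localhost"
SMTP_HOST = "localhost"
SYNC_MASTER = "ipvs_syncmaster"
LDIRECTORD = "/etc/ha.d/resource.d/ldirectord"


def running_processes():
    """Return the name and command line of every process found in /proc."""
    entries = []
    try:
        pids = [p for p in os.listdir("/proc") if p.isdigit()]
    except OSError:
        return entries  # no /proc available (not a Linux host)
    for pid in pids:
        # "comm" holds the short name (kernel threads have an empty cmdline).
        for field in ("comm", "cmdline"):
            try:
                with open(os.path.join("/proc", pid, field), "rb") as fh:
                    raw = fh.read()
            except OSError:
                continue  # process exited while we were scanning
            text = raw.replace(b"\0", b" ").decode(errors="replace").strip()
            if text:
                entries.append(text)
    return entries


def should_alert(entries):
    """True when the sync master is running but ldirectord is not."""
    master_up = any(SYNC_MASTER in e for e in entries)
    ldirectord_up = any(LDIRECTORD in e for e in entries)
    return master_up and not ldirectord_up


def send_alert():
    """Mail a short warning through the local SMTP server."""
    msg = EmailMessage()
    msg["Subject"] = "ldirectord not running on " + socket.gethostname()
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(SYNC_MASTER + " is running but " + LDIRECTORD + " is not.")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)


def main():
    if should_alert(running_processes()):
        send_alert()


if __name__ == "__main__":
    main()
```

Run from cron every 5 minutes, this stays silent on the standby director 
( no sync master there ) and on a healthy active director, and only sends 
mail when the active director has lost ldirectord.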

Hope this is helpful to you.

Regards,

Peter.


