In my LVS cluster I think I may have found a bug in ldirectord.
It seems that if a web server opens a TCP/IP connection, but does not
respond on that connection, ldirectord hangs waiting for a response and
never continues checking the other servers. I had checking set to
'negotiate' to detect that exact problem, but ldirectord just sits
there. A stack trace on ldirectord just says:
read(3,
and sits there. I'm not sure if it's just for http or https connections
or both. I'll have to investigate more, but I think it's just https
connections.
I noticed this when a real server died. The kernel was still
responding, pings worked, and TCP/IP connections opened, but nothing ever
came over those connections. ldirectord was hung and a dead server was
still in the list.
To me, this seems like a serious problem as I had a real server that was
dead but was not being removed from the list because ldirectord was
hanging and it seemed there was no timeout for that connection. It had
hung for a day before I noticed what was going on. Once I changed the
check type for https to simply 'connect' it started to run normally
again. Of course, the dead server was still dead, but ldirectord added
it to the list because it responded to a tcp/ip connection.
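
To illustrate the difference between the two check types, here is a
minimal Python sketch. It is not ldirectord's code (ldirectord is Perl),
and the function names connect_check and negotiate_check, the HEAD
request, and the 5-second default are my own assumptions. It speaks
plain HTTP only; an https check would also need the socket wrapped in
TLS. The point is that a 'negotiate'-style check only helps if the read
has a timeout, otherwise it blocks forever on a server that accepts
connections but never answers.

```python
import socket


def connect_check(host, port, timeout=5.0):
    """Pass if a TCP connection can be opened (a 'connect'-style check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def negotiate_check(host, port, expect=b"200", timeout=5.0):
    """Pass only if the server sends the expected reply within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The timeout given to create_connection stays set on the
            # socket, so sendall() and recv() below cannot block forever.
            sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            reply = sock.recv(1024)
            return expect in reply
    except OSError:
        # socket.timeout is a subclass of OSError, so a silent server
        # lands here and is reported as down.
        return False
```

With a server in the state I described (kernel accepts the connection,
application never replies), connect_check would report it as up, while
negotiate_check would report it as down once the timeout expires.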
Is this a problem in ldirectord itself, or in the Perl modules?
Ideas? Solutions?
Thanks.
Kelly
--
--------------------------------------------
-- Kelly Corbin
-- kcorbin@xxxxxxxxxxxxxx
--
-- On the web @ http://www.theiqgroup.com
-- The IQ Group, Inc.
-- 6740 Antioch Suite 110
-- Merriam, KS 66202
-- (913)-722-6700
-- Fax (913)722-7264
--------------------------------------------