Leon Keijser <errtu@xxxxxxx> wrote:
> Horms,
>
>>ldirectord -d ...
>>
>>If you are having problems with ldirectord crashing, then that is a
>>problem.
>>I'd be very welcome of any information you can provide to help track
>>down the cause.
>
> I've ran ldirectord in debug mode (from command line), on the test-lvs setup
> and found this when i looked this morning:
>
> DEBUG3: Activated service 192.168.50.17:3389
> DEBUG2: Checking connect: real
> server=connect:tcp:192.168.50.18:3389:3389:1:\/:
> (virtual=tcp:192.168.51.200:3389)
> DEBUG3: Connected to (port 3389)
> DEBUG2: Enabled server=192.168.50.18
> DEBUG3: Activated service 192.168.50.18:3389
> DEBUG2: Checking connect: real
> server=connect:tcp:192.168.50.121:1494:1494:1:\/:
> (virtual=tcp:192.168.51.201:1494)
> DEBUG3: Connected to (port 1494)
> DEBUG2: Enabled server=192.168.50.121
> DEBUG3: Activated service 192.168.50.121:1494
> DEBUG2: Checking connect: real
> server=connect:tcp:192.168.50.122:1494:1494:1:\/:
> (virtual=tcp:192.168.51.201:1494)
> DEBUG2: Disabled server=192.168.50.122
> DEBUG3: Deactivated service 192.168.50.122:1494: Died at
> /usr/sbin/ldirectord line 2043.
> DEBUG2: Checking negotiate: real
> server=negotiate:none:tcp:127.0.0.1:22::1:\/:
> (virtual=tcp:192.168.51.203:22)
> DEBUG2: Checking none
> DEBUG2: Enabled server=127.0.0.1
> /etc/init.d/ldirectord: line 59: 21478 Alarm clock $@
> rpzlvstest01 root #
>
>
> Now yesterday i've seen this line as well:
>
> DEBUG3: Deactivated service 192.168.50.122:1494: Died at
> /usr/sbin/ldirectord line 2043.
>
> But sometimes this 'died' is replaced by 'Timeout alarm' and a different
> line number. I dunno but the word 'died' kinda worries me. But since
> ldirectord didn't in fact die, but kept running happily all day long, i let
> it go.

The "died" is an artifact of the way this code is implemented in
Perl. It's actually an eval block (think of it as a child process) that
died, not ldirectord itself, and that is expected. The patch below should
make these messages a bit clearer, but in the meantime, here is what they
mean:

  Timeout Alarm: The check timed out.
  Died:          The check failed for some other reason, probably
                 because the real-server is there but the port is closed.

In both cases the real-server is considered to be offline and
is deactivated if it was considered to be online.

In the patch, "Timeout Alarm" becomes "Timeout"
and "Died" becomes "Failed to Open Socket".

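
To illustrate where the two messages come from, here is a minimal
standalone sketch of the same eval/alarm/die pattern. It is not code
from ldirectord: run_check and the callbacks passed to it are
hypothetical names made up for this example, and the message strings
are simply the ones used in the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of ldirectord: runs $check under a
# timeout and returns the failure reason, or '' on success.
sub run_check {
    my ($timeout, $check) = @_;
    my $ok = eval {
        local $SIG{'__DIE__'} = "DEFAULT";
        # If the alarm fires, leave the eval with the reason "Timeout".
        local $SIG{'ALRM'} = sub { die "Timeout\n" };
        alarm $timeout;
        # A check that returns false failed immediately (not a timeout).
        $check->() or die "Failed to Open Socket\n";
        alarm 0;    # Cancel the alarm
        1;
    };
    my $reason = $ok ? '' : $@;   # $@ holds the message passed to die
    alarm 0;    # also cancel an alarm left pending by a failed check
    chomp $reason;
    return $reason;
}

print "success: '", run_check(2, sub { 1 }), "'\n";
print "closed:  '", run_check(2, sub { 0 }), "'\n";
print "hang:    '", run_check(1, sub { sleep 5; 1 }), "'\n";
```

The first call returns an empty string, the second returns "Failed to
Open Socket", and the third returns "Timeout" after one second. In every
case only the eval is left early and the script itself keeps running.
The sketch also cancels the alarm after the eval, so that a pending
alarm cannot fire once the local ALRM handler is gone.
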
> Starting the ldirectord again the next morning restores the last server
> (50.17). Right now i made a little script to check if ldirectord is running
> at 4:00am (1 hour after the last scheduled server reboot, and 1 hour after
> ldirectord apparently stopped/crashed/died).
Thanks, let me know how it goes.
--
Horms
Index: ldirectord
===================================================================
RCS file: /home/cvs/linux-ha/linux-ha/ldirectord/ldirectord,v
retrieving revision 1.128
diff -u -r1.128 ldirectord
--- ldirectord 1 Dec 2005 00:40:09 -0000 1.128
+++ ldirectord 2 Dec 2005 08:07:59 -0000
@@ -2033,14 +2033,15 @@
         eval {
                 local $SIG{'__DIE__'} = "DEFAULT";
-                local $SIG{'ALRM'} = sub { die "Timeout Alarm" };
+                local $SIG{'ALRM'} = sub { die "Timeout" };
                 &ld_debug(4, "Timeout is $$v{checktimeout}");
                 alarm $$v{checktimeout};
                 my $sock = &ld_open_socket($$r{server}, $port, $$v{protocol});
                 if ($sock) {
                         close($sock);
                 } else {
-                        die(); #socket attempt failed immediately (not timeout)
+                        #socket attempt failed immediately (not timeout)
+                        die("Failed to Open Socket");
                 }
                 &ld_debug(3, "Connected to $1 (port $port)");
                 alarm 0; # Cancel the alarm