
ldirectord checktype=connect false positive

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: ldirectord checktype=connect false positive
From: Lloyd Brown <somewhere_or_other@xxxxxxx>
Date: Tue, 20 Sep 2005 10:03:01 -0600
Hey everybody,

I'm trying to use LVS and ldirectord to balance ssh access to interactive nodes on an HPC cluster (see ). I have an issue, though. Since there's no special module to check ssh, I'm just using checktype=connect to check whether the daemon is running on the realservers.

This seems to work if the daemon is not responding on the socket (daemon crash, kernel panic, etc.) or if the interface on the realserver is down. The real problem comes when the system is functioning correctly and the interface is up, but I've deliberately shut down the daemon. In this situation, the socket connect attempt fails, but it returns immediately instead of timing out. The code in the check_connect subroutine seems to assume that a failed connect will always time out, so, as far as I can tell, the immediate failure goes unnoticed and the realserver is still treated as up.

I've prepared a small patch and am attaching it, but I wanted to ask for more input, especially since I'm not very good at Perl. If anyone wants to clean up my modifications, feel free.

Lloyd Brown
BYU Supercomputing
--- original/ldirectord 2005-09-20 08:54:00.000000000 -0600
+++ patched/ldirectord  2005-09-20 09:54:14.000000000 -0600
@@ -2000,6 +2000,8 @@
                my $sock = &ld_open_socket($$r{server}, $port, $$v{protocol});
                if ($sock) {
                        close($sock);
+               } else {
+                       die(); #socket attempt failed immediately (not timeout)
                }
                &ld_debug(3, "Connected to $1 (port $port)");
                alarm 0; # Cancel the alarm
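
For illustration, here is a minimal stand-alone sketch of the same idea outside ldirectord. It is not the ldirectord code: the subroutine name check_connect_sketch and its arguments are made up for this example, and it uses IO::Socket::INET directly rather than ld_open_socket. It treats both a timeout and an immediate failure (such as a refused connection) as "down".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper, not part of ldirectord.
# Returns 1 if a TCP connection to $host:$port succeeds within $timeout
# seconds; returns 0 on a timeout or on an immediate failure.
sub check_connect_sketch {
    my ($host, $port, $timeout) = @_;

    my $ok = eval {
        local $SIG{ALRM} = sub { die "Timeout Alarm\n" };
        alarm $timeout;
        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
        );
        alarm 0;    # Cancel the alarm whether or not the connect worked
        # A refused connection returns undef right away instead of
        # waiting for the alarm, so it has to be treated as a failure here.
        die "Socket connect failed\n" unless $sock;
        close($sock);
        1;
    };
    alarm 0;        # Make sure no alarm is left pending
    return $ok ? 1 : 0;
}
```

With something like check_connect_sketch('127.0.0.1', 22, 5) on a machine where sshd has been stopped, the call should return 0 almost immediately rather than after the 5-second timeout, assuming nothing (a firewall, for example) silently drops the connection attempt.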
