Hey everybody,
I'm trying to use LVS and ldirectord to balance ssh access to the
interactive nodes on an HPC cluster (see
http://marylou.byu.edu/m4/marylou4.htm), but I've run into an issue.

Since there's no dedicated check module for ssh, I'm using
checktype=connect to check whether the daemon is running on the
realservers. That seems to work when the daemon is not responding on the
socket (daemon crash, kernel panic, etc.) or when the interface on the
realserver is down.

The real problem comes when the system is functioning correctly and the
interface is up, but I've deliberately shut down the daemon. In that
situation the socket connect attempt fails, but it returns immediately
instead of timing out. The code in the check_connect subroutine seems to
assume that a failed attempt will always time out, so an immediate
failure isn't caught.

I've prepared a small patch and attached it, but I'd like more input,
especially since I'm not very good at Perl. If anyone wants to clean up
my modifications, feel free.
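
For anyone who wants to reproduce the behavior outside of ldirectord,
here is a minimal standalone sketch of the same idea. It is not the
actual ldirectord code: connect_check is a made-up name, and it uses
IO::Socket::INET directly instead of ld_open_socket. The point is only
that a refused connection comes back as undef right away, so it has to
be turned into a failure explicitly rather than left to the alarm.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical stand-in for a connect-style health check.
# Returns 1 if a TCP connection could be opened within $timeout
# seconds, 0 if the attempt timed out or failed outright.
sub connect_check {
    my ($host, $port, $timeout) = @_;

    my $ok = eval {
        # Turn the alarm into an exception so a hung connect is caught.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;

        my $sock = IO::Socket::INET->new(
            PeerAddr => $host,
            PeerPort => $port,
            Proto    => 'tcp',
        );

        alarm 0;    # Cancel the alarm; the attempt has finished either way.

        # An immediate failure (e.g. nothing listening on the port)
        # never triggers the alarm, so raise the error ourselves.
        die "connect failed\n" unless $sock;

        close($sock);
        1;
    };

    alarm 0;        # Make sure no alarm is left pending after the eval.
    return $ok ? 1 : 0;
}
```

Calling connect_check('127.0.0.1', 22, 5) against a host where sshd has
been stopped should return 0 almost immediately rather than after the
five second timeout.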
Thanks,
Lloyd Brown
BYU Supercomputing
http://marylou.byu.edu
--- original/ldirectord 2005-09-20 08:54:00.000000000 -0600
+++ patched/ldirectord 2005-09-20 09:54:14.000000000 -0600
@@ -2000,6 +2000,8 @@
 my $sock = &ld_open_socket($$r{server}, $port, $$v{protocol});
 if ($sock) {
 close($sock);
+ } else {
+ die(); #socket attempt failed immediately (not timeout)
 }
 &ld_debug(3, "Connected to $1 (port $port)");
 alarm 0; # Cancel the alarm