Re: [lvs-users] Dead servers not being removed from pool, ldirectord

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] Dead servers not being removed from pool, ldirectord
From: Michael Moody <michael@xxxxxx>
Date: Wed, 28 May 2008 19:20:21 -0600
The /etc/resolv.conf is identical.

The /etc/hosts is identical except this entry on the backup load balancer:

192.168.1.100   lvs1.bodybuilding.com lvs1

(I highly doubt that would have any bearing on it).
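
The reverse-DNS theory in the quoted message below can be tested directly on the realserver by timing a reverse lookup of the director's address. The following is only a minimal sketch in Python; the function name `time_reverse_lookup` is hypothetical, and the address in the usage comment is an assumption taken from the hosts entry above. Substitute whichever address the realserver actually sees the health checks coming from.

```python
import socket
import time


def time_reverse_lookup(ip):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip."""
    start = time.monotonic()
    try:
        # gethostbyaddr() goes through the system resolver (hosts file, then
        # DNS, depending on nsswitch.conf), as a typical daemon would.
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers socket.herror and socket.gaierror: no reverse entry found.
        hostname = None
    return hostname, time.monotonic() - start


# Example usage on the realserver (the address is an assumption):
#   name, secs = time_reverse_lookup("192.168.1.100")
#   print(name, round(secs, 3))
```

If the lookup takes several seconds, or takes longer than the check timeout configured in ldirectord, that would be consistent with the banner arriving too late.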

Any other suggestions?

Michael

Graeme Fowler wrote:
> On Wed, 2008-05-28 at 07:27 -0600, Michael S. Moody wrote:
>   
>> This happened again today, dead servers were not being removed. I had to
>> stop heartbeat, and allow the resources to transfer to the second load
>> balancer. Something is seriously wrong, but I don't know what it is. It
>> doesn't seem to happen on the second load balancer.
>>     
>
> Looking at your strace (which I'll trim; it is missing timestamps, so
> next time please use the "-tt" switch to get microsecond timing), I
> see the following:
>
> First a TCP stream socket is created, and it is given file
> descriptor 22:
>
>   
>> socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 22
>> ioctl(22, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffd25ea8c0) = -1 EINVAL (Invalid argument)
>> lseek(22, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
>> ioctl(22, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffd25ea8c0) = -1 EINVAL (Invalid argument)
>> lseek(22, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
>>     
>
> The arg/seek errors there are fine, so ignore them. Now it
> sets/gets/sets flags:
>
>   
>> fcntl(22, F_SETFD, FD_CLOEXEC)          = 0
>> fcntl(22, F_GETFL)                      = 0x2 (flags O_RDWR)
>> fcntl(22, F_SETFL, O_RDWR|O_NONBLOCK)   = 0
>>     
>
> ...and now we connect to your realserver:
>
>   
>> connect(22, {sa_family=AF_INET, sin_port=htons(21), sin_addr=inet_addr("192.168.1.195")}, 16) = -1 EINPROGRESS (Operation now in progress)
>>     
>
> ...and here select() reports FD 22 as writable, which for a
> non-blocking connect means the connection attempt has completed:
>
>   
>> select(24, NULL, [22], NULL, {0, 0})    = 1 (out [22], left {0, 0})
>>     
>
> ...and is now connected, so we get flags, set flags, and wait to read
> from it:
>
>   
>> connect(22, {sa_family=AF_INET, sin_port=htons(21), sin_addr=inet_addr("192.168.1.195")}, 16) = 0
>> fcntl(22, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
>> fcntl(22, F_SETFL, O_RDWR)              = 0
>>     
>
> The wait for readable data on FD 22 times out, i.e. nothing has
> arrived from the server:
>
>   
>> select(24, [22], NULL, NULL, {0, 0})    = 0 (Timeout)
>>     
>
> ...and FD 22 - the FTP connection - is closed.
>
>   
>> close(22)                               = 0
>>     
>
> Rinse, repeat, etc.
>
> The lack of timestamps is a bit of a blocker here, as there's no way to
> discern how long ldirectord is waiting before the timeouts occur.
>
> I'll suggest one thing, however: does the affected realserver have the
> exact same hosts file (with obvious differences if that isn't a complete
> oxymoron) and resolver configuration as the working one?
>
> It strikes me that the connection is timing out because the FTP daemon
> or xinetd, or some other wrapper, is trying to do a reverse DNS lookup
> of the calling IP and that's the part causing the timeout - if the
> daemon has to wait for a lookup to complete before returning the banner,
> perhaps ldirectord's timeout is less than that so it gives up and moves
> on?
>
> I think you've unearthed a config problem in your local setup, but it
> could be a bug. Let's go with making sure the realserver knows who
> everyone is first.
>
> Graeme
>
>
>
>   
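
The check sequence walked through in the quoted strace (non-blocking connect, wait for the socket to become writable, wait for the banner, close) can be sketched roughly as follows. This is an illustrative Python sketch, not ldirectord's actual code (ldirectord is written in Perl). The function name `banner_check` and both timeout values are assumptions, and unlike the trace it leaves the socket non-blocking after the connect completes.

```python
import errno
import select
import socket


def banner_check(host, port, connect_timeout=5.0, read_timeout=5.0):
    """Return the first bytes the service sends, or None if the check fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        # Non-blocking connect: 0 means connected, EINPROGRESS means pending.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS):
            return None
        # The socket becomes writable once the connect attempt has finished.
        _, writable, _ = select.select([], [sock], [], connect_timeout)
        if not writable:
            return None  # connect did not finish in time
        # SO_ERROR tells us whether the finished attempt actually succeeded.
        if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
            return None
        # Wait for the server to speak first (an FTP server sends a 220 line).
        readable, _, _ = select.select([sock], [], [], read_timeout)
        if not readable:
            return None  # banner did not arrive in time
        return sock.recv(1024) or None
    except OSError:
        return None
    finally:
        sock.close()
```

A server that accepts the connection but delays its banner (for example while waiting on a reverse DNS lookup) fails this kind of check whenever the delay exceeds `read_timeout`, even though the TCP connect itself succeeds.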

-- 

Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Direct: (650) 265-4154
Web: http://www.GlobalSystemsConsulting.com

Engineering Support: support@xxxxxx
Billing Support: billing@xxxxxx
Customer Support Portal:  http://my.gsc.cc




