Re: Mystery director deaths

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Mystery director deaths
From: Andrei Taranchenko <andrei@xxxxxxxxxxxxx>
Date: Thu, 05 Jan 2006 14:33:23 -0500
Did you monitor swap usage? Is it getting out of control? If so, are you
using a modified ldirectord to monitor a custom service protocol?
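
Since the boxes freeze without leaving anything in the logs, one way to
check swap is to record it to disk at short intervals and look at the last
entries after the next freeze. The following is only a rough sketch, not
something taken from the affected systems: it assumes the standard Linux
/proc/meminfo format, and the log path and 10-second interval are
placeholders I made up.

```python
import os
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap in use, in kB, computed as SwapTotal minus SwapFree."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def log_swap(path="/var/tmp/swap-usage.log", interval=10):
    """Append one line per interval; path and interval are hypothetical."""
    with open(path, "a") as log:
        while True:
            with open("/proc/meminfo") as f:
                info = parse_meminfo(f.read())
            log.write("%s swap_used_kb=%d mem_free_kb=%d\n" % (
                time.strftime("%Y-%m-%d %H:%M:%S"),
                swap_used_kb(info),
                info.get("MemFree", 0),
            ))
            # Force the line to disk so it survives a hard freeze.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__":
    log_swap()
```

If swap use climbs steadily in the minutes before a freeze, that points
towards a memory leak rather than the sync daemons.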

On Tue, 2006-01-03 at 20:51, Bruce Richardson wrote:
> I have a legacy ultramonkey configuration in a production environment
> that is causing bizarre problems.  2 IBM servers running Debian Sarge
> with a 2.6 kernel (custom compiled 2.6.6 kernel), with both servers
> running both the syncmaster and syncbackup processes.  Unfortunately,
> the person who set this up didn't leave a source deb or any notes about
> what they did.  There are also slight version differences between some
> of the components on the two boxes (I know, it's a mess, I didn't create
> it) due to only one of the boxes having had the ultramonkey repository
> in sources.list.
> 
> This pair has been used with one of them as a primary and the other only
> ever briefly taking charge.  It seems (this is a set-up that I
> inherited) that the primary was failing every 3 or 4 months.  The
> secondary would then fail if left in master mode for more than a week.
> 
> To try and fix this mess, I spun up two vanilla Debian Sarge boxes with
> the latest ldirectord and heartbeat packages.  When I used one of them to
> replace the secondary, it died only a few minutes after the primary
> failed over to it.  It then died again shortly afterwards even on
> standby.
> 
> When I say "die", I mean complete and immediate freeze with no
> indications in the logs and a frozen screen (if a console is connected
> at the time).  Absolutely no indication of what might be the cause.
> 
> I have similar director-pairs in other environments that cause no such
> problems.  There are three main differences between those systems and
> this pair:  the healthy systems use
> 
>       1.  Stock Debian 2.6.8 kernels and packages.
>       2.  IPaddr2 rather than IPaddr
>       3.  Connection syncing only in master->slave mode (as opposed to
>       master->master) or simply not at all.
> 
> My feeling with this is that the connection tracking/syncing is at the
> root of the problem, possibly the fact that it is doing master->master.
> The very speedy death of the vanilla Sarge box that I tried to put in as
> a secondary tends to reinforce this in my mind.
> 
> Can anybody offer any thoughts?
-- 
Andrei Taranchenko <andrei@xxxxxxxxxxxxx>
TowerData

