
Mystery director deaths

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Mystery director deaths
From: Bruce Richardson <itsbruce@xxxxxxxxxxx>
Date: Wed, 4 Jan 2006 01:51:01 +0000
I have a legacy ultramonkey configuration in a production environment
that is causing bizarre problems.  It consists of 2 IBM servers running
Debian Sarge with a custom-compiled 2.6.6 kernel, with both servers
running both the syncmaster and syncbackup processes.  Unfortunately,
the person who set this up didn't leave a source deb or any notes about
what they did.  There are also slight version differences between some
of the components on the two boxes (I know, it's a mess, I didn't create
it) due to only one of the boxes having had the ultramonkey repository
in sources.list.

This pair has been used with one of them as a primary and the other only
ever briefly taking charge.  It seems (this is a set-up that I
inherited) that the primary was failing every 3 or 4 months.  The
secondary would then fail if left in master mode for more than a week.

To try and fix this mess, I spun up two vanilla Debian Sarge boxes with
the latest ldirectord and heartbeat packages.  When I used one of them to
replace the secondary, it died only a few minutes after the primary
failed over to it.  It then died again shortly afterwards even on
standby.

When I say "die", I mean complete and immediate freeze with no
indications in the logs and a frozen screen (if a console is connected
at the time).  Absolutely no indication of what might be the cause.

I have similar director-pairs in other environments that cause no such
problems.  There are three main differences between those systems and
this pair:  the healthy systems use

        1.  Stock Debian 2.6.8 kernels and packages.
        2.  IPaddr2 rather than IPaddr
        3.  Connection syncing only in master->slave mode (as opposed to
        master->master) or simply not at all.

My feeling with this is that the connection tracking/syncing is at the
root of the problem, possibly the fact that it is doing master->master.
The very speedy death of the vanilla Sarge box that I tried to put in as
a secondary tends to reinforce this in my mind.

Can anybody offer any thoughts?

-- 
Bruce

The ice-caps are melting, tra-la-la-la.  All the world is drowning,
tra-la-la-la-la.  -- Tiny Tim.
