Tobias Klausmann wrote:
> Hi!
>
> On Thu, 05 Jul 2007, Gerry Reno wrote:
>
>> Tobias Klausmann wrote:
>>
>>> We're currently using keepalived and vanilla 2.6 kernels (which
>>> already have LVS, so no patching needed). We're also looking into
>>> ldirectord since keepalived has given us some trouble.
>>>
>>>
>> Tobias,
>> Are you still having the same catatonic problem? Or is this something new?
>>
>
> It's similar, yet different.
>
> First, it seems it's no longer triggered by config reloads but
> "just happens". Also, it happens very infrequently, maybe once a
> month, probably even less often - that is, over the five[0]
> productive and one test LBs, so statistically, it probably
> happens once or twice a year on a single LB.
>
Infrequent, spurious problems are tough.
> [0] We have 10+1 servers, five pairs with one productive, one
> standby plus one testing server. The way we switch things, a
> catatonic test server will pretty much go unnoticed.
>
> As such, it's pretty much impossible to reproduce. The symptoms
> are slightly different, too: keepalived *looks* okay, but it just
> doesn't see when a server disappears. Also, it eventually starts
> ignoring HUP completely. It's not completely frozen though: it
> keeps doing checks.
>
How do you detect the condition? Are you monitoring keepalived somehow?
What actions are necessary to recover?
> Another odd thing I've witnessed: if you tell keepalived to bind
> to an IP (for the checks) that isn't configured, it will complain
> a bit but still continue trying - and leaving everything
> in service. I think it should either complain more loudly or take
> everything out of service as not being able to check is about the
> same as everything being down.
>
Have you discussed this with the keepalived team?
> Regards,
> Tobias
>
>