Re: [lvs-users] [Keepalived-devel] Keepalived communication with kernel

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx, Lista KeepAlived <keepalived-devel@xxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] [Keepalived-devel] Keepalived communication with kernel failing after some time
From: Rodrigo Severo <rodrigo@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 9 Nov 2011 14:54:44 -0200
Hi,


First of all let me thank you for your help and attention and for pointing
me to the LVS users mailing list. I wasn't aware of it.


On Wed, Nov 9, 2011 at 12:58 PM, Graeme Fowler <graeme@xxxxxxxxxxx> wrote:

> [copying in the LVS users list]
>
> On Wed, 2011-11-09 at 12:04 -0200, Rodrigo Severo wrote:
> > I have been using keepalived for some years now.
> >
> > For some time now keepalived has started to fail when updating VS on
> > the kernel. This kind of thing happens after some time where
> > keepalived is working perfectly, i.e., failed servers being successfully
> > removed and returned servers successfully added to VSs. Just after
> > keepalived is started everything works fine. After some time it starts
> > to fail to update the VSs on the kernel.
> >
> > To make it work again I just have to restart keepalived.
> >
> > The error messages I get on these failures look like:
> >
> > [Keepalived_healthcheckers] IPVS: Invalid operation.  Possibly wrong
> > module version, address not unicast, ...
> >
> >
> > It's important to observe that the same exact operation that works
> > fine just after keepalived is started will fail with the above error
> > after some time (one or two hours) so the suggestions on the error
> > message - wrong module version, wrong kind of address - can be safely
> > discarded as causes of the problem.
> >
> > I'm using Gentoo with kernel 3.0.6 and keepalived 1.2.2.
> >
> > Any suggestions on how I can further debug this issue?
>
> Yes. Please grab the log lines which indicate keepalived starting, doing
> stuff to servers, then failing to do stuff to servers and send it to
> lvs-users@xxxxxxxxxxxxxxxxxxxxxxx. I think we need to see timing, the
> number of operations done and so on.
>

Here is an example: http://pastebin.com/uwzKKGXh

Please observe that all VS updates up to 11:51 worked fine. Both updates
after 14:14 failed with the above error message.

You will also see that there aren't many updates happening.
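
For anyone who wants to pull the same timing out of their own logs, here is a
minimal sketch in Python. It assumes syslog-style lines of the form
"Mon DD HH:MM:SS host Keepalived_healthcheckers: message", which may not match
every setup. The function name failure_timeline, the host name lb1 and the
non-error sample messages are made up for illustration; only the error text is
taken from the message quoted above. It reports when the healthchecker was
first seen in the log and when each "IPVS: Invalid operation" error occurred.

```python
import re
from datetime import datetime

# Assumed syslog-style layout; adjust the pattern if your logs differ.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"Keepalived_healthcheckers(?:\[\d+\])?:\s+(?P<msg>.*)$"
)
ERROR_MARK = "IPVS: Invalid operation"


def parse_ts(text, year=2011):
    # Syslog timestamps carry no year, so the caller supplies one.
    # Month names are parsed with %b, which assumes an English/C locale.
    normalized = " ".join(text.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def failure_timeline(lines, year=2011):
    """Return (first_seen, error_times) for healthchecker log lines."""
    first_seen = None
    error_times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a healthchecker line in the assumed format
        ts = parse_ts(match.group("ts"), year)
        if first_seen is None:
            first_seen = ts
        if ERROR_MARK in match.group("msg"):
            error_times.append(ts)
    return first_seen, error_times


# Hypothetical sample lines, not copied from the real log.
SAMPLE = [
    "Nov  9 09:30:00 lb1 Keepalived_healthcheckers: hypothetical update that worked",
    "Nov  9 11:51:10 lb1 Keepalived_healthcheckers: hypothetical update that worked",
    "Nov  9 14:14:05 lb1 Keepalived_healthcheckers: IPVS: Invalid operation.",
]

first_seen, error_times = failure_timeline(SAMPLE)
for ts in error_times:
    print("error at", ts, "after", ts - first_seen)
```

The elapsed time between the first healthchecker line and the first error is
the number that matters for the "works for a while, then stops" pattern.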


> Your kernel is "out there" some way ahead of large numbers of the rest
> of the world who lag behind on the 2.6.x branch. I suspect something
> isn't quite right in the IPVS code in 3.0.x but I couldn't say what it
> is.
>

If you believe the kernel version might be to blame, I can try an older
one.

Do you have a version to suggest for testing? Versions 2.6.39, 2.6.38 and
2.6.32 are especially easy for me to test, but I can test any other version
you believe is important.

I forgot to mention in my first message what I believe is causing the
problem: some kind of timeout on the socket used by keepalived to
communicate with the kernel. I don't have any particular information pointing
to this except the fact that everything works for some time after keepalived
is started and then stops working. Unfortunately, I don't know how
I would test this hypothesis.



-- 
---------------------------------------------------------------------------------------
Rodrigo Severo

Fábrica de Idéias
SBS Quadra 2 - Bloco S - Ed. Empire Center - Sala 1.301
Brasília - DF - CEP 70070-904
Tel. (61) 3321-1357       Fax (61) 3223-1712
---------------------------------------------------------------------------------------
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
