Re: VRRP & sync_instance & low-level NIC monitoring (Was: Re: Redirecto

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: VRRP & sync_instance & low-level NIC monitoring (Was: Re: Redirector project for FreeBSD)
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Sat, 30 Mar 2002 15:43:55 +0100
Hello Alexandre,

The source of the userguide is somewhere on my company SAN... I need to correct some English problems :)

I agree :) I could proofread it for you if you want me to. Also I thought about giving a speech about LVS/HA/keepalived at the next Swiss Linux developer conference. You're invited too of course.

I worked last night on a netlink fetcher for IF events.

Cool, does it work?

OK, we are in sync :). This kind of state machine is a little hard to explain :/

Yes, but I haven't really read your documentation thoroughly to that point so it is not your fault. I should be reading the documentation.

Now I get it, I hope. This is a 4-bit state diagram with the bits being: LVS1(eth0), LVS2(eth0), LVS1(eth1) and LVS2(eth1). And FAULT_STATE is a result of a test, either MII beat failure or IFF_DOWN or routing changes or fwrules. According to the state transition table which I haven't seen yet (but I will draw) you know what happens. Thank you Alexandre, my slow brain starts working now.
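
A minimal sketch of the 4-bit encoding described above, written in Python for brevity (keepalived itself is C). The bit order and the names `BITS`, `encode`, `decode` and `faulty` are assumptions made for illustration only; the actual state transition table is not shown in this thread.

```python
# Hypothetical bit order: one bit per monitored interface, 1 = healthy.
BITS = ("LVS1(eth0)", "LVS2(eth0)", "LVS1(eth1)", "LVS2(eth1)")


def encode(health):
    """Pack a dict {interface name: bool} into a 4-bit integer."""
    state = 0
    for i, name in enumerate(BITS):
        if health[name]:
            state |= 1 << i
    return state


def decode(state):
    """Unpack a 4-bit integer back into {interface name: bool}."""
    return {name: bool((state >> i) & 1) for i, name in enumerate(BITS)}


def faulty(state):
    """List the interfaces whose bit is cleared, i.e. whose test failed."""
    return [name for i, name in enumerate(BITS) if not (state >> i) & 1]
```

With four bits there are 16 possible states, so a full transition table stays small enough to draw by hand.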

Exactly, you got it. Adding a "sync_instance" VRRP extension introduces a side effect in the state machine, resulting in protocol instability. The solution is to add a new state, FAULT_STATE, to work around the instability.
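
One possible reading of how FAULT_STATE could stabilise a group of synchronised instances, as a hedged Python sketch. The function `sync_group_step`, the `State` enum and the rule "leave FAULT via BACKUP" are assumptions for illustration, not the keepalived implementation.

```python
from enum import Enum


class State(Enum):
    MASTER = "MASTER"
    BACKUP = "BACKUP"
    FAULT = "FAULT"


def sync_group_step(states, link_ok):
    """Advance all instances of one sync group by one step.

    states:  dict {instance name: State}
    link_ok: dict {instance name: bool}, result of the NIC tests
    """
    if not all(link_ok.values()):
        # One failed test puts the whole group into FAULT, so the
        # instances cannot disagree about who is master.
        return {name: State.FAULT for name in states}
    # Everything healthy again: leave FAULT through BACKUP (assumption)
    # and let the normal VRRP election decide the master.
    return {
        name: State.BACKUP if s is State.FAULT else s
        for name, s in states.items()
    }
```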

I'm glad my brain still works.

Well, that's why the HA folks invented the concept of a heartbeat. And that's what you need to implement in your framework too. I think I understand your approach now, and I have to tell you that the heartbeat is crucial. Your software needs to be capable of sending adverts through physically independent heartbeat VRRP instances, two each for L1x and L2x. They do the STOMITH monitoring to avoid the protocol loops you're mentioning. I would go as far as to exchange our HA solution for your framework if you can provide FS interaction from the HBs to the L1x/L2x transition, user-defined healthchecks and service reload on demand.

With your first approach, FS(x) is a function which has the following set of interactions to derive its new state:

o MII beat state changes
o routing changes

Normal desired path:
[L1x=M|L2x=B] ---> FS(L1x,link failure) ---> [L1x=B|L2x=M]

Unwanted path:
[L1x=M|L2x=B] ---> FS(L2x,cable cut)    ---> [proto loop noise, L2x=M]

With my approach you have interaction of the above mentioned 3 plus the status information of the HBs which are physically separated. This cuts out the unwanted path and leaves you with the desired failover/failback path or state transition.
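
A hedged sketch of how HB status could cut out the unwanted path, seen from the backup node. The function name `backup_next_state` and its three boolean inputs are hypothetical; the real decision would follow the state table discussed in this thread, which is not shown here.

```python
def backup_next_state(adverts_on_service_link, hb_alive, hb_peer_fault):
    """Decide the next state of a BACKUP node (illustrative only).

    adverts_on_service_link: master adverts still arrive on L1x/L2x
    hb_alive:                the peer still answers on the heartbeat link
    hb_peer_fault:           the peer reports FAULT over the heartbeat
                             (only meaningful when hb_alive is True)
    """
    if adverts_on_service_link:
        return "BACKUP"  # master is visible, nothing to do
    if hb_alive and not hb_peer_fault:
        # The master is healthy according to the HB, so the silence is
        # our own problem (e.g. a cable cut on L2x): do not take over.
        return "FAULT"
    # The master reported a fault or is gone entirely: take over.
    return "MASTER"
```

Without the two heartbeat inputs, the cable-cut case and the master-failure case look identical to the backup, which is where the protocol loop noise comes from.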

Wow, very nice! Just to be in sync with you: HB VRRP <=> MII probe + IFF_UP|RUNNING + routing update?

Yes, exactly. Only the MII probe should be optional, since it is the only check that is not generally applicable. I'm very happy that you like my framework and I hope that we don't reinvent SGI failover. Lars would know. But as far as I can remember, SGI's failover was more about application monitoring and clustering than network failover/failback.

I'd rather have HB instances as a pool of resources. This is NIC-independent and easy to implement.
OK, so for you, during VRRP bootstrap we register a HB thread (performing MII probe + IFF_..... + ...) and keep a global interface struct in sync with the NIC states? This is what I was thinking when I started coding.

Yes, just make the MII probe optional. The rest provides enough HA. I've been working and coding in the HA environment for years and I have never had anything wicked happen to the HBs. MII beat information is not needed for the HBs, but it is a nice feature to have. Make it configurable. This makes the state machine a little more complex. I can try to draw it for you if you want.
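
A small sketch of an interface check with a configurable MII probe. The flag values are the Linux `IFF_UP` and `IFF_RUNNING` constants from `<linux/if.h>`; the function `interface_ok` and its `use_mii` switch are hypothetical names, and how the flags and the MII status are actually fetched (netlink, ioctl) is left out.

```python
IFF_UP = 0x1        # interface is administratively up
IFF_RUNNING = 0x40  # driver reports resources allocated / carrier


def interface_ok(flags, mii_link_up=None, use_mii=False):
    """Return True if the interface passes the configured checks.

    flags:       interface flags as an integer bitmask
    mii_link_up: result of an MII probe, ignored unless use_mii is set
    use_mii:     configuration switch making the MII probe optional
    """
    wanted = IFF_UP | IFF_RUNNING
    flags_ok = (flags & wanted) == wanted
    if use_mii:
        return flags_ok and bool(mii_link_up)
    return flags_ok
```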

Yes agreed. Will start first with the 2 checks and will add the routing monitoring after since it will demand more work :)

A working keepalived with HB VRRP threads would be a very nice start.

Is HB a crosscable, protocol-independent VRRP (serial, ethernet, ...)? I can understand this is good for hard monitoring because MII

Take it as a crosscable approach. Hardware vendors tend to use serial heartbeats too, and while I agree this is a nice feature (protocol independence in the kernel), we don't want to complicate our lives with serial protocol implementations.

probe & the link is a deductive approach (if the link is down we cannot send adverts, but we can find states where the link is up and adverts cannot be sent...). But if the user uses a SWITCH/HUB, the probability of a crash is very low. For me, introducing an extra protocol part for monitoring LXX sounds a little like a workaround. A protocol like VRRP can handle this natively. This is my current point of view on HB :) (but still open to discussion).

No hubs/switches for HBs. Such devices try to be more intelligent than we need them to be for a simple thing like HB functionality. You make your software behave as follows (something like that, a lot is still missing but I need to go cooking):

if (HB) {
  pool_vs=create VRRP threads with L10/L20=M and L11/L21=B;
  if (poll(FS(M/pool_vs))==bad || poll(FS(B/pool_vs))==bad) {
    check FS(HB) and work according to the state table;
    send advert over HB;
  }
  if (more_than_1_HB) {
    pool_hb=create VRRP threads with HB11/HB21=M and HB12/HB22=B;
    send adverts over M;
    if (poll(FS(M/pool_hb))==bad || poll(FS(B/pool_hb))==bad) {
       handle state transition of pool_hb;
       // state table for pool_hb is a lot different than the one for
       // pool_vs, since you're allowed to have asymmetric routing.
    }
  } else {
    create VRRP thread with HB11/HB21=M;
    send adverts over M;
  }
} else {
  current VRRP/keepalived implementation;
}

Best regards,
Roberto Nibali, ratz
