
To:	ratz@xxxxxxxxxxxx
Subject:	Re: VRRP & sync_instance & low-level NIC monitoring (Was: Re: Redirector project for FreeBSD)
Cc:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From:	Alexandre Cassen <Alexandre.Cassen@xxxxxxxxxx>
Date:	Sat, 30 Mar 2002 14:38:44 +0100

Hi ratz,

Sorry for not having read your excellent 5-page documentation (how the heck do you get such a nice PDF? Are the LaTeX sources for the user's guide available?). I'm trying to fully understand it now. I assume you are referring to chapter 4.3 'VRRP instance synchronization' of LVS-HA-using-VRRPv2.pdf.

The source of the user guide is somewhere on my company's SAN... I need to correct some English problems :)

Yes, chapter 4.3 briefly describes the current implementation of the "sync_instance" VRRP extension in keepalived. The current code for "sync_instance" is too buggy, for the reasons explained before :/ ... I am actively patching this part :)


I worked on a netlink fetcher for IF events last night.

In detail:
In an LVS-NAT environment using VRRP (LVS1 & LVS2), the realservers' default
GW is the eth1 IP of LVS1, and the virtual services are exposed to the
Internet on LVS1(eth0). All is working => all inbound/outbound traffic goes
through LVS1. But, but, but: if LVS1(eth1) fails, all associated IP addresses
become unavailable and are taken over by the VRRP BACKUP LVS2(eth1) =>
outbound traffic will go through LVS2(eth1) and inbound through LVS1(eth0)
=> this asymmetric routing will break the LVS environment.


Yes, I see now. Pretty damn hard, then.


Ok, we are in sync :), this kind of state machine is a little hard to explain :/

=> So in our VRRP software we must add a "sync_instance" capability to
protect against that (a VRRP extension for our specific needs). This
functionality relies on the axiom: if LVS1(eth1) fails, then LVS2(eth1)
takes over and LVS2(eth0) becomes MASTER (to own the IP addresses).
Now I get it, I hope. This is a 4-bit state diagram with the bits being: LVS1(eth0), LVS2(eth0), LVS1(eth1) and LVS2(eth1). And FAULT_STATE is the result of a test: either an MII beat failure, IFF_DOWN, routing changes or firewall rules. According to the state transition table, which I haven't seen yet (but I will draw it), you know what happens. Thank you Alexandre, my slow brain is starting to work now.

Exactly, you got it. Adding a "sync_instance" VRRP extension introduces a side effect in the state machine, resulting in protocol instability. The solution is to add a new state, FAULT_STATE, to work around the instability.
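To make the 4-bit state diagram concrete, here is a small C sketch (not keepalived code) that encodes one bit per VRRP instance, set for MASTER and clear for BACKUP, and classifies each of the 16 states. The bit order follows the listing above and the names L10/L20/L11/L21 follow the abbreviations used later in this thread; the classification rule is an assumption drawn from the LVS-NAT description earlier, namely that both MASTERs must sit on the same director.

```c
/* Hypothetical encoding: bit set = that VRRP instance is MASTER. */
enum {
    L10 = 1 << 0, /* LVS1(eth0) */
    L20 = 1 << 1, /* LVS2(eth0) */
    L11 = 1 << 2, /* LVS1(eth1) */
    L21 = 1 << 3  /* LVS2(eth1) */
};

enum lvs_state_class {
    STATE_SYMMETRIC,  /* both VIP sets owned by the same director */
    STATE_ASYMMETRIC, /* one MASTER per pair, but on different directors */
    STATE_INVALID     /* a VRRP pair has zero or two MASTERs */
};

/* True if exactly one of the two bits in 'pair' is set in 's'. */
static int one_master(unsigned s, unsigned pair)
{
    unsigned m = s & pair;
    return m != 0 && m != pair;
}

enum lvs_state_class classify(unsigned s)
{
    if (!one_master(s, L10 | L20) || !one_master(s, L11 | L21))
        return STATE_INVALID;
    /* For LVS-NAT the eth0 MASTER and the eth1 MASTER must be one box. */
    if ((s & L10) ? (s & L11) : (s & L21))
        return STATE_SYMMETRIC;
    return STATE_ASYMMETRIC;
}
```

Only two of the 16 states are symmetric (LVS1 owns both, or LVS2 owns both); L10 | L21 is the asymmetric "inbound via LVS1(eth0), outbound via LVS2(eth1)" case that breaks LVS-NAT.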

=> This is the need: if we now use this VRRP extension and are not able
to detect link state, we introduce the "noisy loop". This means:
In the initial state: LVS1(eth1) = MASTER, LVS1(eth0) = MASTER
                      LVS2(eth1) = BACKUP, LVS2(eth0) = BACKUP
                      LVS1 & LVS2 are using our "sync_instance" axiom
Now, for some reason, someone unplugs the wire on LVS2(eth1). So the VRRP
instance on LVS2(eth1) will time out and become MASTER. But the axiom will
force LVS2(eth0) to transition to MASTER, so LVS1(eth0) will become BACKUP.
The axiom will then force the symmetric transition on LVS1(eth1) (because
all is going nicely on LVS1(eth1)) => so LVS1(eth1) will become BACKUP.
Here is the loop => LVS1(eth1) will time out waiting for remote VRRP adverts,
since the wire is still unplugged! So it transitions to MASTER, then forces
LVS1(eth0) to MASTER and thus LVS2(eth0) to BACKUP, and finally LVS2(eth1) to
BACKUP....... and we have an infinite protocol loop :).... grrr...
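The noisy loop can be reproduced with a tiny simulation. This is a hypothetical model, not keepalived's implementation: each advert timeout is one event, the sync_instance axiom then flips the other three instances, and an optional FAULT_STATE check (standing in for the MII probe) keeps a node whose own eth1 link is down from taking over.

```c
/* Hypothetical model of the two directors; not keepalived code. */
enum vrrp_state { BACKUP, MASTER, FAULT };

struct cluster {
    enum vrrp_state st[2][2]; /* [0=LVS1, 1=LVS2][0=eth0, 1=eth1] */
    int eth1_link_up[2];      /* eth1 link as an MII probe would report it */
};

/* Process the first pending event. Returns 1 if a state changed. */
static int step(struct cluster *c, int use_fault_state)
{
    for (int n = 0; n < 2; n++) {
        int peer = 1 - n;

        if (c->st[n][1] != BACKUP)
            continue;
        /* eth1 adverts only arrive if both ends of the segment are up. */
        if (c->eth1_link_up[n] && c->eth1_link_up[peer] &&
            c->st[peer][1] == MASTER)
            continue;
        if (use_fault_state && !c->eth1_link_up[n]) {
            c->st[n][1] = FAULT; /* own link is dead: do not take over */
            return 1;
        }
        /* Advert timeout: become MASTER, then apply the sync axiom. */
        c->st[n][1] = MASTER;
        c->st[n][0] = MASTER;
        c->st[peer][0] = BACKUP;
        c->st[peer][1] = BACKUP;
        return 1;
    }
    return 0;
}

/* Run up to max_events events; returns how many state changes occurred. */
int run(struct cluster *c, int use_fault_state, int max_events)
{
    int changes = 0;

    while (changes < max_events && step(c, use_fault_state))
        changes++;
    return changes;
}
```

Starting from the initial state with LVS2(eth1) unplugged, run(&c, 0, 20) hits the 20-event cap because the state alternates forever, while run(&c, 1, 20) stops after one transition with LVS2(eth1) in FAULT and LVS1 still MASTER on both sides.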


For future references:

  MASTER          = M
  BACKUP          = B
  LVS1(eth0)      = L10
  LVS1(eth1)      = L11
  LVS2(eth0)      = L20
  LVS2(eth1)      = L21
  LVS1_Heartbeat1 = HB11
  LVS1_Heartbeat2 = HB12
  LVS2_Heartbeat1 = HB21
  LVS2_Heartbeat2 = HB22
  FAULT_STATE(x)  = FS(x)

OK.

Well, that's why the HA folks invented the concept of a heartbeat. And that's what you need to implement in your framework too. I think I understand your approach now, and I have to tell you that the heartbeat is crucial. Your software needs to be capable of sending adverts through physically independent heartbeat VRRP instances, two each for L1x and L2x. They do the STOMITH monitoring to avoid protocol loops like the one you're describing. I would go as far as replacing our HA solution with your framework, if you can provide FS interaction from the HBs to the L1x/L2x transitions, user-defined healthchecks and service reload on demand.
With your first approach, FS(x) is a function that uses the following set of inputs to derive its new state:
o MII beat state changes
o routing changes
o IFF_UP & IFF_RUNNING

Normal desired path:
[L1x=M|L2x=B] ---> FS(L1x,link failure) ---> [L1x=B|L2x=M]

Unwanted path:
[L1x=M|L2x=B] ---> FS(L2x,cable cut)    ---> [proto loop noise, L2x=M]
With my approach you have the interaction of the 3 inputs mentioned above plus the status information of the HBs, which are physically separated. This cuts out the unwanted path and leaves you with the desired failover/failback path, or state transition.
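A minimal C sketch of the two approaches, under stated assumptions: fault_state() is the first approach (FS(x) derived from MII beat, routing and IFF_UP & IFF_RUNNING), and on_advert_timeout() layers the heartbeat status on top. The struct fields and the rule for combining two heartbeat reports are hypothetical; in particular it assumes each heartbeat carries the peer's own FS(x) result, so that the normal desired path (a link failure on the peer) still fails over.

```c
#include <stdbool.h>

/* Hypothetical inputs to FS(x); field names are illustrative only. */
struct link_health {
    bool mii_link_ok;   /* MII beat: PHY reports link */
    bool iff_up;        /* IFF_UP set on the interface */
    bool iff_running;   /* IFF_RUNNING set on the interface */
    bool route_ok;      /* no routing change removed the VRRP path */
};

/* Assumed content of one heartbeat (HB) link's report about the peer. */
struct hb_report {
    bool reachable;     /* heard the peer on this physically separate link */
    bool peer_fault;    /* peer reports its own FS(x) as set */
};

enum vrrp_state { VRRP_BACKUP, VRRP_MASTER, VRRP_FAULT };

/* First approach: FS(x) is set if any local check fails. */
bool fault_state(const struct link_health *h)
{
    return !(h->mii_link_ok && h->iff_up && h->iff_running && h->route_ok);
}

/* Second approach (sketch): consult both HBs before taking over. */
enum vrrp_state on_advert_timeout(const struct link_health *local,
                                  const struct hb_report *hb1,
                                  const struct hb_report *hb2)
{
    if (fault_state(local))
        return VRRP_FAULT;  /* our own link is broken: never take over */

    bool peer_seen = hb1->reachable || hb2->reachable;
    bool peer_faulty = (hb1->reachable && hb1->peer_fault) ||
                       (hb2->reachable && hb2->peer_fault);

    if (peer_seen && !peer_faulty)
        return VRRP_BACKUP; /* peer is healthy: adverts were lost elsewhere */
    return VRRP_MASTER;     /* peer dead or faulty: fail over */
}
```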

wow ... very nice!... Just to be in sync with you => HB VRRP <=> MII probe + IFF_UP|RUNNING + routing update?

The only way to break this noisy loop is to introduce a low-level MII
checker for probing the physical state of the NIC.
Yes, and the other/additional (and for me preferred) way is to have HB VRRP instances with crossover cables. The probability of FS(HB1x) && FS(HB2x) && someone disconnecting the cable at L20, starting from the initial state, is so close to zero that you can safely assume it is zero. If this situation occurs, you're in the wrong job or should replace the responsible project leader.
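For the MII checker, the relevant bit is the Link Status bit (bit 2, mask 0x0004) of MII register 1, the Basic Mode Status Register defined in IEEE 802.3 clause 22. That bit latches low, so a poller typically reads the register twice. The sketch below only decodes two consecutive reads; how they are obtained (for example via the Linux SIOCGMIIREG ioctl) is left out, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register/bit values from IEEE 802.3 clause 22. */
#define MII_REG_BMSR     1      /* Basic Mode Status Register */
#define BMSR_LINK_STATUS 0x0004 /* bit 2: link up (latched low) */

struct mii_link_sample {
    bool up_now;                    /* link state right now */
    bool link_lost_since_last_read; /* link was down at some point */
};

/* 'first' and 'second' are two back-to-back reads of MII_REG_BMSR.
 * Because the link bit latches low, the first read reports any loss of
 * link since the previous read, and the second reflects the current
 * state. */
struct mii_link_sample mii_decode_bmsr(uint16_t first, uint16_t second)
{
    struct mii_link_sample s;

    s.up_now = (second & BMSR_LINK_STATUS) != 0;
    s.link_lost_since_last_read = (first & BMSR_LINK_STATUS) == 0;
    return s;
}
```

Keeping the latched value lets the checker notice a short cable glitch that had already recovered by the time of the poll.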

Ok.

Ugh, how is this possible? Do I understand you correctly that you would
like to put in a policy for handling FAULT state that every NIC driver
then must be able to handle?


no no :) my poor English again :) => This is just an extension of the VRRP
code that adds a new state to the state machine, driven by the NIC link
beat state => a VRRP instance in MASTER or BACKUP will transition to
FAULT_STATE if MII reports bad things... and stay stuck there until MII is OK.
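The sticky FAULT_STATE described here can be sketched as a transition function. This is an illustration, not keepalived's state machine: the event names are hypothetical, and leaving FAULT on MII recovery is assumed to re-enter the election as BACKUP, since the text only says the instance stays in FAULT until MII is OK.

```c
/* Hypothetical states and events for one VRRP instance. */
enum vrrp_state { ST_BACKUP, ST_MASTER, ST_FAULT };
enum vrrp_event {
    EV_ADVERT_TIMEOUT,     /* no advert from the MASTER in time */
    EV_HIGHER_PRIO_ADVERT, /* advert from a higher-priority router */
    EV_MII_DOWN,           /* MII reports link loss */
    EV_MII_UP              /* MII reports link OK again */
};

enum vrrp_state vrrp_next(enum vrrp_state cur, enum vrrp_event ev)
{
    if (ev == EV_MII_DOWN)
        return ST_FAULT;   /* from MASTER or BACKUP alike */
    if (cur == ST_FAULT)   /* sticky: ignore protocol events */
        return ev == EV_MII_UP ? ST_BACKUP : ST_FAULT;

    switch (ev) {
    case EV_ADVERT_TIMEOUT:
        return cur == ST_BACKUP ? ST_MASTER : cur;
    case EV_HIGHER_PRIO_ADVERT:
        return ST_BACKUP;
    default:
        return cur;        /* EV_MII_UP while healthy: nothing to do */
    }
}
```

Because FAULT ignores advert timeouts, the unplugged LVS2(eth1) of the noisy-loop example can no longer grab MASTER and trigger the sync axiom.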

I'd rather have HB instances as a pool of resources. This is NIC-independent and easy to implement.

Ok, so in your view, during VRRP bootstrap we register an HB thread (performing MII probe + IFF_..... + ...) and keep a global interface struct in sync with the NIC states? ... This is what I was thinking when I started coding.
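One way to picture the "global interface struct kept in sync with the NIC states" is a table of per-interface records that the HB thread refreshes on every polling pass. The sketch below shows only the update and lookup logic; the struct layout and function names are hypothetical, the probes themselves (MII read, interface flags) are assumed to happen elsewhere, and a real multi-threaded version would need a lock around the table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-interface record in the global table. */
struct iface {
    char name[16];  /* IFNAMSIZ is 16 on Linux */
    bool iff_ok;    /* IFF_UP && IFF_RUNNING at the last poll */
    bool mii_ok;    /* MII link status at the last poll */
    bool fault;     /* derived FAULT condition seen by VRRP instances */
};

/* Find an interface by name; returns NULL if it is not in the table. */
struct iface *iface_find(struct iface *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Store one poll result. Returns true when the FAULT condition flips,
 * which is when the HB thread would notify the VRRP instances bound
 * to this interface. */
bool iface_update(struct iface *ifp, bool iff_ok, bool mii_ok)
{
    bool fault = !(iff_ok && mii_ok);
    bool flipped = (fault != ifp->fault);

    ifp->iff_ok = iff_ok;
    ifp->mii_ok = mii_ok;
    ifp->fault = fault;
    return flipped;
}
```

Reporting only the flips keeps the VRRP side event-driven: an interface that stays down does not generate a new notification on every pass.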

Yes, we agree :) Two event conditions drive our VRRP FAULT_STATE:
1. IFF_UP & IFF_RUNNING
2. MII registers values.
Well, also routing changes or other stupid user-related interactions, such as packet filter rules that deny advert messages. Been there, done that. People with little knowledge of packet filtering and HA in particular tend to deploy such funny setups and afterwards blame the author of the software because it's not working.

Yes, agreed. I will start with the 2 checks and add the routing monitoring afterwards, since it will demand more work :)

Well, patching the kernel is not as bad as it sounds; LVS has done it
successfully for years now. No one is complaining, except maybe Joe and
me ;). That's why I would like to see Stefan's patch clean and
completely independent.


Yes, but we can handle it in userspace in our VRRP framework until it is
added to the stable kernel branches, to keep compatibility.


What do you think about the HB approach?

Is HB a crossover-cable protocol independent of VRRP (serial, Ethernet, ...)? I can understand that this is good for hard monitoring, because MII probing of the link is a deductive approach (if the link is down we cannot send adverts, but we can find states where the link is up and adverts still cannot be sent....). But if the user uses a SWITCH/HUB... the probability of a crash is very low... To me, introducing an extra protocol part for monitoring LXX sounds a little like a workaround. A protocol like VRRP can handle that natively... This is my current point of view on HB :) (but I am still open to discussion).

Yes, I have tested Garzik's ethtool and it does not work properly on most
MII-enabled NICs :/
He's working hard on getting them all to do what they are capable of. I remember a recent discussion between him and DaveM about the PITA of supporting all kinds of different NIC driver implementations regarding the state of ioctl support.
So I am starting with the Donald Becker code, which is generic and works.
Ok.
But the use of an MII-enabled NIC can be a requirement for running VRRP. If
people are warned, it can be acceptable.
It might.
Well, you've got Easter time now. Send your wife on some nice holiday
trip and start coding.
:) I really do not know if she will agree :))) : "Wait darling, this is for
MII registers....blabla :))))"
Oh, I thought she'd understand that :). Well, I hope you have managed to work around that problem. My girlfriend went to Montreux with her friend to have some fun and explore Switzerland, which gives me 2 free days to finally do all the boring non-IT-related tasks (taxes, government and military stuff).


=8-D

Give me his phone number and I'll talk to him.
:)
Seriously, I can also write him a letter. OLS is about 'what you see and hear is what you will get in the next 2 years in the Linux corner'. And if he doesn't agree, tell him I'll never watch Canal+ again (not that I'd ever watch TV, but you know ...)


:) yes will try

Yes :) not really easy for me. And if Jerome Etienne has not replied to my
email, I assume he will drop me... And I really don't like that kind of
situation...
I don't think this would happen. He's a serious programmer and an engineer. Engineers almost never work against each other but together to achieve a goal. The 1% of them who don't get that have no impact on the inventions of the other 99% in terms of direct productivity gain.


:) This is my opinion too.

c u later,
Alexandre
