Re: MAC address on dummy0

To: Alexandre Cassen <Alexandre.Cassen@xxxxxxxxxx>
Subject: Re: MAC address on dummy0
Cc: "lvs-users@xxxxxxxxxxxxxxxxxxxxxx" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
From: Roberto Nibali <ratz@xxxxxx>
Date: Fri, 24 Aug 2001 16:55:03 +0200
Hello Alexandre,

> nice to hear about you and no pbs (you're not being a pain, I understand,
> it's the same for me, I'm overloaded)

thanks, I feel better now :)
 
> > > I have played with 3COM card to play with 3 differents MAC but it is
> > > extrimly hardware dependent... :/
> >
> >Yes, remember the email I sent you about MAC addresses?
> 
> Yes exactly I have try on a vortex NIC.

I'm figuring out how to poll the mdio[1] bit from the MII beat interface.
The nice thing about this is that the flag has the link status set
atomically, independent of the software, so we don't need a daemon to
check the link connectivity.

Basically you do:
  o mii_val[1] = mdio_read(sock, 1);
  o if (!(mii_val[1] & MII_BMSR_LINK_VALID)){
        dev->flags &= ~IFF_UP;
    }
  o Then you take the patch (dead gateway detection) Julian sent to the 
    netdev list 1 month ago and this patch will detect this !IFF_UP and
    mark the routing entry as dead. This could work, but could also not
    work, I haven't tried it yet :)
  o The problem is that this doesn't really solve your problem with the
    VMAC. But maybe we'll find a way where we don't need a VMAC on dummy0.
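
A minimal sketch of the polling step above, written so that it compiles
outside the kernel. Assumptions, none of them from the original mail: the
register reader is passed in as a function pointer (mdio_read_fn), the two
fake readers stand in for a real MDIO ioctl, and SKETCH_IFF_UP stands in
for IFF_UP. The status register is read twice because the link bit in the
BMSR is latched, so the first read may return a stale "link down".

```c
#include <stdint.h>

#define MII_BMSR             1        /* basic mode status register */
#define MII_BMSR_LINK_VALID  0x0004   /* link status bit */
#define SKETCH_IFF_UP        0x1u     /* stand-in for IFF_UP (assumption) */

/* Hypothetical reader type: returns the 16-bit value of MII register 'reg'. */
typedef uint16_t (*mdio_read_fn)(int sock, int reg);

/* Fake readers for illustration only: typical BMSR values with and
 * without the link bit set. */
uint16_t fake_read_link_up(int sock, int reg)   { (void)sock; (void)reg; return 0x782d; }
uint16_t fake_read_link_down(int sock, int reg) { (void)sock; (void)reg; return 0x7809; }

/* Returns 1 if the PHY reports link, 0 otherwise. */
int link_is_up(mdio_read_fn rd, int sock)
{
    (void)rd(sock, MII_BMSR);   /* discard a possibly latched value */
    return (rd(sock, MII_BMSR) & MII_BMSR_LINK_VALID) != 0;
}

/* Clears the "up" flag when the link is gone, leaves other flags alone. */
unsigned int flags_after_poll(unsigned int flags, mdio_read_fn rd, int sock)
{
    if (!link_is_up(rd, sock))
        flags &= ~SKETCH_IFF_UP;
    return flags;
}
```

In a real driver the flag change would of course go through the proper
netdevice interfaces rather than a bare bit operation; this only shows the
decision itself.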
 
> For my topology I have 2 NICs, one to serve the LAN and the other for the
> WAN => LVS NAT in my setup test.

Is the second card a wanpipe card? If so, this will change a lot anyway.
Packet generation in wanpipe devices works differently from the way a
"normal" net device driver builds packets.
 
> Consider the PDF file I sent. When you have 2 LVS directors (LD1 & LD2)

:) Sorry, I have to have a look at it again this evening. It looks like
physics in high school, where they try to explain to you that electrons
swirl around protons and that their position is given by a cloud. It's a
very complex draft, IMHO, and I'm not sure you gain a lot with this. It
looks like some kind of geographical redundancy at first sight. If so, then
you're reinventing BGP and you might join Horms, who gave a fantastic talk
about this at OLS. Check out [http://www.supersparrow.org/].

> using LVS-NAT. On each ETH0 for WAN interface & ETH1 for LAN as default gw
> for the realserver pools.

Ahh, LVS_NAT. So you have to have different failover policies for the
WAN interface than for the LAN interface. The WAN redundancy can be
achieved by my idea described above or maybe by device bonding, although
that works not per connection but only on a per-packet basis, as Julian
pointed out in a previous email. The LAN failover/failback can be done
quite suitably with vrrpd and its connection to LVS, because there we
care about the service and we want to fail over if the service is down
(which is much more probable than a link going down).
 
> If we run 2 VRRP instances on both LD1 & LD2 interfaces ETH0 & ETH1. If
> ETH0 on LD1 fails (hardware pb, cable cutted, NIC fire :)...), then ETH0 on

What if a bomb destroys your datacenter? Try to think how google.com
do their geographical loadbalancing.

> LD2 takeover according to VRRP protocol. During that takeover to preserve
> routing path we need to synchronize VRRP instance on ETH1. So LD2 VRRP
> instance running on ETH0 send a higher priority VRRP advert to LD1 VRRP
> instance running on ETH0. When LD1 VRRP instance on ETH0 receive that
> advert it shutdown the VRRP VIP. Then LD2 VRRP instance on ETH0 set the VIP.

Do you really need to send the new routing path info? I think it is enough
if you advertise that LD1 is going to fail over and then send the
destination failover device, and the LD2 instance changes the routing table,
because if you change it too fast, you lose control over packets. But this
is only a heuristic guess of mine; you have to test it yourself.
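
For reference, the takeover rule quoted above (a higher-priority advert
makes the current master give up the VIP) can be sketched as a pure
function. This follows the Master-state rule as I read it in RFC 2338; the
names (master_on_advert, enum vrrp_state) are made up for illustration and
are not taken from any vrrpd source. Addresses are compared in host byte
order.

```c
#include <stdint.h>

enum vrrp_state { VRRP_BACKUP, VRRP_MASTER };

/* Decide the next state of an instance that is currently MASTER and has
 * just received an advert from another router in the same virtual router. */
enum vrrp_state master_on_advert(uint8_t local_prio, uint32_t local_ip,
                                 uint8_t adv_prio, uint32_t adv_ip)
{
    /* Priority 0 means the sender is resigning: stay master. */
    if (adv_prio == 0)
        return VRRP_MASTER;

    /* Higher priority wins; on a tie the higher primary IP wins. */
    if (adv_prio > local_prio ||
        (adv_prio == local_prio && adv_ip > local_ip))
        return VRRP_BACKUP;     /* release the VIP, become backup */

    return VRRP_MASTER;         /* discard the advert */
}
```

Keeping the two instances on ETH0 and ETH1 in sync then means feeding the
same decision to both, which is the part that still has to be tested.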

> => So we have right now both VRRP instance fully sync.

Consider the tiny timeframe where LD1 is in the failover process and the
advertisement has not yet been fully acknowledged by LD2. At this
moment some bloody BOFH cuts the network cable of the physical segment
of LD2 and LD2 wants to fail over. You also have to include a semaphore
mechanism for this case. Kind of spinlock_t advertisement; :)
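
A rough sketch of what such a guard could look like inside one daemon,
using C11 atomics instead of a kernel spinlock_t. All names here
(sync_state, transition_begin, transition_end) are hypothetical. Note that
this only serializes transitions within a single director; the window
between LD1 and LD2 would still need an acknowledgement in the protocol
itself.

```c
#include <stdatomic.h>

enum { SYNC_IDLE = 0, SYNC_IN_TRANSITION = 1 };

/* One flag per synchronized group of VRRP instances (assumption). */
static atomic_int sync_state = SYNC_IDLE;

/* Try to start a failover transition.
 * Returns 1 if we own the transition, 0 if another one is in flight. */
int transition_begin(void)
{
    int expected = SYNC_IDLE;
    return atomic_compare_exchange_strong(&sync_state, &expected,
                                          SYNC_IN_TRANSITION);
}

/* Call once the peer has acknowledged the advertisement. */
void transition_end(void)
{
    atomic_store(&sync_state, SYNC_IDLE);
}
```

A second failover request that arrives while transition_begin() returns 0
would have to be queued or retried, not dropped.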
 
> => Will document this tonight :)

Cool, could you cc it to my linux-vs.org address, please? I cannot
get @tac.ch emails out of work.
 
> In fact when setting ipvsadm rules, for VIP I use VRRP VIP. (even more
> mixing it with LVS sync daemon, when takeover we can preserve the
> connection table)

Wow, does this work? It's time for me to finally really do some
stress testing on the transition table sync.

> To set the VMAC daemon use ioctl system call. To set VRRP VIPs use netlink
> RTM_NEWADDR call
> 
> ---[ snipped code : Begin ]---
> 
> int ipaddr_op(int ifindex, uint32_t addr, int addF)
> ....
>    req.n.nlmsg_len    = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
>    req.n.nlmsg_flags  = NLM_F_REQUEST;
>    req.n.nlmsg_type   = addF ? RTM_NEWADDR : RTM_DELADDR;
>    req.ifa.ifa_family = AF_INET;
>    req.ifa.ifa_index  = ifindex;
>    req.ifa.ifa_prefixlen  = 32;
> 
>    addr = htonl(addr);
>    addattr_l(&req.n, sizeof(req), IFA_LOCAL, &addr, sizeof(addr));
> 
>    if (rtnl_open(&rth, 0) < 0)
>      return -1;
>    if (rtnl_talk(&rth, &req.n, 0, 0, NULL, NULL, NULL) < 0)
>      return -1;
> ...
> ---[ snipped code : End ]---

Ok.
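
For readers without libnetlink at hand, here is a sketch of what the
snipped request amounts to on the wire. It only builds the message and
does not send it. The struct name (addr_req) and function name
(build_addr_req) are made up for illustration; the field values mirror the
quoted snippet, and the attribute is appended by hand, which is roughly
what addattr_l() does. Linux headers are assumed.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Netlink header + address header + room for attributes. */
struct addr_req {
    struct nlmsghdr  n;
    struct ifaddrmsg ifa;
    char             buf[64];
};

/* Fill 'req' with an RTM_NEWADDR (add != 0) or RTM_DELADDR request for a
 * /32 IPv4 address given in host byte order. Returns the message length. */
size_t build_addr_req(struct addr_req *req, int ifindex,
                      uint32_t addr_host, int add)
{
    struct rtattr *rta;
    uint32_t addr_net = htonl(addr_host);

    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len       = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
    req->n.nlmsg_flags     = NLM_F_REQUEST;
    req->n.nlmsg_type      = add ? RTM_NEWADDR : RTM_DELADDR;
    req->ifa.ifa_family    = AF_INET;
    req->ifa.ifa_index     = ifindex;
    req->ifa.ifa_prefixlen = 32;

    /* Append the IFA_LOCAL attribute right after the aligned headers. */
    rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->n.nlmsg_len));
    rta->rta_type = IFA_LOCAL;
    rta->rta_len  = RTA_LENGTH(sizeof(addr_net));
    memcpy(RTA_DATA(rta), &addr_net, sizeof(addr_net));
    req->n.nlmsg_len = NLMSG_ALIGN(req->n.nlmsg_len) + RTA_ALIGN(rta->rta_len);

    return req->n.nlmsg_len;
}
```

The resulting buffer would then be written to an AF_NETLINK socket opened
with NETLINK_ROUTE, which is the part rtnl_open()/rtnl_talk() take care of
in the quoted code.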
 
> => so it is a secondary ip => is it compatible with LVS ? can we set a LVS
> VIP using a secondary interface IP ?

This is what an alias has actually been doing since 2.1.127 :) Unfortunately
people still think of a physically separate device when they talk about an
alias, but an alias is nothing more than a label, and the IP assigned to it
is a secondary IP address on the underlying physical device. That's why,
when you have to make routing work over aliased interfaces and you need the
outgoing packets to get the source IP of the alias, you need to use the
undocumented 'src' option of the iproute2 tools.

ip route add default via DGW_IP src alias_IP dev phys_intf

Because if not, the outgoing packets, although received and accepted by
the 'alias', will not carry the alias IP as their source address.

Such is life, and life mostly sucks when you work with routing.
 
> > > The only way we can prevent against network failures in that scenario is 
> > > to
> > > use VRRP.
> >
> >Or you might put your vrrpd on top of a bonding interface.
> 
> hmm, will investigate on it :)

Excellent.
 
> > > My point of vue is runing one VRRP instance per physical instance.
> >
> >No question about that. Make sure you use an as secure transportation
> >and notification protocol as the least secure deployed one (TCP in
> >most cases)
> 
> I need to document that point.

Don't copy my English wording mistakes, though :)
 
> >Wow! Reading this I get the impression of a big project going on there.
> 
> Can be a nice LVS addon.

How do you intend to secure the vrrpd?
 
> >I'm sorry, I don't reply that often, but I just can't handle all the
> >projects right now and plus it's summer here with excellent 30 degrees
> >celsius. Half of the day I'm out playing beach volleyball or climbing.
> 
> No pbs :)

Just came back from a nice beach volleyball match. It's fantastic :)

Take care,
Roberto Nibali, ratz

-- 
mailto: `echo NrOatSz@xxxxxxxxx | sed 's/[NOSPAM]//g'`

