
Strange keepalived failover behaviour

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Strange keepalived failover behaviour
From: Dominik Klein <dk@xxxxxxxxxxxxxxxx>
Date: Wed, 16 Nov 2005 16:44:58 +0100
Hi lvs users,

I am experiencing something that looks very strange to me with Keepalived:

Here's my setup:
VIP is 194.xxx.xxx.xxx on eth0, connected to a Dell switch, from there directly to the ISP
DIP is 10.2.30.1 on eth1, connected to another Dell switch, from there directly to WWW1 and WWW2
Load balancers are identical machines LB1 and LB2 with no static IPs on eth0 and eth1, as those are the interfaces the VIP and DIP shall run on, and 10.2.40.[12] on eth2 for synchronisation
WWW1 is 10.2.30.3 eth0
WWW2 is 10.2.30.4 eth0

ISP DSL Modem
|
Dell Switch
|
VIP eth0
|*****|
LB1*LB2
|*****|
DIP eth1
|*****|
Dell Switch
|*****|
WWW1*WWW2

LB1 and LB2 shall run in a MASTER/BACKUP configuration and load-balance connections to VIP:80. LB1 is the MASTER at startup, LB2 is the BACKUP. LB1 and LB2 are connected directly via eth2 for synchronisation (lvs_sync_daemon_interface).

Everything seems to work fine. LB2 takes over after eth0 or eth1 on LB1 goes down; LB2 then has the VIP and DIP, the routes are properly set up and everything looks good so far. When the interface on LB1 comes up again, LB1 becomes the MASTER again and takes back the VIP and DIP. Okay!

Here's my problem:
When LB2 has become the MASTER, it is able to ping WWW1 and WWW2, but it cannot ping the gateway (or connect to the internet), and the VIP is not accessible from the internet. After a couple of minutes (~5), I am able to ping the gateway and connect to the VIP from the internet. "ip addr list eth0" shows that the VIP is on eth0, "ip route" shows that the default route is set up properly, and I cannot see any problem in the logs (keepalived started with -D). I thought it might be an ARP-related problem: the upper Dell switch may not accept the gratuitous ARP. But the switch between the LBs and the WWW servers is exactly the same, both switches have just been reset, and I have already swapped each of them for another one as a test, with no change regarding the problem.
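
For reference on the ARP theory: a gratuitous ARP is a broadcast ARP message in which the sender and target IP are both the address being announced, so that neighbours (here the gateway) update their cached MAC for the VIP. The Python sketch below only builds one common form of such a frame to show its layout. It is not taken from keepalived, and the MAC address and the documentation address 192.0.2.10 (standing in for the redacted VIP) are assumptions for illustration.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP)
    eth = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP fixed part: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)

    # Gratuitous: sender IP == target IP. The target MAC is left as zeros
    # here; some implementations fill it with the broadcast address instead.
    arp += hw + addr + b"\x00" * 6 + addr
    return eth + arp


# Hypothetical values: a made-up MAC and a documentation-range IP.
frame = build_gratuitous_arp("00:11:22:33:44:55", "192.0.2.10")
print(len(frame), frame.hex())
```

If the gateway ignores or never receives a frame like this after failover, it keeps sending traffic for the VIP to LB1's MAC until its own ARP cache entry expires, which would be consistent with a delay of a few minutes.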

Here's the configuration file I use:

global_defs {
   notification_email {
        my.email@xxxxxxxxxxx
   }
   notification_email_from keepalived@xxxxxxxxxxx
   smtp_server 195.xxx.xxx.xxx
   smtp_connect_timeout 30
   lvs_id TEST_LVS_MASTER
}

vrrp_sync_group TEST {
        group {
                vip_external
                vip_internal
        }
}

vrrp_instance vip_external {
    state MASTER
    interface eth0
    lvs_sync_daemon_interface eth2
    virtual_router_id 51
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass asdqweyxc
    }
    virtual_ipaddress {
        194.xxx.xxx.xxx/27 brd 194.xxx.xxx.xxx dev eth0
    }
}

vrrp_instance vip_internal {
    state MASTER
    interface eth1
    lvs_sync_daemon_interface eth2
    virtual_router_id 52
    priority 150
    advert_int 1
    smtp_alert
    authentication {
        auth_type PASS
        auth_pass qwerasdf
    }
    virtual_ipaddress {
        10.2.30.1/24 brd 10.2.30.255 dev eth1
    }
}

virtual_server 194.8.219.133 80 {
[left out as I think this is not about this problem]
}

Some things I have done already:
* Put dummy IPs for eth0 and eth1 on the LBs (10.2.something) - no difference at all
* Put 194.xxx.xxx.xxx ISP IPs for eth0 on the LBs (that worked but would need 3 ISP IPs instead of one and from what I know, this should not be necessary)

What am I missing here?
If you need any more information, just ask!

Regards,
Dominik
