Alright... this is driving me nuts, and countless searches have turned up
tons of help but no solution.
Here's the situation: I have two directors set up as localnode (no actual
realservers). That part all seems to work perfectly.
The problem: when keepalived starts up, both directors come up in MASTER
(as expected), and then the backup (lower priority) drops to BACKUP.
All is well. A few seconds later, the backup suddenly decides to
switch to MASTER, sends out a GARP and starts advertising VRRP packets.
The real master sees this, forces a new election and sends out a GARP.
The backup switches back to BACKUP state and all is well for a few more
seconds. Then it all starts over again...
On the master:
Feb 15 17:21:26 pd-lvs01 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Feb 15 17:21:26 pd-lvs01 Keepalived_vrrp: VRRP_Instance(VI_1) IPSEC-AH : Syncing seq_num - Increment seq
Feb 15 17:21:26 pd-lvs01 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.110
Feb 15 17:21:27 pd-lvs01 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Feb 15 17:21:27 pd-lvs01 Keepalived_vrrp: VRRP_Instance(VI_1) IPSEC-AH : Syncing seq_num - Increment seq
Feb 15 17:21:27 pd-lvs01 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.110
. . . Repeat ad nauseam.
On the backup:
Feb 15 17:21:49 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Feb 15 17:21:49 pd-lvs02 Keepalived_vrrp: VRRP_Group(VG1) Syncing instances to MASTER state
Feb 15 17:21:50 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Feb 15 17:21:50 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) setting protocol VIPs.
Feb 15 17:21:50 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.110
Feb 15 17:21:50 pd-lvs02 Keepalived_healthcheckers: Netlink reflector reports IP 10.1.1.110 added
Feb 15 17:21:50 pd-lvs02 Keepalived_healthcheckers: Activating healtchecker for service [10.1.1.111:22]
Feb 15 17:21:50 pd-lvs02 Keepalived_healthcheckers: Activating healtchecker for service [10.1.1.112:22]
Feb 15 17:21:50 pd-lvs02 Keepalived_vrrp: Remote SMTP server [10.0.0.18:25] connected.
Feb 15 17:21:50 pd-lvs02 Keepalived_vrrp: Netlink reflector reports IP 10.1.1.110 added
Feb 15 17:21:50 pd-lvs02 Keepalived_vrrp: SMTP alert successfully sent.
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.110
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) IPSEC-AH : Syncing seq_num - Decrement seq
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: VRRP_Instance(VI_1) removing protocol VIPs.
Feb 15 17:21:51 pd-lvs02 Keepalived_healthcheckers: Netlink reflector reports IP 10.1.1.110 removed
Feb 15 17:21:51 pd-lvs02 Keepalived_healthcheckers: Suspending healtchecker for service [10.1.1.111:22]
Feb 15 17:21:51 pd-lvs02 Keepalived_healthcheckers: Suspending healtchecker for service [10.1.1.112:22]
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: VRRP_Group(VG1) Syncing instances to BACKUP state
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: Remote SMTP server [10.0.0.18:25] connected.
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: Netlink reflector reports IP 10.1.1.110 removed
Feb 15 17:21:51 pd-lvs02 Keepalived_vrrp: SMTP alert successfully sent.
. . . Shower rinse repeat.
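The flap cycle is easier to see when only the state changes are pulled out of syslog. A small sketch, assuming one log entry per line in the syslog layout shown above; `vrrp_transitions` is a hypothetical helper name, not anything shipped with keepalived:

```shell
#!/bin/bash
# Hypothetical helper: read syslog lines on stdin and print one
# "time host STATE" line for every VRRP state change keepalived logs.
vrrp_transitions() {
    awk '/Keepalived_vrrp: VRRP_Instance[(][^)]*[)] Entering (MASTER|BACKUP) STATE/ {
        # $3 = time, $4 = host; the new state is the second-to-last field
        print $3, $4, $(NF-1)
    }'
}
```

Run on the backup as, for example, `vrrp_transitions < /var/log/messages` (the log path is an assumption; it varies by distribution), it would print a MASTER line followed by a BACKUP line for each cycle described above, with the timestamps showing the interval between flaps.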
Master's config:
global_defs {
    notification_email {
        admin@xxxxxxxxxxxxx
    }
    notification_email_from keepalived@pd-perim01
    smtp_server 10.0.0.18
    smtp_connect_timeout 30
    router_id pd-perim01
}
vrrp_sync_group VG1 {
    group {
        VI_1
    }
}
vrrp_instance VI_1 {
    state MASTER
    interface eth1
    lvs_sync_daemon_inteface eth1
    virtual_router_id 51
    priority 150
    advert_int 1
    garp_master_delay 1
    smtp_alert
    authentication {
        auth_type AH
        auth_pass likOkeam
    }
    virtual_ipaddress {
        10.1.1.110/24
    }
    notify_master "/usr/local/bin/keepalived-transition.sh MASTER"
    notify_backup "/usr/local/bin/keepalived-transition.sh BACKUP"
    notify_fault "/usr/local/bin/keepalived-transition.sh FAULT"
    notify_stop "/usr/local/bin/keepalived-transition.sh FAULT"
}
And the backup:
global_defs {
    notification_email {
        admin@xxxxxxxxxxxxx
    }
    notification_email_from keepalived@pd-perim02
    smtp_server 10.0.0.18
    smtp_connect_timeout 30
    router_id pd-perim02
}
vrrp_sync_group VG1 {
    group {
        VI_1
    }
}
vrrp_instance VI_1 {
    state MASTER
    interface eth1
    lvs_sync_daemon_inteface eth1
    virtual_router_id 51
    priority 100
    advert_int 1
    garp_master_delay 1
    smtp_alert
    authentication {
        auth_type AH
        auth_pass likOkeam
    }
    virtual_ipaddress {
        10.1.1.110/24
    }
    notify_master "/usr/local/bin/keepalived-transition.sh MASTER"
    notify_backup "/usr/local/bin/keepalived-transition.sh BACKUP"
    notify_fault "/usr/local/bin/keepalived-transition.sh FAULT"
    notify_stop "/usr/local/bin/keepalived-transition.sh FAULT"
}
The notify script is just:
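#!/bin/bash
# Called from keepalived's notify_* hooks with the new state as $1.
case "$1" in
    MASTER)
        # keepalived puts the VIP on eth1; drop the loopback copy
        ip addr del 10.1.1.110/24 dev lo ;;
    BACKUP)
        # move the VIP from eth1 to lo
        ip addr del 10.1.1.110/24 dev eth1
        ip addr add 10.1.1.110/24 dev lo ;;
    FAULT)
        # remove the VIP from both interfaces
        ip addr del 10.1.1.110/24 dev lo
        ip addr del 10.1.1.110/24 dev eth1 ;;
esac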
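To line up the notify script's runs with the Keepalived_vrrp messages, the script could also write a timestamped line on every call. A minimal sketch, assuming a hypothetical `log_transition` helper and a hypothetical default log file `/var/log/keepalived-notify.log` (neither is part of keepalived); it would be called as `log_transition "$1"` at the top of the notify script:

```shell
#!/bin/bash
# Hypothetical helper (not part of keepalived): append one syslog-style line per
# notify call so script runs can be compared with the Keepalived_vrrp entries.
# The default log path is an assumption; override it by setting NOTIFY_LOG.
log_transition() {
    local logfile="${NOTIFY_LOG:-/var/log/keepalived-notify.log}"
    # "%b %e %H:%M:%S" mimics the "Feb 15 17:21:50" timestamp format used by syslog
    printf '%s %s notify: %s\n' \
        "$(date '+%b %e %H:%M:%S')" "${HOSTNAME:-$(uname -n)}" "$1" >> "$logfile"
}
```

Writing to a separate file rather than syslog keeps the sketch free of extra dependencies; `logger` would be the obvious alternative if the entries should sit in the same log as keepalived's own.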
Some mcast traffic back and forth:
Master:
17:42:23.528439 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x11): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
17:42:24.528923 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x12): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
17:42:24.530162 IP 10.1.1.112 > vrrp.mcast.net: AH(spi=0x0a010170,seq=0x10): VRRPv2, Advertisement, vrid 51, prio 100, authtype ah, intvl 1s, length 20
17:42:24.530273 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x12): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
17:42:25.530892 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x13): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
Backup:
17:42:23.528921 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x11): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
17:42:24.529458 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x12): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
17:42:24.530550 IP 10.1.1.112 > vrrp.mcast.net: AH(spi=0x0a010170,seq=0x10): VRRPv2, Advertisement, vrid 51, prio 100, authtype ah, intvl 1s, length 20
17:42:24.530710 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x12): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
17:42:25.531370 IP 10.1.1.111 > vrrp.mcast.net: AH(spi=0x0a01016f,seq=0x13): VRRPv2, Advertisement, vrid 51, prio 150, authtype ah, intvl 1s, length 20
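In this capture the AH SPI values are the senders' addresses in hex (0x0a01016f is 10.1.1.111, 0x0a010170 is 10.1.1.112), which makes a quick consistency check when reading other captures. A small conversion helper; the name `spi_to_ip` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print a 32-bit hex value (e.g. an AH SPI from tcpdump)
# as a dotted-quad IPv4 address, most significant byte first.
spi_to_ip() {
    local v=$(( $1 ))   # bash arithmetic accepts the 0x prefix directly
    printf '%d.%d.%d.%d\n' \
        $(( (v >> 24) & 255 )) $(( (v >> 16) & 255 )) \
        $(( (v >> 8) & 255 ))  $(( v & 255 ))
}
```

For example, `spi_to_ip 0x0a01016f` prints `10.1.1.111`.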
Any ideas? I'm stumped: I've tried changing just about everything and I'm
out of ideas.
--
Sal Tepedino <stepedino@xxxxxxxxxxxxxx>