Hi all!!!
I've just started playing around with HA systems, so please forgive me
if the answers have already been provided on the list or somewhere
else (in that case, could you please provide a link?). I've looked
through older threads without success.
I have followed the instructions on the www.ultramonkey.org site for
setting up a Streamlined High Availability and Load Balancing system
with a MySQL cluster as the real servers. I know it's better to start
with simpler setups, but I ran out of machines, so I had to put the load
balancer and the replicated real servers on the same machines.
The config is the following:

  Virtual IP = 192.168.1.125
  node1      = 192.168.1.123
  node2      = 192.168.1.124

                 Virt. IP=.125
-------------------    |    -------------------
|   ldirectord1   |    |    |   ldirectord2   |
|   mysqlAPI1     |---------|   mysqlAPI2     |
-------------------         -------------------
  node1 IP=.123               node2 IP=.124
The problem is that when a node fails, the surviving ldirectord does
not remove the failed node from the routing tables. The funny thing is
that one in every two requests succeeds (the scheduler is wrr) and the
other fails with a myconnection error.
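
To illustrate what I think I'm seeing, here is a minimal Python sketch of a simplified round robin over the two real servers. It is not the actual IPVS wrr scheduler code, and the function names are made up for the illustration. With both weights left at 1 and node1 down, requests alternate between success and failure; if the dead node's weight were set to 0, every request would go to the survivor.

```python
from itertools import cycle, islice


def pick_sequence(weights, n):
    """Return the first n servers chosen by a simplified weighted round robin.

    Each server gets as many slots per cycle as its weight, so a server
    with weight 0 is never chosen. This is an illustrative model only.
    """
    slots = [server for server, weight in weights.items() for _ in range(weight)]
    if not slots:
        return []
    return list(islice(cycle(slots), n))


def outcomes(weights, alive, n):
    """Mark each of the first n requests "ok" or "fail" by whether its server is alive."""
    return ["ok" if server in alive else "fail" for server in pick_sequence(weights, n)]


# What I observe: node1 (.123) is down but still has weight 1.
stale = {"192.168.1.124": 1, "192.168.1.123": 1}
print(outcomes(stale, alive={"192.168.1.124"}, n=4))     # ['ok', 'fail', 'ok', 'fail']

# What I expected: the dead node is quiesced to weight 0.
quiesced = {"192.168.1.124": 1, "192.168.1.123": 0}
print(outcomes(quiesced, alive={"192.168.1.124"}, n=4))  # ['ok', 'ok', 'ok', 'ok']
```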
I've included as much output as I have at the bottom, so please take a
look and help me find the error I made (I hope I'm not exceeding the
list's limit).
Thanks a lot,
Samuel.
My config files are adaptations of the ones on the Ultra Monkey web site:
ha.cf:
mcast eth0 225.255.255.2 695 1 0
auto_failback off
node cmysql_mysqld_1 #return from uname -n
node cmysql_mysqld_2
ping 192.168.1.254
respawn hacluster /usr/lib/heartbeat/ipfail
haresources:
node1 \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::192.168.1.125
ldirectord.cf:
checktimeout=10
checkinterval=2
autoreload=no
logfile="var/log/ldirectord.log"
logfile="local0"
quiescent=yes
virtual=192.168.1.125:3307
        real=192.168.1.123:3307 gate
        real=192.168.1.124:3307 gate
        fallback=127.0.0.1:3307 gate
        checktype=negotiate
        login="ser"
        passwd="heslo"
        database="ser"
        request="SELECT * from version"
        scheduler=wrr
Output of ping followed by ipvsadm: the weight of node1 remains 1 although it is unreachable!
cmysql_mysqld_2:/etc/ha.d# ping 192.168.1.123
PING 192.168.1.123 (192.168.1.123) 56(84) bytes of data.
From 192.168.1.124 icmp_seq=1 Destination Host Unreachable
From 192.168.1.124 icmp_seq=2 Destination Host Unreachable
From 192.168.1.124 icmp_seq=3 Destination Host Unreachable
--- 192.168.1.123 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4046ms, pipe 3
cmysql_mysqld_2:/etc/ha.d# ipvsadm -L -n
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.125:3307 wrr
  -> 192.168.1.124:3307           Local   1      0          0
  -> 192.168.1.123:3307           Route   1      0          0
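
In case it helps anyone reproduce the check, this is a small Python sketch that pulls the real servers and their weights out of `ipvsadm -L -n` text. It only assumes the column layout shown above (virtual service lines starting with TCP or UDP, real server lines starting with `->`); the function name and the SAMPLE string are mine, not part of any tool.

```python
SAMPLE = """\
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.1.125:3307 wrr
  -> 192.168.1.124:3307 Local 1 0 0
  -> 192.168.1.123:3307 Route 1 0 0
"""


def parse_real_servers(ipvsadm_output):
    """Return (virtual, real, forward, weight) tuples from `ipvsadm -L -n` text."""
    rows = []
    virtual = None
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("TCP", "UDP"):
            # Start of a virtual service; the real servers follow it.
            virtual = fields[1]
        elif len(fields) >= 4 and fields[0] == "->" and fields[3].isdigit():
            # Real server line; the header line is skipped because
            # its weight column is the word "Weight", not a number.
            rows.append((virtual, fields[1], fields[2], int(fields[3])))
    return rows


for virtual, real, forward, weight in parse_real_servers(SAMPLE):
    print(f"{virtual} -> {real} ({forward}) weight={weight}")
```

Run against the output above, this reports weight=1 for both 192.168.1.124:3307 and 192.168.1.123:3307, which is exactly the state I did not expect for the unreachable node.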
Extract from /var/log/messages: ldirectord restores the failed node with
weight=1 and does not remove it later.
Dec 16 08:19:16 localhost heartbeat[3732]: info: Received shutdown notice from 'cmysql_mysqld_1'.
Dec 16 08:19:16 localhost heartbeat[3732]: info: Resources being acquired from cmysql_mysqld_1.
Dec 16 08:19:16 localhost heartbeat[3793]: info: acquire local HA resources (standby).
Dec 16 08:19:17 localhost heartbeat[3794]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys cmysql_mysqld_2] to acquire.
Dec 16 08:19:17 localhost heartbeat[3793]: info: local HA resource acquisition completed (standby).
Dec 16 08:19:17 localhost heartbeat[3732]: info: Standby resource acquisition done [all].
Dec 16 08:19:17 localhost heartbeat: info: Running /etc/ha.d/rc.d/status status
Dec 16 08:19:17 localhost heartbeat: info: Taking over resource group ldirectord::ldirectord.cf
Dec 16 08:19:17 localhost heartbeat: info: Acquiring resource group: cmysql_mysqld_1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.1.125
Dec 16 08:19:18 localhost ldirectord[3847]: ldirectord is stopped for /etc/ha.d/conf/ldirectord.cf
Dec 16 08:19:18 localhost ldirectord[3847]: Exiting with exit_status 3: Exiting from ldirectord status
Dec 16 08:19:18 localhost heartbeat: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Dec 16 08:19:19 localhost ldirectord[3867]: Starting Linux Director v1.77.2.32 as daemon
Dec 16 08:19:19 localhost ldirectord[3869]: Added virtual server: 192.168.1.125:3307
Dec 16 08:19:19 localhost ldirectord[3869]: Added fallback server: 127.0.0.1:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server: 192.168.1.123:3307 mapped from 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 0)
Dec 16 08:19:20 localhost heartbeat: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server: 192.168.1.124:3307 mapped from 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 0)
Dec 16 08:19:20 localhost ldirectord[3869]: Restored real server: 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:20 localhost kernel: IPVS: stopping sync thread 3393 ...
Dec 16 08:19:20 localhost kernel: IPVS: sync thread stopped!
Dec 16 08:19:20 localhost heartbeat: info: ipvs_syncbackup down
Dec 16 08:19:20 localhost ldirectord[3869]: Deleted fallback server: 127.0.0.1:3307 ( x 192.168.1.125:3307)
Dec 16 08:19:20 localhost kernel: IPVS: sync thread started.
Dec 16 08:19:21 localhost heartbeat: info: ipvs_syncmaster up
Dec 16 08:19:21 localhost heartbeat: info: ipvs_syncmaster obtained
Dec 16 08:19:21 localhost ldirectord[3869]: Restored real server: 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:21 localhost heartbeat: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.1.125 start
Dec 16 08:19:21 localhost heartbeat: info: Removing conflicting loopback lo.
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -f inet addr delete 192.168.1.125 dev lo
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -o -f inet addr show lo
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip route delete 192.168.1.125 dev lo
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -f inet addr add 192.168.1.125/24 brd 192.168.1.255 dev eth0
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip link set eth0 up
Dec 16 08:19:21 localhost heartbeat: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.1.125 eth0 192.168.1.125 auto 192.168.1.125 ffffffffffff
Dec 16 08:19:22 localhost heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 16 08:19:22 localhost heartbeat[3732]: info: mach_down takeover complete.
Dec 16 08:19:22 localhost heartbeat: info: mach_down takeover complete for node cmysql_mysqld_1.
Dec 16 08:19:47 localhost heartbeat[3732]: WARN: node cmysql_mysqld_1: is dead
Dec 16 08:19:47 localhost heartbeat[3732]: info: Dead node cmysql_mysqld_1 gave up resources.
Dec 16 08:19:47 localhost heartbeat[3732]: info: Link cmysql_mysqld_1:eth0 dead.
Dec 16 08:19:47 localhost ipfail[3741]: info: Status update: Node cmysql_mysqld_1 now has status dead
Dec 16 08:19:47 localhost ipfail[3741]: info: NS: We are still alive!
Dec 16 08:19:47 localhost ipfail[3741]: info: Link Status update: Link cmysql_mysqld_1/eth0 now has status dead
Dec 16 08:19:47 localhost ipfail[3741]: info: Asking other side for ping node count.
Dec 16 08:19:47 localhost ipfail[3741]: info: Checking remote count of ping nodes.
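
To make the weight changes easier to follow, here is a rough Python sketch that extracts only the ldirectord "Quiescent"/"Restored" events from such a log and keeps the last weight seen per real server. It assumes each log entry sits on a single line (the mail client may have wrapped them) and that the message wording matches the entries above; the regular expression and function name are my own.

```python
import re

# Matches the ldirectord weight-change messages as they appear in my log.
EVENT = re.compile(
    r"^(?P<time>\w{3}\s+\d+ [\d:]+) .*ldirectord\[\d+\]: "
    r"(?P<action>Quiescent|Restored) real server: (?P<server>\S+) "
    r".*\(Weight set to (?P<weight>\d+)\)"
)

LOG = """\
Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server: 192.168.1.123:3307 mapped from 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 0)
Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server: 192.168.1.124:3307 mapped from 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 0)
Dec 16 08:19:20 localhost ldirectord[3869]: Restored real server: 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:21 localhost ldirectord[3869]: Restored real server: 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 1)
"""


def final_weights(log_text):
    """Return {real_server: last weight logged by ldirectord}."""
    weights = {}
    for line in log_text.splitlines():
        match = EVENT.match(line)
        if match:
            weights[match.group("server")] = int(match.group("weight"))
    return weights


print(final_weights(LOG))
# {'192.168.1.123:3307': 1, '192.168.1.124:3307': 1}
```

In my log the last event for 192.168.1.123:3307 is "Restored" at 08:19:20, before heartbeat declares the node dead at 08:19:47, and there is no later "Quiescent" entry for it.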