
RE: ultramonkey's "Streamline High Availability and Load Balancing"

To: "'LinuxVirtualServer.org users mailing list.'" <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: RE: ultramonkey's "Streamline High Availability and Load Balancing"
From: <techp@xxxxxxxxxxx>
Date: Fri, 16 Dec 2005 12:44:21 +0100
Hi,

Try the option:
quiescent=no

and read the documentation to understand the implications!
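To make those implications concrete, here is a minimal Python model of what happens to the IPVS table when a real server fails its check. The dict-based table and the function name `apply_check_result` are hypothetical illustrations, not ldirectord's code. With `quiescent=yes` the failed server stays in the table with weight 0 (no new connections, although existing connections and persistent clients may still be routed to it); with `quiescent=no` it is deleted from the virtual service outright.

```python
from typing import Dict


def apply_check_result(table: Dict[str, int], real: str, healthy: bool,
                       quiescent: bool, weight: int = 1) -> Dict[str, int]:
    """Hypothetical model: update a {real_server: weight} table after one check."""
    if healthy:
        # Check passed: (re)insert the server with its configured weight,
        # like the "Restored real server ... (Weight set to 1)" log lines.
        table[real] = weight
    elif quiescent:
        # quiescent=yes: keep the entry but set its weight to 0, so the
        # scheduler stops choosing it for *new* connections.
        table[real] = 0
    else:
        # quiescent=no: remove the entry from the virtual service entirely.
        table.pop(real, None)
    return table


table = {"192.168.1.123:3307": 1, "192.168.1.124:3307": 1}
print(apply_check_result(dict(table), "192.168.1.123:3307", healthy=False, quiescent=True))
# {'192.168.1.123:3307': 0, '192.168.1.124:3307': 1}
print(apply_check_result(dict(table), "192.168.1.123:3307", healthy=False, quiescent=False))
# {'192.168.1.124:3307': 1}
```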

Laurent


-----Original Message-----
From: lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx
[mailto:lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx] On behalf of samuel
Sent: Friday, 16 December 2005 11:56
To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: ultramonkey's "Streamline High Availability and Load Balancing"

Hi all!!!

I've just started playing around with HA systems, so please forgive me
if this has already been answered on the list or somewhere else (in
that case, could you please provide a link?). I've looked through
older threads without success...

I have followed the instructions on the www.ultramonkey.org site for
setting up a Streamline High Availability and Load Balancing system
with a MySQL cluster as the real servers. I know it's better to start
with simpler setups, but I ran out of machines, so I had to put the
load balancers and the replicated real servers on the same machines.
The configuration is the following:
  Virtual IP = 192.168.1.125
  node1      = 192.168.1.123
  node2      = 192.168.1.124
                     Virtual IP = .125
                             |
      -----------------      |      -----------------
      |  ldirectord1  |      |      |  ldirectord2  |
      |  mysqlAPI1    |-------------|  mysqlAPI2    |
      -----------------             -----------------
      node1  IP = .123              node2  IP = .124

The problem is that when a node fails, the surviving ldirectord does
not remove the failed node from the IPVS routing table. The odd part
is that every other request succeeds (scheduler: wrr) while the rest
fail with a MySQL connection error.
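The "every other request fails" pattern is what weighted round-robin produces when both real servers still carry weight 1: the dead node keeps getting its share of new connections. Below is a minimal Python sketch of interleaved weighted round-robin using the common greatest-common-divisor / current-weight formulation. It is an illustration only, not the kernel's IPVS scheduler code, and the class name `WeightedRoundRobin` is hypothetical.

```python
from functools import reduce
from math import gcd
from typing import List, Optional, Tuple


class WeightedRoundRobin:
    """Hypothetical interleaved weighted round-robin scheduler (illustration only)."""

    def __init__(self, servers: List[Tuple[str, int]]):
        self.servers = servers
        self.index = -1          # position of the last server picked
        self.current_weight = 0  # threshold a server's weight must reach
        weights = [w for _, w in servers]
        self.max_weight = max(weights, default=0)
        self.step = reduce(gcd, weights, 0)  # gcd of all weights

    def pick(self) -> Optional[str]:
        if self.max_weight == 0:
            return None  # every server has weight 0: nothing to schedule
        while True:
            self.index = (self.index + 1) % len(self.servers)
            if self.index == 0:
                # Completed a full pass: lower the threshold, wrapping to the max.
                self.current_weight -= self.step
                if self.current_weight <= 0:
                    self.current_weight = self.max_weight
            name, weight = self.servers[self.index]
            if weight >= self.current_weight:
                return name


# Both real servers at weight 1: new connections alternate between them.
both_up = WeightedRoundRobin([("192.168.1.124:3307", 1), ("192.168.1.123:3307", 1)])
print([both_up.pick() for _ in range(4)])

# The dead node quiesced to weight 0: every new connection goes to .124.
quiesced = WeightedRoundRobin([("192.168.1.124:3307", 1), ("192.168.1.123:3307", 0)])
print([quiesced.pick() for _ in range(4)])
```

With both weights at 1 the first list alternates .124, .123, .124, .123, which matches the one-in-two failure rate; once the failed node's weight drops to 0 it is never picked for new connections.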

I have included all the output I have at the bottom, so please take a
look and help me find my mistake (I hope this doesn't exceed the
list's size limit).

Thanks a lot,
Samuel.



My config files are adapted from the examples on the UltraMonkey web site:

ha.cf:
mcast eth0 225.255.255.2 695 1 0
auto_failback off
node cmysql_mysqld_1 #return from uname -n
node cmysql_mysqld_2
ping 192.168.1.254
respawn hacluster /usr/lib/heartbeat/ipfail

haresources:
node1 \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \
        IPaddr2::192.168.1.125


ldirectord.cf:
checktimeout=10
checkinterval=2
autoreload=no
logfile="var/log/ldirectord.log"
logfile="local0"
quiescent=yes

virtual=192.168.1.125:3307
        real=192.168.1.123:3307 gate
        real=192.168.1.124:3307 gate
        fallback=127.0.0.1:3307 gate
        checktype=negotiate
        login="ser"
        passwd="heslo"
        database="ser"
        request="SELECT * from version"
        scheduler=wrr
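The configuration above uses `checktype=negotiate`, which logs in to MySQL and runs the `request` query. For comparison, here is a minimal Python sketch of the simpler kind of test ldirectord's `checktype=connect` performs: a plain TCP connection attempt bounded by a timeout, analogous to `checktimeout=10`. The function name `tcp_connect_check` is hypothetical, and the sketch does not speak the MySQL protocol.

```python
import socket


def tcp_connect_check(host: str, port: int, timeout: float = 10.0) -> bool:
    """Hypothetical health check: True if a TCP handshake to host:port
    completes within `timeout` seconds (compare checktimeout=10), else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, `tcp_connect_check("192.168.1.123", 3307)` should return `False` while node1 is unreachable, because the connection attempt either fails immediately or times out.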

Output of ping followed by ipvsadm: the weight of 192.168.1.123 stays
at 1 even though the host is unreachable!
cmysql_mysqld_2:/etc/ha.d# ping 192.168.1.123
PING 192.168.1.123 (192.168.1.123) 56(84) bytes of data.
From 192.168.1.124 icmp_seq=1 Destination Host Unreachable
From 192.168.1.124 icmp_seq=2 Destination Host Unreachable
From 192.168.1.124 icmp_seq=3 Destination Host Unreachable

--- 192.168.1.123 ping statistics ---
5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4046ms, pipe 3
cmysql_mysqld_2:/etc/ha.d# ipvsadm -L -n
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.125:3307 wrr
  -> 192.168.1.124:3307           Local   1      0          0
  -> 192.168.1.123:3307           Route   1      0          0   
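Spotting the problem by eye is easy with two servers; a small script that lists every real server still carrying a non-zero weight can help when comparing the table against ping results. A minimal Python sketch, assuming the column layout of `ipvsadm -L -n` shown here (address, forwarding method, weight, active and inactive connections); the names `RealServer` and `parse_real_servers` are hypothetical.

```python
from typing import List, NamedTuple


class RealServer(NamedTuple):
    address: str
    forward: str
    weight: int
    active: int
    inactive: int


def parse_real_servers(listing: str) -> List[RealServer]:
    """Extract real-server rows from `ipvsadm -L -n` text (layout assumed from the output above)."""
    servers = []
    for line in listing.splitlines():
        parts = line.split()
        # Real-server rows look like: -> ADDR:PORT FORWARD WEIGHT ACTIVE INACTIVE
        # The header row has the same shape but non-numeric columns, so skip it.
        if len(parts) == 6 and parts[0] == "->" and parts[3].isdigit():
            servers.append(RealServer(parts[1], parts[2], int(parts[3]),
                                      int(parts[4]), int(parts[5])))
    return servers


SAMPLE = """\
TCP  192.168.1.125:3307 wrr
  -> 192.168.1.124:3307           Local   1      0          0
  -> 192.168.1.123:3307           Route   1      0          0
"""

for s in parse_real_servers(SAMPLE):
    if s.weight > 0:
        print(f"{s.address} ({s.forward}) is still eligible for new connections, weight {s.weight}")
```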


Extract from /var/log/messages: ldirectord restores the failed node
with weight 1 and never removes it afterwards...
Dec 16 08:19:16 localhost heartbeat[3732]: info: Received shutdown notice from 'cmysql_mysqld_1'.
Dec 16 08:19:16 localhost heartbeat[3732]: info: Resources being acquired from cmysql_mysqld_1.
Dec 16 08:19:16 localhost heartbeat[3793]: info: acquire local HA resources (standby).
Dec 16 08:19:17 localhost heartbeat[3794]: info: No local resources [/usr/lib/heartbeat/ResourceManager listkeys cmysql_mysqld_2] to acquire.
Dec 16 08:19:17 localhost heartbeat[3793]: info: local HA resource acquisition completed (standby).
Dec 16 08:19:17 localhost heartbeat[3732]: info: Standby resource acquisition done [all].
Dec 16 08:19:17 localhost heartbeat: info: Running /etc/ha.d/rc.d/status status
Dec 16 08:19:17 localhost heartbeat: info: Taking over resource group ldirectord::ldirectord.cf
Dec 16 08:19:17 localhost heartbeat: info: Acquiring resource group: cmysql_mysqld_1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master IPaddr2::192.168.1.125
Dec 16 08:19:18 localhost ldirectord[3847]: ldirectord is stopped for /etc/ha.d/conf/ldirectord.cf
Dec 16 08:19:18 localhost ldirectord[3847]: Exiting with exit_status 3: Exiting from ldirectord status
Dec 16 08:19:18 localhost heartbeat: info: Running /etc/ha.d/resource.d/ldirectord ldirectord.cf start
Dec 16 08:19:19 localhost ldirectord[3867]: Starting Linux Director v1.77.2.32 as daemon
Dec 16 08:19:19 localhost ldirectord[3869]: Added virtual server: 192.168.1.125:3307
Dec 16 08:19:19 localhost ldirectord[3869]: Added fallback server: 127.0.0.1:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server: 192.168.1.123:3307 mapped from 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 0)
Dec 16 08:19:20 localhost heartbeat: info: Running /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server: 192.168.1.124:3307 mapped from 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 0)
Dec 16 08:19:20 localhost ldirectord[3869]: Restored real server: 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:20 localhost kernel: IPVS: stopping sync thread 3393 ...
Dec 16 08:19:20 localhost kernel: IPVS: sync thread stopped!
Dec 16 08:19:20 localhost heartbeat: info: ipvs_syncbackup down
Dec 16 08:19:20 localhost ldirectord[3869]: Deleted fallback server: 127.0.0.1:3307 ( x 192.168.1.125:3307)
Dec 16 08:19:20 localhost kernel: IPVS: sync thread started.
Dec 16 08:19:21 localhost heartbeat: info: ipvs_syncmaster up
Dec 16 08:19:21 localhost heartbeat: info: ipvs_syncmaster obtained
Dec 16 08:19:21 localhost ldirectord[3869]: Restored real server: 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 1)
Dec 16 08:19:21 localhost heartbeat: info: Running /etc/ha.d/resource.d/IPaddr2 192.168.1.125 start
Dec 16 08:19:21 localhost heartbeat: info: Removing conflicting loopback lo.
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -f inet addr delete 192.168.1.125 dev lo
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -o -f inet addr show lo
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip route delete 192.168.1.125 dev lo
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -f inet addr add 192.168.1.125/24 brd 192.168.1.255 dev eth0
Dec 16 08:19:21 localhost heartbeat: info: /bin/ip link set eth0 up
Dec 16 08:19:21 localhost heartbeat: /usr/lib/heartbeat/send_arp -i 200 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.1.125 eth0 192.168.1.125 auto 192.168.1.125 ffffffffffff
Dec 16 08:19:22 localhost heartbeat: info: /usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
Dec 16 08:19:22 localhost heartbeat[3732]: info: mach_down takeover complete.
Dec 16 08:19:22 localhost heartbeat: info: mach_down takeover complete for node cmysql_mysqld_1.
Dec 16 08:19:47 localhost heartbeat[3732]: WARN: node cmysql_mysqld_1: is dead
Dec 16 08:19:47 localhost heartbeat[3732]: info: Dead node cmysql_mysqld_1 gave up resources.
Dec 16 08:19:47 localhost heartbeat[3732]: info: Link cmysql_mysqld_1:eth0 dead.
Dec 16 08:19:47 localhost ipfail[3741]: info: Status update: Node cmysql_mysqld_1 now has status dead
Dec 16 08:19:47 localhost ipfail[3741]: info: NS: We are still alive!
Dec 16 08:19:47 localhost ipfail[3741]: info: Link Status update: Link cmysql_mysqld_1/eth0 now has status dead
Dec 16 08:19:47 localhost ipfail[3741]: info: Asking other side for ping node count.
Dec 16 08:19:47 localhost ipfail[3741]: info: Checking remote count of ping nodes.


