Thanks a lot!
Unfortunately it did not solve the problem: ldirectord still does not
assign weight=0 to the failing nodes.
Might it be related to the fact that when ldirectord starts, one node
from the ldirectord.cf file is already down?
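
In case it helps anyone reproduce this, here is a minimal sketch of how
the check could be scripted instead of reading the `ipvsadm -L -n` table
by eye. It assumes the column layout shown in the output quoted below;
the names real_server_weights, still_active and SAMPLE are made up for
this illustration and are not part of ldirectord or ipvsadm.

```python
import re

# Matches real-server rows of `ipvsadm -L -n`, e.g.
#   -> 192.168.1.123:3307  Route  1  0  0
# The header row ("-> RemoteAddress:Port ...") does not match because
# the address must be a dotted quad followed by a port.
REAL_ROW = re.compile(
    r"^\s*->\s+(?P<addr>\d+\.\d+\.\d+\.\d+:\d+)\s+(?P<fwd>\S+)\s+"
    r"(?P<weight>\d+)\s+(?P<active>\d+)\s+(?P<inact>\d+)\s*$"
)


def real_server_weights(ipvsadm_output):
    """Return {address: weight} for every real-server row in the output."""
    weights = {}
    for line in ipvsadm_output.splitlines():
        # Drop a leading mail quote marker so pasted output also parses.
        line = re.sub(r"^>\s?", "", line)
        match = REAL_ROW.match(line)
        if match:
            weights[match.group("addr")] = int(match.group("weight"))
    return weights


def still_active(ipvsadm_output, down_hosts):
    """List real servers on known-down hosts that still have weight > 0."""
    return sorted(
        addr
        for addr, weight in real_server_weights(ipvsadm_output).items()
        if weight > 0 and addr.split(":")[0] in down_hosts
    )


# Sample taken from the ipvsadm output in the original report.
SAMPLE = """\
IP Virtual Server version 1.0.11 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.1.125:3307 wrr
  -> 192.168.1.124:3307 Local 1 0 0
  -> 192.168.1.123:3307 Route 1 0 0
"""

if __name__ == "__main__":
    # 192.168.1.123 is the node that no longer answers ping.
    print(still_active(SAMPLE, {"192.168.1.123"}))
```

With quiescent=yes a failed real server should show up with weight 0,
and with quiescent=no it should disappear from the table, so either way
still_active should return an empty list once ldirectord has noticed
the failure.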
Reading docs,
Samuel.
2005/12/16, techp@xxxxxxxxxxx <techp@xxxxxxxxxxx>:
> Hi,
>
> Try with the option:
> quiescent=no
>
> and read the doc to see implications!
>
> Laurent
>
>
> -----Original Message-----
> From: lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx
> [mailto:lvs-users-bounces@xxxxxxxxxxxxxxxxxxxxxx] On behalf of samuel
> Sent: Friday, 16 December 2005 11:56
> To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
> Subject: ultramonkey´s "Streamline Highly Availability and Load
> Balancing"
>
> Hi all!!!
>
> I've just started playing around with HA systems so please forgive me
> if the answers have been already provided in the list or somewhere
> else (in this case, could you please provide a link?). I've looked
> around in older threads unsuccessfully...
>
> I have followed the instructions on the www.ultramonkey.org site for
> setting up a Streamline High Availability and Load Balancing system
> with a mysql cluster as real server. I know it's better to start with
> simpler setups but I ran out of machines so I had to put the load
> balancer and the replicated real servers on the same machines.
> The config is the following: Virtual IP=192.168.1.125
> node1=192.168.1.123 node2=192.168.1.124
>
>                     Virt. IP=.125
>  -------------------      |      -------------------
>  |   ldirectord1   |      |      |   ldirectord2   |
>  |   mysqlAPI1     |-------------|   mysqlAPI2     |
>  -------------------             -------------------
>    node1 IP=.123                   node2 IP=.124
>
> The problem is that when a node fails, the surviving ldirectord does
> not remove the failed node from the routing table. The funny thing is
> that one out of every two requests succeeds (scheduler wrr) and the
> other fails with a MySQL connection error.
>
> I add as much output as I have at the bottom so please take a look and
> find the error I made (I hope not to exceed the list's limit).
>
> Thanks a lot,
> Samuel.
>
>
>
> My config files are adapted from the ultramonkey web site:
>
> ha.cfg:
> mcast eth0 225.255.255.2 695 1 0
> auto_failback off
> node cmysql_mysqld_1 #return from uname -n
> node cmysql_mysqld_2
> ping 192.168.1.254
> respawn hacluster /usr/lib/heartbeat/ipfail
>
> haresources:
> node1 \
> ldirectord::ldirectord.cf \
> LVSSyncDaemonSwap::master \
> IPaddr2::192.168.1.125
>
>
> ldirectord.cf:
> checktimeout=10
> checkinterval=2
> autoreload=no
> logfile="var/log/ldirectord.log"
> logfile="local0"
> quiescent=yes
>
> virtual=192.168.1.125:3307
>         real=192.168.1.123:3307 gate
>         real=192.168.1.124:3307 gate
>         fallback=127.0.0.1:3307 gate
>         checktype=negotiate
>         login="ser"
>         passwd="heslo"
>         database="ser"
>         request="SELECT * from version"
>         scheduler=wrr
>
> Output of ping followed by ipvsadm: the weight of 192.168.1.123
> remains 1 although the host is unreachable!
> cmysql_mysqld_2:/etc/ha.d# ping 192.168.1.123
> PING 192.168.1.123 (192.168.1.123) 56(84) bytes of data.
> From 192.168.1.124 icmp_seq=1 Destination Host Unreachable
> From 192.168.1.124 icmp_seq=2 Destination Host Unreachable
> From 192.168.1.124 icmp_seq=3 Destination Host Unreachable
>
> --- 192.168.1.123 ping statistics ---
> 5 packets transmitted, 0 received, +3 errors, 100% packet loss, time 4046ms
> , pipe 3
> cmysql_mysqld_2:/etc/ha.d# ipvsadm -L -n
> IP Virtual Server version 1.0.11 (size=4096)
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> TCP  192.168.1.125:3307 wrr
>   -> 192.168.1.124:3307           Local   1      0          0
>   -> 192.168.1.123:3307           Route   1      0          0
>
>
> Extract from /var/log/messages: it restores the failed node with
> weight=1 and does not remove it later...
> Dec 16 08:19:16 localhost heartbeat[3732]: info: Received shutdown
> notice from 'cmysql_mysqld_1'.
> Dec 16 08:19:16 localhost heartbeat[3732]: info: Resources being
> acquired from cmysql_mysqld_1.
> Dec 16 08:19:16 localhost heartbeat[3793]: info: acquire local HA
> resources (standby).
> Dec 16 08:19:17 localhost heartbeat[3794]: info: No local resources
> [/usr/lib/heartbeat/ResourceManager listkeys cmysql_mysqld_2] to acquire.
> Dec 16 08:19:17 localhost heartbeat[3793]: info: local HA resource
> acquisition completed (standby).
> Dec 16 08:19:17 localhost heartbeat[3732]: info: Standby resource
> acquisition done [all].
> Dec 16 08:19:17 localhost heartbeat: info: Running /etc/ha.d/rc.d/status
> status
> Dec 16 08:19:17 localhost heartbeat: info: Taking over resource group
> ldirectord::ldirectord.cf
> Dec 16 08:19:17 localhost heartbeat: info: Acquiring resource group:
> cmysql_mysqld_1 ldirectord::ldirectord.cf LVSSyncDaemonSwap::master
> IPaddr2::192.168.1.125
> Dec 16 08:19:18 localhost ldirectord[3847]: ldirectord is stopped for
> /etc/ha.d/conf/ldirectord.cf
> Dec 16 08:19:18 localhost ldirectord[3847]: Exiting with exit_status
> 3: Exiting from ldirectord status
> Dec 16 08:19:18 localhost heartbeat: info: Running
> /etc/ha.d/resource.d/ldirectord ldirectord.cf start
> Dec 16 08:19:19 localhost ldirectord[3867]: Starting Linux Director
> v1.77.2.32 as daemon
> Dec 16 08:19:19 localhost ldirectord[3869]: Added virtual server:
> 192.168.1.125:3307
> Dec 16 08:19:19 localhost ldirectord[3869]: Added fallback server:
> 127.0.0.1:3307 ( x 192.168.1.125:3307) (Weight set to 1)
> Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server:
> 192.168.1.123:3307 mapped from 192.168.1.123:3307
> ( x 192.168.1.125:3307) (Weight set to 0)
> Dec 16 08:19:20 localhost heartbeat: info: Running
> /etc/ha.d/resource.d/LVSSyncDaemonSwap master start
> Dec 16 08:19:20 localhost ldirectord[3869]: Quiescent real server:
> 192.168.1.124:3307 mapped from 192.168.1.124:3307
> ( x 192.168.1.125:3307) (Weight set to 0)
> Dec 16 08:19:20 localhost ldirectord[3869]: Restored real server:
> 192.168.1.123:3307 ( x 192.168.1.125:3307) (Weight set to 1)
> Dec 16 08:19:20 localhost kernel: IPVS: stopping sync thread 3393 ...
> Dec 16 08:19:20 localhost kernel: IPVS: sync thread stopped!
> Dec 16 08:19:20 localhost heartbeat: info: ipvs_syncbackup down
> Dec 16 08:19:20 localhost ldirectord[3869]: Deleted fallback server:
> 127.0.0.1:3307 ( x 192.168.1.125:3307)
> Dec 16 08:19:20 localhost kernel: IPVS: sync thread started.
> Dec 16 08:19:21 localhost heartbeat: info: ipvs_syncmaster up
> Dec 16 08:19:21 localhost heartbeat: info: ipvs_syncmaster obtained
> Dec 16 08:19:21 localhost ldirectord[3869]: Restored real server:
> 192.168.1.124:3307 ( x 192.168.1.125:3307) (Weight set to 1)
> Dec 16 08:19:21 localhost heartbeat: info: Running
> /etc/ha.d/resource.d/IPaddr2 192.168.1.125 start
> Dec 16 08:19:21 localhost heartbeat: info: Removing conflicting loopback
> lo.
> Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -f inet addr delete
> 192.168.1.125 dev lo
> Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -o -f inet addr show
> lo
> Dec 16 08:19:21 localhost heartbeat: info: /bin/ip route delete
> 192.168.1.125 dev lo
> Dec 16 08:19:21 localhost heartbeat: info: /bin/ip -f inet addr add
> 192.168.1.125/24 brd 192.168.1.255 dev eth0
> Dec 16 08:19:21 localhost heartbeat: info: /bin/ip link set eth0 up
> Dec 16 08:19:21 localhost heartbeat: /usr/lib/heartbeat/send_arp -i
> 200 -r 5 -p /var/lib/heartbeat/rsctmp/send_arp/send_arp-192.168.1.125
> eth0 192.168.1.125 auto 192.168.1.125 ffffffffffff
> Dec 16 08:19:22 localhost heartbeat: info:
> /usr/lib/heartbeat/mach_down: nice_failback: foreign resources
> acquired
> Dec 16 08:19:22 localhost heartbeat[3732]: info: mach_down takeover
> complete.
> Dec 16 08:19:22 localhost heartbeat: info: mach_down takeover complete
> for node cmysql_mysqld_1.
> Dec 16 08:19:47 localhost heartbeat[3732]: WARN: node cmysql_mysqld_1:
> is dead
> Dec 16 08:19:47 localhost heartbeat[3732]: info: Dead node
> cmysql_mysqld_1 gave up resources.
> Dec 16 08:19:47 localhost heartbeat[3732]: info: Link
> cmysql_mysqld_1:eth0 dead.
> Dec 16 08:19:47 localhost ipfail[3741]: info: Status update: Node
> cmysql_mysqld_1 now has status dead
> Dec 16 08:19:47 localhost ipfail[3741]: info: NS: We are still alive!
> Dec 16 08:19:47 localhost ipfail[3741]: info: Link Status update: Link
> cmysql_mysqld_1/eth0 now has status dead
> Dec 16 08:19:47 localhost ipfail[3741]: info: Asking other side for
> ping node count.
> Dec 16 08:19:47 localhost ipfail[3741]: info: Checking remote count of
> ping nodes.
> _______________________________________________
> LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
> Send requests to lvs-users-request@xxxxxxxxxxxxxxxxxxxxxx
> or go to http://www.in-addr.de/mailman/listinfo/lvs-users