
load balancing trouble at a high load

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: load balancing trouble at a high load
From: Hideaki Kondo <kondo.hideaki@xxxxxxxxxxxxx>
Date: Thu, 25 May 2006 15:30:42 +0900
Hello,

I am having trouble with load balancing under high load.
The trouble always occurs when I follow the process below.

#I apologize in advance for my poor English.

The LVS network environment uses NAT and is very simple
(one client, one load balancer, two real servers).
The LVS scheduling method is Round-Robin (rr).

<<Network Environment>>
[CL1]----[LB1(LVS)]---------[RS1(Apache)]
           |
            -----[RS2(Apache)]
(Example)
CL1(eth0):192.168.0.1
LB1(eth0):192.168.0.101
LB1(eth1):192.168.1.101
RS1(eth0):192.168.1.1
RS2(eth0):192.168.1.2

<<LoadBalancer(LB1) Environment>>
OS:RHEL4 Update2(kernel-2.6.9-22.EL)
LVS(ip_vs):1.2.0
ipvsadm:1.24-1

(LVS Setting)
ipvsadm -A -t 192.168.0.101:80 -s rr
ipvsadm -a -t 192.168.0.101:80 -r 192.168.1.1:80 -w 1 -m
ipvsadm -a -t 192.168.0.101:80 -r 192.168.1.2:80 -w 1 -m

<<RealServer(RS1,RS2) Environment>>
OS:RHEL3 Update6(kernel-2.4.21-37.EL)
Apache:2.0.46 (HTTP KeepAlive Off)

<<Client(CL1) Environment>>
OS:RHEL3 Update6(kernel-2.4.21-37.EL)
HTTP Load Software:wget and ApacheBench(ab)

<<Trouble Process>>
(1) Apply a high load to LB1 from CL1 using while_wget and while_ab.
   (while_wget and while_ab are simple shell scripts.
    while_wget repeats "wget -O index.html http://192.168.0.101:80/index.html"
    without sleeping.
    while_ab likewise repeats "ab -n 10000 -c 10 http://192.168.0.101:80/index.html"
    without sleeping.)
   At this point LB1 correctly balances the load across RS1 and RS2 by Round-Robin.
   
(2) After a few minutes, ActiveConns + InActiveConns appears to reach some kind
   of maximum limit (I am not sure about this).
   Then I intentionally take down the NIC (eth0) of RS2 by manually running
   "ifconfig eth0 down".
   (while_wget and while_ab freeze at this point. This is expected and is not
    a problem.)
   Next I change the weight of RS2 from 1 to 0 by manually running
   "ipvsadm -e -t 192.168.0.101:80 -r 192.168.1.2:80 -w 0 -m".
   (I tested this without ldirectord in order to check the behavior of LVS
    in detail.)

(3) Apply a new high load to LB1 from CL1 using while_wget and while_ab,
   replacing the old high load from (1).
   At this point LB1 correctly sends HTTP packets only to RS1.

(4) Then I intentionally bring the NIC (eth0) of RS2 back up by manually running
   "/etc/init.d/network restart".
   After a while, LB1 starts sending HTTP packets to both RS1 and RS2 even though
   the weight of RS2 is still 0. Moreover, LB1 sends far fewer packets to RS2
   than to RS1.
   (This strange behavior continues indefinitely, so I think the cause is not
    necessarily retransmission at the TCP layer.
    In fact, the strange behavior stops when I stop the high load from CL1.)

(5) After observing this strange behavior for a while, I change the weight of RS2
   from 0 back to 1 by manually running
   "ipvsadm -e -t 192.168.0.101:80 -r 192.168.1.2:80 -w 1 -m".
   But the strange behavior still continues indefinitely: LB1 sends far fewer
   packets to RS2 than to RS1 despite the Round-Robin scheduling.

(6) Then I stop all high load (while_wget and while_ab) from CL1 and wait a few
   minutes until ActiveConns + InActiveConns drops close to 0.
   When I then start a new high load from CL1 with while_wget and while_ab,
   LB1 again balances the load correctly and evenly across RS1 and RS2,
   the same as in (1).
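
For reference, the two load scripts used in the steps above can be sketched roughly as follows. This is a reconstruction from the description in (1), not the exact scripts. The helper name repeat_cmd, the URL variable, and the optional iteration-count argument are assumptions added only so that the loop can be bounded when trying it out.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts): run a command
# repeatedly with no sleep in between.
# $1 = number of iterations, or 0 to loop until interrupted.
repeat_cmd() {
    count=$1
    shift
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done
}

# Virtual service address on LB1, as given in the environment section.
URL="http://192.168.0.101:80/index.html"

# while_wget: fetch the page over and over (endless when called without argument).
while_wget() { repeat_cmd "${1:-0}" wget -O index.html "$URL"; }

# while_ab: run ApacheBench over and over with 10000 requests, 10 concurrent.
while_ab() { repeat_cmd "${1:-0}" ab -n 10000 -c 10 "$URL"; }
```

Calling while_wget and while_ab without an argument in two terminals on CL1 gives the endless load described in (1). An argument such as "while_wget 100" bounds the loop.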

Is this trouble related to some timers, u_threshold, dest->flag
(IP_VS_DEST_F_OVERLOAD/IP_VS_DEST_F_AVAILABLE), etc. in LVS?
Is this strange behavior correct according to the LVS specification?
(I do not understand the LVS specification in detail.)
Is there any way to work around this trouble?
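
To make the behavior easier to report, the weight and ActiveConn + InActConn total of each real server can be logged on LB1 while the steps are running. The following is only a minimal sketch. It assumes the usual column layout of "ipvsadm -L -n" (RemoteAddress:Port, Forward, Weight, ActiveConn, InActConn), and the function name conn_totals is hypothetical.

```shell
#!/bin/sh
# Reads "ipvsadm -L -n" output on stdin and prints one line per real server:
#   <address:port> <weight> <ActiveConn + InActConn>
# Assumption: real-server rows start with "->" and carry the columns
# RemoteAddress:Port, Forward, Weight, ActiveConn, InActConn in that order.
conn_totals() {
    awk '$1 == "->" && $2 != "RemoteAddress:Port" { print $2, $4, $5 + $6 }'
}
```

On LB1 it would be used as "ipvsadm -L -n | conn_totals", for example once per second in a loop during steps (2) to (6), to see whether the RS2 total keeps changing while its weight is 0.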

I am sorry for asking so many questions.
If you have any information or hints about this trouble,
would you please share them with me?

Thanks in advance.
Best regards,

Hideaki Kondo



