I have a load-balancing problem under high load.
The problem always occurs through the following process.
# I apologize in advance for my poor English.
The network environment for LVS is NAT and very simple.
(One client, one load balancer, two real servers.)
The LVS scheduling method is Round-Robin (rr).
ipvsadm -A -t 192.168.0.101:80 -s rr
ipvsadm -a -t 192.168.0.101:80 -r 192.168.1.1:80 -w 1 -m
ipvsadm -a -t 192.168.0.101:80 -r 192.168.1.2:80 -w 1 -m
Apache:2.0.46 (HTTP KeepAlive Off)
HTTP Load Software:wget and ApacheBench(ab)
(1) Put a high load on LB1 with while_wget & while_ab from CL1.
(while_wget and while_ab are simple shell scripts.
while_wget repeats "wget -O index.html http://192.168.0.101:80/index.html"
while_ab repeats "ab -n 10000 -c 10 http://192.168.0.101:80/index.html")
LB1 correctly load-balances to RS1 and RS2 by Round-Robin at this time.
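The two load generators from step (1) might look roughly like the sketch below. The original scripts are not shown, so the loop structure, the helper name "repeat" and the optional iteration count are assumptions for illustration; only the wget and ab command lines come from the description above.

```shell
#!/bin/sh
# Hypothetical reconstruction of while_wget / while_ab.
URL="http://192.168.0.101:80/index.html"

# repeat COUNT CMD...: run CMD over and over.
# COUNT=0 means "until interrupted"; any other value limits the runs
# (an added convenience for dry runs, not part of the original scripts).
repeat() {
    count="$1"; shift
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done
}

# One HTTP GET per iteration, saved to index.html.
while_wget() { repeat "${1:-0}" wget -q -O index.html "$URL"; }

# 10000 requests at concurrency 10 per iteration.
while_ab() { repeat "${1:-0}" ab -n 10000 -c 10 "$URL"; }
```

Running "while_wget &" and "while_ab &" from CL1 would start both loops in the background; "while_wget 5" would stop after five fetches.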
(2) After a few minutes, ActiveConns + InActiveConns seems to reach
some maximum limit(?).
Then I intentionally take down the NIC (eth0) of RS2 by manually executing
"ifconfig eth0 down".
(while_wget & while_ab freeze at this time.
This state is no problem.)
Then I change the weight from 1 to 0 by manually executing "ipvsadm -e -t
192.168.0.101:80 -r 192.168.1.2:80 -w 0 -m". (I tested this process without
ldirectord in order to check the behavior of LVS in detail.)
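The ActiveConns + InActiveConns total mentioned in step (2) can be watched by summing the two counter columns of "ipvsadm -L -n". The following is a minimal sketch; "sum_conns" is a hypothetical helper name, and it assumes the usual listing layout in which each real-server line starts with "->" followed by address, forwarding method, weight, ActiveConn and InActConn.

```shell
#!/bin/sh
# sum_conns: read "ipvsadm -L -n" output on stdin and print, for each
# real server, its weight and ActiveConn + InActConn, then a grand total.
# Assumed columns on "->" lines:
#   -> addr:port Forward Weight ActiveConn InActConn
sum_conns() {
    awk '
        # Skip the column-header line, which also starts with "->".
        $1 == "->" && $2 != "RemoteAddress:Port" {
            conns = $5 + $6
            total += conns
            printf "%s weight=%s conns=%d\n", $2, $4, conns
        }
        END { printf "total conns=%d\n", total }
    '
}
```

On LB1 this would be used as "ipvsadm -L -n | sum_conns", for example once per second in a loop while the load is running.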
(3) Put a new high load on LB1 with while_wget & while_ab from CL1
in place of the old high load from (1).
LB1 correctly sends HTTP packets only to RS1 at this time.
(4) Then I intentionally bring the NIC (eth0) of RS2 back up manually
(the reverse of the "ifconfig eth0 down" in (2)).
After a while, LB1 starts sending HTTP packets to both RS1 and RS2, even
though the weight of RS2 is still 0. Moreover, LB1 sends far fewer packets
to RS2 than to RS1.
(This strange behavior continues indefinitely, so I don't think the cause
is necessarily a retransmission process in the TCP layer.
In fact, the strange behavior stops when I stop the high load from CL1.)
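One way to observe the situation in step (4) more closely is to count the entries in the LVS connection table that still point at RS2, grouped by state. This is only a rough sketch: "count_states" is a hypothetical helper name, and it assumes that "ipvsadm -L -n -c" prints one entry per line with the columns pro, expire, state, source, virtual, destination.

```shell
#!/bin/sh
# count_states DEST: read "ipvsadm -L -n -c" output on stdin and count
# the connection entries whose destination column equals DEST,
# grouped by connection state.
# Assumed columns: pro expire state source virtual destination
count_states() {
    awk -v dest="$1" '
        $6 == dest { n[$3]++ }
        END { for (s in n) printf "%s %d\n", s, n[s] }
    ' | sort
}
```

On LB1 this would be used as "ipvsadm -L -n -c | count_states 192.168.1.2:80" while RS2 has weight 0.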
(5) After observing this strange behavior for a while, I change the weight from
0 to 1 by manually executing "ipvsadm -e -t 192.168.0.101:80 -r 192.168.1.2:80 -w 1".
But the strange behavior still continues indefinitely: LB1 sends far fewer
packets to RS2 than to RS1 in spite of Round-Robin.
(6) Then I stop all high load (while_wget & while_ab) from CL1 and wait a few
minutes until ActiveConns + InActiveConns drops close to 0.
When I then start a new high load from CL1 with while_wget & while_ab,
LB1 again load-balances correctly and evenly to RS1 and RS2, the same as in (1).
Is this problem related to some timers, u_threshold, dest->flag
(IP_VS_DEST_F_OVERLOAD/IP_VS_DEST_F_AVAILABLE), etc. in LVS?
Is this strange behavior correct according to the LVS specification?
(I don't understand the LVS specification in detail.)
Is there any way to cope with this problem?
I'm sorry for asking so many questions.
If you have any information or hints about this problem,
would you please share them with me?
Thanks in advance.