
Re: load balancing trouble at a high load

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: load balancing trouble at a high load
From: Hideaki Kondo <kondo.hideaki@xxxxxxxxxxxxx>
Date: Thu, 25 May 2006 19:22:36 +0900
Hello,

Here is some supplementary information about my report,
since my earlier message lacked detail.

> <<Trouble Process>>
> (1) Put a high load on LB1 from CL1 using while_wget and while_ab.
>   (while_wget and while_ab are simple shell scripts.
>    while_wget repeats "wget -O index.html http://192.168.0.101:80/index.html"
>    without sleep.
>    while_ab likewise repeats
>    "ab -n 10000 -c 10 http://192.168.0.101:80/index.html"
>    without sleep.)
>    LB1 correctly load-balances to RS1 and RS2 by round robin at this point.
>
> (2) After a few minutes, ActiveConn + InActConn seems to reach a maximum
>    limit(?).

It seems that the trouble always occurs when ActiveConn + InActConn is
close to a total of 28231, as follows.

------------------------------------------------------------------------
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.101:http rr
  -> rs02:http                    Masq    1      0          1         
  -> rs01:http                    Masq    1      1          28229
------------------------------------------------------------------------
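
The total quoted above can be checked mechanically. The following is a minimal sketch, assuming the listing has the same column layout as the output shown above (ActiveConn and InActConn as the last two fields of each "->" line); the function name sum_ipvs_conns is made up for illustration.

```shell
#!/bin/sh
# Sum ActiveConn + InActConn over all real-server lines of an
# "ipvsadm -L" style listing read from stdin.
# Assumption: real-server lines start with "->" and end with the two
# numeric counters. The "-> RemoteAddress:Port ..." header line is
# skipped because its last two fields are not numeric.
sum_ipvs_conns() {
    awk '
        $1 == "->" && $(NF-1) ~ /^[0-9]+$/ && $NF ~ /^[0-9]+$/ {
            total += $(NF-1) + $NF
        }
        END { print total + 0 }
    '
}

# Usage (on the director, as root):
#   ipvsadm -Ln | sum_ipvs_conns
```

Fed the listing above, this adds 0 + 1 for rs02 and 1 + 28229 for rs01, which gives the 28231 mentioned in the text.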

>    Then take down the NIC (eth0) of RS2 intentionally by manually running
>    "ifconfig eth0 down".
>   (while_wget and while_ab freeze at this point. This is expected.)
>    Then change the weight from 1 to 0 by manually running
>    "ipvsadm -e -t 192.168.0.101:80 -r 192.168.1.2:80 -w 0 -m".
>   (I tested this step without ldirectord in order to check the
>    behavior of LVS in detail.)
> 
> (3) Put a new high load on LB1 from CL1 using while_wget and while_ab,
>    replacing the old high load from (1).
>    LB1 correctly sends HTTP packets only to RS1 at this point.
> 
> (4) Then bring the NIC (eth0) of RS2 back up intentionally by manually running
>    "/etc/init.d/network restart".
>    After a while, LB1 starts sending HTTP packets to both RS1 and RS2 even
>    though the weight of RS2 is still 0. Moreover, LB1 sends far fewer
>    packets to RS2 than to RS1.
>   (This strange behavior continues indefinitely, so I think the cause
>    isn't necessarily the TCP-layer retransmit process.
>    In fact, the strange behavior stops when I stop the high load from CL1.)

I enabled "IP virtual server debugging" in "make menuconfig", built
kernel-2.6.9-22.EL, set "net.ipv4.vs.debug_level=15" in sysctl.conf,
and added "kern.*  /var/log/kernel.log" to syslog.conf.
As far as I can tell from the LVS messages in kernel.log, LB1 does not
seem to be load-balancing to RS2 at this stage (4). ???
So I can't rule out that the strange behavior is related to
the TCP-layer retransmit process or something similar.

Checking with "ipvsadm -Lc", I see many entries in the TIME_WAIT state,
and the InActConn number seems to reflect them.
By the way, according to the ip_vs source code (ip_vs_proto_tcp.c),
the timeout for IP_VS_TCP_S_TIME_WAIT is 2*60*HZ (120 seconds).
When I changed it from 2*60*HZ to 10*HZ or so (much smaller
than 2*60*HZ), the strange behavior seemed to improve.
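
To see how many connection entries sit in each state, the "ipvsadm -Lc" output can be tallied. This is only a sketch: it assumes the usual column layout "pro expire state source virtual destination", with the state in the third field, and the function name count_ipvs_states is made up for illustration.

```shell
#!/bin/sh
# Count IPVS connection entries per state from "ipvsadm -Lc" style
# output read from stdin.
# Assumption: entry lines start with the protocol (TCP or UDP) and
# carry the state name in the third whitespace-separated field.
# The two header lines do not start with TCP/UDP and are ignored.
count_ipvs_states() {
    awk '
        $1 == "TCP" || $1 == "UDP" { count[$3]++ }
        END { for (s in count) print s, count[s] }
    ' | sort
}

# Usage (on the director, as root):
#   ipvsadm -Lcn | count_ipvs_states
```

If the TIME_WAIT count tracks InActConn closely, that would support the observation above that InActConn mostly reflects TIME_WAIT entries.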

Is IP_VS_TCP_S_TIME_WAIT related to the cause of the trouble?
I think some timers in LVS are related to the behavior ...??

> 
> (5) After observing this strange behavior for a while, change the weight
>    from 0 to 1 by manually running
>    "ipvsadm -e -t 192.168.0.101:80 -r 192.168.1.2:80 -w 1 -m".
>    But the strange behavior still continues indefinitely: LB1 sends far
>    fewer packets to RS2 than to RS1 in spite of round robin.

To be precise, there are two cases.
In one case, LB1 sends far fewer packets to RS2 than to RS1;
in the other, LB1 sends no packets to RS2 at all.
The same two cases also occur in (4).

> 
> (6) Then stop all high load (while_wget and while_ab) from CL1 and wait a
>    few minutes until ActiveConn + InActConn drops close to 0.
>    Then start a new high load from CL1 with while_wget and while_ab;
>    LB1 now load-balances correctly and evenly to RS1 and RS2, as in (1).
> 
> Is this trouble related to some timers, u_threshold, dest->flag
> (IP_VS_DEST_F_OVERLOAD/IP_VS_DEST_F_AVAILABLE), etc. in LVS?
> Is this strange behavior correct according to the LVS specification?
> (I don't understand the LVS specification in detail.)
> Is there any way to cope with this trouble?
>
> I'm sorry for asking so many questions.
> If you have any information or hints about the trouble,
> would you please share them with me?
> 
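
For anyone trying to reproduce this, the load loops from step (1) can be approximated as follows. The original while_wget and while_ab scripts are not shown in this thread, so this is a guess at their shape; the helper name repeat_cmd and its COUNT argument are made up for illustration.

```shell
#!/bin/sh
# repeat_cmd COUNT CMD [ARGS...]
# Runs CMD back to back with no sleep in between.
# COUNT=0 means loop forever, which matches how while_wget and
# while_ab are described in step (1); a positive COUNT bounds the
# loop so it can be tried out safely.
repeat_cmd() {
    count=$1
    shift
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done
}

# Usage from CL1 (runs until interrupted):
#   repeat_cmd 0 wget -O index.html http://192.168.0.101:80/index.html
#   repeat_cmd 0 ab -n 10000 -c 10 http://192.168.0.101:80/index.html
```

Note that with a single client host every connection arrives from the same source IP, which may matter when interpreting the connection counts.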

Thanks in advance.
Best regards,

--
Hideaki Kondo



