Re: [lvs-users] ldirectord unbalance load distribution

To: " users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [lvs-users] ldirectord unbalance load distribution
From: Simon Horman <horms@xxxxxxxxxxxx>
Date: Sun, 27 Sep 2009 10:13:44 +1000
On Sun, Sep 27, 2009 at 02:56:53AM +0800, Liu Yan wrote:
> Hi list,
> I have a two-server heartbeat plus ldirectord setup that load balances my
> nginx services. The details of my setup are:
> -- both servers act as heartbeat nodes and as real servers (nginx)
> -- 2 IPs are bound to eth0 and eth0:0 (these 2 IPs need to be highly
> available)
> -- lo:0 and lo:1 are used to store the VIPs when the node is in standby
> mode
> -- ldirectord uses direct routing
> I added the following section to /etc/ha.d/haresources so that both IPs
> can be taken over by the standby machine when I shut down the active one,
> and this basically works:
> s1 \
> \
> LVSSyncDaemonSwap::master \
> IPaddr2::VIP1/26/eth0/x.x.x.x \
> IPaddr2::VIP2/28/eth0:0/x.x.x.x
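
Heartbeat v1 reads an haresources entry as a node name followed by whitespace-separated resources, each written agent::arg1::arg2, with trailing backslashes continuing the entry onto the next line. The following is a simplified Python sketch, not heartbeat's own parser, that shows how the entry above breaks down. It assumes IPaddr2's argument uses the usual ip/netmask/interface/broadcast order; the Resource, parse_haresources and ipaddr2_fields names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Resource:
    """One resource from a heartbeat v1 haresources entry (hypothetical model)."""
    agent: str          # e.g. "IPaddr2" or "LVSSyncDaemonSwap"
    params: List[str]   # "::"-separated arguments after the agent name


def parse_haresources(text: str) -> Tuple[str, List[Resource]]:
    """Join backslash-continued lines, then split into (node, resources)."""
    joined = " ".join(
        line.rstrip().rstrip("\\").strip() for line in text.splitlines()
    )
    node, *rest = joined.split()
    resources = []
    for token in rest:
        # "::" separates arguments; a single ":" (as in eth0:0) is left alone.
        agent, *params = token.split("::")
        resources.append(Resource(agent, params))
    return node, resources


def ipaddr2_fields(param: str) -> Dict[str, str]:
    """Split an IPaddr2 argument, assuming ip/netmask/interface/broadcast order."""
    keys = ["ip", "cidr_netmask", "nic", "broadcast"]
    return dict(zip(keys, param.split("/")))


if __name__ == "__main__":
    sample = "\n".join([
        "s1 \\",
        " \\",
        "LVSSyncDaemonSwap::master \\",
        "IPaddr2::VIP1/26/eth0/x.x.x.x \\",
        "IPaddr2::VIP2/28/eth0:0/x.x.x.x",
    ])
    node, resources = parse_haresources(sample)
    print(node)
    for r in resources:
        print(r.agent, r.params)
```

With this reading, the node s1 owns three resources: the sync daemon in master mode, VIP1 (/26) on eth0, and VIP2 (/28) on the eth0:0 alias.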
> When I use ipvsadm, I see something like this:
> Prot LocalAddress:Port Scheduler Flags
>   -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
> TCP  VIP1:http rr
>   -> RIP1-2:http                  Route   1      174        0
>   -> RIP1-1:http                  Local   1      2          13
> TCP  VIP2:http rr
>   -> RIP2-2:http                  Route   1      712        0
>   -> RIP2-1:http                  Local   1      20         45
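
To put a number on the imbalance, here is a small Python sketch that reads ipvsadm -L style output like the table above and computes each real server's share of its virtual service's ActiveConn. The active_share name is hypothetical, and the sketch only understands the six-column real-server lines shown here.

```python
from typing import Dict, Optional


def active_share(table: str) -> Dict[str, Dict[str, float]]:
    """Fraction of each virtual service's ActiveConn held by each real server."""
    counts: Dict[str, Dict[str, int]] = {}
    current: Optional[str] = None
    for line in table.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in ("TCP", "UDP"):
            current = fields[1]              # virtual service, e.g. "VIP1:http"
            counts[current] = {}
        elif (len(fields) == 6 and fields[0] == "->"
              and fields[4].isdigit() and current is not None):
            # -> RemoteAddress:Port Forward Weight ActiveConn InActConn
            counts[current][fields[1]] = int(fields[4])
    shares: Dict[str, Dict[str, float]] = {}
    for vs, per_rs in counts.items():
        total = sum(per_rs.values())
        shares[vs] = {rs: (n / total if total else 0.0) for rs, n in per_rs.items()}
    return shares
```

On the table above, the Route servers hold about 98.9% (174 of 176) of VIP1's active connections and about 97.3% (712 of 732) of VIP2's.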
> My questions are:
> 1) Is it correct that RIP1-1 and RIP2-1 are shown as "Local"? RIP1-1 and
> RIP2-1 are the IPs of the active node.


> 2) Why is the remote (Route) server taking many more requests than the
> local one?

I suspect it's an anomaly of the way that round-robin (rr) deals with
the pattern of connections you are receiving. Can you try least-connection
(lc) instead?
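
A toy Python simulation may help illustrate the difference. rr hands new connections out in strict rotation whatever each server's current load, so if connections on one server stay open longer, its ActiveConn builds up even though both servers receive the same number of connections. lc sends each new connection to the server with the fewest open ones. The two servers "A" and "B" and their hold times are hypothetical; the kernel's lc scheduler also counts inactive connections at a lower weight, which this sketch ignores.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical workload: how many time steps each real server keeps a
# connection open. "A" stands in for a server whose connections linger.
HOLD: Dict[str, int] = {"A": 10, "B": 2}
SERVERS: List[str] = sorted(HOLD)


def pick_rr(active: Dict[str, int], t: int) -> str:
    """Round-robin: rotate through the servers, ignoring their load."""
    return SERVERS[t % len(SERVERS)]


def pick_lc(active: Dict[str, int], t: int) -> str:
    """Least-connection: choose the server with the fewest open connections."""
    return min(SERVERS, key=lambda s: active[s])


def simulate(pick: Callable[[Dict[str, int], int], str],
             steps: int = 100) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Dispatch one new connection per step.

    Returns (connections dispatched per server, connections still open at the end).
    """
    ends: Dict[str, List[int]] = {s: [] for s in SERVERS}
    dispatched = {s: 0 for s in SERVERS}
    for t in range(steps):
        for s in SERVERS:  # forget connections that have already closed
            ends[s] = [e for e in ends[s] if e > t]
        active = {s: len(ends[s]) for s in SERVERS}
        chosen = pick(active, t)
        dispatched[chosen] += 1
        ends[chosen].append(t + HOLD[chosen])
    still_open = {s: sum(1 for e in ends[s] if e > steps) for s in SERVERS}
    return dispatched, still_open


if __name__ == "__main__":
    print("rr:", simulate(pick_rr))
    print("lc:", simulate(pick_lc))
```

With this workload rr splits the 100 connections 50/50 yet ends with 4 open connections on A and 1 on B, while lc sends most new connections to B. In ldirectord the scheduler is chosen per virtual service with scheduler=lc in ldirectord.cf.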

> 3) Why does the local server have a large number of InActConn?
> I tried shutting down nginx on the remote server and saw RIP1-2 and
> RIP2-2 removed from the above table correctly. My service is still
> up, which means RIP1-1 and RIP2-1 are serving properly.

This could indicate a timeout issue, or it could just mean that RIP1-1
and RIP2-1 handled some connections recently. It is probably related to
question 2), so could you try lc and see if that improves the situation?
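
As I understand it, ipvsadm counts a TCP connection under ActiveConn while IPVS has it in the ESTABLISHED state, and under InActConn while it sits in any other state (FIN_WAIT, TIME_WAIT and so on) until its IPVS timeout expires. ipvsadm -L -c -n lists the individual entries. The Python sketch below tallies such a listing the same way. The tally name and the sample entries are hypothetical, and it handles TCP lines only.

```python
from collections import Counter
from typing import Dict, Tuple


def tally(conn_lines: str) -> Dict[str, Tuple[int, int]]:
    """Return {destination: (active, inactive)} from `ipvsadm -L -c -n`-style lines."""
    active: Counter = Counter()
    inactive: Counter = Counter()
    for line in conn_lines.splitlines():
        fields = line.split()
        # Expect: pro expire state source virtual destination
        if len(fields) != 6 or fields[0] != "TCP":
            continue  # skip headers, blank lines and non-TCP entries
        state, dest = fields[2], fields[5]
        if state == "ESTABLISHED":
            active[dest] += 1
        else:
            inactive[dest] += 1
    return {d: (active[d], inactive[d]) for d in sorted(set(active) | set(inactive))}


if __name__ == "__main__":
    sample = """IPVS connection entries
pro expire state       source             virtual            destination
TCP 14:59  ESTABLISHED 10.0.0.5:40001     VIP1:80            RIP1-1:80
TCP 01:30  TIME_WAIT   10.0.0.6:40002     VIP1:80            RIP1-1:80
TCP 00:45  FIN_WAIT    10.0.0.7:40003     VIP1:80            RIP1-1:80
TCP 14:58  ESTABLISHED 10.0.0.8:40004     VIP1:80            RIP1-2:80"""
    print(tally(sample))
```

If the InActConn entries on the local server are mostly in FIN_WAIT or TIME_WAIT, they are simply closed connections waiting to expire. ipvsadm -L --timeout shows the current tcp/tcpfin/udp timeouts, and ipvsadm --set tcp tcpfin udp changes them.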

