Re: having trouble with load balancing

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: having trouble with load balancing
From: Justin Georgeson <jgeorgeson@xxxxxxxxxxxxxxx>
Date: Tue, 12 Nov 2002 14:03:44 -0600

Roberto Nibali wrote:

Hello,

Could you do me a small favour? Do not top post, please. I do get a lot
of emails and it is just a lot more convenient the other way, because
then I see immediately what issue we've solved and what is remaining.

Sorry about that :)


Justin Georgeson wrote:

> If I understand all the VIP/RIP/CIP, then yes, that is the VIP. Those


And the DGW of the RS point to the director, right?

DGW is what? I guess I need to mention something explicitly at this point. I don't have anything downloaded directly from linuxvirtualserver.org. The Red Hat kernel comes with ipvs modules and source for ipvsadm. I built/installed ipvsadm from the Red Hat kernel source package. I put the rules in /etc/sysconfig/ipvsadm and use the /etc/init.d/ipvsadm script to start/stop/restart. So it's possible that there are configuration files that the lvs.org stuff uses and RH does not.


> two telnet commands should both work, if you sit and try it over and
> over, it will succeed every other time. I'm not blocking it with IP


Ok, I just tried it and it indeed is just like you described it.

> tables. tcpdump on ~.18 shows no packets when coming in this way. The


Could you tcpdump on the outgoing (towards the private net) interface to
see if the packets are crafted correctly for both RSs? Please send it to
the list (should not be too long if we dump only for 2 connection
requests).


I ran this command and did two telnets; no packets showed up:

tcpdump -i eth1 dst host 192.168.10.18

In contrast, I did the same thing with 192.168.10.17, and saw this

13:53:09.656311 <cip>.33136 > 192.168.10.17.5222: S 559431180:559431180(0) win 5840 <mss 1460,sackOK,timestamp 27710578 0,nop,wscale 0> (DF) [tos 0x10]
13:53:09.684493 <cip>.33136 > 192.168.10.17.5222: . ack 577373375 win 5840 <nop,nop,timestamp 27710591 48127857> (DF) [tos 0x10]
13:53:12.715573 <cip>.33136 > 192.168.10.17.5222: P 0:5(5) ack 1 win 5840 <nop,nop,timestamp 27712145 48127857> (DF) [tos 0x10]
13:53:12.746126 <cip>.33136 > 192.168.10.17.5222: . ack 41 win 5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]
13:53:12.748959 <cip>.33136 > 192.168.10.17.5222: F 5:5(0) ack 42 win 5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]

I replaced my client IP address with <cip> in the above output listing. So for grins, I ran `tcpdump dst port 5222` to see whether incoming packets reach the director regardless of which RS is up in the round-robin. The packets arrive just fine; they just aren't being forwarded for some reason.


> director has a dozen or so aliased interfaces (eth0:1-n). I bind those
> aliased interfaces to other public IPs and use LVS to NAT to particular


That seems reasonable. One thing I wonder is what the routing table looks
like on the director and the RS. Could you provide us with those and
maybe the link configuration?

ip rule show
ip route show table main
ip addr show dev eth0

Due to paranoia, I'm cutting out most IPs, but not in a manner that loses track of which ones are distinct from the others, so hopefully this will still give you what you wanted.

[root@tetsuo root]# ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup 253
[root@tetsuo root]# ip route show table main
<public network/mask> dev eth0  scope link
192.168.10.0/24 dev eth1  scope link
127.0.0.0/8 dev lo  scope link
default via <default public gateway> dev eth0
[root@tetsuo root]# ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:02:b3:b9:f8:c6 brd ff:ff:ff:ff:ff:ff
    inet 66.150.129.229/27 brd <public bcast ip> scope global eth0
    inet <public ip1/mask> brd <public bcast ip> scope global secondary eth0:1
    inet <public ip2/mask> brd <public bcast ip> scope global secondary eth0:2
    inet <public ip3/mask> brd <public bcast ip> scope global secondary eth0:4
    inet <public ip4/mask> brd <public bcast ip> scope global secondary eth0:5
    inet <public ip5/mask> brd <public bcast ip> scope global secondary eth0:3
    inet <public ip6/mask> brd <public bcast ip> scope global secondary eth0:6
    inet <public ip7/mask> brd <public bcast ip> scope global secondary eth0:7
    inet <public ip8/mask> brd <public bcast ip> scope global secondary eth0:8
    inet <public ip9/mask> brd <public bcast ip> scope global secondary eth0:9
    inet <public ip10/mask> brd <public bcast ip> scope global secondary eth0:10
    inet <public ip11/mask> brd <public bcast ip> scope global secondary eth0:11
    inet <public ip12/mask> brd <public bcast ip> scope global secondary eth0:12
    inet <public ip13/mask> brd <public bcast ip> scope global secondary eth0:13


> machines on the private lan. So I can actually telnet directly to port
> 5222 on the IPs I have aliased for the two boxes in question. In this


You mean you have other public IPs (not the VIP) which will get port
forwarded with a 1:1 NAT to the assigned RS?

Yes. Each of eth0:1-13 maps to a unique private IP. Here's where it gets fun. I don't actually have enough machines to fill each one uniquely, so plenty of the private machines also have aliased interfaces. The two real servers in question (192.168.10.17 and 18) are separate interfaces (not aliases, but eth0 and eth1) on the same machine. Before you ask why I would want to do this: I'm working on a proof of concept and don't have the resources to spread it all out properly, and this is actually an intended production scenario for the customer.


> particular case, I need to have one FQDN/IP to load balance between a
> couple of them.


It is not a DNS problem. And the FQDN is only for the VIP. You do not
want people to connect to the RS directly anyway, so keep them on an
IP basis.

> After one connection
>
> TCP  66.150.129.229:5222 wrr
>   -> 192.162.10.18:5222           Masq    1      0          0
>   -> 192.168.10.17:5222           Masq    1      0          1
>
> After 2nd attempt (says Trying 66.150.129.229... then nothing, so I
> +)
>
> TCP  66.150.129.229:5222 wrr
>   -> 192.162.10.18:5222           Masq    1      0          1
>   -> 192.168.10.17:5222           Masq    1      0          1


Verified with telnet from here. This indicates to me that the second RS
is not set up the same way as the first one (routing issue, firewall
rules on the RS, VIP not correctly set or missing). Normally the above
indicates that the daemon somehow died in a select loop but didn't close
the listener. Since you mentioned that you can successfully connect to
both RS on port 5222 and do get a telnet prompt, we have to assume that
both daemons are working correctly.

I haven't had to configure any of the machines behind the LAN with anything related specifically to LVS. So I'm a little lost here. And like I mentioned, both the RIPs go to the same machine (different NICs). I have checked with tcpdump on the director and the RS that the packets aren't going to 17 when they should be going to 18. I see outbound packets for 17 on every successful attempt, and no packets on unsuccessful attempts. And I do have two server processes running, each bound explicitly to a single RIP.
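
One thing that can be checked mechanically from the ipvsadm listing quoted above: it shows one real server as 192.162.10.18, while the director's eth1 route covers 192.168.10.0/24. The following is only a minimal sketch. It assumes the listing is pasted into a shell variable and that a plain prefix match is an adequate stand-in for a real subnet test on this /24. off_prefix is a hypothetical helper name, not part of ipvsadm.

```shell
# ipvsadm listing as quoted earlier in this thread (after the 2nd attempt)
listing='TCP  66.150.129.229:5222 wrr
  -> 192.162.10.18:5222           Masq    1      0          1
  -> 192.168.10.17:5222           Masq    1      0          1'

# Print each real-server address that does not begin with the given prefix.
# off_prefix is a hypothetical helper, not an ipvsadm feature.
off_prefix() {
  awk -v p="$1" '$1 == "->" { split($2, a, ":"); if (index(a[1], p) != 1) print a[1] }'
}

printf '%s\n' "$listing" | off_prefix "192.168.10."
```

For the listing in this thread it prints 192.162.10.18. If that address is what is actually configured (and not just a typo in the mail), the routing table shown earlier would send those packets via the default gateway on eth0 rather than out eth1, which would be consistent with the empty tcpdump on eth1.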


> All of my ipvsadm rules are LVS-NAT, but they probably don't need to be.


What rules? Do you mean setup?

I mean the '-A -t ip:port -s wlc' and '-a -t ip:port -r ip:port -m' lines in my /etc/sysconfig/ipvsadm file. The init script uses the ipvsadm restore and save (just like ipchains and iptables have) to load rules from this file.
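
For illustration, here is a rough sketch of what such rule lines could look like for the load-balanced service discussed in this thread. The VIP, port and RIPs are the ones quoted above. The wrr scheduler and weight 1 are taken from the ipvsadm listing (the text mentions wlc), and gen_rules is a hypothetical helper name. Generating the lines from a list means each RIP is typed only once.

```shell
# Addresses are the ones quoted in this thread; scheduler (wrr) and
# weight (1) follow the quoted listing and are otherwise assumptions.
vip="66.150.129.229:5222"

# gen_rules is a hypothetical helper: one -A line for the virtual service,
# then one -a line per real server, each with -m (masquerading, i.e. LVS-NAT).
gen_rules() {
  echo "-A -t $vip -s wrr"
  for rip in "$@"; do
    echo "-a -t $vip -r $rip:5222 -m -w 1"
  done
}

gen_rules 192.168.10.17 192.168.10.18
```

The output could be redirected into a rules file or piped to ipvsadm-restore, which reads rules from standard input. Whether the Red Hat init script expects exactly this layout is an assumption.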


> I'm fully prepared to accept that I'm using lvs all wrong, but so far
> it's been working for me. :) If there is a better configuration for me


What do you mean by '... so far it's been working for me ...'? Did it
work up to a certain point with this setup and layout and it stopped
working afterwards?

Everything except this load balancing has been working, and continues to work. Up until now, I've just been using ipvsadm to NAT all the aliased interfaces 1:1 (which is where the "using lvs all wrong" thing came in, since I could probably do all the 1:1 forwarding pretty easily with iptables). I only just now started to try load balancing and am having this problem. :)


> to use, I'm certainly open to trying it.


Since for me everything indicates that your second RS is not configured
like the first one, we do not need to change the LVS configuration.

Regards,
Roberto Nibali, ratz


Ugh.

--
Justin Georgeson
UnBound Technologies, Inc.
http://www.unboundtech.com
Main   713.329.9330
Fax    713.460.4051
Mobile 512.789.1962

5295 Hollister Road
Houston, TX 77040
Real Applications using Real Wireless Intelligence(tm)


