Re: having trouble with load balancing

To:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject:	Re: having trouble with load balancing
From:	Justin Georgeson <jgeorgeson@xxxxxxxxxxxxxxx>
Date:	Tue, 12 Nov 2002 14:03:44 -0600


Roberto Nibali wrote:

Hello,

Could you do me a small favour? Do not top post, please. I do get a lot
of emails and it is just a lot more convenient the other way, because
then I see immediately what issue we've solved and what is remaining.


Sorry about that :)


Justin Georgeson wrote:

> If I understand all the VIP/RIP/CIP, than yes, that is the VIP. Those


And the DGW of the RS point to the director, right?

DGW is what? I guess I need to mention something explicitly at this
point. I don't have anything downloaded directly from
linuxvirtualserver.org. The Red Hat kernel comes with ipvs modules and
source for ipvsadm. I built/installed ipvsadm from the Red Hat kernel
source package. I put rules in /etc/sysconfig/ipvsadm and use the
/etc/init.d/ipvsadm script to start/stop/restart. So it's possible that
there are configuration files that the lvs.org stuff uses and RH does not.
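
DGW here presumably stands for default gateway. With LVS-NAT the real
servers normally need the director's private address as their default
route, so that replies travel back through the director. A minimal
sketch of how that could be checked on a real server follows. The
function names are made up for illustration, and 192.168.10.1 is only a
placeholder, since the director's private address is not given in this
thread.

```shell
#!/bin/sh
# Print the default gateway found in `ip route show` style output on stdin.
default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Succeed only if the default gateway equals the address passed as $1.
# Example on a real server (192.168.10.1 is a hypothetical director IP):
#   ip route show table main | gw_is 192.168.10.1 && echo "DGW ok"
gw_is() {
    [ "$(default_gw)" = "$1" ]
}
```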


> two telnet commands should both work, if you sit and try it over and
> over, it will succeed every other time. I'm not blocking it with IP


Ok, I just tried it and it indeed is just like you described it.

> tables. tcpdump on ~.18 shows no packets when coming in this way. The


Could you tcpdump on the outgoing (towards the private net) interface to
see if the packets are crafted correctly for both RSs? Please send it to
the list (should not be too long if we dump only for 2 connection
requests).



I ran this command, and did two telnets, no packets showed up

tcpdump -i eth1 dst host 192.168.10.18

In contrast, I did the same thing with 192.168.10.17, and saw this

13:53:09.656311 <cip>.33136 > 192.168.10.17.5222: S 559431180:559431180(0) win 5840 <mss 1460,sackOK,timestamp 27710578 0,nop,wscale 0> (DF) [tos 0x10]
13:53:09.684493 <cip>.33136 > 192.168.10.17.5222: . ack 577373375 win 5840 <nop,nop,timestamp 27710591 48127857> (DF) [tos 0x10]
13:53:12.715573 <cip>.33136 > 192.168.10.17.5222: P 0:5(5) ack 1 win 5840 <nop,nop,timestamp 27712145 48127857> (DF) [tos 0x10]
13:53:12.746126 <cip>.33136 > 192.168.10.17.5222: . ack 41 win 5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]
13:53:12.748959 <cip>.33136 > 192.168.10.17.5222: F 5:5(0) ack 42 win 5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]

I replaced my client IP address with <cip> in the above output listing.
So for grins, I ran `tcpdump dst port 5222` to see if I get incoming
packets on the director regardless of which RS is up in the round-robin.
The packets arrive just fine, they just aren't being forwarded for
some reason.


> director has a dozen or so aliased interfaces (eth0:1-n). I bind those
> aliased interfaces to other public IPs and use LVS to NAT to particular


That seems reasonable. One thing I wonder is what the routing table looks
like on the director and the RS. If you could provide us with those and
maybe the link configuration?

ip rule show
ip route show table main
ip addr show dev eth0

Due to paranoia, I'm cutting out most IPs, but not in a manner that
loses track of which ones are unique from others, so hopefully this
will still give what you wanted.


[root@tetsuo root]# ip rule show
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup 253
[root@tetsuo root]# ip route show table main
<public network/mask> dev eth0  scope link
192.168.10.0/24 dev eth1  scope link
127.0.0.0/8 dev lo  scope link
default via <default public gateway> dev eth0
[root@tetsuo root]# ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:02:b3:b9:f8:c6 brd ff:ff:ff:ff:ff:ff
    inet 66.150.129.229/27 brd <public bcast ip> scope global eth0

    inet <public ip1/mask> brd <public bcast ip> scope global secondary eth0:1
    inet <public ip2/mask> brd <public bcast ip> scope global secondary eth0:2
    inet <public ip3/mask> brd <public bcast ip> scope global secondary eth0:4
    inet <public ip4/mask> brd <public bcast ip> scope global secondary eth0:5
    inet <public ip5/mask> brd <public bcast ip> scope global secondary eth0:3
    inet <public ip6/mask> brd <public bcast ip> scope global secondary eth0:6
    inet <public ip7/mask> brd <public bcast ip> scope global secondary eth0:7
    inet <public ip8/mask> brd <public bcast ip> scope global secondary eth0:8
    inet <public ip9/mask> brd <public bcast ip> scope global secondary eth0:9
    inet <public ip10/mask> brd <public bcast ip> scope global secondary eth0:10
    inet <public ip11/mask> brd <public bcast ip> scope global secondary eth0:11
    inet <public ip12/mask> brd <public bcast ip> scope global secondary eth0:12
    inet <public ip13/mask> brd <public bcast ip> scope global secondary eth0:13
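
Given the main routing table above, only destinations inside
192.168.10.0/24 leave through eth1. Anything else follows the default
route out eth0. One check that follows from this is to compare every
real server address that ipvsadm lists against that prefix. The sketch
below is only an illustration: the function names are invented here, and
it assumes the usual `ipvsadm -L -n` layout in which real server lines
start with `->`.

```shell
#!/bin/sh
# Succeed if IPv4 address $1 starts with the three-octet prefix $2,
# i.e. lies in the corresponding /24 (for example "192.168.10").
in_slash24() {
    case "$1" in
        "$2".*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Read an `ipvsadm -L -n` style listing on stdin and print every real
# server entry whose address is outside the /24 prefix given as $1.
check_real_servers() {
    while read -r arrow addr rest; do
        [ "$arrow" = "->" ] || continue      # only real server lines
        ip=${addr%%:*}                       # strip the :port suffix
        case "$ip" in
            [0-9]*) ;;                       # looks like an address
            *) continue ;;                   # skip the column header line
        esac
        in_slash24 "$ip" "$1" || echo "not on $1.0/24: $addr"
    done
}
```

On the director this could be run as
`ipvsadm -L -n | check_real_servers 192.168.10`. No output would mean
every real server entry sits on the private network.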


> machines on the private lan. So I can actually telnet directly to port
> 5222 on the IPs I have aliased for the two boxes in question. In this


You mean you have other public IPs (not the VIP) which will get port
forwarded with a 1:1 NAT to the assigned RS?

Yes. Each of eth0:1-13 has a unique private IP. Here's where it gets
fun. I don't actually have enough machines to fill each uniquely, so
plenty of the private machines also have aliased interfaces. The two
real servers in question (192.168.10.17 and 18) are separate interfaces
(not aliases, but eth0 and eth1) on the same machine. Before you ask why
I would want to do this: I'm working on a proof of concept and don't
have the resources to spread it all out properly, and this is actually
an intended production scenario for the customer.


> particular case, I need to have one FQDN/IP to load balance between a
> couple of them.


It is not a DNS problem. And the FQDN is only for the VIP. You do not
want people to connect to the RS directly anyway, so keep them on an
IP basis.

> After one connection
>
> TCP  66.150.129.229:5222 wrr
>   -> 192.162.10.18:5222           Masq    1      0          0
>   -> 192.168.10.17:5222           Masq    1      0          1
>
> After 2nd attempt (says Trying 66.150.129.229... then nothing, so I
> +)
>
> TCP  66.150.129.229:5222 wrr
>   -> 192.162.10.18:5222           Masq    1      0          1
>   -> 192.168.10.17:5222           Masq    1      0          1


Verified with telnet from here. This indicates to me that the second RS
is not set up the same way as the first one (routing issue, firewall
rules on the RS, VIP not correctly set or missing). Normally the above
indicates that the daemon somehow died in a select loop but didn't close
the listener. Since you mentioned that you can successfully connect to
both RS on port 5222 and do get a telnet prompt, we have to assume that
both daemons are working correctly.

I haven't had to configure any of the machines behind the LAN with
anything related specifically to LVS. So I'm a little lost here. And
like I mentioned, both the RIPs go to the same machine (different
NICs). I have checked with tcpdump on the director and RS that the
packets aren't going to 17 when they should be going to 18. I see
outbound packets for 17 on every successful attempt, and no packets on
unsuccessful attempts. And I do have two server processes running, each
bound explicitly to a single RIP.

> All of my ipvsadm rules are LVS-NAT, but they probably don't need to be.
What rules? Do you mean setup?

I mean the '-A -t ip:port -s wlc' and '-a -t ip:port -r ip:port -m'
lines in my /etc/sysconfig/ipvsadm file. The init script uses the ipvsadm
restore and save commands (just like ipchains and iptables have) to load
rules from this file.
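
For readers who have not seen that file format, here is a hedged sketch
of what the rules for the load-balanced service in this thread might
look like. The VIP, port and the two RIPs are the ones named in this
message, and the wrr scheduler and weight of 1 are taken from the quoted
ipvsadm listing. The actual file contents were not posted, so the exact
lines are an assumption. The `sample_rules` name is invented for
illustration.

```shell
#!/bin/sh
# Print an illustrative rule set in the format that ipvsadm-restore reads.
# -A adds a virtual service, -a adds a real server to it,
# -m selects masquerading (LVS-NAT), -w sets the weight.
sample_rules() {
cat <<'EOF'
-A -t 66.150.129.229:5222 -s wrr
-a -t 66.150.129.229:5222 -r 192.168.10.17:5222 -m -w 1
-a -t 66.150.129.229:5222 -r 192.168.10.18:5222 -m -w 1
EOF
}

# Loading them would need root on the director and is not done here:
#   sample_rules | ipvsadm-restore
```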


> I'm fully prepared to accept that I'm using lvs all wrong, but so far
> it's been working for me. :) If there is a better configuration for me


What do you mean by '... so far it's been working for me ...'? Did it
work up to a certain point with this setup and layout and it stopped
working afterwards?

Everything except this load balancing has been working, and continues to
work. Up until now, I've just been using ipvsadm to NAT all the aliased
interfaces 1:1 (which is where the "using lvs all wrong" thing came in,
since I could probably do all the 1:1 forwarding pretty easily with
iptables). I only just now started to try load balancing and am having
this problem. :)


> to use, I'm certainly open to trying it.


Since for me everything indicates that your second RS is not configured
like the first one, we do not need to change the LVS configuration.

Regards,
Roberto Nibali, ratz



Ugh.

--
Justin Georgeson
UnBound Technologies, Inc.
http://www.unboundtech.com
Main   713.329.9330
Fax    713.460.4051
Mobile 512.789.1962

5295 Hollister Road
Houston, TX 77040
Real Applications using Real Wireless Intelligence(tm)
