
To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Re: having trouble with load balancing
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Tue, 12 Nov 2002 22:02:35 +0100
Hi

Sorry about that :)

Thanks for doing it correctly ;).

DGW is what?

DGW = default gateway

I guess I need to mention something explicitly at this point. I don't have anything downloaded directly from linuxvirtualserver.org. The Red Hat kernel comes with the ipvs modules and the source for ipvsadm; I built and installed ipvsadm from the Red Hat kernel source package. I put the rules in /etc/sysconfig/ipvsadm and use the /etc/init.d/ipvsadm script to start/stop/restart. So it's possible that there are configuration files that the lvs.org packages use and Red Hat does not.

Hmm, I actually don't know what other patches the RH kernel includes. This could get hard to debug if you're not using a vanilla kernel. Most of us LVS guys are only familiar with the raw kernel sources from kernel.org, for good reasons. You'll see further down why this can be a problem.
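If you want to pin down which ipvs version that RH kernel actually ships, something like this should tell you (assuming the ip_vs module is loaded):

    # the first line of ipvsadm output reports the kernel's IPVS version
    ipvsadm -L -n | head -1
    # the module also logs its version when it loads
    dmesg | grep -i ipvs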

I ran this command and did two telnets; no packets showed up:

tcpdump -i eth1 dst host 192.168.10.18

Damn, and the ipvsadm output clearly showed two '1's in the InActConn column! We're getting closer to really needing to debug the LVS flow on your box.
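As a next step it might be worth looking at the connection table itself; a quick sketch, assuming your ipvsadm supports the --connection listing:

    # list the individual IPVS connection entries, numerically
    ipvsadm -L -c -n
    # the raw table also lives in proc-fs
    cat /proc/net/ip_vs_conn

That would at least show in which state the entries for ~.18 get stuck.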

In contrast, I did the same thing with 192.168.10.17, and saw this:

13:53:09.656311 <cip>.33136 > 192.168.10.17.5222: S 559431180:559431180(0) win 5840 <mss 1460,sackOK,timestamp 27710578 0,nop,wscale 0> (DF) [tos 0x10]
13:53:09.684493 <cip>.33136 > 192.168.10.17.5222: . ack 577373375 win 5840 <nop,nop,timestamp 27710591 48127857> (DF) [tos 0x10]
13:53:12.715573 <cip>.33136 > 192.168.10.17.5222: P 0:5(5) ack 1 win 5840 <nop,nop,timestamp 27712145 48127857> (DF) [tos 0x10]
13:53:12.746126 <cip>.33136 > 192.168.10.17.5222: . ack 41 win 5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]
13:53:12.748959 <cip>.33136 > 192.168.10.17.5222: F 5:5(0) ack 42 win 5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]

This looks perfectly ok. Now why doesn't it work for ~.18 while the kernel sets InActConn to '1' even for this RS?

I replaced my client IP address with <cip> in the above output listing. So for grins, I ran `tcpdump dst port 5222` to see if I get incoming packets on the director regardless of which RS is up in the round-robin. The packets arrive just fine, they just aren't being forwarded for some reason.

Now there could be quite a lot of reasons why this occurs:

o wrong routing
o unfortunate proc-fs settings (a quick check is sketched below) for:
  - rp_filter
  - medium_id
  - proxy_arp
  - whatnot
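
A minimal way to eyeball those proc-fs settings on the director (a sketch, assuming a 2.4-style proc-fs layout and a bash shell):

    # dump rp_filter, proxy_arp and medium_id for every interface
    for f in /proc/sys/net/ipv4/conf/*/{rp_filter,proxy_arp,medium_id}; do
        echo "$f = `cat $f`"
    done

rp_filter on the interface facing the RS is the classic suspect: set to 1 it can silently drop packets it considers spoofed.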

That seems reasonable. One thing I wonder is what the routing tables
look like on the director and the RS. If you could provide us with those and
                              ^^^^^^
Please also submit those outputs for the RS.
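
Something like this, run on the RS itself, would give us what we need (assuming iproute2 is installed there as well):

    ip addr show
    ip route show table main
    ip rule show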

Due to paranoia, I'm cutting out most IPs, but not in a manner that loses track of which ones are unique from others, so hopefully this will still give you what you wanted.

I'm not so sure what you mean by paranoia; I've already got almost all of your addresses :).

[root@tetsuo root]# ip route show table main
<public network/mask> dev eth0  scope link

I'm confused: you cut out this entry here, yet below you leave the entry in. They should be the same IP, right?

192.168.10.0/24 dev eth1  scope link
127.0.0.0/8 dev lo  scope link
default via <default public gateway> dev eth0
[root@tetsuo root]# ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:02:b3:b9:f8:c6 brd ff:ff:ff:ff:ff:ff
    inet 66.150.129.229/27 brd <public bcast ip> scope global eth0

[snipped away public IP addresses]

You mean you have other public IPs (not the VIP) which will get port
forwarded with a 1:1 NAT to the assigned RS?
Yes. Each of eth0:1-13 has a unique private IP.

Huh? Your 'ip addr ... | sed ...' output mentioned them being public!

Here's where it gets fun. I don't actually have enough machines to fill each uniquely, so plenty of the private machines also have aliased interfaces. The two real servers in question (192.168.10.17 and .18) are separate interfaces (not aliases, but eth0 and eth1) on the same machine. Before you ask why...

Dohhhhhh! This is probably the most important piece of information. Either I'm too dumb or you really didn't indicate anywhere that you only run _1_ RS with different NICs. This changes the whole situation. Please send me the routing and NIC configuration of the RS.

Also, how did you connect those two NICs to the director? Over a switch that the director is connected to as well?
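
One thing worth checking in such a setup (an assumption on my part, but two NICs of one box on the same segment often suffer from the ARP flux problem): see which MAC address actually answers for each RIP, for example from the director:

    # which NIC answers ARP for each RIP? (iputils arping)
    arping -I eth1 -c 3 192.168.10.17
    arping -I eth1 -c 3 192.168.10.18

If both probes come back with the same MAC, the frames for ~.18 may well end up on the wrong NIC.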

I would want to do this. I'm working on a proof of concept and don't have the resources to spread it all out properly, and this is actually an intended production scenario for the customer.

Just a note in advance: if you're trying to make up for a possibly badly written service that tends to crash a lot by forking off several copies listening on different RS addresses and load balancing them to minimize the impact, you're on an extremely wrong path. If not, forget those lines.

I haven't had to configure any of the machines behind the LAN with anything related specifically to LVS. So I'm a little lost here.

Yes, but nevertheless, the routing for example has to be correct.
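
For LVS-NAT this means in particular that the RS must route its replies back through the director, otherwise the director can never rewrite them. A minimal sketch, assuming the director's inside address is 192.168.10.1 (substitute your real DIP):

    # on the real server: send replies back via the director
    ip route add default via 192.168.10.1

If the replies leave over some other gateway, connections stall even though the director forwards the initial packets.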

And like I mentioned, both RIPs go to the same machine (different NICs). I have checked with tcpdump on the director and the RS that the packets aren't going to .17 when they should be going to .18. I see outbound packets for .17 on every successful attempt, and no packets on unsuccessful attempts. And I do have two server processes running, each bound explicitly to a single RIP.

Fair enough. I need to look at the routing tables of the RS, but if possible without the sed paranoia.

I mean the '-A -t ip:port -s wlc' and '-a -t ip:port -r ip:port -m' lines in my /etc/sysconfig/ipvsadm file.

I thought you said wrr in the beginning?

The init script uses ipvsadm restore and save (just like ipchains and iptables have) to load the rules from this file.

Ok.
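
For reference, a sketch of how those lines might look for your setup; the VIP below is a placeholder and the port is taken from your tcpdump output:

    -A -t <VIP>:5222 -s wlc
    -a -t <VIP>:5222 -r 192.168.10.17:5222 -m
    -a -t <VIP>:5222 -r 192.168.10.18:5222 -m

'-m' selects masquerading (NAT) forwarding; make sure the scheduler in the first line is the one you actually intend (wlc here, although you mentioned wrr earlier).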

Everything except this load balancing has been working, and continues to work. Up until now, I've just been using ipvsadm to NAT all the aliased interfaces 1:1 (which is where the "using lvs all wrong" thing came in, since I could probably do all the 1:1 forwarding pretty easily with iptables). I only just now started to try load balancing and am having this problem. :)

I have not the faintest clue what you mean by using ipvsadm to NAT all the aliased interfaces 1:1. But this has no relevance to the problem anyway.
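
That said, should you ever move the plain 1:1 forwarding over to iptables, it would look roughly like this (addresses made up for illustration):

    # map one public alias to one private box, in both directions
    iptables -t nat -A PREROUTING -d <public-alias> -j DNAT --to-destination 192.168.10.17
    iptables -t nat -A POSTROUTING -s 192.168.10.17 -j SNAT --to-source <public-alias>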

Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc


