Hi
Sorry about that :)
Thanks for doing it correctly ;).
DGW is what?
DGW = default gateway.
I guess I need to mention something explicitly at this point. I don't have
anything downloaded directly from linuxvirtualserver.org. The Red Hat
kernel comes with the ipvs modules and the source for ipvsadm. I
built/installed ipvsadm from the Red Hat kernel source package. I put the
rules in /etc/sysconfig/ipvsadm and use the /etc/init.d/ipvsadm script to
start/stop/restart. So it's possible that there are configuration files
that the lvs.org stuff uses and RH does not.
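For reference, the save/restore flow just described boils down to something
like this (a sketch; the exact options the Red Hat init script passes may
differ):
# dump the current in-kernel IPVS table into the rules file
ipvsadm-save -n > /etc/sysconfig/ipvsadm
# reload it later, either via the init script ...
/etc/init.d/ipvsadm restart
# ... or directly
ipvsadm-restore < /etc/sysconfig/ipvsadm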
Hmm, I actually don't know what kind of other patches the RH kernel
includes. This could get hard to debug if you're not using a vanilla
kernel. Most of us LVS guys are only familiar with the raw kernel
sources from kernel.org, for good reasons. You'll see further down why
it can be a problem.
I ran this command and did two telnets; no packets showed up:
tcpdump -i eth1 dst host 192.168.10.18
Damn, and the ipvsadm output clearly showed two '1's in the InActConn
column! We're getting closer to really needing to debug the LVS flow on
your box.
In contrast, I did the same thing with 192.168.10.17, and saw this
13:53:09.656311 <cip>.33136 > 192.168.10.17.5222: S
559431180:559431180(0) win 5840 <mss 1460,sackOK,timestamp 27710578
0,nop,wscale 0> (DF) [tos 0x10]
13:53:09.684493 <cip>.33136 > 192.168.10.17.5222: . ack 577373375 win
5840 <nop,nop,timestamp 27710591 48127857> (DF) [tos 0x10]
13:53:12.715573 <cip>.33136 > 192.168.10.17.5222: P 0:5(5) ack 1 win
5840 <nop,nop,timestamp 27712145 48127857> (DF) [tos 0x10]
13:53:12.746126 <cip>.33136 > 192.168.10.17.5222: . ack 41 win 5840
<nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]
13:53:12.748959 <cip>.33136 > 192.168.10.17.5222: F 5:5(0) ack 42 win
5840 <nop,nop,timestamp 27712159 48128163> (DF) [tos 0x10]
This looks perfectly ok. Now why doesn't it work for ~.18 while the
kernel sets InActConn to '1' even for this RS?
I replaced my client IP address with <cip> in the above output listing.
So for grins, I ran `tcpdump dst port 5222` to see if I get incoming
packets on the director regardless of which RS is up in the round-robin.
The packets arrive just fine; they just aren't being forwarded for some
reason.
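A quick way to watch both legs and the LVS connection table at once while
reproducing this (a sketch; eth0 = public side, eth1 = RS side, as in the
outputs below):
# packets reaching the director on the public side
tcpdump -n -i eth0 dst port 5222
# what, if anything, the director forwards towards the RS network
tcpdump -n -i eth1 dst port 5222
# the IPVS connection entries, numeric, while a telnet is open
ipvsadm -L -c -n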
Now there could be quite a lot of reasons why this occurs:
o wrong routing
o unfortunate proc-fs settings for:
- rp_filter
- medium_id
- proxy_arp
- whatnot
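Those can be checked quickly on the director (a sketch; 'all' and the
per-interface entries are the obvious candidates):
# reverse path filtering, per interface
grep . /proc/sys/net/ipv4/conf/*/rp_filter
# proxy ARP and medium_id live in the same place
grep . /proc/sys/net/ipv4/conf/*/proxy_arp /proc/sys/net/ipv4/conf/*/medium_id
# forwarding has to be enabled at all for LVS-NAT
cat /proc/sys/net/ipv4/ip_forward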
That seems reasonable. One thing I wonder is what the routing tables look
like on the director and the RS. If you could provide us with those and ...
Please also submit those outputs for the RS.
Due to paranoia, I'm cutting out most IPs, but not in a manner that
loses track of which ones are distinct from the others, so hopefully this
will still give you what you wanted.
I'm not so sure what you mean by paranoia; I've already got almost all
of your addresses :).
[root@tetsuo root]# ip route show table main
<public network/mask> dev eth0 scope link
I'm confused: you cut out this entry, yet below you leave the entry in.
They should be the same IP, right?
192.168.10.0/24 dev eth1 scope link
127.0.0.0/8 dev lo scope link
default via <default public gateway> dev eth0
[root@tetsuo root]# ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
link/ether 00:02:b3:b9:f8:c6 brd ff:ff:ff:ff:ff:ff
inet 66.150.129.229/27 brd <public bcast ip> scope global eth0
[snipped away public IP addresses]
You mean you have other public IPs (not the VIP) which will get port
forwarded with a 1:1 NAT to the assigned RS?
yes. each of eth0:1-13 has a unique private IP.
Huh? Your 'ip addr ... | sed ...' output mentioned them being public!
Here's where it gets fun. I don't actually have enough machines to fill
each uniquely, so plenty of the private machines also have aliased
interfaces. The two real servers in question (192.168.10.17 and 18) are
separate interfaces (not aliases, but eth0 and eth1) on the same machine.
Dohhhhhh! This is probably the most important information. Either I'm
too dumb or you really didn't indicate anywhere that you only run _1_ RS
with different NICs. This changes the whole situation. Please send me
the routing and the NIC configuration of the RS.
Also, how did you connect those two NICs to the director? Over a switch
to which the director is connected as well?
Before you ask why I would want to do this: I'm working on a proof of
concept and don't have the resources to spread it all out properly, and
this is actually an intended production scenario for the customer.
Just a note in advance: if you're trying to compensate for a possibly
badly written service that tends to crash a lot by forking off several
copies listening on different RS and load balancing them to minimize the
impact, you're on completely the wrong path. If not, forget these lines.
I haven't had to configure any of the machines behind the LAN with
anything related specifically to LVS. So I'm a little lost here.
Yes, but nevertheless, the routing, for example, has to be correct.
And like I mentioned, both the RIPs go to the same machine (different
NICs). I have checked with tcpdump on the director and the RS that the
packets aren't going to 17 when they should be going to 18. I see
outbound packets for 17 on every successful attempt, and no packets on
unsuccessful attempts. And I do have two server processes running, each
bound explicitly to a single RIP.
Fair enough. I need to look at the routing tables of the RS, but if
possible without the sed-paranoia.
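As a yardstick for when those arrive: with LVS-NAT the real server has to
route its replies back out through the director, otherwise they never get
un-NATed. So on the RS one would expect something roughly like this (a
sketch; <dip> stands for the director's own 192.168.10.x address, which
wasn't shown above):
# on the RS: the default route should point back at the director
ip route show table main
#   192.168.10.0/24 dev eth0  scope link
#   default via <dip> dev eth0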
I mean the '-A -t ip:port -s wlc' and '-a -t ip:port -r ip:port -m'
lines in my /etc/sysconfig/ipvsadm file.
I thought you said wrr in the beginning?
The init script uses the ipvsadm restore and save (just like ipchains and
iptables have) to load the rules from this file.
Ok.
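For the record, such a file would look roughly like this (a sketch in
ipvsadm-save format; the VIP is not known here, and port 5222 and the two
RIPs are taken from the thread):
-A -t <vip>:5222 -s wlc
-a -t <vip>:5222 -r 192.168.10.17:5222 -m
-a -t <vip>:5222 -r 192.168.10.18:5222 -m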
Everything except this load balancing has been working, and continues to
work. Up until now, I've just been using ipvsadm to NAT all the aliased
interfaces 1:1 (which is where the "using lvs all wrong" thing came in,
since I could probably do all the 1:1 forwarding pretty easily with
iptables). I only just now started to try load balancing and am having
this problem. :)
I have not the faintest clue what you mean by using ipvsadm to NAT all
the aliased interfaces 1:1. But this has no relevance to the problem anyway.
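For what it's worth, that kind of 1:1 forwarding would indeed be a plain
DNAT/SNAT job for iptables, roughly like this (a sketch; the addresses are
placeholders):
iptables -t nat -A PREROUTING  -d <public alias IP> -j DNAT --to-destination <rip>
iptables -t nat -A POSTROUTING -s <rip> -j SNAT --to-source <public alias IP>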
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc