Hi
I have been trying for a few weeks now to make LVS work, but I think I 
am missing something fundamental.  I'll go over the setup as briefly as 
possible. 
Please note: IP addresses have been obscured for security reasons.  The 
real addresses are routable. 
We have a load balancer (running the LVS/IPVS kernel code) at 100.1.1.1, 
with a second IP address 100.1.1.2. 
We have two mail servers at 120.1.1.1 and 120.1.1.2.  The load balancer 
is supposed to balance connections between the two mailservers.  We have 
another load balancer at 130.1.1.1 which works fine, but the new load 
balancer is set up seemingly the same and yet it just does not work. 
Load balancer configuration (100.1.1.2)
===========================
net.ipv4.conf.all.forwarding = 1
eth0 has 100.1.1.1, eth0:0 has 100.1.1.2
# ipvsadm --list -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
 -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  100.1.1.2:25 wlc
 -> 120.1.1.1:25            Tunnel  1      0          0
 -> 120.1.1.2:25            Tunnel  1      0          0
iptables has no rules and is default-to-accept.  There is no firewall in 
front of the box. 
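For reference, a table like the one above is normally built with ipvsadm 
commands along the following lines.  This is only a sketch, not a copy 
of the actual setup script on the box: the `run` wrapper and the 
`DRY_RUN` variable are hypothetical helpers that print each command 
instead of executing it, and the scheduler and weights are simply taken 
from the listing.

```shell
#!/bin/sh
# Sketch: rebuild the virtual service shown in the ipvsadm listing.
# DRY_RUN and run() are illustrative helpers, not part of ipvsadm.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"    # only print the command
    else
        "$@"         # execute it (needs root and the ip_vs module)
    fi
}

VIP=100.1.1.2

# -A: add a virtual service, -t: TCP address:port, -s: scheduler
run ipvsadm -A -t "$VIP:25" -s wlc

# -a: add a real server, -r: its address, -i: IPIP tunnelling, -w: weight
for rs in 120.1.1.1 120.1.1.2; do
    run ipvsadm -a -t "$VIP:25" -r "$rs:25" -i -w 1
done
```

With DRY_RUN=0 the same script would apply the commands for real.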
Mail server 1 (120.1.1.1)
=================
relevant iptables rules:
$IPTABLES -A INPUT -i eth0 -s 100.1.1.2 -p ipencap -j ACCEPT
$IPTABLES -A INPUT -i tunl0 -p tcp --dport smtp -j ACCEPT
Mail daemon listening on all IPs:
# netstat -natp |grep TEN |grep 25
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN     14505/exim4
tunl0:0 is the tunnel interface for the existing load balancer (that works)
# ifconfig tunl0:0
tunl0:0   Link encap:IPIP Tunnel  HWaddr
         inet addr:130.1.1.2  Mask:255.255.255.255
         UP RUNNING NOARP  MTU:1480  Metric:1
tunl0:3 is the tunnel interface for the new load balancer that doesn't work
# ifconfig tunl0:3
tunl0:3   Link encap:IPIP Tunnel  HWaddr
         inet addr:100.1.1.2  Mask:255.255.255.255
         UP RUNNING NOARP  MTU:1480  Metric:1
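The MTU of 1480 shown on both aliases is consistent with IPIP 
encapsulation, which prepends a 20-byte outer IPv4 header to each 
packet.  A quick check of the arithmetic, assuming eth0 uses the 
standard Ethernet MTU of 1500 (not shown above); `tunnel_mtu` is a 
hypothetical helper:

```shell
#!/bin/sh
ETH_MTU=1500       # assumed MTU of eth0 (standard Ethernet)
IPIP_OVERHEAD=20   # outer IPv4 header without options

# Print the largest inner packet that fits once the outer header is added.
tunnel_mtu() {
    echo $(( $1 - IPIP_OVERHEAD ))
}

tunnel_mtu "$ETH_MTU"
```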
Mail server 2 (120.1.1.2)
=================
Same as mailserver 1
The current load balancer at 130.1.1.1 uses 130.1.1.2 for load balancing 
inbound smtp/25 connections to the two mailservers.  If I telnet to 
130.1.1.2 from my work machine at 140.1.1.1, this is the tcpdump sequence:
load 1
10:28:34.904382 IP 140.1.1.1.3948 > 130.1.1.2.25: S 3712043866:3712043866(0) win 65535 <mss 1260,nop,nop,sackOK>
10:28:34.909107 IP 140.1.1.1.3948 > 130.1.1.2.25: . ack 151923593 win 65535
10:28:55.134362 IP 140.1.1.1.3948 > 130.1.1.2.25: . ack 52 win 65484
mail 2 eth0  (it could equally have been mail 1; in this example the 
connection was passed to mail 2) 
10:28:34.583491 IP 130.1.1.2.25 > 140.1.1.1.3948: S 151923592:151923592(0) ack 3712043867 win 5840 <mss 1460,nop,nop,sackOK>
10:28:54.608731 IP 130.1.1.2.25 > 140.1.1.1.3948: P 1:52(51) ack 1 win 5840
mail 2 tunl0
10:28:34.583459 IP 140.1.1.1.3948 > 130.1.1.2.25: S 3712043866:3712043866(0) win 65535 <mss 1260,nop,nop,sackOK>
10:28:34.588206 IP 140.1.1.1.3948 > 130.1.1.2.25: . ack 151923593 win 65535
10:28:54.813191 IP 140.1.1.1.3948 > 130.1.1.2.25: . ack 52 win 65484
This tcpdump shows a full tcp connection via the working load balancer 
to one of the mail servers, in this case mail 2.  The SMTP servers are 
configured to pause for 20 seconds before showing their banner, which 
accounts for the delay between the packets. 
Now, if I try the same thing but telnet to 100.1.1.2:25 (the new load 
balancer), the connection times out.  tcpdumps show: 
load balancer
=========
11:01:48.231327 IP 140.1.1.1.4042 > 100.1.1.2.25: S 1821058469:1821058469(0) win 65535 <mss 1260,nop,nop,sackOK>
11:01:51.195252 IP 140.1.1.1.4042 > 100.1.1.2.25: S 1821058469:1821058469(0) win 65535 <mss 1260,nop,nop,sackOK>
11:01:57.230423 IP 140.1.1.1.4042 > 100.1.1.2.25: S 1821058469:1821058469(0) win 65535 <mss 1260,nop,nop,sackOK>
Both of the mail servers show no traffic whatsoever on eth0 or the 
tunnel interface. 
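One way to narrow the captures to just the encapsulated traffic would be 
a filter on IP protocol 4 (IPIP), run on the load balancer's outgoing 
interface as well as on the mail servers.  A sketch, assuming eth0 is 
the interface the tunnel packets leave on; `ipip_filter` is a 
hypothetical helper that only builds the filter string:

```shell
#!/bin/sh
# Build a pcap filter matching IPIP (IP protocol 4) packets
# addressed to the given real server.
ipip_filter() {
    echo "ip proto 4 and dst host $1"
}

# On the load balancer, as root:
#   tcpdump -n -i eth0 "$(ipip_filter 120.1.1.1)"
ipip_filter 120.1.1.1
```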
On the load balancer, /proc/sys/net/ipv4/vs/debug_level is set to 9 and 
the following messages were observed in syslog: 
Mar 29 11:01:48 dev1 kernel: IPVS: lookup/in TCP 140.1.1.1:4042->100.1.1.2:25 not hit
Mar 29 11:01:48 dev1 kernel: IPVS: lookup service: fwm 0 TCP 100.1.1.2:25 hit
Mar 29 11:01:48 dev1 kernel: IPVS: ip_vs_wlc_schedule(): Scheduling...
Mar 29 11:01:48 dev1 kernel: IPVS: WLC: server 120.1.1.1:25 activeconns 0 refcnt 1 weight 1 overhead 0
Mar 29 11:01:48 dev1 kernel: IPVS: Bind-dest TCP c:140.1.1.1:4042 v:100.1.1.2:25 d:120.1.1.1:25 fwd:T s:0 conn->flags:182 conn->refcnt:1 dest->refcnt:2
Mar 29 11:01:48 dev1 kernel: IPVS: Schedule fwd:T c:140.1.1.1:4042 v:100.1.1.2:25 d:120.1.1.1:25 conn->flags:1C2 conn->refcnt:2
Mar 29 11:01:48 dev1 kernel: IPVS: TCP input  [S...] 120.1.1.1:25->140.1.1.1:4042 state: NONE->SYN_RECV conn->refcnt:2
Mar 29 11:01:51 dev1 kernel: IPVS: lookup/in TCP 140.1.1.1:4042->100.1.1.2:25 hit
Mar 29 11:01:57 dev1 kernel: IPVS: lookup/in TCP 140.1.1.1:4042->100.1.1.2:25 hit
Mar 29 11:02:04 dev1 kernel: IPVS: Unbind-dest TCP c:140.1.1.1:4039 v:100.1.1.2:25 d:120.1.1.2:25 fwd:T s:3 conn->flags:182 conn->refcnt:1 dest->refcnt:2
I really am at a loss as to why this doesn't work.  The debug log seems 
to show IPVS passing traffic to mail 1 (120.1.1.1), yet the tcpdump on 
that server shows absolutely nothing.  If anyone can point me in the 
right direction here I would be very grateful. 
Thanks,
--
Mark Wadham
e: mark.wadham@xxxxxxxxx t: +44 (0)20 8315 5800 f: +44 (0)20 8315 5801
Areti Internet Ltd., http://www.areti.net/ 
===================================================================
Areti Internet Ltd: BS EN ISO 9001:2000
Providing corporate Internet solutions for more than 10 years.
===================================================================