ARP problem persists on 2.6.5 kernel (gentoo)

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: ARP problem persists on 2.6.5 kernel (gentoo)
From: Todd Lyons <tlyons@xxxxxxxxxx>
Date: Tue, 29 Jun 2004 08:43:39 -0700
I have a system that I'm working on that doesn't quite do what
I'm expecting.  I'll lay out the system with obfuscated external IP
addresses.  It's not that it's super secret; I'm just paranoid enough
that I don't like handing out too much info about the internal
configuration of the network.  We're using LVS-DR.

The problem that we are having seems to be with the pop and imap
services being load balanced across the same set of machines.  Here's
how the system is laid out, then I'll get more specific with my question
at the end.

We're load balancing smtp across 2 machines (sendmail), pop and imap
across 2 machines (courier-imap), and www across 2 machines (apache).
The issue I have is that the webmail box uses imap for authentication, so
we want it to access the VIP'd (load balanced) external address rather
than specifying the RIPs (which would defeat some of the goals of load
balancing).

All machines are Gentoo boxen with a 2.6.5 kernel.

The configuration on the director is as follows:
miniip root # ipvsadm --list -n
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  64.xxx.xxx.31:25 rr
  -> 10.1.1.240:25                Route   1      0          0         
  -> 10.1.1.241:25                Route   1      0          0         
TCP  64.xxx.xxx.32:110 rr
  -> 10.1.1.242:110               Route   1      0          0         
  -> 10.1.1.243:110               Route   1      0          0         
TCP  64.xxx.xxx.34:80 rr
  -> 10.1.1.245:80                Route   1      0          0         
  -> 10.1.1.244:80                Route   1      0          0         
TCP  64.xxx.xxx.33:143 rr
  -> 10.1.1.242:143               Route   1      0          0         
  -> 10.1.1.243:143               Route   1      0          0         
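
For reference, a table like the one above could be built with ipvsadm commands along the following lines.  This is only a sketch: the helper name lvs_dr_cmds is made up for illustration, the VIP is a placeholder because the real address is obfuscated, and the function prints the commands rather than running them.

```shell
#!/bin/sh
# Print (not run) the ipvsadm commands for one LVS-DR virtual service.
# Usage: lvs_dr_cmds VIP PORT RIP [RIP...]
# lvs_dr_cmds is a hypothetical helper, not part of ipvsadm.
lvs_dr_cmds() {
    vip=$1; port=$2; shift 2
    # -A adds a virtual service, -t means TCP, -s rr selects round robin
    echo "ipvsadm -A -t $vip:$port -s rr"
    for rip in "$@"; do
        # -a adds a realserver, -g is direct routing (shown as "Route"),
        # -w 1 sets the weight
        echo "ipvsadm -a -t $vip:$port -r $rip:$port -g -w 1"
    done
}

# Placeholder VIP: substitute the real address before using the output.
VIP_SMTP=${VIP_SMTP:-64.xxx.xxx.31}
lvs_dr_cmds "$VIP_SMTP" 25 10.1.1.240 10.1.1.241
```

Piping the output to sh as root on the director would apply it; the same call with ports 110, 143 and 80 would cover the other services.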

miniip root # ifconfig 2>&1 | egrep -v "(RX|TX|collisions)"
eth0      Link encap:Ethernet  HWaddr 00:90:27:E0:1E:81  
          inet addr:64.xxx.xxx.6  Bcast:64.14.201.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:1    Link encap:Ethernet  HWaddr 00:90:27:E0:1E:81  
          inet addr:64.xxx.xxx.33  Bcast:64.14.201.255  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:2    Link encap:Ethernet  HWaddr 00:90:27:E0:1E:81  
          inet addr:64.xxx.xxx.32  Bcast:64.14.201.255  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:3    Link encap:Ethernet  HWaddr 00:90:27:E0:1E:81  
          inet addr:64.xxx.xxx.31  Bcast:64.14.201.255  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth0:4    Link encap:Ethernet  HWaddr 00:90:27:E0:1E:81  
          inet addr:64.xxx.xxx.34  Bcast:64.14.201.255  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 00:50:DA:7C:54:F9  
          inet addr:10.1.1.15  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:18 Base address:0x1080 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

The 64.xxx.xxx.6 IP address is not part of the load balanced system.  We
have not put static arp entries in the router for these load balanced
IPs.  We would prefer not to do that, since we think it should not be
necessary, but if it turns out to be, we will.

Here is what the pop and imap realservers look like:
mail1 root # ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0B:DB:95:1B:50  
          inet addr:10.1.1.242  Bcast:10.255.255.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:737079 errors:0 dropped:0 overruns:0 frame:0
          TX packets:697588 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:143763814 (137.1 Mb)  TX bytes:135652668 (129.3 Mb)
          Interrupt:16 Memory:fcf30000-fcf40000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING NOARP  MTU:16436  Metric:1
          RX packets:822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:822 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:66285 (64.7 Kb)  TX bytes:66285 (64.7 Kb)

lo:0      Link encap:Local Loopback  
          inet addr:64.xxx.xxx.32  Mask:255.255.255.255
          UP LOOPBACK RUNNING NOARP  MTU:16436  Metric:1
          RX packets:822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:822 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:66285 (64.7 Kb)  TX bytes:66285 (64.7 Kb)

lo:1      Link encap:Local Loopback  
          inet addr:64.xxx.xxx.33  Mask:255.255.255.255
          UP LOOPBACK RUNNING NOARP  MTU:16436  Metric:1
          RX packets:822 errors:0 dropped:0 overruns:0 frame:0
          TX packets:822 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:66285 (64.7 Kb)  TX bytes:66285 (64.7 Kb)

To try to avoid the arp problem, we ran:
echo '2' > /proc/sys/net/ipv4/conf/lo/arp_announce
echo '1' > /proc/sys/net/ipv4/conf/lo/arp_ignore
ifconfig lo -arp
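
One thing that may be worth checking: as far as I understand the kernel's ip-sysctl documentation, arp_ignore is evaluated for the interface the ARP request arrives on (taking the larger of the "all" value and that interface's value), so settings made only under conf/lo may not affect requests that arrive on eth0.  The sketch below prints the equivalent settings for other interfaces.  The helper name arp_cmds and the choice of "all" and "eth0" are assumptions for illustration, not something confirmed for this setup.

```shell
#!/bin/sh
# Print the commands that would set arp_ignore=1 and arp_announce=2
# for each interface named as an argument.
# arp_cmds is a hypothetical helper; it only prints, it changes nothing.
arp_cmds() {
    for ifc in "$@"; do
        echo "echo 1 > /proc/sys/net/ipv4/conf/$ifc/arp_ignore"
        echo "echo 2 > /proc/sys/net/ipv4/conf/$ifc/arp_announce"
    done
}

# Assumption: eth0 is the interface on which mail1 receives ARP requests,
# going by the ifconfig output above.
arp_cmds all eth0
```

Running `arp_cmds all eth0 | sh` as root on a realserver would apply the settings.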

mail1 root # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
64.xxx.xxx.34   10.1.1.15       255.255.255.255 UGH   0      0        0 eth0
64.xxx.xxx.31   10.1.1.15       255.255.255.255 UGH   0      0        0 eth0
10.1.1.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
127.0.0.0       127.0.0.1       255.0.0.0       UG    0      0        0 lo
0.0.0.0         10.1.1.1        0.0.0.0         UG    0      0        0 eth0

Note that we had to add specific routes for the other LVS'd services.  I
do not know why this was necessary, but once the routes were in place,
connections to/from the smtp and www realservers started working
consistently.
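
The two host routes in the table above (flags UGH via 10.1.1.15) correspond to net-tools route commands of roughly the following form.  This is a sketch only: host_route_cmds is a made-up helper name, the VIPs are the obfuscated placeholders, and the function prints the commands instead of running them.

```shell
#!/bin/sh
# Print one "route add -host" command per VIP, all via the same gateway.
# Usage: host_route_cmds GATEWAY VIP [VIP...]
# host_route_cmds is a hypothetical helper for illustration.
host_route_cmds() {
    gw=$1; shift
    for vip in "$@"; do
        # -host creates a /32 route, which shows up with the H flag
        echo "route add -host $vip gw $gw"
    done
}

# 10.1.1.15 is the director's eth1 address; the VIPs are placeholders.
host_route_cmds 10.1.1.15 64.xxx.xxx.34 64.xxx.xxx.31
```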

The www and smtp realservers look similar except for IP addresses and
that they each only have 1 VIP.

In summary we're load balancing 64.xxx.xxx.31 across two smtp machines,
64.xxx.xxx.32 and 64.xxx.xxx.33 across two pop/imap machines, and
64.xxx.xxx.34 across two www machines.  I can draw an ASCII picture of
this if necessary, though it will be tight.

Question:  Load balancing across the www and smtp machines works great
from the outside AND from the other load balanced machines.  Load
balancing across the pop/imap machines works fine from the outside,
inconsistently from the other load balanced machines, and never from my
workstation.  Can anyone explain why?  Can anyone suggest a fix?  Is
there any more information required to try and pinpoint this problem?

I just ssh'd to the two imap machines and ran tests at the time of this
message.  It load balanced incoming imap requests from an external IP,
and from one of the www boxen, but did NOT load balance from my
workstation (a 192.168.100.* address, across two Cisco routers).  Using
pop, it load balanced properly for both external and internal IP
addresses.  Repeating the test with imap, it only worked properly from
the external IP and the www box (but not from my workstation).

I have a theory that load balancing two different services across the
same two machines is causing the arp issue that (I think) I am seeing.
Any comments and suggestions would be appreciated.
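
One way to test the arp theory might be to run arping for a VIP from another host on the same segment and see how many different MAC addresses answer.  The sketch below assumes the iputils arping, which prints the replying MAC in square brackets; the helper name distinct_macs and the interface name in the usage comment are assumptions.  If more than the director's MAC shows up, a realserver is answering ARP for the VIP.

```shell
#!/bin/sh
# Read arping output on stdin and print each distinct MAC address once.
# distinct_macs is a hypothetical helper for illustration.
distinct_macs() {
    # Pull out anything shaped like a MAC address, normalise the hex
    # digits to upper case, then drop duplicates.
    grep -oE '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' | tr 'a-f' 'A-F' | sort -u
}

# Usage (as root, from a host on the same segment; eth0 is an assumption):
#   arping -I eth0 -c 3 "$VIP" | distinct_macs
```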
-- 
Regards...              Todd
They that can give up essential liberty to obtain a little temporary 
safety deserve neither liberty nor safety.       --Benjamin Franklin
Linux kernel 2.6.3-8mdkenterprise   2 users,  load average: 0.02, 0.03, 0.00