
LVS newbie needs help

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: LVS newbie needs help
From: John Cronin <jsc3@xxxxxxxxxxxxx>
Date: Tue, 10 Oct 2000 11:49:23 -0400 (EDT)
Hi,

I am trying to implement a Piranha/LVS cluster.  My company is putting up
a display for the upcoming Atlanta Linux Showcase (very upcoming, starts
Thursday).  One of the things we would like to demo is a simple load
balancing LVS web farm.  Configuration details are listed later.

We are having problems that I suspect may be related to arp (at least
some of them).  For example, some systems just disappear from the network
and I can't contact them, but they seem pretty much OK from the console
(both my directors and one real server are in that state now).  Other
times, when I am logged into, for example, realserver2, and I ssh into
realserver1, I find I am on realserver2 again.  In other words, when
I try to connect to realserver1 from realserver2, realserver2 honestly
thinks it is connecting me to realserver1, but in fact it connects me
to itself.

Here is the network configuration on realsrv2:

-----------------------------------------------------------------------------
[root@realsrv2 /]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:D0:B7:89:EA:99
          inet addr:192.168.11.202  Bcast:192.168.11.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:140621 errors:0 dropped:0 overruns:0 frame:0
          TX packets:127703 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:21 Base address:0x4000
 
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:26514 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26514 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
 
lo:1      Link encap:Local Loopback
          inet addr:192.168.11.200  Mask:255.255.255.254
          UP LOOPBACK RUNNING  MTU:3924  Metric:1 

[root@realsrv2 /]# netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.11.202  0.0.0.0         255.255.255.255 UH        0 0          0 eth0
192.168.11.0    0.0.0.0         255.255.255.0   U         0 0          0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         192.168.11.1    0.0.0.0         UG        0 0          0 eth0

[root@realsrv2 /]# nslookup realsrv2
Server:  firewall.als.transtech.cc
Address:  192.168.11.1
 
Non-authoritative answer:
Name:    realsrv2.als.transtech.cc
Address:  192.168.11.202
 
[root@realsrv2 /]# nslookup realsrv1
Server:  firewall.als.transtech.cc
Address:  192.168.11.1
 
Non-authoritative answer:
Name:    realsrv1.als.transtech.cc
Address:  192.168.11.201

[root@realsrv2 /]# ssh realsrv1
Last login: Tue Oct 10 11:20:42 2000 from realsrv1.als.transtech.cc
[root@realsrv2 /root]# hostname
realsrv2

[root@realsrv2 /root]# arp -a
firewall.als.transtech.cc (192.168.11.1) at 00:D0:B7:00:B6:5A [ether] on eth0
realsrv4.als.transtech.cc (192.168.11.204) at 00:D0:B7:1E:7B:32 [ether] on eth0
realsrv3.als.transtech.cc (192.168.11.203) at 00:D0:B7:89:F1:0F [ether] on eth0
[NOTE: no entry for realsrv1]
[root@realsrv2 /]# exit
Connection to realsrv1.als.transtech.cc closed.
[root@realsrv2 /]#
[root@realsrv2 /]# exit
exit
bash$ exit
logout
Connection to realsrv2.als.transtech.cc closed.
bash$ hostname
firewall
bash$ arp -a
? (192.168.1.1) at 00:A0:C5:E2:4F:F8 [ether] on eth0
dhcp-31.als.transtech.cc (192.168.10.31) at 08:00:46:05:5D:46 [ether] on eth2
realsrv4.als.transtech.cc (192.168.11.204) at 00:D0:B7:1E:7B:32 [ether] on eth1
realsrv1.als.transtech.cc (192.168.11.201) at 00:D0:B7:89:87:86 [ether] on eth1
realsrv3.als.transtech.cc (192.168.11.203) at 00:D0:B7:89:F1:0F [ether] on eth1
realsrv2.als.transtech.cc (192.168.11.202) at 00:D0:B7:89:EA:99 [ether] on eth1
bash$ ping -c 1 realsrv1
PING realsrv1.als.transtech.cc (192.168.11.201) from 192.168.11.1 : 56(84) bytes of data.
64 bytes from realsrv1.als.transtech.cc (192.168.11.201): icmp_seq=0 ttl=255 time=0.2 ms
 
--- realsrv1.als.transtech.cc ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.2/0.2/0.2 ms
bash$ ssh realsrv1
jsc@xxxxxxxxxxxxxxxxxxxxxxxxx's password:
Last login: Wed Oct 11 00:21:22 2000 from firewall.als.transtech.cc
bash$ hostname
realsrv1
bash$ arp -a
firewall.als.transtech.cc (192.168.11.1) at 00:D0:B7:00:B6:5A [ether] on eth0
bash$
[NOTE: nothing but the firewall in the arp cache]
[I try to ping the other realservers, and the primary LVS director]
[root@realsrv1 jsc]# arp -a
lvsd1.als.transtech.cc (192.168.11.222) at 00:D0:B7:00:CF:90 [ether] on eth0
firewall.als.transtech.cc (192.168.11.1) at 00:D0:B7:00:B6:5A [ether] on eth0
realsrv4.als.transtech.cc (192.168.11.204) at <incomplete> on eth0
realsrv2.als.transtech.cc (192.168.11.202) at <incomplete> on eth0
realsrv3.als.transtech.cc (192.168.11.203) at <incomplete> on eth0
[NOTE: I had somebody take down lo:1 on lvsd1]
----------------------------------------------------------------------------   

So, it appears that realsrv1 cannot connect to any of the other realservers,
and realsrv2 connects to itself when trying to connect to realsrv1.
When I run "ifconfig lo:1 down" on realsrv2, realsrv1 shows up in the
arp cache and I can connect to realsrv1 from realsrv2.  realsrv2's
arp entry also shows up in realsrv1's arp cache, and ping from realsrv1
to realsrv2 works.
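Comparing "arp -a" output across hosts by hand, as in the transcript above, gets tedious. Below is a minimal Python 3 sketch that parses lines in the format shown in this transcript, lists entries stuck at <incomplete>, and flags any IP address that two hosts resolve to different hardware addresses. The line format is taken from the output above (other arp versions may print differently), and the function names are made up for illustration.

```python
import ipaddress
import re

# Matches arp -a lines as printed in the transcript above, e.g.
#   realsrv1.als.transtech.cc (192.168.11.201) at 00:D0:B7:89:87:86 [ether] on eth1
#   realsrv4.als.transtech.cc (192.168.11.204) at <incomplete> on eth0
ARP_LINE = re.compile(
    r"^(?P<name>\S+) \((?P<ip>[\d.]+)\) at "
    r"(?P<hw><incomplete>|[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"
    r"(?: \[\w+\])? on (?P<iface>\S+)"
)


def parse_arp(text):
    """Return {ip: (name, mac_or_None, iface)} for each recognised line."""
    table = {}
    for line in text.splitlines():
        m = ARP_LINE.match(line.strip())
        if m is None:
            continue  # skip prompts, notes and anything unrecognised
        hw = None if m["hw"] == "<incomplete>" else m["hw"].upper()
        table[m["ip"]] = (m["name"], hw, m["iface"])
    return table


def incomplete(table):
    """IPs whose ARP request went unanswered, in numeric order."""
    return sorted((ip for ip, (_, hw, _) in table.items() if hw is None),
                  key=ipaddress.ip_address)


def mismatches(a, b):
    """IPs that both tables resolved, but to different hardware addresses."""
    return sorted((ip for ip in a.keys() & b.keys()
                   if a[ip][1] and b[ip][1] and a[ip][1] != b[ip][1]),
                  key=ipaddress.ip_address)
```

Fed realsrv1's last table above, incomplete() would report 192.168.11.202, 192.168.11.203 and 192.168.11.204.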

Things get better between systems when I remove the VIP from lo:1, but of
course that defeats the whole purpose.  Everything leads me
to believe this is an arp problem - NAT from the firewall should not
be involved as all these connections are behind the firewall on the
same subnet.
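One detail in the ifconfig output above may deserve a second look: lo:1 carries the VIP with netmask 255.255.255.254, a /31, rather than the single-host mask 255.255.255.255. A few lines of Python 3, using only the standard ipaddress module, show which addresses such a mask covers. That realsrv1's address falls inside it is plain arithmetic; whether that overlap explains realsrv2 connecting to itself when asked for realsrv1 is only a hypothesis, not something the transcript proves.

```python
import ipaddress

# The VIP alias on realsrv2 exactly as ifconfig reports it (lo:1).
vip_alias = ipaddress.ip_interface("192.168.11.200/255.255.255.254")
net = vip_alias.network
realsrv1 = ipaddress.ip_address("192.168.11.201")

print(net)                    # 192.168.11.200/31
print([str(a) for a in net])  # ['192.168.11.200', '192.168.11.201']
print(net.broadcast_address)  # 192.168.11.201
print(realsrv1 in net)        # True
```

Running the same check with 255.255.255.255 instead leaves 192.168.11.200 as the only address covered.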

Our configuration:
-----------------

I started with UltraMonkey, but its kernel patches were for an older
kernel version than the one I was already using, and I wanted a one-stop
solution to (hopefully) make things easier.

I downloaded the Piranha software and related items (all of them) from
Keith Barrett's page (ftp://people.redhat.com/kbarrett):

README                            kernel-pcmcia-cs-2.2.16-4.i386.rpm
ha-installer.tar.gz               kernel-smp-2.2.16-4.i386.rpm
ipvsadm-1.11-4.i386.rpm           kernel-source-2.2.16-4.i386.rpm
ipvsadm-1.11-4.src.rpm            kernel-utils-2.2.16-4.i386.rpm
kernel-2.2.16-4.i386.rpm          piranha-0.4.17-2.i386.rpm
kernel-2.2.16-4.src.rpm           piranha-0.4.17-2.src.rpm
kernel-BOOT-2.2.16-4.i386.rpm     piranha-docs-0.4.17-2.i386.rpm
kernel-doc-2.2.16-4.i386.rpm      piranha-gui-0.4.17-2.i386.rpm
kernel-headers-2.2.16-4.i386.rpm  rpm-3.0.5-9.6x.i386.rpm
kernel-ibcs-2.2.16-4.i386.rpm                             

Actually, the rpm package itself came from linuxberg.com; I needed
it in order to install the piranha* and ipvsadm* packages.  I also
had to install those two with "--nodeps", as they seemed to depend
on each other.

The hardware:

We have two types of server systems:

        Intel ISP1100 1U server, P3-650, 256MB, two IDE disks, two Pro100+
                Ethernet interfaces.  Henceforth referred to as "1U".

        Intel ISP2<something> 2U server, dual P3-750, 512 MB, two SCSI disks,
                one Pro100+ Ethernet interface.  Henceforth referred to as
                "2U".

We use one of the Intel 1Us as a firewall/router/NAT/DHCP server.  It has
three network interfaces (we added a 3Com 3C905B to it):

        To the Internet, IP 192.168.1.221, currently attached to Netgear
        RT314 router which is connected to a cable-modem.  Irrelevant
        to this discussion, in my opinion.

        To the internal server network, IP 192.168.11.1 (class C subnet),
        which is the subnet the two directors and four real servers are
        connected to via an Intel 460T 10/100 switch.

        To our client network, IP 192.168.10.1; it serves DHCP to
        this class C subnet.  We attach laptops here via an Intel 550T
        switch.  Clients are having no problems getting on the Internet
        from here.

        Software is Red Hat Linux 6.2:

                bash$ uname -msrpv
                Linux 2.2.14-6.1 #1 Tue Mar 14 14:22:53 EST 2000 i686 unknown

        I think this system may be running a hand-rolled kernel with some
        patches required by the VPN.

The firewall seems to be working fine, and we are fairly certain we have
NAT set up properly.

The two redundant directors are Intel 1Us.  We are currently only using
one interface, and direct routing.

Three of the real servers are Intel 2Us, as above, and one of them is
an Intel 1U, as above.

The OS is Red Hat 6.2 with the kernel updates from Keith Barrett's page:

        From 2U real server:
        bash$ uname -msrpv
        Linux 2.2.16-4smp #1 SMP Tue Jun 20 16:00:57 EDT 2000 i686 unknown

        From 1U real server:

To handle the ARP problem with direct routing, we put the VIP on
lo:1 and run the following commands on both real servers and directors:

        echo 1 > /proc/sys/net/ipv4/conf/all/hidden
        echo 1 > /proc/sys/net/ipv4/conf/eth0/hidden
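A quick way to confirm what those two echo commands left behind is to read the flags back. The Python 3 sketch below assumes, as the commands above imply, that the hidden patch in these kernels exposes one /proc/sys/net/ipv4/conf/<interface>/hidden file per interface; on a kernel without the patch it simply finds nothing. The function name is illustrative. Comparing the flags it prints with the interface that actually holds the VIP (lo:1 here) is one way to double-check the setup.

```python
from pathlib import Path

# Assumption: the "hidden" kernel patch exposes one file per interface at
# <conf_root>/<iface>/hidden, as the echo commands above imply.
DEFAULT_CONF_ROOT = Path("/proc/sys/net/ipv4/conf")


def hidden_flags(conf_root=DEFAULT_CONF_ROOT):
    """Return {interface: int(flag)} for every <iface>/hidden file found."""
    root = Path(conf_root)
    flags = {}
    if not root.is_dir():
        return flags
    for entry in sorted(root.iterdir()):
        hidden_file = entry / "hidden"
        if hidden_file.is_file():
            flags[entry.name] = int(hidden_file.read_text().strip() or 0)
    return flags


if __name__ == "__main__":
    found = hidden_flags()
    if not found:
        print("no 'hidden' entries found; is the hidden patch in this kernel?")
    for iface, value in found.items():
        print(f"{iface}: hidden={value}")
```

The conf_root parameter exists so the same function can be pointed at a copied or test directory instead of the live /proc tree.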

ssh is installed and configured to let the directors and realservers log
into each other as root without passwords.  I am not that happy about
this, but we are in a testing phase and behind two firewalls right now,
so I can live with it.

Here is the current /etc/lvs.cf:
------------------------------------------------------------------
primary = 192.168.11.222
service = lvs
rsh_command = ssh
backup_active = 1
backup = 192.168.11.223
heartbeat = 1
heartbeat_port = 539
keepalive = 6
deadtime = 18
network = direct
virtual lvsweb {
     active = 1
     address = 192.168.11.200 lo:1
     port = 80
     send = "GET / HTTP/1.0\r\n\r\n"
     expect = "HTTP"
     load_monitor = uptime
     scheduler = wrr
     protocol = tcp
     timeout = 6
     reentry = 15
     server RealWeb1 {
         address = 192.168.11.201
         active = 1
         weight = 2
     }
     server RealWeb2 {
         address = 192.168.11.202
         active = 1
         weight = 2
     }
     server RealWeb3 {
         address = 192.168.11.203
         active = 1
         weight = 2
     }
     server RealWeb4 {
         address = 192.168.11.204
         active = 1
         weight = 1
     }
}
------------------------------------------------------------------
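With "scheduler = wrr" and weights of 2, 2, 2 and 1, RealWeb1 through RealWeb3 should each receive two new connections for every one that RealWeb4 receives. The Python 3 sketch below simulates the gcd-based interleaved weighted round-robin described in the LVS documentation to show the resulting order. It is a model, not the kernel's ip_vs_wrr code, and it ignores any weight changes nanny may make based on the load monitor.

```python
from functools import reduce
from math import gcd

# Weights taken from the server blocks in lvs.cf above.
SERVERS = [("RealWeb1", 2), ("RealWeb2", 2), ("RealWeb3", 2), ("RealWeb4", 1)]


def wrr_schedule(servers, n):
    """Return the names of the servers chosen for the next n connections."""
    weights = [w for _, w in servers]
    max_w = max(weights, default=0)
    if max_w <= 0:
        raise ValueError("at least one server needs a positive weight")
    step = reduce(gcd, weights)  # gcd of all weights
    i, cw = -1, 0                # last index visited, current weight threshold
    picks = []
    while len(picks) < n:
        i = (i + 1) % len(servers)
        if i == 0:               # completed a pass: lower the threshold
            cw -= step
            if cw <= 0:
                cw = max_w
        if weights[i] >= cw:     # server is heavy enough for this pass
            picks.append(servers[i][0])
    return picks


if __name__ == "__main__":
    print(wrr_schedule(SERVERS, 7))
```

For these four servers the first seven picks come out as RealWeb1, RealWeb2, RealWeb3, RealWeb1, RealWeb2, RealWeb3, RealWeb4, after which the cycle repeats. Because LVS makes this choice once per TCP connection, repeated page loads carried over one kept-alive connection would all reach the same real server.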

I created it using the Piranha GUI.  By the way, the GUI has yet
to ask me for a password when I connect, even though I tried
"/usr/sbin/piranha-passwd" and set the piranha user password
in /etc/shadow too.  I consider this a bug (or operator error,
please tell me what I am doing wrong), not a feature.

Unfortunately, I am not on site right now, and neither LVS director
is talking on the network.  I had somebody "ifconfig lo:1 down"
but that has not helped (it did in other situations).  When I did
run "ipvsadm -l" on the directors, it seemed fine.  A look at the
logs showed that "nanny" was happily using ssh to run uptime
on the various boxes.  I expected nanny to run on the directors,
which it does, but I did not expect it to run on the realservers,
watching each other, which it also does (successfully, when network
connectivity allows).
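The "load_monitor = uptime" setting is presumably why the logs show nanny running uptime over ssh. As a hypothetical illustration only (this is not nanny's code, and how nanny turns load into weights is not covered), here is one way to pull the three load averages out of uptime output in Python 3. The remote_load helper and its ssh invocation are assumptions modelled on what the logs describe.

```python
import re
import subprocess

# Matches the tail of uptime output, for example
# " 11:49am  up 3 days,  2:01,  2 users,  load average: 0.15, 0.10, 0.05"
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load(uptime_output):
    """Return the 1-, 5- and 15-minute load averages as a tuple of floats."""
    m = LOAD_RE.search(uptime_output)
    if m is None:
        raise ValueError(f"no load average found in {uptime_output!r}")
    return tuple(float(x) for x in m.groups())


def remote_load(host):
    """Hypothetical helper: run uptime on host over ssh and parse the result.

    Assumes key-based ssh that needs no password, like the setup described
    above; check=True raises CalledProcessError if ssh fails.
    """
    result = subprocess.run(["ssh", host, "uptime"],
                            capture_output=True, text=True, check=True)
    return parse_load(result.stdout)
```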

We did try connecting to the web service.  Our configuration requires
going through the firewall, which forwards all traffic for port
80 to the VIP (192.168.11.200).  We did get results - often we
were connected to the same server over and over.  Sometimes we
got behavior that seemed to be load balancing.  The sad thing
is that with two laptops side by side, one might see what looked
like load balancing while the other seemed to get the
same page over and over.  Our testing method is to put a root
index.html on each realserver that shows the realserver's
hostname, and then connect to the web page and hit reload
repeatedly.
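Hitting reload by hand makes the distribution hard to judge. A small Python 3 sketch that automates the same test: it fetches the page repeatedly, each time over a new TCP connection, and counts which answer comes back. It assumes each realserver's index.html contains little more than its hostname, as described above; the function names, the timeout and the address you point it at are placeholders.

```python
import http.client
from collections import Counter


def fetch_page(host, port=80, path="/"):
    """GET path over a brand-new TCP connection and return the body text.

    A fresh HTTPConnection per call (plus "Connection: close") makes every
    request a separate connection, so each one is scheduled anew rather
    than riding on a kept-alive connection the way a browser reload may.
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        return conn.getresponse().read().decode("latin-1").strip()
    finally:
        conn.close()


def tally(fetch, n):
    """Call fetch() n times and count how often each answer comes back."""
    return Counter(fetch() for _ in range(n))
```

For example, tally(lambda: fetch_page("192.168.11.200"), 70) run from a machine that can reach the VIP; with the weights in lvs.cf, counts near 20, 20, 20 and 10 would suggest wrr is doing its job.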

I get the feeling that we are just a few feet away from success
on this simple project, but the final pieces of the puzzle are
eluding me at the moment.  Help!

--
John Cronin

