hmm, I'm still having problems on my end. But at least
it's good to hear a success story so I know I'm not laboring in
vain.
OK folks, to reiterate the problem:
I have the LVS front end working (it rolls the IPs right),
but on the back end it's not doing rr at all. If I make changes to the
ipchains and ipmasqadm/ipvsadm rulesets it changes which server gets hit,
but then it, once again, gets stuck serving from that one machine only. I do
not know what the hell the problem is. I've worked on this with Horms at some
length and even he seems kind of stumped as to why it's doing what it's doing.
I was considering ripping everything out on the 2 LVS nodes and restarting
from scratch, but that's a no-go due to time constraints.
All I know is I've been working on this for 3 weeks now
and can NOT get it going. (Kind of disheartening, too, when I see a news
report saying Red Hat has released a clustering server, yet even after calling
and talking to them I got nowhere with their help.)
We are currently running UltraMonkey after the fiasco with
Piranha. UM got further toward working right than Piranha did (at least it
could assign IPs). The problem now appears to be the back end (load
balancing).
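For anyone reading along, the quickest check I know of for whether the
director is even consulting the rr scheduler is to watch the connection
counters while requests come in. These are just the standard listing
commands, nothing specific to this setup:

# per-real-server counters kept by LVS; if rr were doing its thing,
# connections should spread across the three real servers
ipvsadm -L -n

# active masqueraded connections on the director
ipchains -M -L -n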
I have the following modules installed: ip_masq_portfw, ip_vs_rr,
ip_masq_mfw, softdog, and ip_masq_autofw.
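For completeness, the quick way to double-check that those actually loaded
(plain lsmod, nothing UltraMonkey-specific):

lsmod | egrep 'ip_vs|ip_masq|softdog'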
I am using kernel version 2.2.14-5.0.14.um.3smp which
comes with the UltraMonkey project at ultramonkey.sourceforge.net.
I am using the following rulesets for ipchains, ipmasqadm,
and ipvsadm:
ipchains -A forward -s 192.168.1.0/24 -j MASQ
ipchains -A input -j ACCEPT -i eth1
ipchains -A output -j ACCEPT -i eth1
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 www
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 domain
ipchains -A input -j ACCEPT -p udp -d 216.200.192.111 domain
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 ssh
ipchains -A input -j ACCEPT -p udp -d 216.200.192.111 ssh
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 telnet
(Needed?) ipchains -A forward -s 192.168.1.0/24 -d 192.168.1.0/24 -j ACCEPT
(Needed?) ipchains -A forward -s 216.200.192.0/24 -d 192.168.1.0/24 -j ACCEPT
ipchains -M -S 7200 10 160
ipchains -I input -p tcp -y -d 192.168.1.0/32 80 -m 1
ipmasqadm mfw -I -m 3 -r 192.168.1.12 80 -p 10
ipmasqadm mfw -I -m 2 -r 192.168.1.11 80 -p 10
ipmasqadm mfw -I -m 1 -r 192.168.1.10 80 -p 10
ipmasqadm autofw -A -r tcp 80 80 -h 192.168.1.12
ipmasqadm autofw -A -r tcp 80 80 -h 192.168.1.11
ipmasqadm autofw -A -r tcp 80 80 -h 192.168.1.10
ipvsadm -A -t 216.200.192.111:80 -s rr
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.12 -m 1
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.11 -m 2
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.10 -m 3
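For comparison only (this is a sketch of a plain LVS-NAT setup for the same
addresses, not a claim about what's wrong here): the stock recipe is just the
masquerade rule plus the ipvsadm entries, with nothing else grabbing port 80:

# masquerade replies from the real-server network on the way back out
ipchains -A forward -s 192.168.1.0/24 -j MASQ

# one virtual service, round-robin scheduling
ipvsadm -A -t 216.200.192.111:80 -s rr

# three masqueraded (NAT) real servers
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.10 -m
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.11 -m
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.12 -m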
The output of ipchains -L is as follows.
[root@vs-00 /root]# ipchains -L
Chain input (policy ACCEPT):
target  prot opt     source             destination        ports
-       tcp  -y----  anywhere           192.168.1.0        any ->   www
ACCEPT  all  ------  anywhere           anywhere           n/a
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   www
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   domain
ACCEPT  udp  ------  anywhere           www.qixo.org       any ->   domain
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   ssh
ACCEPT  udp  ------  anywhere           www.qixo.org       any ->   ssh
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   telnet
Chain forward (policy ACCEPT):
target  prot opt     source             destination        ports
MASQ    all  ------  192.168.1.0/24     anywhere           n/a
ACCEPT  all  ------  192.168.1.0/24     192.168.1.0/24     n/a
ACCEPT  all  ------  216.200.192.0/24   192.168.1.0/24     n/a
Chain output (policy ACCEPT):
target  prot opt     source             destination        ports
ACCEPT  all  ------  anywhere           anywhere           n/a
[root@vs-00 /root]#
My /etc/ha.d/ha.cf file is as follows.
[root@vs-00 /root]# cat /etc/ha.d/ha.cf
#
#       /etc/ha.d/ha.cf
#
#       ha.cf file to configure two nodes connected by ethernet (eth0) and
#       a null modem (/dev/ttyS0).
#
#       Based on sample ha.cf shipped with heartbeat
#
#       Prepared: April 2000
#
#       There are lots of options in this file.  All you have to have is a set
#       of nodes listed {"node ...}
#       and one of {serial, udp, or ppp-udp}
#
#       Note on logging:
#       If any of debugfile, logfile and logfacility are defined then they
#       will be used.  If debugfile and/or logfile are not defined and
#       logfacility is defined then the respective logging and debug
#       messages will be logged to syslog.  If logfacility is not defined
#       then debugfile and logfile will be used to log messages.  If
#       logfacility is not defined and debugfile and/or logfile are not
#       defined then defaults will be used for debugfile and logfile as
#       required and messages will be sent there.
#
#       File to write debug messages to
#debugfile /var/log/ha-debug
#
#       File to write other messages to
#
#logfile        /var/log/ha-log
#
#       Facility to use for syslog()/logger
#
#logfacility    local0
#
#       keepalive: how many seconds between heartbeats
#
keepalive 3
#
#       deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#       hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#       serial  serialportname ...
serial  /dev/ttyS0
#
#       Only for serial ports. It applies to both PPP/UDP and "raw" ports
#
#       This means run PPP over ports ttyS1 and ttyS2
#       Their respective IP addresses are as listed.
#       Note that I enforce that these are local addresses.  Other addresses
#       are almost certainly a mistake.
#ppp-udp /dev/ttyS1 10.0.0.1 /dev/ttyS2 10.0.0.2
#
#       Baud rate for both serial and ppp-udp ports...
#
baud    19200
#
#       What UDP port to use for udp or ppp-udp communication?
#
udpport 694
#
#       What interfaces to heartbeat over?
#
udp     eth0
udp     eth1
#
#       Watchdog is the watchdog timer.  If our own heart doesn't beat for
#       a minute, then our machine will reboot.
#
watchdog /dev/watchdog
#
#       Nice_failback sets the behavior when performing a failback:
#
#       - if it's on, when the primary node starts or comes back from any
#         failure and the cluster is already active, i.e. the secondary
#         server performed a failover, the primary stays quiet, acting as
#         a secondary.  This way some operations like syncing disks can be
#         easily done.
#       - if it's off (default), the primary node will always be the primary,
#         whenever it's powered on.
#
nice_failback off
#
#       Note on logging:
#       If any of debugfile, logfile and logfacility are defined then they will
#       be used.  If debugfile and/or logfile are not defined and logfacility
#       is defined then the respective logging and debug messages will be
#       logged to syslog.  If logfacility is not defined then debugfile and
#       logfile will be used to log messages.  If logfacility is not defined
#       and debugfile and/or logfile are not defined then defaults will be
#       used for debugfile and logfile as required and messages will be sent
#       there.
#
#       File to write debug messages to
debugfile /var/log/ha-debug
#
#       File to write other messages to
#
logfile /var/log/ha-log
#
#       Facility to use for syslog()/logger
#
logfacility     local0
#
#       Tell what machines are in the cluster
#       node    nodename ...    -- must match uname -n
node    vs-00.qixo.org
node    vs-01.qixo.org
[root@vs-00 /root]#
My /etc/ha.d/haresources is as follows.
[root@vs-00 /root]# cat /etc/ha.d/haresources
#-------------------------------------------------------------------
vs-00.qixo.org IPaddr::216.200.192.111/24/eth0 IPaddr::192.168.1.254/24/eth1 ldirectord::ldirectord.cf
[root@vs-00 /root]#
My /etc/ha.d/conf/ldirectord.cf is as follows.
[root@vs-00 /root]# cat /etc/ha.d/conf/ldirectord.cf
#
# Sample ldirectord configuration file to configure a virtual http
# service on TCP port 80 for a single IP address with 2 real
# servers using network address translation (masquerading).
#
# Ldirectord will connect to each real server once per second
# and request /index.html.  If the data returned by the server
# does not contain the string "Test Message" then the
# test fails and the real server will be taken out of the available
# pool.  The real server will be added back into the pool once the
# test succeeds.  If all real servers are removed from the pool then
# localhost:80 is added to the pool as a fallback measure.
#
# Prepared: April 2000

timeout=3
checkinterval=1
fallback=192.168.1.12:80
virtual=216.200.192.111:80
        real=192.168.1.12:80 masq
        real=192.168.1.10:80 masq
        real=192.168.1.11:80 masq
        service=http
        request="index.html"
        receive="QIXO"
        scheduler=rr
        #persistent=600
        protocol=tcp

#virtual=1
#       real=192.168.6.4:80 masq
#       real=192.168.6.5:80 masq
#       service=http
#       request="index.html"
#       receive="Test Page"
#       scheduler=rr
#       #persistent=600
[root@vs-00 /root]#
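To take ldirectord's health check out of the equation, here is a rough
by-hand version of what it does, run from the director (this assumes lynx is
installed there and each real server serves index.html at its document root):

# fetch index.html from each real server and look for the receive string
for rs in 192.168.1.10 192.168.1.11 192.168.1.12
do
    echo "checking $rs"
    lynx -dump http://$rs/index.html | grep QIXO
done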
Right now we've had to throw an external IP on one of our
core web servers so we can develop.
I've tested to make sure that rr is NOT working by
writing a script that has lynx slam the site via the "fake" IP and grep for
the letters WS in the page. We put WS-00 and WS-01 into the HTML on 2 of the
servers and left the main box's page alone (no WS).
Here is the script I'm using.
pgpkeys@manowar:~[ 2:26PM]>cat bin/slamit
#!/bin/bash
pgpkeys@manowar:~[ 2:26PM]>
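In case that listing didn't come through: the script boils down to something
like the following (a reconstruction from the description above, not the
literal file; the loop count and lynx flags are guesses, and 216.200.192.111
is the "fake" IP from the rulesets):

#!/bin/bash
# hit the virtual IP repeatedly and print the WS-xx marker line
# embedded in each real server's index page
for i in 1 2 3 4 5 6 7 8 9 10
do
    lynx -dump http://216.200.192.111/ | grep WS
done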
All requests come in and go to WS-00 **ONLY**, as
evidenced by the output from the slamit script:
pgpkeys@manowar:~[ 2:26PM]>slamit
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
pgpkeys@manowar:~[ 2:30PM]>
Really fur-burning weird. I have no clue where to go
from here. I'm of the mind that it's either
A) the kernel modules are not doing their jobs, or
B) my rulesets are messed up BAD.
I have been hunting for some tool that will follow an
inbound packet through the rulesets, telling me exactly which rules it's
affected by, the contents of each rule, which machine it's hitting, and
the same thing for the return trip. I need a tool that gives me fine-grained
enough control to follow that packet every step of the way.
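The closest stand-in I can think of with stock tools is watching per-rule
counters and kernel logging around a single test request (again, just
standard ipchains/tcpdump usage, untested against this exact setup):

# zero the rule counters on the director...
ipchains -Z
# ...make a single request from an outside machine, e.g.
#   lynx -dump http://216.200.192.111/
# ...then see which rules' packet counters moved
ipchains -L -v -n

# watch the packet arrive/leave on each interface of the director
tcpdump -n -i eth0 port 80
tcpdump -n -i eth1 port 80

# or add -l to a specific rule so the kernel logs every packet it matches,
# e.g. the marking rule from the ruleset above
ipchains -I input -p tcp -y -d 192.168.1.0/32 80 -m 1 -l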
---
David D.W. Downey
RHCE, UNIX/Linux/Win 9x Administrator
Linux Systems Administrator
Member OSWG, LPI, SAGE, HTML Writers Guild
QIXO, Inc.
Certified Internet Security Specialist
http://www.QIXO.com
W: (408) 514-6400   F: (408) 516-9090