hmm, I'm still having problems on my end. But at least
it's good to hear a success story so I know I'm not laboring in
vain.
OK folks, to reiterate the problem:
I have the LVS front end working (it rolls the IPs right),
but on the back end it's not doing rr at all. If I make changes to the
ipchains and ipmasqadm/ipvsadm rulesets it changes which server gets hit,
but then it, once again, gets stuck serving from that one machine only. I do
not know what the hell the problem is. I've worked on this with Horms at some
length and even he seems kind of stumped as to why it's doing what it's doing.
I was considering ripping everything out on the 2 LVS nodes and restarting
from scratch, but that's a no-go due to time constraints.
All I know is I've been working on this for 3 weeks now
and can NOT get it going. (Kind of disheartening, too, when I see a news
report saying Red Hat has released a clustering server, yet even after calling
and talking to them I got nowhere with their help.)
We are currently running UltraMonkey after the fiasco with
Piranha. UM got further toward working right than Piranha did (at least it
could assign IPs). The problem now appears to be the back end (load
balancing).
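For anyone reading along, the quickest check I know of for whether the
director is even consulting the rr scheduler is to watch the connection
counters while requests come in. These are just the standard listing
commands, nothing specific to this setup:

# per-real-server counters kept by LVS; if rr were doing its thing,
# connections should spread across the three real servers
ipvsadm -L -n

# active masqueraded connections on the director
ipchains -M -L -n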
I have the following modules installed: ip_masq_portfw, ip_vs_rr,
ip_masq_mfw, softdog, and ip_masq_autofw.
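For completeness, the quick way to double-check that those actually loaded
(plain lsmod, nothing UltraMonkey-specific):

lsmod | egrep 'ip_vs|ip_masq|softdog'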
I am using kernel version 2.2.14-5.0.14.um.3smp which
comes with the UltraMonkey project at ultramonkey.sourceforge.net.
I am using the following rulesets for ipchains, ipmasqadm,
and ipvsadm:
ipchains -A forward -s 192.168.1.0/24 -j MASQ
ipchains -A input -j ACCEPT -i eth1
ipchains -A output -j ACCEPT -i eth1
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 www
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 domain
ipchains -A input -j ACCEPT -p udp -d 216.200.192.111 domain
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 ssh
ipchains -A input -j ACCEPT -p udp -d 216.200.192.111 ssh
ipchains -A input -j ACCEPT -p tcp -y -d 216.200.192.111 telnet
(Needed?) ipchains -A forward -s 192.168.1.0/24 -d 192.168.1.0/24 -j ACCEPT
(Needed?) ipchains -A forward -s 216.200.192.0/24 -d 192.168.1.0/24 -j ACCEPT
ipchains -M -S 7200 10 160
ipchains -I input -p tcp -y -d 192.168.1.0/32 80 -m 1
ipmasqadm mfw -I -m 3 -r 192.168.1.12 80 -p 10
ipmasqadm mfw -I -m 2 -r 192.168.1.11 80 -p 10
ipmasqadm mfw -I -m 1 -r 192.168.1.10 80 -p 10
ipmasqadm autofw -A -r tcp 80 80 -h 192.168.1.12
ipmasqadm autofw -A -r tcp 80 80 -h 192.168.1.11
ipmasqadm autofw -A -r tcp 80 80 -h 192.168.1.10
ipvsadm -A -t 216.200.192.111:80 -s rr
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.12 -m 1
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.11 -m 2
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.10 -m 3
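For comparison only (this is a sketch of a plain LVS-NAT setup for the same
addresses, not a claim about what's wrong here): the stock recipe is just the
masquerade rule plus the ipvsadm entries, with nothing else grabbing port 80:

# masquerade replies from the real-server network on the way back out
ipchains -A forward -s 192.168.1.0/24 -j MASQ

# one virtual service, round-robin scheduling
ipvsadm -A -t 216.200.192.111:80 -s rr

# three masqueraded (NAT) real servers
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.10 -m
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.11 -m
ipvsadm -a -t 216.200.192.111:80 -r 192.168.1.12 -m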
The output of ipchains -L is as follows.
[root@vs-00 /root]# ipchains -L
Chain input (policy ACCEPT):
target  prot opt     source             destination        ports
-       tcp  -y----  anywhere           192.168.1.0        any ->   www
ACCEPT  all  ------  anywhere           anywhere           n/a
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   www
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   domain
ACCEPT  udp  ------  anywhere           www.qixo.org       any ->   domain
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   ssh
ACCEPT  udp  ------  anywhere           www.qixo.org       any ->   ssh
ACCEPT  tcp  -y----  anywhere           www.qixo.org       any ->   telnet
Chain forward (policy ACCEPT):
target  prot opt     source             destination        ports
MASQ    all  ------  192.168.1.0/24     anywhere           n/a
ACCEPT  all  ------  192.168.1.0/24     192.168.1.0/24     n/a
ACCEPT  all  ------  216.200.192.0/24   192.168.1.0/24     n/a
Chain output (policy ACCEPT):
target  prot opt     source             destination        ports
ACCEPT  all  ------  anywhere           anywhere           n/a
[root@vs-00 /root]#
My /etc/ha.d/ha.cf file is as follows.
[root@vs-00 /root]# cat /etc/ha.d/ha.cf
#
#       /etc/ha.d/ha.cf
#
#       ha.cf file to configure two nodes connected by ethernet (eth0) and
#       a null modem (/dev/ttyS0).
#
#       Based on sample ha.cf shipped with heartbeat
#
#       Prepared: April 2000
#
#       There are lots of options in this file.  All you have to have is a set
#       of nodes listed {"node ...}
#       and one of {serial, udp, or ppp-udp}
#
#       Note on logging:
#       If any of debugfile, logfile and logfacility are defined then they
#       will be used.  If debugfile and/or logfile are not defined and
#       logfacility is defined then the respective logging and debug
#       messages will be logged to syslog.  If logfacility is not defined
#       then debugfile and logfile will be used to log messages.  If
#       logfacility is not defined and debugfile and/or logfile are not
#       defined then defaults will be used for debugfile and logfile as
#       required and messages will be sent there.
#
#       File to write debug messages to
#debugfile /var/log/ha-debug
#
#       File to write other messages to
#
#logfile        /var/log/ha-log
#
#       Facility to use for syslog()/logger
#
#logfacility    local0
#
#       keepalive: how many seconds between heartbeats
#
keepalive 3
#
#       deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#       hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#       serial  serialportname ...
serial  /dev/ttyS0
#
#       Only for serial ports. It applies to both PPP/UDP and "raw" ports
#
#       This means run PPP over ports ttyS1 and ttyS2
#       Their respective IP addresses are as listed.
#       Note that I enforce that these are local addresses.  Other addresses
#       are almost certainly a mistake.
#ppp-udp /dev/ttyS1 10.0.0.1 /dev/ttyS2 10.0.0.2
#
#       Baud rate for both serial and ppp-udp ports...
#
baud    19200
#
#       What UDP port to use for udp or ppp-udp communication?
#
udpport 694
#
#       What interfaces to heartbeat over?
#
udp     eth0
udp     eth1
#
#       Watchdog is the watchdog timer.  If our own heart doesn't beat for
#       a minute, then our machine will reboot.
#
watchdog /dev/watchdog
#
#       Nice_failback sets the behavior when performing a failback:
#
#       - if it's on, when the primary node starts or comes back from any
#         failure and the cluster is already active, i.e. the secondary
#         server performed a failover, the primary stays quiet, acting as
#         a secondary.  This way some operations like syncing disks can be
#         easily done.
#       - if it's off (default), the primary node will always be the primary,
#         whenever it's powered on.
#
nice_failback off
#
#       Note on logging:
#       If any of debugfile, logfile and logfacility are defined then they will
#       be used.  If debugfile and/or logfile are not defined and logfacility
#       is defined then the respective logging and debug messages will be
#       logged to syslog.  If logfacility is not defined then debugfile and
#       logfile will be used to log messages.  If logfacility is not defined
#       and debugfile and/or logfile are not defined then defaults will be
#       used for debugfile and logfile as required and messages will be sent
#       there.
#
#       File to write debug messages to
debugfile /var/log/ha-debug
#
#       File to write other messages to
#
logfile /var/log/ha-log
#
#       Facility to use for syslog()/logger
#
logfacility     local0
#
#       Tell what machines are in the cluster
#       node    nodename ...    -- must match uname -n
node    vs-00.qixo.org
node    vs-01.qixo.org
[root@vs-00 /root]#
My /etc/ha.d/haresources is as follows.
[root@vs-00 /root]# cat /etc/ha.d/haresources
#-------------------------------------------------------------------
vs-00.qixo.org IPaddr::216.200.192.111/24/eth0 IPaddr::192.168.1.254/24/eth1 ldirectord::ldirectord.cf
[root@vs-00 /root]#
My /etc/ha.d/conf/ldirectord.cf is as follows.
[root@vs-00 /root]# cat /etc/ha.d/conf/ldirectord.cf
#
# Sample ldirectord configuration file to configure a virtual http
# service on TCP port 80 for a single IP address with 2 real
# servers using network address translation (masquerading).
#
# Ldirectord will connect to each real server once per second
# and request /index.html.  If the data returned by the server
# does not contain the string "Test Message" then the
# test fails and the real server will be taken out of the available
# pool.  The real server will be added back into the pool once the
# test succeeds.  If all real servers are removed from the pool then
# localhost:80 is added to the pool as a fallback measure.
#
# Prepared: April 2000

timeout=3
checkinterval=1
fallback=192.168.1.12:80
virtual=216.200.192.111:80
        real=192.168.1.12:80 masq
        real=192.168.1.10:80 masq
        real=192.168.1.11:80 masq
        service=http
        request="index.html"
        receive="QIXO"
        scheduler=rr
        #persistent=600
        protocol=tcp

#virtual=1
#       real=192.168.6.4:80 masq
#       real=192.168.6.5:80 masq
#       service=http
#       request="index.html"
#       receive="Test Page"
#       scheduler=rr
#       #persistent=600
[root@vs-00 /root]#
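To take ldirectord's health check out of the equation, here is a rough
by-hand version of what it does, run from the director (this assumes lynx is
installed there and each real server serves index.html at its document root):

# fetch index.html from each real server and look for the receive string
for rs in 192.168.1.10 192.168.1.11 192.168.1.12
do
    echo "checking $rs"
    lynx -dump http://$rs/index.html | grep QIXO
done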
Right now we've had to throw an external IP on one of our
core web servers so we can develop.
I've tested to make sure that rr is NOT working by
writing a script that has lynx slam the site via the "fake" IP and grep for
the letters WS in the page. We put WS-00 and WS-01 into the HTML on 2 of the
servers and left the main box's page alone (no WS).
Here is the script I'm using.
pgpkeys@manowar:~[ 2:26PM]>cat bin/slamit
#!/bin/bash
pgpkeys@manowar:~[ 2:26PM]>
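In case that listing didn't come through: the script boils down to something
like the following (a reconstruction from the description above, not the
literal file; the loop count and lynx flags are guesses, and 216.200.192.111
is the "fake" IP from the rulesets):

#!/bin/bash
# hit the virtual IP repeatedly and print the WS-xx marker line
# embedded in each real server's index page
for i in 1 2 3 4 5 6 7 8 9 10
do
    lynx -dump http://216.200.192.111/ | grep WS
done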
All requests come in and go to WS-00 **ONLY**, as
evidenced by the output from the slamit script:
pgpkeys@manowar:~[ 2:26PM]>slamit
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
WS-00
pgpkeys@manowar:~[ 2:30PM]>
Really fur-burning weird. I have no clue where to go
from here. I'm of the mind that it's either
A) the kernel modules are not doing their jobs, or
B) my rulesets are messed up BAD.
I have been hunting for some tool that will follow an
inbound packet through the rulesets, telling me exactly which rules it's
affected by, the contents of each rule, which machine it's hitting, and
the same thing for the return trip. I need a tool that gives me fine-grained
enough control to follow that packet every step of the way.
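The closest stand-in I can think of with stock tools is watching per-rule
counters and kernel logging around a single test request (again, just
standard ipchains/tcpdump usage, untested against this exact setup):

# zero the rule counters on the director...
ipchains -Z
# ...make a single request from an outside machine, e.g.
#   lynx -dump http://216.200.192.111/
# ...then see which rules' packet counters moved
ipchains -L -v -n

# watch the packet arrive/leave on each interface of the director
tcpdump -n -i eth0 port 80
tcpdump -n -i eth1 port 80

# or add -l to a specific rule so the kernel logs every packet it matches,
# e.g. the marking rule from the ruleset above
ipchains -I input -p tcp -y -d 192.168.1.0/32 80 -m 1 -l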
---
David D.W. Downey
RHCE, UNIX/Linux/Win 9x Administrator
Linux Systems Administrator
Member OSWG, LPI, SAGE, HTML Writers Guild
QIXO, Inc.
Certified Internet Security Specialist
http://www.QIXO.com
W: (408) 514-6400   F: (408) 516-9090