So I've debugged quite a bit of this, and have narrowed the issue
down to just HTTP for now. I have it working within the cluster now:
the director is communicating properly with the Apache real servers,
I can bring a node online or offline by moving the file that it
requests, and it is finally sending test packets every 2 seconds as
configured.
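(For reference, that 2-second interval corresponds to the `checkinterval` directive in the global section of ldirectord.cf; a minimal fragment, assuming the stock option name:)

```
# ldirectord.cf, global section
checkinterval=2   # seconds between real-server service checks
```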
But . . .
I still can't get a connection to the VIP via an HTTP request from an
outside box. The director is advertising the VIP properly. I am
using DR (gate) packet forwarding, with the VIP configured on a lo:0
interface on each real server. That part is working, because if I take
lo:0 down, the node goes offline in the cluster.
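For what it's worth, with DR the real servers also have to be stopped from answering ARP for the VIP, or they can intercept traffic meant for the director. A minimal lo:0 sketch for a 2.6 kernel, using the VIP from the quoted config below (interface names assumed):

```shell
# On each real server: bind the VIP to lo:0 with a host netmask so
# the box accepts traffic addressed to it, but never ARPs for it.
ifconfig lo:0 64.34.209.34 netmask 255.255.255.255 up

# 2.6 kernels: suppress ARP replies for addresses bound to lo
sysctl -w net.ipv4.conf.all.arp_ignore=1
sysctl -w net.ipv4.conf.all.arp_announce=2
# (2.4 kernels used the Ultra Monkey "hidden" patch instead:
#  echo 1 > /proc/sys/net/ipv4/conf/lo/hidden)
```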
I have done some debugging on this and narrowed the problem down a bit:
1. The real server is up and running. I can make an HTTP request to
the RIP of the real server and get the desired response.
2. The packets are never reaching the real server. I figured this
out by putting up a firewall rule on the real server blocking port 80
from my IP. The error was still "connection refused", not the
timeout error that should have occurred. (The director was not
blocked by the firewall, only the IP I was requesting from.)
Conversely, when I put the same rule up on port 80 on the director
box, it fails with the expected timeout error, not "connection
refused".
3. My first instinct at this point was to make sure that forwarding is
set up properly. I checked my sysctl.conf file, and sure enough:
net.ipv4.ip_forward = 1
is set properly. I checked sysctl.conf on the real server too,
and everything appears to be in order, but that isn't the concern yet,
since when I firewalled that server it should have timed out regardless
of the sysctl settings.
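One caveat with step 3: sysctl.conf is only read at boot, so it's worth confirming the live value as well, e.g.:

```shell
# Read the running kernel's setting directly; 1 = forwarding enabled
cat /proc/sys/net/ipv4/ip_forward
# equivalently, if the sysctl utility is installed:
# sysctl net.ipv4.ip_forward
```

(As an aside, my understanding from the LVS docs is that ip_forward only matters for LVS-NAT; DR forwarding happens inside IPVS itself.)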
Next steps:
My next idea is to install Apache on the director to see if it is
trying to handle HTTP requests to the VIP itself rather than
forwarding them, but this is a bit of a hassle, and I don't know if
it would show me anything useful.
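A lighter-weight check than installing Apache might be tcpdump on the director itself; a sketch, assuming eth0 and the VIP from the quoted config below. If the client's SYNs show up here but never appear in a matching tcpdump on a real server, the director is accepting them but not forwarding them:

```shell
# Watch the director's public interface for VIP-bound traffic
tcpdump -n -i eth0 host 64.34.209.34 and port 80
```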
Given all that, does anyone have any thoughts? Has anyone run into
(and beaten) a similar error?
Many thanks in advance.
On 8/29/06, Matthew Story <matthewstory@xxxxxxxxx> wrote:
So I've got a better handle on the HTTP error now. First, the setup:
two dual-core AMD Athlon 64 servers are serving as the director
boxes, each running Ultra Monkey 3. All of the servers are at a data
center and sit directly on the WAN.
What seems to be happening is that when I start heartbeat and
ldirectord on one of the directors, it makes an HTTP check request to
the Apache real servers. But when I run tcpdump on the Apache real
servers, that request only seems to arrive the first time ipvsadm is
run on the director box. After that it makes no HTTP requests to the
machine at all, and although the box appears to be clustered when I
do an ipvsadm -L -n, connections are refused. I have no firewalls
running right now, so that is not the issue. Here is the section of
the ldirectord.cf file pertaining to the HTTP services:
virtual=64.34.209.34:80
    fallback=127.0.0.1:80
    real=64.34.174.215:80 gate
    real=64.34.180.165:80 gate
    service=http
    request="/update/index.html"
    receive="Test Page"
    scheduler=rr
    #persistent=600
    protocol=tcp
    checktype=negotiate
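As a sanity check, the negotiate test can be reproduced by hand against each RIP (IPs and request path taken from the config above; wget assumed available):

```shell
# Each command should print a page containing "Test Page"
wget -q -O- http://64.34.174.215/update/index.html | grep "Test Page"
wget -q -O- http://64.34.180.165/update/index.html | grep "Test Page"
```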
As you can see, the two web servers are on different subnets from
each other, and also on a different subnet from the Ultra Monkey
directors, though the two director boxes are on the same subnet (170)
and share a common default gateway.
The ha.cf file sets up a ucast link between the two servers, and the
node names are properly configured using the uname -n output of the
two hosts.
The haresources file looks like this:
Server06.example.com \
ldirectord::ldirectord.cf \
LVSSyncDaemonSwap::master \
IPaddr2::64.34.209.34/24/eth0/64.34.209.255 \
IPaddr2::64.34.183.97/24/eth0/64.34.209.255 \
IPaddr2::64.34.209.50/24/eth0/64.34.209.255
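On the active director, it may also be worth confirming that the VIPs are actually bound and the IPVS table is populated; a sketch, assuming the iproute2 tools are installed:

```shell
# The VIPs should appear as secondary addresses on eth0
ip addr show dev eth0

# Both real servers should be listed with weight > 0
ipvsadm -L -n
```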
Any ideas as to why it is behaving in this weird way?
--
regards,
matt
--
regards,
matt