I don't have the co-operation of our uplinks, so I fake the BGP
routes, and with a few scripts it also handles failover to a single
site. My employer's site www.worldhosting.org is handled this way.
Explain in more detail how you got this working and the scripts used.
First, you have to run a patched version of bind9 (I have Debian
packages for anyone who needs them). Get the source from
http://www.supersparrow.org/
Alternatively, add the following line to your /etc/apt/sources.list to
get my supersparrow and patched bind9 packages:
deb http://debian.worldhosting.org/supersparrow sarge main
(Packages for woody are also available; replace sarge with woody.)
Then create something like this in your bind config:
zone "www.worldhosting.org" {
	type master;
	database "ss --host 127.0.0.1 --route_server ssrs --password XXXX --debug --peer 64600=210.18.215.100,64601=193.173.27.8 --self 193.173.27.8 --port 7777 --result_count 1 --soa_host ns.worldhosting.org. --soa_email hostmaster.worldhosting.org. --ns ns.worldhosting.org. --ns ns.au.worldhosting.org. --ttl 7 --ns_ttl 60";
};
This snippet makes www resolve to 210.18.215.100 if the peer is 64600
and to 193.173.27.8 if the peer is 64601. The TTL for the A record is
set by --ttl (7 seconds here) and the TTL for the NS records by
--ns_ttl (60 seconds). --self is the default answer from this
nameserver (on the secondary nameserver, make this the other address).
Set the password to the same value as in /etc/supersparrow.conf.
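To make the --peer/--self behaviour concrete, here is a rough shell
model of the choice. It is not supersparrow's actual code, and the
function name peer_addr is made up for illustration: given the peer AS
picked for a client, it returns the matching address from a list in
the same "AS=address,AS=address" form as --peer, or the --self address
if that AS isn't listed.

```shell
#!/bin/sh
# Hypothetical model of how a --peer list maps a peer AS to an address,
# falling back to the --self address when the AS is not in the list.
peer_addr() {
	peers=$1   # e.g. "64600=210.18.215.100,64601=193.173.27.8"
	self=$2    # the --self address for this nameserver
	as=$3      # the peer AS selected for the client
	# Split the list on commas and look for an "AS=address" pair
	old_ifs=$IFS
	IFS=,
	for pair in $peers; do
		if [ "${pair%%=*}" = "$as" ]; then
			IFS=$old_ifs
			echo "${pair#*=}"
			return 0
		fi
	done
	IFS=$old_ifs
	# No pair matched: answer with the default (--self) address
	echo "$self"
}

# Usage, with the values from the zone above:
# peer_addr 64600=210.18.215.100,64601=193.173.27.8 193.173.27.8 64600
#   prints 210.18.215.100
# peer_addr 64600=210.18.215.100,64601=193.173.27.8 193.173.27.8 65000
#   prints 193.173.27.8 (AS not listed, so the --self address)
```

On the secondary nameserver the same call would simply be made with the
other address as the second argument.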
Create three files to describe the routes in normal and failed modes. In
our setup:
$ cat ssrs.routes.AUonly
0.0.0.0/0 64600
$ cat ssrs.routes.NLonly
0.0.0.0/0 64601
$ head ssrs.routes.normal
128.184.0.0/16 64600
128.250.0.0/16 64600
129.78.0.0/16 64600
129.94.0.0/16 64600
129.96.0.0/16 64600
129.127.0.0/16 64600
129.180.0.0/16 64600
130.56.0.0/16 64600
130.95.0.0/16 64600
130.102.0.0/16 64600
The ssrs.routes.normal file lists every subnet you want to force onto a
particular peer, followed by that peer's AS number.
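A quick way to sanity-check a route file is to see which AS a given
client address would land on. The sketch below is a hypothetical helper
(route_lookup is not part of supersparrow), and it assumes the route
server picks the most specific matching prefix, as a routing table
would. Under that assumption the 0.0.0.0/0 line in the AUonly and
NLonly files catches every address, while an address that is not
covered by ssrs.routes.normal matches nothing.

```shell
#!/bin/sh
# Hypothetical helper: print the peer AS for an IPv4 address from a file
# in ssrs.routes format ("prefix/length AS" per line), choosing the
# longest matching prefix. Prints nothing if no prefix matches.
route_lookup() {
	addr=$1
	file=$2
	awk -v addr="$addr" '
	# Dotted quad to a number (exact in awk doubles, since it is < 2^32)
	function ip2num(ip,   o) {
		split(ip, o, ".")
		return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
	}
	BEGIN { a = ip2num(addr); best = -1 }
	NF >= 2 {
		split($1, p, "/")
		len = p[2] + 0
		size = 2 ^ (32 - len)   # number of addresses in the prefix
		# Same network if both addresses fall in the same block of size
		if (int(a / size) == int(ip2num(p[1]) / size) && len > best) {
			best = len
			peer = $2
		}
	}
	END { if (best >= 0) print peer }
	' "$file"
}
```

For example, with the normal file linked to /etc/ssrs.routes,
route_lookup 128.184.1.1 /etc/ssrs.routes should print 64600 given the
entries shown above, and route_lookup 8.8.8.8 /etc/ssrs.routes.AUonly
would print 64600 via the default route.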
Create a script that runs an HTTP test against each site periodically
(we do it every 5 minutes, as the web servers don't go down often). If
both sites work, symlink ssrs.routes.normal to /etc/ssrs.routes; if
only one works, symlink the file for the site that works (i.e. AUonly
or NLonly) instead. If neither works, my script falls back to the
normal file. Then check whether the routes have changed and, if so,
reload supersparrow. I use the check_http plugin from the nagios
package to do the test. My script is below:
----------------
#!/bin/sh
# Test both web sites over HTTP, point /etc/ssrs.routes at the route
# file for the sites that are up, and reload supersparrow only when the
# routes have actually changed. Meant to be run from cron.
PATH=/sbin:$PATH

AUIP=210.18.215.100
NLIP=193.173.27.8
ROUTEDIR=/var/named/supersparrow
AUW=0
NLW=0

# check_http comes from the nagios plugins; exit status 0 means the
# page was fetched successfully
/sbin/check_http -H "$NLIP" -u /index.html -p 80 -t 20 >/dev/null && NLW=1
/sbin/check_http -H "$AUIP" -u /index.html -p 80 -t 20 >/dev/null && AUW=1
# Do the tests again in case there was a hiccup
/sbin/check_http -H "$NLIP" -u /index.html -p 80 -t 20 >/dev/null && NLW=1
/sbin/check_http -H "$AUIP" -u /index.html -p 80 -t 20 >/dev/null && AUW=1

# Pick the route file that matches the sites that are up
if [ $NLW -eq 1 ] && [ $AUW -eq 1 ]
then
	OPMODE="Normal Operation"
	ROUTES=normal
elif [ $NLW -eq 1 ]
then
	OPMODE="NL running but AU down"
	ROUTES=NLonly
elif [ $AUW -eq 1 ]
then
	OPMODE="AU running but NL down"
	ROUTES=AUonly
else
	# Neither site answered: fall back to normal routing
	OPMODE="AU and NL down"
	ROUTES=normal
fi

ln -sf "$ROUTEDIR/ssrs.routes.$ROUTES" /etc/ssrs.routes

# Stop here if the routes are unchanged since the last reload.
# Use ">/dev/null 2>&1": in a POSIX /bin/sh such as dash, the bash-only
# "&>" runs md5sum in the background and the script always exits here.
md5sum -c /etc/ssrs.routes.md5sum >/dev/null 2>&1 && exit 0
/etc/init.d/supersparrow reload
md5sum /etc/ssrs.routes > /etc/ssrs.routes.md5sum
echo "Supersparrow: $OPMODE"
-------------
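If one retry isn't enough to ride out a hiccup, the repeated check_http
calls could be folded into a small helper. This is only a sketch, not
part of nagios or supersparrow; retry is a hypothetical function name,
and the one-second pause between attempts is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times, succeeding as soon
# as one attempt succeeds. POSIX sh has no local variables, so tries and
# n are globals here.
retry() {
	tries=$1
	shift
	n=1
	while [ "$n" -le "$tries" ]; do
		"$@" && return 0
		n=$((n + 1))
		# Short pause before the next attempt (skipped after the last one)
		[ "$n" -le "$tries" ] && sleep 1
	done
	return 1
}

# Possible use in the failover script (the redirection covers every try):
# retry 2 /sbin/check_http -H "$NLIP" -u /index.html -p 80 -t 20 >/dev/null && NLW=1
```

Raising the count trades a slower run for fewer false failovers; with
cron every 5 minutes there is plenty of slack either way.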
With a DNS server at each location, if an international routing
problem stops the two sites from reaching each other, the script at
each site sees the other site as down, so each DNS server points www at
its local hosting location for every query. Any client on the net that
can reach that DNS server will then use the www at the same location,
which therefore has a high chance of working.
Please let me know if there's anything I've missed. This might be worth
going into some kind of HOWTO somewhere.
Regards,
Josh.